Determination of the top quark mass from leptonic observables

We present a procedure for the determination of the mass of the top quark at the LHC based on leptonic observables in dilepton $t\bar{t}$ events. Our approach utilises the shapes of kinematic distributions through their few lowest Mellin moments; it is notable for its minimal sensitivity to the modelling of long-distance effects, for not requiring the reconstruction of top quarks, and for having a competitive precision, with theory errors on the extracted top mass of the order of 0.8 GeV. A novel aspect of our work is the study of theoretical biases that might influence in a dramatic way the determination of the top mass, and which are potentially relevant to all template-based methods. We propose a comprehensive strategy that helps minimise the impact of such biases, and leads to a reliable top mass extraction at hadron colliders.


Introduction
The current world average of the top quark mass [1],
$m_t = 173.34 \pm 0.76$ GeV [World Average], (1.1)
implies that $m_t$ is known with a precision better than 0.5%. Such an accuracy is perfectly adequate for present collider-physics applications [2] including, notably, the global electroweak (EW) fits [3], which are saturated by the uncertainty on the W-boson mass, and not by that on $m_t$. Still, the accurate determination of the top quark mass at hadron colliders remains a subject of much activity and debate. Two separate developments have been the main drivers behind the above-mentioned activity: the outsize role played by the top quark mass in determining the stability of the electroweak vacuum (both in the Standard Model (SM) [4][5][6] and beyond [7]), and the recognition that the extraction of $m_t$ at hadron colliders involves significant theoretical challenges that might conceivably affect its value at the level of O(1 GeV) (see ref. [2] for a detailed discussion).
The bottom-up extrapolation of EW-scale physics, based on eq. (1.1), implies either that the EW vacuum becomes unstable below the Planck scale, or that the result of eq. (1.1) deviates from the value needed for the stability of the SM EW vacuum up to the Planck scale by about two to four sigmas [6,8]. If confirmed, such a conclusion might indirectly imply the existence of Beyond the SM (BSM) physics somewhere below the Planck scale. Given the non-observation of BSM signals so far, it would be hard to overstate the importance of this implication. We stress that these facts are mainly driven by the $m_t$ value of eq. (1.1), because of the large parametric dependence of the stability condition on the top quark mass.
At this point one might wonder about the need for revisiting the subject of m t determination, given the quite high precision of the top mass of eq. (1.1). To this end let us remind the reader that there are a number of high-precision measurements that marginally agree with the current world average. Examples are the very recent CMS [9] and D0 [10] measurements: (1. 2) The above measurements have the same uncertainty as the combination in eq. (1.1), but notably different central values 1 . In particular, the CMS measurement [9] is consistent with the SM EW vacuum being stable up to the Planck scale, while the D0 measurement [10] implies a rather unstable SM EW vacuum. Therefore, the spread in the available m t measurements alone warrants a closer inspection of the determination of the top quark mass. As we shall detail in the following, there are also strong theoretical reasons that motivate further studies of the extraction of this parameter from hadron collider data. The determination of the top quark mass is as much dependent on theoretical assumptions as it is on measurements. The reason is that the top quark mass is not an observable and thus cannot be measured directly 2 : it is a theoretical concept, and its value is extracted from data in collider events that feature top quarks. Such an extraction depends on the definition of the mass (pole mass, running mass, and so forth), on the observables chosen, and on the various approximations made when computing those observables. Since measurements are insensitive to theory assumptions 3 , any modification in the theoretical modelling will result in a different value of the extracted top mass. If everything is consistent, i.e. if the estimated uncertainty is a realistic representation of the true uncertainty, then the differences in the returned values should fall within the corresponding theory errors. 
In reality, this may not be the case due to the presence of biases, whose very existence might be difficult to establish. With this important subtlety in mind, one of the main aspects of the present work is to devise a structured approach towards the identification of such hidden biases. A significant number of techniques for the determination of the top mass exist or are under study; see ref. [2] for a recent in-depth overview. Such techniques may be organised into two classes, whose definitions cannot be given in a rigorous way, but which are nevertheless based on clearly distinct physical principles. The first class includes all those approaches that use, in some form, the fact that the top is a particle that decays: the knowledge of the decay products (i.e. their identities and kinematic configurations) is then exploited to reconstruct some quantity which is directly related to the top, and thus bears information on its mass. The crucial characteristic is that, by emphasising the role of the decay, one factors out the details of the process in which the primary top(s) is(are) produced, so that the details of the production mechanism become irrelevant. The ideal (i.e. not realistic) procedure which belongs to this class is the one where the top virtuality is reconstructed exactly by measuring the invariant mass of its decay products, thus scanning its lineshape. In the approaches that belong to the second class the role of the top as a mother particle must not matter; the only important thing is that some observable(s) of a top-mediated process depend in a significant way on m t , so that their measurements can be mathematically inverted (using suitable theoretical predictions) to return the top mass. We stress that the fact that the observables mentioned above are most likely constructed by using the top decay products is not relevant. 
The only important thing is that they depend on the top quark mass, a feature that might be possessed by other quantities as well (for example, the primary QCD radiation in the production process).
The approaches that belong to the first class are often perceived to be affected by smaller theoretical systematics than those of the second class, because by their very definition one assumes that many sources of uncertainties, such as PDF dependence, absence of higher-order perturbative corrections, and new-physics contributions, will drop out, being mostly associated with the production mechanism. Unfortunately, this is not really the case. Firstly, some of these sources might be relevant to decays as well. Secondly, different kinds of uncertainties could become important: a good example is the so-called J/Ψ method [11] which, although experimentally very clean and theoretically well defined, is hampered by its sensitivity to the non-perturbative b-fragmentation. Thirdly, in these approaches one must start by defining what one means by "top", which introduces some auxiliary (if only intermediate) concept in the procedure, and renders it difficult to assign a proper theoretical error to it. Note that this necessity goes beyond what one must do in order to reconstruct the top quark experimentally, and is purely theoretical.
The bottom line is that, regardless of which class an m t -extraction technique belongs to, some amount of theoretical modelling will be involved. In this paper, we follow an approach of the second class; we believe that not having to define the top as a final-state object is a virtue that more than compensates a larger dependence on the production process.
Another important motivation behind the procedure we are proposing is the use of observables that can be both reliably predicted within the SM, and cleanly measured. Thus, we employ kinematic distributions of leptons in dilepton tt events; more precisely, we are interested in their shapes. Furthermore, we find that the information on the top mass that such shapes encode can be very effectively provided by the Mellin moments of the corresponding distributions, and it is such moments that will play a central role in our method. Our goal is the determination of m t with competitive precision, supplemented by a detailed study of the various sources of theoretical systematics. Apart from not having to rely, directly or indirectly, on the reconstruction of top quarks, our approach has minimal sensitivity to the modelling of both perturbative and non-perturbative QCD effects 4 . We believe that the latter property is one of the chief advantages of the method we are pursuing.
In this paper we shall be working with the top quark pole mass, and shall not consider alternative mass definitions. Our viewpoint is that the intrinsic differences between any two of these definitions (renormalon-related effects are a good example) are largely below the present level of uncertainties, and therefore we do not see them as a reason for concern at present. A fuller discussion can be found in ref. [2].
We shall conclude that, with the procedure we employ, the extraction of the top pole mass can be achieved with a theoretical error of about 0.8 GeV, and possibly smaller. While a significant number of tt dilepton events have been recorded during Run I of the LHC, no measurements are published of the Mellin moments that would allow us to apply our procedure to real data. We thus hope that this paper will encourage the LHC experimental collaborations to measure directly such moments, so that the present analysis could be repeated, and its results compared with those of eqs. (1.1) and (1.2). Furthermore, we are hopeful that the reliability and small theoretical systematics of the method proposed in this work will help shed light on the issue of the EW vacuum stability. This paper is organised as follows: in sect. 2 we introduce our method in detail and define, in particular, its associated theoretical errors (sect. 2.2) and biases that may affect it (sect. 2.3). Our results are presented in sect. 3: those with the highest theoretical accuracy in sect. 3.2.3, with discussions on the effects due to parton showers, higher orders, and spin correlations in sect. 3

The method
Our goal is to study the determination of the top quark pole mass $m_t$ from several differential distributions of leptons in dilepton $t\bar{t}$ events:
Each of the observables that we consider features the following important properties:
• It does not require the reconstruction of the $t$ and/or $\bar{t}$ quark; indeed, we do not even need to speak of top quarks 5 .
• It is almost completely inclusive in hadronic radiation: the only possible dependence on strongly-interacting final-state objects is that due to selection cuts (on b-jets).
• Owing to this inclusiveness, the observable is minimally sensitive to the modelling of long-distance effects. This feature increases the reliability of the theoretical predictions.

Label   Observable
1       $p_T(\ell^+)$
2       $p_T(\ell^+\ell^-)$
3       $M(\ell^+\ell^-)$
4       $E(\ell^+) + E(\ell^-)$
5       $p_T(\ell^+) + p_T(\ell^-)$
Table 1: Observables and their labels.
The set of observables considered in this paper and their labelling conventions are given in table 1: p T (ℓ + ) is the single-inclusive transverse momentum of the positively-charged lepton; p T (ℓ + ℓ − ) and M (ℓ + ℓ − ) are the transverse momentum and the invariant mass, respectively, of the charged-lepton pair; finally, E(ℓ + ) + E(ℓ − ) and p T (ℓ + ) + p T (ℓ − ) are the scalar sums of the energies and transverse momenta of the two charged leptons, respectively. We point out that the latter two sums are computed event-by-event; in other words, observables #4 and #5 are not constructed a-posteriori given the single-inclusive energy and transverse momentum distributions of the leptons. The extraction of the top quark mass utilises the sensitivity of the shapes of kinematic distributions to the value of m t . It is cumbersome to work directly with differential distributions. Instead, we utilise their lower Mellin moments, whose precise definition is given in sect. 2.1. The idea of the method proposed in this paper is to predict the m t dependence of the moments, and then to extract the value of m t by comparing the predicted and measured values of those moments. The procedure is detailed in sect. 2.2.
The use of moments for the extraction of the top mass has been suggested previously in the context of the so-called J/Ψ method [11], or in connection with variables supposed to minimise the dependence on the jet-energy scale [12,13]. To our knowledge, the most up-to-date theoretical treatment of this technique is in ref. [14]. All these papers consider only the first moment (of various distributions); in the case of m t extraction from different observables, the results are either not combined [14], or limited to two observables [13]. These choices may lead to issues, as we shall discuss in sects. 2.3, 3.2.2, and 3.2.4. In the case of the dilepton channel, ref. [14] also employs one of the observables considered in this paper (E(ℓ + ) + E(ℓ − )); owing to the different choices made for cuts, jet algorithm, collider energy, and PDFs, we have refrained from making a direct comparison with those results. We also point out that in ref. [14] the simultaneous variation of the factorisation and renormalisation scales has been adopted, which leads to smaller scale uncertainties than those we find in this paper (where the two scales are varied independently, see sect. 3).
Finally, we remark that other discrete parameters of kinematic distributions, such as medians and maxima, might also be used for a top mass extraction. We have chosen to work with moments because of the ease of their calculation, and of the fact that the results they give can be systematically improved by including those of increasingly higher rank. For other previous theoretical approaches whose philosophy differs, in one or more aspects, w.r.t. the one adopted here, see e.g. refs. [15][16][17][18].

Definition of moments
We denote by $\sigma$ and $d\sigma$ the total and fully-differential $t\bar{t}$ cross sections respectively (possibly within cuts), so that:
$\sigma = \int d\sigma$, (2.2)
where the integral is understood over all degrees of freedom. Given an observable $O$ (e.g. one of those in table 1), its normalised moments are defined as follows:
$\mu_O^{(i)} = \frac{1}{\sigma} \int d\sigma \, O^i$, (2.3)
for any non-negative integer $i$. In this way, one has:
$\mu_O^{(0)} = 1$, $\mu_O^{(1)} = \langle O \rangle$, (2.4)
and so forth. Note that, when selection cuts are applied (see eq. (3.6)) in the calculation of moments, they are applied in exactly the same manner in the denominator and in the numerator of eq. (2.3). A short technical remark: the numerator of eq. (2.3) is usually derived from the result relevant to the differential distribution $d\sigma/dO$. On the other hand, it could also be computed directly (i.e., without using $d\sigma/dO$), which is a procedure affected by smaller uncertainties, as we explain in appendix A. The important thing to point out here is that such a direct calculation has a fully analogous experimental counterpart: Mellin moments can be measured directly, which is the procedure we recommend. See appendix A for more details.
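As an illustration of the direct calculation mentioned above, the following Python sketch accumulates the normalised moments event by event and, for comparison, recomputes them a posteriori from a binned distribution. The function names, the toy event sample, and the binning are assumptions made for this example only; they are not part of the analysis described in this paper.

```python
import numpy as np


def mellin_moments(values, weights, max_rank=4):
    """Return [mu^(0), ..., mu^(max_rank)] computed directly from events."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    sigma = weights.sum()  # denominator: total cross section (within cuts)
    return [float((weights * values**i).sum() / sigma)
            for i in range(max_rank + 1)]


def moments_from_histogram(values, weights, bins, max_rank=4):
    """Same moments, computed a posteriori from a binned distribution."""
    hist, edges = np.histogram(values, bins=bins, weights=weights)
    centres = 0.5 * (edges[:-1] + edges[1:])  # each bin represented by its centre
    sigma = hist.sum()
    return [float((hist * centres**i).sum() / sigma)
            for i in range(max_rank + 1)]


# Toy sample (hypothetical numbers, not a physics simulation).
rng = np.random.default_rng(1)
pt = rng.exponential(scale=40.0, size=10000) + 20.0
w = np.ones_like(pt)

direct = mellin_moments(pt, w)
binned = moments_from_histogram(pt, w, bins=50)
```

With unit weights the zeroth moment equals one and the first moment is the sample mean. The binned estimate differs from the direct one by an amount that depends on the bin width, which is one reason for preferring the direct accumulation.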

Extraction of the top quark mass and its uncertainties
The method for extracting $m_t$ from the first moment of any one of the observables of table 1 is given schematically in fig. 1. The x and y axes of fig. 1 are associated with the top pole mass $m_t$ and the first moment of $O$, $\mu_O^{(1)}$, respectively. We assume that $\mu_O^{(i)}$ increases with $m_t$, which is indeed the case in the SM and for the observables considered here; the formulae given below can however be trivially extended to the case of $\mu_O^{(i)}$ decreasing with $m_t$. As fig. 1 suggests, the functions $f_{C,U,L}$ are linear in $m_t^i$; although in general they need not be so, we have found that straight lines are perfectly adequate to our purposes. We explain how such functions are computed in sect. 3.1.
Figure 1: Graphic representation of the method used in this paper to extract the top mass from the first moment of any given observable. The case of higher moments is identical, except for the fact that the x axis is associated with $m_t^i$.
Given the data 6 µ D the extracted top mass will be (see fig. 1): (2.7) We define the central value and theoretical uncertainties associated with such an extraction as follows: with Since the functions f C,U,L are linear in m i t , their inversion is trivial; however, we point out that eq. (2.9) remains valid regardless of the particular (monotonic and increasing) forms of f C,U,L . While the quantities introduced in eq. (2.8) are the theory errors that affect the top-mass extraction from any given observable and moment, there might be cases in which they are inadequate to measure the actual difference between the central value m C and the physical top mass. This happens in the presence of what we call theory biases, which we shall discuss at length in sect. 2.3.
In keeping with fig. 1, we define the experimental errors as: with It is easy to convince oneself that the more conservative choice: is not correct, since it leads to non-zero uncertainties also in the case of null experimental errors. In this paper, we shall not consider the experimental uncertainties any longer, and shall be concerned only with the theoretical ones. We point out that the size of the latter depends on two factors: the uncertainty on the theoretical predictions for $\mu$ at a given $m_t$, and the slope of $f_C(m_t)$: the steeper the latter, the smaller the errors on the extracted values of $m_t$.
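The inversion depicted in fig. 1 can be sketched in a few lines of Python. The sketch assumes linear, increasing functions $f_{C,U,L}$ written as intercept plus slope times $m_t^i$, and it assumes that the upward mass error is obtained by inverting the lower band and the downward one by inverting the upper band; all names and numerical coefficients are hypothetical and chosen only to make the arithmetic transparent.

```python
def invert_linear(intercept, slope, mu, rank=1):
    """Solve mu = intercept + slope * m_t**rank for m_t (slope > 0 assumed)."""
    return ((mu - intercept) / slope) ** (1.0 / rank)


def extract_mass(mu_data, f_central, f_upper, f_lower, rank=1):
    """Return (m_C, delta_plus, delta_minus) for a measured moment mu_data.

    Each f_* is an (intercept, slope) pair. For increasing functions the
    upper band crosses mu_data at a lower mass than the central one, and
    the lower band at a higher mass (assumption stated in the lead-in).
    """
    m_c = invert_linear(*f_central, mu_data, rank)
    m_low = invert_linear(*f_upper, mu_data, rank)
    m_high = invert_linear(*f_lower, mu_data, rank)
    return m_c, m_high - m_c, m_c - m_low


# Hypothetical first-moment bands (GeV); not taken from the paper.
shallow = extract_mass(54.6, (20.0, 0.2), (20.4, 0.2), (19.6, 0.2))
# Same band half-width, twice the slope: the mass errors are halved.
steep = extract_mass(54.6, (-14.6, 0.4), (-14.2, 0.4), (-15.0, 0.4))
```

In this toy setup both calls return a central value of 173 GeV, with errors of about 2 GeV for the shallow slope and about 1 GeV for the steep one, in line with the remark that a steeper $f_C$ gives smaller errors on the extracted mass.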

Theory biases
In this section we address the question whether there might be some biases in the method outlined in sect. 2.2, that would prevent the errors defined in eq. (2.8) from being a reliable representation of the true uncertainties underlying the m t extraction.
It is not difficult to devise a scenario where the answer to the previous question is positive. Let us suppose that $t\bar{t}$ production, as is seen in LHC detectors, proceeds through both the well-known SM mechanisms, and the exchange of a hypothetical heavy non-SM resonance. The nature of the latter is irrelevant here; what matters is the fact that, owing to its being very massive, it will cause the $t$ and $\bar{t}$, and hence their decay products, to be slightly more boosted on average than if only SM physics existed. Thus, for example, the measured first moment of $p_T(\ell^+)$ (which is observable #1 in table 1) will have a larger value than what would be measured if only the SM were present. Let us further suppose that the theoretical predictions used to extract $m_t$ with the procedure of sect. 2.2 are the pure-SM ones: what has been said above also implies that the functions $f_C$, $f_U$, and $f_L$ will have lower values, for any given top mass, than their counterparts computed in the BSM theory that corresponds to the measured data. Figure 1 then leads immediately to the conclusion that the value of $m_t$ extracted from BSM data using SM predictions will be larger than the "true" top mass value 7 : the difference between the extracted and the true $m_t$ is then a theoretical bias. The crucial point is that the uncertainties associated with such an extraction will be essentially the same 8 as those one would obtain in the complete absence of BSM physics in the data: in other words, the errors of eq. (2.8) would fail to capture the presence of the existing theoretical bias.
The main lesson to be drawn from the previous example is the following. Given only one observable and one of its Mellin moments, the extraction of the top mass according to eq. (2.7) will always be possible with "small" theoretical errors 9 , regardless of whether the theory employed gives a correct representation of the physics model embedded in Nature. This observation, however, implicitly suggests a solution to the problem posed by theoretical biases. In fact, while the above indeed applies to each individual observable-and-moment choice, if the theoretical description is ultimately not compatible with Nature it is not likely that two values of $m_t$ extracted from two different observables will be compatible with each other. Conversely, the probability that the extracted values of $m_t$ be all mutually compatible (in the case of an incorrect underlying theoretical model) will decrease with the number of observables and moments considered.
Note that it is easier to establish the possible incompatibility of any two $m_t$ results when their theoretical errors are small. Therefore, the property of the uncertainties of eq. (2.8) of being insensitive to theoretical biases is actually a positive feature in this context, and underlines the importance of accurate predictions. The bottom line is that, in order for the presence of theoretical biases to be clearly uncovered, it is of utmost importance to consider as many observables and moments as possible. The choice of the set of table 1 reflects this view, but it is clear that any further addition to it will be beneficial.
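One simple way to quantify the mutual compatibility of several extractions is to compute, for each pair of observables, the difference of the extracted masses in units of their combined error. The Python sketch below does this under the assumptions of symmetric, uncorrelated errors; it is an illustrative diagnostic with hypothetical names and numbers, and is not the combination procedure adopted in this paper.

```python
import math
from itertools import combinations


def pairwise_pulls(results):
    """Map each pair of labels to |m_a - m_b| / sqrt(err_a^2 + err_b^2).

    `results` maps a label to (extracted mass, symmetric error), in GeV.
    Errors are treated as uncorrelated, which is an assumption.
    """
    pulls = {}
    for (name_a, (m_a, err_a)), (name_b, (m_b, err_b)) in combinations(
            results.items(), 2):
        pulls[(name_a, name_b)] = abs(m_a - m_b) / math.hypot(err_a, err_b)
    return pulls


# Hypothetical extractions from three observables (not results of the paper).
toy_results = {
    "obs1": (173.0, 1.0),
    "obs2": (176.0, 1.0),
    "obs4": (173.5, 1.5),
}
toy_pulls = pairwise_pulls(toy_results)
```

Smaller errors make a given mass difference stand out more clearly, and adding observables increases the number of pairs that must all agree, which mirrors the two points made above.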
We conclude this section by making various further observations. Firstly, it is not necessary to have a BSM-vs-SM scenario for theoretical biases to appear: it is sufficient that theory and data are not fully compatible. We shall give several examples of this in sect. 3, all of them within the SM. Secondly, although possibly biased, the m t value extracted in a single-observable-and-moment procedure is not "wrong": it is, by construction, the result that, for the given data, will give the best prediction with the assumed theoretical model. Therefore, as long as one uses such m t with that model and only for that observable, one is perfectly consistent. It is in the interpretation of the results, however, that one must be careful, since the larger the bias, the less the extracted top quark mass will have to do with the fundamental parameter so important e.g. for the stability of the vacuum. This stresses again the fact that it is always recommended, for example through the multi-observable approach advocated here, to determine the presence of theoretical biases. Thirdly and finally, the relationship between m t -extraction and biases is by no means specific to the use of Mellin moments; it is common to all template-based methods. If anything, Mellin moments just render the discussion particularly transparent.

Results
Our predictions are obtained by simulating $t\bar{t}$ production in the SM, by treating the top quarks as stable, and by decaying them afterwards. We perform the calculations in the fully automated MadGraph5_aMC@NLO framework [19], where we can easily investigate the impact of the various approximations that may be employed; in particular, we shall consider both LO and NLO results, with or without their matching to parton showers, with or without including spin-correlation effects. We have thus several calculational scenarios, which we summarise in table 2. We shall refer to each of them interchangeably with either their labels or their extended names, the latter chosen in agreement with ref. [19]. NLO fixed-order computations are based on the FKS subtraction method [20,21]. NLO results are matched to parton showers according to the MC@NLO formalism [22]; throughout this paper, we have used HERWIG6 [23,24]. Spin-correlation effects in the computations matched to parton showers are accounted for with the method of ref. [25] through its implementation in MadSpin [26] (shortened to MS henceforth), a package embedded in MadGraph5_aMC@NLO. As far as fixed-order results are concerned, only decay spin correlations (i.e. those described by the matrix elements relevant to $t \to \ell^+ \nu_\ell b$) are taken into account, whence the "No" in the rightmost entry of the last two rows of table 2.
We have used a five-light-flavour scheme, and the MSTW2008 (68% CL) PDF sets [27] and their associated errors, at the LO or the NLO depending on the perturbative accuracy of the various scenarios reported in table 2. We have included both PDF and scale uncertainties in our predictions; both have been computed with the reweighting method of ref. [28]. As far as the latter uncertainties are concerned, they have been obtained with an independent variation of the renormalisation and factorisation scales, subject to the constraints of eq. (3.1), with $\hat{\mu}$ in eq. (3.2) a reference scale; the default values or central scale choices correspond to $\xi_F = \xi_R = 1$. We point out that eq. (3.1) is a conservative scale variation (as was done e.g. in ref. [29], and as opposed to setting the two scales equal to a common value), which estimates well the missing higher-order corrections to the total $t\bar{t}$ cross section at the NNLO [30,31]. We have considered three different functional forms for the reference scale $\hat{\mu}$ in eq. (3.2), given in eqs. (3.3)-(3.5), with the transverse masses $m_{T,i} = \sqrt{p_{T,i}^2 + m_i^2}$. We point out that, since in our calculations the top quarks are treated as stable particles at the level of hard matrix elements, the difference between eqs. (3.3) and (3.4) is the contribution to the latter of the transverse momentum of the massless parton which is possibly present in the final state (owing to real-emission corrections); the scale of eq. (3.4) is nothing but $H_T/2$.
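The relation between the scales of eqs. (3.3) and (3.4) can be made concrete with a short Python sketch. It assumes that both scales carry the same prefactor 1/2, as suggested by the statements that eq. (3.4) equals $H_T/2$ and that it differs from eq. (3.3) only by the contribution of the extra massless parton; the function names and the numerical inputs are hypothetical.

```python
import math


def transverse_mass(pt, mass):
    """m_T = sqrt(pT^2 + m^2); reduces to pT for a massless parton."""
    return math.sqrt(pt**2 + mass**2)


def reference_scales(top_pt, antitop_pt, m_t, parton_pt=0.0):
    """Return (scale built from t and tbar only, H_T/2 including the parton).

    The common prefactor 1/2 is an assumption of this sketch.
    """
    mu_tops = 0.5 * (transverse_mass(top_pt, m_t)
                     + transverse_mass(antitop_pt, m_t))
    mu_ht = mu_tops + 0.5 * transverse_mass(parton_pt, 0.0)
    return mu_tops, mu_ht


# Born-like configuration: no extra parton, so the two scales coincide.
born = reference_scales(100.0, 100.0, 173.0)
# Real-emission configuration with a 40 GeV massless parton (toy numbers).
real = reference_scales(120.0, 90.0, 173.0, parton_pt=40.0)
```

In the absence of real emission the two definitions coincide; with an extra parton of transverse momentum 40 GeV the second scale exceeds the first by 20 GeV.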
Our simulations are carried out at the 8 TeV LHC. Since we only consider the process of eq. (2.1), i.e. top-pair production without any background contamination, all of our events are $t\bar{t}$ ones by construction. On the other hand, in order to perform a more realistic analysis, we also impose the following event selection: on top of having two oppositely-charged leptons (electrons and/or muons), events are required to contain at least two b-flavoured jets, with jets defined according to the anti-$k_T$ algorithm [32] with R = 0.5, as implemented in FastJet [33]. The events so selected are then subject to the following cuts: If more than two b-jets are present, the cuts above are imposed on the two hardest ones. In order to simplify our analysis, b-hadrons have been set stable in HERWIG6, so that the vast majority of the events just contain the two charged leptons arising from top decays. In addition to the cuts of eq. (3.6), we have also checked the effects of imposing lepton-jet isolation cuts: these being negligible, we shall not consider them any further in this paper.

Calculation of the moments and of the functions $f_{C,U,L}(m_t)$
With the settings described above, we have simulated $t\bar{t}$ production in all of the six calculational scenarios of table 2; in the case of NLO+PS+MS (which we believe to give the best description of SM physics, and is thus treated as our reference computation), results have been obtained with all of the three scale choices of eqs. (3.3)-(3.5). In each of these runs, we have computed the first four Mellin moments for all the observables listed in table 1, both without applying any cuts, and with the selection cuts of eq. (3.6); all moments are evaluated on the fly (i.e. not a posteriori using the corresponding differential distribution), as explained in appendix A. At the end of the runs, we have the predictions for the Mellin moments that correspond to the central scales and PDF set, and to all non-central scales and PDFs that belong to the relevant error set; as already explained, all the non-central results do not require additional runs, but are obtained through reweighting. Having the above results, the set of the eleven central, or upper, or lower, values for each of the moments is then fitted with the following functional form: (3.8) The results of the fits, shown in fig. 2, are reported in table 3. They will not be explicitly used in what follows, and simply constitute a benchmark for future applications.
We conclude this section by pointing out that statistical integration errors are completely neglected in the fitting procedure described above (which is equivalent to taking all of them equal). In fact, the main reason behind choosing such a large number (11) of $m_t$ values for our simulations is that of minimising the impact of statistical fluctuations, without having to bother about the statistical errors, which are notoriously tricky in the case of the integration of NLO cross sections. The typical size of the statistical fluctuations can be gathered from fig. 2; it tends to increase in the case of higher moments, but it remains manageable up to the fourth moment, which is the largest we have considered.
Obviously, statistical fluctuations can be reduced by increasing the accuracy of all runs performed. Given the large number of simulations relevant to the present paper, we have limited ourselves to work with $10^6$ events (of which about 30% pass the selection cuts of eq. (3.6)) in the case of computations matched to parton showers, and with a comparable accuracy in the case of fixed-order calculations. With this setup, we have found that over the interval 168-178 GeV the functional form of eq. (3.8) gives an excellent fit up to the fourth moment, and we believe that this conclusion applies regardless of the statistics; in other words, we see no reason for considering polynomials of higher orders in $m_t^i$ in the fitting procedure.
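The fitting step can be sketched as an unweighted straight-line fit of the $i$-th moment against $m_t^i$, which corresponds to taking all statistical errors equal. In the Python example below, the evenly spaced grid of eleven masses between 168 and 178 GeV, the coefficients, and the noise level are assumptions made for illustration; only the linear-in-$m_t^i$ form is taken from the text.

```python
import numpy as np


def fit_moment(masses, moment_values, rank):
    """Unweighted least-squares fit of mu^(rank) = a + b * m_t**rank.

    Returns the pair (a, b).
    """
    x = np.asarray(masses, dtype=float) ** rank
    y = np.asarray(moment_values, dtype=float)
    slope, intercept = np.polyfit(x, y, deg=1)  # highest power first
    return float(intercept), float(slope)


# Assumed mass grid: 11 values, 1 GeV apart, covering 168-178 GeV.
masses = np.arange(168.0, 179.0, 1.0)

# Toy second moment with hypothetical coefficients and statistical noise.
true_a, true_b = 5.0, 0.1
rng = np.random.default_rng(7)
toy_second_moment = (true_a + true_b * masses**2
                     + rng.normal(0.0, 1.0, masses.size))

a_fit, b_fit = fit_moment(masses, toy_second_moment, rank=2)
```

Using many mass points averages out the point-to-point fluctuations, so the fitted slope is stable even though no individual statistical error enters the fit.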

Extraction of the top quark mass
We now use the predictions for f C,U,L (m t ), calculated as described in sect. 3.1, to extract the value of the top quark mass according to the procedure outlined in sect. 2.2. In this way, we shall obtain the main figure of merit relevant to the method proposed in this paper, namely the size of the theoretical errors, eq. (2.8), associated with the extraction. In addition, by comparing the results emerging from the different computational scenarios of table 2, we shall assess the presence and numerical impact of the possible theory biases that affect the various approximations.
In order to carry out the programme just described, we need to start from some data, as in eq. (2.5). In view of the fact that Mellin moments for the observables of table 1 have not been measured at the LHC, we shall generate them ourselves, by using the procedure to be described in sect. 3.2.1. We point out that theoretically-generated (pseudo)data are actually more advantageous than real data if one is interested in studying the performances of a given procedure, since they provide one with a fully-controlled environment.
All of the theory predictions and pseudodata used in this section have been subject to the selection cuts of eq. (3.6).

Pseudodata
Since we believe that our reference scenario, namely NLO+PS+MS, will give the best description of actual (SM) physics, it is natural to adopt it for the generation of the pseudodata. While well-motivated from a physics viewpoint, we stress that, for the sake of a purely theoretical exercise, this choice is completely arbitrary, and that the conclusions we shall arrive at would be qualitatively unchanged had we chosen a different scenario. The pseudodata are generated by setting:  5). Therefore, one must not expect that the extractions of the top mass will return exactly m pd t , owing to the presence of the biases discussed in sect. 2.3. Having said that, we expect pseudodata to show a clear "preference" (i.e. smaller biases) towards simulations based on the NLO+PS+MS scenario, since in those cases the biases must be due only to scale choices. The verification that this is indeed the case will constitute a self-consistency check of the procedure we are following. Note that the information relevant to theory biases is encoded not in the actual value of the extracted m t , but in its difference with m pd t . Because of the behaviour of the f C,U,L functions, such a difference is very much insensitive to the choice of m pd t , which allows one to pick an arbitrary value for the latter, as is done in eq. (3.9), and which is ultimately the reason for the robustness of the usage of pseudodata.

Shower, NLO, and spin-correlation effects
The scenarios of table 2 differ by the various approximations they are based upon, each of which may lead to biases in the extraction of the top mass. An interesting question is then whether the different sources of possible biases can be disentangled from each other (i.e. whether in a sense they factorise). This is not only relevant in the context of the present exercise, but also because it may help assess the impact of approximations not considered here (such as NNLO corrections), and which might become crucial in the presence of real data. Furthermore, it also sheds light on the characteristics of the various observables used in this paper, and in so doing suggests how to enlarge their set.
In order to address the items above, we proceed as follows. We select pairs of scenarios that differ in one and only one aspect of the approximations they involve; for example, scenarios #1 and #2 differ in the perturbative accuracy (NLO vs LO) of the underlying computations. The aspects that we shall be able to consider are three, namely parton-shower, NLO-correction, and spin-correlation effects, which we shall discuss in turn below. The top mass extracted within scenario #i will be denoted by: $m_t^{(i)}$ (3.10) Let us then suppose that we have chosen a pair of scenarios (#i, #j) that differ only by aspect A. What we may consider are the quantities: $m_t^{(i)} - m_t^{\rm pd}$, $m_t^{(j)} - m_t^{\rm pd}$ (3.11) and $m_t^{(i)} - m_t^{(j)}$ (3.12) While the differences in eq. (3.11) are sensitive to all theory biases that affect scenarios #i and #j, we expect that the difference in eq. (3.12) is solely sensitive to the effect of A (if the factorisation property mentioned above holds to some extent). In the following, we report the differences that appear in eqs. (3.11) and (3.12), for all the relevant (#i, #j) pairs and all the observables of table 1. We shall limit ourselves to considering the first moments, which are sufficient for the sake of the present exercise; all results are obtained with the scale of eq. (3.3). In the case of eq. (3.12), which is our main interest here, we also report the errors affecting the difference, which are computed by combining in quadrature the errors (determined according to eq. (2.8)) that affect the individual $m_t^{(i)}$ values. The errors on the differences in eq. (3.11) are of comparable size, up to a factor √2 smaller since m pd t is assumed to be known with infinite precision.
We start with shower effects, and report the corresponding results in table 4. The relevant scenario pairs are (3,5) and (4,6), the latter being the LO counterpart of the former, which is accurate to NLO. Note that scenarios #1 and #2 have not been considered here, owing to the lack of fixed-order results that include production spin correlations.
The first observation is that the (3,5) and (4,6) cases are rather consistent with each other; however, the results for eq. (3.12) of the latter are in absolute value systematically larger than those of the former. This is compatible with the expectation, corroborated by ample heuristic evidence in many different processes, that shower effects are milder if the underlying computations are NLO-accurate (as opposed to LO ones), for the simple reason that NLO results do already include part of the radiation to be generated by parton showers. While in the case of NLO-based simulations all differences are statistically compatible with zero (within 1σ) except for observable #2, in the case of LO-based simulations more significant deviations can be seen in the cases of observables #1 and #5 as well. The take-home message, then, is that shower effects are moderate if higher-order corrections are taken into account, which is good news in view of the future availability of NNLO parton-level differential results; however, this conclusion does not apply to the transverse momentum of the charged-lepton pair, for which a proper matching with parton showers appears to be needed. As far as the results for eq. (3.11) are concerned, table 4 shows that values significantly different from zero are obtained in the cases of observables #2 and #3. The size of the difference relevant to #2 is larger than that resulting from eq. (3.12), which implies that for such an observable other effects, on top of those due to showers, are sources of theory biases as well (both NLO and spin correlations, as we shall show later). A similar conclusion applies to the lepton-pair invariant mass #3, for which the absence of shower effects implies that biases are entirely due to some other mechanism (spin correlations, as it will turn out).
We next consider NLO effects, which we document in table 5, and for which the relevant scenario pairs are (1,2), (3,4), and (5,6). As far as eq. (3.12) is concerned, the differences for all pairs and all observables except #2 are compatible with zero; thus, the first moments of such observables appear to be quite stable perturbatively, regardless of the matching to parton showers, and of the presence of spin correlations. For what concerns p T (ℓ + ℓ − ), on top of the fact that NLO effects are significant in all scenarios, we observe that they are particularly strong when the matching to showers is not performed (pair (5,6)); this is again related to the fact that, in certain corners of the phase space, showers and NLO corrections affect the kinematics in a similar way. Coming to the absolute size of theory biases, eq. (3.11), we see that they are all rather small in the case of NLO+PS+MS predictions (second column); this is what we expect, as explained in sect. 3.2.1. For the other scenarios, large differences are observed in the case of observables #2 and #3, which was expected in view of table 4. For the latter observable, this fact, the absence of NLO effects, and the results of table 4 imply that the biases are solely due to spin correlations.
We finally turn to spin-correlations effects, whose results are reported in table 6, and for which the relevant scenario pairs are (1,3) and (2,4); these two pairs differ in the underlying perturbative accuracy, which is NLO and LO respectively. The conclusions that can be drawn from table 6 have already been anticipated. Namely, that the differences resulting from eq. (3.12) are significantly different from zero for both observables #2 and #3, while they are negligible in the other cases. The sizes of the former differences appear to be fairly insensitive to NLO corrections, which is an indirect confirmation of the factorisation of spin-correlation effects.
The general conclusion of this section is the following. Observables that are single-inclusive (p T (ℓ + )), and that feature a mild correlation between the decay products of the top and antitop (E(ℓ + ) + E(ℓ − ) and p T (ℓ + ) + p T (ℓ − )), are rather stable against shower, NLO, and spin-correlations effects. This is not true for observables for which the correlation between the two charged leptons is stronger (p T (ℓ + ℓ − ) and M (ℓ + ℓ − )): the fact that either shower or spin-correlation effects (or both) are relevant implies, among other things, that the computation of the tt cross section at the NNLO with stable tops will not be sufficient to give a good description of such observables, at the very least in the context of the top mass extraction considered in this paper.
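As a minimal sketch of the bookkeeping behind eqs. (3.11) and (3.12), the Python fragment below forms, for a scenario pair (#i, #j), the two differences with respect to m pd t and the pairwise difference, combining the errors of the latter in quadrature as described in the text. The function names, the use of symmetrised errors, the sign conventions, and the numerical inputs are illustrative assumptions; the numbers are not taken from tables 4-6.

```python
import math

def bias_differences(m_i, err_i, m_j, err_j, m_pd):
    """Differences of eqs. (3.11) and (3.12) for a scenario pair (#i, #j).

    m_i, m_j     : top masses extracted within scenarios #i and #j (GeV)
    err_i, err_j : their theory errors, assumed symmetric here (GeV)
    m_pd         : pseudodata input mass, taken as exactly known (GeV)
    Each result is a (difference, error) tuple.
    """
    # eq. (3.11): bias of each scenario w.r.t. the pseudodata (m_pd has no error)
    d_i = (m_i - m_pd, err_i)
    d_j = (m_j - m_pd, err_j)
    # eq. (3.12): effect of the single aspect A; errors combined in quadrature
    d_ij = (m_i - m_j, math.hypot(err_i, err_j))
    return d_i, d_j, d_ij

def compatible_with_zero(diff, err, n_sigma=1.0):
    """True if the difference vanishes within n_sigma standard deviations."""
    return abs(diff) <= n_sigma * err

# Purely illustrative inputs
d_i, d_j, d_ij = bias_differences(174.5, 0.9, 173.6, 1.2, m_pd=174.32)
print(d_i, d_j, d_ij, compatible_with_zero(*d_ij))
```

Since m pd t carries no error, the errors on the differences of eq. (3.11) are simply those of the individual extractions, which is why they are smaller than the quadrature sum entering eq. (3.12).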

Results for the top quark mass
In this section we present the results for the extraction of the top quark mass obtained with our reference computational scenario, NLO+PS+MS. We are specifically interested in checking the size of the theory uncertainty affecting such an extraction, and its behaviour (together with that of the central top quark mass) when the results emerging from the individual observables and moments are combined together. These findings will also serve as benchmarks for the studies that we shall carry out in sect. 3.2.4, where the extraction of the top mass will be performed by using the other scenarios of table 2. Furthermore, we want to study how the above results are influenced by the scale choice, and therefore we shall consider all of the three forms given in eqs. (3.3)-(3.5). The general strategy is the following. For a given scale choice, we extract the top mass from each of the five observables of table 1 and their first three moments, i.e. fifteen m t values in total, each with its theory errors of eq. (2.8). These values, or any subset of them, are then combined to obtain the "best" result. The combination technique is briefly explained in appendix B, and is rather standard: basically, the central values are weighted with the inverse of the square of their errors. Since the various observables and their moments are correlated, it is necessary to take these correlations into account, lest one skew the final central value of m t and underestimate its error.
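To illustrate why correlations matter in such a combination, the sketch below implements a generic covariance-weighted average of correlated measurements (the textbook best-linear-unbiased-estimate formula). This is a stand-in chosen for illustration, and it is an assumption that it resembles the procedure of appendix B, whose precise definitions are not reproduced here; the function names and inputs are hypothetical.

```python
def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x

def combine_correlated(values, errors, corr):
    """Covariance-weighted average (assumed form, not the paper's eq. (B.5)).

    values, errors : individual m_t extractions and their symmetric errors
    corr           : correlation matrix, corr[r][c] in [-1, 1]
    Returns (combined value, combined error, weights).
    """
    n = len(values)
    cov = [[corr[r][c] * errors[r] * errors[c] for c in range(n)] for r in range(n)]
    u = solve(cov, [1.0] * n)   # u = V^{-1} 1
    norm = sum(u)               # 1^T V^{-1} 1
    weights = [x / norm for x in u]
    mean = sum(w * v for w, v in zip(weights, values))
    return mean, norm ** -0.5, weights

# Two hypothetical extractions with equal errors and a 50% correlation
mean, sigma, weights = combine_correlated(
    [174.0, 176.0], [1.0, 1.0], [[1.0, 0.5], [0.5, 1.0]])
print(mean, sigma, weights)
```

With vanishing off-diagonal entries the weights reduce to the inverse squares of the errors; a positive correlation leaves the combined error larger than that uncorrelated estimate, which is the underestimate the text warns against.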
The simplest case is that where one uses a single observable for extracting m t ; as was explained in sect. 2.3, this is far from being ideal, and we present it here only as a way to compare with the multi-observable results that will be shown later. We use observable #1 (p T (ℓ + )) because it is the one whose top-mass extractions are affected by the smallest errors (in the case of the scale of eq. (3.3)). The values of m t that we obtain are given in table 7, which should be read as follows (this layout will be used for the other tables of this section as well). Each one of the first three rows corresponds to one of the scales of eqs. (3.3)-(3.5) (i.e. the i th row is obtained with μ (i) ). The first, second, and third columns report the results obtained by considering only the first, up to the second, and up to the third Mellin moments, respectively. The results in the fourth row are obtained by combining the three results that appear in the first three rows of the same column. Such a combination is achieved by weighting those three results with the inverse of the square of their errors. Since the errors are asymmetric, one treats separately the + and − ones; the two resulting "central" m t values are possibly different, and the single m t reported in table 7 is then obtained again with a weighted average. Finally, the numbers in square brackets are the values of χ 2 per degree of freedom, computed by always considering the first four Mellin moments, regardless of how many of them had been actually used in the combination. One should not seek a deep meaning in this χ 2 , in particular because of the way the errors that enter into it are obtained (i.e. their behaviour from a statistical viewpoint is unknown to us). On the other hand, while its precise value is not of particular significance, it represents a very useful reference for the performance of the extraction procedure, as we shall see in sect. 3.2.4.
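The fourth-row combination just described can be sketched as follows. The fragment averages the results of the three scales with inverse-square-error weights, separately for the + and − errors, and then averages the two resulting central values. The combined error is taken to be the usual uncorrelated expression 1/sqrt(sum of weights), which is an assumption made here for illustration; function names and input numbers are likewise hypothetical and do not reproduce table 7.

```python
def weighted_average(values, errors):
    """Inverse-variance weighted mean and its (uncorrelated) error."""
    weights = [1.0 / e ** 2 for e in errors]
    total = sum(weights)
    mean = sum(w * v for w, v in zip(weights, values)) / total
    return mean, total ** -0.5

def combine_asymmetric(values, errs_plus, errs_minus):
    """Combine results with asymmetric errors, treating + and - separately."""
    m_plus, s_plus = weighted_average(values, errs_plus)
    m_minus, s_minus = weighted_average(values, errs_minus)
    # The two central values may differ; average them with weights 1/sigma^2
    m_central, _ = weighted_average([m_plus, m_minus], [s_plus, s_minus])
    return m_central, s_plus, s_minus

# Illustrative inputs for three scale choices
m_t, err_plus, err_minus = combine_asymmetric(
    [174.6, 174.5, 173.8], [0.9, 1.0, 1.4], [1.0, 1.1, 1.3])
print(f"m_t = {m_t:.2f} +{err_plus:.2f} -{err_minus:.2f} GeV")
```

Because the weights scale with the inverse square of the errors, a scale choice with markedly larger errors contributes little to the combined central value.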
The messages to be taken out of table 7 are the following. Firstly, the impact of the addition of moments beyond the first is extremely modest, if visible at all. This is due to the fact that the errors affecting m t increase with higher moments, and to the non-negligible correlations between the moments (see appendix B). Secondly, the scales μ (1) and μ (2) tend to give central results larger than the "true" one of the pseudodata, m pd t = 174.32 GeV, while the opposite applies to scale μ (3) , where the effect is more evident (but still within 1σ). Let us then consider the latter case to be definite, and compare the functional form of eq. (3.5) with those of eq. (3.9). Because of the dependence on the transverse momenta of the scales used in the pseudodata, which is absent in the case of μ (3) , the tails of the p T -related distributions obtained with μ (3) will be less rapidly falling than those of the pseudodata (mainly because the p T -dependence of µ R in eq. (3.9) will induce a stronger α S suppression, relative to the small-p T region, than in the case of μ (3) ; this effect is only mildly compensated by that due to µ F ). Thus, the moments computed with scale #3 will be slightly larger than their analogues in the pseudodata. For the reasons explained in sect. 2.3, this difference then results in a lower (than the input m pd t ) value for the extracted top mass, which is what we see in the third row of table 7. The same effect, but (slightly) in the opposite direction, is at play in the case of scales #1 and #2. Here, the numerical values of such scales at large p T 's relative to their small p T counterparts are closer to those relevant to the pseudodata scales than in the case of scale #3, whence closer-to-m pd t central results for the top mass.
Given these opposite behaviours, not surprisingly the average of the three results is closer to m pd t than any of them; such an average is biased towards the results of μ (1) and μ (2) , owing to their errors being smaller than those associated with the extractions with μ (3) .
We now repeat the combination procedure that has led to the results of table 7, by including, on top of the m t values obtained with observable #1, also those relevant to observables #4 and #5; the new combined results are presented in table 8. By and large, all comments relevant to table 7 can be repeated here. There is a decrease (less than 10% for all scales) of the errors, which is not large because of two facts: observable #1 induces the smallest errors (in the present observable set), and the observables considered are sizably correlated, as documented in appendix B. By adding more observables one starts to see the effects of the inclusion of higher moments; although statistically not significant, there are trends in the central values which were not visible in the case of a single observable.
[...] μ (1) and μ (2) , but by a large factor for μ (3) ; this is because, for such a scale, it is the p T of the lepton pair that happens to be affected by the smallest errors). The trend induced by the addition of higher moments becomes more visible than before, and statistically significant (a 2σ effect) in the case of scale #3. However, the final results of the fourth row, obtained by combining the outcomes associated with the different scales, are quite stable. The case of the results associated with μ (3) is interesting, because it stresses again the importance of considering as many observables and as diverse as possible in order to expose potential theory biases in the top-mass extraction.
Given that here all our predictions are based on the same computational scenario as the pseudodata, namely NLO+PS+MS, deviations from a perfect reconstruction can only be due to the different choice of scales, and μ (3) happens to be farther from the pseudodata ones of eq. (3.9) than either μ (1) or μ (2) . The crucial point is that this observation is true regardless of the type of observables considered, but it is only when the lepton-pair correlations #2 and #3 enter the combination that the effects become more noticeable. This is related to the behaviour of these two observables discussed in sect. 3.2.2, which exhibit the strongest sensitivity to (among other things) extra radiation. A change of scale is an effective, if quite mild, way of probing some of these extra-radiation effects. As we shall see in sect. 3.2.4, the impact of the addition of these two observables on the theory biases is spectacular when the underlying calculational scenario is different w.r.t. that used in the generation of the pseudodata.
There are two conclusions that can be drawn from this section. The first is that the procedure proposed in this paper appears to be able to give theory errors on the extracted top mass of the order of 0.8 GeV. While we have neglected background contaminations, we have also been conservative with the range of scale variations; on top of that, the addition of further observables may help reduce further those errors. The second conclusion is more general, in that it applies to any extraction method based on templates. Our exercise demonstrates that one thing is the variation of the scales induced by pre-factors that multiply a given functional form, and quite another the change of that functional form. Although the two procedures overlap, they are not equivalent. We have shown a practical way to probe the changes of the above functional form: the idea is that, by re-computing theoretical predictions for many different scale choices, and by performing a weighted average of their outcomes, one might effectively capture the scale settings which optimally describe Nature.

More on theory biases
The aim of this section is that of repeating what has been done in sect. 3.2.3, for scenarios other than NLO+PS+MS. In other words, all of the computations considered here are different w.r.t. that used in the generation of the pseudodata; we shall thus study the theory biases, whose sources we have already discussed in sect. 3.2.2, at the level of the combined results for the extracted top quark mass. All the calculations are performed by using the scale μ (1) . We report the results in table 10, which is organised with the same conventions as those used in the tables of sect. 3.2.3. This table is split into two parts, relevant to the m t extraction performed by using only three observables (#1, #4, and #5), or all of them. These two parts thus are in one-to-one correspondence with (the first row of) tables 8 and 9, respectively.
From the upper part of table 10, we see that the use of observables #1, #4, and #5 leads to central m t values which may not be in perfect agreement with the pseudodata value m pd t , but are not far from it either, irrespective of the calculational scenario considered. Furthermore, both the errors and the χ 2 values are totally reasonable, and rather consistent with those of table 8. These findings are not surprising, because they could have been anticipated from sect. 3.2.2, where observables #1, #4, and #5 have been shown to be fairly insensitive to shower, NLO, and spin-correlation effects. These effects are ultimately the difference between each of the scenarios considered here, and our reference one, NLO+PS+MS. It is therefore instructive to see what happens when observables #2 and #3 are used in the extractions as well (lower part of table 10).
Not only are the differences among the central results for the extracted top mass much larger than before (and particularly so at the LO in the absence of proper spin correlations), but it is especially the χ 2 values that increase dramatically, in spite of (and, in a sense, thanks to) the fact that the errors remain quite moderate. This is exactly the situation that has been described in sect. 2.3: the extraction of m t from individual observables is always acceptable and affected by small errors; however, if the underlying theoretical description is incompatible with that of the (pseudo)data, the different results will be mutually incompatible. A (certainly non-unique) way of making explicit the presence of such incompatibilities is through the computation of a χ 2 . The lower part of table 10 is thus another, very explicit way of showing why considering a large number of observables with different characteristics is always beneficial, in this or in other template-based methods.
A final comment on table 10. The errors that affect the extracted top mass do not follow the usual LO→NLO reduction pattern, and they need not. Indeed, the relationship between the above errors, and those which are usually considered at the level of rates, is rather indirect. Furthermore, in the combination of the results obtained from different observables, a single m t value affected by errors much smaller than the others will have a very large weight, with the picture being further complicated by the presence of strong correlations among the observables studied here. While the particular combination technique used in this paper (see appendix B) can certainly be refined, possibly leading to changes in the central values of m t and their associated errors, the conclusions reached before will not change, being based on a few well-understood physics phenomena.
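The role played by the χ 2 in exposing mutually incompatible extractions can be illustrated with a toy computation. The sketch below uses the simplest uncorrelated definition, namely the sum of squared pulls of the individual m t values with respect to their weighted mean, divided by the number of values minus one. This definition is an assumption adopted for illustration and need not coincide with the one used for table 10; the input numbers are hypothetical.

```python
def chi2_per_dof(values, errors):
    """Weighted mean of the inputs and chi^2 per degree of freedom around it.

    Correlations are ignored in this toy definition (assumption).
    """
    weights = [1.0 / e ** 2 for e in errors]
    mean = sum(w * v for w, v in zip(weights, values)) / sum(weights)
    chi2 = sum(((v - mean) / e) ** 2 for v, e in zip(values, errors))
    return mean, chi2 / (len(values) - 1)

# Mutually consistent extractions: small chi^2/dof
print(chi2_per_dof([174.2, 174.4, 174.3], [1.0, 1.0, 1.0]))

# Individually precise but mutually incompatible extractions:
# the weighted mean is the same, yet chi^2/dof is very large
print(chi2_per_dof([174.3, 170.3, 178.3], [1.0, 1.0, 1.0]))
```

In the second case the combined central value alone would look perfectly acceptable; it is only the χ 2 that signals an inadequate underlying description.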

Conclusions
In this paper we have proposed a procedure for the determination of the top quark pole mass from dilepton tt events. Our main proposals and findings are the following:
• We use leptonic single-inclusive and correlation observables, which are clean and largely insensitive to the modelling of long-distance effects. Our method, based on Mellin moments, relies neither on the definition of the top quark as a pseudo-particle, nor on its reconstruction.
• The quality of the results for m t and their reliability improves by increasing the number of observables and of their moments. It is important that the observables employed have different sensitivities to the various mechanisms relevant to tt production and decay, such as higher-order corrections, and shower and spin-correlation effects. Several theoretical simulations must be used that differ in the choice of the functional form for the hard scales, and the extracted m t values must be combined. Thus, we consider the entry in the rightmost column and last row of table 9 as our "best" result.
• The errors associated with m t may underestimate the difference between the extracted value and the actual pole mass, in the case of an inadequate theoretical description of the underlying production mechanism. A χ 2 -type test is effective in identifying the presence of such biases, provided that a sufficiently large number of observables has been employed in the extraction procedure, as is documented in table 10.
We stress that the second and third items above apply to any template method that exploits the shapes of observables for the extraction of the top quark mass. The most precise m t determination that we have achieved with our method in the context of the purely-theoretical exercise performed here is affected by errors of the order of 0.8 GeV. It is probably possible to reduce this figure further, by using a set of observables larger than the one considered in this paper. On the other hand, we have not addressed two important aspects which will need to be taken into account in an extraction of m t from real data, namely the contamination due to backgrounds, and the systematics due to the choice of the parton shower Monte Carlo. For what concerns this Monte-Carlo systematics, it is worth pointing out that within our approach two different Monte Carlos must lead to two separate top quark mass values, which should eventually be combined on the basis of their respective errors and of the results of some χ 2 tests. An interesting, if not particularly desirable, case is that where two Monte Carlos would lead to statistically-incompatible m t results, with two small χ 2 values similar to each other. This implies that the observables chosen in the extraction procedure do not constrain well enough the theoretical models adopted by the Monte Carlos, and it is thus doubtful which (if any) of the two m t results best describes the "physical" pole mass.
While we believe that our approach has many competitive features, it remains true that the determination of the top quark mass will benefit from the use of many different techniques. For example, any BSM physics able to modify in a significant manner the kinematic distributions w.r.t. those predicted by the SM may induce large biases in template-based m t extractions, unless the simulation of such BSM contributions is also taken into account. In this case, an approach insensitive to the production dynamics (which thus belongs to the first class introduced in sect. 1) would offer a valuable addition; one may mention here the CMS end-point method [34], or the promising energy-peak method suggested in ref. [35], provided that it could be extended to include NLO QCD corrections to top decays.
The approach we have pursued here has many variants which do not change its essence. For example, one may start looking into b-jet variables in order to increase the sensitivity to m t ; this has the downside of introducing a larger dependence on long-distance modelling, and the balance between these two competing aspects must be carefully addressed. Conversely, one can try and select dilepton events of opposite flavour without imposing cuts on the b jets, in order to further reduce the impact of hadronisation; the problem then becomes that of the control of the backgrounds. Our method is also immediately applicable in the context of NNLO simulations. However, for this to be effective, a proper description of top decays, and in particular one that incorporates production-spin-correlation effects, must be included. The matching to parton shower would also be highly desirable.
configurations ("events") and their associated weights: $\{K_k, W_k\}_{k=1}^{N}$. Note that the W k 's need not necessarily be equal to each other (in absolute value); in other words, what follows is valid in the context of both unweighted and weighted event generation, these being typically relevant to calculations matched to parton-shower Monte Carlos and at fixed order, respectively. When one computes a differential cross section, one evaluates event-by-event the value of the observable of interest in the generated kinematic configuration, O(K k ); such a value determines, in turn, the bin of the corresponding histogram where the weight W k must be stored. In a completely analogous manner, the calculation of the (unnormalised) moments can also be performed on the fly. In order to do so, for a given observable O one will book a histogram with bins of width one centered at non-negative integers. When the k th event is generated, one stores the weight: $W_k \left[O(K_k)\right]^i$ (A.3) in the i th bin of the histogram; this must be done for all bins. By using eqs. (2.3), (A.2), and (A.3), one sees that at the end of the run the i th bin will be equal to the normalised i th moment, times σ, so that the normalised moments themselves can be obtained by dividing the content of each bin by that of the bin centered at zero.
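The on-the-fly accumulation described above can be sketched in a few lines of Python. The class below plays the role of the histogram with one bin per non-negative integer: for each event it adds the weight times the i th power of the observable to the i th bin, and the normalised moments follow by dividing by the content of bin zero. The class and method names, and the toy events, are illustrative assumptions rather than the code used for this paper.

```python
class MomentAccumulator:
    """Event-by-event accumulation of the moments of one observable."""

    def __init__(self, max_moment):
        # bins[i] accumulates sum_k W_k * O(K_k)**i; bins[0] is the total weight
        self.bins = [0.0] * (max_moment + 1)

    def fill(self, obs_value, weight):
        """Store the contribution of one event in all bins."""
        power = 1.0
        for i in range(len(self.bins)):
            self.bins[i] += weight * power
            power *= obs_value

    def normalised_moments(self):
        """Divide each bin by the bin centered at zero."""
        return [content / self.bins[0] for content in self.bins]

# Toy weighted events: (value of the observable, weight)
acc = MomentAccumulator(max_moment=3)
for obs_value, weight in [(1.0, 1.0), (2.0, 1.0), (3.0, 2.0)]:
    acc.fill(obs_value, weight)
print(acc.normalised_moments())
```

Since only ratios of bin contents enter the normalised moments, the overall normalisation of the weights drops out, and no binning in the observable itself is ever introduced.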
We point out that this direct way of computing moments is exact in the N → ∞ limit. On the other hand, the (indirect) calculation which uses the result of dσ/dO is not exact even in the N → ∞ limit, unless the limit of vanishing bin size (in dσ/dO) is taken as well, which is impossible in the context of an actual simulation, where one thus might have a residual bin-size inaccuracy. Furthermore, in the case where the range of the histogram in O does not cover the whole kinematically-accessible range for such an observable, another inaccuracy affects the indirect computation. For these reasons, and for its greater simplicity, in this paper we have always adopted the direct, event-by-event method outlined above in the calculation of the moments. We have checked, in the case of the first moments, that the results of the direct computations are very similar, but not identical, to those obtained a-posteriori by using the distributions. It must be stressed that the distributions we have used cover rather large ranges (up to 400 GeV for observables #1, #2, and #3, up to 1.2 TeV for observable #4, and up to 1 TeV for observable #5), and contain 100 bins. Therefore, in the context of e.g. an experimental analysis, where the use of large-size bins is typical at large momenta, the risk of inaccuracies affecting the moments computed from distributions may be non-negligible.
[...] The latter definition has been adopted in keeping with what has been done in ref. [36], which in turn follows closely the prescriptions of the LEP QCD Working Group [37]. The correlation matrix C αβ , given explicitly below in eq. (B.8), has been computed at one given value of the top mass (173 GeV): we thus neglect effects possibly due to the dependence of such correlations on the top mass, since we expect them to be negligible, especially in the context of eq. (B.5).
Given that the correlation between two variables X and Y is defined as $\rho(X,Y) = \left(\langle XY\rangle - \langle X\rangle\langle Y\rangle\right)/\left(\sigma_X \sigma_Y\right)$, with σ X and σ Y the standard deviations, for any two observables O r and O s and their i th and j th moments $\mu^{(i)}_{O_r}$ and $\mu^{(j)}_{O_s}$ [...] and proceed similarly to what is done in eq. (A.3); in particular, we have: [...]
We also point out that the calculation of C αβ has been performed by choosing the scale of eq. (3.3), and in the context of an NLO+PS+MS simulation. Although, owing to the form of eq. (B.5), these choices have only a moderate impact on the central values of the combined top masses (as we have verified by setting C αβ = 0), we emphasise again that a more refined procedure will lead exactly to the same conclusions: namely, the necessity of combining the results obtained with different observables and moments, and that of performing a χ 2 -type test on the final outcome.
In eq. (B.1) the errors affecting m (α) t are symmetric. In the case when they are asymmetric, the procedure above, and in particular the construction of eqs. (B.4) and (B.5), is repeated twice, for the + and − errors. The two resulting central values for the top mass need not coincide; when this happens, the final central value is taken to be the weighted average of the two, with the weights defined as the inverse of the respective σ 2 (m t )'s as given in eq. (B.3).
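As an illustration of how an entry of the correlation matrix might be obtained event by event, the sketch below accumulates the weighted sums needed for the standard correlation coefficient between the i th power of one observable and the j th power of another. Identifying X and Y with these powers is an assumption made here, as are the function name and the toy events; the code is not meant to reproduce eq. (B.8).

```python
import math

def moment_correlation(events, i, j):
    """Correlation between O_r**i and O_s**j over weighted events.

    events : iterable of (o_r, o_s, weight) tuples, one per event
    Assumes X = O_r**i and Y = O_s**j (illustrative identification).
    """
    sw = sx = sy = sxx = syy = sxy = 0.0
    for o_r, o_s, w in events:
        x, y = o_r ** i, o_s ** j
        sw += w
        sx += w * x
        sy += w * y
        sxx += w * x * x
        syy += w * y * y
        sxy += w * x * y
    mean_x, mean_y = sx / sw, sy / sw
    var_x = sxx / sw - mean_x ** 2
    var_y = syy / sw - mean_y ** 2
    cov = sxy / sw - mean_x * mean_y
    return cov / math.sqrt(var_x * var_y)

# Toy events in which the second observable is twice the first
toy = [(1.0, 2.0, 1.0), (2.0, 4.0, 1.0), (3.0, 6.0, 1.0)]
print(moment_correlation(toy, 1, 1))  # first moments, fully correlated
print(moment_correlation(toy, 1, 2))  # first vs second moment
```

All sums are accumulated in a single pass over the events, in the same spirit as the direct computation of the moments in appendix A.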