NLO and off-shell effects in top quark mass determinations

We study the impact of different theoretical descriptions of top quark pair production on top quark mass measurements in the di-lepton channel. To this aim, the full NLO corrections to $pp\rightarrow W^+W^-b\bar b\rightarrow (e^+ \nu_e)\,(\mu^- \bar{\nu}_{\mu})\,b\bar b$ production are compared to calculations in the narrow width approximation, where the production of a top quark pair is calculated at NLO and combined with three different descriptions of the top quark decay: leading order, next-to-leading order and via a parton shower. The different theory predictions then enter the calibration of template fit functions, which are used for a fit to pseudo-data. The offsets in the top quark mass resulting from the fits based on the various theoretical descriptions are determined.


Introduction
The top quark mass is one of the most important parameters in the Standard Model (SM). As the top quark features the largest Yukawa coupling, it is closely linked to Higgs physics. Furthermore, the Higgs potential and therefore the vacuum stability of the SM depend critically on the value of the top quark mass. Processes involving top quarks allow for important precision tests of the SM and appear amongst the dominant backgrounds for many New Physics searches. They also allow the gluon PDF to be constrained further at large x-values [1][2][3][4].
The measurement of the top quark mass is complicated because reconstructing tt events from complex hadronic and leptonic final states is an arduous task. Measurements of the top quark mass have been performed in various channels by the Tevatron and LHC collaborations; the latest combinations can be found in Refs. [5][6][7][8].
While the most precise result in the di-lepton channel has an uncertainty of 0.84 GeV [9], the most precise combined results for the top quark mass achieve a precision of about 0.5 GeV [7,8]. The precision achieved nowadays is the result of joint efforts in the experimental as well as the theory community to reduce the systematic uncertainties inherent to top quark mass measurements. For recent theoretical studies with regard to the definition and extraction of the top quark mass, see e.g. [10][11][12][13][14][15][16][17][18][19][20][21].
The theoretical description of top quark pair production at hadron colliders has improved substantially in recent years. For stable top quarks, NNLO corrections to differential the framework developed in Ref. [57] and the 4FNS calculation of Ref. [48]. An alternative algorithm to treat radiation from heavy quarks in the Powheg NLO+PS framework has been presented in Ref. [60]. An improved resonance treatment in the matching to parton showers for off-shell single top production at NLO has been worked out in Refs. [57,61], and similarly for off-shell tt and ttH production in e + e − collisions in Ref. [62].
In this paper, we investigate the impact of different approximations on the top quark mass measurement simulating a concrete experimental setup. In particular, we follow up on an open question raised in Ref. [45], where we performed a study of NLO effects in top quark mass measurements based on the observable m lb in the framework of a top quark mass measurement as performed by ATLAS using the template method [9,63]. Substantial distortions in the m lb distribution are induced by scale variations calculated by including the full NLO corrections to the W + W − bb final state (with leptonic W -decays). On the other hand, in the factorised approach, where the tt cross section calculated at NLO is combined with LO top quark decays in the NWA, the shape distortions due to the scale variations are minor. As the experimental analysis is based on normalised distributions, the shape differences induced by scale variations translate in a very sensitive manner into the theoretical uncertainties on the extraction of the top quark mass.
The question arises where the shape changes come from, i.e. whether they mainly come from the non-factorisable contributions contained in the full NLO corrections to W + W − bb, or from factorisable NLO corrections to the top quark decay. And, if the latter is true, what is the effect of a parton shower in combination with the factorised approach, as it should contain the leading contributions of the NLO corrections to the top quark decay. To answer these questions, we compare the NLO calculation of W + W − bb production of Ref. [45] with the calculation based on the narrow-width approximation where both tt production and decay are calculated at NLO, as described in Ref. [32]. We further quantify the impact of a parton shower in the narrow-width approximation, combining the NLO matrix elements of top quark pair production with Sherpa [64].
The structure of this paper is as follows. In Section 2, we describe our different calculations performed to compare theoretical descriptions of the complex final state of two charged leptons, two b-jets and missing energy. In Section 3, we compare these different theoretical descriptions for a number of observables relevant to top quark mass measurements. We then quantify in Section 4 how the differences in the theoretical descriptions impact a template fit as utilised in experimental determinations of the top quark mass, before we conclude in Section 5.

The different stages of the theoretical description
We study the following descriptions of top quark pair production in the di-lepton channel:
• NLO full : full NLO corrections to pp → W + W − bb with leptonic W -decays,
• NLO NLOdec NWA : NLO tt production ⊗ NLO decay,
• NLO LOdec NWA : NLO tt production ⊗ LO decay,
• NLO PS : NLO tt production+shower ⊗ decay via parton showering.
We furthermore use the abbreviation LO full for W + W − bb calculated at leading order, the abbreviation LO LOdec NWA for LO tt production ⊗ LO decay and the abbreviation LO PS for LO tt production ⊗ decay via parton showering. We investigate the effects of different levels in the description of the top quark decay, isolating the latter from the effects of the nonresonant and non-factorisable contributions contained in the NLO full calculation. This is done by emulating a concrete experimental analysis used for top quark mass determinations. As we match to a parton shower only in combination with LO top quark decays, we do not need to address the problem of "resonance-aware matching" [59,61]. This allows us to get a clear idea of the effects of the various approximations used here, which in turn can serve as a basis for future studies entirely relying on showered results.
The calculations NLO full and NLO LOdec NWA have already been described in detail in Ref. [45]. Here we briefly summarise only the main features. We use GoSam [65,66] plus Sherpa [64], version 2.2.3, where the virtual corrections generated by GoSam are linked to Sherpa via the Binoth-Les-Houches-interface [67,68]. This applies not only to the calculations NLO full and NLO LOdec NWA but also to the NLO PS computation. We note that our full NLO calculation of the process pp → W + W − bb → (e + ν e ) (µ −ν µ ) bb provides a complete description of the final state including singly-resonant and non-resonant top quark contributions. Example diagrams are shown in Fig. 1. The computation relies on the 5-flavour scheme, i.e. the b-quark is treated as massless. To take the top quark decay width into account in a gauge invariant way, the complex mass scheme [69] is used. In our setup, this entails a replacement of the top quark mass by a complex number µ t evaluated according to
$$\mu_t^2 = m_t^2 - i\, m_t\, \Gamma_t .$$
The W -bosons and intermediate Z-bosons also have complex masses due to their widths. Note that we only consider resonant W -boson decays.
The results for the NLO NLOdec NWA calculation are obtained as described in Ref. [32]. This framework relies on the factorisation of the matrix elements according to
$$\mathcal{M}^{\rm NWA}_{ij\to t\bar t\to b\bar b\, l l\, \nu\nu} = \mathcal{P}_{ij\to t\bar t}\otimes\mathcal{D}_{t\to bl\nu}\otimes\mathcal{D}_{\bar t\to \bar b l\nu} , \qquad (2.2)$$
where P ij→tt describes the tt production process and D t→blν the top quark decay dynamics. Spin correlations are included as indicated by the symbol ⊗. Squaring Eq. (2.2) and integrating over the phase space yields the double-resonant partonic cross section, where off-shell effects are parametrically suppressed by $\Gamma_t/m_t \approx 0.7\%$. Expanding Eq. (2.2) up to NLO yields The NLO corrections to the production process P δNLO ij→tt involve the virtual and real emission matrix elements M virt gg/qq→tt , M real gg/qq→tt+g and M real qg/qg→tt+q/q . The corresponding NLO decay parts are given by We note that, in contrast to Ref. [32], the top quark width Γ NLO t in the denominator is not expanded as (Γ NLO For our studies relying on NLO LOdec NWA results, we remove all contributions in the second line of Eq. (2.4) and use Γ LO t,W instead of Γ NLO t,W . This treatment guarantees that $\int \mathrm{dPS}\, |\mathcal{D}_{t\to bl\nu}|^2 = \mathrm{BR}(t\to bl\nu)$ at LO and NLO, with BR(t → blν) denoting the branching ratio for the top quark decay.
Finally, the NLO PS computations are based on the NLO plus parton-shower matching scheme as implemented in Sherpa [70]. The original scheme was extended in Ref. [71] to incorporate heavy-quark mass effects. Utilising this scheme, we obtain an NLO+PS accurate description of tt+jets, or, in other words, the NLO description of the tt production shower. The top quark decays are attached afterwards such that LO spin correlations are preserved, and each decay configuration is supplemented by its respective decay shower following the same procedure as described in Ref. [72]. For our investigations, we used Sherpa version 2.2.3. In the course of this work, it was found that this version treats radiation emerging from top quark decays in resonant top quark processes in the same manner as radiation arising from continuum production processes. This resulted in an omission of the initial-state spectator mass term that suppresses the ordinary eikonal radiation of continuum initial-final dipoles. The problem has been identified and solved by the implementation of a dedicated dipole-shower algorithm for the decays, similar to Ref. [73]. The patch implementing these changes has been provided by the Sherpa authors and was used for our results presented below. It will be made available on the corresponding software download pages, and included in the Sherpa program from version 2.2.5 onwards.
Phenomenological study of observables sensitive to the top quark mass

Definition of the observables
We study the following observables:
• m lb , which we define using the invariant mass squared
$$m_{lb}^2 = (p_l + p_b)^2 ,$$
where p l denotes the four-momentum of the lepton and p b the four-momentum of the b-jet. As there are two top quarks, there are also two possible m lb values per event. Since it is not possible experimentally to reconstruct the b-quark charge on an event-by-event basis with sufficient accuracy, one also needs a criterion to assign a pair of a charged lepton and a b-jet as the one stemming from the same top quark decay. Following [9], we choose the (l + b-jet, l − b-jet ) pairing that minimises the sum of the two m lb values per event. Finally, the m lb observable used in the analysis is the mean of the two m lb values per event obtained when applying the above procedure.
• m ll , the invariant mass of the two charged leptons, defined via
$$m_{ll}^2 = (p_{l^+} + p_{l^-})^2 .$$
• p T,µ , the transverse momentum of the muon.
• η µ , the rapidity of the muon.
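The pairing prescription for m lb can be sketched in a few lines of Python. This is a minimal illustration rather than the analysis code of Ref. [9] or of this study: the function names `inv_mass` and `mlb_observable`, the four-momentum layout (E, px, py, pz) in GeV, and the example momenta are assumptions made here.

```python
import math

def inv_mass(p, q):
    """Invariant mass of p + q for four-momenta given as (E, px, py, pz) in GeV."""
    e, px, py, pz = (a + b for a, b in zip(p, q))
    m2 = e * e - px * px - py * py - pz * pz
    return math.sqrt(max(m2, 0.0))  # guard against tiny negative rounding errors

def mlb_observable(lep_plus, lep_minus, bjet_1, bjet_2):
    """Mean m_lb of the lepton/b-jet pairing that minimises the sum of both m_lb."""
    pairing_a = (inv_mass(lep_plus, bjet_1), inv_mass(lep_minus, bjet_2))
    pairing_b = (inv_mass(lep_plus, bjet_2), inv_mass(lep_minus, bjet_1))
    best = min(pairing_a, pairing_b, key=sum)
    return 0.5 * sum(best)

# Hypothetical massless momenta, for illustration only.
lep_plus = (50.0, 50.0, 0.0, 0.0)
lep_minus = (50.0, -50.0, 0.0, 0.0)
bjet_1 = (60.0, 60.0, 0.0, 0.0)
bjet_2 = (60.0, 0.0, 60.0, 0.0)

mlb = mlb_observable(lep_plus, lep_minus, bjet_1, bjet_2)
print(f"m_lb = {mlb:.2f} GeV")
```

Because both pairings are evaluated, the result does not depend on the order in which the two b-jets are passed, which mirrors the fact that the b-quark charge is not used.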

Input parameters and event requirements
We use the PDF4LHC15_nlo_30_pdfas sets [77][78][79][80] and a centre-of-mass energy of √ s = 13 TeV. Our default top quark mass is m t = 172.5 GeV. Leading order top quark and W boson widths are used in the LO calculations and the NLO tt ⊗ LO decay calculation, while NLO widths [81] are used in the remaining NLO calculations. Widths at NLO appearing in propagators are not expanded in α s . The QCD coupling in the NLO widths is varied according to the chosen scale. For α s evaluated at the central scale m t , the numerical values for the widths are
Jets are defined using the anti-k T algorithm [82] as implemented in Fastjet [83], with R = 0.4. For the electroweak parameters, we employ the following settings:
Inspired by Ref. [9], and taking into account the stronger trigger requirements for a 13 TeV analysis, the following list of event requirements is used. We require
• exactly two b-tagged jets with p jet T > 25 GeV and |η jet | < 2.5. Jets containing a bb pair are also defined as b-jets.
• exactly two oppositely charged leptons which fulfil p µ T > 28 GeV, |η µ | < 2.5 for muons and p e T > 28 GeV, |η e | < 2.47 for electrons, excluding the range 1.37 < |η e | < 1.52. Both types of charged leptons are required to be separated by ∆R(l, jet) > 0.4 from any jet fulfilling the jet requirements.
• p lb T > 120 GeV. Using the same lepton b-jet assignments as for m lb , the observable p lb T denotes the mean transverse momentum of the two lepton-b-quark systems.
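The event requirements listed above can be written as a small selection function. The sketch below is an illustration under stated assumptions, not the code used for the results: the dictionary layout of leptons and jets, the names `delta_r`, `lepton_ok` and `passes_selection`, and the treatment of the ∆R requirement as an event-level veto are choices made here; `pt_lb` is assumed to be the already computed mean transverse momentum of the two lepton-b systems.

```python
import math

def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance, with the azimuthal difference wrapped into [-pi, pi]."""
    dphi = math.remainder(phi1 - phi2, 2.0 * math.pi)
    return math.hypot(eta1 - eta2, dphi)

def lepton_ok(lep):
    """Kinematic requirements on a single muon ("mu") or electron ("e")."""
    abs_eta = abs(lep["eta"])
    if lep["pt"] <= 28.0:
        return False
    if lep["flavour"] == "mu":
        return abs_eta < 2.5
    if lep["flavour"] == "e":
        return abs_eta < 2.47 and not (1.37 < abs_eta < 1.52)
    return False

def passes_selection(leptons, jets, pt_lb):
    """Apply the jet, lepton, isolation and p_T^lb requirements to one event."""
    good_jets = [j for j in jets if j["pt"] > 25.0 and abs(j["eta"]) < 2.5]
    if sum(1 for j in good_jets if j["btag"]) != 2:
        return False
    good_leptons = [lep for lep in leptons if lepton_ok(lep)]
    if len(good_leptons) != 2:
        return False
    if good_leptons[0]["charge"] * good_leptons[1]["charge"] >= 0:
        return False
    for lep in good_leptons:
        for jet in good_jets:
            if delta_r(lep["eta"], lep["phi"], jet["eta"], jet["phi"]) <= 0.4:
                return False
    return pt_lb > 120.0

# Hypothetical event, for illustration only (momenta in GeV).
leptons = [
    {"flavour": "e", "charge": +1, "pt": 40.0, "eta": 0.5, "phi": 0.0},
    {"flavour": "mu", "charge": -1, "pt": 35.0, "eta": -1.0, "phi": 2.0},
]
jets = [
    {"pt": 60.0, "eta": 1.2, "phi": -2.5, "btag": True},
    {"pt": 45.0, "eta": -0.3, "phi": 1.0, "btag": True},
]
print(passes_selection(leptons, jets, pt_lb=130.0))
```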
The b-quarks are treated as massless in all fixed-order calculations. We choose µ R = µ F = m t as our central scale. The impact of choosing H T /2 (rather than m t ) as the central scale on the top quark mass determined by our method has been shown to be very small [45]. Furthermore, it would be difficult to construct an H T definition for the NLO PS approach that matches the one used in the NLO full calculation. Even a simplified H T definition that involved only the charged lepton and b-jet transverse momenta and neglected the neutrino momenta would be affected because the parton showering changes the p T spectrum of the final state particles. We therefore considered it more consistent to choose m t as the central renormalisation and factorisation scale throughout all calculations. The scale variation bands are obtained by varying µ R and µ F simultaneously by a factor of two and one half with respect to the central scale. We have also performed 7-point scale variations and found that the simultaneous variations always formed the most conservative uncertainty band in the m lb and m T 2 distributions (Figs. 2 and 3).
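To make the two variation prescriptions concrete, the following sketch enumerates the rescaling factors applied to (µ R , µ F ). It assumes the common convention in which the 7-point set consists of all combinations of the factors 1/2, 1 and 2, omitting the two combinations where the scales are varied in opposite directions; the function names are illustrative.

```python
from itertools import product

FACTORS = (0.5, 1.0, 2.0)

def seven_point():
    """(mu_R, mu_F) rescaling factors, omitting the two opposite-direction extremes."""
    return [(r, f) for r, f in product(FACTORS, repeat=2) if max(r / f, f / r) < 4.0]

def simultaneous():
    """Simultaneous up and down variation of both scales."""
    return [(k, k) for k in FACTORS if k != 1.0]

for r, f in seven_point():
    print(f"mu_R = {r} m_t, mu_F = {f} m_t")
```

The simultaneous variations form a subset of the 7-point set, so the statement that they give the most conservative band is a statement about the observables considered, not about the prescription itself.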
For the parton shower results, we have also investigated the impact of a dynamic scale, which we call µ tt , to compute the matrix elements of the hard scattering processes producing the top quark pairs. The scale µ tt is a "colour flow inspired" QCD scale, introduced in Ref. [71]. Using Mandelstam variables s, t and u, it is defined as The value for the gg partonic process is chosen randomly according to the relative size of the two weights w 1 and w 2 .
The standard µ R and µ F variations that we employ for our fixed-order calculations are not fully appropriate to assess the theory uncertainties of the NLO PS computations, as the showering depends on further scale and parameter choices. For our studies, it is interesting to vary µ prod Q as well as µ dec Q , which are the parameters controlling the overall size of the resummation domains assigned to the tt production and top quark decay showers, respectively. Within these resummation domains, subsequent shower emissions are evaluated from the values taken by the ordering variable of the parton shower. We therefore also alter the strength of the parton shower emissions by variations of µ PS R , the scale entering the evaluation of the strong coupling α s (µ PS R ) used in the shower kernels. For the Sherpa CSshower, the ordering variable is associated with the local p emit T scales of the individual branchings, which means µ PS R,k ∼ p emit T,k for the k-th branching. For the combined variation of several NLO PS parameters, we follow the principle of identifying the strongest and weakest shower option that one can possibly obtain from the given individual parameter ranges. This is supposed to lead to a conservative shower uncertainty estimate.
Table 2: Fiducial cross sections in various approximations. The first uncertainty is the precision of the Monte Carlo phase space integration. The scale variation uncertainty obtained by simultaneously varying renormalisation and factorisation scales by a factor of two (superscript) and one half (subscript) is given in percent. For the parton shower results, the given scale uncertainties are obtained by using the variation prescription µ F µ R α PS s as detailed in the text.
Our default variation in the NLO PS case, denoted by µ F µ R α PS s , is a combination of simultaneously varying µ F , µ R and µ PS R by a factor of two up and down, with central scale m t . Alternative ways of uncertainty assessment include the variation of µ prod Q and µ dec Q . The Sherpa default is to set µ prod Q equal to the factorisation scale, while the starting scale of the decay shower is set to µ dec Q = M W /2 and not varied. The different scale variation schemes that we use in the NLO PS case are summarised in Table 1. For each of the schemes shown in Table 1, the uncertainty bands are defined as the maximum deviation from the central prediction on either side.
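The band definition, the maximum deviation from the central prediction on either side, amounts to a per-bin envelope. A minimal sketch follows; the function name `envelope` and the bin contents are hypothetical and serve only to illustrate the definition.

```python
def envelope(central, variations):
    """Per-bin (down, up) band: largest deviation from the central prediction on either side."""
    down, up = [], []
    for i, c in enumerate(central):
        shifts = [v[i] - c for v in variations]
        up.append(max(0.0, max(shifts)))
        down.append(max(0.0, -min(shifts)))
    return down, up

# Hypothetical bin contents: one central prediction and two varied predictions.
central = [1.00, 2.00, 0.50]
variations = [[0.90, 2.10, 0.45], [0.80, 1.90, 0.48]]

down, up = envelope(central, variations)
for c, d, u in zip(central, down, up):
    print(f"{c:.2f} +{u:.2f} -{d:.2f}")
```

With this definition a band can be one-sided: if all varied predictions lie below the central one in a bin, the upward deviation in that bin is zero.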

Comparison of the different theoretical descriptions
In this section, we compare four different NLO descriptions of the (e + ν e ) (µ −ν µ ) bb final state for the observables described in Section 2. Some of the purely leptonic observables have also been used by the ATLAS collaboration for their recent top quark mass determinations based on 8 TeV data presented in Ref. [76]. Aiming to quantify the relative differences of the theoretical descriptions, which should only mildly depend on the centre-of-mass energy, we show results at the present LHC setting of 13 TeV. The corresponding fiducial cross sections are summarised in Table 2. While the level of agreement between the fixed-order full and NWA calculations is as expected, considerably smaller cross sections are obtained for the parton shower calculations. Showering leads to a softening of the final state b-jets. In turn, a good fraction of them no longer satisfy the jet requirements, resulting in an event loss. The parton shower computation with µ = µ tt leads to an even smaller fiducial cross section than the computation relying on µ = m t , which is a consequence of the fact that the µ tt scale is larger, and therefore the value for α s is smaller. In both cases, however, the loss of events due to insufficiently energetic b-jets after parton showering is similar, and amounts to about 12%.
In Fig. 2, we present the normalised differential cross sections for m lb based on the four theoretical descriptions, evaluated at µ R = µ F = m t . In the lower part of the figure, we show their ratio to the NLO full prediction, including an uncertainty band from scale variations by a factor of two and one half with respect to the central scale. We find that 99% of the total fiducial cross section is accumulated in the range 40-150 GeV. A kinematic edge at $m_{lb}^{\rm edge} = \sqrt{m_t^2 - M_W^2} = 152.6$ GeV leads to a sharp drop in the distribution, beyond which it is only populated by non-resonant contributions, additionally clustered radiation and incorrect b-lepton pairings. The significantly larger scale uncertainty for m lb ≥ 150 GeV is due to the fact that NLO is the first non-trivial order populating this region. This conclusion is further substantiated by the sizeable perturbative correction that we discuss in the following section. Hence, resummation effects are expected to play a larger role in the vicinity of this kinematic boundary.
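The position of the kinematic edge follows from the top quark and W boson masses alone, for a massless b-quark and on-shell top quark and W boson. The short check below uses m t = 172.5 GeV from the text; the W boson mass of 80.385 GeV is an assumption made here, since the electroweak input values are not reproduced in this text.

```python
import math

M_TOP = 172.5  # GeV, default top quark mass quoted in the text
M_W = 80.385   # GeV, assumed here; the value used in the calculation is not shown

def mlb_edge(m_top, m_w):
    """Upper endpoint of m_lb for on-shell t -> b l nu with a massless b-quark."""
    return math.sqrt(m_top ** 2 - m_w ** 2)

edge = mlb_edge(M_TOP, M_W)
print(f"m_lb edge = {edge:.1f} GeV")
```

With these inputs the result rounds to the 152.6 GeV quoted above; the endpoint is not very sensitive to the last digits of the assumed W boson mass.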
We now discuss the impact of off-shell and non-resonant contributions on the m lb distribution. Their effect is most easily seen in the NLO NLOdec NWA prediction, displayed in the lower part of Fig. 2. In the range 30 GeV ≤ m lb ≤ 130 GeV this prediction agrees with the full calculation to within a few percent. The deviations are barely visible within the statistical fluctuations. Around the peak region of the differential cross section for m lb , the NWA calculation overshoots by about 4%. This level of agreement is to be expected given the parametric suppression of off-shell effects by $\Gamma_t/m_t$, which is mildly violated by the applied phase space restrictions. For m lb ≥ 130 GeV, the difference between NLO NLOdec NWA and NLO full starts to grow and saturates at about −50% for m lb values larger than m edge lb . Again, this is to be expected as the NWA does not apply in this part of the phase space. In fact, the LO LOdec NWA prediction (not shown in Fig. 2) vanishes for m lb ≥ m edge lb . It is also interesting to study the NLO LOdec NWA prediction to investigate the importance of NLO corrections to the top quark decay. We find significant shape differences compared to the full calculation of the order of about −10% for m lb around 50 GeV, rising to about +20% around m lb ∼ 140 GeV. Therefore, it is crucial in the application of the NWA to account for a fully consistent NLO treatment of production and decay. For m lb ≥ m edge lb , the description completely fails.
Comparing NLO PS with NLO LOdec NWA , we find that the parton shower treatment of the top quark decay drives the shape more towards the NLO full case for m lb > m edge lb . For low m lb values, the parton shower result mostly lies between the NLO LOdec NWA and NLO NLOdec NWA predictions. Finally, we discuss the shape differences introduced by the different descriptions in the light of the scale uncertainties. For clarity of the presentation, we only show the scale band of the NLO full reference prediction in the lower part of Fig. 2. For the other cases, we refer to Section 3.3.2. We observe that in the bulk of the distribution, shape differences of NLO NLOdec NWA with respect to NLO full lie inside the uncertainty bands. In contrast, both NLO LOdec NWA and NLO PS exhibit differences to NLO full outside their respective uncertainty bands, NLO PS however being much closer to NLO full than NLO LOdec NWA (see also Fig. 6b).
In Fig. 3, we show the normalised distribution of m T 2 as defined in Eq. (3.2), for the four theoretical descriptions. By construction, this observable has a sharp kinematic edge at m T 2 = m t , which is clearly visible and mildly washed out by off-shell effects, ambiguities related to missing energy and jet recombination. We find that for the NLO full prediction, 97% of the total fiducial cross section is contained in the region m T 2 ≤ m t . The shapes of the different theoretical descriptions follow patterns very similar to those observed for m lb . In particular, the NLO NLOdec NWA prediction closely follows NLO full up to the kinematic edge, with shape differences of a few percent, but in general within the scale uncertainty band.
In Fig. 4a, we show the di-lepton invariant mass m ll . We observe that off-shell effects are small and that all theoretical descriptions agree at the 10% level. This is expected because m ll is an observable which is inclusive with respect to extra radiation. The descriptions NLO LOdec NWA and NLO PS show a very similar behaviour and are outside the uncertainty bands of the NLO full prediction except for low m ll values. In Fig. 4b, in contrast to the m ll case, the NLO LOdec NWA and NLO PS predictions also differ significantly from each other.
In Figs. 5a and 5b, we show the muon rapidity η µ and the muon transverse momentum p T,µ , respectively. Our four theoretical predictions for the (e + ν e ) (µ −ν µ ) bb final state show a rather different behaviour in these two distributions. While the whole rapidity spectrum in Fig. 5a is properly modelled by all predictions, the transverse momentum spectrum in Fig. 5b is somewhat softer in the tail for the NLO NLOdec NWA and NLO PS calculations with respect to NLO full . A possible interpretation is that non-resonant contributions in NLO full contain W -bosons stemming from a hard collision rather than the top quark decay. Therefore they can carry higher energies which lead to a harder transverse momentum spectrum of the muon.

Scale dependence at LO and NLO
In this section, we will only consider the observables m lb , m T 2 , m ll and E ∆R T , as they are promising with respect to at least one of the requirements of being observables with small systematics and/or high sensitivity to the top quark mass.
For NLO full , we compare LO and NLO predictions on the left-hand side, while in the figures on the right-hand side, we compare calculations based on the NWA, including scale variations. We observe that the NLO corrections in the NLO full case lead to significant shape differences compared to LO full , see Figs. 6a to 9a. While this is to be expected in the tails of the distributions, it is remarkable that the shape difference also affects the central and in particular the regions with low values of the observables. Given that the differences between the LO and NLO theory predictions in the full W + W − bb calculation are still sizeable in the bulk of the distributions, large differences in the top quark mass extracted from templates based on these predictions can be expected. The shape differences at low values of m lb and m T 2 are less pronounced in the calculations based on the NWA (with NLO in the tt production), as can be seen from Figs. 6b and 7b. However, there are also significant shape differences in the bulk of the distribution. In addition, for the m lb distribution, Fig. 6b, the peak is lower in the NLO NLOdec NWA and the NLO PS case compared to the NLO LOdec NWA case, which can be easily understood considering the fact that more radiation, i.e. a harder distribution in the tail, softens the peak region.
For the observable m ll , the shape differences introduced by the NLO full calculation at low m ll values are particularly pronounced. As shown in Table 2, the total cross section predicted by NLO PS is considerably smaller. This is due to the fact that after the shower, the b-jets are softer and therefore a larger fraction of events does not pass the requirement of two b-jets above p jet T,min = 25 GeV. Even though the observable m ll does not involve jets, the jet requirements affect this observable, since we use the data set produced with the same requirements as for the other observables. A similar pattern is seen in the observable E ∆R T (Figs. 9a and 9b).
The scale variation bands in the NLO full case and the NLO NLOdec NWA case are rather asymmetric: the central scale leads to the largest differential cross section compared to up- and downwards variations over a large kinematic range of the corresponding observable. This effect is particularly pronounced for the m ll and E ∆R T distributions.

Distributions for several top quark masses
In this section, we investigate the sensitivity of the four observables m lb , m T 2 , m ll and E ∆R T to variations of the top quark mass. We exploit distributions based on the NLO full calculation using the three values, m t = 165, 172.5, 180 GeV, for the top quark mass.
We observe a strong sensitivity of the m lb and m T 2 distributions to the top quark mass with ratios up to about three in the given range. A lower top quark mass naturally leads to a softer spectrum while a higher top quark mass leads to a harder spectrum in these two observables. The sensitivity of m ll is shown in Fig. 11a and turns out to be very small. Unfortunately, this low sensitivity counterbalances the better experimental systematics expected [10] for a purely leptonic observable. Compared to the m ll distribution, the E ∆R T distribution in Fig. 11b shows a somewhat larger sensitivity to m t , albeit much smaller than what is observed for m lb and m T 2 .
Figure 11: Effect of top quark mass variation on the normalised differential cross section for m ll and E ∆R T . We also show the ratios to the prediction obtained with m t = 172.5 GeV. The results are obtained with the NLO full description for the 13 TeV LHC.

Measurement of the top quark mass based on pseudo-data
The top quark mass measurements in the di-lepton channel presented in Refs. [9,63,84] use the template method. In this method, simulated distributions are constructed for different input values of the top quark mass, m in t . The distributions (templates) per m in t are then individually fitted to a suitable function. Using templates at different m in t , it is verified that all parameters of the function depend linearly on m t = m in t . Consequently, this linearity is imposed in a combined fit to all templates. This fit fixes the theory prediction (i.e. the parametrisation of the theory hypothesis) by determining all parameters of the function, except for m t and the absolute normalisation. The former is to be determined from the data and represents the fit result, while the latter is left as a free parameter. We thus follow the experimental procedure of neglecting the absolute normalisation in the fit, which avoids a dependence on the involved experimental determination of the total luminosity and detector efficiency. This choice makes the results of this study independent of the total cross section of the respective calculations, leaving shape changes of the differential distributions as the measure for m t . Using those parameter values, a likelihood fit of this function to data is performed to obtain the value for m t that best describes the data, namely m out t , together with its statistical uncertainty.
In experimental analyses, these templates are constructed at the detector level, i.e. mimicking real data. Here, an analogous procedure is employed to assess the impact of different theory descriptions on the template method used to determine the top quark mass. In our analysis, the pseudo-data mimicking experimental data (i.e. the data model) in each figure are always generated from those predictions that are believed to be closer to real data, i.e. those that are considered to give the "better" result. We simulate a data luminosity of 50/fb.
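The logic of the template method can be illustrated with a toy closure test. The sketch below is not the fit used in this study: the Gaussian template shape, the linear coefficients in `shape_params`, the binning, the event count and the grid scan are all assumptions chosen to keep the example short. It only shows the three ingredients named above: shape parameters that depend linearly on m t , a normalised (shape-only) comparison, and a likelihood fit returning m out t .

```python
import math
import random

BIN_WIDTH = 5.0
BIN_EDGES = [40.0 + BIN_WIDTH * i for i in range(23)]  # 40 to 150 GeV

def shape_params(m_t):
    """Hypothetical calibration: both shape parameters are linear in m_t."""
    mean = 100.0 + 0.5 * (m_t - 172.5)
    width = 25.0 + 0.05 * (m_t - 172.5)
    return mean, width

def template(m_t):
    """Bin probabilities of the toy shape, normalised inside the fit range."""
    mean, width = shape_params(m_t)
    cdf = [0.5 * (1.0 + math.erf((x - mean) / (width * math.sqrt(2.0))))
           for x in BIN_EDGES]
    probs = [hi - lo for lo, hi in zip(cdf, cdf[1:])]
    total = sum(probs)
    return [p / total for p in probs]

def make_pseudo_data(m_in, n_events, rng):
    """Histogram of n_events values sampled from the toy shape inside the fit range."""
    mean, width = shape_params(m_in)
    counts = [0] * (len(BIN_EDGES) - 1)
    filled = 0
    while filled < n_events:
        x = rng.gauss(mean, width)
        if BIN_EDGES[0] <= x < BIN_EDGES[-1]:
            counts[int((x - BIN_EDGES[0]) // BIN_WIDTH)] += 1
            filled += 1
    return counts

def nll(m_t, counts):
    """Shape-only negative log-likelihood; the overall normalisation drops out."""
    return -sum(n * math.log(p) for n, p in zip(counts, template(m_t)))

def fit_mass(counts, lo=165.0, hi=180.0, step=0.05):
    """Grid scan for the m_t value that minimises the negative log-likelihood."""
    grid = [lo + step * i for i in range(int(round((hi - lo) / step)) + 1)]
    return min(grid, key=lambda m: nll(m, counts))

rng = random.Random(1)
counts = make_pseudo_data(172.5, 50000, rng)
m_out = fit_mass(counts)
print(f"m_in = 172.5 GeV, m_out = {m_out:.2f} GeV")
```

When pseudo-data and templates come from the same model, m out t scatters around m in t within the statistical uncertainty. Generating the pseudo-data from a different model than the templates is what produces a systematic offset.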
The sensitivity to the theoretical assumptions and their uncertainties is assessed by fits to one thousand pseudo-data sets created by random sampling from the underlying theory prediction. The layout of Figure 12a is representative for an entire set of figures presented in the following. For three different values of m in t , each of these figures shows the observed difference of m out t , the mass measured by the procedure, and m in t , the mass used to generate the pseudo-data. The red/blue points correspond to the mean difference observed for all pseudo-data sets that are produced as stated in the second line of the figure legends, and analysed with the template fit functions (the theory hypothesis), denoted by "calibration" in the legend for the red/blue points. The uncertainty per point is statistical only and corresponds to the expected experimental uncertainty for the assumed data luminosity. The points are displaced on the horizontal axis to ensure better visibility in the case of overlapping bands. The horizontal lines stem from a fit of the three points to a constant, displaying the average offset. The values given are the (individual) offsets together with their statistical uncertainties. The bands indicate the effect of the scale variations on the measured m t . They are obtained by replacing the central-scale pseudo-data by those derived from the associated samples, which were calculated using the varied scales.
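A fit of several points to a constant, as used for the horizontal lines, reduces to an inverse-variance weighted mean when the points are treated as independent with Gaussian uncertainties. The sketch below makes that assumption explicit; the three offsets and their uncertainties are hypothetical numbers, not results of this study.

```python
import math

def fit_constant(values, errors):
    """Least-squares fit to a constant: inverse-variance weighted mean and its uncertainty."""
    weights = [1.0 / (e * e) for e in errors]
    total = sum(weights)
    mean = sum(w * v for w, v in zip(weights, values)) / total
    return mean, 1.0 / math.sqrt(total)

# Hypothetical offsets m_out - m_in (GeV) at three input masses, illustration only.
offsets = [0.45, 0.55, 0.50]
errors = [0.10, 0.10, 0.10]

avg, avg_err = fit_constant(offsets, errors)
print(f"average offset = {avg:.2f} +- {avg_err:.2f} GeV")
```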
The ranges of the fits have been chosen on a plateau of good fit performance and high mass sensitivity. The ranges of choice are 80 GeV ≤ m T 2 ≤ 180 GeV .
Note that for the NLO PS calculations employing the µ tt scale, we used a fit range of 50 GeV ≤ m lb ≤ 150 GeV. As the range around the kinematic edge is a particularly m t -sensitive region, the question arises how much our results depend on the chosen fit range. Therefore we produced another set of fits where we restricted the fit range to m lb < 140 GeV, and found that the results are sufficiently stable under this change of the fit range. The results of both fit ranges are reported below. Figure 12a shows results of a fit where the pseudo-data have been generated using the factorised approach with NLO LOdec NWA . The fit has been performed once with LO LOdec NWA as the theory model (blue) and once with NLO LOdec NWA (red). The vanishing offset (i.e. it is compatible with zero) for the red lines (here and in all the following figures) proves that the method is closed, i.e. it finds the input value when the pseudo-data and the calibration coincide. The offset between the blue and red lines in Fig. 12a shows the effect of changing the perturbative order of the production process in the theory model. The offset of 0.51 ± 0.06 GeV demonstrates that these corrections have an impact on the mass determination at the level of the present experimental uncertainties. As the fits are based on normalised differential cross sections, the bands are sensitive to shape differences induced by the scale variations, rather than to their overall magnitudes. Figure 12b shows results of a fit where the pseudo-data have been generated using the factorised approach based on the NLO NLOdec NWA , i.e. the NWA at NLO, while the theory models differ in the decay order only. We observe that the effect of an O(α s ) change in the perturbative order of the decay is more significant than changing the order in the production process. 
The offset stemming from the former amounts to −1.80 ± 0.06 GeV, while switching from LO to NLO in the description of the production process yields an offset of 0.51 ± 0.06 GeV (cf. Fig. 12a). In addition, the size of the uncertainty bands increases because the NLO corrections to the decay lead to non-uniform scale variation bands. Figure 13a shows the effect of changing the perturbative order in both the production and decay process. Comparing Figs. 12 and 13a, we observe that, within the statistical uncertainties, the offset in Fig. 13a coincides with the sum of the offsets in Figs. 12a and 12b, as is expected for the factorised approach. Figure 13b shows results of a fit where the pseudo-data have been generated using the NLO full calculation, and the calibrations are based on the NLO full and LO full descriptions. While the uncertainty bands are comparable to the factorised case that uses pseudo-data based on NLO NLOdec NWA (Fig. 13a), the offset increases from −1.38 ± 0.07 GeV to −1.52 ± 0.07 GeV. While this increase in the offset is not conclusive when taking the statistical uncertainty into account, it still is an indication of the trend that the inclusion of a richer set of corrections leads to larger offsets.
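The additivity statement can be checked with the quoted numbers. The sketch below assumes that the statistical uncertainties of the individual offsets are uncorrelated, so that they add in quadrature; this is an assumption of the illustration, and the helper names `combine` and `pull` are hypothetical.

```python
import math

def combine(*terms):
    """Sum offsets; add uncertainties in quadrature (assumes no correlation)."""
    total = sum(v for v, _ in terms)
    err = math.sqrt(sum(e**2 for _, e in terms))
    return total, err

def pull(a, b):
    """Difference of two offsets in units of its combined uncertainty."""
    return (a[0] - b[0]) / math.hypot(a[1], b[1])

# Offsets in GeV as quoted in the text (value, statistical uncertainty).
prod = (0.51, 0.06)    # Fig. 12a: LO -> NLO in production
decay = (-1.80, 0.06)  # Fig. 12b: LO -> NLO in decay
both = (-1.38, 0.07)   # Fig. 13a: both orders changed, NWA
full = (-1.52, 0.07)   # Fig. 13b: NLO full versus LO full

summed = combine(prod, decay)
print(f"12a + 12b = {summed[0]:.2f} +/- {summed[1]:.2f} GeV")
print(f"pull(13a vs sum)  = {pull(both, summed):.1f}")
print(f"pull(13b vs 13a) = {pull(full, both):.1f}")
```

Under this assumption the sum of the two individual offsets lies within one standard deviation of the Fig. 13a value, while the difference between Figs. 13b and 13a stays below two standard deviations, in line with the statement that the increase is not conclusive.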

Fit results for m lb
In Fig. 14a, we again use pseudo-data generated according to NLO full , this time comparing the fit based on the full NLO calibration to the one obtained with the NLO NLOdec NWA calibration representing the factorised NLO approach. We see that the offset of 0.83±0.07 GeV is smaller in magnitude than in Fig. 13b, and goes in the opposite direction. This indicates that the non-factorisable contributions are suppressed in the fit range, since the NWA, with the corrections to the decay included, is a better approximation than LO full only.
In Fig. 14b, we replace the NLO NLOdec NWA calibration by the one from the NLO PS prediction. We observe an offset of −0.09 ± 0.07 GeV, which is surprisingly small compared to that given in Fig. 14a. It is expected that the two NWA-based descriptions, both including the leading radiation in the decay, lead to quite similar results. However, the NLO PS simulation differs from the NLO NLOdec NWA calculation. While it falls short of describing the top quark decay beyond the soft limit owing to the absence of decay matrix-element corrections, the parton shower approach generates a very different, more complete QCD radiation pattern as a result of including resummation effects in the production as well as the decay of the top quarks. This means that the two stages of tt production and decay are not factorised in exactly the same way as in the NLO NLOdec NWA calculation. These differences explain why the offset in Fig. 14a is different from the one in Fig. 14b. In fact, as can be seen from Figs. 2 as well as 6b, the emission pattern and resummation effects of the NLO PS case are relevant at lower m lb values and in particular around (and above) the kinematic edge, and lead to a shape of the m lb distribution that differs from the fixed-order NLO NLOdec NWA case. Especially for the m lb ∼ 140 GeV region, we notice that the agreement between NLO PS and NLO full is better than between NLO PS and NLO NLOdec NWA . This is an indication that in this region, resummation effects are more important than the inclusion of the radiative correction in the decay. The nearly vanishing mass offset shown in Fig. 14b occurs because the shapes of NLO PS and NLO full do not differ significantly in most of the fit range, despite their different theoretical content.
In Fig. 15, we use pseudo-data generated according to the NLO PS prediction using the scale setting µ F = µ R = m t . The related scale variations have been obtained by employing the µ F µ R α PS s scheme as described at the end of Section 3.2. By comparing to Fig. 12a, we observe that the uncertainty bands of NLO PS are smaller than the ones for NLO LOdec NWA . However, for the theory models relying on NLO decays, as shown in Fig. 12b for NLO NLOdec NWA and in Fig. 13b for NLO full , the bands are much wider. Hence, we expect that, if a parton shower were added to the NLO full calculation, the bands would persist or be only slightly reduced, analogous to the LO decay situation discussed above.
Unlike the case presented in Fig. 14b, the direct comparison between results from the NWAs and NLO PS produces non-vanishing mass shifts. If we analyse the NLO PS pseudo-data using the fixed-order NLO LOdec NWA calibration, we find a mass offset of −0.92 ± 0.07 GeV as shown in Fig. 15a. This indicates that the parton shower emissions (in both stages), supplementing the NLO accurate tt production, have a considerable impact on the results. In addition, a significant dependence of the NLO LOdec NWA calibration offset on the top quark mass is observed, i.e. the blue points are inconsistent with the constant fit. This implies that the NLO LOdec NWA m lb distribution has a stronger dependence on the top quark mass than the one generated by NLO PS . A similar trend has been seen in Fig. 12b, where NLO LOdec NWA is compared to NLO NLOdec NWA . Turning to Fig. 15b, we show the case where the NLO PS pseudo-data have been confronted with the improved fixed-order model NLO NLOdec NWA . For this case, we would expect a pseudo-data-theory agreement which is better than the one seen in Fig. 15a, since both the NLO PS and the NLO NLOdec NWA description contain the major contributions to describe the extra emission in the top quark decays. However, the offset of 0.96 ± 0.07 GeV is similar in size (while opposite in direction) compared to the LO decay case shown in Fig. 15a. This is consistent with the offset differences shown in Table 3, for example subtracting the offset given in Fig. 12b from the one in Fig. 15a, or alternatively the one in Fig. 14b from the one in Fig. 14a.
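The two subtractions mentioned above can be carried out explicitly with the offsets quoted in the text. As a simplifying assumption of this sketch, the statistical uncertainties are treated as uncorrelated and added in quadrature; the helper name `diff` is hypothetical.

```python
import math

def diff(a, b):
    """Return a - b, with uncertainties added in quadrature (assumed uncorrelated)."""
    return a[0] - b[0], math.hypot(a[1], b[1])

# Offsets in GeV for m_lb as quoted in the text (value, statistical uncertainty).
fig12b = (-1.80, 0.06)  # NLO NLOdec NWA pseudo-data, NLO LOdec NWA calibration
fig14a = (0.83, 0.07)   # NLO full pseudo-data, NLO NLOdec NWA calibration
fig14b = (-0.09, 0.07)  # NLO full pseudo-data, NLO PS calibration
fig15a = (-0.92, 0.07)  # NLO PS pseudo-data, NLO LOdec NWA calibration
fig15b = (0.96, 0.07)   # NLO PS pseudo-data, NLO NLOdec NWA calibration

d1 = diff(fig15a, fig12b)  # compared with Fig. 15b
d2 = diff(fig14a, fig14b)  # compared with Fig. 15b
for label, d in (("15a - 12b", d1), ("14a - 14b", d2)):
    print(f"{label} = {d[0]:.2f} +/- {d[1]:.2f} GeV (Fig. 15b: {fig15b[0]:.2f})")
```

Both differences come out close to the 0.96 GeV of Fig. 15b, well within the combined uncertainties estimated under the stated assumption.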
Further investigations are needed to understand the source of the mass shift observed in Fig. 15b. Based on the current findings, we cannot conclude whether it originates from (i) the inclusion of resummation effects, or (ii) genuine differences in incorporating the fixed-order QCD corrections to the production⁵ and decay of the top quark pairs, or both. The different radiation patterns generated by NLO PS and NLO NLOdec NWA do not allow for a strict, same-level comparison between the two approaches, but reducing the amount of radiation produced by NLO PS is expected to bring them closer to each other, and to diminish the role of resummation effects.
⁵ The NLO treatment of production times decay is implemented differently in NLO PS and NLO NLOdec NWA . The parton shower calculation uses a multiplicative approach, whereas the fixed-order calculation is expanded in α s up to O(α 3 s ), therefore leading to differences which are formally of next-to-next-to-leading order.
Figure 16: Results of the restricted-shower study for the m lb observable using NLO (n prod max ,n dec max ) PS parton showers that terminate after a certain maximal number of emissions in both the production and decay showers. In (a) mass offsets are shown for a number of pseudo-data sets using the NLO NLOdec NWA and the NLO PS calibrations in the shape analysis. The sets of pseudo-data are generated according to the NLO NLOdec NWA description, the default NLO PS as well as three NLO PS showers that differ in n prod max = n dec max = n max . The corresponding m lb distributions for the case m t = 172.5 GeV are given in (b).
There is no unique way of limiting the scope of the resummation. To control the generation of a reduced branching pattern, we use an approach where each showering process can be terminated after a (given) fixed number of emissions, denoted by n max . For our study, we rely on the fully factorised version of combining the subshowers, i.e. we separately restrict the number of emissions to no more than n (n prod max = n dec max = n) in each subshower (the primary one evolving the tt production and the secondary one evolving the decays). The combination of one-emission production and decay showers (n prod max = n dec max = 1) can then be used to emulate the NLO NLOdec NWA calculation, which enables us to approximately separate effects (i) from (ii). In addition, comparing the restricted and full NLO PS prescriptions will provide us with a qualitative estimate of the impact of the full resummation. Starting from n prod max = 1 and n dec max = 1, we can successively restore the full shower by incrementing the number of emissions.
Figure 16 summarises the results of the restricted-shower studies. Figure 16a shows the offsets and their statistical uncertainties for sets of pseudo-data analysed with two calibrations, namely NLO PS and NLO NLOdec NWA , while the figure to the right, Fig. 16b, depicts the corresponding m lb distributions. The leftmost bin in Fig. 16a corresponds to the mass shifts displayed in Fig. 15b. The blue bar depicts the offset of the NLO PS pseudo-data analysed with the NLO NLOdec NWA calibration. The almost vanishing red bar shows the closure for the NLO PS pseudo-data and calibration. Moving to the right, the parton shower is more and more restricted, allowing for at most 12, 4 and 1 emissions in each subshower. This results in a smooth transition from the offset of 0.96 ± 0.07 GeV to almost zero (with an indication of a small overshoot to negative offsets). The mass shifts becoming fairly small for more restricted showering indicate that most of the differences between the NLO NLOdec NWA and NLO PS predictions emerge from resummation effects. Finally, the rightmost bin is for the NLO NLOdec NWA pseudo-data themselves. The calculated offsets, obtained from fits to the m lb distributions like the ones in Fig. 16b, receive contributions from regions with large differential cross sections and small differences between restricted shower and calibration (NLO NLOdec NWA and NLO PS ) results, as well as from regions with small differential cross sections and relatively large differences. The interplay of these effects can lead to situations such as the one observed here, where the mass offsets obtained from NLO  Fig. 17, the same calibrations as in Fig. 16 are used. We find that the offsets for the LO PS and NLO PS predictions agree very well, although the shape of the LO PS m lb distribution in Fig. 17b substantially deviates from the NLO PS one outside the range 70 GeV < m lb < 140 GeV. This means we observe similar compensating effects in the fit as discussed for Fig. 16. The small difference in the offsets indicates that the NLO treatment of the production process included by the NLO PS prescription has a minor impact on the fit. The nearly vanishing offset between the LO PS pseudo-data and NLO PS calibration is likely to be a consequence of the same resummation corrections being applied in both showers.
Figure 17: Results of the restricted-shower study for the m lb observable using pure production, pure decay and pure LO parton showers only. In (a) mass offsets are shown for a number of pseudo-data sets using the NLO NLOdec NWA and the NLO PS calibrations in the shape analysis. The sets of pseudo-data are generated according to the NLO NLOdec NWA description, the default NLO PS and LO PS showers as well as NLO PS showers whose evolution is restricted to the production or decay stage only. The corresponding m lb distributions for the case m t = 172.5 GeV are given in (b).
The NLO (∞,0) PS prediction in Fig. 17 can be considered as the shower correction to tt production at NLO (NLO LOdec NWA ), while the NLO (0,∞) PS prediction represents the shower approximation to the radiative corrections in the top quark decays. The use of the related pseudo-data increases the absolute mass offsets for both the NLO PS and the NLO NLOdec NWA calibration, illustrating that the production shower predominantly evolves through initial-state radiation (resulting in larger fitted m t ) while the decay showers are mostly driven by final-state radiation (yielding smaller m t ). This is induced by the corresponding m lb distributions in Fig. 17b, where we observe that the NLO pseudo-data. This means that the generation-level factorisation (dissection) of the emission patterns for production and decays almost completely carries over to the analysis level.
The m lb distributions of the restricted and full showering show clear differences. To quantify the significance of these differences, the parton shower scale uncertainties are assessed. For the decay showers, we performed a decay shower starting scale variation by using factors of 0.5 and 2.0 applied to the central scale µ dec Q . Despite this wide range for varying the resummation scales, we find negligible differences in the shapes of the m lb distributions. Therefore, all variations of the shower description employed here are always based on the fixed value µ dec Q = M W /2. We use the different schemes described in Section 3.2 to obtain the scale-variation induced theory uncertainties of the NLO PS prescription presented in Fig. 18a. While the combined variation, µ F µ R µ Q α PS s , leads to the smallest uncertainty band, the band based on the µ F µ R µ Q parameter variation is marginally larger. Most notably, these differences are much smaller than those occurring between the various theory descriptions discussed above.
Finally, for the NLO PS calculations, we compare in Fig. 18b the results for the two central-scale choices µ R = µ F = m t and µ R = µ F = µ tt as defined in Eq. (3.7). Although the predicted total cross sections listed in the last two rows of Table 2 depend on this choice, the two predictions lead to consistent measured top quark masses, i.e. the associated offsets agree within their uncertainties.
As can be inferred from Fig. 10a, the sensitivity of the m lb observable to the top quark mass, and consequently the achievable statistical uncertainty on m t in data, depends on the fit range used. In this context, the range 140-160 GeV is a particularly m t -sensitive region, which however also features sizeable differences in the theoretical descriptions, for example as shown in Fig. 2. Consequently, the resulting offsets listed in Table 3 depend on the chosen fit range. As an example, restricting the fit to m lb < 140 GeV results in absolute differences in the offsets between full range and reduced range of min = 0.05 GeV and max = 0.36 GeV, where min corresponds to NLO PS pseudo-data versus NLO NLOdec NWA calibration, and max corresponds to NLO NLOdec NWA pseudo-data versus LO LOdec NWA calibration. In general, larger differences are observed either for larger absolute offsets or for cases with large uncertainty bands. As a result, within the given uncertainties the general pattern does not depend on the fit range. An experimental analysis should be optimised for the smallest total uncertainty, including the variation of the relative importance of statistical and systematic uncertainties, while changing the fit range. Therefore we consider the results shown in Table 3, based on the fit ranges given in Eq. (4.1), as our nominal values.
Table 3: Summary of the offsets observed when analysing pseudo-data listed in the first column with template fit functions calibrated based on various theoretical predictions as given in the second column. The observed offsets for the two observables m lb and m T 2 are reported in the second pair of columns, where the corresponding figures are listed in the next pair of columns. Finally, the χ 2 for the differences in the offsets for the two observables are displayed in the rightmost column, see text for further details.

Fit results for m T 2
The investigations performed for the m lb observable are repeated for m T 2 . The results corresponding to Figs. 12a to 18b are shown in Figs. 19a to 25b. Also for m T 2 , the offsets obtained when using the corresponding pair of pseudo-data and calibration are consistent with zero, i.e. the method is closed.
While most observations are consistent for the m lb and m T 2 observables, there are some remarkable differences. For m T 2 , comparing distributions with LO and NLO in production generally results in an m t -dependent offset. This indicates that the NLO prediction has a weaker mass dependence than the LO one. The slope of the m lb distribution in Fig. 15 is less steep than the one in Fig. 22. This indicates a different effect of the parton shower on the more inclusive m T 2 , retaining a higher sensitivity to the top quark mass.
Figure 19: Same as Fig. 12 but for the observable m T 2 .
The offsets observed for the various pairs of pseudo-data and calibration are given in Table 3. The comparison of the offsets obtained for m T 2 with those for m lb exhibits a very similar pattern. To investigate whether the sensitivity of the observables to differences in the theoretical predictions coincides, the differences in their offsets are expressed by a χ 2 calculated from the offsets, using the fact that the statistical uncertainties of the offsets are uncorrelated.⁶ For a number of pairs the differences of the offsets for m lb and m T 2 are consistent with zero, leading to small values of χ 2 , for example when comparing NLO LOdec NWA with LO LOdec NWA (Figs. 12a and 19a). In contrast, most notably for the pair NLO PS and NLO NLOdec NWA (Figs. 15b and 22b), the difference is significant, leading to a large χ 2 . This means that, at the expected statistical precision of the 13 TeV LHC, the two estimators exhibit different sensitivities to this difference in the theoretical prediction.
⁶ Given o_i ± u_i for the offsets o_i and their uncertainties u_i with i = 1, 2 = m lb , m T 2 , the χ 2 is defined as:
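For two offsets with uncorrelated uncertainties, the natural form of such a χ 2 is the squared difference divided by the sum of the squared uncertainties. The expression below is a sketch under that assumption, written with the symbols o_i and u_i of the footnote; it is the standard definition and is not taken verbatim from the text.

```latex
\chi^2 = \frac{\left(o_1 - o_2\right)^2}{u_1^2 + u_2^2}
```

With this form, χ 2 ≈ 1 corresponds to a difference of one combined standard deviation between the m lb and m T 2 offsets.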

Conclusions
We have studied the impact of various theoretical descriptions for top quark pair production on measurements of the top quark mass in the di-lepton channel. In particular, we have compared the NLO QCD results for W + W − bb production (NLO full ) to results based on the narrow-width approximation, combining tt production at NLO with (i) LO top quark decays (NLO LOdec NWA ), (ii) NLO top quark decays (NLO NLOdec NWA ) and (iii) a parton shower (NLO PS ). We have assessed the theoretical uncertainties associated with the different theory descriptions via the variation of renormalisation, factorisation and shower scales, and investigated the top quark mass sensitivity of the observables m lb , m T 2 , m ll and E ∆R T . Based on these results, we then studied the prospects of a top quark mass extraction from the observables m lb and m T 2 , which we found to be most sensitive to top quark mass variations. Using pseudo-data based on our calculations, we employed the template method to determine the offset in the top quark mass from calibrations that differ in their underlying theory description. These analyses show that the behaviour of the observables m lb and m T 2 is rather similar in what concerns the observed offsets in the top quark mass.
More importantly, we found that the NLO corrections to the top quark decay play a significant role, because they lead to non-uniform scale uncertainty bands. As the fits are based on normalised differential cross sections, shape differences induced by the scale variations will lead to larger theory uncertainties for the top quark mass extraction. Even though the total scale uncertainties decrease at NLO, as expected, the shape changes of the m lb distribution induced by scale variations are particularly pronounced in the cases where the decay is described at NLO. For both the NLO full as well as the NLO NLOdec NWA description, the theoretical uncertainties in determining m t therefore increase by at least a factor of two compared to the uncertainties emerging when LO decays are involved. Furthermore, the direct comparison of theories differing in their treatment of the top quark decays can lead to offsets of more than 1 GeV in the measured m t value. This is observed in both cases, i.e. when confronting NLO full pseudo-data with the LO full calibration and NLO NLOdec NWA pseudo-data with the NLO LOdec NWA calibration. These findings indicate that the non-resonant and non-factorising contributions have a smaller effect on the top quark mass extraction than the NLO treatment of the decay.
Turning to the parton shower (NLO PS ) results of our analysis approach, we have compared them to the theory models NLO full and NLO NLOdec NWA , leading to mass shifts of −0.09 ± 0.07 GeV and 0.96 ± 0.07 GeV, respectively (in the m lb case). The good agreement between NLO full and NLO PS results can be attributed to the fact that the two descriptions are rather similar for an appropriate fit range, but it does not mean that the two descriptions agree for the entire m lb range. Resummation effects for low m lb values in the NLO PS case and off-shell effects affecting the tail in the NLO full case are clearly visible in the m lb distribution. The differences between NLO PS and NLO NLOdec NWA mainly originate from the regions of small and near-edge m lb values, where resummation corrections play an important role.
To better understand these differences, we investigated the parton shower behaviour in more detail. We considered results where we limit the number of emissions in both the production and the decay showers, and indeed observe that the predictions of such restricted parton showers move closer to the fixed-order NLO NLOdec NWA result. These investigations also showed that the resummation corrections incorporated by the unrestricted showers may lead to effects on the top quark mass determination that can be as large as 1 GeV. In addition, we have switched off the shower emissions in either production or decay, and found that both the production and the decay showers impact our analysis in a significant manner. Different ways to assess the shower scale uncertainties within the NLO PS description were also studied but their effect turned out to be small. The choice of a different central scale also had only a minor impact on the mass determination.
We finally investigated how the choice of the fit range impacts our results and found that the corresponding offsets do not change considerably if the fit range is altered (in a way that still leads to acceptable closure).
Based on our results, we expect that the non-uniform scale variation bands in the m lb distribution, induced by NLO corrections to the decay as present in the NLO full calculation, would largely persist if a parton shower were matched to NLO full . It is therefore conceivable that a top quark mass extraction based on LO (or shower approximated) decays may underestimate the theoretical uncertainties, even if higher perturbative orders in the top quark pair production process are taken into account.
In the future, it would be very interesting to see how the pseudo-data used here compare to real data. In this context, the impact of hadronisation and colour reconnection effects should be studied. Owing to the rather strong impact of the resummation, it would also be useful to perform a dedicated comparison of different parton shower prescriptions such as different evolution variables and recoil strategies. Furthermore, it would be worthwhile to investigate how the NLO results for the full W + W − bb final state, ideally matched to a parton shower, compare to NNLO results for top quark pair production in the narrow-width approximation, combined with different descriptions of the top quark decay.