Measurement of the top quark mass in the $t\bar{t}\to$ lepton+jets channel from $\sqrt{s}=8$ TeV ATLAS data and combination with previous results

The top quark mass is measured using a template method in the $t\bar{t}\to$ lepton+jets channel (lepton is $e$ or $\mu$) using ATLAS data recorded in 2012 at the LHC. The data were taken at a proton-proton centre-of-mass energy of $\sqrt{s}=8$ TeV and correspond to an integrated luminosity of 20.2 fb$^{-1}$. The $t\bar{t}\to$ lepton+jets channel is characterized by the presence of a charged lepton, a neutrino and four jets, two of which originate from bottom quarks ($b$). Exploiting a three-dimensional template technique, the top quark mass is determined together with a global jet energy scale factor and a relative $b$-to-light-jet energy scale factor. The mass of the top quark is measured to be $m_{top}= 172.08 \pm 0.39 (stat) \pm 0.82 (syst)$ GeV. A combination with previous ATLAS $m_{top}$ measurements gives $m_{top}= 172.69 \pm 0.25 (stat) \pm 0.41 (syst)$ GeV.


Introduction
The mass of the top quark m top is an important parameter of the Standard Model (SM). Precise measurements of m top provide crucial information for global fits of electroweak parameters [1][2][3] which help to assess the internal consistency of the SM and probe its extensions. In addition, the value of m top affects the stability of the SM Higgs potential, which has cosmological implications [4][5][6].
In this paper, an ATLAS measurement of m top in the tt → lepton + jets channel is presented. The result is obtained from pp collision data recorded in 2012 at a centre-of-mass energy of √s = 8 TeV with an integrated luminosity of about 20.2 fb−1. The analysis exploits the decay tt → W+W−bb → ℓνqq bb, which occurs when one W boson decays into a charged lepton (ℓ is e or µ, including τ → e, µ decays) and a neutrino (ν), and the other into a pair of quarks. In the analysis presented here, m top is obtained from the combined sample of events selected in the electron+jets and muon+jets final states. Single-top-quark events with the same reconstructed final states contain information about the top quark mass and are therefore included as signal events.
The measurement uses a template method, where simulated distributions are constructed for a chosen quantity sensitive to the physics parameter under study using a number of discrete values of that parameter. These templates are fitted to functions that interpolate between different input values of the physics parameter while fixing all other parameters of the functions. In the final step, an unbinned likelihood fit to the observed data distribution is used to obtain the value of the physics parameter that best describes the data. In this procedure, the experimental distributions are constructed such that fits to them yield unbiased estimators of the physics parameter used as input in the signal Monte Carlo (MC) samples. Consequently, the top quark mass determined in this way corresponds to the mass definition used in the MC simulation. Because of various steps in the event simulation, the mass measured in this way does not necessarily directly coincide with mass definitions within a given renormalization scheme, e.g. the top quark pole mass. Evaluating these differences is a topic of theoretical investigations [15][16][17][18].
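The three steps described above (templates at discrete parameter values, interpolation of the template shape parameters, and an unbinned likelihood fit) can be illustrated with a toy sketch. The single Gaussian-shaped observable, its 10 GeV resolution, the sample sizes and all function names are assumptions made for the illustration; they are not the templates or the fit code of this analysis.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy assumption: the observable is Gaussian with mean equal to the
# input mass and a fixed resolution of 10 GeV.
RESOLUTION = 10.0
MASS_POINTS = np.array([167.5, 170.0, 172.5, 175.0, 177.5])  # GeV


def make_sample(mass, n_events):
    """Simulate the toy observable for a given input mass."""
    return rng.normal(loc=mass, scale=RESOLUTION, size=n_events)


# Step 1: build one template per discrete input mass and summarize it
# by its shape parameters (here: mean and width).
means, widths = [], []
for m in MASS_POINTS:
    sample = make_sample(m, 20000)
    means.append(sample.mean())
    widths.append(sample.std(ddof=1))

# Step 2: interpolate the shape parameters linearly in the input mass.
mean_slope, mean_offset = np.polyfit(MASS_POINTS, means, 1)
width_slope, width_offset = np.polyfit(MASS_POINTS, widths, 1)


def neg_log_likelihood(mass, data):
    """Unbinned negative log-likelihood (up to a constant) for a mass hypothesis."""
    mu = mean_offset + mean_slope * mass
    sigma = width_offset + width_slope * mass
    return np.sum(0.5 * ((data - mu) / sigma) ** 2 + np.log(sigma))


# Step 3: fit pseudo-data generated at a mass between the template points
# by scanning the likelihood over the mass hypothesis.
TRUE_MASS = 173.3
pseudo_data = make_sample(TRUE_MASS, 5000)
scan = np.arange(165.0, 180.0, 0.01)
nll = np.array([neg_log_likelihood(m, pseudo_data) for m in scan])
fitted_mass = scan[np.argmin(nll)]
print(f"fitted mass: {fitted_mass:.2f} GeV (input {TRUE_MASS} GeV)")
```

Because the interpolation is calibrated on the simulated input mass, the fitted value in this toy is an estimator of the generator-level parameter, which mirrors the statement that the measured mass corresponds to the definition used in the MC simulation.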
The measurement exploits the three-dimensional template fit technique presented in Ref. [9]. To reduce the uncertainty in m top stemming from the uncertainties in the jet energy scale (JES) and the additional b-jet energy scale (bJES), m top is measured together with the jet energy scale factor (JSF) and the relative b-to-light-jet energy scale factor (bJSF). Given the larger data sample than used in Ref. [9], the analysis is optimized to reject combinatorial background arising from incorrect matching of the observed jets to the daughters of the top quark decays, thereby achieving a better balance of the statistical and systematic uncertainties and reducing the total uncertainty. Given this new measurement, an update of the ATLAS combination of m top measurements is also presented.
This document is organized as follows. After a short description of the ATLAS detector in Section 2, the data and simulation samples are discussed in Section 3. Details of the event selection are given in Section 4, followed by the description of the reconstruction of the three observables used in the template fit in Section 5. The optimization of the event selection using a multivariate analysis approach is presented in Section 6. The template fits are introduced in Section 7. The evaluation of the systematic uncertainties and their statistical uncertainties is discussed in Section 8, and the measurement of m top is given in Section 9. The combination of this measurement with previous ATLAS measurements is discussed in Section 10 and compared with measurements of other experiments. The summary and conclusions are given in Section 11. Additional information about the optimization of the event selection and about specific uncertainties in the new measurement of m top in the tt → lepton + jets channel is given in Appendix A, while Appendix B contains information about various combinations performed, together with comparisons with results from other experiments.

The ATLAS experiment
The ATLAS experiment [19] at the LHC is a multipurpose particle detector with a forward-backward symmetric cylindrical geometry and a near 4π coverage in the solid angle.1 It consists of an inner tracking detector surrounded by a thin superconducting solenoid providing a 2 T axial magnetic field, electromagnetic and hadronic calorimeters, and a muon spectrometer. The inner tracking detector covers the pseudorapidity range |η| < 2.5. It consists of silicon pixel, silicon microstrip, and transition radiation tracking detectors. Lead/liquid-argon (LAr) sampling calorimeters provide electromagnetic (EM) energy measurements with high granularity. A hadronic (steel/scintillator-tile) calorimeter covers the central pseudorapidity range (|η| < 1.7). The endcap and forward regions are instrumented with LAr calorimeters for both the EM and hadronic energy measurements up to |η| = 4.9. The muon spectrometer surrounds the calorimeters and is based on three large air-core toroid superconducting magnets with eight coils each. Its bending power is 2.0 to 7.5 T m. It includes a system of precision tracking chambers and fast detectors for triggering.
A three-level trigger system was used to select events. The first-level trigger is implemented in hardware and used a subset of the detector information to reduce the accepted rate to at most 75 kHz. This is followed by two software-based trigger levels that together reduced the accepted event rate to 400 Hz on average depending on the data-taking conditions during 2012.

Data and simulation samples
The analysis is based on pp collision data recorded by the ATLAS detector in 2012 at a centre-of-mass energy of √s = 8 TeV. The integrated luminosity is 20.2 fb−1 with an uncertainty of 1.9% [20]. The modelling of top quark pair (tt) and single-top-quark signal events, as well as most background processes, relies on MC simulations. For the simulation of tt and single-top-quark events, the Powheg-Box v1 [21-23] program was used. Within this framework, the simulations of the tt [24] and single-top-quark production in the s- and t-channels [25] and the Wt-channel [26] used matrix elements at next-to-leading order (NLO) in the strong coupling constant α S with the NLO CT10 [27] parton distribution function (PDF) set and the h damp parameter set to infinity. Using m top and the top quark transverse momentum p T for the underlying leading-order Feynman diagram, the dynamic factorization and renormalization scales were set to $\sqrt{m_{top}^2 + p_T^2}$. The Pythia (v6.425) program [28] with the P2011C [29] set of tuned parameters (tune) and the corresponding CTEQ6L1 PDFs [30] provided the parton shower, hadronization and underlying-event modelling.
1 ATLAS uses a right-handed coordinate system with its origin at the nominal interaction point (IP) in the centre of the detector and the z-axis along the beam pipe. The x-axis points from the IP to the centre of the LHC ring, and the y-axis points upwards. Cylindrical coordinates (r, φ) are used in the transverse plane, φ being the azimuthal angle around the z-axis. The pseudorapidity is defined in terms of the polar angle θ as η = − ln tan(θ/2). Angular distance is measured in units of $\Delta R \equiv \sqrt{(\Delta\eta)^2 + (\Delta\phi)^2}$.
For m top hypothesis testing, the tt and single-top-quark event samples were generated with five different assumed values of m top in the range from 167.5 to 177.5 GeV in steps of 2.5 GeV. The integrated luminosity of the simulated tt sample with m top = 172.5 GeV is about 360 fb−1. Each of these MC samples is normalized according to the best available cross-section calculations. For m top = 172.5 GeV, the tt cross-section is $\sigma_{t\bar{t}} = 253^{+13}_{-15}$ pb, calculated at next-to-next-to-leading order (NNLO) with next-to-next-to-leading logarithmic soft gluon terms [31][32][33][34][35] with the Top++ 2.0 program [36]. The PDF- and α S-induced uncertainties in this cross-section were calculated using the PDF4LHC prescription [37] with the MSTW2008 68% CL NNLO PDF [38,39], CT10 NNLO PDF [27,40] and NNPDF2.3 5f FFN PDF [41] and were added in quadrature with the uncertainties obtained from the variation of the factorization and renormalization scales by factors of 0.5 and 2.0. The cross-sections for single-top-quark production were calculated at NLO and are $\sigma_t = 87.8^{+3.4}_{-1.9}$ pb [42], $\sigma_{Wt} = 22.4 \pm 1.5$ pb [43] and $\sigma_s = 5.6 \pm 0.2$ pb [44] in the t-, the Wt- and the s-channels, respectively.
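For orientation, the quoted cross-section and luminosities translate into event counts as in the short sketch below. It counts produced tt pairs before any selection, and the per-event weight assumes unit-weight MC events; both are simplifying assumptions of the illustration, and the function name is hypothetical.

```python
# Values quoted in the text.
SIGMA_TTBAR_PB = 253.0   # tt cross-section in pb for m_top = 172.5 GeV
LUMI_DATA_FB = 20.2      # integrated luminosity of the data in fb^-1
LUMI_MC_FB = 360.0       # approximate luminosity of the simulated tt sample in fb^-1

PB_INV_PER_FB_INV = 1000.0  # 1 fb^-1 = 1000 pb^-1


def produced_events(cross_section_pb, luminosity_fb):
    """Number of produced events N = sigma * L (no acceptance or efficiency)."""
    return cross_section_pb * luminosity_fb * PB_INV_PER_FB_INV


n_ttbar_data = produced_events(SIGMA_TTBAR_PB, LUMI_DATA_FB)

# Weight that scales the simulated sample to the luminosity of the data,
# assuming unit-weight MC events.
mc_weight = LUMI_DATA_FB / LUMI_MC_FB

print(f"produced tt pairs in data: {n_ttbar_data:.3g}")
print(f"MC-to-data luminosity weight: {mc_weight:.4f}")
```

The simulated sample is roughly 18 times larger than the data sample, which is why the statistical precision of the templates is not the limiting factor at the nominal mass point.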
The Alpgen (v2.13) program [45] interfaced to the Pythia6 program was used for the simulation of the production of W± or Z bosons in association with jets. The CTEQ6L1 PDFs and the corresponding AUET2 tune [46] were used for the matrix element and parton shower settings. The W+jets and Z+jets events containing heavy-flavour (HF) quarks (W/Zbb+jets, W/Zcc+jets, and Wc+jets) were generated separately using leading-order (LO) matrix elements with massive bottom and charm quarks. Double-counting of HF quarks in the matrix element and the parton shower evolution was avoided via a HF overlap-removal procedure that used the ∆R between the additional heavy quarks as the criterion. If the ∆R was smaller than 0.4, the parton shower prediction was taken, while for larger values, the matrix element prediction was used. The Z+jets sample is normalized to the inclusive NNLO calculation [47]. Due to the large uncertainties in the overall W+jets normalization and the flavour composition, both are estimated using data-driven techniques as described in Section 4.2. Diboson production processes (WW, WZ and ZZ) were simulated using the Alpgen program with CTEQ6L1 PDFs interfaced to the Herwig (v6.520) [48] and Jimmy (v4.31) [49] programs. The samples are normalized to their predicted cross-sections at NLO [50].
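The ∆R criterion of the heavy-flavour overlap removal described above can be sketched as follows. The function names and the returned labels are hypothetical, and the treatment of a pair at exactly ∆R = 0.4 is an assumption, since the text only specifies "smaller" and "larger" values.

```python
import math

DELTA_R_CUT = 0.4  # threshold quoted in the text


def delta_phi(phi1, phi2):
    """Azimuthal difference wrapped into (-pi, pi]."""
    dphi = (phi1 - phi2) % (2.0 * math.pi)
    if dphi > math.pi:
        dphi -= 2.0 * math.pi
    return dphi


def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance Delta R = sqrt(d_eta^2 + d_phi^2)."""
    return math.hypot(eta1 - eta2, delta_phi(phi1, phi2))


def heavy_quark_prediction(eta1, phi1, eta2, phi2):
    """Choose which prediction describes a pair of additional heavy quarks."""
    if delta_r(eta1, phi1, eta2, phi2) < DELTA_R_CUT:
        return "parton shower"
    # Assumption: Delta R == 0.4 is assigned to the matrix element.
    return "matrix element"


print(heavy_quark_prediction(0.0, 0.1, 0.2, 0.2))   # collinear pair
print(heavy_quark_prediction(0.0, 0.0, 1.0, 0.0))   # well-separated pair
```

The wrapping in `delta_phi` matters for pairs on either side of φ = ±π, which are close in angle even though their raw φ values differ by almost 2π.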
The missing transverse momentum $E_T^{miss}$ is the absolute value of the vector $\vec{E}_T^{miss}$ calculated from the negative vectorial sum of all transverse momenta. The vectorial sum takes into account all energy deposits in the calorimeters projected onto the transverse plane. The clusters are corrected using the calibrations that belong to the associated physics object. Muons are included in the calculation of the $E_T^{miss}$ using their momentum reconstructed in the inner tracking detectors [68].
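The definition above amounts to a negative vector sum in the transverse plane. A minimal sketch, with hypothetical input values and a hypothetical function name, and without the object-specific calibrations mentioned in the text:

```python
import math


def missing_transverse_momentum(transverse_momenta):
    """Return (Ex_miss, Ey_miss, ET_miss) from (px, py) pairs in GeV.

    The inputs stand for calibrated calorimeter deposits and muons; the
    calibration itself is outside the scope of this sketch.
    """
    sum_px = sum(px for px, _ in transverse_momenta)
    sum_py = sum(py for _, py in transverse_momenta)
    ex_miss, ey_miss = -sum_px, -sum_py
    return ex_miss, ey_miss, math.hypot(ex_miss, ey_miss)


# Hypothetical event: two calorimeter objects and one muon, in GeV.
objects = [(40.0, 10.0), (-25.0, 20.0), (-5.0, -60.0)]
ex_miss, ey_miss, et_miss = missing_transverse_momentum(objects)
print(f"ET_miss = {et_miss:.1f} GeV")
```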

Background estimation
The contribution of events falsely reconstructed as tt → lepton + jets events due to the presence of objects misidentified as leptons (fake leptons) and non-prompt (NP) leptons originating from HF decays is estimated from data using the matrix method [69]. The technique employed uses η- and p T-dependent efficiencies for NP/fake leptons and prompt leptons. They are measured in a background-enhanced control region with low E miss T and from events with dilepton masses around the Z boson peak [70], respectively. For the W+jets background, the overall normalization is estimated from data. The estimate is based on the charge-asymmetry method [71], relying on the fact that at the LHC more W+ than W− bosons are produced. In addition, a data-driven estimate of the Wbb, Wcc, Wc and W+light-jet fractions is performed in events with exactly two jets and at least one b-tagged jet. Further details are given in Ref. [72]. The Z+jets and diboson background processes are normalized to their predicted cross-sections as described in Section 3.
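The idea behind the matrix method can be sketched in its generic textbook form, which relates event counts in a loose and a tight lepton selection through the two efficiencies. The loose/tight terminology, the single pair of efficiencies (instead of the η- and p T-dependent ones used in the analysis) and all numbers below are assumptions of the illustration.

```python
def fake_yield_tight(n_loose, n_tight, eff_real, eff_fake):
    """Estimated number of NP/fake-lepton events passing the tight selection.

    Solves the linear system
        n_loose = n_real + n_fake
        n_tight = eff_real * n_real + eff_fake * n_fake
    for n_fake and scales it by eff_fake.
    """
    if eff_real <= eff_fake:
        raise ValueError("eff_real must exceed eff_fake for a stable estimate")
    n_fake_loose = (eff_real * n_loose - n_tight) / (eff_real - eff_fake)
    return eff_fake * n_fake_loose


# Hypothetical closure check: 900 prompt and 100 NP/fake leptons in the
# loose sample, with efficiencies of 0.9 and 0.2 respectively.
n_loose = 900 + 100
n_tight = 0.9 * 900 + 0.2 * 100
estimate = fake_yield_tight(n_loose, n_tight, eff_real=0.9, eff_fake=0.2)
print(f"estimated NP/fake events in the tight selection: {estimate:.1f}")
```

The estimate becomes unstable when the two efficiencies are close to each other, which is why they are measured in dedicated prompt-enriched and background-enriched regions.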

Event preselection
Triggering of events is based solely on the presence of a single electron or muon, and no information from the hadronic final state is used. A logical OR of two triggers is used for each of the tt → electron + jets and tt → muon + jets channels. The triggers with the lower thresholds of 24 GeV for electrons or muons select isolated leptons. The triggers with the higher thresholds of 60 GeV for electrons and 36 GeV for muons do not include an isolation requirement. The further selection requirements closely follow those in Ref. [9] and are:
• Events are required to have at least one primary vertex with at least five associated tracks. Each track needs to have a minimum p T of 0.4 GeV. For events with more than one primary vertex, the one with the largest Σ p T² is chosen as the vertex from the hard scattering.
• The event must contain exactly one reconstructed charged lepton, with E T > 25 GeV for electrons and p T > 25 GeV for muons, that matches the charged lepton that fired the corresponding lepton trigger.
The resulting event sample is statistically independent of the ones used for the measurement of m top in the tt → dilepton and tt → all jets channels at √s = 8 TeV [14,73]. The observed number of events in the data after this preselection and the expected numbers of signal and background events corresponding to the same integrated luminosity as the data are given in Table 1. For all predictions, the uncertainties are estimated as the sum in quadrature of the statistical uncertainty, the uncertainty in the integrated luminosity and all systematic uncertainties assigned to the measurement of m top listed in Section 8, except for the PDF and pile-up uncertainties, which are small. The normalization uncertainties listed below are included for the predictions shown in this section, but due to their small effect on the measured top quark mass they are not included in the final measurement.
For the signal, the 5.7% uncertainty in the tt cross-section introduced in Section 3 and a 6.0% uncertainty in the single-top-quark cross-section are used. The latter uncertainty is obtained from the cross-section uncertainties given in Section 3 and the fractions of the various single-top-quark production processes after the selection requirements. The background uncertainties include normalization uncertainties of 48% for the diboson and Z+jets production processes. These uncertainties are calculated using Berends-Giele scaling [74]. Assuming a top quark mass of m top = 172.5 GeV, the predicted number of events is consistent within uncertainties with the number observed in the data.

Reconstruction of the three observables
As in Ref. [9], a full kinematic reconstruction of the event is done with a likelihood fit using the KLFitter package [75,76]. The KLFitter algorithm relates the measured kinematics of the reconstructed objects to the leading-order representation of the tt system decay using tt → ℓν b lep q 1 q 2 b had. In this procedure, the measured jets correspond to the quark decay products of the W boson, q 1 and q 2, and to the b-quarks, b lep and b had, produced in the semi-leptonic and hadronic top quark decays, respectively.
The event likelihood is the product of Breit-Wigner (BW) distributions for the W bosons and top quarks and transfer functions (TFs) for the energies of the reconstructed objects that are input to KLFitter. The W boson BW distributions use the world combined values of the W boson mass and decay width from Ref. [3]. A common mass parameter m reco top is used for the BW distributions describing the semi-leptonically and hadronically decaying top quarks and is fitted event-by-event. The top quark width varies with m reco top according to the SM prediction [3]. The TFs are derived from the Powheg+Pythia tt signal MC simulation sample at an input mass of m top = 172.5 GeV. They represent the experimental resolutions in terms of the probability that the observed energy at reconstruction level is produced by a given parton-level object for the leading-order decay topology, and in the fit they constrain the variations of the reconstructed objects.
The input objects to the event likelihood are the reconstructed charged lepton, the missing transverse momentum and up to six jets. These are the two b-tagged jets and the four untagged jets with the highest p T. The x- and y-components of the missing transverse momentum are starting values for the neutrino transverse-momentum components, and its longitudinal component p ν,z is a free parameter in the kinematic likelihood fit. Its starting value is computed from the W → ℓν mass constraint. If there are no real solutions for p ν,z, a starting value of zero is used. If there are two real solutions, the one giving the largest likelihood value is taken.
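The starting value for p ν,z follows from solving the W → ℓν mass constraint, which is a quadratic equation in p ν,z. The sketch below assumes massless leptons, an approximate W boson mass of 80.4 GeV and hypothetical function names; the choice between two real solutions requires the event likelihood and is therefore left to the caller.

```python
import math

M_W = 80.4  # GeV; approximate W boson mass, an assumption of this sketch


def neutrino_pz_candidates(lepton, met_x, met_y, m_w=M_W):
    """Starting-value candidates for the neutrino p_z.

    lepton is (px, py, pz) in GeV and is treated as massless. With
    mu = m_W^2 / 2 + pT(lepton) . pT(neutrino), the constraint
    m_W^2 = (p_lepton + p_neutrino)^2 gives
    p_z = [mu * pz_l +- E_l * sqrt(mu^2 - pT_l^2 * pT_nu^2)] / pT_l^2.
    """
    lpx, lpy, lpz = lepton
    lep_pt2 = lpx ** 2 + lpy ** 2
    lep_e = math.sqrt(lep_pt2 + lpz ** 2)
    nu_pt2 = met_x ** 2 + met_y ** 2
    mu = 0.5 * m_w ** 2 + lpx * met_x + lpy * met_y
    discriminant = mu ** 2 - lep_pt2 * nu_pt2
    if discriminant < 0.0:
        # No real solution: start the fit from zero, as stated in the text.
        return [0.0]
    root = lep_e * math.sqrt(discriminant)
    return [(mu * lpz - root) / lep_pt2, (mu * lpz + root) / lep_pt2]


# Hypothetical event with a lepton and missing transverse momentum in GeV.
print(neutrino_pz_candidates((30.0, 10.0, 20.0), -20.0, 25.0))
```

The case without real solutions corresponds to a reconstructed transverse mass above the assumed W boson mass, which happens frequently because of the limited missing-transverse-momentum resolution.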
Maximizing the event-by-event likelihood as a function of m reco top establishes the best assignment of reconstructed jets to partons from the tt → lepton + jets decay. The maximization is performed by testing all possibilities for assigning b-tagged jets to b-quark positions and untagged jets to light-quark positions. With the above settings of the reconstruction algorithm, compared with Ref. [9], a larger fraction of correct assignments of reconstructed jets to partons from the tt → lepton + jets decay is achieved. The performance of the reconstruction algorithm is discussed in Section 6.
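The set of jet-to-parton assignments tested in the maximization can be enumerated as in the sketch below. It assumes that the two light-quark positions are interchangeable, so that swapping q 1 and q 2 does not give a new hypothesis; this convention and the jet labels are assumptions of the illustration, not statements from the text.

```python
from itertools import combinations, permutations


def jet_assignments(b_jets, untagged_jets):
    """Yield (b_had, b_lep, q1, q2) hypotheses.

    b-tagged jets are only placed on b-quark positions and untagged jets
    only on light-quark positions. The pair (q1, q2) is unordered here.
    """
    for b_had, b_lep in permutations(b_jets, 2):
        for q1, q2 in combinations(untagged_jets, 2):
            yield b_had, b_lep, q1, q2


# Two b-tagged jets and the four leading untagged jets (hypothetical labels).
hypotheses = list(jet_assignments(["b0", "b1"], ["j0", "j1", "j2", "j3"]))
print(len(hypotheses))
```

In the analysis, each hypothesis would be fitted with the kinematic likelihood and the one with the largest likelihood value retained as the best permutation.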
The value of m reco top obtained from the kinematic likelihood fit is used as the observable primarily sensitive to the underlying m top. The invariant mass of the hadronically decaying W boson m reco W, which is sensitive to the JES, is calculated from the assigned jets of the chosen permutation. Finally, an observable called R reco bq, designed to be sensitive to the bJES, is computed as the scalar sum of the transverse momenta of the two b-tagged jets divided by the scalar sum of the transverse momenta of the two jets associated with the hadronic W boson decay:
$R_{bq}^{reco} = \frac{p_T^{b_{had}} + p_T^{b_{lep}}}{p_T^{q_1} + p_T^{q_2}}$.
The values of m reco W and R reco bq are computed from the jet four-vectors as given by the jet reconstruction instead of using the values obtained in the kinematic likelihood fit. This ensures the maximum sensitivity to the jet calibration for light-jets and b-jets.
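The two jet-based observables can be computed directly from the jet four-vectors, as sketched below. The (px, py, pz, E) tuple layout, the function names and the jet values are assumptions of the illustration.

```python
import math


def pt(jet):
    """Transverse momentum of a jet given as (px, py, pz, E) in GeV."""
    return math.hypot(jet[0], jet[1])


def invariant_mass(*jets):
    """Invariant mass of the summed four-vectors."""
    px, py, pz, e = (sum(j[i] for j in jets) for i in range(4))
    m2 = e ** 2 - px ** 2 - py ** 2 - pz ** 2
    return math.sqrt(max(m2, 0.0))


def r_bq(b_had, b_lep, q1, q2):
    """Scalar pT sum of the b-tagged jets over that of the W-boson jets."""
    return (pt(b_had) + pt(b_lep)) / (pt(q1) + pt(q2))


# Hypothetical jets of the best permutation, in GeV.
q1 = (40.0, 0.0, 0.0, 40.0)
q2 = (0.0, 40.0, 0.0, 40.0)
b_had = (60.0, 0.0, 0.0, 61.0)
b_lep = (0.0, -30.0, 40.0, 51.0)

m_w_reco = invariant_mass(q1, q2)
r_bq_reco = r_bq(b_had, b_lep, q1, q2)
print(f"m_W^reco = {m_w_reco:.1f} GeV, R_bq^reco = {r_bq_reco:.3f}")
```

A common shift of all jet energies cancels to first order in the ratio, while a shift that affects only b-jets does not, which is what makes R reco bq sensitive to the bJES rather than to the JES.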
Some distributions of the observed event kinematics after the event preselection and for the best permutation are shown in Figure 1. Given the good description of the observed number of events by the prediction shown in Section 4.3 and that the measurement of m top is mostly sensitive to the shape of the distributions, the comparison of the data with the predictions is based solely on the distributions normalized to the number of events observed in data. The systematic uncertainty assigned to each bin is calculated from the sum in quadrature of all systematic uncertainties discussed in Section 4.3. Within uncertainties, the predictions agree with the observed distributions in Figure 1, which shows the transverse momentum of the lepton, the average transverse momentum of the jets, the transverse momentum of the hadronically decaying top quark p T,had, the transverse momentum of the tt system, the logarithm of the event likelihood of the best permutation and the distance ∆R of the two untagged jets q 1 and q 2 assigned to the hadronically decaying W boson. The distributions of transverse momenta predicted by the simulation, e.g. the p T,had distribution shown in Figure 1(c), show a slightly different trend than observed in data, with the data being softer. This difference is fully covered by the uncertainties. This trend was also observed in Ref. [14] for the p T,ℓb distribution in the tt → dilepton channel and in the measurement of the differential tt cross-section in the lepton+jets channel [77].
In anticipation of the template parameterization described in Section 7, the following restrictions on the three observables are applied: 125 ≤ m reco top ≤ 200 GeV, 55 ≤ m reco W ≤ 110 GeV, and 0.3 ≤ R reco bq ≤ 3. Since in this analysis only the best permutation is considered, events that do not pass these requirements are rejected. This removes events in the tails of the three distributions, which are typically poorly reconstructed with small likelihood values and do not contain significant information about m top . The resulting templates have simpler shapes, which are easier to model analytically with fewer parameters. The preselection with these additional requirements is referred to as the standard selection to distinguish it from the boosted decision tree (BDT) optimization for the smallest total uncertainty in m top , discussed in the next section.

Multivariate analysis and BDT event selection
For the measurement of m top, the standard event selection is refined by assuming that events with correct assignments of reconstruction-level objects to their generator-level counterparts are better measured and should therefore lead to smaller uncertainties. The optimization of the selection is based on the multivariate BDT algorithm implemented in the TMVA package [78]. The reconstruction-level objects are matched to the closest parton-level object within a ∆R of 0.1 for electrons and muons and 0.3 for jets. A matched object is defined as a reconstruction-level object that falls within the relevant ∆R of any parton-level object of that type, and a correct match means that this generator-level object is the one it originated from. Due to acceptance losses and reconstruction inefficiencies, not all reconstruction-level objects can successfully be matched to their parton-level counterparts. If any object cannot be unambiguously matched, the corresponding event is referred to as unmatched. The efficiency for correctly matched events ε cm is the fraction of correctly matched events among all the matched events, and the selection purity π cm is the fraction of correctly matched events among all selected events, regardless of whether they could be matched or not.
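The two figures of merit defined above differ only in their denominators, as the following sketch makes explicit. The event counts are hypothetical and were chosen only so that the results land near the preselection values quoted later in this section; the function name is also an assumption.

```python
def matching_metrics(n_correct, n_incorrect, n_unmatched):
    """Return (efficiency, purity) for correctly matched events.

    efficiency = correct / (correct + incorrect)              [matched events]
    purity     = correct / (correct + incorrect + unmatched)  [all selected]
    """
    n_matched = n_correct + n_incorrect
    n_selected = n_matched + n_unmatched
    return n_correct / n_matched, n_correct / n_selected


# Hypothetical composition of 1000 selected simulated events.
eff_cm, purity_cm = matching_metrics(n_correct=330, n_incorrect=135, n_unmatched=535)
print(f"efficiency = {eff_cm:.2f}, purity = {purity_cm:.2f}")
```

Since unmatched events enter only the purity, a selection can raise the efficiency substantially while the purity improves more modestly.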
The BDT algorithm is exploited to enrich the event sample in events that have correct jet-to-parton matching by reducing the remainder, i.e. the sum of incorrectly matched and unmatched events. The BDT algorithm is trained on the simulated tt signal sample with m top = 172.5 GeV. However, the variables are chosen such that the BDT output can be calculated for any event. Many variables were studied and only those with a separation larger than 0.1% are used in the training. The 13 variables chosen for the final training are given in Table 2. For all input variables to the BDT algorithm, good agreement between the MC predictions and the data is found. Half the simulation sample is used to train the algorithm and the other half to assess its performance. The significant difference between the distributions of the output value r BDT of the BDT classifier between the two classes of events in Figure 2(c) shows their efficient separation by the BDT algorithm. In addition, reasonable agreement is found for the r BDT distributions in the statistically independent test and training samples. The r BDT distributions in simulation and data in Figure 2(d) agree within the experimental uncertainties. The above findings justify the application of the BDT approach to the data.
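The separation used to rank the input variables is not defined in the text available here. The sketch below assumes the definition commonly used in TMVA, ⟨S²⟩ = ½ ∫ (ŷ_S(y) − ŷ_B(y))² / (ŷ_S(y) + ŷ_B(y)) dy for unit-normalized signal and background distributions, evaluated on histograms. The binning and function name are assumptions of the illustration.

```python
import numpy as np


def separation(signal_values, background_values, bins=20, value_range=None):
    """Histogram-based separation: 0 for identical shapes, 1 for no overlap.

    With p_s and p_b the per-bin event fractions, the integral reduces to
    0.5 * sum((p_s - p_b)^2 / (p_s + p_b)) over the populated bins.
    """
    signal_values = np.asarray(signal_values, dtype=float)
    background_values = np.asarray(background_values, dtype=float)
    if value_range is None:
        lo = min(signal_values.min(), background_values.min())
        hi = max(signal_values.max(), background_values.max())
        value_range = (lo, hi)
    s_counts, edges = np.histogram(signal_values, bins=bins, range=value_range)
    b_counts, _ = np.histogram(background_values, bins=edges)
    p_s = s_counts / s_counts.sum()
    p_b = b_counts / b_counts.sum()
    total = p_s + p_b
    populated = total > 0
    return 0.5 * np.sum((p_s[populated] - p_b[populated]) ** 2 / total[populated])


# A variable would be kept for training if its separation exceeds 0.1%.
rng = np.random.default_rng(1)
correct = rng.normal(0.0, 1.0, 10000)     # hypothetical variable, correct matches
remainder = rng.normal(0.5, 1.0, 10000)   # hypothetical variable, remainder
print(separation(correct, remainder) > 0.001)
```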
The full m top analysis detailed in Section 8 is performed, except for the evaluation of the small method and pile-up uncertainties, for several minimum requirements on r BDT in the range of [−0.10, 0.05] in steps of 0.05 to find the point with smallest total uncertainty. The total uncertainty in m top together with the various classes of uncertainty sources as a function of r BDT evaluated in the BDT optimization are shown in Figure 3. The minimum requirement r BDT = −0.05 provides the smallest total uncertainty in m top. The resulting numbers of events for this BDT selection are given in Table 1. Compared with the preselection, ε cm is increased from 0.71 to 0.82, albeit at the expense of a significant reduction in the number of selected events. The purity π cm is increased from 0.33 to 0.41. In addition, the intrinsic resolution in m top of the remaining event sample is improved, i.e. the statistical uncertainty in m top in Figure 3 is almost constant as a function of r BDT; in particular, it does not scale with the square root of the number of events retained. For the signal sample with m top = 172.5 GeV, the template fit functions for the standard selection and the BDT selection, together with their ratios, are shown in Figure 12 in Appendix A.
Some distributions of the observed event kinematics after the BDT selection are shown in Figure 4. Good agreement between the MC predictions and the data is found, as seen for the preselection in Figure 1.

Template fit
This analysis uses a three-dimensional template fit technique which determines m top together with the jet energy scale factors JSF and bJSF. The aim of the multi-dimensional fit to the data is to measure m top and, at the same time, to absorb the mean differences between the jet energy scales observed in data and MC simulated events into jet energy scale factors. By using JSF and bJSF, most of the uncertainties in m top induced by JES and bJES uncertainties are transformed into additional statistical components caused by the higher dimensionality of the fit. This method reduces the total uncertainty in m top only for sufficiently large data samples. In this case, the sum in quadrature of the additional statistical uncertainty in m top due to the JSF (or bJSF) fit and the residual JES-induced (or bJES-induced) systematic uncertainty is smaller than the original JES-induced (or bJES-induced) uncertainty in m top. This situation was already realized for the √s = 7 TeV data analysis [9] and is even more advantageous for the much larger data sample of the √s = 8 TeV data analysis. Since JSF and bJSF are global factors, they do not completely absorb the JES and bJES uncertainties, which have p T- and η-dependent components. Independent signal templates are derived for the three observables for all m top-dependent samples, consisting of the tt signal events and single-top-quark production events. This procedure is adopted because single-top-quark production carries information about the top quark mass, and in this way, m top-independent background templates can be used. The signal templates are simultaneously fitted to the sum of a Gaussian and two Landau functions for m reco top, to the sum of two Gaussian functions for m reco W and to the sum of two Gaussian and one Landau function for R reco bq.
For the background, the m reco top distribution is fitted to a Landau function, while both the m reco W and the R reco bq distributions are fitted to the sum of two Gaussian functions. The dependences of the parameters of the background fitting functions are identical to those for the signal, except that they do not depend on m top and that those for R reco bq do not depend on JSF.
Signal and background probability density functions, P sig and P bkg, for the m reco top, m reco W and R reco bq distributions are used in an unbinned likelihood fit to the data for all events, i = 1, . . . N. The likelihood function maximized is
$L(m_{top}, JSF, bJSF, f_{bkg}) = \prod_{i=1}^{N} P_{top}(m_{top}^{reco,i} | m_{top}, JSF, bJSF, f_{bkg}) \times P_{W}(m_{W}^{reco,i} | JSF, f_{bkg}) \times P_{bq}(R_{bq}^{reco,i} | m_{top}, JSF, bJSF, f_{bkg})$
with
$P_{top} = (1 - f_{bkg}) \, P_{top}^{sig}(m_{top}^{reco,i} | m_{top}, JSF, bJSF) + f_{bkg} \, P_{top}^{bkg}(m_{top}^{reco,i} | JSF, bJSF)$,
$P_{W} = (1 - f_{bkg}) \, P_{W}^{sig}(m_{W}^{reco,i} | JSF) + f_{bkg} \, P_{W}^{bkg}(m_{W}^{reco,i} | JSF)$,
$P_{bq} = (1 - f_{bkg}) \, P_{bq}^{sig}(R_{bq}^{reco,i} | m_{top}, JSF, bJSF) + f_{bkg} \, P_{bq}^{bkg}(R_{bq}^{reco,i} | bJSF)$,
where the fraction of background events is denoted by f bkg. The parameters determined by the fit are m top, JSF and bJSF, while f bkg is fixed to its expectation shown in Table 1.
Pseudo-experiments are used to verify the internal consistency of the fitting procedure and to obtain the expected statistical uncertainty for the data. For each set of parameter values, 500 pseudo-experiments are performed, each corresponding to the integrated luminosity of the data. To retain the correlation of the three observables for the three-dimensional fit, individual events are used. Because this exceeds the number of available MC events, results are corrected for oversampling [79]. The results of pseudo-experiments for different input values of m top are obtained from statistically independent samples, while the results for different JSF and bJSF are obtained from statistically correlated samples as explained above. For each fitted quantity and each variation of input parameters, the residual, i.e. the difference between the input value and the value obtained by the fit, is compatible with zero. The three expected statistical uncertainties are quoted as the mean and RMS of the distribution of the statistical uncertainties in the fitted quantities from pseudo-experiments. The widths of the pull distributions are below unity for m top and the two jet scale factors, which results in an overestimation of the uncertainty in m top of up to 7%. Since this leads to a conservative estimate of the uncertainty in m top, no attempts to mitigate this feature are made.
Table 3: Systematic uncertainties in m top. The measured values of m top are given together with the statistical and systematic uncertainties in GeV for the standard and the BDT event selections. For comparison, the result in the tt → lepton + jets channel at √s = 7 TeV from Ref. [9] is also listed. For each systematic uncertainty listed, the first value corresponds to the uncertainty in m top, and the second to the statistical precision in this uncertainty. An integer value of zero means that the corresponding uncertainty is negligible and therefore not evaluated. Statistical uncertainties quoted as 0.00 are smaller than 0.005. The statistical uncertainty in the total systematic uncertainty is calculated from uncertainty propagation. The last line refers to the sum in quadrature of the statistical and systematic uncertainties.
0.25 ± 0.00   0.08 ± 0.00   0.09 ± 0.00
Background normalization   0.10 ± 0.00   0.04 ± 0.00   0.08 ± 0.00
W+jets shape   0.29 ± 0.00   0.05 ± 0.00   0.11 ± 0.00
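The link between the pull width and the overestimated uncertainty discussed for the pseudo-experiments can be reproduced with a toy study. The sketch assumes Gaussian-distributed fit results and a reported uncertainty that is 7% larger than the true spread; the numerical values and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(7)

N_PSEUDO = 500              # number of pseudo-experiments, as in the text
INPUT_VALUE = 172.5         # GeV, input parameter of the pseudo-experiments
TRUE_SPREAD = 0.40          # GeV, hypothetical spread of the fitted values
REPORTED_SIGMA = 1.07 * TRUE_SPREAD  # fit uncertainty overestimated by 7%

# Each pseudo-experiment returns a fitted value and its reported uncertainty.
fitted = rng.normal(INPUT_VALUE, TRUE_SPREAD, N_PSEUDO)

residuals = fitted - INPUT_VALUE
pulls = residuals / REPORTED_SIGMA

mean_residual = residuals.mean()
pull_width = pulls.std(ddof=1)

print(f"mean residual: {mean_residual:+.3f} GeV")
print(f"pull width: {pull_width:.3f}")
```

A pull width w below unity means the reported uncertainty is too large by a factor of about 1/w, so a width near 0.93 corresponds to the overestimation of about 7% quoted above.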

Uncertainties affecting the m top determination
This section focuses on the treatment of uncertainty sources of a systematic nature. The same systematic uncertainty sources as in Ref. [9] are investigated. If possible, the corresponding uncertainty in m top is evaluated by varying the respective quantities by ±1σ from their default values, constructing the corresponding event sample and measuring the average m top change relative to the result from the nominal MC sample with 500 pseudo-experiments each, drawn from the full MC sample. In the absence of a ±1σ variation, e.g. for the evaluation of the uncertainty induced by the choice of signal MC generator, the full observed difference is assigned as a symmetric systematic uncertainty and further treated as a variation equivalent to a ±1σ variation. Wherever a ±1σ variation can be performed, half the observed difference between the +1σ and −1σ variation in m top is assigned as an uncertainty if the m top values obtained from the variations lie on opposite sides of the nominal result. If they lie on the same side, the maximum observed difference is taken as a symmetric systematic uncertainty. Since the systematic uncertainties are derived from simulation or data samples with limited numbers of events, all systematic uncertainties have a corresponding statistical uncertainty, which is calculated taking into account the statistical correlation of the considered samples, as explained in Section 8.5. The statistical uncertainty in the total systematic uncertainty is dominated by the limited sizes of the simulation samples. The resulting systematic uncertainties are given in Table 3 independent of their statistical significance. Further information is given in Tables 8-12 in Appendix A. This approach follows the suggestion in Ref. [80] and relies on the fact that, given a large enough number of considered uncertainty sources, statistical fluctuations average out.
The uncertainty sources are constructed to be uncorrelated with one another, and thus the total uncertainty is the sum in quadrature of uncertainties from all sources. The individual uncertainties are compared in Table 3 for three cases: the standard selection for the … In general, the experimental uncertainties change only slightly, with the largest reduction observed for the JES uncertainty. In contrast, a large improvement comes from the reduced uncertainties in the modelling of the tt signal processes as shown in Table 3. This, together with the improved intrinsic resolution in m top , more than compensates for the small loss in precision caused by the increased statistical uncertainty. The individual sources of systematic uncertainties and the evaluation of their effect on m top are described in the following.
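The symmetrization rule and the sum in quadrature described above can be sketched as follows. This is only an illustration under stated assumptions: the function names and the numerical inputs in the usage note are hypothetical and are not taken from the analysis code.

```python
import math


def symmetrize(nominal, up, down):
    """Symmetric uncertainty from the m_top values of a +1 sigma / -1 sigma pair.

    Opposite sides of the nominal result: half the up-down difference.
    Same side: the larger of the two shifts from the nominal result.
    """
    shift_up = up - nominal
    shift_down = down - nominal
    if shift_up * shift_down < 0:
        return abs(up - down) / 2.0
    return max(abs(shift_up), abs(shift_down))


def sum_in_quadrature(uncertainties):
    """Total uncertainty for sources constructed to be uncorrelated."""
    return math.sqrt(sum(u * u for u in uncertainties))
```

For example, with hypothetical fitted values `symmetrize(172.5, 172.7, 172.4)` returns half the up-down difference, while `symmetrize(172.5, 172.7, 172.6)` returns the larger shift. Applying `sum_in_quadrature` to the quoted 0.39 GeV (stat) and 0.82 GeV (syst) reproduces the total uncertainty of 0.91 GeV.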

Statistics and method calibration
Uncertainties related to statistical effects and the method calibration are discussed here.

Statistical:
The quoted statistical uncertainty consists of three parts: a purely statistical component in m top and the contributions stemming from the simultaneous determination of JSF and bJSF. The purely statistical component in m top is obtained from a one-dimensional template method exploiting only the m reco top observable, while fixing the values of JSF and bJSF to the results of the three-dimensional analysis. The contribution to the statistical uncertainty in the fitted parameters due to the simultaneous fit of m top and JSF is estimated as the difference in quadrature between the statistical uncertainty in a two-dimensional fit to m reco top and m reco W while fixing the value of bJSF and the one-dimensional fit to the data described above. Analogously, the contribution of the statistical uncertainty due to the simultaneous fit of m top together with JSF and bJSF is defined as the difference in quadrature between the statistical uncertainties obtained in the three-dimensional and the two-dimensional fits to the data. This separation allows a comparison of the statistical sensitivities of the m top estimators used in analyses, independent of the number of observables exploited by the fit. In addition, the sensitivity of the estimators to the global jet energy scale factors can be compared directly. These uncertainties are treated as uncorrelated uncertainties in m top combinations. Together with the systematic uncertainty in the residual jet energy scale uncertainties discussed below, they directly replace the uncertainty in m top from the jet energy scale variations present without the in situ determination.
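The decomposition of the statistical uncertainty into three parts by differences in quadrature can be sketched as below. The function names are hypothetical, and the inputs are the statistical uncertainties of the one-, two- and three-dimensional fits, which are not quoted individually in this section.

```python
import math


def quadrature_difference(larger, smaller):
    """sqrt(larger^2 - smaller^2), the 'difference in quadrature'."""
    diff = larger * larger - smaller * smaller
    if diff < 0:
        raise ValueError("the first argument must not be smaller than the second")
    return math.sqrt(diff)


def split_statistical(sigma_1d, sigma_2d, sigma_3d):
    """Split the 3D-fit statistical uncertainty into its three components.

    sigma_1d: fit to m_top^reco only (JSF and bJSF fixed)
    sigma_2d: fit to m_top^reco and m_W^reco (bJSF fixed)
    sigma_3d: full three-dimensional fit
    """
    return {
        "m_top": sigma_1d,
        "JSF": quadrature_difference(sigma_2d, sigma_1d),
        "bJSF": quadrature_difference(sigma_3d, sigma_2d),
    }
```

By construction, the three components add in quadrature to the statistical uncertainty of the three-dimensional fit, so the split changes only the bookkeeping, not the total.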

Method:
The residual difference between fitted and generated m top when analysing a template from a MC sample reflects the potential bias of the method. Consequently, the largest observed fitted m top residual and the largest observed statistical uncertainty in this quantity, in any of the five signal samples with different assumed values of m top , are assigned as the method calibration uncertainty and its corresponding statistical uncertainty, respectively. This also covers effects from limited numbers of simulated events in the templates and potential deficiencies in the template parameterizations.

Modelling of signal processes
The modelling of tt → lepton + jets events incorporates a number of processes that have to be accurately described, resulting in systematic effects, ranging from the tt production to the hadronization of the showered objects.
Thanks to the restrictive event-selection requirements, the contribution of non-tt processes, comprising the single-top-quark process and the various background processes, is very low. The systematic uncertainty in m top from the uncertainty in the single-top-quark normalization is estimated from the corresponding uncertainty in the theoretical cross-section given in Section 3. The resulting systematic uncertainty is small compared with the systematic uncertainty in the tt production that accounts for most of the signal events and is consequently neglected. For the modelling of the signal processes, the consequence of including single-top-quark variations in the uncertainty evaluation was investigated for various uncertainty sources and found to be negligible. Therefore, the single-top-quark variations are not included in the determination of the signal event uncertainties.
Signal Monte Carlo generator:
The full observed difference in fitted m top between the event samples produced with the Powheg-Box and MC@NLO [81,82] programs is quoted as a systematic uncertainty. For the renormalization and factorization scales the Powheg-Box sample uses the function given in Section 3, while the MC@NLO sample uses $\mu_{R,F} = \sqrt{m_{top}^2 + 0.5\,(p_{T,t}^2 + p_{T,\bar{t}}^2)}$. Both samples are generated with a top quark mass of m top = 172.5 GeV with the CT10 PDFs in the matrix-element calculation and use the Herwig and Jimmy programs with the ATLAS AUET2 tune [46].

Hadronization:
To cover the choice of parton shower and hadronization models, samples produced with the Powheg-Box program are showered with either the Pythia 6 program using the P2011C tune or the Herwig and Jimmy programs using the ATLAS AUET2 tune. This includes different approaches in shower modelling, such as using a p T -ordered parton showering in the Pythia program or angular-ordered parton showering in the Herwig program, the different parton shower matching scales, as well as fragmentation functions and hadronization models, such as choosing the Lund string model [83,84] implemented in the Pythia program or the cluster fragmentation model [85] used in the Herwig program. The full observed difference between the samples is quoted as a systematic uncertainty.
As shown in Figure 1, the distributions of transverse momenta in data are slightly softer than those in the Powheg+Pythia MC simulation samples. Similarly to what was observed in the tt → dilepton channel for the p T,ℓb distribution, in the tt → lepton + jets channel, the Powheg+Herwig sample is much closer to the data for several distributions of transverse momenta. The p T,had distribution is much better described by the Powheg+Herwig sample, as was also observed in Ref. [77]. In addition, but to a lesser extent, the MC@NLO sample used to assess the signal Monte Carlo generator uncertainty and the samples to assess the initial-and final-state QCD radiation uncertainty discussed next also lead to a softer distribution in simulation. Given this, the observed difference in the p T,had distribution is covered by a combination of the signal-modelling uncertainties given in Table 3.
Despite the fact that the JES and bJES are estimated independently using dijet and other non-tt samples [62], some double-counting of hadronization-uncertainty-induced uncertainties in the JES and m top cannot be excluded. This was investigated closely for the ATLAS top quark mass measurement in the tt → lepton + jets channel at √ s = 7 TeV. The results in Ref. [86] revealed that the amount of double-counting of JES and hadronization effects for the tt → lepton + jets channel is small.

Initial-and final-state QCD radiation (ISR/FSR):
ISR/FSR leads to a higher jet multiplicity and different jet energies than the hard process, which affects the distributions of the three observables. The uncertainties due to ISR/FSR modelling are estimated with samples generated with the Powheg-Box program interfaced to the Pythia 6 program for which the parameters of the generation are varied to span the ranges compatible with the results of measurements of tt production in association with jets [87-89]. This uncertainty is evaluated by comparing two dedicated samples that differ in several parameters, namely the QCD scale Λ QCD , the transverse momentum scale for space-like parton-shower evolution Q 2 max , the h damp parameter [90] and the P2012 radLo and radHi tunes used [29]. In Ref. [89], it was shown that a number of final-state distributions are better accounted for by these Powheg+Pythia samples with h damp = m top . Therefore, these samples are used for evaluating this uncertainty, taking half the observed difference between the up variation and the down variation sample. Because the parameterizations for the template fit to data are obtained from Powheg+Pythia samples using h damp = ∞, it was verified that, considering the method uncertainty quoted in Table 3, applying the functions to the h damp = m top samples leads to a result compatible with the input top quark mass.

Underlying event:
To reduce statistical fluctuations in the evaluation of this systematic uncertainty, the difference in underlying-event modelling is assessed by comparing a pair of Powheg-Box samples based on the same partonic events generated with the CT10 PDFs. A sample with the P2012 tune is compared with a sample with the P2012 mpiHi tune [29], with both tunes using the same CTEQ6L1 PDFs [91] for parton showering and hadronization. The Perugia 2012 mpiHi tune provides more semi-hard multiple parton interactions and is used for this comparison with identical colour reconnection parameters in both tunes. The full observed difference is assigned as a systematic uncertainty.

Colour reconnection:
This systematic uncertainty is estimated using a pair of samples with the same partonic events as for the underlying-event uncertainty evaluation but with the P2012 tune and the P2012 loCR tune [29] for parton showering and hadronization. The full observed difference is assigned as a systematic uncertainty.

Parton distribution function (PDF):
The PDF systematic uncertainty is the sum in quadrature of three contributions. These are the sum in quadrature of the differences in m top for the 26 eigenvector variations of the CT10 PDF and two differences in m top obtained from reweighting the central CT10 PDF set to the MSTW2008 PDF [38]

Modelling of background processes
Uncertainties in the modelling of the background processes are taken into account by variations of the corresponding normalizations and shapes of the distributions.

Background normalization:
The normalizations are varied for the data-driven background estimates according to their uncertainties. For the negligible contribution from diboson production, no normalization uncertainty is evaluated.

Background shape:
For the W+jets background, the shape uncertainty is evaluated from the variation of the heavy-flavour fractions. The corresponding uncertainty is small. Given the very small contribution from Z+jets, diboson and NP/fake-lepton backgrounds, no shape uncertainty is evaluated for these background sources.

Detector modelling
The level of understanding of the detector response and of the particle interactions therein is reflected in numerous systematic uncertainties.

Jet energy scale (JES):
The JES is measured with a relative precision of about 1% to 4%, typically falling with increasing jet p T and rising with increasing jet |η| [92, 93]. The total JES uncertainty consists of more than 60 subcomponents originating from the various steps in the jet calibration. The number of these nuisance parameters is reduced with a matrix diagonalization of the full JES covariance matrix including all nuisance parameters for a given category of the JES uncertainty components.
The analyses of √ s = 7 TeV and √ s = 8 TeV data make use of the EM+JES and LCW+GSC [92] jet calibrations, respectively. The two calibrations feature different sets of nuisance parameters, and the LCW+GSC calibration generally has smaller uncertainties than the EM+JES calibration. While the pile-up correction for the jet calibration for √ s = 7 TeV data only depends on the number of primary vertices (n vtx ) and the average number of interactions per bunch crossing (⟨µ⟩), a pile-up subtraction method based on jet area is introduced for the √ s = 8 TeV data. Terms to account for uncertainties in the pile-up estimation are added. They depend on the jet p T and the local transverse momentum density. In addition, the punch-through uncertainty, i.e. an uncertainty for jets that penetrate through to the muon spectrometer, is added. The final reduced number of nuisance parameters for the √ s = 8 TeV analysis is 25. The JES-uncertainty-induced uncertainty in m top is the dominant systematic uncertainty for all results shown in Table 3. When only a one-dimensional fit to m reco top or a two-dimensional fit to m reco top and m reco W is done, this uncertainty is 0.99 GeV or 0.74 GeV, respectively.

Relative b-to-light-jet energy scale (bJES):
The bJES uncertainty is an additional uncertainty for the remaining differences between b-jets and light-jets after the global JES is applied, and therefore the corresponding uncertainty is uncorrelated with the JES uncertainty. An additional uncertainty of 0.2% to 1.2% is assigned to b-jets, with the lowest uncertainty for b-jets with high transverse momenta [62]. Due to the determination of bJSF, the bJES uncertainty leads to a very small contribution to the uncertainty in m top in Table 3.

Leptons:
The lepton uncertainties are related to the electron energy or muon momentum scale and resolution, as well as trigger, isolation and identification efficiencies. These are measured very precisely in high-purity J/ψ → ℓ+ℓ− and Z → ℓ+ℓ− data [56,57,94]. For each component, the corresponding uncertainty is propagated to the analysis by variation of the respective quantity. The changes are propagated to the E miss T as well.

Missing transverse momentum:
The remaining contribution to the missing-transverse-momentum uncertainty stems from the uncertainties in calorimeter-cell energies associated with low-p T jets (7 GeV < p T < 20 GeV) without any corresponding reconstructed physics object or from pile-up interactions. They are accounted for as described in Ref. [68]. The corresponding uncertainty in m top is small.
… The corresponding uncertainty is somewhat larger than for √ s = 7 TeV data but still small.

Statistical precision of systematic uncertainties
The systematic uncertainties quoted in Table 3 … Others, which do not share the same generated events or exhibit other significant differences, have a lower correlation, and the corresponding statistical uncertainty is higher, such as in the case of the signal-modelling uncertainty. The statistical uncertainty in the total systematic uncertainty is calculated from the individual statistical uncertainties by the propagation of uncertainties.

Results
For the BDT selection, the likelihood fit to the data results in … The left matrix corresponds to the correlations for statistical uncertainties only, while the right matrix is obtained by additionally taking into account all systematic uncertainties. … Table 3. The total uncertainty in all three fitted parameters is dominated by their systematic uncertainty. Therefore, the band shown is much wider than the band that would be obtained by fitting to the distributions with statistical uncertainties only.
The measured value of m top in the tt → lepton + jets channel at √ s = 8 TeV is m top = 172.08 ± 0.39 (stat) ± 0.82 (syst) GeV. This result corresponds to a 19% improvement on the result obtained using the standard selection on the same data. Compared with the result in the tt → lepton + jets channel at √ s = 7 TeV, the improvement is 29%. On top of the smaller statistical uncertainty, the increased precision is mainly driven by smaller theory modelling uncertainties achieved by the BDT selection. The larger number of events in the √ s = 8 TeV dataset is effectively traded for lower systematic uncertainties, resulting in a significant gain in total precision. The new ATLAS result in the tt → lepton + jets channel is more precise than the result from the CDF experiment, but less precise than the CMS and D0 results, measured in the same channel, as shown in Figure 14.

Combination with previous ATLAS results of m top
This section presents the combination of the six m top results of the ATLAS analyses in the tt → dilepton, tt → lepton + jets and tt → all jets channels at centre-of-mass energies of √ s = 7 and 8 TeV. The treatment of the results that are input to the combinations is described, followed by a detailed explanation of the evaluation of the estimator correlations for the various sources of systematic uncertainty. The compatibilities of the measured m top values are investigated using a pairwise χ 2 for all pairs of measurements and by evaluating the compatibility of selected combinations. Finally, the six results are combined, displaying the effect of individual results on the combined result.

Inputs to the combination and categorization of uncertainties
The measured values of the individual analyses and their statistical and systematic uncertainties are given in Table 4. For each result, the evaluated systematic uncertainties are shown together with their statistical uncertainties. These statistical uncertainties are propagated to the statistical uncertainties in the total systematic uncertainties and the total uncertainties. For the combinations to follow, the combined uncertainties for the previous results, namely tt → dilepton and tt → lepton + jets at √ s = 7 TeV from Ref. [9], tt → all jets at √ s = 7 TeV from Ref. [95], tt → dilepton at √ s = 8 TeV from Ref. [14] and tt → all jets at √ s = 8 TeV from Ref. [73], were all re-evaluated. In all cases, the numbers agree to within 0.01 GeV with the original publications, which in any case is the rounding precision caused by the precision of some of the inputs. On top of this, the results listed in Table 4 differ in some aspects from the original publications as explained below.
The combination follows the approach developed for the combination of √ s = 7 TeV analyses in Ref. [9], including the evaluation of the correlations given in Section 10.2 below. The treatment of uncertainty categories for the tt → dilepton and tt → lepton + jets measurements at √ s = 7 TeV exactly follows Ref. [9]. The uncertainty categorizations for the tt → all jets measurements at √ s = 7 and 8 TeV from Refs. [95] and [73] closely follow this categorization but have some extra, analysis-specific sources of uncertainty, as shown in Table 4. In addition, the tt → all jets result at √ s = 8 TeV from Ref. [73] is based on a different treatment of the PDF-uncertainty-induced uncertainty in m top . To allow the evaluation of the estimator correlations also for this uncertainty in m top , for this combination, the respective uncertainty is newly evaluated according to the prescription given in Section 8.
For the tt → all jets result at √ s = 7 TeV the statistical precision in the systematic uncertainties was not evaluated in Ref. [9] but was calculated for this combination. For the tt → all jets result at √ s = 8 TeV in Ref. [73], for some of the sources, the statistical uncertainty in the systematic uncertainty was not evaluated, such that the quoted statistical uncertainty in the total systematic uncertainty is a lower limit.
For the mapping of uncertainty categories for data taken at different centre-of-mass energies, the choice of Ref. [14] is employed. The most complex cases are the uncertainties involving eigenvector decompositions, such as the JES and b-tagging scale factor uncertainties, and the uncertainty categories that were added or removed. The JES-uncertainty-induced uncertainty in m top is obtained from a number of JES subcomponents. Some JES subcomponents have an equivalent at the other centre-of-mass energy and … [14], the JES subcomponents without an equivalent at the other centre-of-mass energy are treated as independent, resulting in vanishing estimator correlations for that part of the covariance matrix. For the remaining subcomponents, the estimator correlations are partly positive and partly negative. As an example, for the JES flavour uncertainties, which dominate the JES-uncertainty-induced uncertainty in m top , the two most precise results, the tt → dilepton and tt → lepton + jets measurements at √ s = 8 TeV, are negatively correlated. Consequently, for this pair, the resulting estimator correlation for the total JES-induced uncertainty in m top is also negative. At the quoted precision, the two assumptions about the equivalence of the JES subcomponents between the datasets at the two centre-of-mass energies, i.e. the weak and strong correlation scenarios described in Table 10 in Appendix A, leave the combined value and uncertainty unchanged.
Following Ref. [14], the √ s = 7 and 8 TeV measurements are treated as uncorrelated for the nuisance parameters of the b-tagging, c/τ-tagging, mistagging and JER uncertainties. In Ref. [14] it was shown that a correlated treatment of the flavour-tagging nuisance parameters results in an insignificant change in the combination. For the statistical, method calibration, MC-based background shape at √ s = 7 and 8 TeV, and the pile-up uncertainties in m top , the measurements are assumed to be uncorrelated. Details of the evaluation of the correlations for all remaining systematic uncertainties are discussed below.

Mathematical framework and evaluation of estimator correlations
All combinations are performed using the best linear unbiased estimate (BLUE) method [96,97] in a C++ implementation described in Ref. [98]. The BLUE method uses a linear combination of the inputs to combine measurements. The coefficients (BLUE weights) are determined via the minimization of the variance of the combined result. They can be used to construct measures for the importance of a given single measurement in the combination [97]. For any combination, the measured values x i , the list of uncertainties σ ik and the correlations ρ ijk of the estimators (i, j) for each source of uncertainty (k) have to be provided. For all uncertainties, a Gaussian probability distribution function is assumed. For the uncertainties in m top for which the measurements are correlated, when using ±1σ variations of a systematic effect, e.g. when changing the bJES by ±1σ, there are two possibilities. When simultaneously applying a variation for a systematic uncertainty, e.g. +1σ for the bJES, to a pair (i, j) of measurements, e.g. the tt → lepton + jets and tt → dilepton measurements at √ s = 8 TeV, both analyses can result in a larger or smaller m top value than the one obtained for the nominal case (full correlation, ρ ijk = +1), or one analysis can result in a larger and the other in a smaller value (full anti-correlation, ρ ijk = −1). Consequently, an uncertainty from a source only consisting of a single variation, such as the bJES-uncertainty-induced uncertainty or the uncertainty related to the choice of MC generator for signal events, results in a correlation of ρ ijk = ±1. The estimator correlations for composite uncertainties are evaluated by calculating the correlation from the subcomponents. As an example, for the tt → lepton + jets result at √ s = 8 TeV, the subcomponents of the JES uncertainty are shown in Table 10 in Appendix A.
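The inputs and the variance minimization described above can be pictured with the following minimal sketch. It is not the C++ implementation of Ref. [98]; the function names and the data layout (`sigmas[k][i]`, `rhos[k][i][j]`) are assumptions made for illustration. The covariance matrix is built as the sum over sources k of ρ ijk σ ik σ jk , and the weights that minimize the variance are proportional to the inverse covariance matrix applied to a vector of ones.

```python
import math


def solve_linear(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    a = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(a[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (a[r][n] - tail) / a[r][r]
    return x


def blue_combine(values, sigmas, rhos):
    """Return (combined value, combined uncertainty, BLUE weights).

    values[i]     : measured value x_i
    sigmas[k][i]  : uncertainty of measurement i from source k
    rhos[k][i][j] : correlation of estimators i and j for source k
    """
    n = len(values)
    cov = [[sum(rhos[k][i][j] * sigmas[k][i] * sigmas[k][j]
                for k in range(len(sigmas)))
            for j in range(n)] for i in range(n)]
    y = solve_linear(cov, [1.0] * n)      # y = C^-1 * 1
    norm = sum(y)                          # 1^T C^-1 1
    weights = [v / norm for v in y]        # weights sum to unity
    combined = sum(w * x for w, x in zip(weights, values))
    return combined, math.sqrt(1.0 / norm), weights
```

With two uncorrelated inputs of equal uncertainty the weights come out as 0.5 each. Negative weights can occur for strongly correlated inputs and are a legitimate feature of the method.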
For any pair of measurements (i, j), this evaluation is done by adding the covariance terms of the subcomponents k with ρ ijk = ±1 and dividing by the total uncertainties for that source. The resulting estimator correlation is

$$\rho_{ij} = \frac{\sum_{k=1}^{N_{comp}} \rho_{ijk}\,\sigma_{ik}\,\sigma_{jk}}{\sigma_i\,\sigma_j}$$

The quantity $\sigma_i^2 = \sum_{k=1}^{N_{comp}} \sigma_{ik}^2$ is the sum of the single subcomponent variances in analysis i. This procedure is applied to all uncertainty sources that consist of more than one subcomponent to reduce the large list of uncertainty subcomponents per estimator of O(100) to a suitable number of uncertainty sources, i.e. to those given in Table 4. Since the full covariance matrix is independent of how the subsets are chosen, this does not affect the combination. For many significant sources of uncertainty in Figure 7(a), the tt → lepton + jets and tt → dilepton measurements are anti-correlated. As shown in Ref. [9], this is caused by the in situ determination of the JSF and bJSF in the three-dimensional tt → lepton + jets analysis. In contrast, for most sources of uncertainty, a positive estimator correlation is observed for the tt → dilepton and tt → all jets measurements at √ s = 8 TeV, shown in Figure 7(b). The prominent exception is the hadronization-uncertainty-induced uncertainty in m top , i.e. the single largest uncertainty in the tt → all jets measurement at √ s = 8 TeV, for which the two measurements are anti-correlated. On the contrary, the tt → lepton + jets and tt → all jets measurements at √ s = 8 TeV, shown in Figure 7(c), are positively correlated for this uncertainty. Finally, the tt → lepton + jets measurements at the two centre-of-mass energies in Figure 7(d) show a rather low correlation. The correlations per source of uncertainty and the total estimator correlations are summarized in Table 5.
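The evaluation of an estimator correlation from fully correlated or anti-correlated subcomponents, as described at the start of the paragraph above, can be sketched as follows. The function name and the example numbers are hypothetical.

```python
import math


def estimator_correlation(sigma_i, sigma_j, signs):
    """Correlation of two estimators for a composite uncertainty source.

    sigma_i[k], sigma_j[k] : uncertainty from subcomponent k in analyses i and j
    signs[k]               : +1 (full correlation) or -1 (full anti-correlation)
    """
    covariance = sum(s * a * b for s, a, b in zip(signs, sigma_i, sigma_j))
    total_i = math.sqrt(sum(a * a for a in sigma_i))
    total_j = math.sqrt(sum(b * b for b in sigma_j))
    return covariance / (total_i * total_j)
```

For instance, two subcomponents of sizes (3, 4) and (4, 3) that are both fully correlated give a correlation of 0.96, while one correlated and one anti-correlated subcomponent cancel to zero. This illustrates how a composite source such as the JES can end up with a small or negative total correlation.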
The improvement in the combination obtained by the use of evaluated correlations compared with using estimator correlations assigned solely by physics assessments (here referred to as assigned correlations) is quantified using an example. Using the choices of assigned correlations from Ref. [11] for the ATLAS results in the tt → dilepton and tt → lepton + jets channels at √ s = 7 TeV listed in Table 4 gives a combined value of m top = 172.91 ± 0.50 (stat) ± 1.05 (syst) GeV compared with m top = 172.99 ± 0.48 (stat) ± 0.78 (syst) GeV. The significant improvement in the precision of the combination demonstrates the particular importance of evaluating the correlations.
For the combinations presented in this paper, most estimator correlations could be evaluated. The most prominent exception is for the b-tagging uncertainty, where the tt → all jets measurement at √ s = 8 TeV is based on a different b-tagging algorithm and calibration than the tt → dilepton and tt → lepton + jets measurements at √ s = 8 TeV. It was verified that assignments of the estimator correlations of ρ i5k ∈ [−1, 1], with i = 3, 4 and k = 17, yield insignificant differences in the full combination. Estimator correlations of ρ i5 = 1 are assigned for this case, as this choice gives the largest uncertainty in the combination. A similar situation arises for the data-driven all-jets background uncertainty in the two tt → all jets measurements, where the method used for the background estimate is similar but not identical for the two measurements. Consequently, the conservative ad hoc assignment of ρ 25k = 1 was also made for this source k = 11.

Compatibility of the inputs and selected combinations
Before any combination is performed, the compatibility of the input results is verified. For each pair of results, their compatibility is expressed by the ratio of the squared difference between the pair of measured values and the uncertainty in this difference [97] as

$$\chi^2_{ij} = \frac{(x_i - x_j)^2}{\sigma_i^2 + \sigma_j^2 - 2\rho_{ij}\,\sigma_i\,\sigma_j} \qquad (2)$$

The corresponding values are given in Table 5. Analysing the χ 2 ij values reveals good χ 2 probabilities, with the smallest χ 2 probability being P( χ 2 , 1) = 15%. The largest sum of χ 2 ij values by far is observed for the tt → all jets result at √ s = 7 TeV.

7 In the course of including more results into the combination of Ref. [14], the definitions of the variations were homogenized while leaving the estimator correlations unchanged. As a consequence, for the corresponding figures some of the points now are located in the respective other quadrant, e.g. for the tt → dilepton result at √ s = 8 TeV.
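A minimal sketch of the pairwise compatibility test follows, assuming that σ i and σ j denote the total uncertainties of the two results and ρ ij their total correlation. The function names are hypothetical. For one degree of freedom the χ 2 probability has the closed form erfc(√(χ 2 /2)), which is used here to avoid any dependency beyond the standard library.

```python
import math


def pairwise_chi2(x_i, x_j, sigma_i, sigma_j, rho_ij):
    """Squared difference divided by the variance of the difference."""
    variance = sigma_i ** 2 + sigma_j ** 2 - 2.0 * rho_ij * sigma_i * sigma_j
    return (x_i - x_j) ** 2 / variance


def chi2_probability_1dof(chi2):
    """Probability to observe a chi2 at least this large for one degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))
```

A χ 2 of about 2.07 corresponds to a probability of 15%, the smallest value quoted for the pairs considered here.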
The dependences of the combined values and their uncertainties on the total correlation for pairwise combinations of results are analysed. The dependences for pairs of the three results from √ s = 8 TeV data are shown in Figure 8. The largest information gain is achieved by combining the tt → dilepton and tt → lepton + jets results at √ s = 8 TeV, shown in Figures 8(a) and 8(b), which are anti-correlated, i.e. ρ = −0.19.
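The dependence of a pairwise combination on the total correlation can be explored with the closed-form two-measurement BLUE expressions. The sketch below is illustrative only: the function names are hypothetical and the inputs in the usage note are not the measured values.

```python
import math


def pairwise_blue(x1, s1, x2, s2, rho):
    """Two-measurement BLUE: returns (combined value, uncertainty, weight of x1).

    Undefined when the denominator vanishes (rho = 1 with s1 == s2).
    """
    denom = s1 ** 2 + s2 ** 2 - 2.0 * rho * s1 * s2
    w1 = (s2 ** 2 - rho * s1 * s2) / denom
    value = w1 * x1 + (1.0 - w1) * x2
    variance = s1 ** 2 * s2 ** 2 * (1.0 - rho ** 2) / denom
    return value, math.sqrt(variance), w1


def scan_correlation(x1, s1, x2, s2, rhos):
    """Combined value and uncertainty for each assumed total correlation."""
    return [(rho,) + pairwise_blue(x1, s1, x2, s2, rho)[:2] for rho in rhos]
```

Calling `scan_correlation(172.0, 1.0, 173.0, 1.0, [-0.5, 0.0, 0.5])` with these made-up inputs shows the combined uncertainty shrinking as the correlation becomes negative, which is the mechanism behind the large information gain for an anti-correlated pair.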
Based on Tables 4 and 5, selected combinations are analysed, yielding the results given in Table 6 and shown in Figure 9. The BLUE weights and the pulls of the results are given in Table 7.
To investigate the difference in precision of combined results obtained from … , by construction the sum of weights of the results in the corresponding decay channel equals unity, while for each of the other decay channels the sum of weights of the results equals zero [99]. The combination yields compatible results for the three masses listed in Table 6. Given that no dependence of m top on the centre-of-mass energy or the tt decay channel is expected, the above examples of combinations are merely additional investigations of the compatibility of the input results. The compatibility combinations are summarized in Figure 9 and listed in Table 6. For all combinations, the values quoted in Figure 9 are the combined value, the statistical uncertainty, the systematic uncertainty, the total uncertainty and the statistical uncertainty in the total uncertainty.

Table 7: The BLUE weights and the pulls of the results for the combinations reported in Table 6. The upper part refers to the independent combinations of the three results per centre-of-mass energy resulting in uncorrelated results m 7TeV top and m 8TeV top . The middle part is for the combination of the three observables from pairs of results per tt decay channel, resulting in correlated results m

The combined result of m top
The use of the statistical uncertainties in the systematic uncertainties has two main advantages. Firstly, it allows a proper determination of the uncertainties in the evaluation of the total correlations of the estimators, avoiding the need to perform ad hoc variations. Secondly, it enables the monitoring of the evolution of the combined result in relation to the precision in its uncertainty while including results, thereby evaluating their influence on the combination. The significance of the individual results in the combination is shown in Figure 10. The individual results are shown in Figure 10(a). Their combination is displayed in Figure 10 … The inclusion of the tt → lepton + jets result at √ s = 8 TeV leads to the result quoted in the second line, which improves the combined uncertainty by much more than the statistical precision in the uncertainty of the most precise result. The same is found when adding the tt → lepton + jets result at √ s = 7 TeV and comparing with the statistical uncertainty in the previous combination, albeit at a much reduced significance. The corresponding result obtained from these three results, denoted by m (3) top , is also listed in Table 6.

… Table 6. The values quoted are the combined value, the statistical uncertainty, the systematic uncertainty, the total uncertainty and the uncertainty in the total uncertainty. The results are compared with the new ATLAS combination listed in the last line and shown as the grey vertical bands.

The improvement in the combination by applying the BDT selection to the tt → lepton + jets analysis at √ s = 8 TeV is sizeable. This is seen from repeating the combination of m (3) top but using the result from the standard selection from Table 3. With this, the correlation of the √ s = 8 TeV tt → lepton + jets result with the √ s = 8 TeV tt → dilepton result changes from −0.19 to −0.02. The resulting uncertainty in the combination is 0.59 ± 0.05 GeV, i.e. the combination is 18% less precise than m (3) top obtained using the result from the BDT selection. Adding the remaining results reduces the quoted combined uncertainty by 0.02 GeV, which is smaller than the statistical precision in the uncertainty of the previously achieved result of m (3) top . The changes in statistical uncertainties in the combined value and its uncertainty due to variations of the input systematic uncertainties within their uncertainties are evaluated for two cases, namely for m (3) top and for the combination of all results. Following Ref. [14], the distributions of the combined values and uncertainties are obtained from 500 combinations. For each combination, the sizes of the uncertainties as well as the correlations are newly evaluated. Due to the re-evaluation of the correlation, the resulting distributions are not Gaussian and are also not exactly centred around the combined value and the combined uncertainty. For m (3) top , the root mean square of the distribution of the combined value is 0.03 GeV, and … This means that the uncertainty in this combined result is only known to this precision, which, given its size, is fully adequate.
The $\chi^2$ probability of $m_{top}^{(3)}$ is 78%. Driven by the larger pulls of the remaining three results listed in Table 7, the $\chi^2$ probability of 64% for the new ATLAS combination of $m_{top}$ is lower but still good. The new ATLAS combined result of $m_{top}$ provides a 44% improvement relative to the most precise single input result, which is the $t\bar{t}\to$ dilepton analysis at $\sqrt{s}=8$ TeV. With a relative precision of 0.28%, it improves on the previous combination in Ref. [14] by 31% and supersedes it. As shown in Appendix B, the new ATLAS combined result of $m_{top}$ is more precise than the results from the CDF and D0 experiments, and has a precision similar to the CMS combined result.
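The precision figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below uses the rounded statistical and systematic uncertainties of the combined result given in the abstract, and assumes that "improvement" means the relative reduction of the total uncertainty. The function names are illustrative. Because the inputs are rounded, the implied reference uncertainties are only approximate.

```python
import math


def total_uncertainty(stat, syst):
    """Sum in quadrature of the statistical and systematic uncertainty."""
    return math.hypot(stat, syst)


def implied_reference(new, improvement):
    """Uncertainty of a reference result implied by a quoted improvement,
    assuming improvement = 1 - new / reference."""
    return new / (1.0 - improvement)


# Combined result in GeV as quoted in the abstract (rounded values).
m_top, stat, syst = 172.69, 0.25, 0.41

total = total_uncertainty(stat, syst)
relative_precision = total / m_top

print(f"total uncertainty:  {total:.2f} GeV")
print(f"relative precision: {100 * relative_precision:.2f} %")
print(f"implied previous combination: {implied_reference(total, 0.31):.2f} GeV")
print(f"implied best single input:    {implied_reference(total, 0.44):.2f} GeV")
```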
In Figure 11, the 68% and 95% confidence-level contours of the indirect determination of $m_W$ and $m_{top}$ from the global electroweak fit in Ref. [2] are compared with the corresponding confidence-level contours of the direct ATLAS measurements of the two masses. The top quark mass used in this figure was obtained above, while the W boson mass is taken from Ref.

Conclusion
The top quark mass is measured via a three-dimensional template method in the $t\bar{t}\to$ lepton+jets channel and combined with previous ATLAS $m_{top}$ measurements at the LHC.
For the $t\bar{t}\to$ lepton+jets analysis from $\sqrt{s}=8$ TeV proton-proton collision data with an integrated luminosity of about 20.2 fb$^{-1}$, the event selection of the corresponding $\sqrt{s}=7$ TeV analysis is refined. An optimization employing a BDT selection to efficiently suppress less-well-reconstructed events results in a significant reduction in total uncertainty, driven by a significant decrease in theory-modelling-induced uncertainties. With this approach, the measured value of $m_{top}$ is $m_{top}= 172.08 \pm 0.39 (stat) \pm 0.82 (syst)$ GeV with a total uncertainty of $0.91 \pm 0.06$ GeV, where the quoted uncertainty in the total uncertainty is statistical. The precision is limited by systematic uncertainties, mostly by uncertainties in the calibration of the jet energy scale, b-tagging and the Monte Carlo modelling of signal events. This result is more precise than the result from the CDF experiment, but less precise than the CMS and D0 results, measured in the same channel.

Appendices

A Results from the BDT optimization and individual sources of systematic uncertainty
This appendix has additional details of the measurement of $m_{top}$ in the $t\bar{t}\to$ lepton+jets channel from $\sqrt{s}=8$ TeV data discussed in the main text.
In Figure 12, the template fit functions of the three observables are compared for the standard and the BDT event selection. The distributions of $m_{top}^{reco}$ and $m_{W}^{reco}$ are narrower for the BDT event selection, which means the resolution in the two masses is improved compared with what is observed for the standard selection. The $R_{bq}^{reco}$ distribution is slightly shifted to lower values for the BDT event selection, but the difference is small.
For the BDT selection, a number of systematic uncertainties listed in Table 3 are calculated by performing pseudo-experiments for more than one systematic variation. The individual components are given in Tables 8-12.

Table 9: The individual components of the PDF uncertainty considered for the $t\bar{t}\to$ lepton+jets analysis at $\sqrt{s}=8$ TeV, the resulting PDF-uncertainty-induced shifts in $m_{top}$ and the final uncertainty in $m_{top}$. The components [30,38,41] together with their statistical precisions are listed in boldface. The total uncertainty in the CT10 variations is calculated as the sum in quadrature of the CT10 subcomponents. The total uncertainty is given with 0.01 GeV precision. Uncertainties quoted as 0.00 (0.000) are smaller than 0.005 (0.0005). The term nuisance parameter is denoted by NuP. The last line refers to the sum in quadrature of the PDF subcomponents.

Table caption (fragment): [...] together with their statistical precisions are listed in boldface and, wherever applicable, calculated as the sum in quadrature of the respective subcomponents. A shift listed as '0' means that the corresponding variation resulted in an unchanged event sample. Uncertainties quoted as 0.00 (0.000) are smaller than 0.005 (0.0005). In the rightmost column, the mapping to the uncertainty components used for $\sqrt{s}=7$ TeV data is given for the weak and the strong correlation scenarios. The '+' sign indicates corresponding components at the two centre-of-mass energies for the weak and strong scenario, while the '(+)' sign indicates components that only correspond for the strong scenario. Finally, mentioning a name indicates that the mapped sources carry different names at $\sqrt{s}=7$ and 8 TeV. The uncertainty components and the total uncertainty are given with 0.01 GeV precision. The term nuisance parameter is denoted by NuP.

B Additional information about the various combinations
This appendix gives additional information about the various combinations discussed in the main text. For all combinations the values quoted are the combined value, the statistical uncertainty, the systematic uncertainty, the total uncertainty and the uncertainty in the total uncertainty, which is statistical. At both centre-of-mass energies, the difference between the combined uncertainties of the partial and full combination is much smaller than the respective statistical precision in the total systematic uncertainties. This statistical precision is obtained by varying each systematic uncertainty within its statistical precision and repeating the combination, as explained in the main text. In Figure 14, [...]

Figure caption (fragment): The values quoted are the combined value, the statistical uncertainty, the systematic uncertainty, the total uncertainty and for ATLAS results also the uncertainty in the total uncertainty, which is statistical. For CDF, the separation into statistical and systematic uncertainties is different for the result in the $t\bar{t}\to$ lepton+jets channel and the combination. For the former, the statistical component caused by the in situ determination of the jet scale factor is included in the statistical uncertainty, while for the latter, this uncertainty is part of the systematic uncertainty.

[86] ATLAS Collaboration, Impact of fragmentation modelling on the jet energy and the top-quark mass measurement using the ATLAS detector, ATL-PHYS-PUB-2015-042, 2015, https://cds.cern.ch/record/2054420.

[102] ATLAS Collaboration, ATLAS Computing Acknowledgements, ATL-GEN-PUB-2016-002, https://cds.cern.ch/record/2202407.