In situ calibration of large-$R$ jet energy and mass in 13 TeV proton-proton collisions with the ATLAS detector

The response of the ATLAS detector to large-radius jets is measured in situ using 36.2 fb$^{-1}$ of $\sqrt{s} = 13$ TeV proton-proton collisions provided by the LHC and recorded by the ATLAS experiment during 2015 and 2016. The jet energy scale is measured in events where the jet recoils against a reference object, which can be either a calibrated photon, a reconstructed $Z$ boson, or a system of well-measured small-radius jets. The jet energy resolution and a calibration of forward jets are derived using dijet balance measurements. The jet mass response is measured with two methods: using mass peaks formed by $W$ bosons and top quarks with large transverse momenta and by comparing the jet mass measured using the energy deposited in the calorimeter with that using the momenta of charged-particle tracks. The transverse momentum and mass responses in simulations are found to be about 2-3% higher than in data. This difference is adjusted for with a correction factor. The results of the different methods are combined to yield a calibration over a large range of transverse momenta ($p_{\rm T}$). The precision of the relative jet energy scale is 1-2% for $200~{\rm GeV}<p_{\rm T}<2~{\rm TeV}$, while that of the mass scale is 2-10%. The ratio of the energy resolutions in data and simulation is measured to a precision of 10-15% over the same $p_{\rm T}$ range.


Introduction
Signatures with high transverse momentum, p T , massive particles such as Higgs bosons, top quarks, and W or Z bosons have become ubiquitous during Run 2 of the Large Hadron Collider (LHC). These particles most often decay hadronically. Due to their large transverse momentum, the decay products become collimated and may be reconstructed as a single jet with large radius parameter R [1, 2] (a 'large-R' jet). The sensitivity of searches and measurements that use large-R jets depends on an accurate knowledge of the transverse momentum p T and mass m responses of the detector [3]. A calibration of the large-R energy and mass scales derived using Monte Carlo simulation yields uncertainties as large as 10%. The calibration described in this paper results in a reduction of these uncertainties by more than a factor of three.
In this paper, a suite of in situ calibration techniques is described which measure the response in proton-proton (pp) collision data at √s = 13 TeV. The results of several methods are combined to provide a calibration that defines the nominal large-R jet energy scale (JES) and the jet mass scale (JMS). These measurements provide a significant increase in the precision with which the large-R jet p T and mass scales are known across most of the kinematically accessible phase space. The jet energy and mass resolutions (JER, JMR) are also measured in situ and compared with the predictions of Monte Carlo simulations (MC). Additional uncertainties on jet substructure observables used to identify boosted objects are derived from data in Ref. [4].
Jet reconstruction starts with clusters of topologically connected calorimeter cell signals. These topological clusters, or 'topo-clusters', are brought to the hadronic scale using the local hadronic cell weighting scheme (LCW) [5]. Large-R jets are reconstructed with the anti-k t algorithm [6] using a radius parameter R = 1.0. The jets are groomed with the 'trimming' algorithm of Ref. [7], which removes regions of the jet with a small relative contribution to the jet transverse momentum. This procedure reduces the impact from additional pp interactions in the event and from the underlying event, improving the energy and mass resolution.
The several stages of the ATLAS large-R jet calibration procedure are illustrated in Figure 1. The trimmed large-R jets are calibrated to the energy scale of stable final-state particles using corrections based on simulations. This jet-level correction is referred to as the simulation-based calibration and includes a correction to the jet mass [8]. Finally, the jets are calibrated in situ using response measurements in pp collision data. A correction based on a statistical combination of data-to-simulation ratios of these response measurements is applied only to data and adjusts for the residual (typically 2-3%) mismodelling of the response. Uncertainties in the JES and JMS are derived by propagating uncertainties from the individual in situ response measurements through the statistical combination.
The in situ calibration is determined in two separate steps. In the first step, the JES is measured with the same methods used to calibrate small-R jets [9]. These techniques rely on the transverse momentum balance in a variety of final states, illustrated in Figure 2. The JES correction factor is a product of two terms. The absolute calibration is derived from a statistical combination of three measurements from Z+jet, γ+jet, and multijet events in the central region of the detector. A relative intercalibration, derived using dijet events, propagates the well-measured central JES into the forward region of the detector. The in situ calibration accounts for detector effects which are not captured by simulation. The JES correction is applied as a four-momentum scale factor to jets in data; therefore, it also affects the jet mass calibration.
In Figure 2, the labels J i refer to the ith leading large-R jet, while j i refers to the ith leading small-R jet that fulfils ∆R(J 1 , j) > 1.4. ∆φ is the difference between the azimuthal angle of the jet and the reference object, while ∆α is the difference between the azimuthal angle of the jet and the vectorial sum of the recoil system momenta.
In the second step of the in situ calibration, the jet mass response is measured using two methods following the application of the in situ JES correction. The mass response is measured in lepton+jets top quark pair production (tt̄ production) [10] with a fit to the peaks in the jet mass distribution formed by high-p T W bosons and top quarks decaying into fully hadronic final states. A second measurement is performed with the R trk method [3], which takes advantage of the independent measurements by the calorimeter and the inner tracker. This method provides a calibration for the calorimeter jet mass measurement over a broad p T range. The results from the two methods are combined as a smooth function of p T in two mass bins, which could be applied to data as an in situ correction as outlined in Section 8.
The JER and JMR are also measured in situ and compared with the prediction of the simulation. The dijet balance method takes advantage of the transverse momentum balance in dijet events to extract the JER. The JMR is obtained from fits to the top quark and W boson mass peaks in high-p T lepton+jets tt̄ events.
Sections 2 and 3 provide overviews of the ATLAS detector, the data set studied, and the simulations used in this paper. Section 4 describes the reconstruction of large-R jets in ATLAS. Section 5 presents the results of the balance methods that measure the jet energy scale: the intercalibration, which uses dijet events to ensure a uniform response over the central and forward regions of the detector, in Section 5.1, the Z+jet balance method in Section 5.2, the γ+jet balance method in Section 5.3, and the multijet balance method in Section 5.4. Section 6 presents the methods that are used to measure the jet mass response: the R trk method and its results for the energy and mass scale in Section 6.1 and the fits to the W boson and top quark mass peaks in high-p T lepton+jets tt̄ events in Section 6.2, which are also used to measure the JMR. The measurement of the JER in dijet events is discussed in Section 7. The methodology of the combination procedure is presented in Section 8, as well as the resultant combined in situ calibration of the JES and JMS. Section 9 summarizes the results.

The ATLAS detector and data set
The ATLAS experiment consists of three major sub-detectors: the inner detector, the calorimeters, and the muon spectrometer. The inner detector, closest to the interaction point, is used to track charged particles in a 2 T axial magnetic field produced by a thin superconducting solenoid. It consists of a pixel detector, a silicon tracker equipped with micro-strip detectors, and a transition radiation tracker that provides a large number of space points in the outermost layers of the tracker. It covers the pseudorapidity range |η| < 2.5. Surrounding the tracker and solenoid, a sampling calorimeter measures the energy of particles produced in the collisions with |η| < 4.9. The energies of electrons and photons are measured precisely in a high-granularity liquid-argon electromagnetic calorimeter. The cylindrical "barrel" covers |η| < 1.475, and the "endcaps" on either end of the detector cover 1.375 < |η| < 3.2. An iron/scintillator tile calorimeter measures the energy of hadrons in the central rapidity range, |η| < 1.7, and a liquid-argon hadronic endcap calorimeter provides coverage for 1.5 < |η| < 3.2. The forward liquid-argon calorimeter measures electrons, photons, and hadrons for 3.2 < |η| < 4.9. Finally, a muon spectrometer in the magnetic field of a system of superconducting air-core toroid magnets identifies muons in the range |η| < 2.7 and measures their transverse momenta. The ATLAS trigger system consists of a hardware-based first-level trigger followed by a software-based high-level trigger, which apply a real-time selection to reduce the up to 40 MHz LHC collision rate to an average rate of events written to storage of 1 kHz [11]. A detailed description of the ATLAS experiment is given in Ref. [12].
The data set used in this analysis consists of pp collisions delivered by the LHC at a centre-of-mass energy of √ s = 13 TeV during 2015 and 2016. The specific trigger requirements vary among the various in situ analyses and are described in the relevant sections. All data are required to meet ATLAS standard quality criteria. Data taken during periods in which detector subsystems were not fully functional are discarded. Data quality criteria also reject events that have significant contamination from detector noise or with issues in the read-out. The remaining data correspond to an integrated luminosity of 36.2 fb −1 .
Due to the high luminosity of the LHC, multiple pp collisions occur during each bunch crossing. Interactions which occur within the bunch crossing of interest (in-time pile-up) or in neighbouring bunch crossings (out-of-time pile-up) may alter the measured energy or mass scale of jets or lead to the reconstruction of additional 'stochastic' jets, seeded by upwards fluctuations in the local pile-up energy density. The average number of additional pp collisions per bunch crossing is 24 in the Run 2 data from 2015 and 2016 analysed here.

Simulations
The data are compared with detailed simulations of the ATLAS detector response [13] based on the Geant4 [14] toolkit. Hard-scatter events for all processes studied were simulated with several different event generators to assess possible systematic effects due to limitations in the physics modelling. Several different simulation packages were also used to hadronize final-state quarks and gluons in order to compare the impact of various models of hadronization and parton showering on the measurements. Dijet events were generated using several different generator configurations. Depending on the analysis, nominal dijet samples were generated using either Pythia 8 (v8.186) [15] or Powheg-Box 2.0 [16][17][18] interfaced with Pythia 8. These samples were generated with the A14 set of tuned parameters [19] and the NNPDF2.3 LO parton distribution function (PDF) set [20]. Samples generated with Herwig 7 [21] and Sherpa v2.1 [22] were used for comparison. The Herwig 7 sample used the UE-EE-5 set of tuned parameters [23] and CTEQ6L1 PDF set [24]. The Sherpa leading-order multileg generator includes 2 → 2 and 2 → 3 processes at matrix element level, combined using the CKKW prescription [25].
Z+jets events are generated using Powheg-Box 2.0 interfaced to the Pythia 8.186 parton shower model. The CT10 PDF set is used in the matrix element [26]. The AZNLO set of tuned parameters [27] is used, with PDF set CTEQ6L1, for the modelling of non-perturbative effects. The EvtGen 1.2.0 program [28] is used for the properties of b- and c-hadron decays. Photos++ 3.52 [29] is used for QED emissions from electroweak vertices and charged leptons. Samples of Z+jet events are compared to a second sample generated using Sherpa 2.2.1. Matrix elements are calculated for up to 2 partons at NLO and 4 partons at LO using Comix [30] and OpenLoops [31] and merged with the Sherpa parton shower [32] according to the ME+PS@NLO prescription [33]. The NNPDF30nnlo PDF set is used in conjunction with dedicated parton shower tuning developed by the Sherpa authors.
For γ+jet events, Pythia 8 was used as the nominal generator, where the 2 → 2 matrix element is convolved with the NNPDF2.3LO PDF set. The A14 event tune was used. These events are compared to a sample generated with Sherpa v2.1.1. Matrix elements are calculated with up to 3 or 4 partons at LO and merged with the Sherpa parton shower according to the ME+PS@LO prescription. The CT10 PDF set is used in conjunction with the default parton shower tuning developed by the Sherpa authors.
Top quark pair production and single top production in the s-channel and Wt final state were simulated at NLO accuracy with Powheg-Box v2 [34] and the CT10 PDF set. For electroweak t-channel single top quark production, Powheg-Box v1 was used, which utilizes the four-flavour scheme for NLO matrix element calculations together with the fixed four-flavour PDF set CT10f4. In all cases, the nominal sample was interfaced with Pythia 8 with the CTEQ6L1 PDF set, which simulates the parton shower, fragmentation, and underlying event. The h damp parameter in Powheg, which regulates the p T of the first additional emission beyond the Born level and thus the p T of the recoil emission against the tt̄ system, was set to the mass of the top quark (172.5 GeV). Systematic uncertainties in the modelling of hadronization were evaluated using a Powheg sample interfaced to Herwig 7. W+jet events, simulated in Sherpa v2.2.0, are considered as a background to tt̄ production.
The effect of pile-up on reconstructed jets was modelled by overlaying multiple simulated minimum-bias inelastic pp events on the signal event. These additional events were generated with Pythia 8, using the A2 set of tuned parameters [35] and MSTW2008LO PDF set [36]. The distribution of the average number of interactions per bunch crossing in simulated samples is reweighted to match that of the analysed data set.
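The reweighting of the pile-up distribution can be illustrated with a minimal sketch. It assumes a simple bin-by-bin ratio of normalized histograms; the function name and the toy histograms are hypothetical and are not taken from the ATLAS software.

```python
def pileup_weights(data_counts, mc_counts):
    """Per-bin weights that reshape the simulated <mu> distribution to the data.

    Both arguments map a <mu> bin label to an event count. Bins that are
    empty in simulation get no weight, since they cannot be reweighted.
    """
    n_data = sum(data_counts.values())
    n_mc = sum(mc_counts.values())
    weights = {}
    for mu_bin, n_mc_bin in mc_counts.items():
        if n_mc_bin > 0:
            data_fraction = data_counts.get(mu_bin, 0) / n_data
            mc_fraction = n_mc_bin / n_mc
            weights[mu_bin] = data_fraction / mc_fraction
    return weights


# Hypothetical toy histograms of the average number of interactions per crossing.
data = {20: 50, 24: 100, 28: 50}
mc = {20: 100, 24: 100, 28: 100}
weights = pileup_weights(data, mc)
```

Each simulated event is then weighted by the entry for its bin, which leaves the total simulated yield unchanged when every data bin is populated in simulation.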

Large-R jet reconstruction and simulation calibration
This section describes the reconstruction of large-R jets and the grooming procedure. Three classes of jets are used: calorimeter jets, particle-level (or 'truth') jets, and track jets. The large-R jets considered in this paper are reconstructed using the anti-k t algorithm [6] with a radius parameter R = 1.0. For balancing and veto purposes, jets reconstructed with radius parameter R = 0.4 ('small-R jets') are used in some parts of the analysis with their own calibration procedures applied [9]. The specific implementation of the jet clustering algorithm used is taken from the FastJet package [37,38].

Large-R jets
Calorimeter jets are formed from topological clusters of calorimeter cells. The clusters are seeded by cells with an energy significantly above the calorimeter noise. The large-R jets used in this paper are reconstructed using topological clusters that are calibrated to correct for response differences between energy deposition from electromagnetic particles (electrons and photons) and hadrons with the LCW scheme of Ref. [5]. Small-R jets reconstructed from "electromagnetic scale" topo-clusters are used as a reference system in the multijet balance method of Section 5.4. Results are labelled with "LCW" or "EM" to indicate the calibration of the clusters. Topological clusters are defined to be massless. The four-momenta of these topo-clusters, initially defined as pointing to the geometrical centre of the ATLAS detector, are adjusted to point towards the hard-scatter primary vertex of the event, which is defined as the primary vertex with the largest associated sum of track $p_{\rm T}^2$. To reduce the effects of pile-up, soft emissions, and the underlying event on jet substructure measurement, the trimming algorithm is applied to the jets. Trimming reclusters the jet constituents of each R = 1.0 jet using the k t algorithm [39] and $R_{\rm sub} = 0.2$, producing a collection of subjets for each jet. Subjets with $p_{\rm T}^{\rm subjet}/p_{\rm T}^{\rm jet} < 0.05$ are removed, and the jet four-momentum is recalculated from the remaining constituents.
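The subjet removal step of trimming can be sketched as follows. This is an illustration only: the k t reclustering with R sub = 0.2 is assumed to have been done elsewhere (for example with FastJet), the ungroomed jet p T is taken as that of the vector sum of the subjets, and the function names are hypothetical.

```python
import math


def four_vector(pt, eta, phi, m):
    """Return (E, px, py, pz) for an object given pt, eta, phi and mass."""
    px = pt * math.cos(phi)
    py = pt * math.sin(phi)
    pz = pt * math.sinh(eta)
    e = math.sqrt(px * px + py * py + pz * pz + m * m)
    return (e, px, py, pz)


def trim(subjets, f_cut=0.05):
    """Drop subjets with pT(subjet) / pT(jet) < f_cut and re-sum the rest.

    `subjets` is a list of (E, px, py, pz) tuples from a reclustering step
    that is not performed here.
    """
    jet_px = sum(s[1] for s in subjets)
    jet_py = sum(s[2] for s in subjets)
    jet_pt = math.hypot(jet_px, jet_py)
    kept = [s for s in subjets if math.hypot(s[1], s[2]) / jet_pt >= f_cut]
    trimmed = tuple(sum(s[i] for s in kept) for i in range(4))
    return trimmed, kept


# Toy jet: two hard subjets and one soft subjet (values are illustrative).
toy_subjets = [
    four_vector(300.0, 0.0, 0.0, 10.0),
    four_vector(150.0, 0.1, 0.3, 5.0),
    four_vector(5.0, -0.4, -0.5, 0.0),
]
trimmed_jet, kept_subjets = trim(toy_subjets)
```

In the toy example the 5 GeV subjet carries about 1% of the jet p T and is removed, so the trimmed four-momentum is the sum of the two hard subjets.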
In this paper, trimmed large-R jets with p T > 200 GeV and |η| < 2.5 are studied.

Particle-level jets and the simulation-based jet calibration
The reference for the simulation-based jet calibration is formed by particle-level jets. These are created by clustering stable particles originating from the hard-scatter interaction in the simulation event record which have a lifetime τ in the laboratory frame such that cτ > 10 mm. Particles that do not leave significant energy deposition in the calorimeter (i.e. muons and neutrinos) are excluded. Particle-level jets are reconstructed and trimmed using the same algorithms as those applied to large-R jets built from topological clusters, incorporating the grooming procedure within the jet definition.
After reconstruction of the calorimeter jets, a correction derived from a sample of simulated dijet events is applied to restore the average reconstructed calorimeter jet energy scale to that of particle-level jets. A correction is also applied to the η of the reconstructed jet to correct for a bias relative to particle-level jets in certain regions of the detector [40]. Both corrections are applied as a function of the reconstructed jet energy and the detector pseudorapidity, η det , defined as the pseudorapidity calculated relative to the geometrical centre of the ATLAS detector. This yields a better location of the energy-weighted centroid of the jet than the use of the pseudorapidity calculated relative to the hard-scatter primary vertex.
Reconstructed jets are matched to particle-level jets using an angular matching procedure that minimizes the distance $\Delta R = \sqrt{(\Delta\phi)^2 + (\Delta\eta)^2}$. The energy response is defined as $E_{\rm reco}/E_{\rm truth}$, where $E_{\rm reco}$ is the reconstructed jet energy prior to any calibration (later denoted $E_0$) and $E_{\rm truth}$ is the energy of the corresponding particle-level jet. The mass response is defined as $m_{\rm reco}/m_{\rm truth}$, where $m_{\rm reco}$ and $m_{\rm truth}$ represent the jet mass of the matched detector-level and particle-level jets, respectively. The average response is determined in a Gaussian fit to the core of the response distribution. The parameterization of the average jet energy response $R_E = E_{\rm reco}/E_{\rm truth}$ used for the simulation calibration is presented as a function of $\eta_{\rm det}$ and for several values of the truth jet energy in Figure 3(a). The correction is typically 5-10%, with a weak dependence on the jet energy and a characteristic structure in $\eta_{\rm det}$ that reflects the calorimeter geometry.
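The matching and the per-jet response can be sketched as below. The sketch assumes jets are plain dictionaries with keys "E", "eta" and "phi" (a hypothetical layout) and matches each reconstructed jet to the closest particle-level jet in ∆R; no maximum ∆R is imposed because none is quoted in the text, and the Gaussian fit to the core of the distribution is not included.

```python
import math


def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance, with the azimuthal difference wrapped into [-pi, pi)."""
    dphi = (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(eta1 - eta2, dphi)


def energy_responses(reco_jets, truth_jets):
    """E_reco / E_truth for each reconstructed jet and its closest truth jet."""
    responses = []
    for reco in reco_jets:
        closest = min(
            truth_jets,
            key=lambda t: delta_r(reco["eta"], reco["phi"], t["eta"], t["phi"]),
        )
        responses.append(reco["E"] / closest["E"])
    return responses


# Toy event: the first truth jet sits just across the phi = +-pi boundary.
reco = [{"E": 450.0, "eta": 0.10, "phi": 3.10}]
truth = [
    {"E": 500.0, "eta": 0.12, "phi": -3.10},
    {"E": 300.0, "eta": -1.00, "phi": 0.00},
]
toy_responses = energy_responses(reco, truth)
```

The wrap of ∆φ matters for jets near φ = ±π, as in the toy event, where a naive difference of 6.2 would pick the wrong partner.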
The simulation-based JES correction factor $c_{\rm JES}$ is determined as a function of the jet energy and pseudorapidity $\eta_{\rm det}$. It is applied to the jet four-momentum as a multiplicative scale factor. The pseudorapidity correction $\Delta\eta$ only changes the direction. This means that the reconstructed large-R jet energy, mass, $\eta$, and $p_{\rm T}$ become
$$E_{\rm reco} = c_{\rm JES}\,E_0, \quad m_{\rm reco} = c_{\rm JES}\,m_0, \quad \eta_{\rm reco} = \eta_0 + \Delta\eta, \quad p_{\rm T}^{\rm reco} = c_{\rm JES}\,\frac{|\vec{p}_0|}{\cosh(\eta_0 + \Delta\eta)},$$
where the quantities $E_0$, $m_0$, $\eta_0$, and $\vec{p}_0$ refer to the jet properties prior to any calibration, as determined by the trimming algorithm. The quantities $c_{\rm JES}$ and $\Delta\eta$ are smooth functions of the large-R jet kinematics. None of the calibration steps affect the azimuthal angle $\phi$ of the jet.
The large-R jet invariant mass is calibrated in a final step. This is important when using the jet mass in physics analyses, because the jet mass is more sensitive than the transverse momentum to soft, wide-angle contributions and to cluster merging and splitting, as well as to the calorimeter geometry. For the mass correction the jet mass response R m = m reco /m truth is determined using the same procedure as for the jet energy calibration. The mass calibration is applied after the standard JES calibration. The mass response is presented in Figure 3 for three representative values of the truth jet mass: 40 GeV in panel (b), the W boson mass in panel (c), and the top quark mass in panel (d). The mass response is close to unity for jets with p T between 200 and 800 GeV and as large as 1.5 for very energetic jets with relatively low mass. Several effects can impact the jet mass response. The reconstructed mass can be artificially increased by the splitting of topo-clusters during their creation. This effect is particularly important for jets with small particle-level mass relative to their p T (m/p T ≲ 0.05). Similarly, when several particles form one topo-cluster, or when particles fail to produce any topo-cluster, the mass response is decreased. This effect is significant for jets with large particle-level mass relative to their p T (m/p T ≳ 0.5).
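The simulation-based correction to the large-R jet mass $c_{\rm JMS}$ is applied as a function of the jet $E_{\rm reco}$, $\eta_{\rm det}$, and $\log(m_{\rm reco}/E_{\rm reco})$, keeping the large-R jet energy fixed and thus allowing the $p_{\rm T}$ to vary [40]. This factor is also a smooth function of the large-R jet kinematics. This has the following impact on the reconstructed jet kinematics:
$$E_{\rm reco} = c_{\rm JES}\,E_0, \quad m_{\rm reco} = c_{\rm JES}\,c_{\rm JMS}\,m_0, \quad \eta_{\rm reco} = \eta_0 + \Delta\eta, \quad p_{\rm T}^{\rm reco} = \frac{c_{\rm JES}\sqrt{E_0^2 - c_{\rm JMS}^2\,m_0^2}}{\cosh(\eta_0 + \Delta\eta)}.$$
All results that correspond to jets that are brought to the particle level with the simulation-based calibration are labelled with "JES+JMS".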
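A minimal sketch of how the simulation-based factors act on the jet kinematics is given below. It assumes the reading of the text in which c JES scales the whole four-momentum, ∆η shifts only the pseudorapidity, and c JMS rescales the mass at fixed energy so that the p T changes. The function name and the numerical factors are hypothetical; the real factors are smooth functions of the jet kinematics.

```python
import math


def apply_simulation_calibration(e0, m0, eta0, c_jes, d_eta, c_jms=1.0):
    """Return calibrated (E, m, eta, pT) from uncalibrated E0, m0, eta0.

    With c_jms = 1 this is the JES-only step; the azimuthal angle is untouched.
    """
    e = c_jes * e0
    m = c_jes * c_jms * m0
    eta = eta0 + d_eta
    # |p| follows from the fixed energy and the rescaled mass.
    p = math.sqrt(max(e * e - m * m, 0.0))
    pt = p / math.cosh(eta)
    return e, m, eta, pt


# Illustrative inputs in GeV; the correction factors are made up.
e, m, eta, pt = apply_simulation_calibration(1000.0, 80.0, 0.5, 1.05, 0.01, c_jms=1.1)
```

Because the energy is held fixed when the mass is rescaled, a mass correction above unity lowers the jet p T slightly, which is the behaviour described in the text.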

Tracks and track jets
Tracks are reconstructed from the hits generated by charged particles passing through the inner tracking detector (ID). They are required to have p T > 500 MeV. To reduce fake tracks, candidate tracks must be composed of at least one pixel detector hit and at least six hits in the silicon tracker. The track transverse impact parameter |d 0 | relative to the primary vertex must be less than 1.5 mm and the longitudinal impact parameter |z 0 | multiplied by sin θ relative to the primary vertex must be less than 3 mm [41,42].
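The track requirements listed above can be collected in a single predicate. This is a sketch with hypothetical argument names, not the ATLAS tracking interface; the thresholds are those quoted in the text.

```python
import math


def passes_track_selection(pt_mev, n_pixel_hits, n_strip_hits, d0_mm, z0_mm, theta):
    """Apply the pT, hit-count and impact-parameter requirements to one track."""
    return (
        pt_mev > 500.0
        and n_pixel_hits >= 1
        and n_strip_hits >= 6
        and abs(d0_mm) < 1.5
        and abs(z0_mm * math.sin(theta)) < 3.0
    )
```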
Jets reconstructed from charged-particle tracks are used as a reference in calibration and uncertainty studies, taking advantage of the independence of instrumental systematic effects between the ID and the calorimeter. Track jets are reconstructed by applying the same jet reconstruction procedure to tracks as those used when constructing the topo-cluster jets described above, including the jet trimming algorithm. Track jets are not calibrated.

The combined jet mass
The jet mass resolution is improved by combining the jet mass measurement in the calorimeter with the measurement of the charged component of the jet within the ID [43][44][45][46][47][48][49][50][51]. A track jet is reconstructed from ID tracks with $p_{\rm T} > 500$ MeV which are ghost-associated [52] to the topo-cluster large-R jet. The track-assisted mass is defined as
$$m_{\rm TA} = m^{\rm track}\,\frac{p_{\rm T}^{\rm calo}}{p_{\rm T}^{\rm track}}, \qquad (1)$$
where $m_{\rm TA}$ is the track-assisted mass, $m^{\rm track}$ the mass obtained from the tracker, and $p_{\rm T}^{\rm calo}$ and $p_{\rm T}^{\rm track}$ are the transverse momenta measured respectively by the calorimeter and tracker. This alternative mass measurement has better resolution for high-$p_{\rm T}$ jets with low values of $m/p_{\rm T}$. A weighted least-squares combination of the mass measurements is subsequently performed with weights:
$$m_{\rm comb} = w_{\rm calo}\,m_{\rm calo} + w_{\rm TA}\,m_{\rm TA},$$
where $w_{\rm calo}$ and $w_{\rm TA}$ are determined by the expected mass resolutions $\sigma_{\rm calo}$ and $\sigma_{\rm TA}$ of the calorimeter and track-assisted measurements, using the central 68% inter-quantile range of the jet mass response distribution in dijet events:
$$w_{\rm calo} = \frac{\sigma_{\rm calo}^{-2}}{\sigma_{\rm calo}^{-2} + \sigma_{\rm TA}^{-2}}, \qquad w_{\rm TA} = \frac{\sigma_{\rm TA}^{-2}}{\sigma_{\rm calo}^{-2} + \sigma_{\rm TA}^{-2}},$$
such that the resolution of the combined mass measurement is always better than either of the two inputs within the sample from which the weights are derived. In this paper, in situ measurements are presented for the jet mass reconstructed from topo-clusters and for the track-assisted mass. The constraint $w_{\rm calo} + w_{\rm TA} = 1$ ensures that the combined mass is calibrated, if the scales of both mass definitions are fixed.
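The combination can be sketched numerically as follows, assuming the track-assisted mass is the track-jet mass scaled by the ratio of calorimeter to track p T and that the weights are the usual inverse-variance weights constrained to sum to one. Function names and input values are hypothetical.

```python
def track_assisted_mass(m_track, pt_calo, pt_track):
    """Track-jet mass scaled by the calorimeter-to-track pT ratio."""
    return m_track * pt_calo / pt_track


def combination_weights(sigma_calo, sigma_ta):
    """Inverse-variance weights (w_calo, w_TA), which sum to one."""
    inv_calo = sigma_calo ** -2
    inv_ta = sigma_ta ** -2
    total = inv_calo + inv_ta
    return inv_calo / total, inv_ta / total


def combined_mass(m_calo, m_ta, sigma_calo, sigma_ta):
    """Weighted combination of the calorimeter and track-assisted masses."""
    w_calo, w_ta = combination_weights(sigma_calo, sigma_ta)
    return w_calo * m_calo + w_ta * m_ta


# Illustrative values: masses and pT in GeV, relative resolutions dimensionless.
m_ta = track_assisted_mass(50.0, 600.0, 400.0)
m_comb = combined_mass(80.0, m_ta, 0.1, 0.2)
```

With these inputs the measurement with the better resolution (the calorimeter mass) receives a weight of 0.8, and the combined value lies between the two inputs.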

In situ p T response measurements
In this section, the methods used to derive the in situ calibration for the energy (or transverse momentum) response are presented. These methods use p T conservation in events where a large-R jet recoils against a well-measured reference object. The first method is based on the p T balance in dijet events with a central (|η det | ≤ 0.8) and a forward (|η det | > 0.8) jet. It is applied after the simulation calibration described in Section 4. The η-intercalibration corrects the p T of forward jets to make the jet energy response uniform as a function of pseudorapidity. After the η-intercalibration procedure, three further balance methods are used to provide an absolute p T scale calibration. In the Z+jet balance method, the recoiling system is a reconstructed Z → µ⁺µ⁻ or Z → e⁺e⁻ decay, in the γ+jet balance method it is a photon, and in the multijet balance method the system is formed by several calibrated small-R jets with low p T . These three methods offer complementary coverage over a broad p T range. The Z+jet balance method provides the most precise results in the low-p T interval between 200 and 500 GeV, the γ+jet balance between 500 GeV and 1 TeV, and the multijet balance extends to 2.5 TeV. Results of the three methods are presented in this section and are combined into a global constraint on the JES in Section 8.

Dijet η-intercalibration
The relative η-intercalibration extends the jet calibration to the forward detector region, $0.8 < |\eta| < 2.5$. It is derived from the differences in the $p_{\rm T}$ balance between a central reference and a forward jet in data and simulations. The η-intercalibration is determined in dijet events using a procedure similar to that used for small-R jets [53]. The $p_{\rm T}$ balance of the dijet system is characterized by its asymmetry $A$, defined in terms of the forward (probe) and central (reference) jet $p_{\rm T}$ ($p_{\rm T}^{\rm probe}$ and $p_{\rm T}^{\rm ref}$) as
$$A = \frac{p_{\rm T}^{\rm probe} - p_{\rm T}^{\rm ref}}{p_{\rm T}^{\rm avg}}, \qquad p_{\rm T}^{\rm avg} = \frac{p_{\rm T}^{\rm probe} + p_{\rm T}^{\rm ref}}{2}.$$
The central reference jets are required to be within $|\eta| < 0.8$. The balancing probe jet $\eta_{\rm det}$ defines the detector region whose response is being probed. The asymmetry distribution is studied in bins of $p_{\rm T}^{\rm avg}$ and the probe jet $\eta_{\rm det}$. In each bin, the relative response difference between the central and forward jets is
$$R_{\rm rel} = \frac{2 + \langle A \rangle}{2 - \langle A \rangle},$$
where $\langle A \rangle$ is the mean value of the asymmetry. The asymmetry distribution is approximately Gaussian, and the mean value is extracted using a Gaussian fit to the core of the distribution.
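The relation between the asymmetry and the relative response can be checked with a short sketch. It assumes the asymmetry is the p T difference of probe and reference jets divided by their average p T, and it takes the mean asymmetry as an input rather than performing the Gaussian fit to the core; the function names are hypothetical.

```python
def asymmetry(pt_probe, pt_ref):
    """Dijet asymmetry: pT difference divided by the average pT of the pair."""
    pt_avg = 0.5 * (pt_probe + pt_ref)
    return (pt_probe - pt_ref) / pt_avg


def relative_response(mean_asymmetry):
    """Relative probe-to-reference response for a given mean asymmetry."""
    return (2.0 + mean_asymmetry) / (2.0 - mean_asymmetry)


# A single toy event: the probe jet is measured 5% higher than the reference.
toy_response = relative_response(asymmetry(105.0, 100.0))
```

For a single event the expression reduces exactly to the ratio of probe to reference p T, which is why a perfectly balanced sample gives a relative response of unity.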
Large-R jets with p T from 180 GeV to 2 TeV within |η| < 2.5 are considered. Dijet events in data are selected using several dedicated single-jet triggers based on small-R jets. Their efficiency has been evaluated for large-R jets and each trigger is used in its region of full efficiency for those jets. These triggers provide enough events for this technique to be used over a wide range of p T . To ensure a 2 → 2 body topology, events with energetic additional radiation are vetoed with an upper cut on the transverse momentum of the third jet J 3 , and the leading two jets are required to satisfy a minimum angular separation in azimuth. Both of these requirements are varied in order to derive systematic uncertainties accounting for their impact on the response measurements. These selections and systematic variations are summarized in Table 1. No pile-up jet tagging employing the Jet Vertex Tagger likelihood measure (JVT) [54, 55] is applied for large-R jets, since in this kinematic region the contamination by pile-up jets is negligible.
Table 1: Summary of the dijet topology selection and systematic variations considered for the η-intercalibration analysis. The label J 3 refers to the third trimmed R = 1.0 jet in the event after ordering the jets in p T .

Variable Nominal Selection Up Variation Down Variation
The relative jet-p T response R rel is shown in Figure 4 as a function of the large-R jet pseudorapidity for data, Powheg+Pythia 8, and Sherpa for two p T intervals. The relative jet response as a function of the large-R jet p T is shown in Figure 5 for two pseudorapidity ranges of the probe jet. In the central region, the relative responses of all three samples agree by design. The relative response in data increases in the forward region due to features of the experimental response which are not well reproduced in the simulation and hence not accounted for in the simulation-based JES calibration factor c JES . Compared to the measured response, the prediction remains relatively constant around unity. The difference between the simulated and measured responses reaches about 5% around |η| = 2.5. Similar trends are observed for R = 0.4 jets in Ref. [9]. In the lower panel of Figure 4 and Figure 5, the ratio of simulation to data is shown. An interpolation using a filter with a sliding Gaussian kernel across η det yields a smooth function of jet p T and η det . The inverse of this smooth function is taken as the η-intercalibration correction factor c rel (p T , η det ), which is applied as a jet four-momentum scale factor.
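A sliding Gaussian kernel filter of the kind mentioned above can be sketched in one dimension as a kernel-weighted average. This is a simplified illustration: it smooths only across η det at fixed p T, the kernel width is a hypothetical choice, and the input points stand for whichever measured ratio is being smoothed.

```python
import math


def gaussian_kernel_smooth(x_points, y_points, x_eval, width):
    """Gaussian-kernel weighted average of y_points, evaluated at x_eval.

    The evaluation point must lie within a few widths of at least one
    input point, otherwise all weights underflow to zero.
    """
    weights = [math.exp(-0.5 * ((x_eval - x) / width) ** 2) for x in x_points]
    total = sum(weights)
    return sum(w * y for w, y in zip(weights, y_points)) / total


def correction_factor(eta_points, ratio_points, eta_det, width=0.2):
    """Inverse of the smoothed ratio at eta_det."""
    return 1.0 / gaussian_kernel_smooth(eta_points, ratio_points, eta_det, width)
```

A flat input is returned unchanged by the filter, so a region where the ratio is unity yields a correction factor of one.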
The uncertainties associated with the η-intercalibration are shown in Figure 6 for two representative p T bins. The uncertainties associated with the veto on additional radiation and the ∆φ requirement placed on the dijet topology are derived by varying these selection criteria to the values listed in Table 1 and re-deriving the calibration. An additional systematic uncertainty accounts for the choice of event generator and parton shower models. The simulation uncertainty is derived by comparing the relative jet-p T response for two event generators: Powheg+Pythia 8 and Sherpa. In general, the uncertainties associated with the derived calibration are small, amounting to a ∼1% uncertainty within the region of interest for large-R jets (|η| < 2.0). Uncertainties originating from the kinematic requirements made to select events are typically negligible, except in the highest p avg T bins.
Figure 4: … < 380 GeV and (b) 550 GeV < p avg T < 700 GeV. The average response within the reference region |η det | < 0.8 is unity by construction. In the lower panels, the dotted lines interpolating between Powheg+Pythia markers are obtained by smoothing with a filter using a sliding Gaussian kernel.
Figure 5: The relative large-R jet response R rel as a function of the large-R jet p T in two representative detector pseudorapidity η det bins in the forward and central reference regions (a) 1.7 < η det < 1.8 and (b) −0.6 < η det < −0.4. In the lower panels, the lines interpolating between Powheg+Pythia markers are obtained by smoothing with a filter using a sliding Gaussian kernel.
Figure 6: … 550 GeV < p avg T < 700 GeV. The uncertainties evaluated using variations of the dijet topology selection are negligible relative to the simulation modelling uncertainty, which typically amounts to a 1% uncertainty for large-R jets within 0.8 < |η det | < 2.0.

Z+jet balance
For large-R jets within $|\eta_{\rm det}| < 0.8$, an in situ calibration is derived by examining the $p_{\rm T}$ balance of a large-R jet and a leptonically decaying Z boson, either $Z \to e^+e^-$ or $Z \to \mu^+\mu^-$ (Figure 2(b)). Both of these channels provide a precise, independent reference measurement of the jet energy, either from the inner detector and muon spectrometer tracks used to reconstruct muons or from the well-measured electromagnetic showers and inner detector tracks used to reconstruct electrons. The applicable range of this calibration is limited by the kinematic range where Z boson production is relatively abundant, that is, up to a Z boson $p_{\rm T}$ of about 500 GeV. Electrons used to reconstruct the Z boson are required to pass 'medium likelihood identification' quality and 'Loose' isolation requirements and must be reconstructed within $|\eta| < 2.47$ (excluding the transition region $1.36 < |\eta| < 1.52$ between the barrel and endcap electromagnetic calorimeters) with $p_{\rm T} > 20$ GeV [56,57]. Similarly, 'VeryLoose' quality and 'Loose' isolation requirements are placed on muons, which must be reconstructed within $|\eta| < 2.4$ with $p_{\rm T} > 20$ GeV [58]. The lepton pair must have opposite charge and be kinematically consistent with the decay of a Z boson, requiring the invariant mass of the lepton pair to satisfy $66 < m_{\ell^+\ell^-} < 116$ GeV. Large-R jets studied here are calibrated with the simulation calibration and η-intercalibration described in Sections 4 and 5.1.
The direct balance method used here closely follows the methodology outlined in Ref. [9]. The average momentum balance between the large-R jet and Z boson is
$$R_{\rm DB} = \left\langle \frac{p_{\rm T}^{J}}{p_{\rm T}^{\rm ref}} \right\rangle, \qquad (3)$$
where $p_{\rm T}^{J}$ is the large-R jet $p_{\rm T}$ and $p_{\rm T}^{\rm ref} = p_{\rm T}^{Z} \cos(\Delta\phi)$ is the component of the reference momentum collinear with the jet, with ∆φ being the azimuthal angle between the large-R jet and reference Z boson. The average value is determined using a Gaussian fit.
Even with an ideal detector, the momentum balance $R_{\rm DB}$ of Eq. (3) will only equal unity for an ideal 2 → 2 process. In practice, there tends to be more QCD radiation in the hemisphere opposite to the colour-neutral Z boson, and therefore $R_{\rm DB}$ tends to be below unity. To suppress such configurations, the event selection imposes a veto on the $p_{\rm T}$ of additional sub-leading jets. A minimum requirement is also imposed on the angular separation ∆φ of the large-R jet and reference Z boson. Any mismodelling in the jet energy scale may be evaluated using the double ratio of the balance in data and simulation, $R_{\rm DB}^{\rm data}/R_{\rm DB}^{\rm MC}$. If the event selection criteria are met and the reference object is well measured and correctly modelled in simulation, any deviation from unity in the double ratio can be attributed to a mismodelling of the jet response in simulation and may be taken as an in situ correction.
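The balance and the double ratio can be sketched as follows. This is a minimal illustration under stated assumptions: the per-event balance is the jet $p_{\rm T}$ divided by the projected reference $p_{\rm T}$, the sample mean stands in for the Gaussian fit used in the analysis, and the absolute value of the cosine is taken because ∆φ is close to π for the selected events. All function names are hypothetical.

```python
import math
from statistics import fmean

def direct_balance(jet_pt, ref_pt, delta_phi):
    """Per-event balance: jet pT over the reference pT projected onto the jet axis.

    Assumption of this sketch: |cos(dphi)| is used so that a back-to-back
    reference (dphi close to pi) gives a positive projected momentum.
    """
    return jet_pt / (ref_pt * abs(math.cos(delta_phi)))

def mean_balance(events):
    """Average balance; the sample mean is a stand-in for the Gaussian fit."""
    return fmean(direct_balance(*event) for event in events)

def double_ratio(data_events, mc_events):
    """Data-to-simulation ratio of the mean balance."""
    return mean_balance(data_events) / mean_balance(mc_events)

# Events are (jet_pt, ref_pt, delta_phi) tuples; the values are illustrative only.
data = [(97.0, 100.0, math.pi), (194.0, 200.0, math.pi)]
simulation = [(100.0, 100.0, math.pi), (200.0, 200.0, math.pi)]
ratio = double_ratio(data, simulation)
```

Because radiation effects shift the balance in data and simulation alike, they largely cancel in the double ratio, which is why the double ratio rather than the balance itself is interpreted as a response mismodelling.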
Calibrated anti-$k_t$ $R = 0.4$ jets constructed from electromagnetic-scale topo-clusters are used to veto additional radiation. These jets are required to be separated by $\Delta R > 1.4$ from the large-R jet whose response is being probed ($J_1$), which ensures that there is no overlap. Such small-R jets with $p_{\rm T} < 60$ GeV must also satisfy a requirement on the jet vertex tagger (JVT) [54], which is designed to reject additional jets produced by pile-up interactions using information from the inner detector. The 2 → 2 topology selection only accepts events in which every such small-R jet has $p_{\rm T} < \max(0.1\, p_{\rm T}^{\rm ref}, 15~{\rm GeV})$ and the ∆φ between the large-R jet and Z boson is greater than 2.8. A summary of the event selection is presented in Table 2. This table also reports variations associated with each criterion, performed by redoing the full analysis for each such variation and taking the difference between the varied and nominal results as the systematic uncertainty.

Table 2: Summary of the 2 → 2 topology selection and systematic variations considered for the Z+jet direct balance analysis. The labels $J_i$ refer to the $i$th leading large-R jet, and $j_i$ to the $i$th leading small-R jet that fulfils $\Delta R(J_1, j_i) > 1.4$.

Table 2 columns: Variable; Nominal Selection; Up Variation; Down Variation.

Measurements of $R_{\rm DB}$ are carried out separately in the electron and muon channels. They are found to be consistent and thus combined to provide a single measurement of the JES. The average momentum balance in Z+jet events after this combination is shown in Figure 7. The balance is found to be consistently below unity as a function of $p_{\rm T}^{\rm ref}$. The ratio of the predicted balance to the measured balance is consistently 1-4% above unity. The uncertainties associated with this measurement are shown in Figure 8, where modelling systematic and statistical uncertainties are the dominant sources of uncertainty over the $p_{\rm T}$ range considered.

Figure 8: […] These uncertainties are also discussed in the context of small-R jets in Ref. [9]. The lines are obtained by smoothing a binned representation of these uncertainties using a sliding Gaussian kernel.

γ+jet balance
The large-R jet energy scale can be measured using the γ+jet final state (Figure 2(b)). This method exploits the fact that the energy of photons is measured more precisely than that of jets. As the cross-section for this process is larger than that for Z+jet production, this balance technique probes higher large-R jet $p_{\rm T}$. The γ+jet method is based on the balance between photons and large-R jets, using the ratio $R_{\rm DB}$ defined in Eq. (3), with the photon taking the place of the Z boson as the reference object. The double ratio $R_{\rm DB}^{\rm data}/R_{\rm DB}^{\rm MC}$ measures any residual modelling effects in the jet energy scale calibration. If the reference photon is well measured experimentally and the γ+jet events are correctly modelled in simulation, any deviation from unity in the double ratio can be attributed to a mismodelling of the jet response in the Monte Carlo simulation.
Events are selected using the lowest unprescaled single-photon trigger. The offline selection requires the presence of a photon satisfying the 'tight' identification and isolation requirements [59, 60] with $E_{\rm T} > 140$ GeV. This criterion ensures full trigger efficiency. As in the case of Z+jet balance (Section 5.2), the presence of significant additional radiation in the event invalidates the assumption of a balanced topology. Events are therefore vetoed if a reconstructed, calibrated $R = 0.4$ jet built from electromagnetic-scale topo-clusters has $p_{\rm T} > \max(0.1\, p_{\rm T}^{\rm ref}, 15~{\rm GeV})$. Small-R jets with $p_{\rm T} < 60$ GeV must also satisfy a JVT requirement. Photons must be separated from reconstructed large-R jets by $\Delta\phi(J, \gamma) > 2.8$. The simulation calibration and η-intercalibration described in Sections 4 and 5.1 are applied to the large-R jets studied here.
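The topology requirements shared by the Z+jet and γ+jet analyses can be written as a short predicate. This is a sketch, not the analysis code: it assumes the list of small-R jets has already been restricted to jets that pass the JVT requirement and do not overlap with the large-R jet, and the function and parameter names are hypothetical.

```python
def passes_topology_selection(ref_pt, delta_phi, small_jet_pts,
                              min_delta_phi=2.8, rel_cut=0.1, abs_cut=15.0):
    """Return True if the event looks like a balanced 2 -> 2 topology.

    ref_pt        -- reference object pT in GeV (Z boson or photon)
    delta_phi     -- azimuthal angle between large-R jet and reference object
    small_jet_pts -- pT values in GeV of additional small-R jets (pre-filtered)
    """
    if abs(delta_phi) <= min_delta_phi:
        return False
    # An event is vetoed if any additional jet exceeds max(0.1 * ref_pt, 15 GeV).
    veto_threshold = max(rel_cut * ref_pt, abs_cut)
    return all(pt <= veto_threshold for pt in small_jet_pts)
```

Exposing the thresholds as keyword arguments makes it straightforward to rerun the selection with the up and down variations used to evaluate the systematic uncertainties.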
A photon purity correction is applied to the mean balance results in data to correct for contamination from misidentified jets or electrons that may skew the nominal $p_{\rm T}$ balance. The contamination of the photon sample by fakes is derived from data using the double-sideband, or ABCD, method [61, 62] in the plane spanned by the photon isolation$^{2}$ and the photon identification measure.$^{3}$ The purity correction results in a shift of the relative $R_{\rm DB}$ value between data and simulation of about 2%.
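The idea behind the double-sideband estimate can be sketched in a few lines. The sketch assumes the textbook form of the method: region A is the signal region (identified and isolated), B and C each invert one of the two requirements, D inverts both, the two variables are uncorrelated for background, and signal leakage into the control regions is neglected. The region labels and the function name are assumptions of this example, and the analysis itself may apply further corrections.

```python
def abcd_purity(n_a, n_b, n_c, n_d):
    """Estimate the signal purity in region A with the ABCD method.

    n_a -- events passing identification and isolation (signal region)
    n_b -- events passing identification, failing isolation
    n_c -- events failing identification, passing isolation
    n_d -- events failing both requirements
    """
    if n_a <= 0 or n_d <= 0:
        raise ValueError("regions A and D must be populated")
    # For uncorrelated variables the background obeys N_A / N_B = N_C / N_D.
    background_in_a = n_b * n_c / n_d
    return 1.0 - background_in_a / n_a

# Illustrative yields only.
purity = abcd_purity(n_a=1000, n_b=50, n_c=100, n_d=100)
```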
In Figure 9 the result is shown as a function of the reference p T for large-R jets in the region |η| < 0.8. The ratio of the predicted response in the simulation to the measured response is shown in the inset below the main panel. As already observed in Section 5.2, the ratio of simulation to data is above unity over the whole p T range. These results are included in the in situ calibration that corrects the jet energy response in data.
The uniformity of the large-R jet response across the detector geometry is shown in Figure 10, as a validation of the η-intercalibration procedure (Section 5.1). The relative response across the detector is constant and well behaved.
There are three main categories of systematic uncertainties in the $R_{\rm DB}$ measurement: those related to the modelling of additional QCD radiation which affects the balance, uncertainties associated with the photons [63, 64], and effects due to the presence of pile-up jets. The effects of extra radiation on the balance are assessed by varying the topological selections and the overlap removal as described in Table 3. Repeating the analysis separately using ∆φ(J, j) > 1.2 and ∆φ(J, j) > 1.6 produces a negligible systematic shift relative to the nominal result. The effects of the photon measurement are assessed by varying the energy scale and resolution of the photon calibration, as well as by varying the measured photon purity in the purity correction. The effects of pile-up jets on the calibration are estimated by varying the JVT selection threshold for the small-R jets. Lastly, the analysis is repeated with Sherpa 2.1 MC samples, in place of the nominal Pythia 8 samples, to assess the modelling uncertainty. As shown in Figure 11, the overall combined systematic and statistical uncertainty is approximately 1% for the $p_{\rm T}$ range from 150 to 880 GeV. The photon energy scale uncertainty is the dominant source over the entire $p_{\rm T}$ range.

$^{2}$ The calorimeter isolation variable $E_{\rm T}^{\rm iso}$ is defined as the sum of the $E_{\rm T}$ of topological clusters deposited in a cone of size $\Delta R = 0.4$ around the photon candidate, excluding an area of size $\Delta\eta \times \Delta\phi = 0.125 \times 0.175$ centred on the photon cluster and subtracting the expected photon energy deposit outside of the excluded area. Fluctuations in the ambient transverse energy of the event are corrected for; the typical size of this correction is 2 GeV in the central region.

$^{3}$ The photon identification decision is based on a set of shower shape variables computed from energy depositions in the first and second layers of the electromagnetic calorimeter and from leakage in the hadronic calorimeter.
Table 3: Summary of the selection and systematic variations considered for the γ+jet direct balance analysis. The label $J_1$ refers to the leading large-R jet and $j_1$ to the leading small-R jet that fulfils $\Delta R(J_1, j) > 1.4$.

Table 3 columns: Variable; Nominal Selection; Up Variation; Down Variation.

Figure 10: The momentum balance $R_{\rm DB}$ extracted from γ+jet balance distributions in data and simulation as a function of the large-R jet detector pseudorapidity $\eta_{\rm det}$. The ratio of the results obtained from the nominal Pythia simulation to the results from data is shown in the bottom panel. The ratio of Pythia to Sherpa results, taken as a systematic uncertainty associated with modelling, is included in the shaded band in the ratio panel, which also includes statistical and systematic uncertainties from other sources.

Multijet balance
The Z+jet and γ+jet techniques provide precise constraints on the jet energy scale for jets with p T up to 1 TeV. The energy scale of higher-p T large-R jets is measured using multijet events. A schematic representation of the event topology used in this method is shown in Figure 2(c). The multijet balance (MJB) method takes advantage of events where an energetic large-R jet is balanced against a system that consists of multiple lower-p T jets.
For the calibration of large-R jets the reference $p_{\rm T}^{\rm recoil}$ is obtained as the four-vector sum of calibrated small-R anti-$k_t$ jets. The transverse momentum balance is
$$R_{\rm MJB} = \left\langle \frac{p_{\rm T}^{J}}{p_{\rm T}^{\rm recoil}} \right\rangle,$$
where $p_{\rm T}^{J}$ is the transverse momentum of the leading large-R jet and $p_{\rm T}^{\rm recoil}$ is the magnitude of the vectorial sum of the transverse momenta of the recoil system of small-R jets. The average value of the ratio is taken to be the mean value of a Gaussian fit. The value of $R_{\rm MJB}$ is measured in data and determined in simulation in several bins of $p_{\rm T}^{\rm recoil}$. The data-to-simulation double ratio $R_{\rm MJB}^{\rm data}/R_{\rm MJB}^{\rm MC}$ allows estimation of the response for high-$p_{\rm T}$ jets.
Events are selected using single small-R jet triggers. Bins of p recoil T are defined to correspond to a given fully efficient single small-R jet trigger. The triggers used for 200 GeV < p recoil T < 550 GeV are prescaled, whereas an unprescaled jet trigger is used for p recoil T > 550 GeV.
The event selection is summarized in Table 4. For small-R jets with p T < 60 GeV within |η| < 2.4, the JVT selection is applied to suppress pile-up jets. The large-R probe jet is required to have |η det | < 0.8, while the small-R jets that constitute the recoil system are required to have |η det | < 2.8 and p T > 25 GeV. To select events with multijet recoil systems, the leading jet in the recoil system (j 1 ) is allowed to have no more than 80% of the total transverse momentum of the recoil system. This selection ensures that the recoil system consists of several jets with lower p T than the large-R jet, which are each well-calibrated by small-R jet in situ techniques [9]. The angle α in the azimuthal plane between the leading large-R jet and the vector defining the recoil system is required to satisfy |α − π| < 0.3. The ∆R distance β between the leading large-R jet and the nearest small-R jet from the recoil system is required to be greater than 1.5. The simulation calibration and η-intercalibration described in Sections 4 and 5.1 are applied to the large-R jets studied using this technique.
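The recoil construction and the nominal selection can be sketched for a single event as follows. This is an illustration under simplifying assumptions: jets are massless objects described by ($p_{\rm T}$, η, φ), the recoil is the vector sum in the transverse plane only, and pile-up suppression is assumed to have been applied already. The function names are hypothetical.

```python
import math

def delta_phi(phi1, phi2):
    """Azimuthal separation wrapped into [0, pi]."""
    d = abs(phi1 - phi2) % (2.0 * math.pi)
    return 2.0 * math.pi - d if d > math.pi else d

def multijet_balance(large_jet, small_jets):
    """Return pT(J) / pT(recoil) for one event, or None if the selection fails.

    large_jet and each entry of small_jets are (pt, eta, phi) tuples, pT in GeV.
    """
    pt_j, eta_j, phi_j = large_jet
    recoil = [j for j in small_jets if j[0] > 25.0 and abs(j[1]) < 2.8]
    if abs(eta_j) >= 0.8 or not recoil:
        return None
    px = sum(pt * math.cos(phi) for pt, _, phi in recoil)
    py = sum(pt * math.sin(phi) for pt, _, phi in recoil)
    pt_recoil = math.hypot(px, py)
    if pt_recoil == 0.0:
        return None
    alpha = delta_phi(phi_j, math.atan2(py, px))
    beta = min(math.hypot(eta - eta_j, delta_phi(phi, phi_j)) for _, eta, phi in recoil)
    leading_fraction = max(pt for pt, _, _ in recoil) / pt_recoil
    if abs(alpha - math.pi) >= 0.3 or beta <= 1.5 or leading_fraction > 0.8:
        return None
    return pt_j / pt_recoil

# Illustrative event: one large-R jet recoiling against two small-R jets.
balance = multijet_balance((1000.0, 0.0, 0.0),
                           [(600.0, 0.5, math.pi + 0.2), (450.0, -0.5, math.pi - 0.25)])
```

The limit on the leading-jet fraction is what distinguishes this topology from a dijet event: a single recoiling jet carries the full recoil momentum and is therefore rejected.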

Table 4 columns: Variable; Nominal Selection; Up Variation; Down Variation. Recoverable rows:
Separation angle (α): […]
Recoil system minimum $p_{\rm T}$: 25 GeV; 30 GeV; 20 GeV

Figure 12 shows the distribution of $R_{\rm MJB}$ as a function of the large-R jet $p_{\rm T}$. The balance in data decreases from approximately 1.01 at $p_{\rm T} = 300$ GeV to about 0.99 for jets with $p_{\rm T} = 2$ TeV. The simulation shows a similar downward trend. The response in simulations is 2% higher than in data, consistent with the findings of the other methods where they overlap.
The total uncertainty in the $R_{\rm MJB}$ measurement is approximately ±2% or lower for $p_{\rm T} < 2$ TeV. The uncertainty in the energy scale of the recoil jets, as determined by the small-R in situ procedure, is propagated through the large-R MJB procedure. Uncertainties associated with high-$p_{\rm T}$ jets in the recoil system which lie beyond the region covered by the $R = 0.4$ in situ analyses are derived from measurements of the calorimeter response to isolated single charged particles, which are also propagated through this large-R jet analysis to provide coverage at the highest values of jet $p_{\rm T}$ (> 1 TeV) [65]. No assumption is made about the flavour of the recoil jets (originating from a gluon, a light quark, or a heavy-flavour quark). This lack of knowledge is a source of systematic uncertainty. The uncertainty in the multijet-balance observable due to the jet flavour response is evaluated using a correlated propagation of the small-R jet flavour response uncertainties, i.e. all jets are shifted simultaneously.
In addition to the jet calibration and uncertainties in the reference scale, the event selection criteria and the modelling in the event generators directly affect the $p_{\rm T}$ balance used to obtain the multijet-balance results. The impact of the event selection criteria is investigated by shifting each event selection criterion up and down by a specified amount and observing the change in the multijet-balance variable. Using an approach to systematic uncertainties similar to that in the small-R in situ analysis, the transverse momentum threshold for recoil jets is shifted by ±5 GeV, the requirement on $p_{\rm T}^{j_1}/p_{\rm T}^{\rm recoil}$ is shifted by ±0.1, the angle α is shifted by ±0.1, and β is shifted by ±0.4. The uncertainty due to modelling of multijet events in simulations is estimated from the largest difference between the multijet-balance results obtained from the nominal Pythia 8 simulation and those obtained from Sherpa v2.1 and Herwig 7. Figure 13 shows the breakdown of the fractional uncertainties in the jet energy scale derived from this method. Various uncertainties propagated from the reference jet system dominate the measurement across the entire $p_{\rm T}$ range.

In situ jet mass calibration
In this section, two methods to derive an in situ calibration for the large-R jet mass are presented. The first method, known as the R trk method, relies on the tracker to provide an independent measurement of the jet mass scale and its associated uncertainty. The second method, known as forward folding, fits the mass peaks and jet mass response of the W boson and top quark to measure the relative energy and mass scales and resolutions between data and simulations. Both measurements are performed after applying the in situ calibration for the energy scale, which also affects the jet mass scale. The results in this section are combined into a global jet mass calibration, detailed in Section 8.

Calorimeter-to-tracker response ratios
The calorimeter-to-tracker response double-ratio method (or R trk method) is built around the fact that the ATLAS detector provides two independent measurements of the properties of the same jet from the calorimeter and the tracker [3]. Jets formed from inner detector tracks only take into account the hits from their charged-particle constituents. Calibrated jets formed from energy depositions within the calorimeter provide a measure of the properties from the full shower. The average calorimeter-to-track jet response is proportional to the average calorimeter-to-truth jet response. Therefore, a comparison of the double ratio of R trk in simulations and data provides a way to validate the modelling of large-R jet properties in situ.
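A minimal sketch of the double ratio follows. It assumes that $R_{\rm trk}$ is the mean of the per-jet ratio of the calorimeter quantity to the track quantity ($p_{\rm T}$ or mass), and that the deviation of the simulation-to-data double ratio from unity is what is quoted as a scale uncertainty. The function names and the input numbers are hypothetical.

```python
from statistics import fmean

def r_trk(jets):
    """Mean calorimeter-to-track ratio; jets are (calo_value, track_value) pairs."""
    return fmean(calo / track for calo, track in jets)

def double_ratio_uncertainty(data_jets, mc_jets):
    """Return the simulation/data double ratio and its deviation from unity."""
    ratio = r_trk(mc_jets) / r_trk(data_jets)
    return ratio, abs(1.0 - ratio)

# Illustrative (calorimeter pT, track pT) pairs in GeV.
data_jets = [(300.0, 200.0), (600.0, 400.0)]
mc_jets = [(306.0, 200.0), (612.0, 400.0)]
ratio, uncertainty = double_ratio_uncertainty(data_jets, mc_jets)
```

The method relies on the track measurement being modelled well enough that any data-simulation difference in the ratio can be ascribed to the calorimeter, which is why tracking variations enter as separate uncertainty sources.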
The ratio of R trk values determined in data and simulations should be equal to unity for well-modelled observables. Any deviation from this expectation can be taken as a scale uncertainty in the measurement. This method is versatile and allows the determination of uncertainties for several variables, such as the p T , mass, and substructure information of large-R jets. Moreover, the dijets process provides a very large sample, such that the analysis can be performed in a large number of p T and mass or m/p T regions. Figure 14 shows R trk as a function of the large-R jet p T in dijet events for data and several simulation samples. The maximum spread between the two generators and three tracking variations that assume three different types of mismodelling (resolution [66], efficiency within dense environments [67], and alignment [68]) is about 8%. A steady increase in the calorimeter-to-track jet response R trk with increasing large-R jet p T is observed, going well beyond the expected ratio of the total and charged transverse momenta of a jet, caused by inefficiencies in the tracker response at high jet p T . Figure 15 shows a breakdown of the uncertainties in the large-R jet p T derived from this method for the transverse momentum for large-R jets with values of m/p T ≈ 0.2. The main source of uncertainty across the entire p T range originates from differences between data and the nominal Monte Carlo generator considered in this study. As this uncertainty was expected to be large, the R trk method is neither included in the in situ JES combination nor used as a source of systematic uncertainty for the JES of large-R jets. Rather, the R trk p T results are used as an independent cross-check to validate the JES calibration techniques.
The same method is also applied to the large-R jet calorimeter mass, and is shown in Figure 16. The largest difference between the considered generators is ∼2-3%. Figure 17 shows the various uncertainties in the large-R jet mass derived from the $R_{\rm trk}$ mass response for large-R jets with $m/p_{\rm T} = 0.2$. Again, the main source of uncertainty originates from differences between data and the nominal simulation. The $R_{\rm trk}$ method can also be used to study the topology dependence of the response modelling. The double ratio is constructed in two event samples with different jet flavours (jets originating from light quarks or gluons and jets containing a hadronic top quark decay). The dijet sample used for Figure 14 is dominated by gluon jets at low transverse momenta, while at higher momenta the fraction of light-quark jets in the sample increases. The $t\bar{t}$ sample of Section 6.2 is enriched in large-R jets that contain the complete decay of a high-$p_{\rm T}$ object (either a top quark or W boson). In Figure 18 the double ratios of the two samples are compared for jet $p_{\rm T}$ and jet mass. The jets in the samples correspond to the same pseudorapidity range $|\eta| < 2.0$ and the same $p_{\rm T}$ and jet mass intervals. In both samples, the double ratio is constructed with the nominal simulation events, which rely on Pythia 8 for hadronization. As systematic uncertainties are expected to partially cancel out, only statistical uncertainties are shown.
There is a mild tension between the double-ratio results from the two samples. The double ratio in the $t\bar{t}$ sample is systematically somewhat higher than the equivalent result in the dijet sample. The difference is typically 1% or less, except in the first bin of the double ratio for jet mass. This is significant compared to the statistical uncertainties but is small in comparison with the modelling uncertainties of the $R_{\rm trk}$ method. Some properties of these two jet populations differ, such as the distribution of their $m/p_{\rm T}$ and their flavour composition, and so it is not expected that the modelling uncertainties will cancel out exactly. No additional uncertainty is assigned to account for the topology dependence.

Figure 15: The total uncertainty in the relative jet energy scale in data and simulations associated with the $R_{\rm trk}$ method is plotted as a function of jet transverse momentum $p_{\rm T}$. The large-R jet $p_{\rm T}$ is corrected using the simulation calibration, η-intercalibration, and a combination of in situ direct balance techniques. The contributions from several sources are indicated. The baseline uncertainty represents the deviation of the double ratio from unity for the baseline simulations. The lines shown are obtained by smoothing a binned representation of these uncertainties using a sliding Gaussian kernel.

Figure 16: Measurement of $R_{\rm trk}^{m}$ as a function of the large-R jet transverse momentum $p_{\rm T}$ for large-R jets with $m/p_{\rm T} = 0.2$. The large-R jet $p_{\rm T}$ is corrected using the simulation calibration, η-intercalibration, and a combination of in situ direct balance techniques. Data are compared with three generators and with three tracking variations for the default generator Pythia 8 (shown as a band around these points). The double ratio of $R_{\rm trk}^{m}$ measured in simulations and data is shown in the lower panel.
Figure 17: The total uncertainty in the relative jet mass scale between data and simulation associated with the $R_{\rm trk}$ method is plotted as a function of jet transverse momentum $p_{\rm T}$ for large-R jets with $m/p_{\rm T} = 0.2$. The large-R jet $p_{\rm T}$ is corrected using the simulation calibration, η-intercalibration, and a combination of in situ direct balance techniques. The contributions from several sources are indicated. The baseline uncertainty represents the deviation of the double ratio from unity for the baseline simulations. The lines shown are obtained by smoothing a binned representation of these uncertainties using a sliding Gaussian kernel.

Figure 18: The simulation/data ratio of $R_{\rm trk}$ for (a) large-R jet $p_{\rm T}$ and (b) calorimeter mass as a function of the large-R jet transverse momentum $p_{\rm T}$. Two sets of results are derived from a dijet sample, dominated by light-quark and gluon jets, and a $t\bar{t}$ sample, where the large-R jets contain a boosted W boson or top quark. The large-R jet $p_{\rm T}$ is corrected using the simulation calibration, η-intercalibration, and a combination of in situ direct balance techniques. The jets in both samples correspond to the same pseudorapidity range $|\eta| < 2.0$ and the same $p_{\rm T}$ and jet mass intervals. The double ratio is constructed with the nominal Pythia 8 samples for dijet events and Powheg+Pythia 8 samples for the $t\bar{t}$ sample. The error bars indicate statistical uncertainties.

Forward folding
A high-purity signal sample of large-R jets with high-p T , hadronically decaying W bosons and top quarks is obtained by selecting tt events in the lepton+jets final state, where a hadronically decaying top quark balances one which decays to a leptonically decaying W boson and b-quark. This sample is used to measure the response for jets in signal-like topologies which contain jets consisting of multiple regions of high energy density [69, 70]. The jet mass response is determined by fits to the W boson and top quark mass peaks in the large-R jet invariant mass distribution of the hadronically decaying top quark candidate.
The event selection is based on the ATLAS search for $t\bar{t}$ resonances [71] and is summarized in Table 5.

The large-R jet mass distribution of the highest-$p_{\rm T}$ large-R jet in the hemisphere opposite to the charged lepton is shown in Figure 19 for two categories of events, and for both the calorimeter-only and track-assisted jet masses. For large-R jets with intermediate $p_{\rm T}$ (200 GeV $< p_{\rm T} <$ 350 GeV), in Figures 19(a) and 19(c), the decay products of the hadronic W boson are captured in a single large-R jet. For high-$p_{\rm T}$ jets with $p_{\rm T} > 350$ GeV, in Figures 19(b) and 19(d), the complete hadronic top decay is captured in the main large-R jet. The high-$p_{\rm T}$ W boson and top quark topologies are confirmed by, respectively, vetoing or requiring a b-tagged small-R jet that overlaps with the large-R jet.
The track-assisted mass (Eq. (1)) is obtained by scaling the invariant mass of the charged-particle jet by the ratio of the p T of the calorimeter and charged-particle jets. The resulting jet mass distributions in the W boson and top quark large-R jet samples are presented in Figures 19(c) and 19(d). The selection for this second set of plots is entirely based on the properties of the matched calorimeter jet, such that plots (a) and (c) and plots (b) and (d) are populated by the same jets. The track-assisted mass peaks in (c) and (d) are slightly broader than the calorimeter-based mass peaks in (a) and (b) for large-R jets with a large invariant mass and relatively low p T .
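The scaling described above amounts to a single line of arithmetic, sketched here with a hypothetical function name and illustrative inputs.

```python
def track_assisted_mass(track_mass, calo_pt, track_pt):
    """Track-jet mass scaled by the ratio of calorimeter to track-jet pT."""
    if track_pt <= 0.0:
        raise ValueError("track-jet pT must be positive")
    return track_mass * calo_pt / track_pt

# A track jet with m = 50 GeV and pT = 250 GeV matched to a 400 GeV calorimeter jet.
m_ta = track_assisted_mass(track_mass=50.0, calo_pt=400.0, track_pt=250.0)
```

The scaling restores, on average, the neutral component that the tracker does not see, while the angular structure of the jet is still taken from the tracks.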
The position and shape of the mass peaks provide information about the large-R jet mass scale and resolution. Values for the ratio of the response in data and simulations ($s = R_m^{\rm data}/R_m^{\rm MC}$) and the ratio of the resolution in data and simulations ($r = \sigma_m^{\rm data}/\sigma_m^{\rm MC}$) are extracted from the jet mass spectrum. These two parameters are extracted simultaneously in a fit referred to as forward folding [10]. This method produces simulation-based predictions of the jet mass spectrum with variable response and resolution. This is achieved by folding particle-level jets with a response function. The default response function is taken from the nominal simulations. The predicted detector-level jet mass spectrum for arbitrary values of s and r is obtained by modifying the response function by
$$m^{\rm reco} \mapsto s\, m^{\rm reco} + \left(m^{\rm reco} - m^{\rm truth} R_m\right)(r - s),$$
where $m^{\rm reco}$ is the detector-level large-R jet mass, $m^{\rm truth}$ is the particle-level jet mass, and $R_m$ is the large-R jet mass response. The value of $R_m$ is obtained from simulations, as discussed in Section 4. Typical values of $R_m$ are in the range 0.8-1.5, depending on jet $p_{\rm T}$ and mass. The forward-folding procedure does not require the response to be Gaussian. The scale factors s and r also modify the non-Gaussian tails of the response function, if these are present in the simulations.
The prediction from simulation is fit to the data by minimizing the χ 2 built with the predicted and observed distributions. The best-fit values for s and r are taken as the data-to-simulation scale factors for the large-R jet mass response and jet mass resolution. This method has the advantage that the response for the tt events and events from other Standard Model processes is varied consistently. It was first applied to 2012 data [10]. Further details of the forward-folding procedure are in Refs. [43,74].
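A toy version of the fit can make the procedure concrete. The sketch below is not the analysis implementation: it assumes that the folding replaces each simulated detector-level mass by `s * m_reco + (m_reco - m_truth * R_m) * (r - s)`, which scales the average response by s and the fluctuation about it by r, and it replaces the minimiser by a coarse grid scan. The toy spectrum, the binning, and all names are hypothetical.

```python
import random

def fold(m_reco, m_truth, response, s, r):
    """Scale the mean response by s and the fluctuation about it by r."""
    return s * m_reco + (m_reco - m_truth * response) * (r - s)

def histogram(values, edges):
    """Count values in the half-open bins [edges[i], edges[i+1])."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            if edges[i] <= v < edges[i + 1]:
                counts[i] += 1
                break
    return counts

def chi2(observed, predicted):
    """Chi-square with the variance taken from the prediction (at least 1)."""
    return sum((o - p) ** 2 / max(p, 1.0) for o, p in zip(observed, predicted))

def forward_fold_fit(data_masses, mc_events, response, edges, s_grid, r_grid):
    """Grid scan returning the (s, r) pair with the smallest chi-square."""
    observed = histogram(data_masses, edges)
    best = None
    for s in s_grid:
        for r in r_grid:
            folded = [fold(m, t, response, s, r) for m, t in mc_events]
            value = chi2(observed, histogram(folded, edges))
            if best is None or value < best[0]:
                best = (value, s, r)
    return best[1], best[2]

# Toy closure test: pseudo-data are the simulated jets folded with s = 1.02, r = 1.10.
rng = random.Random(1)
M_TRUTH, RESPONSE = 80.4, 1.0
mc = [(M_TRUTH * RESPONSE + rng.gauss(0.0, 8.0), M_TRUTH) for _ in range(2000)]
pseudo_data = [fold(m, t, RESPONSE, 1.02, 1.10) for m, t in mc]
edges = [50.0 + 2.0 * i for i in range(31)]
best_s, best_r = forward_fold_fit(pseudo_data, mc, RESPONSE, edges,
                                  s_grid=[0.98, 1.00, 1.02, 1.04],
                                  r_grid=[0.90, 1.00, 1.10, 1.20])
```

Because the transformation acts jet by jet on the simulated sample, no functional form for the mass peak has to be assumed, and non-Gaussian tails are carried through the fit automatically.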
The results of the fits are shown in Figure 20. The data sample is divided into several $p_{\rm T}$ bins. The W boson peak is fitted in two intervals: 200 GeV $< p_{\rm T} <$ 250 GeV and 250 GeV $< p_{\rm T} <$ 350 GeV. The top quark peak is fitted for $p_{\rm T}$ between 350 and 500 GeV and between 500 GeV and 1 TeV. The small error bars on the points represent the statistical uncertainty, and the larger error bars represent the total uncertainty. The dominant systematic effect is expected to be due to the modelling of top quark pair production, estimated by repeating the analysis with Powheg+Herwig 7, Sherpa, and several variations of the generator settings that regulate the probability of hard initial- and final-state radiation.
An in situ calibration is also derived for the track-assisted mass in a completely analogous fashion. The JMS and JMR results are shown with open circles in Figure 20. The statistical and systematic uncertainties are indicated on the data points. The systematic uncertainties are dominated by modelling uncertainties and are expected to be strongly correlated between the two measurements. The in situ scales of the two mass measurements are found to be within 1% for all points and within 0.5% for three out of four. As the track-assisted mass is primarily sensitive to the p T response of the calorimeter, this level of agreement implies that the p T and mass scales are closely connected for these high-mass jets with relatively low p T .
Measurements of the p T response of high-p T W bosons or top quarks can be obtained directly by fitting the balance distribution of the two top quark candidates. This provides a cross-check of the direct balance methods discussed previously in Sections 5.1-5.4 in a topology with a very different radiation pattern. The reference system is formed by the b-jet, the charged lepton, and the neutrino from the semileptonic top quark decay. It is reconstructed by adding the four-vectors of the charged lepton, the leading (and possibly b-tagged) small-R jet in a cone of size ∆R = 1.5 around the charged lepton, and the neutrino [75].
The transverse momentum of the neutrino is inferred by assigning the $E_{\rm T}^{\rm miss}$ to the neutrino $p_{\rm T}$, and its $p_z$ can be reconstructed using a W-mass constraint (although this does not affect the balance measurement). The resulting balance distribution of the probe jet $p_{\rm T}$ and the recoiling semileptonic top quark decay system has a distinctive peak around 1. The peak position is sensitive to the large-R jet energy scale, and its width is sensitive to the resolution. Measurements of the relative jet energy scale and resolution obtained by fitting the balance distribution with the same forward-folding technique are shown in Figure 21, after the application of the in situ JES calibration derived from light-quark and gluon jets (Section 5). The results are compatible with unit JES within the precision of the measurement. This provides another confirmation that the Monte Carlo modelling of the response of high-$p_{\rm T}$, hadronically decaying W bosons or top quarks is adequate within 2-3%, and that a calibration derived from jets without hard substructure is applicable to topologies with hard substructure.

Figure 19: […] The large-R jet transverse momentum $p_{\rm T}$ is corrected using the simulation calibration, η-intercalibration, and a combination of in situ direct balance techniques. The template estimated from simulations is rescaled to match the observed yield. The lower panels display the data-to-simulation ratio. The error bars on the data represent the statistical uncertainty. The dashed uncertainty band on the simulation template includes the systematic uncertainties due to signal and detector modelling.

Figure 20: Summary of the in situ measurements of the large-R jet mass response in $t\bar{t}$ events with a lepton+jets final state as a function of the large-R jet transverse momentum $p_{\rm T}$. The large-R jet $p_{\rm T}$ is corrected using the simulation calibration, η-intercalibration, and a combination of in situ direct balance techniques. The closed circles correspond to the JMS and JMR of trimmed large-R jets reconstructed from calorimeter clusters. The open circles represent the equivalent result for the track-assisted mass. The dashed lines, corresponding to ±1% for the JMS and ±10% for the JMR, are drawn for reference. The results in the first two $p_{\rm T}$ bins (200 GeV $< p_{\rm T} <$ 250 GeV and 250 GeV $< p_{\rm T} <$ 350 GeV) correspond to a sample of high-$p_{\rm T}$ W bosons, and the highest two bins (350 GeV $< p_{\rm T} <$ 500 GeV and 500 GeV $< p_{\rm T} <$ 1 TeV) correspond to high-$p_{\rm T}$ top quarks. In each subsample, the JMS and JMR are extracted simultaneously in a two-parameter fit to the mass distribution. The statistical and total uncertainties are indicated with the small and large error bars on the data points, respectively.

Figure 21: […] The dashed lines, corresponding to ±1% for the JES and ±10% for the JER, are drawn for reference. The results in the first two $p_{\rm T}$ bins (200 GeV $< p_{\rm T} <$ 250 GeV and 250 GeV $< p_{\rm T} <$ 350 GeV) correspond to a sample of high-$p_{\rm T}$ W bosons, and the highest two bins correspond to high-$p_{\rm T}$ top quarks.

Measurement of the large-R jet p T resolution
The in situ measurement of the ATLAS jet $p_{\rm T}$ resolution exploits the momentum balance between the leading and sub-leading large-$R$ jets in dijet events. The measurement follows the event selection criteria outlined for the $\eta$-intercalibration in Section 5.1, including the trigger strategy. The simulation calibration and $\eta$-intercalibration described in Sections 4 and 5.1 are applied to the large-$R$ jets studied here, and the large-$R$ jet $p_{\rm T}$ is also corrected using the combination of the in situ direct balance techniques discussed in Sections 5.2, 5.3, and 5.4, which is presented in Section 8.
The asymmetry distribution of Section 5.1 is studied in dijet events in bins of the dijet system $p_{\rm T}^{\rm avg}$ and the probe large-$R$ jet $\eta_{\rm det}$. The width of the asymmetry distribution depends on the resolution of the jet $p_{\rm T}$ measurement and on the intrinsic particle-level width, which arises due to balance fluctuations and out-of-cone effects. Since the latter effect is uncorrelated with the detector response, the component of the asymmetry width due to the detector resolution can be determined by subtracting in quadrature the asymmetry width of particle-level ('truth-level') jets from that of reconstructed jets, giving
$$\sigma_{A,{\rm det}} = \sigma_{A,{\rm reco}} \ominus \sigma_{A,{\rm truth}} = \sqrt{\sigma_{A,{\rm reco}}^{2} - \sigma_{A,{\rm truth}}^{2}}.$$
The jet energy resolution is measured in two $\eta_{\rm det}$ bins: the central reference region $|\eta_{\rm det}| < 0.8$, denoted "ref", and a forward region $0.8 < |\eta_{\rm det}| < 2.0$, denoted "fwd". If both large-$R$ jets are within the central reference region, they have the same $p_{\rm T}$ resolution. In this case, the determination of the probe jet is arbitrary, and the assignment proceeds using a random-number generator. Since both jets contribute the same amount to the asymmetry distribution, the relative jet-$p_{\rm T}$ resolution of the reference region is defined by
$$\left(\frac{\sigma(p_{\rm T})}{p_{\rm T}}\right)_{\rm ref} = \frac{\sigma_{A,{\rm det}}^{\rm ref}}{\sqrt{2}}.$$
The resolution of forward jets is extracted from the width of the asymmetry distribution in events where a central reference jet balances a forward probe jet (in the region $0.8 < |\eta_{\rm det}| < 2.0$). The result is corrected for the resolution of central jets by subtracting the asymmetry of central dijet systems, giving
$$\left(\frac{\sigma(p_{\rm T})}{p_{\rm T}}\right)_{\rm fwd} = \sigma_{A,{\rm det}}^{\rm fwd} \ominus \left(\frac{\sigma(p_{\rm T})}{p_{\rm T}}\right)_{\rm ref} = \sigma_{A,{\rm det}}^{\rm fwd} \ominus \frac{\sigma_{A,{\rm det}}^{\rm ref}}{\sqrt{2}}. \qquad (4)$$
Figure 22 shows $\sigma_A$ for reconstructed- and truth-level dijet systems as a function of $p_{\rm T}^{\rm avg}$ in two $\eta_{\rm det}$ bins, as well as for data. For each of the event generators, the width of the detector-level asymmetry is shown as a solid line, while the particle-level asymmetry is indicated by a dashed line. For forward jets, the additional correction shown in Eq. (4) is applied to account for the effect of the resolution of the large-$R$ jet within the central reference region.
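The subtraction steps described above can be illustrated with a short numerical sketch. It assumes that the asymmetry is normalized to the average $p_{\rm T}$ of the two jets, so that the squared detector-level width is the sum of the squared relative resolutions of the two jets. The function names and the toy inputs are hypothetical.

```python
import math


def quad_sub(a, b):
    """Quadrature difference a (-) b = sqrt(a^2 - b^2), clipped at zero."""
    return math.sqrt(max(a * a - b * b, 0.0))


def resolution_ref(sigma_reco, sigma_truth):
    """Relative pT resolution of central jets from a central-central sample.

    Assumes both jets have the same resolution, so the detector-level
    asymmetry width is sqrt(2) times the single-jet relative resolution.
    """
    sigma_det = quad_sub(sigma_reco, sigma_truth)
    return sigma_det / math.sqrt(2.0)


def resolution_fwd(sigma_reco_fwd, sigma_truth_fwd, res_ref):
    """Relative pT resolution of the forward probe jet.

    The particle-level width is removed first, then the contribution of the
    central reference jet is subtracted in quadrature.
    """
    sigma_det_fwd = quad_sub(sigma_reco_fwd, sigma_truth_fwd)
    return quad_sub(sigma_det_fwd, res_ref)


# Toy closure check with invented numbers: 8% central and 10% forward
# resolution, 3% particle-level asymmetry width.
true_ref, true_fwd, truth_width = 0.08, 0.10, 0.03
reco_cc = math.sqrt(2 * true_ref ** 2 + truth_width ** 2)
reco_cf = math.sqrt(true_ref ** 2 + true_fwd ** 2 + truth_width ** 2)
ref = resolution_ref(reco_cc, truth_width)
fwd = resolution_fwd(reco_cf, truth_width, ref)
```

In this toy closure test the extracted values reproduce the injected resolutions, which is the property probed by the non-closure uncertainty in the measurement.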
Following the correction for the particle-level width, the results of a fit to the asymmetry distribution obtained in data and from several event generators (Pythia 8, Herwig 7, and Sherpa 2.1) are shown in Figure 23, where the measured relative resolution $\sigma(p_{\rm T})/p_{\rm T}$ is plotted as a function of the average $p_{\rm T}$ of the two jets, $p_{\rm T}^{\rm avg}$. The correction for the particle-level resolution is estimated using the Pythia 8 sample. The measured resolution in the central region is in fair agreement with the predicted resolution. The resolution of forward jets in data and simulations is compatible within the observed uncertainties. The choice of event generator has a small effect on the resolution.
Figure 22: Width of the dijet asymmetry distribution obtained using reconstructed ($\sigma_{A,{\rm reco}}$) and particle-level jets ($\sigma_{A,{\rm truth}}$) as a function of the average jet transverse momentum $p_{\rm T}^{\rm avg}$. Results are shown (a) for events where both jets have detector pseudorapidity in the range $|\eta_{\rm det}| < 0.8$ and (b) for events where the probe jet has $0.8 < |\eta_{\rm det}| < 2.0$ and the reference jet is still within $|\eta_{\rm det}| < 0.8$. The measurement is compared with the prediction from simulations based on the three generators Pythia 8, Herwig 7, and Sherpa 2.1. An unweighted average of the three is also shown. The large-$R$ jet $p_{\rm T}$ is corrected using the simulation calibration, $\eta$-intercalibration, and a combination of in situ direct balance techniques. Statistical errors are usually smaller than the size of the marker. The resolution at the particle level is also shown as a dashed line.
In Figure 23, the relative $p_{\rm T}$ resolution obtained from the $p_{\rm T}^{\rm reco}/p_{\rm T}^{\rm true}$ response in simulation is compared with the resolution extracted from the asymmetry in simulated events. The difference between the two indicates a bias in the method and is taken as an additional uncertainty (labelled non-closure).
The total uncertainty in the determination of the JER is shown in Figure 24 as a function of the average $p_{\rm T}$ and in the two $\eta_{\rm det}$ regions. A breakdown of the uncertainties into individual sources is presented. The large-$R$ jet energy scale is varied according to its uncertainty, leading to a 10-15% variation in the measured resolution due to its impact on the asymmetry (labelled as 'JES uncertainty'). The non-closure uncertainty is found to be a nearly constant 10% effect in the central region and to be 5-10% in the forward region. The $\Delta\phi$ requirement is also varied by ±0.5, which has a small effect primarily for low-$p_{\rm T}$ jets. The modelling uncertainty is estimated as the variation of the result when using different generators for the particle-level momentum imbalance, where Pythia 8 is chosen as the nominal sample and Herwig 7 and Sherpa 2.1 are chosen as the variations.
Figure 23: Comparison of the measured jet $p_{\rm T}$ resolution with the resolution determined in simulation, averaged between different generators, as a function of the average jet $p_{\rm T}$ and in two bins of detector pseudorapidity $\eta_{\rm det}$: (a) $|\eta_{\rm det}| < 0.8$ and (b) $0.8 < |\eta_{\rm det}| < 2.0$. The large-$R$ jet $p_{\rm T}$ is corrected using the simulation calibration, $\eta$-intercalibration, and a combination of in situ direct balance techniques. The error band, drawn as a light band, represents the statistical and systematic uncertainties added in quadrature. The determination of the relative resolution using the in situ technique for an average of three simulations and their envelope is also shown as a dark band. Inconsistencies between the resolution determined using the in situ technique and the resolution determined from the response in simulation by matching particle-level jets to reco-level jets (light dotted line) are taken as an additional uncertainty in the measurement. The lines shown are obtained by smoothing a binned representation of these uncertainties using a sliding Gaussian kernel.
Figure 24: The relative uncertainty in the ratio of the jet transverse momentum $p_{\rm T}$ resolution measured in dijet events and in simulations as a function of the average jet $p_{\rm T}$ in pseudorapidity $\eta$ bins (a) $|\eta| < 0.8$ and (b) $0.8 < |\eta| < 2.0$. The large-$R$ jet $p_{\rm T}$ is corrected using the simulation calibration, $\eta$-intercalibration, and a combination of in situ direct balance techniques. Contributions from three sources are estimated separately by propagating the uncertainty in the energy scale to the measurement, by varying the $\Delta\phi$ selection, and by varying the event generator. The lines shown are obtained by smoothing a binned representation of these uncertainties using a sliding Gaussian kernel.

Combined large-R jet calibration results
The measurements of the trimmed large-R jet response relative to simulation obtained using the different in situ methods presented in Sections 5 and 6 are combined to determine the relative jet energy and mass scales over a broad range of jet transverse momenta. The combination procedure is described in detail in Ref. [76].
The data-to-simulation response ratios obtained from the $\gamma$+jet, $Z$+jet, and multijet balance methods are combined to produce a jet-$p_{\rm T}$-dependent calibration curve. The uncertainties in the $p_{\rm T}$ calibration are obtained by error propagation of the uncertainties associated with the in situ methods. A jet mass calibration is derived analogously using the jet mass response measurements provided by the forward-folding and $R_{\rm trk}$ methods.
The measurements of the $p_{\rm T}$ response are performed in bins of the jet transverse momentum (the $p_{\rm T}^{\rm ref}$ values are translated to jet $p_{\rm T}$) and evaluated inclusively in mass. The jet mass response combination is performed in bins of the jet transverse momentum and in two bins of the jet mass. The combination proceeds in three steps which take into account correlations between uncertainties and possible inconsistencies between the in situ methods:
• Simple Monte Carlo method: Pseudo-experiments are created that represent the ensemble of measurements and contain the full data-treatment chain including interpolation and averaging (described in the following steps). These pseudo-experiments are used to consistently propagate all uncertainties into the evaluation of the average. They are generated taking into account all known correlations by coherently shifting all correction factors by one standard deviation. The difference between the shifted-correction result and the nominal result provides an estimate of the propagated systematic uncertainty.
• Interpolation: The relative $p_{\rm T}$ (mass) response is defined in fine $p_{\rm T}$ bins, separately for each in situ method, using interpolating splines based on first- or second-order polynomials.
• Averaging: The actual combination is carried out using a weighted average of the in situ measurements based on a $\chi^2$ minimization. The weights take into account the statistical and systematic uncertainties, as well as correlations and differing bin sizes. The local $\chi^2$ is also useful to define the level of agreement between in situ measurements where they overlap.
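The averaging step can be sketched for a single $p_{\rm T}$ bin as follows. This is a minimal illustration, assuming that the measurements have already been interpolated to a common bin and that their covariance matrix is known; it is not the implementation of Ref. [76]. The helper names `solve` and `combine` are hypothetical. Minimizing $\chi^2 = (y - \mu)^{\rm T} C^{-1} (y - \mu)$ with respect to the common value $\mu$ gives weights proportional to the row sums of $C^{-1}$.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def combine(values, cov):
    """Chi-square weighted average of correlated measurements of one quantity.

    Returns the average, its uncertainty, the weight of each measurement and
    the chi-square of the inputs with respect to the average.
    """
    n = len(values)
    cinv_ones = solve(cov, [1.0] * n)      # C^-1 applied to a vector of ones
    norm = sum(cinv_ones)                  # 1^T C^-1 1
    weights = [w / norm for w in cinv_ones]
    mean = sum(w * v for w, v in zip(weights, values))
    resid = [v - mean for v in values]
    chi2 = sum(r * c for r, c in zip(resid, solve(cov, resid)))
    return mean, (1.0 / norm) ** 0.5, weights, chi2


# Toy example with invented numbers: two uncorrelated response ratios.
mean, err, weights, chi2 = combine([0.97, 0.98], [[1e-4, 0.0], [0.0, 4e-4]])
```

The returned weights correspond to the relative weights of the methods in the combination, and the returned $\chi^2$ to the local $\chi^2$ used to judge their agreement.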
The uncertainty sources are treated according to the Hessian formalism: each uncertainty source is fully correlated across kinematic regions (i.e. as a function of $p_{\rm T}$ and $\eta$) but is uncorrelated with other sources. Sources of uncertainty that affect both the small-$R$ and large-$R$ jet in situ calibration are treated as fully correlated. The reduced $\chi^2$ is estimated as $\chi^2/N_{\rm dof}$, where $N_{\rm dof}$ is the number of degrees of freedom (in this case, the number of combined measurements contributing to the average in a particular $p_{\rm T}$ bin). In case of disagreement between different in situ measurements, i.e. when the reduced $\chi^2$ value is larger than 1, the uncertainty sources are rescaled by $\chi^2/N_{\rm dof}$.
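The treatment of the uncertainty sources can be illustrated with the sketch below, which is an assumption-laden example rather than the procedure of the analysis. Each source is represented by the one-standard-deviation shift it induces in every measurement (zero where a measurement is unaffected), the covariance is built from these shifts, and a source is propagated by shifting all inputs coherently with the weights held fixed. The function names are hypothetical, and the rescaling factor is applied to the shifts as literally stated in the text.

```python
def build_covariance(stat, sources):
    """Covariance of n measurements from statistical and systematic inputs.

    stat    : list of statistical uncertainties (uncorrelated)
    sources : dict mapping a source name to its signed 1-sigma shifts; each
              source is fully correlated between measurements and
              uncorrelated with the other sources.
    """
    n = len(stat)
    cov = [[0.0] * n for _ in range(n)]
    for i in range(n):
        cov[i][i] = stat[i] ** 2
    for shifts in sources.values():
        for i in range(n):
            for j in range(n):
                cov[i][j] += shifts[i] * shifts[j]
    return cov


def propagate_source(weights, shifts):
    """Change of the weighted average when all inputs move coherently by the
    1-sigma shifts of a single source (weights kept at their nominal values)."""
    return sum(w * s for w, s in zip(weights, shifts))


def rescale_sources(sources, chi2, ndof):
    """Inflate the shifts by chi2/ndof when the reduced chi-square exceeds 1."""
    factor = max(1.0, chi2 / ndof)
    return {name: [factor * s for s in shifts] for name, shifts in sources.items()}


# Toy example with invented numbers for two measurements in one pT bin.
toy_sources = {"modelling": [0.005, 0.005], "selection": [0.002, -0.001]}
cov = build_covariance([0.01, 0.02], toy_sources)
shift = propagate_source([0.8, 0.2], toy_sources["modelling"])
```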
A smoothing procedure using a variable-size sliding interval with a Gaussian kernel is applied to the response ratio and its associated systematic uncertainties. This smoothing removes spikes due to statistical fluctuations in the measurements, as well as discontinuities at the first and last point in a given measurement.
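A kernel smoothing of this kind can be sketched as follows. The example assumes a Gaussian kernel whose width is a fixed fraction of the $p_{\rm T}$ value at which the smoothed curve is evaluated, which is one possible reading of "variable-size"; the function name `smooth` and the parameter `rel_width` are hypothetical and the actual kernel settings of the analysis are not specified here.

```python
import math


def smooth(pt, values, rel_width=0.15):
    """Gaussian-kernel smoothing of a binned curve.

    For each point p0 the smoothed value is the average of all inputs,
    weighted by a Gaussian in pT centred on p0 with width rel_width * p0,
    so the averaging interval grows with pT.
    """
    smoothed = []
    for p0 in pt:
        sigma = rel_width * p0
        w = [math.exp(-0.5 * ((p - p0) / sigma) ** 2) for p in pt]
        smoothed.append(sum(wi * v for wi, v in zip(w, values)) / sum(w))
    return smoothed


# Toy example with invented numbers: a single-bin spike is damped.
pt_bins = [200.0, 250.0, 300.0, 350.0, 400.0]
spiky = smooth(pt_bins, [0.97, 0.97, 1.02, 0.97, 0.97])
```

A constant input is returned unchanged, while isolated fluctuations are pulled towards the neighbouring points.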
In Figure 25, the ratio of the jet $p_{\rm T}$ response in data and simulations is shown as a function of the jet transverse momentum. Data points are shown for the $\gamma$+jet, $Z$+jet, and multijet balance methods, and the band corresponds to the result of the combination.
The relative weight in the fit of the three methods is shown in Figure 26. The $Z$+jet balance makes the largest contribution up to transverse momenta of approximately 500 GeV. Between 500 GeV and 1 TeV, the $\gamma$+jet balance receives the largest weight. At higher $p_{\rm T}$, the multijet balance method acquires more weight in the combination. Beyond 1 TeV, it provides the only measurement and extends the jet energy scale beyond 2 TeV.
The local $\chi^2$ per degree of freedom in Figure 27 quantifies the level of agreement between the three sets of measurements. The results of the three methods agree in the whole $p_{\rm T}$ range 0.1 TeV < $p_{\rm T}$ < 1 TeV, where all three provide results.
The combined $p_{\rm T}$ response in data is approximately 3% lower than in the simulation over most of the $p_{\rm T}$ range. The deviation from unity in the data/MC ratio is significant, as the total uncertainty approaches 1% in the intermediate $p_{\rm T}$ region. These observations are consistent with previous in situ measurements of the $R = 0.4$ JES during Run 2 [9] with similar levels of associated uncertainty. At low $p_{\rm T}$, the uncertainty reaches about 1% at 200 GeV. Above 1.5 TeV, the uncertainty increases, reaching over 2% at 2.4 TeV.
A breakdown of the total JES uncertainty is presented graphically in Figures 28 and 29. This includes uncertainties in γ+jet, Z+jet, and multijet balance methods associated with the simulation modelling, reference system construction and calibration, and the event selection. Furthermore, as the large-R multijet balance method uses small-R jets as a reference system, all nuisance parameters from the small-R jet calibration enter as uncertainties in the combination presented here.
The combination of the jet mass response includes results from two methods. Forward folding provides four measurements in the $p_{\rm T}$ range below 1 TeV. The $R_{\rm trk}$ method takes advantage of a large data sample and can be finely binned in mass and $p_{\rm T}$, extending to over 2 TeV. The combined result is shown in Figure 30 for two jet mass intervals: the plot in the upper panel corresponds to the $W$ boson mass window with 50 GeV < $m$ < 120 GeV, and the lower panel corresponds to the top quark mass window with 120 GeV < $m$ < 300 GeV.
The in situ jet mass calibration factor is defined from the combined mass response shown in Figure 30 as $c_m = R_m^{\rm MC}/R_m^{\rm data}$. It is applied as a scale factor to the jet mass but does not affect the jet momentum vector. The full calibration applied to large-$R$ jets in data modifies the reconstructed jet energy, mass, pseudorapidity, and $p_{\rm T}$ through the factors $c_s$, $c_m$, and $\Delta\eta$, where $c_s = c_{\rm JES}\, c_{\rm abs}\, c_{\rm rel}$ is the product of several calibration factors. The factor $c_{\rm JES}$ corresponds to the simulation-based JES calibration, $c_{\rm rel}$ to the relative in situ correction obtained from the $\eta$-intercalibration, and $c_{\rm abs}$ to the absolute in situ correction from the balance methods. All $c$-factors and the factor $\Delta\eta$ are smooth functions of the large-$R$ jet kinematics. The terms $E_0$, $m_0$, $\eta_0$ and $\vec{p}_0$ refer to the jet properties prior to any calibration, as returned by the trimming algorithm.
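The explicit expressions are not reproduced in the text above, so the sketch below shows only one self-consistent way in which such factors could act on a jet. It assumes that $c_s$ scales the magnitude of the momentum vector and the mass, that $c_m$ scales the mass only, that $\Delta\eta$ is added to the pseudorapidity, and that the energy is recomputed from the mass-shell relation. These assumptions, and the function name `apply_calibration`, are illustrative and should not be read as the definitions used in the analysis.

```python
import math


def apply_calibration(p0_mag, m0, eta0, c_jes, c_abs, c_rel, c_m, delta_eta):
    """Apply scale factors to an uncalibrated jet (illustrative convention).

    p0_mag : magnitude of the uncalibrated momentum vector
    m0     : uncalibrated jet mass
    eta0   : uncalibrated pseudorapidity
    Returns the calibrated (energy, mass, pseudorapidity, pT).
    """
    c_s = c_jes * c_abs * c_rel          # product of the scale calibrations
    p = c_s * p0_mag                     # momentum magnitude: no c_m dependence
    m = c_s * c_m * m0                   # mass receives the additional factor c_m
    eta = eta0 + delta_eta               # direction correction
    energy = math.sqrt(p * p + m * m)    # assumed: energy from the mass shell
    pt = p / math.cosh(eta)
    return energy, m, eta, pt


# Toy example with invented numbers: a 2% mass correction leaves pT unchanged.
nominal = apply_calibration(1000.0, 80.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0)
shifted = apply_calibration(1000.0, 80.0, 0.0, 1.0, 1.0, 1.0, 1.02, 0.0)
```

Under this convention a change in $c_m$ alters the mass and the energy while leaving $p_{\rm T}$ and the jet direction untouched, in line with the statement that $c_m$ does not affect the jet momentum vector.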
The measured JMS correction is consistent with unity within the precision of the combined measurements. This suggests that the application of an in situ JES correction is sufficient to correct the JMS of these trimmed large-$R$ jets in the mass and $p_{\rm T}$ ranges considered here. The level of precision with which the JMS is measured depends on the kinematic region in question. For large-$R$ jets in the high-mass bin with $p_{\rm T}$ between 400 GeV and 1 TeV, the uncertainties are 2-5%. In other kinematic regions the uncertainty is larger, approaching 10% at high $p_{\rm T}$ in both mass bins.
The contributions of several sources to the uncertainty in the combined jet mass scale are presented in Figures 31 and 32. In both the $R_{\rm trk}$ and forward-folding techniques, the leading systematic uncertainties are associated with uncertainties in the event generators across most of the $p_{\rm T}$ range and for the two mass intervals considered.
Figure 29: Breakdown of the combined uncertainty in the large-$R$ jet $p_{\rm T}$ response as a function of the jet transverse momentum $p_{\rm T}$. Contributions are shown for nuisance parameters of the multijet balance method (a) originating from the MJB selection and (b) propagated from the small-$R$ jets which constitute the recoil system. The vertical axis reflects the uncertainty introduced by a given nuisance parameter in combination, incorporating the weight of the method from which it originates. Since the multijet balance method relies on the small-$R$ jet $p_{\rm T}$, nuisance parameters from all associated uncertainties are propagated. The lines shown are smoothed using a sliding Gaussian kernel.
Figure 30: Data-to-simulation ratio of the average jet mass response as a function of the large-$R$ jet $p_{\rm T}$. Corrections using a combination of two in situ methods, the $R_{\rm trk}$ and forward-folding approaches, are applied. The fit is performed for large-$R$ jet mass in the $W$ mass range 50-120 GeV (upper), and the top mass range 120-300 GeV (lower). The error bars represent the statistical and systematic uncertainties added in quadrature. The results apply to anti-$k_t$ jets with $R = 1.0$ calibrated with the LC+JES+JMS scheme. The lines shown are smoothed using a sliding Gaussian kernel.
[...] Figure 30 as a function of jet transverse momentum $p_{\rm T}$ for the jet mass bin 50-120 GeV. Contributions are shown for each of the nuisance parameters of the (a) $R_{\rm trk}$ and (b) forward-folding methods. The vertical axis reflects the uncertainty introduced by a given nuisance parameter in combination, incorporating the weight of the method from which it originates. This weight is dominated at high $p_{\rm T}$ by the $R_{\rm trk}$ method. The lines shown are smoothed using a sliding Gaussian kernel.
[...] Figure 30 as a function of jet transverse momentum $p_{\rm T}$ for the jet mass bin 120-300 GeV. Contributions are shown for each of the nuisance parameters of the (a) $R_{\rm trk}$ and (b) forward-folding methods. The vertical axis reflects the uncertainty introduced by a given nuisance parameter in combination, incorporating the weight of the method from which it originates. This weight is dominated at high $p_{\rm T}$ by the $R_{\rm trk}$ method. The lines shown are smoothed using a sliding Gaussian kernel.

Conclusion
Several in situ calibration methods are used to measure the response of the ATLAS detector to trimmed large-$R$ jets using 36.2 fb$^{-1}$ of $\sqrt{s} = 13$ TeV proton-proton collision data provided by the LHC and collected by the ATLAS experiment during 2015 and 2016. These methods exploit the transverse momentum balance in events where a jet recoils against a reference system with a precisely known energy scale, the independence of measurements performed with different sub-detectors, or the position and width of known mass peaks. With this ensemble of techniques, dedicated jet energy scale and jet mass scale calibrations are derived for large-$R$ jets. The results of several techniques applied to a variety of final states are consistent within the uncertainties, indicating that after calibration, the simulations model the flavour dependence of the jet $p_{\rm T}$ and mass response to within a few percent.
The results of all methods are combined taking into account correlations between uncertainties and possible discrepancies between the results of different in situ methods. The combined measurement of the ratio of the energy scales in data and simulations is used to derive an in situ correction to the response, which determines the large-$R$ jet energy and mass scales. The residual uncertainty in the ratio of the energy scales in data and simulations is 1-2% for transverse momenta from 150 GeV to 2 TeV. The precision of the jet mass scale varies from 2% to 10% over the same $p_{\rm T}$ range. The results of the simulations for jet $p_{\rm T}$ and mass resolution are also validated in situ and found to agree with the measured resolution within 10-15%. The in situ JES calibration, derived from light quark and gluon jets, is found to fully correct the energy and mass scales of high-$p_{\rm T}$ $W$ bosons and top quarks to within the precision of the present measurement (1-3%).
Large-$R$ jets are a vital ingredient of the ATLAS physics programme. This new in situ calibration leads to significantly reduced uncertainties in the reconstructed large-$R$ jet $p_{\rm T}$ and mass, thus increasing the sensitivity of searches and the precision of Standard Model measurements using large-$R$ jets.
Aristeia programmes co-financed by EU-ESF and the Greek NSRF, Greece; BSF-NSF and GIF, Israel; CERCA Programme Generalitat de Catalunya, Spain; The Royal Society and Leverhulme Trust, United Kingdom.
The crucial computing support from all WLCG partners is acknowledged gratefully, in particular from CERN, the ATLAS Tier-1 facilities at TRIUMF (Canada)