New techniques for jet calibration with the ATLAS detector

A determination of the jet energy scale is presented using proton–proton collision data at a centre-of-mass energy of √s = 13 TeV, corresponding to an integrated luminosity of 140 fb⁻¹ collected using the ATLAS detector at the LHC. Jets are reconstructed using the ATLAS particle-flow method, which combines charged-particle tracks and topo-clusters formed from energy deposits in the calorimeter cells. The anti-k_t jet algorithm with radius parameter R = 0.4 is used to define the jets. Novel jet energy scale calibration strategies developed for LHC Run 2 are reported that lay the foundation for the jet calibration in Run 3.
Jets are calibrated with a series of simulation-based corrections, including state-of-the-art techniques such as machine-learning methods and novel in situ calibrations, achieving better performance than the baseline calibration derived using up to 81 fb⁻¹ of Run 2 data. The performance of these new techniques is then examined in in situ measurements exploiting the transverse-momentum balance between a jet and a reference object. The b-quark jet energy scale using particle-flow jets is measured for the first time, with around 1% precision, using γ+jet events.


Introduction
The energetic proton–proton (pp) collisions produced by the Large Hadron Collider (LHC) yield final states that are predominantly characterised by jets, which are collimated sprays of charged and neutral hadrons and their decay products. Jets constitute an essential piece of the physics programme carried out using the ATLAS detector, and a precise understanding of jet reconstruction is critical for a wide variety of processes. Measurements of both the jet energy scale (JES) and resolution (JER) of these complex objects are therefore essential for precision measurements of the Standard Model and for searches for new phenomena beyond it. Several new methods for improving the jet energy scale calibration are presented, and their performance is evaluated in simulation and data, paving the way to a better precision on the JES for Run 3 and beyond. These techniques were developed for jets reconstructed with the anti-k_t algorithm [1,2] with radius parameter R = 0.4 using particle-flow inputs [3,4]. Previous calibration strategies by the ATLAS Collaboration, which used up to 81 fb⁻¹ of data [4][5][6][7][8][9], are extended and improved by taking advantage of the full Run 2 data sample of 140 fb⁻¹.
The jet energy scale calibration consists of a series of calibration steps. The first stage uses simulation to derive corrections to the jet energy scale that reduce the impact of pile-up, detector effects, and other parameters. The second stage is a residual in situ calibration, correcting for remaining differences between data and Monte Carlo (MC) simulation, derived using well-measured reference objects, including photons and Z bosons.
The structure of the paper is as follows. Section 2 describes the ATLAS detector, and Section 3 describes the recorded data and the MC simulation samples, and the inputs and algorithms used to reconstruct the jets. Section 4 describes the methods used and the result of the simulation-based calibration, Section 5 describes the in situ calibration, and conclusions are given in Section 6.

The ATLAS detector
The ATLAS detector [10] at the LHC covers nearly the entire solid angle around the collision point. 1 It consists of an inner tracking detector surrounded by a thin superconducting solenoid, electromagnetic and hadron calorimeters, and a muon spectrometer incorporating three large superconducting air-core toroidal magnets.
The inner-detector system (ID) is immersed in a 2 T axial magnetic field and provides charged-particle tracking in the range of |η| < 2.5. The high-granularity silicon pixel detector covers the vertex region and typically provides four measurements per track, the first hit normally being in the insertable B-layer (IBL) installed before Run 2 [11,12]. It is followed by the silicon microstrip tracker (SCT), which usually provides eight measurements per track. These silicon detectors are complemented by the transition radiation tracker (TRT), which enables radially extended track reconstruction up to |η| = 2.0. The TRT also provides electron identification information based on the fraction of hits above a higher energy-deposit threshold corresponding to transition radiation.
The calorimeter system covers the pseudorapidity range of |η| < 4.9. In the region |η| < 3.2, electromagnetic calorimetry is provided by barrel and endcap high-granularity lead/liquid-argon (LAr) calorimeters, with an additional thin LAr presampler covering |η| < 1.8 to correct for energy loss in material upstream of the calorimeters. Hadron calorimetry is provided by the steel/scintillator-tile calorimeter, segmented into three barrel structures within |η| < 1.7, and two copper/LAr hadron endcap calorimeters. The solid angle coverage is completed with forward copper/LAr and tungsten/LAr calorimeter modules optimised for electromagnetic and hadronic energy measurements, respectively.
The muon spectrometer (MS) comprises separate trigger and high-precision tracking chambers measuring the deflection of muons in a magnetic field generated by the superconducting air-core toroidal magnets. The field integral of the toroids ranges between 2.0 and 6.0 T m across most of the detector. Three layers of precision chambers, each consisting of layers of monitored drift tubes, cover the region |η| < 2.7, complemented by cathode-strip chambers in the forward region, where the background is highest. The muon trigger system covers the range of |η| < 2.4 with resistive-plate chambers in the barrel, and thin-gap chambers in the endcap regions.
Interesting events are selected by the first-level trigger system implemented in custom hardware, followed by selections made by algorithms implemented in software in the high-level trigger [13]. The first-level trigger accepts events from the 40 MHz bunch crossings at a rate below 100 kHz, which is further reduced by the high-level trigger to record events to disk at about 1 kHz. 1 ATLAS uses a right-handed coordinate system with its origin at the nominal interaction point (IP) in the centre of the detector and the z-axis along the beam pipe. The x-axis points from the IP to the centre of the LHC ring, and the y-axis points upwards. Cylindrical coordinates (r, φ) are used in the transverse plane, φ being the azimuthal angle around the z-axis. The pseudorapidity is defined in terms of the polar angle θ as η = −ln tan(θ/2). Angular distance is measured in units of ΔR ≡ √((Δη)² + (Δφ)²).
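The coordinate conventions above can be made concrete in a short sketch; the function names are illustrative, and the azimuthal difference is wrapped into (−π, π] before computing the angular distance:

```python
import math

def pseudorapidity(theta: float) -> float:
    """Pseudorapidity eta = -ln tan(theta/2) for polar angle theta (radians)."""
    return -math.log(math.tan(theta / 2.0))

def delta_r(eta1: float, phi1: float, eta2: float, phi2: float) -> float:
    """Angular distance dR = sqrt((d_eta)^2 + (d_phi)^2), with d_phi wrapped."""
    deta = eta1 - eta2
    dphi = (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(deta, dphi)

# A particle at theta = 90 degrees sits at eta = 0 (up to floating-point noise).
print(pseudorapidity(math.pi / 2))
```

The φ wrapping matters near the ±π boundary, where a naive difference would overestimate the distance between two nearby directions.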
For the in situ Z+jet analysis, samples of Z bosons produced with jets (Z+jet) are generated with MadGraph + Pythia 8 [37] using the NNPDF3.0nnlo PDF set [20] and the AZNLO set of tuned parameters [38]. Sherpa 2.2.11 is used as the alternative MC sample. The nominal γ+jet samples are generated with Pythia v8.230 [16] using the A14 set of tuned parameters [17] and the NNPDF2.3 PDF set. The γ+jet samples are produced separately for the direct-photon and fragmentation-photon components. The alternative sample used in γ+jet events is Sherpa 2.2.2 with the NNPDF3.0nnlo PDF set [20].
All samples are reconstructed using a full detector simulation and superimposed minimum-bias interactions simulated using Pythia 8 with the A3 set of tuned parameters [39] and NNPDF2.3lo PDF set to represent multiple interactions during the same or nearby bunch crossings (pile-up). The distribution of the average number of pile-up interactions in simulation is reweighted during data analysis to match that observed in the Run 2 data.

Jet reconstruction
The jets in these studies are reconstructed with the anti-k_t algorithm with a radius parameter R = 0.4 as implemented in the FastJet software package [40]. Four-momentum objects are used as inputs to the algorithm, and may be particles at the generator level of the MC simulation, charged-particle tracks, calorimeter energy deposits, or algorithmic combinations of the latter two, as in the case of the particle-flow (PFlow) reconstruction technique. Particles at the MC generator level are referred to as truth particles. Reconstructed jets use PFlow objects (PFOs) as inputs, which combine measurements from the tracker and the calorimeter to form the input signals for jet reconstruction. Specifically, energy deposited in the calorimeter by charged particles is subtracted from the observed topo-clusters and replaced by the momenta of tracks that are matched to those topo-clusters, as described in Ref. [3], with the updates described in Ref. [4]. The resulting PFlow jets show improved energy and angular resolution, reconstruction efficiency, and pile-up stability compared with jets reconstructed using only calorimeter information.
Charged-particle tracks are used both for the PFlow reconstruction and for deriving calibrations. These tracks are reconstructed in the full acceptance of the inner detector, |η| < 2.5, are required to have p_T > 500 MeV unless otherwise stated, and must satisfy criteria based on the number of hits in the ID subdetectors. In addition, tracks must satisfy |z₀ sin θ| < 2 mm, where z₀ is the distance of closest approach of the track to the hard-scatter primary vertex along the z-axis. Tracks used in the calibration are matched to jets using ghost association, a procedure that treats them as four-vectors of infinitesimal magnitude during the jet reconstruction and assigns them to the jet with which they are clustered [41].

The jet calibration proceeds in several stages. The first, described in Section 4.2, applies a subtraction based on the median p_T density ρ measured in the event and the jet area (the 'pile-up density correction') [41,42], minimising the sensitivity to the pile-up model used in the simulation. Next, a correction for residual dependence on the number of reconstructed primary vertices in the event (N_PV) and on μ is applied (the 'residual pile-up correction'), based on corrections derived using simulated samples, as described in Section 4.3. The third step, the absolute JES calibration detailed in Section 4.4, corrects jets so that they agree, on average, in energy and direction with truth jets from dijet MC events. Finally, the global calibration improves the jet p_T resolution and related uncertainties by reducing the dependence of the reconstructed jet response on observables constructed using information from the tracking, calorimeter, and muon-chamber detector systems, as introduced in Section 4.5.
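The staged calibration described above can be sketched as a sequence of multiplicative scale factors applied to the jet four-momentum. This is a minimal illustration, not the ATLAS software interface; the only step implemented here is the pile-up area subtraction, with ρ and the jet area as assumed inputs:

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Jet:
    pt: float    # transverse momentum in GeV
    eta: float
    area: float  # ghost-area of the jet

# Each calibration step maps a jet to a multiplicative scale factor applied
# to the jet four-momentum; names are illustrative, not the ATLAS API.
def pileup_density_correction(jet: Jet, rho: float) -> float:
    return max(jet.pt - rho * jet.area, 0.0) / jet.pt

def calibrate(jet: Jet, steps: List[Callable[[Jet], float]]) -> Jet:
    for step in steps:
        sf = step(jet)
        jet = Jet(pt=jet.pt * sf, eta=jet.eta, area=jet.area)
    return jet

jet = Jet(pt=30.0, eta=0.5, area=0.5)
rho = 10.0  # illustrative median pile-up pT density, GeV per unit area
corrected = calibrate(jet, [lambda j: pileup_density_correction(j, rho)])
print(round(corrected.pt, 1))  # -> 25.0
```

Applying the correction as a scale factor, rather than shifting p_T directly, mirrors the text's statement that the jet direction is left unchanged.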

Event selection
All stages of the simulation-based jet energy scale calibration use the same event selection. The MC simulation is used to determine the energy scale and resolution of jets by comparing PFlow jets with truth jets. Truth and reconstructed jets are required to satisfy |η| < 4.5 to be fully contained in the detector acceptance, and truth jets are additionally required to have p_T > 7 GeV. Uncalibrated jets have positive energy, but their energy can become negative after applying the corrections described in Sections 4.2–4.3. To reduce biases in the determination of the average jet energy response (E_reco/E_true) at low energies, reconstructed jets are required to have p_T > 0 after the pile-up density correction described in Section 4.2, but no requirement is made on the p_T after the correction described in Section 4.3.
Events are required to have at least one reconstructed primary vertex with at least two matched tracks with p_T > 500 MeV. In simulated reconstructed events, the primary vertex is the reconstructed primary vertex with the largest sum of squared track momentum, while at the truth level, the primary vertex corresponds to that of the simulated hard-scatter process, and not the collision with the highest momentum transfer. This results in some events where the pile-up collision has a larger momentum transfer than the hard-scatter collision. For MC samples, the reconstructed and truth primary vertices are required to have z-positions within 0.2 mm of each other. Events are required to have at least two reconstructed jets, and at least one truth jet. Truth jets are geometrically matched to PFlow jets using the angular distance ΔR with the requirement ΔR < 0.3. In addition, truth jets are required to be isolated from all other truth jets by ΔR > 1.0, while reconstructed jets are required to be isolated from all other reconstructed jets by ΔR > 0.6. To reduce the contribution of events where the pile-up collision has a larger momentum transfer than the hard-scatter collision, the average p_T of the two leading reconstructed jets is required to be no larger than 1.4 × p_T^{truth, leading}.
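The geometric matching and isolation requirements above can be sketched as follows; jets are reduced to (η, φ) pairs and the helper names are hypothetical:

```python
import math

def delta_r(eta1, phi1, eta2, phi2):
    dphi = (phi1 - phi2 + math.pi) % (2 * math.pi) - math.pi
    return math.hypot(eta1 - eta2, dphi)

def match_truth_to_reco(truth_jets, reco_jets, dr_match=0.3,
                        truth_iso=1.0, reco_iso=0.6):
    """Return (truth, reco) pairs passing the dR < 0.3 matching and the
    isolation requirements described in the text. Jets are (eta, phi) tuples."""
    pairs = []
    for t in truth_jets:
        # truth jet must be isolated from all other truth jets by dR > 1.0
        if any(delta_r(*t, *o) <= truth_iso for o in truth_jets if o is not t):
            continue
        for r in reco_jets:
            # reconstructed jet must be isolated from all others by dR > 0.6
            if any(delta_r(*r, *o) <= reco_iso for o in reco_jets if o is not r):
                continue
            if delta_r(*t, *r) < dr_match:
                pairs.append((t, r))
    return pairs

truth = [(0.0, 0.0), (0.1, 2.0)]
reco = [(0.05, 0.05)]
print(match_truth_to_reco(truth, reco))
```

The full selection additionally requires primary-vertex and p_T criteria, which are omitted here for brevity.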

Estimating the median p_T density ρ and the pile-up density correction
Pile-up interactions change the jet energy scale, and jet reconstruction is affected by additional interactions in the same or nearby bunch crossings. The first stage of the jet calibration, referred to as the 'pile-up density correction', subtracts the expected contribution from pile-up based on the area of the jet and the median p_T density ρ in the event [41]. To compute the jet area A, a dense population of infinitesimally soft ghost particles, uniformly distributed in η × φ, is overlaid on the event. Then, A is defined as the transverse momentum of the sum of the four-momenta of all ghost constituents matched with a given jet after clustering, normalised by the ghost-constituent transverse-momentum density.
The median pile-up p_T density ρ is estimated for each event as the median p_T density (p_T/A) of all jets clustered with the k_t algorithm [43,44] with a radius parameter of 0.4:

ρ = median_i { p_{T,i} / A_i },

where the index i enumerates over the k_t jets. For this calculation, only jets with |η| < 2 are used, since ρ falls off steeply beyond this region, due to a combination of physics and detector effects.
Assuming the pile-up is a uniform, diffuse background, the pile-up contribution to the energy of the jet can be approximated by the product of the jet area and the median p_T density. The pile-up-density-corrected jet p_T, p_T^area, is then defined as

p_T^area = p_T − ρ × A.

The ratio of p_T^area to the uncorrected jet p_T is applied as a scale factor to the jet four-momentum and hence does not affect its direction.
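Under the stated assumptions (median of p_T/A over central k_t jets, then subtraction of ρ × A), the pile-up density correction can be sketched as:

```python
import statistics

def median_pt_density(kt_jets):
    """Estimate rho as the median pT/area over kT jets with |eta| < 2,
    as in the text. Jets are dicts with pt (GeV), eta, and area."""
    central = [j for j in kt_jets if abs(j["eta"]) < 2.0]
    return statistics.median(j["pt"] / j["area"] for j in central)

def area_corrected_pt(jet_pt, jet_area, rho):
    """pT_area = pT - rho * A; the ratio pT_area/pT would then be applied
    as a scale factor to the four-momentum, preserving the jet direction."""
    return jet_pt - rho * jet_area

kt_jets = [
    {"pt": 8.0, "eta": 0.3, "area": 0.5},
    {"pt": 12.0, "eta": -1.2, "area": 0.6},
    {"pt": 9.0, "eta": 1.8, "area": 0.5},
    {"pt": 40.0, "eta": 2.6, "area": 0.5},  # excluded: |eta| >= 2
]
rho = median_pt_density(kt_jets)  # median of {16, 20, 18} = 18 GeV
print(area_corrected_pt(50.0, 0.4, rho))  # -> 42.8
```

Using the median rather than the mean makes the estimate robust against the few hard jets in the event, which is the motivation for this estimator.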
Previously, the inputs to the ρ calculation were the same as those used to build jets: the neutral PFlow objects and the charged PFlow objects that satisfy |z₀ sin θ| < 2 mm [4]. However, this results in a bias from the inclusion of hard-scatter tracks, shifting the median p_T density to higher values, particularly when the hard-scatter process has a large jet multiplicity. To prevent such biases, a 'pile-up sideband' (PUSB) definition is studied, which uses neutral PFlow objects and charged PFlow objects that satisfy 2 mm < |z₀ sin θ| < 4 mm as inputs. The total amount of pile-up using the sideband selection is expected to be similar to the nominal criteria, since a similar amount of pile-up meets these criteria. This ensures a minimal loss of the event-by-event correlation with the charged pile-up component that is not removed by the charged-hadron subtraction cuts.
The JES is measured in specific event selections as described in Section 5, but the calibrations are applied to many final states. An uncertainty is required to cover potential inadequacy of the simulation model of the difference in the ρ bias between event topologies, specifically the difference in the ρ bias between dijet and Z(→μμ)+jets events. Z(→μμ)+jets events differ from dijet events in several ways, including the quark–gluon composition, colour flow, and momentum transfer of the process, making this a good topology for estimating the magnitude of a potential bias. The bias at a given μ is estimated by comparing the difference between the values of ⟨ρ⟩ in the two event topologies in data and in simulation, and the bias propagated to the jet energy scale uncertainties is that determined at the average value of μ for the data sample. The Z(→μμ)+jets selection uses the lowest unprescaled single-muon trigger and requires two muons with p_T^1 > 30 GeV, p_T^2 > 25 GeV, 80 < m_μμ < 100 GeV, and p_T^Z > 25 GeV; the dijet selection uses the lowest unprescaled single-jet trigger and requires a leading jet with p_T > 500 GeV, |η| < 2.4, and greater than 5% of its momentum carried by charged particle-flow objects. Figure 1 shows the dependence of ⟨ρ⟩ on μ in the two processes for data and simulation, comparing both ρ definitions described above. The lower panels compare the values of ⟨ρ⟩ in the two processes.

Figure 1: The distribution of ρ as a function of μ for the (top) Z(→μμ)+jets and (middle) dijet selections for data, Pythia 8, Sherpa 2.2.5, and Sherpa 2.1.1. The lower panel shows the difference between the two topologies, which is used to determine the uncertainty from the extrapolation across topologies, indicated by the vertical arrows. The left plots show ρ built from the jet constituents (neutral PFOs and charged PFOs with |z₀ sin θ| < 2 mm), and the right plots show ρ built using neutral PFOs and charged PFOs satisfying the new sideband selection.

Two different Sherpa dijet samples are shown: the 2.1.1 sample [45] used in the previous calibration [4], and the 2.2.5 sample used now; for Z(→μμ)+jets, only Sherpa 2.2.1 is used. The Sherpa 2.2.X samples include an improved multi-parton interaction (MPI) model, which directly affects the bias in ρ. Significantly larger differences are seen between the dijet Sherpa 2.1.1 sample and the Z(→μμ)+jets Sherpa 2.2.1 sample than between the dijet Sherpa 2.2.5 sample and the Z(→μμ)+jets Sherpa 2.2.1 sample. Previously, the bias was determined using the dijet Sherpa 2.1.1 sample and the Z(→μμ)+jets Sherpa 2.2.1 sample, which have different MPI models. Using the updated dijet Sherpa sample, with an MPI model consistent with the Z(→μμ)+jets sample, reduces the bias by a factor of four, showing the importance of MPI modelling in MC simulations. The new definition, ρ_PUSB, results in significantly smaller differences between the topologies and a better description of the data by the simulation. Similarly, the improvements to the ρ definition reduce the uncertainty by almost a factor of three, as seen in the difference between data and Sherpa for the two definitions. Together, these improvements reduce the JES uncertainty from the ρ modelling by a factor of nearly seven.

Residual pile-up correction
To further reduce the impact of pile-up, a residual pile-up correction is applied, based on N_PV, μ, the reconstructed jet p_T (p_T^reco), and the reconstructed jet η (η_reco). Due to the fast response of the silicon tracking detectors used to reconstruct the tracks from which the primary vertices are found, N_PV is sensitive to the in-time pile-up, while μ is sensitive to the out-of-time pile-up, since it accounts for the average amount of pile-up around a given bunch crossing. Typically, in-time pile-up increases the energy of the jet, and out-of-time pile-up decreases it. The negative dependence of the jet energy scale on μ for out-of-time pile-up is a result of the liquid-argon calorimeter's pulse shape, which is negative during the period soon after registering a signal [46]. Two options for the residual pile-up correction are compared.

The 1D residual pile-up correction
The first strategy, referred to as the '1D residual pile-up calibration', follows the method outlined in Ref. [4], where additional corrections are applied based on the μ and N_PV of the event, with

p_T^corr = p_T^area − α (N_PV − 1) − β μ,

where the coefficients α and β, parameterising the sensitivity to in-time and out-of-time pile-up respectively, are derived in simulation as a function of η.
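A minimal sketch of such a linear residual correction, with illustrative (not ATLAS) coefficients α and β:

```python
def residual_1d_correction(pt, n_pv, mu, alpha, beta):
    """1D residual pile-up correction (sketch): subtract terms linear in
    N_PV and mu. alpha, beta stand in for eta-dependent coefficients
    that would be fit in simulation."""
    return pt - alpha * (n_pv - 1) - beta * mu

# Illustrative coefficients (GeV per vertex, GeV per interaction).
print(residual_1d_correction(25.0, n_pv=21, mu=30.0, alpha=0.2, beta=0.05))  # -> 19.5
```

The opposite signs of the in-time and out-of-time effects described in the text would show up as α > 0 and β of either sign, depending on the calorimeter pulse shape in the relevant η region.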

The 3D residual pile-up correction
The 1D residual pile-up correction does not account for correlations between μ and N_PV, nor for changes in the pile-up contribution as a function of jet p_T. The 3D residual pile-up correction is designed to include these correlations. In this calibration, derived in bins of η_reco, the jet p_T scale is shifted to match the truth jet scale as a function of (N_PV, μ, p_T^area), simultaneously correcting for pile-up and detector effects. The truth p_T is used as a reference to compute a correction given by the average difference Δp_T^{area−truth} = ⟨p_T^area − p_T^truth⟩. For extreme values of μ and N_PV, where there are insufficient events to determine an accurate correction, the correction is determined using the closest non-empty (μ, N_PV) bin (with the same p_T), and the result is smoothed. The average difference Δp_T^{area−truth} is fit as a function of p_T^area using a linear-plus-logarithmic function, in bins of N_PV, μ, and η_reco, determined using jets with 10 GeV < p_T^truth < 200 GeV. The corrected value is given by

p_T^corr = p_T^area − Δp_T^{area−truth}(N_PV, μ, p_T^area).

By construction, this residual pile-up calibration corrects the jet energy scale to the truth jet scale, combining corrections due to pile-up with corrections due to detector effects. This contrasts with the 1D residual pile-up correction, which is designed to exclusively remove the impact of pile-up on the jet p_T scale. Several options to correct only for the pile-up p_T were studied, but these were found either to increase the pile-up dependence or to result in problematic effects such as a large fraction of jets with negative p_T.
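The linear-plus-logarithmic fit and the resulting correction can be sketched on toy values in a single (N_PV, μ, η) bin; the parameterisation a + b·p_T + c·log(p_T) is an assumption for illustration, as is the toy data:

```python
import numpy as np
from scipy.optimize import curve_fit

def lin_log(pt, a, b, c):
    """Linear-plus-logarithmic form for the average pT shift (illustrative)."""
    return a + b * pt + c * np.log(pt)

# Toy averaged shifts Delta pT(area - truth) vs pT_area in one (N_PV, mu) bin,
# generated from the same functional form so the fit recovers it exactly.
pt_area = np.array([10.0, 20.0, 50.0, 100.0, 200.0])
delta = 0.5 + 0.01 * pt_area + 0.3 * np.log(pt_area)

params, _ = curve_fit(lin_log, pt_area, delta)

def corrected_pt(pt):
    # pT_corr = pT_area - Delta pT(area - truth)(pT_area)
    return pt - lin_log(pt, *params)

print(round(float(corrected_pt(50.0)), 2))
```

In the real calibration the fit is repeated per (N_PV, μ, η) bin, with the nearest-non-empty-bin fallback and smoothing handling sparsely populated corners of the phase space.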

Comparison of the different residual pile-up corrections
A comparison of the different options for the residual pile-up corrections is shown in Figure 2. As seen in this figure, the residual pile-up calibration is especially useful for improving the pile-up dependence for jets with |η_reco| > 2.5. Overall, for the 1D residual pile-up correction, the absolute pile-up dependence increases for higher-p_T jets, but the relative impact on the p_T response is smaller. While the 1D residual pile-up correction performs best in the p_T range for which it is optimised (20–30 GeV), it has a sizeable pile-up dependence at other jet p_T. In addition, since the 1D residual pile-up correction is optimised for this same bin, its performance appears enhanced by construction, while a more differential binning would show a worse performance. The 3D residual pile-up correction significantly reduces the pile-up dependence of the calibration, particularly at high p_T. Based on these results, the 3D residual pile-up calibration is used for the remainder of the reported studies.

The jet energy scale and η calibration
The absolute jet energy scale (MCJES) and η corrections provide calibration functions for the energy and η as a function of η_det and E_reco such that jets agree on average with the truth jet energy and η. Since the calorimeters measure the energy of particles, and not the transverse momenta, this correction is determined as a function of the jet energy. The jet energy response R, defined as the mean of a fit with a Gaussian function to the core of the E_reco/E_true distribution, is measured in bins of E_true and η_det, where η_det is the jet η pointing from the geometric centre of the detector, which is used to remove any ambiguity about which region of the detector is measuring the jet. The difference of R from the expected value of one is referred to as the non-closure, and regions where R has the expected value within the uncertainties are said to demonstrate closure.
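The mean of the Gaussian core can be approximated with an iterative truncated-mean procedure, a simplified stand-in for the iterative Gaussian fit used here; the window width and iteration count are illustrative:

```python
import numpy as np

def core_response(response, n_iter=5, n_sigma=1.5):
    """Approximate the mean of the Gaussian core of a response distribution
    by iteratively recomputing mean/std within a +-n_sigma window."""
    r = np.asarray(response, dtype=float)
    mean, std = r.mean(), r.std()
    for _ in range(n_iter):
        core = r[np.abs(r - mean) < n_sigma * std]
        if len(core) < 2:
            break
        mean, std = core.mean(), core.std()
    return mean

rng = np.random.default_rng(42)
# Toy response: Gaussian core at 0.8 with a small high-side tail.
resp = np.concatenate([rng.normal(0.8, 0.05, 10000), rng.normal(1.2, 0.1, 500)])
print(round(float(core_response(resp)), 2))  # close to 0.8
```

Restricting to the core makes the estimate insensitive to the non-Gaussian tails of the response distribution, which is the point of fitting only the core.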
The jet energy response after the application of the residual calibration is shown as a function of E_true and η_det in Figure 3. This differs from previous JES calibrations by the ATLAS experiment, in that the jet energy response is already close to unity, meaning that the correction is relatively small. This is a feature of the 3D residual pile-up correction, which shifts the energy scale of the jets close to the truth scale, although there is some significant difference from one at high η, where the residual calibration insufficiently captures the behaviour of the energy response. Since the Δp_T term in the 3D residual pile-up correction is determined using jets with p_T^truth < 200 GeV, the jet energy response shifts away from one at energies corresponding to p_T > 200 GeV.

Figure 3: The jet energy response before the MCJES calibration (a) at fixed energies as a function of η_det, and (b) at fixed η_det as a function of truth jet energy. In (a), the markers show the response for E_true = 30, 50, 110, 500, and 1200 GeV; in (b), the markers show the response for 0.0 < η_det < 0.1, 1.0 < η_det < 1.1, 1.4 < η_det < 1.5, 2.8 < η_det < 2.9, and 4.0 < η_det < 4.1.
Directly predicting the jet energy response from E_reco depends on the distribution of E_true used to derive the calibration. Overall, the distribution of the response is approximately Gaussian for a given E_true, but not for a given E_reco [47]. Therefore, the calibration uses a numerical inversion technique [5], where, for each η_det bin, the jet energy response is fit as a function of E_true, and the jet calibration factor as a function of E_reco is determined using the inverse of this function. Two methods of determining the fit function, polynomial fits of order n and penalised splines, are compared below.
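Numerical inversion can be illustrated with a toy monotone response model: define R(E_true), solve E_reco = E_true · R(E_true) for E_true, and take 1/R at the solution as the calibration factor. The response model and the bisection solver are assumptions for illustration:

```python
import numpy as np

def response(e_true):
    """Toy response model, rising toward unity with energy (illustrative)."""
    return 0.7 + 0.05 * np.log(e_true)

def calibration_factor(e_reco, lo=1.0, hi=1e5, n_iter=60):
    """Invert e_reco = e_true * R(e_true) by bisection (R is monotone here),
    then return the calibration factor 1/R evaluated at the solution."""
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if mid * response(mid) < e_reco:
            lo = mid
        else:
            hi = mid
    return 1.0 / response(0.5 * (lo + hi))

e_true = 100.0
e_reco = e_true * response(e_true)
# Applying the factor at E_reco recovers E_true on average.
print(round(float(e_reco * calibration_factor(e_reco)), 1))  # -> 100.0
```

This is why the fit quality of R(E_true) matters: the calibration factor is only as smooth and accurate as the fitted response function being inverted.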

Polynomial fits
Following the procedure outlined in Ref. [4], polynomial fits are defined as a function of log(E), where N_max = 8 is the maximal order of the fitted polynomials. Out of the N_max fit functions, the best fit function is identified using Pearson's χ² test [48]. The calibration factors are usually frozen at an η-dependent energy between 3 and 4 TeV to reduce statistical fluctuations, while for p_T < 8 GeV, a linear extrapolation of the calibration factor is performed.

Penalised splines
In addition to the polynomial fit functions, a new method using penalised splines is studied. A spline s(x) of degree k is a piecewise polynomial function of degree k, where pieces of the spline meet at points called knots, and the first k − 1 derivatives are continuous across the knots. Splines may be defined from b-spline basis functions B_{i,k}(x) [49] via

s(x) = Σ_{i=1}^{n} c_i B_{i,k}(x),

where n is the number of data fit points, c_i are control points weighting the individual basis functions B_{i,k}, and the knots define the pieces. Since B_{i,k}(x) = 0 for x outside of the range defined by the knots, an extrapolation to lower (higher) energy values is added using a linear extrapolation based on the first (last) five points for the low (high) end of the spline.
A spline will overfit the data, since the fit is required to pass exactly through the knots, which in this case correspond to the energies where the response is determined. For a set of points x_i and their corresponding values y_i, this can be mitigated by using penalised b-splines (p-splines), which include an additional smoothness penalisation term, minimising

Σ_i w_i (y_i − s(x_i))² + λ ∫_a^b (s''(x))² dx,

with a and b corresponding to the range over which the penalisation term is included, a < x < b, and the penalisation parameter λ ≥ 0 chosen and fixed. For these studies, the x values correspond to E_true, and the y values correspond to the jet energy response. As λ increases from zero to ∞, the result moves from a spline to a linear regression, and this parameter enables a compromise between the curvature penalisation and a close fit to the data. The penalisation parameter is defined dynamically for each η_det bin in terms of the point weights w_i = ε_i^{−1/2}, where i runs over the data fit points, a regulative parameter scales the overall penalisation, and ε_i is the response fit uncertainty from the iterative fit to a Gaussian function.
For these studies, the splines are implemented using the Splinter framework [50], and a spline of degree three is used, with the regulative parameter empirically set to 0.1. To check for overfitting, the calibration and the closure test are performed on statistically independent events.
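A weighted smoothing-spline fit can illustrate the idea; scipy's UnivariateSpline is used here as a stand-in for the p-spline implementation in the Splinter framework, with its smoothing parameter s playing the role of the curvature penalisation, and the toy response model is an assumption:

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

# Toy response points with per-point uncertainties, as if from Gaussian fits
# of the response in bins of E_true.
e_true = np.linspace(20.0, 2000.0, 40)
resp_true = 0.7 + 0.05 * np.log(e_true)
rng = np.random.default_rng(0)
sigma = np.full_like(e_true, 0.005)
resp_meas = resp_true + rng.normal(0.0, sigma)

# Degree-3 smoothing spline; weights 1/sigma and s = number of points is the
# conventional choice so the fit tracks the data within its uncertainties
# without chasing the noise.
spline = UnivariateSpline(e_true, resp_meas, w=1.0 / sigma, k=3, s=len(e_true))
print(float(spline(500.0)))
```

Increasing s smooths the fit toward a low-order polynomial, mirroring the role of λ in the p-spline objective; decreasing it toward zero recovers an interpolating spline that would overfit.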

Comparison of calibrations
A comparison of the MCJES closure for the fitting techniques at different energy values is presented in Figure 4. Both strategies provide closure at the 1% level at high energies, while at low energies, the p-spline approach provides better closure than the polynomial fit. Overall, the p-spline correction provides closure within 1% across the p_T and η range considered, except for a small number of bins where the calibration becomes difficult due to a quickly changing response and non-Gaussian contributions to the energy response. The remaining studies use the correction determined from the p-spline fit, since it provides the best overall closure.

Absolute MC jet calibration
In addition to the jet energy, the jet pseudorapidity η is calibrated with an approach similar to the JES calibration, to correct for biases in the η reconstruction, following the strategy in Ref. [4]. This bias is most pronounced in the transition regions between different parts of the calorimeter, where the discrepant response of the different detectors artificially shifts the reconstructed energy on one side of the jet, changing the reconstructed η. These corrections are particularly needed in the barrel–endcap (|η| ∼ 1.4) and forward–endcap (|η| ∼ 3.1) transitions. The bias is defined as η_bias = ⟨η_reco − η_true⟩, as determined by an iterative fit to a Gaussian function, and the correction is performed on a jet-by-jet basis via η_calib = η_reco − η_bias. This correction is applied only to the p_T and η of the jet, and is parameterised as a function of E_true and η_det. For this correction, only polynomial fits are studied, using up to an order-three polynomial. There are small correlations between the corrections in η and E, so this correction is derived simultaneously with the JES.

The global property calibration
The absolute MCJES calibration corrects the jet energy response based on the E and η_det of the jet. However, many other factors contribute to the jet response, including the distribution of energy in the jet, the distribution of energy deposits across different calorimeter layers, and the types of hadrons produced in the jet. Many of these characteristics depend on whether the jet is quark- or gluon-initiated. This can be seen in Figure 5, which shows an example of the jet response distribution for jets with different initiating partons, and the jet p_T response as a function of p_T^true, where the parton label is defined by the highest-energy parton ghost-associated with the truth jet. Not only are there differences between jet flavours (i.e. the flavour of the initiating parton), but the behaviours change with the p_T of the jet. Quark-initiated jets tend to have fewer hadrons, each carrying a higher fraction of the jet p_T, which typically results in contributions further into the calorimeter. In contrast, gluon-initiated jets typically have more, lower-p_T hadrons, leading to a lower calorimeter response and a wider transverse profile. These behaviours are further complicated by the use of particle-flow reconstruction, which adds further dependence on the charged particles in the jet.
The jet p_T response is also affected by the MC model, as seen from the differences between the jet p_T responses shown in Figure 6. Overall, most MC predictions show similar behaviour for quark-initiated jets, while the differences for gluon-initiated jets can be sizeable. This is due to differences between MC generators in the predicted amount of soft radiation and its topological distribution in the jet. There is some separation between the behaviour of models using the Lund string model for hadronisation and the other models: the Lund string model tends to predict a higher gluon p_T response, with larger differences for jets with p_T < 100 GeV. This can primarily be attributed to the fraction of jet energy carried by baryons and kaons [51].
The global jet property calibration applies further corrections to jets based on their individual characteristics. While these corrections have only a small effect on the overall closure of the calibration, the closure is significantly improved for different classes of jets, improving the JER. In addition, this calibration reduces differences between MC predictions of the JES, resulting in smaller modelling uncertainties. Two methods for deriving the global calibration are outlined below: the global sequential calibration (GSC), described in previous work [4], and a new method, the global neural network calibration (GNNC). Both corrections are derived in |η_det| bins corresponding to different detector regions, balancing the statistical uncertainty against the generality of the results.

The global sequential calibration
The GSC is a series of multiplicative corrections that account for the differences in the calorimeter response to different types of jets, improving the jet resolution without changing the average jet energy response. The GSC is based on global jet observables such as the longitudinal profile of the energy deposits in the calorimeters, tracking information matched to the jet, and information related to the activity in the muon chambers behind a jet. Six observables that improve the JER and reduce modelling uncertainties are used as inputs to the GSC. Each GSC correction to the jet four-momentum is derived and applied independently and sequentially, using the following procedure. First, for a given GSC observable, the average jet response is determined in each bin of the observable and (p_T^true, |η_det|). The resulting responses for a given |η_det| bin are then smoothed simultaneously in p_T and the GSC observable using a Gaussian kernel. Because the GSC is applied sequentially, it is possible to validate each GSC correction in a systematic way, testing the impact of any mismodelling of the input variables using data. Such studies were performed to validate the sequential correction procedure.
The six stages of the GSC, in the order of application, are:
• f_charged: the fraction of the jet p_T carried by charged particles, as measured using ghost-associated tracks with p_T > 500 MeV, |η_det| < 2.5;
• f_Tile0: the fraction of jet energy (E_frac) measured in the first layer of the hadronic Tile calorimeter, |η_det| < 1.8;
• f_LAr3: the E_frac measured in the third layer of the electromagnetic LAr calorimeter, |η_det| < 3.5;
• n_track: the number of tracks with p_T > 1 GeV ghost-associated with the jet, |η_det| < 2.5;
• w_track: also known as track width, the average p_T-weighted transverse distance in the η-φ plane between the jet axis and all tracks with p_T > 1 GeV ghost-associated with the jet, |η_det| < 2.5;
• N_segments: the number of muon track segments ghost-associated with the jet, |η_det| < 2.8.
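The sequential procedure above can be sketched as follows. The toy response model, its parameters and the single-stage setup are purely illustrative assumptions, not the ATLAS parameterisation, which uses smoothed corrections per (p_T, |η_det|) bin for each of the six observables.

```python
import numpy as np

# Toy response model (assumption): the response reaches 1.0 at a typical
# charged fraction of 0.6 and dips away from it; the real corrections are
# smoothed histograms derived per (pT_true, |eta_det|) bin per observable.
def toy_response(pt, f_charged):
    return 0.95 + 0.05 * np.exp(-((f_charged - 0.6) ** 2) / 0.08)

def make_correction(response_fn):
    # Invert the average response to build a multiplicative correction.
    return lambda pt, x: 1.0 / response_fn(pt, x)

corrections = [make_correction(toy_response)]  # one stage here; the GSC has six

def apply_gsc(pt, observables):
    # Apply each stage sequentially: later stages see the corrected pT.
    for corr, x in zip(corrections, observables):
        pt = pt * corr(pt, x)
    return pt

pt_calibrated = apply_gsc(50.0, [0.6])  # toy response is exactly 1.0 here
```

Because each stage multiplies the output of the previous one, a validation can check one stage at a time, mirroring the sequential validation described in the text.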
The N_segments correction, also known as the punch-through correction, reduces the tails of the response distribution caused by high-p_T jets that are not fully contained in the calorimeter. Unlike the other corrections, the N_segments correction is applied as a function of the jet energy instead of the jet p_T, since this effect is more strongly correlated with energy escaping the calorimeters.
The jet p_T response for PFlow jets in MC simulation after each of the GSC corrections is shown in Figure 7 for one |η_det| bin. While the jet energy scale closes within 1% at low energies, a small amount of non-closure is introduced when determining the response using p_T instead of energy. The fractional jet resolution, σ/μ, is used to quantify the magnitude of the fluctuations in the jet energy reconstruction, where σ is the width of a fit to a Gaussian function of the jet p_T response distribution and μ is the mean of the fit. This is shown for PFlow jets with 0.2 < |η_det| < 0.7 in MC simulation in Figure 7. As more corrections are applied, the fractional jet resolution improves and the dependence of the jet response on the jet flavour is reduced, as the calibration improves for jets with varying features. The impact of f_charged and f_Tile0 is most apparent in Figure 7, but the relative impact of the different corrections varies as a function of |η_det|. In addition, these corrections reduce effects that are less evident in the inclusive case. For instance, the punch-through correction scales with energy, so it primarily affects analyses sensitive to high-energy jets, but its impact is not obvious in the inclusive distribution.
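The σ/μ figure of merit can be sketched as follows. The iterative clipped-moments estimator below is a simplified stand-in for a proper iterative Gaussian fit, and the clipping window and toy sample are assumptions for illustration only.

```python
import numpy as np

def iterative_gaussian_core(x, n_sigma=2.5, iterations=5):
    """Simplified stand-in for an iterative Gaussian fit: estimate the core
    mean and width by repeatedly clipping to mean +/- n_sigma * width."""
    mu, sigma = np.mean(x), np.std(x)
    for _ in range(iterations):
        core = x[np.abs(x - mu) < n_sigma * sigma]
        mu, sigma = np.mean(core), np.std(core)
    return mu, sigma

rng = np.random.default_rng(1)
# Toy response sample: a Gaussian core plus a small low-response tail,
# qualitatively similar to calorimeter jet response distributions.
resp = np.concatenate([rng.normal(1.0, 0.1, 20000), rng.normal(0.6, 0.2, 1000)])
mu, sigma = iterative_gaussian_core(resp)
frac_resolution = sigma / mu  # the sigma/mu figure of merit from the text
```

The clipping makes the estimate robust against the non-Gaussian tail, which is the reason an iterative fit rather than plain moments is used in the text.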

The global neural network calibration
The GSC is limited to using relatively uncorrelated variables for the correction since, otherwise, each sequential step could interfere with previous corrections through correlations between observables. This constraint is fundamental to the method and limits the set of corrections that may be applied. However, when adding further observables, and to account for their correlations, a simultaneous calibration is more appropriate [52]. As an alternative to the sequential calibration, a deep neural network (DNN) is trained to determine a simultaneous correction based on a wide variety of jet properties, enabling the use of correlated variables in the global jet property correction. Since analyses make selections based on the jet p_T, the DNN is designed to correct the jet p_T response, in contrast to the GSC, which leaves the energy response unchanged.
To improve the performance based on the detector geometry, a DNN is trained for each |η_det| region used to derive the GSC, providing a correction to the jet p_T based on various jet- and event-level features. The network is trained with the LGK loss,

L_LGK = −exp( −(R_target − R_pred)² / (2σ²) ) + α |R_target − R_pred| ,

where R_target is the jet p_T response, R_pred is the corresponding NN prediction, and σ and α are tunable parameters. As σ → 0, the LGK loss learns the mode, and the second term ensures that the gradient of the error function relative to the current weights does not vanish for large |R_target − R_pred|. Learning the mode is less biased by cases where the response is not perfectly Gaussian, resulting in better closure than with a loss function that learns the mean of the distribution.
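A minimal sketch of a loss with these properties (the function name, parameter defaults and the exact form are assumptions based on the description in the text):

```python
import numpy as np

def lgk_loss(r_target, r_pred, sigma=0.1, alpha=1e-6):
    """LGK-style loss sketch: the Gaussian-kernel term drives the prediction
    towards the mode of the response distribution as sigma -> 0, while the
    small linear term keeps gradients alive far from the core."""
    diff = r_target - r_pred
    return -np.exp(-diff ** 2 / (2.0 * sigma ** 2)) + alpha * np.abs(diff)

# The loss is minimal (-1) when the prediction matches the target exactly,
# and approaches alpha * |diff| for predictions far from the target.
vals = lgk_loss(np.array([1.0, 1.0]), np.array([1.0, 1.5]))
```

In a training framework the same expression would be written with the framework's tensor operations so that gradients flow through `r_pred`.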
The architecture of the network was chosen as the result of a hyperparameter optimisation based on the closure of the result, where the hyperparameters are parameters describing the network structure. The training uses a batch size of 10^4 jets and a learning rate of 10^−4. For the LGK loss, the parameters are chosen to be σ = 10^−1 and α = 10^−6 based on the hyperparameter scan. The training minimises the LGK loss function and continues until there is no improvement in the loss for five epochs. Increasing the patience did not have a noticeable effect on the quality of the results. Unweighted events are used, as this avoids issues in the training due to large differences between the event weights. Since the target is the p_T response, not the jet p_T itself, the uniform weights do not have a large impact on the final result. Only the two leading jets in each event are used in the training, since the events were simulated using a dijet process, and this avoids potential biases from using jets that originate purely from the parton shower. For each |η_det| bin, several networks were trained, and the one with the best closure was chosen for the final result.
Several sets of variables were considered as inputs to the NN, and the final list of variables used in the training is given in Table 1. This list includes all of the variables used in the GSC calibration, with the addition of more information about the jet kinematics, more granular information about the energy deposits in different calorimeter layers, and measures of pile-up. While the residual pile-up correction removes most of the pile-up dependence, some dependence is reintroduced by the absolute MCJES calibration, and so N_PV and μ are included in the training. Some calorimeter layers are not present in certain |η_det| regions, in which case their E_frac is set to zero. Explicitly removing these observables from the list of input variables used in the NN training had a negligible impact on the results, so the set of training variables is kept the same for all |η_det| regions. The jet p_T closure from this calibration is typically better than 1%, but it exhibits some fluctuations, which can occasionally slightly exceed this level. The magnitude of these fluctuations varies with each DNN training, but they were persistent across different DNN hyperparameters, loss functions, and training targets. To mitigate this, an additional p_T calibration is derived after the GNNC, using the p-spline method outlined in Section 4.4 but with the truth jet p_T as the target instead of the energy. This correction is derived in |η_det| bins of width 0.1, which provides better performance than using the same |η_det| bins as the GNNC correction. It has a negligible effect on the jet p_T resolution and only serves to improve the closure and smoothness of the calibration. Figure 8 shows a comparison of the jet p_T response after the MCJES, GSC and GNNC for one representative |η_det| bin. As designed, the GSC does not change the energy response of the jets.
Since the JES calibration moves the reconstructed energy scale to match the truth scale, some non-closure can remain in the jet p_T, which is particularly evident at low p_T. The GNNC is designed to change the p_T scale of the jets to match the truth jets, and so the closure in p_T is better than that of the GSC. It is worth noting that, while the GSC can instead be applied in a way that corrects the jet p_T scale, this does not affect the resolution. Other |η_det| bins show similar qualitative features, though the exact non-closure in the p_T response after the MCJES and GSC varies slightly. Figure 9 shows a comparison of the jet p_T resolution after the MCJES, GSC and GNNC for several representative |η_det| bins. In a few cases, the jet p_T resolution becomes worse in the lowest p_T bins, but this is also where the p_T non-closure is most significant, making it difficult to obtain an accurate estimate of the resolution, particularly since the p_T scale of the GNNC differs from that of the MCJES and GSC. Since the p_T scale of the MCJES and GSC is above one and has a negative slope, the measured resolution is slightly underestimated [47] in these bins, while the GNNC resolution is correctly estimated, since its response closes. In the 0.2 < |η_det| < 0.7 bin, the GNNC yields an average improvement in the jet p_T resolution of over 15%, and maximum improvements of over 25%, when compared with the GSC. Other |η_det| bins show similar average improvements of around 15-25%, with maximum improvements often over 30%, and the improvement generally becomes more pronounced at higher |η_det|, where the resolution improvements are significant, mostly due to the additional detector information. Studies comparing the GNNC performance using only the GSC observables as inputs find a performance similar to the GSC, indicating that the improvement in resolution of the GNNC over the GSC is due to the inclusion of additional observables.
This is made possible by a simultaneous correction that accounts for correlations between observables. The GNNC provides a larger improvement to the jet energy resolution than the GSC, and so it is used for the remainder of the paper.

Flavour uncertainties
The two flavour-dependence uncertainties in the JES are derived from simulation and account for the relative flavour fractions and the differing responses to quark- and gluon-initiated jets. The flavour response uncertainty accounts for the fact that, unlike the quark-initiated jet response R_q, the gluon-initiated jet response R_g is found to differ significantly between generators. This uncertainty is defined as

σ_response = f_g |R_g,Pythia8 − R_g,Herwig| ,

where f_g is the fraction of gluon-initiated jets, and R_g,Pythia8 and R_g,Herwig are the gluon-initiated jet responses R_g in Pythia 8 and Herwig respectively. The flavour composition uncertainty accounts for the fact that the jet response differs between quark- and gluon-initiated jets, combined with the uncertainty in the flavour fractions of a given sample. The quark and gluon responses cross at a particular p_T, which appears as a dip in the flavour composition uncertainty. Both the GSC and GNNC reduce these uncertainties, with the GNNC providing a greater reduction. For each |η_det| bin, when compared with the GSC, the GNNC results in an average improvement of around 15-25% in the 40 ≤ p_T < 300 GeV range for the flavour response uncertainty, and up to 25% improvement for the flavour composition uncertainty.
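The flavour response uncertainty described above (gluon fraction times the generator spread in the gluon-jet response) can be sketched numerically; the values below are invented for illustration and the per-(p_T, |η_det|) binning of the real derivation is omitted.

```python
import numpy as np

def flavour_response_uncertainty(f_gluon, r_g_pythia8, r_g_herwig):
    # Gluon fraction times the generator spread in the gluon-jet response.
    return f_gluon * np.abs(r_g_pythia8 - r_g_herwig)

# e.g. 60% gluon jets and a 2% generator difference -> 1.2% JES uncertainty
unc = flavour_response_uncertainty(0.6, 1.00, 0.98)
```

An analysis with a well-known flavour composition can feed its own f_g into such an estimate, which is why the uncertainty is quoted as a function of the gluon fraction.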

In situ analysis
The final calibration step accounts for differences in the jet response between simulation and data. Such differences arise from the imperfect simulation of the detector response and detector material, and from the modelling of the physics processes involved: the hard scatter, the underlying event, pile-up, jet formation and particle interactions with detector material. For the remainder of these studies, a single jet calibration is studied, using the sideband definition in Section 4.2 and the 3D residual calibration in Section 4.3, the absolute MC calibration implemented with p-splines in Section 4.4, and the GNNC for the global calibration in Section 4.5. To fully understand the impact of these changes, relative to the calibration procedure in Ref. [4], on the calibration and corresponding uncertainties, the in situ calibration is studied. The in situ calibration provides important validation of the new MC-based calibration of jets by comparing the data-to-MC difference in the p_T balance of a jet against a well-calibrated object or system. In addition, novel studies are performed to disentangle the physics effects and detector effects in the η-intercalibration to reduce the systematic uncertainties. Furthermore, the b-jet JES is evaluated in situ using PFlow jets, which is performed using γ+jet events for the first time.
The in situ calibration response R_in situ is defined as the average ratio of the jet p_T to the transverse momentum of the reference object, p_T^ref, derived as a function of p_T^ref. The R_in situ response is susceptible to effects such as the radiation of additional partons or the loss of energy outside the reconstructed jet cone. Dedicated event selections are applied to mitigate these effects. A double ratio, insensitive to these secondary effects provided they are well modelled in simulation, is defined as

R_in situ^data / R_in situ^MC .

The calibration factor applied to the jet four-momentum is obtained by a numerical inversion of this double ratio as a function of jet p_T, and as a function of η_det in the η-intercalibration.
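The numerical-inversion step can be sketched with a toy response curve; the parameterisation and the numbers below are assumptions for illustration, not the measured double ratio.

```python
import numpy as np

# Toy simulated response R = <pT_reco / pT_true> versus pT_true (illustrative).
pt_true_grid = np.linspace(20.0, 1000.0, 200)
r_grid = 0.95 + 0.04 * np.log(pt_true_grid / 20.0) / np.log(50.0)

def response(pt_t):
    return np.interp(pt_t, pt_true_grid, r_grid)

def correction_factor(pt_reco, lo=20.0, hi=1000.0):
    # Numerical inversion: solve pt_true * R(pt_true) = pt_reco by bisection;
    # the four-momentum calibration factor is then 1 / R(pt_true).
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if mid * response(mid) < pt_reco:
            lo = mid
        else:
            hi = mid
    return 1.0 / response(0.5 * (lo + hi))

calib = correction_factor(100.0)  # above 1, since R < 1 in this toy model
```

The inversion is needed because the response is parameterised in the truth (or reference) momentum, while the correction must be applied as a function of the reconstructed momentum.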
Two stages of in situ analyses are performed sequentially to assess the performance of the MC-based calibrations. First, a relative in situ calibration, referred to as the η-intercalibration, corrects the energy scale of forward jets (0.8 < |η_det| < 4.5) to match that of central jets (|η_det| < 0.8) using the p_T balance in a dijet system. Second, an absolute calibration is performed by measuring the p_T balance of a central jet against a well-calibrated Z boson or a photon. The missing-E_T projection fraction (MPF) method [57] is used in Z/γ+jet events to calculate the p_T balance between the full hadronic recoil and a Z boson or a photon. The method is less susceptible to the effects of pile-up and the jet reconstruction threshold than the direct balance method, allowing a reliable measurement of the jet response at low p_T, below 100 GeV. The direct balance (DB) method measures the balance between a (b-)jet recoiling against a photon in γ+jet events. By using the DB instead of the MPF, the response of the b-jet itself is studied without including the effects of the hadronic recoil.
For each in situ analysis, the main sources of systematic uncertainty arise from the MC modelling of the physics processes, the measurement of the reference object, and the p_T balance for the selected event topology. Uncertainties related to the MC modelling of physics effects are estimated by taking the difference between the predictions of two distinct MC event generators. The difference in jet response between simulations depends on the hadronisation models, which produce different jet contents [51]. Uncertainties in the reference object are estimated by propagating its own ±1σ calibration uncertainties through the analysis. Uncertainties due to the selected event topology are evaluated by varying the event selection criteria and comparing the impact on the response ratios between data and MC simulation. To reduce statistical fluctuations when applying the systematic variations, a rebinning procedure similar to that used in previous publications [4] is employed to obtain statistically significant results using pseudo-experiments. This rebinning is only performed in regions where no sharp variations in the p_T response are observed, to ensure that no real physics effects are removed.
Events must satisfy common selection requirements in the in situ analyses. Each event is required to have at least one reconstructed primary vertex with at least two matched tracks of p_T > 500 MeV. Jets arising from cosmic rays, non-collision background and calorimeter noise are vetoed by applying data-quality requirements [58]. In addition, jets with 20 < p_T < 60 GeV and |η_det| < 2.4 are required to satisfy the jet vertex tagging (JVT) criteria [59,60]. The JVT criterion rejects jets from pile-up interactions by matching jets to the primary vertex; it has a selection efficiency of 97% for hard-scatter jets at the nominal operating point.

η-intercalibration
The jet response in the forward region (0.8 < |η_det| < 4.5) is typically less well understood due to the more complicated detector structure. The η-intercalibration provides a correction for forward jets (0.8 < |η_det| < 4.5) to bring them to the same energy scale as central jets (|η_det| < 0.8). This calibration uses events with a dijet topology, requiring two back-to-back jets in the transverse plane in different η_det regions. To increase the statistical precision, there is no requirement that one of the two jets be in the central reference region: instead, all regions are calibrated relative to one another by solving a set of linear equations. This is referred to as the matrix method [4]. The momentum asymmetry, defined as

A = (p_T^left − p_T^right) / p_T^avg ,

measures the jet p_T balance between the two jets in two distinct detector regions (symbolically labelled left and right for simplicity), where the response R is measured in terms of the η_det of the left and right jets and of p_T^avg, the average p_T of the two jets. The intercalibration factor is defined as c = R_right/R_left, and hence the relative response R satisfies R = 1/c. Dijet events are selected using a combination of forward and central single-jet triggers, where each trigger is used in the range of p_T^avg for which it has an efficiency of at least 99%. Prescaled jet triggers are used to accommodate bandwidth limits, and each selected event is weighted accordingly. The trigger combination method [4,61] is used to maximise the statistical precision. The two leading jets in each event must satisfy p_T^avg > 25 GeV and |η_det| < 4.5. Events containing a third jet with p_T^jet3 / p_T^avg > 0.25 are excluded. The two leading jets must be back-to-back in the transverse plane, satisfying a requirement on their azimuthal angle difference of Δφ_1,2 > 2.5.
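The core idea of the matrix method, calibrating all regions relative to one another by solving a set of linear equations, can be sketched as a least-squares problem in the logarithms of the per-region factors. The toy input and the normalisation constraint below are illustrative assumptions.

```python
import numpy as np

def matrix_method(log_ratios):
    """Least-squares solve for per-region factors c_k, given measured pairwise
    log response ratios log_ratios[i, j] ~ ln(c_i) - ln(c_j); one extra row
    fixes the average ln(c_k) to zero to remove the unconstrained overall scale."""
    n = log_ratios.shape[0]
    rows, rhs = [], []
    for i in range(n):
        for j in range(n):
            if i != j:
                row = np.zeros(n)
                row[i], row[j] = 1.0, -1.0
                rows.append(row)
                rhs.append(log_ratios[i, j])
    rows.append(np.ones(n))  # normalisation constraint
    rhs.append(0.0)
    ln_c, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    return np.exp(ln_c)

true_c = np.array([1.00, 0.97, 1.03])                # toy per-region factors
log_r = np.log(true_c[:, None] / true_c[None, :])    # consistent toy input
c = matrix_method(log_r)                             # recovers the ratios
```

In the real analysis the pairwise inputs come from the measured asymmetries per (η_det^left, η_det^right, p_T^avg) bin and are statistically weighted, but the over-constrained least-squares structure is the same.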
The nominal calibration is estimated by taking the ratio of the simulated response in Powheg+Pythia8 to the measured response in data. The binning in η_det and p_T^avg is chosen to ensure a sufficient sample size in sparsely populated reference regions while still capturing granular variations in the detector response. A two-dimensional Gaussian kernel is optimised to smooth statistical fluctuations while preserving notable detector features.
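A minimal sketch of such two-dimensional Gaussian-kernel smoothing follows; the bandwidths and the toy grid are assumptions, whereas the real kernel widths are tuned to the binning and detector features.

```python
import numpy as np

def gaussian_smooth_2d(values, sigma_x=1.0, sigma_y=1.0):
    """Gaussian-kernel smoothing over a 2D grid of calibration factors,
    renormalising the kernel at the grid edges (bandwidths in bin units)."""
    ny, nx = values.shape
    ys, xs = np.mgrid[0:ny, 0:nx]
    out = np.empty_like(values, dtype=float)
    for iy in range(ny):
        for ix in range(nx):
            w = np.exp(-((xs - ix) ** 2) / (2 * sigma_x ** 2)
                       - ((ys - iy) ** 2) / (2 * sigma_y ** 2))
            out[iy, ix] = np.sum(w * values) / np.sum(w)
    return out

noisy = np.random.default_rng(0).normal(1.0, 0.05, (8, 10))
smoothed = gaussian_smooth_2d(noisy)  # fluctuations shrink, mean is preserved
```

Larger bandwidths suppress statistical noise more strongly but risk washing out genuine sharp features at calorimeter transitions, which is the trade-off behind the optimisation mentioned in the text.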
The 2017 data sample is representative of the high pile-up conditions and is thus discussed here. The relative response, parameterised by η_det in two p_T^avg regions and by p_T^avg in two η_det regions, is shown in Figure 11 for the 2017 data sample and MC simulations from Powheg+Pythia8 and Powheg+Herwig7 with an angular-ordered shower. The two MC simulations are found to capture the overall shape of the η_det dependence. However, the response predicted by the simulations is consistently lower than that measured in data for the forward detector regions across all p_T^avg bins. Uncertainties can arise from the inaccurate description of the physics, the detector response, and the effect of the dijet topology on the momentum balance. They are evaluated in terms of p_T^avg and η_det. Uncertainties arising from MC mismodelling are estimated by taking the difference between the smoothed residual corrections of Powheg+Pythia8 and Powheg+Herwig7 with an angular-ordered shower. Other uncertainties due to mismodelling of the physics and event topology are estimated by modifying the requirements on the third-jet veto, the Δφ_1,2 separation, and the JVT.
Further studies are performed at particle and reconstruction level separately to disentangle physics and detector effects. The particle level can be used to study physics effects on the dijet balance due to additional parton radiation or out-of-cone effects. The study is performed using the same procedure as at reconstruction level, except that no JVT requirement is applied. The matrix method is used as the nominal method, while the central reference method [4] is used as a cross-check. These physics effects induce a smooth and non-trivial structure in the relative response 1/c, with a slight asymmetry between positive and negative η_det in the forward region, as shown in Figure 12(a). A similar structure, with sharper variations due to convolution with detector effects, is also present at reconstruction level, as shown in Figure 11(a). The systematic uncertainty Δc on the intercalibration factor at particle level is Δc = c_syst/c_nominal − 1, where c_syst is the intercalibration factor obtained with a different event selection, either a different requirement on p_T^jet3 / p_T^avg or on Δφ_1,2. Comparing Figures 12(a) and 12(b), the magnitude of these physics effects at particle level is similar to the magnitude of the systematic uncertainties designed to cover them, which are evaluated by varying the selection criteria for p_T^jet3 / p_T^avg and Δφ_1,2 at reconstruction level in data and MC simulation simultaneously. Therefore, these uncertainties are not underestimated.
Variations in the parton-shower and hadronisation models can affect the dijet balance, which convolves both physics and detector effects. The MC modelling uncertainty derived at particle level as a function of η_det considers only the physics effects on the dijet balance and excludes effects on the detector response, which were evaluated in the jet flavour response uncertainty using various MC simulations discussed in Section 4.5. Such a procedure significantly reduces the MC modelling uncertainty shown in Figure 12(c) and avoids possible double counting of uncertainties. Figure 13 shows the fractional uncertainties derived as a function of η_det for two representative p_T values. The systematic uncertainty for |η_det| < 0.8 is set to zero, as uncertainties there are determined from the absolute in situ JES measurements such as the Z/γ+jet analyses. The fractional uncertainties increase with η_det for |η_det| > 0.8 and decrease significantly with increasing p_T. The dominant uncertainties arise from the choice of event generators and from variations in the selection criterion on p_T^jet3 / p_T^avg. The total systematic uncertainty is significantly reduced by using the MC modelling uncertainty estimated at particle level instead of reconstruction level. It is worth noting that systematic variations of a selection criterion such as p_T^jet3 / p_T^avg are performed simultaneously in data and simulation at reconstruction level, while Figure 12(b) shows only the relative impact at particle level. If the up and down variations differ, the systematic uncertainty is taken to be the larger absolute value. Systematic uncertainties are symmetrised between the positive and negative η_det values using the most conservative approach, as it is unknown whether the asymmetry of the systematic uncertainty in η_det arises from statistical fluctuations or detector effects.

Z/γ+jet balance
The next step in the jet calibration brings the absolute jet energy scale in data to the scale in simulation by exploiting the p_T balance between the hadronic recoil and a well-calibrated object such as a Z boson or a photon. The jet used in the in situ analysis is required to be in the central detector region (|η| < 0.8); the derived correction can then be applied to jets in the forward region via the η-intercalibration. The Z/γ+jet balance measurement is built upon the precise determination of the energy of the photon or of the lepton pair from a Z boson decay. These measurements benefit from the accurate knowledge of the energy scale and resolution of the leptons. The calibration of electrons and photons is accurately known through measurements using Z → ee data and other final states [62], while the muon calibration is determined to high precision through studies of J/ψ → μμ, Z → μμ and Υ → μμ [63].
Three independent measurements, consisting of Z+jet in the Z → ee and Z → μμ decay channels, and γ+jet, are used for the absolute in situ calibration. The Z+jet measurement provides a sufficient sample size at low and medium jet p_T, covering 17 < p_T < 800 GeV with limited precision above 800 GeV. The γ+jet analysis provides a complementary measurement at medium and high p_T, covering 30 < p_T < 2000 GeV, with limited precision below 100 GeV due to prescaled low-p_T triggers, jets misidentified as photons, and MC event generator choices.
The MPF method measures the p_T balance between the reference object and the full hadronic recoil in Z/γ+jet events. This technique allows the calorimeter response to hadronic showers to be computed directly. It has low susceptibility to pile-up and to the underlying event, which are uniform across the detector and thus cancel out in the MPF method. By conservation of transverse momentum, the transverse momentum of all of the hadronic activity in a Z/γ+jet event, p_T^recoil, should be equal and opposite to the transverse momentum of the reference boson, p_T^ref, at particle level, such that (vectorially)

p_T^ref + p_T^recoil = 0 .   (1)

At detector level, the well-calibrated reference object has a response of one, while the calorimeter response to the hadronic recoil, r_MPF, is lower than unity, resulting in possible missing transverse momentum E_T^miss in the event. Therefore, Eq. (1) can be written as

p_T^ref + r_MPF p_T^recoil = −E_T^miss .

Projecting the vector terms along the direction of the reference boson using a unit vector n̂_ref in the transverse plane, r_MPF depends only on the missing transverse momentum and the transverse momentum of the reference boson. The average of r_MPF, R_MPF, is measured as a function of p_T^ref, the p_T of the reference Z/γ boson:

R_MPF = ⟨ 1 + (n̂_ref · E_T^miss) / p_T^ref ⟩ ,

where E_T^miss is computed using particle-flow objects calibrated at the EM scale.
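The projected MPF definition can be sketched directly, using a toy event where the recoil is reconstructed 10% low (all kinematics invented for illustration):

```python
import numpy as np

def mpf_response(pt_ref_x, pt_ref_y, met_x, met_y):
    # r_MPF = 1 + (n_ref . ET_miss) / pT_ref: project the missing transverse
    # momentum onto the unit vector of the reference object.
    pt_ref = np.hypot(pt_ref_x, pt_ref_y)
    n_x, n_y = pt_ref_x / pt_ref, pt_ref_y / pt_ref
    return 1.0 + (met_x * n_x + met_y * n_y) / pt_ref

# Toy event: reference at pT = 100 GeV along x; the recoil is reconstructed
# 10% low, so ET_miss = (-10, 0) and the MPF response comes out as 0.9.
r_mpf = mpf_response(100.0, 0.0, -10.0, 0.0)
```

Because only E_T^miss and the reference boson enter, no explicit jet matching is required, which is what makes the method robust against pile-up and the jet reconstruction threshold.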
Z+jet events are selected using either the lowest-p_T unprescaled dielectron or dimuon trigger. The lowest p_T threshold in the dielectron trigger corresponds to 15 GeV for each electron, while the lowest p_T threshold in the dimuon trigger corresponds to 14 GeV for each muon [64,65]. Both leptons are required to have p_T > 20 GeV so that the triggers are fully efficient. Electrons and muons must satisfy loose identification and isolation criteria [62,63]. Electrons are required to fall within |η| < 2.47 and are rejected if they fall in the calorimeter crack region 1.37 < |η| < 1.52. Muons must fall within |η| < 2.4. The oppositely charged same-flavour lepton pair is required to have an invariant mass around the Z boson mass, 66 < m_ℓℓ < 116 GeV. γ+jet events are selected using a combination of prescaled and unprescaled single-photon triggers, in which the lowest prescaled trigger p_T threshold is 10 GeV. Photon candidates entering the analysis are required to have p_T > 25 GeV and |η| < 1.37 and to satisfy the tight identification and isolation criteria [62]. A jet is removed if it falls within ΔR = 0.4 (0.35) of a photon (lepton).
Further selection criteria are imposed in the Z/γ+jet measurements to reduce the impact of pile-up and additional parton radiation. To suppress contamination from pile-up, jets are required to satisfy the cleaning criteria and the JVT requirement. Events must contain a jet with p_T greater than 10 GeV within |η| < 0.8. To suppress effects from additional parton radiation, further requirements are imposed on the azimuthal angle between the reference boson and the leading jet, Δφ(ref, jet) > 2.9, and on the p_T of the subleading jet, p_T^jet2 < max(0.3 × p_T^ref, 12 GeV), where the subleading jet falls within |η| < 4.5. The MPF response as a function of the reference boson p_T is shown in Figures 14 and 15 using Z+jet and γ+jet events for data and two distinct MC samples. The MC sample used to derive the nominal calibration for Z+jet (γ+jet) corresponds to MadGraph+Pythia8 (Pythia8). The alternative MC sample corresponds to Sherpa and is used to determine the uncertainty from the MC event modelling. The dip in the MPF response at low p_T^ref arises from two opposing effects: the jet reconstruction threshold, which tends to increase the response at the lowest jet p_T values between 17 GeV and 20 GeV, and the rise of the MPF response as a function of p_T. The MC-to-data response ratios are consistent between Z+jet and γ+jet. Several sources of systematic uncertainty are considered. Uncertainties due to the energy scale and resolution of the reference objects (e/μ/γ) are derived from the existing calibrations for each object and propagated through the corresponding analysis. The impact of additional parton radiation on the response measurement is evaluated by varying the selection criteria for the subleading-jet veto and Δφ(ref, jet). Uncertainties arising from pile-up suppression are estimated by comparing the response measurement between tighter and looser JVT working points.
Uncertainties arising from the photon purity in γ+jet events, in which a jet is misreconstructed as a photon, are assessed using the same methodology as documented in Ref. [66]. Pseudo-experiments are used in the estimation of the uncertainties to reduce statistical fluctuations.
The uncertainties for the calibration are presented for the Z → ee and Z → μμ measurements in Figure 16 and for the γ+jet measurement in Figure 17. The derived calibrations are stable over the range of pile-up conditions in Run 2. Figure 18 shows the MC-to-data response ratios as functions of μ and N_PV in Z+jet events for 45 < p_T^ref < 65 GeV. The in situ calibration is consistent as a function of μ or N_PV, demonstrating the expected stability.

b-quark jet energy scale in γ+jet balance
The measurement of the top-quark mass is limited by the b-quark jet energy scale (bJES), and a measurement of the bJES can potentially improve its precision. The direct balance (DB) technique is used in γ+jet events to measure the balance of a (b-tagged) jet against a well-calibrated photon. This represents the first measurement of the b-tagged jet energy scale using PFlow jets in this event topology. The reference p_T is defined in terms of the photon p_T as p_T^ref = p_T^γ × |cos Δφ|, where p_T^γ is the transverse momentum of the photon and Δφ is the azimuthal angle difference between the photon and the leading jet.
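The DB response with this reference definition can be sketched as follows (toy numbers; the absolute value of the cosine reflects the back-to-back topology, where cos Δφ is close to −1):

```python
import numpy as np

def db_response(pt_jet, pt_photon, dphi):
    # pT_ref = pT_gamma * |cos(dphi)|, with dphi the photon-jet azimuthal
    # separation (close to pi after the back-to-back selection).
    pt_ref = pt_photon * np.abs(np.cos(dphi))
    return pt_jet / pt_ref

# Toy event: a jet reconstructed 5% low against a 100 GeV photon.
r_db = db_response(95.0, 100.0, np.pi)
```

Unlike the MPF double ratio, the DB response uses the jet momentum itself, which is what allows the b-tagged jet response to be isolated from the rest of the hadronic recoil.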
The selections are similar to the γ+jet selection in the MPF method unless stated otherwise. Events must have a jet with p_T > 20 GeV instead of p_T > 10 GeV in the central detector region (|η_det| < 0.8).
The higher jet p_T threshold arises from the tighter requirement on the jet transverse momentum in the b-tagging algorithm used. To suppress additional radiation, the DB technique imposes a veto on the transverse momentum of the subleading jet.

The b-tagged category is predominantly composed of b- and c-quark jets, while light-quark and gluon jets dominate the inclusive category. A jet is labelled as a b- (c-) quark jet if any b (c) parton or hadron at particle level is found within a cone of ΔR < 0.3 around the reconstructed jet; otherwise it is labelled as a light-quark or gluon jet. A summary of the jet flavour composition for inclusive and b-tagged jets is given in Table 2. The 85% b-tagging working point is dominated by b-quark jets, and the measurement can be used to constrain the bJES in the H→bb̄ analysis [68], for instance. Figure 19 shows the DB response as a function of the reference photon p_T for inclusive and b-tagged jets, using a b-tagging working point with an average efficiency of 77%. The MC simulations are in reasonable agreement with data. The MC-to-data response ratios are found to be slightly below one for b-tagged jets and above one for inclusive jets in almost all bins. The difference in the DB response between Pythia8 and Sherpa arises from their different b-quark fragmentation and decay models. The apparent rise of the DB response around 150 GeV for b-tagged jets was checked in several ways: the quality of the DB response fit, the b-tagging scale factors applied in simulation, the jet flavour composition between neighbouring p_T bins, and a looser second-jet veto with p_T^j2 < 0.2 × p_T^ref. None of these checks explains the rise, suggesting that the feature is due to statistical fluctuations. Figure 20 shows the uncertainties, with a precision between 1% and 5% for b-tagged jets and up to 1% for inclusive jets over the chosen p_T range.
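The cone-based flavour labelling described above can be sketched as follows. This is a simplified illustration of ΔR matching, assuming parton and hadron positions are given as (η, φ) pairs; the b label takes priority over c, as in the labelling scheme above.

```python
import math

def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance, with the azimuthal difference wrapped into (-pi, pi]."""
    dphi = math.atan2(math.sin(phi1 - phi2), math.cos(phi1 - phi2))
    return math.hypot(eta1 - eta2, dphi)

def flavour_label(jet_eta, jet_phi, b_objects, c_objects, cone=0.3):
    """Label a jet as 'b', 'c', or 'light' by cone matching to particle-level
    b or c partons/hadrons, with b taking priority over c."""
    for eta, phi in b_objects:
        if delta_r(jet_eta, jet_phi, eta, phi) < cone:
            return "b"
    for eta, phi in c_objects:
        if delta_r(jet_eta, jet_phi, eta, phi) < cone:
            return "c"
    return "light"
```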
For b-tagged jets, the uncertainties are dominated by the event-generator modelling everywhere, while for inclusive jets the precision is limited by the event-generator modelling, the photon purity, and the subleading-jet veto at lower p_T, and by the photon energy scale for p_T > 70 GeV. To further measure the energy-scale difference between b-tagged and inclusive jets, a new observable is defined as the double ratio of the b-tagged response to the inclusive jet response. As the nominal jet calibration is determined relative to inclusive jets, such a double ratio can be applied on top of the nominal jet calibration to correct the bJES. The value of this ratio is determined to be below one for both MC samples, with a slightly higher response in Pythia8 than in Sherpa, as shown in Figure 21.
The difference between the two event generators arises from their different fragmentation and decay models. To increase the statistical precision, the ratio is also determined inclusively for photon p_T^ref between 85 and 1000 GeV for various b-tagging working points in Table 3, for Pythia8 and Sherpa respectively. It is foreseen to provide MC-specific calibrations for the bJES to reduce the effects arising from MC modelling. The ratio is measured with unprecedented precision of up to 1%. This in turn will improve the precision of top-quark mass measurements.
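The double ratio and its use as a residual correction can be sketched as below. This is an illustrative reading of the procedure, not the ATLAS implementation: the function names are hypothetical, and the correction direction (dividing the jet p_T by a ratio below one, thereby scaling b-tagged jets up toward the inclusive scale) is an assumption of the sketch.

```python
def bjes_ratio(response_btagged, response_inclusive):
    """Double ratio of the b-tagged DB response to the inclusive DB response,
    evaluated within a given sample (data or a specific MC generator)."""
    return response_btagged / response_inclusive

def apply_bjes_correction(pt_jet_nominal, ratio):
    """Apply the ratio on top of the nominal (inclusive) jet calibration to
    correct a b-tagged jet's energy scale (assumed direction: divide)."""
    return pt_jet_nominal / ratio
```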

Conclusion
The determination of the jet energy scale (JES) is presented using data recorded by the ATLAS experiment in pp collisions at √s = 13 TeV. The calibration scheme used for anti-kt jets reconstructed with radius parameter R = 0.4 consists of two steps: a Monte-Carlo-based calibration that corrects jets to the truth-jet scale, and an in situ calibration that corrects the scale of jets in data.
The simulation-based calibration implements several new strategies to improve the pile-up stability at higher p_T, the closure, the energy resolution, and the modelling uncertainties of the jets. Biases related to the determination of the pile-up p_T density ρ were a dominant source of uncertainty for jets with p_T below 30 GeV. The new procedure presented here, combined with improvements to the multi-parton-interaction model in the Monte Carlo simulation, reduces this uncertainty by a factor of seven. Following this, a new residual calibration is applied, which reduces the effects of pile-up by simultaneously correcting for ⟨μ⟩, N_PV, and p_T. For the absolute MCJES, a new fit method based on splines is used, leading to better closure for jets with p_T below 30 GeV. Finally, for the global calibration, which improves the jet resolution and reduces the difference between the energy scales of quark- and gluon-initiated jets, a new method using a DNN is employed, which allows information from correlated observables to be used in this calibration step. The DNN yields an average JER improvement of around 15% compared with previous methods, with a maximum improvement of over 40%.
Following these simulation-based calibration steps, the full Run 2 data sample is used to perform a residual in situ calibration that corrects data–MC differences and constrains the uncertainties. Dijet events are used to calibrate jets in the forward region relative to the central region as a function of jet transverse momentum and pseudorapidity. The precision is improved by up to a factor of two in the forward detector region at low p_T by evaluating the MC modelling uncertainty at particle level instead of at reconstruction level. Central jets are calibrated by exploiting the balance between jets recoiling against either a photon or a Z boson. Unprecedented precision of up to 1% is achieved in the in situ analyses. For the first time, the energy scale of b-tagged jets relative to inclusive jets is determined with a precision of up to 1% in γ+jet events. This result is important for improving the precision of analyses sensitive to the bJES, such as top-quark mass and H→bb̄ measurements.