1 Introduction

The energetic proton–proton (pp) collisions produced by the Large Hadron Collider (LHC) yield final states that are predominantly characterised by jets, which are collimated sprays of charged and neutral hadrons and their decay products. Jets constitute an essential piece of the physics programme carried out using the ATLAS detector, and a precise understanding of jet reconstruction is critical for a wide variety of processes. Measurements of both the jet energy scale (JES) and resolution (JER) of these complex objects are therefore essential for precision measurements of the Standard Model and for searches for new phenomena beyond it. Several new methods are presented for improving the jet energy scale calibration and evaluating their performance in simulation and data, paving the way to achieving a better precision on the JES for Run 3 and beyond. These techniques were developed for jets reconstructed with the anti-\(k_{t}\) algorithm [1, 2] with radius parameter \(R = 0.4\) using particle flow inputs [3, 4]. Previous calibration strategies by the ATLAS Collaboration, which used up to 81 fb\(^{-1}\) of data [4,5,6,7,8,9], are extended and improved by taking advantage of the full Run 2 data sample of 140 fb\(^{-1}\).

The jet energy scale calibration consists of a series of calibration steps. The first stage of the calibration uses simulation to derive corrections to the jet energy scale to reduce the impact of pile-up, detector effects, and other parameters. The second stage of the calibration is a residual in situ calibration, correcting for remaining differences between data and Monte Carlo (MC) simulation, derived using well-measured reference objects, including photons and Z bosons.

The structure of the paper is as follows. Section 2 describes the ATLAS detector, and Sect. 3 describes the recorded data and the MC simulation samples, and the inputs and algorithms used to reconstruct the jets. Section 4 describes the methods used and the result of the simulation-based calibration, Sect. 5 describes the in situ calibration, and conclusions are given in Sect. 6.

2 The ATLAS detector

The ATLAS detector [10] at the LHC covers nearly the entire solid angle around the collision point. It consists of an inner tracking detector surrounded by a thin superconducting solenoid, electromagnetic and hadron calorimeters, and a muon spectrometer incorporating three large superconducting air-core toroidal magnets.

The inner-detector system (ID) is immersed in a 2 T axial magnetic field and provides charged-particle tracking in the range of \(|\eta | < 2.5\). The high-granularity silicon pixel detector covers the vertex region and typically provides four measurements per track, the first hit normally being in the insertable B-layer (IBL) installed before Run 2 [11, 12]. It is followed by the silicon microstrip tracker (SCT), which usually provides eight measurements per track. These silicon detectors are complemented by the transition radiation tracker (TRT), which enables radially extended track reconstruction up to \(|\eta | = 2.0\). The TRT also provides electron identification information based on the fraction of hits above a higher energy-deposit threshold corresponding to transition radiation.

The calorimeter system covers the pseudorapidity range of \(|\eta | < 4.9\). In the region \(|\eta |< 3.2\), electromagnetic calorimetry is provided by barrel and endcap high-granularity lead/liquid-argon (LAr) calorimeters, with an additional thin LAr presampler covering \(|\eta | < 1.8\) to correct for energy loss in material upstream of the calorimeters. Hadron calorimetry is provided by the steel/scintillator-tile calorimeter, segmented into three barrel structures within \(|\eta | = 1.7\), and two copper/LAr hadron endcap calorimeters. The solid angle coverage is completed with forward copper/LAr and tungsten/LAr calorimeter modules optimised for electromagnetic and hadronic energy measurements respectively.

The muon spectrometer (MS) comprises separate trigger and high-precision tracking chambers measuring the deflection of muons in a magnetic field generated by the superconducting air-core toroidal magnets. The field integral of the toroids ranges between 2.0 and 6.0 T m across most of the detector. Three layers of precision chambers, each consisting of layers of monitored drift tubes, cover the region \(|\eta | < 2.7\), complemented by cathode-strip chambers in the forward region, where the background is highest. The muon trigger system covers the range of \(|\eta | < 2.4\) with resistive-plate chambers in the barrel, and thin-gap chambers in the endcap regions.

Interesting events are selected by the first-level trigger system implemented in custom hardware, followed by selections made by algorithms implemented in software in the high-level trigger [13]. The first-level trigger accepts events from the bunch crossings, which occur at up to 40 MHz, at a rate below 100 kHz, which is further reduced by the high-level trigger to record events to disk at about 1 kHz.

An extensive software suite [14] is used in the reconstruction and analysis of real and simulated data, in detector operations, and in the trigger and data acquisition systems of the experiment.

3 Data samples and simulated events

3.1 Data samples

The data used for the calibration study were collected by ATLAS in pp collisions at \(\sqrt{s}=13\) \(\text {TeV}\) from 2015 to 2018 with all subdetectors operational, corresponding to an integrated luminosity of 140 \(\hbox {fb}^{-1}\). The proton bunch crossing interval was 25 ns during the data taking. The average number of interactions per bunch crossing (\(\mu \)) for the Run 2 data is 34. These conditions, and those of the detector configuration and reconstruction thresholds, are taken into account in producing and reconstructing simulated data [15].

3.2 Monte Carlo simulation

The MC-based calibration uses simulated dijet and multijet events. Pythia v8.230 [16] is used as the nominal MC generator for simulating dijets. Samples of \(2 \rightarrow 2\) dijet events are simulated using the A14 tune [17] and the NNPDF 2.3 [18] parton distribution function (PDF) set. Decays of heavy-flavour hadrons are modelled using EvtGen [19].

Several alternative samples are used to study the impact of the MC simulation on the calibration, and to determine uncertainties based on this. Two different dijet samples are simulated using the Sherpa 2.2.5 [20] generator. The matrix element calculation was included for the \(2\rightarrow 2\) process at leading-order, and the default Sherpa parton shower [21] based on Catani–Seymour dipole factorisation was used for the showering with \(p_{\text {T}}\) ordering, using the CT14nnlo PDF set [22]. The first of these samples made use of the dedicated Sherpa AHADIC model for hadronisation [23], based on cluster fragmentation. A second sample was generated with the same configuration but using the Sherpa interface to the Lund string fragmentation model of Pythia 6 [24] and its decay tables. These two sets of samples were used to evaluate uncertainties stemming from the hadronisation modelling.

Two sets of samples are simulated using Herwig 7.1.6 [25,26,27] with the NNPDF 2.3lo PDF set. These samples include the \(2 \rightarrow 2\) process at the matrix element level, and use either the default angular-ordered parton shower [28] or a dipole parton shower [29, 30] with splitting kernels based on the Catani–Seymour subtraction scheme [31, 32]; both use cluster hadronisation.

Alternative samples of multijet production at NLO accuracy were produced with Powheg Box v2 [33, 34] interfaced to Pythia 8. These were generated with the dijet process as implemented in Powheg Box v2 [35]. The \(p_{\text {T}}\) of the underlying Born configuration was taken as the renormalisation and factorisation scales and the NNPDF 3.0nlo [36] PDF was used. Pythia with the A14 tune and the NNPDF 2.3lo PDF was used for the shower and multi-parton interactions. These samples included per-event weight variations for different perturbative scales in the matrix element, different parton distribution functions and their uncertainties, and the Pythia perturbative shower uncertainties. This multijet sample is used as the nominal MC sample for in situ \(\eta \)-intercalibration, while Powheg Box v2 [33, 34] interfaced to Herwig [27] with an angular-ordered parton shower [28] and the NNPDF 2.3lo PDF is used as an alternative sample.

For the in situ \(Z+\)jet analysis, samples of Z bosons produced in association with jets (Z+jet) are generated with MadGraph + Pythia8 [37] using the NNPDF 3.0nnlo PDF set [20] and the AZNLO set of tuned parameters [38]. Sherpa 2.2.11 is used as the alternative MC sample. The nominal \(\gamma +\)jets samples are generated with Pythia v8.230 [16] using the A14 set of tuned parameters [17] and the NNPDF 2.3 PDF set. The \(\gamma \)+jet samples are produced for the direct photon component and the fragmentation photon component separately. The alternative sample used in \(\gamma \)+jet events is Sherpa 2.2.2 with the NNPDF 3.0nnlo PDF set [20].

All samples are reconstructed using a full detector simulation and superimposed minimum-bias interactions simulated using Pythia8 with the A3 set of tuned parameters [39] and NNPDF 2.3lo PDF set to represent multiple pp interactions during the same or nearby bunch crossings (pile-up). The distribution of the average number of pile-up interactions in simulation is reweighted during data analysis to match that observed in the Run 2 data.

3.3 Jet reconstruction

The jets in these studies are reconstructed with the anti-\(k_{t}\) algorithm with a radius parameter \(R = 0.4\) as implemented in the FastJet software package [40]. Four-momentum objects are used as inputs to the algorithm, and may be particles at the generator level of the MC, charged-particle tracks, calorimeter energy deposits, or algorithmic combinations of the latter two, as in the case of the particle-flow (PFlow) reconstruction technique. Particles at the MC generator level are referred to as truth particles. Reconstructed jets use PFlow objects (PFOs) as inputs to jet reconstruction, which combine measurements from the tracker and the calorimeter to form the input signals for jet reconstruction. Specifically, energy deposited in the calorimeter by charged particles is subtracted from the observed topo-clusters and replaced by the momenta of tracks that are matched to those topo-clusters, as described in Ref. [3], and with the updates described in Ref. [4]. These resulting PFlow jets show improved energy and angular resolution, reconstruction efficiency, and pile-up stability compared with jets reconstructed using only calorimeter information.

Charged particle tracks are used for both the PFlow reconstruction and for deriving calibrations. These tracks are reconstructed in the full acceptance of the inner detector \(|\eta | < 2.5\), and are required to have a \(p_{\text {T}} > 500\) \(\text {MeV}\) unless otherwise stated, and must satisfy criteria based on the number of hits in the ID subdetectors. In addition, tracks must satisfy \(|z_{0} \textrm{sin} \theta | < 2\) mm, where \(z_0\) is the distance of closest approach of the track to the hard-scatter primary vertex along the z-axis. Tracks used in the calibration are matched to jets using ghost association, a procedure that treats them as four-vectors of infinitesimal magnitude during the jet reconstruction and assigns them to the jet with which they are clustered [41].

Generator-level jets, referred to as truth jets, are reconstructed using stable final-state particles, defined as those with \(c\tau > 10\)  mm, excluding muons, neutrinos, and particles from pile-up interactions. Generator-level jets are selected with \(p_{\text {T}} > 7\) \(\text {GeV}\) and \(|\eta | < 4.5\), while reconstructed jets used for the MC calibration are selected with \(|\eta | < 4.5\).

4 The simulation-based calibration

This section details the simulation-based jet energy scale calibration, which restores the average jet energy to that of truth jets. The event selection for these steps is described in Sect. 4.1, and the calibration is done in four steps. The first two steps apply pile-up corrections to remove the excess \(p_{\text {T}}\) due to additional pp interactions in the same (in-time) or nearby (out-of-time) bunch crossings. The first pile-up correction, described in Sect. 4.2, applies a subtraction based on the median \(p_{\text {T}}\) density measured in the event and the jet area (the ‘pile-up density correction’) [41, 42], minimising the sensitivity to the model of the pile-up used in the simulation. Next, a correction for residual dependence on the number of reconstructed primary vertices in the event (\(N_{\text {PV}}\)) and \(\mu \) is applied (the ‘residual pile-up correction’), based on corrections derived using simulated samples, as described in Sect. 4.3. The third step, the absolute JES calibration detailed in Sect. 4.4, corrects jets so that they agree, on average, in energy and direction with truth jets from dijet MC events. Finally, the global calibration improves the jet \(p_{\text {T}}\) resolution and related uncertainties by reducing the dependence of the reconstructed jet response on observables constructed using information from the tracking, calorimeter, and muon chamber detector systems, as introduced in Sect. 4.5.

4.1 Event selection

All stages of the simulation-based jet energy scale calibration use the same event selection. The MC simulation is used to determine the energy scale and resolution of jets by comparing PFlow jets with truth jets. Truth and reconstructed jets are required to satisfy \(|\eta | < 4.5\) to be fully contained in the detector acceptance, and truth jets are additionally required to have \(p_{\text {T}} > 7\) \(\text {GeV}\). Uncalibrated jets have a positive energy, but can become negative in energy after applying the corrections described in Sects. 4.2 and 4.3. Biases in the determination of the average jet energy response (\(E_{\textrm{reco}}/E_{\textrm{true}}\)) at low energies are reduced by requiring reconstructed jets to have \(p_{\text {T}} > 0\) after the pile-up density correction described in Sect. 4.2, but making no requirement on the \(p_{\text {T}}\) after the correction described in Sect. 4.3.

Events are required to have at least one reconstructed primary vertex with at least two matched tracks with \(p_{\text {T}} > 500\) \(\text {MeV}\). In simulated reconstructed events, the primary vertex is the reconstructed primary vertex with the largest sum of squared track momentum, while at the truth level, the primary vertex corresponds to that of the simulated hard-scatter process, and not the collision with the highest momentum transfer. This results in some events where the pile-up collision has a larger momentum transfer than the hard-scatter collision. For MC samples, the reconstructed and truth primary vertices are required to have z-positions within 0.2 mm of each other. Events are required to have at least two reconstructed jets, and at least one truth jet. Truth jets are geometrically matched to PFlow jets using the angular distance \(\Delta R\) with the requirement \(\Delta R < 0.3\). In addition, truth jets are required to be isolated from all other truth jets by \(\Delta R > 1.0\), while reconstructed jets are required to be isolated from all reconstructed jets by \(\Delta R > 0.6\). To reduce the contribution of events where the pile-up collision has a larger momentum transfer than the hard-scatter collision, the average \(p_{\text {T}}\) of the two leading reconstructed jets is required to be no larger than \(1.4 \times p_{\text {T}}^{\text {truth, leading}} \).
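The geometric matching and isolation requirements can be illustrated with a minimal sketch. It assumes jets are given as (eta, phi) tuples and that the closest reconstructed jet is taken as the match candidate; the text only states the \(\Delta R < 0.3\) requirement, so the closest-jet choice and all function names are illustrative assumptions.

```python
import math

def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance sqrt(d_eta^2 + d_phi^2), with d_phi wrapped into [-pi, pi)."""
    dphi = (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(eta1 - eta2, dphi)

def is_isolated(index, jets, min_dr):
    """True if no other jet of the same collection lies within min_dr."""
    eta, phi = jets[index]
    return all(delta_r(eta, phi, e, p) > min_dr
               for k, (e, p) in enumerate(jets) if k != index)

def match_truth_to_reco(truth_jets, reco_jets, max_dr=0.3,
                        truth_iso=1.0, reco_iso=0.6):
    """Return (truth_index, reco_index) pairs passing matching and isolation."""
    pairs = []
    for i, (t_eta, t_phi) in enumerate(truth_jets):
        if not reco_jets or not is_isolated(i, truth_jets, truth_iso):
            continue
        # Assumption: the closest reconstructed jet is the match candidate.
        best = min(range(len(reco_jets)),
                   key=lambda j: delta_r(t_eta, t_phi, *reco_jets[j]))
        close_enough = delta_r(t_eta, t_phi, *reco_jets[best]) < max_dr
        if close_enough and is_isolated(best, reco_jets, reco_iso):
            pairs.append((i, best))
    return pairs

# Illustrative (eta, phi) values, not taken from the paper.
truth = [(0.0, 0.0), (2.0, 3.0), (2.5, 3.1)]
reco = [(0.05, 0.02), (2.1, -3.1), (-1.0, 1.0)]
pairs = match_truth_to_reco(truth, reco)
```

In this toy input the second and third truth jets lie within \(\Delta R = 1.0\) of each other, so both fail the truth-jet isolation and only the first truth jet is matched.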

4.2 Estimating the median \(p_{\text {T}}\) density and the pile-up density correction

Additional pp interactions in the same or nearby bunch crossings affect jet reconstruction and change the jet energy scale. The first stage of the jet calibration, referred to as the ‘pile-up density correction’, subtracts the expected contribution from pile-up based on the area of the jet and the median \(p_{\text {T}}\) density in the event [41]. To compute the jet area A, a dense population of infinitesimally soft ghost particles, uniformly distributed in \(\eta \times \phi \), is overlaid on the event. Then, A is defined as the transverse momentum of the sum of the four-momenta of all ghost constituents matched with a given jet after clustering, normalised by the ghost constituent transverse momentum density.

The median pile-up \(p_{\text {T}}\) density \(\rho \) is estimated for each event by the median \(p_{\text {T}}\) density (\(p_{\text {T}}/A\)) of all jets clustered with the \(k_t\) algorithm [43, 44] with a radius parameter of 0.4:

$$\begin{aligned} \rho = \text {median} \Bigg \lbrace \frac{p_{\textrm{T},i}^{\textrm{jet}}}{A_i^{\textrm{jet}}} \Bigg \rbrace , \end{aligned}$$

where the index i enumerates over the jets. For this calculation, only jets with \(|\eta | < 2\) are used, since \(\rho \) falls off steeply beyond this region, due to a combination of physics and detector effects.

Assuming the pile-up is a uniform, diffuse background, the pile-up contribution to the energy of the jet can then be approximated by the product of the jet area times the median \(p_{\text {T}}\) density. The pile-up density-corrected jet \(p_{\text {T}}\), \(p_{\text {T}}^{\text {area}}\), is then defined as

$$\begin{aligned} p_{\text {T}}^{\text {area}} = p_{\text {T}}- \rho \times A. \end{aligned}$$

The ratio of \(p_{\text {T}}^{\text {area}}\) to the uncorrected jet \(p_{\text {T}}\) is applied as a scale factor to the jet four-momentum and hence does not affect its direction.
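The two formulas above can be sketched in a few lines. This is a minimal illustration, assuming jets are plain dictionaries with hypothetical keys (`pt`, `area`, `eta`, `px`, `py`, `pz`, `e`); it is not the ATLAS implementation, and the numbers are invented for the example.

```python
import math
from statistics import median

def median_pt_density(kt_jets, eta_max=2.0):
    """rho = median(pT / A) over k_t jets with |eta| < eta_max."""
    densities = [j["pt"] / j["area"] for j in kt_jets
                 if abs(j["eta"]) < eta_max and j["area"] > 0.0]
    return median(densities) if densities else 0.0

def area_corrected(jet, rho):
    """Scale (px, py, pz, E) by pT_area / pT, with pT_area = pT - rho * A.

    A common scale factor leaves the jet direction unchanged.
    """
    pt = math.hypot(jet["px"], jet["py"])
    pt_area = pt - rho * jet["area"]
    scale = pt_area / pt
    four_momentum = tuple(scale * jet[k] for k in ("px", "py", "pz", "e"))
    return four_momentum, pt_area

# Illustrative k_t jets; the last one is outside |eta| < 2 and is ignored.
kt_jets = [
    {"pt": 5.0, "area": 0.5, "eta": 0.1},
    {"pt": 10.0, "area": 0.5, "eta": -1.2},
    {"pt": 7.5, "area": 0.25, "eta": 1.8},
    {"pt": 50.0, "area": 0.5, "eta": 3.0},
]
rho = median_pt_density(kt_jets)

jet = {"px": 30.0, "py": 40.0, "pz": 10.0, "e": 52.0, "area": 0.5}
corrected, pt_area = area_corrected(jet, rho)
```

Because the subtraction can exceed the jet \(p_{\text {T}}\), `pt_area` may be negative for soft jets in busy events, which is why the event selection places a requirement on the \(p_{\text {T}}\) after this step.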

Previously, the inputs to the \(\rho \) calculation were the same as those used to build jets: the neutral PFlow objects and the charged PFlow objects that satisfy \(|z_0\sin \theta |<2\) mm [4]. However, this results in a bias from the inclusion of hard-scatter tracks, shifting the median \(p_{\text {T}}\) density to higher values, particularly when the hard-scatter process has a large jet multiplicity. To prevent such biases, the ‘pile-up sideband’ (PUSB) \(\rho \) definition is studied, which uses neutral PFlow objects, and charged PFlow objects that satisfy \(2 ~\textrm{mm}~<|z_0\sin \theta |<4\) mm as inputs. The total amount of pile-up using the sideband cuts is expected to be similar to the nominal criteria, since a similar amount of pile-up will meet these criteria. This ensures a minimal loss of the event-by-event correlation of the charged pile-up component that is not removed by charged hadron subtraction cuts.
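The difference between the two input definitions can be made concrete with a short sketch. It assumes each PFlow object is a dictionary with a hypothetical `charge` entry and, for charged objects, a `z0_sin_theta` entry in mm; only the selection thresholds come from the text.

```python
def select_rho_inputs(pfos, definition="PUSB"):
    """Return the PFlow objects entering the rho calculation.

    'nominal': neutral PFOs and charged PFOs with |z0 sin(theta)| < 2 mm.
    'PUSB':    neutral PFOs and charged PFOs with 2 mm < |z0 sin(theta)| < 4 mm.
    """
    if definition not in ("nominal", "PUSB"):
        raise ValueError("unknown rho definition: " + definition)
    selected = []
    for pfo in pfos:
        if pfo["charge"] == 0:
            selected.append(pfo)  # neutral PFOs are kept in both definitions
            continue
        distance = abs(pfo["z0_sin_theta"])
        if definition == "nominal" and distance < 2.0:
            selected.append(pfo)
        elif definition == "PUSB" and 2.0 < distance < 4.0:
            selected.append(pfo)
    return selected

# Illustrative objects: one neutral, one hard-scatter-like, one sideband, one far.
pfos = [
    {"name": "neutral", "charge": 0},
    {"name": "near_vertex", "charge": 1, "z0_sin_theta": 0.3},
    {"name": "sideband", "charge": -1, "z0_sin_theta": -2.7},
    {"name": "far", "charge": 1, "z0_sin_theta": 6.0},
]
nominal_names = [p["name"] for p in select_rho_inputs(pfos, "nominal")]
pusb_names = [p["name"] for p in select_rho_inputs(pfos, "PUSB")]
```

The charged object close to the hard-scatter vertex enters only the nominal definition, which is the source of the bias that the sideband definition is meant to avoid.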

The JES is measured in specific event selections as described in Sect. 5, but these calibrations are applied to many final states. An uncertainty is therefore required to cover potential mismodelling, in the simulation, of how the bias in \(\rho \) differs between event topologies, specifically between dijet and \(Z(\rightarrow \mu \mu )+\)jets events. The \(Z(\rightarrow \mu \mu )+\)jets events are distinct from dijet events in several ways, including the quark-gluon composition, colour flow, and momentum transfer of the process, making them a good topology with which to estimate the magnitude of a potential bias. This bias at a given \(\mu \) is estimated from the difference in \(\langle \rho \rangle \) between the two event topologies, compared between MC simulation and data:

$$\begin{aligned} \Delta ^{\rho }(\mu ) = \left( \rho (\mu )^{\text {dijet}}_{\text {MC}} - \rho (\mu )^{\text {Z+jet}}_{\text {MC}} \right) - \left( \rho (\mu )^{\text {dijet}}_{\text {data}} - \rho (\mu )^{\text {Z+jet}}_{\text {data}} \right) , \end{aligned}$$

and the bias that is propagated to the jet energy scale uncertainties is the bias determined at the average value of \(\mu \) for the data sample:

$$\begin{aligned} \text {bias} [\text {GeV} ] = \Delta ^{\rho }(\langle \mu \rangle ). \end{aligned}$$
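As a numerical illustration of the double difference, the sketch below evaluates it at the average \(\mu \) of 34 by linear interpolation between tabulated points. The interpolation step and all \(\langle \rho \rangle \) values are assumptions made for the example; they are not results from the paper.

```python
def interpolate(xs, ys, x):
    """Linear interpolation on sorted xs, without extrapolation."""
    for k in range(len(xs) - 1):
        if xs[k] <= x <= xs[k + 1]:
            frac = (x - xs[k]) / (xs[k + 1] - xs[k])
            return ys[k] + frac * (ys[k + 1] - ys[k])
    raise ValueError("x outside the tabulated range")

def rho_topology_bias(mu_points, rho, mu):
    """(dijet - Z+jet) in MC minus (dijet - Z+jet) in data, evaluated at mu."""
    at_mu = {key: interpolate(mu_points, values, mu)
             for key, values in rho.items()}
    mc_diff = at_mu["dijet_mc"] - at_mu["zjet_mc"]
    data_diff = at_mu["dijet_data"] - at_mu["zjet_data"]
    return mc_diff - data_diff

# Invented <rho> values in GeV at mu = 30 and mu = 40.
mu_points = [30.0, 40.0]
rho = {
    "dijet_mc": [20.0, 26.0],
    "zjet_mc": [19.0, 25.0],
    "dijet_data": [21.0, 27.0],
    "zjet_data": [19.5, 25.5],
}
bias = rho_topology_bias(mu_points, rho, 34.0)
```

With these invented inputs the simulation underestimates the topology difference seen in data by 0.5 GeV, so the bias is negative.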

The \(Z(\rightarrow \mu \mu )+\)jets selection uses the lowest unprescaled single muon trigger, and requires two muons with \(p_{\text {T}} ^{\mu 1}>30~\text {GeV}, p_{\text {T}} ^{\mu 2}>25\) \(\text {GeV}\), \(80<m^{\mu \mu }<100\) \(\text {GeV}\), and \(p_{\text {T}} ^{\mu \mu }>25\) \(\text {GeV}\), and the dijet selection uses the lowest unprescaled single jet trigger, and requires a leading jet with \(p_{\text {T}} >500\) \(\text {GeV}\), \(|\eta |<2.4\), and greater than 5% of the momentum carried by charged particle flow objects.

Figure 1 shows the dependence of \(\langle \rho \rangle \) in the two processes as a function of \(\mu \) for data and simulation, comparing both the \(\rho \) definitions described above. The lower panels compare the values of \(\langle \rho \rangle \) in the two processes. Two different Sherpa dijet samples are shown: a 2.1.1 sample [45] that was used in the previous calibration [4], and the 2.2.5 sample used now, while for \(Z(\rightarrow \mu \mu )+\)jets, only Sherpa 2.2.1 is used. The Sherpa 2.2.X samples include an improvement to the multi-parton interaction (MPI) model, which directly affects the bias in \(\rho \). Significantly larger differences are seen between the dijet Sherpa 2.1.1 sample and the \(Z(\rightarrow \mu \mu )+\)jets Sherpa 2.2.1 sample than between the dijet Sherpa 2.2.5 sample and the \(Z(\rightarrow \mu \mu )+\)jets Sherpa 2.2.1 sample. Previously, the bias was determined using the dijet Sherpa 2.1.1 sample and the \(Z(\rightarrow \mu \mu )+\)jets Sherpa 2.2.1 sample, which have different MPI models. Using the updated dijet Sherpa sample that uses a consistent MPI model with \(Z(\rightarrow \mu \mu )+\)jets results in a factor of four reduction in the bias, showing the importance of MPI modelling in MC simulations. The new \(\rho \) definition, \(\rho ^{\text {PUSB}}\), results in significantly smaller differences between the different topologies, and a better description of the data by the simulation. Similarly, the improvements to the \(\rho \) definition result in almost a factor of three improvement to the uncertainty, as seen by the difference between data and Sherpa for the two different \(\rho \) definitions. Together, these improvements reduce the JES uncertainty from the \(\rho \) modelling by a factor of nearly seven.

Fig. 1

The distribution of \(\rho \) as a function of \(\mu \) for the (top) \(Z(\rightarrow \mu \mu )\)+jets and (middle) dijet selections for data, Pythia8, Sherpa 2.2.5, and Sherpa 2.1.1. The lower panel shows the difference between the two topologies, which is used to determine the uncertainty from the extrapolation across topologies, indicated by the vertical arrows. The left plot shows \(\rho \) built from the jet constituents: neutral PFOs and charged PFOs with \(|z_0\sin \theta |<2\) mm, and the right plot shows \(\rho \) built using neutral PFOs and charged PFOs satisfying the new sideband selection

4.3 Residual pile-up correction

To further reduce the impact of pile-up, a residual pile-up correction is applied, based on \(N_{\text {PV}}\), \(\mu \), the reconstructed jet \(p_{\text {T}} \) (\(p_{\text {T}}^{\text {reco}}\)), and the reconstructed jet \(\eta \) (\(\eta ^{\text {reco}}\)). Due to the fast response of the silicon tracking detectors used to reconstruct the tracks used to find the primary vertices, \(N_{\text {PV}}\) is sensitive to the in-time pile-up, while \(\mu \) is sensitive to the out-of-time pile-up, since it accounts for the average amount of pile-up around a given bunch crossing. Typically, in-time pile-up increases the energy of the jet, and out-of-time pile-up decreases it. The negative dependence of the jet energy scale on \(\mu \) for out-of-time pile-up is a result of the liquid-argon calorimeter’s pulse shape, which is negative during the period soon after registering a signal [46]. Two options for the residual pile-up correction are compared.

4.3.1 The 1D residual pile-up correction

The first strategy, referred to as the ‘1D residual pile-up calibration’, follows the method outlined in Ref. [4], where additional corrections are applied based on the \(\mu \) and \(N_{\text {PV}}\) of the event, with

$$\begin{aligned} p_{\text {T}}^{\text {1D residual}} = p_{\text {T}}^{\text {area}} - (\partial p_{\text {T}} / \partial N_{\text {PV}}) \times (N_{\text {PV}}-1) - (\partial p_{\text {T}} / \partial \mu ) \times \mu , \end{aligned}$$

where \(\partial p_{\text {T}} / \partial N_{\text {PV}}\) and \(\partial p_{\text {T}} / \partial \mu \) are determined as follows. To determine \(\partial p_{\text {T}} / \partial N_{\text {PV}}\), first, the dependence of \(\Delta p_{\text {T}}^{\text {area-truth}} = p_{\text {T}}^{\text {area}}-p_{\text {T}}^{\text {truth}} \) on \(N_{\text {PV}}\) is fit with a line in bins of \(\mu \), \(p_{\text {T}}^{\text {truth}}\), and \(\eta \). The slope of this function is taken as the dependence \(\partial p_{\text {T}} / \partial N_{\text {PV}}\) per \(\mu \) bin. The average of these slopes across \(\mu \) is taken to be the \(p_{\text {T}}\) dependence on \(N_{\text {PV}}\) for a given \(p_{\text {T}}^{\text {truth}}\) and \(\eta \) bin. For each \(\eta \) bin, the average \(p_{\text {T}}\) dependence is fit as a function of \(p_{\text {T}}^{\text {truth}}\) with a logarithmic function for \(20~\text {GeV} ~< p_{\text {T}}^{\text {truth}} < 200\) \(\text {GeV}\). The value of the logarithmic fit at 25 \(\text {GeV}\)   is taken as the nominal correction, since pile-up effects are most relevant for low-\(p_{\text {T}}\) jets. Finally, a piecewise linear function is fit over the per \(|\eta |\) bin values of \(\partial p_{\text {T}} / \partial N_{\text {PV}}\), reducing statistical fluctuations and providing a continuous correction over the full \(\eta \) range. To determine \(\partial p_{\text {T}} / \partial \mu \), the same process is repeated with \(N_{\text {PV}}\) and \(\mu \) switched, to get the dependence of \(\Delta p_{\text {T}}^{\text {area-truth}}\) on \(\mu \).
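The first steps of this procedure, the straight-line fits per \(\mu \) bin, their average, and the application of the correction, can be sketched as follows. The sketch assumes an unweighted least-squares fit and an unweighted average of the slopes, and it omits the later logarithmic fit in \(p_{\text {T}}^{\text {truth}}\) and the piecewise linear fit in \(|\eta |\); all numbers are invented.

```python
def fit_slope(xs, ys):
    """Least-squares slope of a straight-line fit y = a + b * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

def average_slope(points_per_mu_bin):
    """Average the per-mu-bin slopes of (pT_area - pT_truth) versus N_PV.

    points_per_mu_bin: list of (npv_values, delta_pt_values), one per mu bin.
    Assumption: plain arithmetic mean of the slopes.
    """
    slopes = [fit_slope(npv, dpt) for npv, dpt in points_per_mu_bin]
    return sum(slopes) / len(slopes)

def residual_1d(pt_area, npv, mu, dpt_dnpv, dpt_dmu):
    """pT_1D = pT_area - dpT/dNPV * (NPV - 1) - dpT/dmu * mu."""
    return pt_area - dpt_dnpv * (npv - 1) - dpt_dmu * mu

# Invented (N_PV, delta pT in GeV) points for two mu bins of one (pT, eta) bin.
mu_bins = [
    ([10, 20, 30], [3.0, 6.0, 9.0]),
    ([10, 20, 30], [1.0, 2.0, 3.0]),
]
dpt_dnpv = average_slope(mu_bins)
pt_corrected = residual_1d(30.0, 21, 34.0, dpt_dnpv, -0.05)
```

The opposite signs of the two slopes in the example mirror the behaviour described above: in-time pile-up adds energy to the jet, while out-of-time pile-up removes it.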

4.3.2 The 3D residual pile-up correction

The 1D residual pile-up correction does not account for correlations between \(\mu \) and \(N_{\text {PV}}\), and does not account for changes in the pile-up contribution as a function of jet \(p_{\text {T}}\). The 3D residual pile-up correction is designed to include these correlations. In this calibration, derived in bins of \(\eta ^{\text {reco}}\), the jet \(p_{\text {T}}\) scale is shifted to match the truth jet scale as a function of (\(N_{\text {PV}}\), \(\mu \), \(p_{\text {T}}^{\text {area}}\)), simultaneously correcting for pile-up and detector effects. The \(p_{\text {T}}^{\text {truth}}\) is used as a reference to compute a correction given by \(\Delta p_{\text {T}}^{\text {area-truth}}\). For extreme values of \(\mu \) and \(N_{\text {PV}}\), where there are insufficient events to determine an accurate correction, the correction is determined using the closest non-empty (\(\mu \), \(N_{\text {PV}}\)) bin (with the same \(p_{\text {T}}\)), and the result is smoothed. This average difference, \(\Delta p_{\text {T}}^{\text {area-truth}}\), is fit as a function of \(p_{\text {T}}^{\text {area}}\) using a linear plus logarithmic function, in bins of \(N_{\text {PV}}\), \(\mu \), and \(p_{\text {T}}^{\text {reco}}\), determined using jets with \(10~\text {GeV} ~< p_{\text {T}}^{\text {truth}} < 200\) \(\text {GeV}\). The corrected value is given by

$$\begin{aligned} p_{\text {T}}^{\text {3D residual}} = p_{\text {T}}^{\text {area}} - \Delta p_{\text {T}}^{\text {area-truth}} (N_{\text {PV}},\mu ,p_{\text {T}}^{\text {area}}). \end{aligned}$$

By construction, this residual pile-up calibration corrects the jet energy scale to the truth jet scale, combining corrections due to pile-up with corrections due to detector effects. This is contrasted with the 1D residual pile-up correction, which is designed to exclusively remove the impact of pile-up on the jet \(p_{\text {T}}\) scale. Several options to only correct for the pile-up \(p_{\text {T}}\) were studied, but these were found to either increase the pile-up dependence or result in problematic effects such as a large fraction of jets with negative \(p_{\text {T}}\).
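The lookup structure of such a correction can be sketched as a table of fitted coefficients indexed by (\(N_{\text {PV}}\), \(\mu \)) bin, with a fallback to the closest filled bin. The functional form a + b·pT + c·ln(pT), the bin edges, the distance measure in bin-index space, and the coefficients are all assumptions chosen for illustration; the smoothing step mentioned in the text is omitted.

```python
import bisect
import math

def find_bin(edges, value):
    """Index of the bin containing value, clamped to the valid range."""
    index = bisect.bisect_right(edges, value) - 1
    return min(max(index, 0), len(edges) - 2)

def delta_pt(coeffs, pt_area):
    """Assumed 'linear plus logarithmic' form; requires pt_area > 0."""
    a, b, c = coeffs
    return a + b * pt_area + c * math.log(pt_area)

def residual_3d(pt_area, npv, mu, table, npv_edges, mu_edges):
    """pT_3D = pT_area - delta_pT(N_PV, mu, pT_area) from a binned table."""
    target = (find_bin(npv_edges, npv), find_bin(mu_edges, mu))
    if target in table:
        key = target
    else:
        # Fall back to the closest filled bin (distance in bin-index space).
        key = min(table, key=lambda k: (k[0] - target[0]) ** 2
                                       + (k[1] - target[1]) ** 2)
    return pt_area - delta_pt(table[key], pt_area)

# Hypothetical binning and coefficients for one eta bin.
npv_edges = [0, 10, 20, 30]
mu_edges = [0, 20, 40, 60]
table = {
    (1, 1): (1.0, 0.0, 0.0),
    (2, 1): (2.0, 0.1, 0.0),
}
pt_central = residual_3d(30.0, 15, 30.0, table, npv_edges, mu_edges)
pt_high_npv = residual_3d(30.0, 25, 34.0, table, npv_edges, mu_edges)
pt_fallback = residual_3d(30.0, 5, 10.0, table, npv_edges, mu_edges)
```

Since the table is filled from the difference to the truth-jet \(p_{\text {T}}\), applying it moves the jet to the truth scale rather than only removing the pile-up contribution.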

4.3.3 Comparison of the different residual pile-up corrections

A comparison of the different options for the residual pile-up corrections is shown in Fig. 2. As seen in this figure, the residual pile-up calibration is especially useful for improving the pile-up dependence for jets with \(|\eta ^{\text {reco}} | > 2.5\). Overall, for the 1D residual pile-up correction, the absolute pile-up dependence increases for higher \(p_{\text {T}}\) jets, but the relative impact on the \(p_{\text {T}}\) response is smaller. While the 1D residual pile-up correction performs best for the \(p_{\text {T}}\) range which it is optimised for (20–30 \(\text {GeV}\)), it has a sizeable pile-up dependence at other jet \(p_{\text {T}}\). In addition, since the 1D residual pile-up correction is optimised for this same bin, its performance appears enhanced by construction, while a more differential binning would show a worse performance. The 3D residual pile-up correction significantly reduces the pile-up dependence of the calibration, particularly at high \(p_{\text {T}}\). Based on these results, the 3D residual pile-up calibration is used for the remainder of the reported studies.

Fig. 2

Dependence of \(p_{\text {T}}\) on (left) \(\mu \) and (right) \(N_{\text {PV}}\) after the different residual pile-up corrections. The circles indicate the 1D residual pile-up correction, and the squares indicate the 3D residual pile-up correction. This is shown for jets with (top) \(20~\text {GeV}< p_{\text {T}} < 30~\text {GeV} \), (bottom) \(30~\text {GeV}< p_{\text {T}} < 60~\text {GeV} \)

4.4 The jet energy scale and \(\eta \) calibration

The absolute jet energy scale (MCJES) and \(\eta \) corrections provide calibration functions for the energy and \(\eta \) as a function of \(\eta ^{\text {det}}\) and \(E_{\textrm{reco}}\) such that jets agree on average with the truth jet energy and \(\eta \). Since the calorimeters measure the energy of particles, and not the transverse momenta, this correction is determined as a function of the jet energy. The jet energy response \(\mathcal {R}\), defined as the mean of a fit with a Gaussian function to the core of the \(E_{\textrm{reco}}/E_{\textrm{true}}\) distribution, is measured in \(E_{\textrm{true}}\) and \(\eta ^{\text {det}} \) bins, where \(\eta ^{\text {det}} \) is the jet \(\eta \) pointing from the geometric centre of the detector, which is used to remove any ambiguity about which region of the detector is measuring the jet. The difference in \(\mathcal {R}\) from the expected value of one is referred to as the non-closure, and regions where \(\mathcal {R}\) has the expected value within the uncertainties are said to demonstrate closure.

The jet energy response after the application of the residual calibration is shown as a function of \(E_{\textrm{true}}\) and \(\eta ^{\text {det}} \) in Fig. 3. This differs from previous JES calibrations by the ATLAS experiment, in that the jet energy response is already close to unity, meaning that the correction is relatively small. This is a feature of the 3D residual pile-up correction, which shifts the energy scale of the jets close to the truth scale, although there is some significant difference from one at high \(\eta \), where the residual calibration insufficiently captures the behaviour of the energy response. Since the \(\Delta p_{\text {T}} \) term in the 3D residual pile-up correction is determined using jets with \(p_{\text {T}}^{\text {truth}} < 200\) \(\text {GeV}\), the jet energy response shifts away from one at energies corresponding to \(p_{\text {T}} > 200\) \(\text {GeV}\).

Fig. 3

The jet energy response before the MCJES calibration a at fixed energies as a function of \(\eta _{\text {det}} \), and b at fixed \(\eta _{\text {det}} \) as a function of truth jet energy. a The square shows the response for \(E_{\text {true}}=30~\text {GeV} \), the plus-sign shows the response for \(E_{\text {true}}=50~\text {GeV} \), the down-triangle shows the response for \(E_{\text {true}}=110~\text {GeV} \), the up-triangle shows the response for \(E_{\text {true}}=500~\text {GeV} \), and the circle shows the response for \(E_{\text {true}}=1200~\text {GeV} \). b The square shows the response for \(0.0< \eta _{\text {det}} < 0.1\), the plus-sign shows the response for \(1.0< \eta _{\text {det}} < 1.1\), the down-triangle shows the response for \(1.4< \eta _{\text {det}} < 1.5\), the up-triangle shows the response for \(2.8< \eta _{\text {det}} < 2.9\), and the circle shows the response for \(4.0< \eta _{\text {det}} < 4.1\)

Directly predicting the jet energy response from \(E_{\text {reco}}\) depends on the distribution of \(E_{\text {true}}\) used to derive the calibration. Overall, the distribution of the response is approximately Gaussian for a given \(E_{\text {true}}\), but not for a given \(E_{\text {reco}}\) [47]. Therefore, the calibration uses a numerical inversion technique [5], where, for each \(\eta \) bin, the jet energy response is fit as a function of \(E_{\text {true}}\), and the jet calibration factor as a function of \(E_{\text {reco}}\) is determined using the inverse of this function. Two methods of determining the fit function, polynomial fits of order N and penalised splines, are compared below.
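The numerical inversion can be illustrated with a minimal Python sketch. The response curve `toy_response`, the energy grid, and the helper names are hypothetical choices made for illustration only; they are not the fitted functions of this calibration. For simplicity the factor is held constant outside the grid, whereas the calibration described here uses dedicated extrapolations.

```python
import math
from bisect import bisect_left

def toy_response(e_true):
    """Hypothetical response curve R(E_true); a stand-in for the fitted function."""
    return 1.0 - 0.5 / math.sqrt(e_true)

def build_inversion(response, e_true_grid):
    """Map each E_true to <E_reco> = R(E_true) * E_true and store 1 / R(E_true)."""
    e_reco_grid = [response(e) * e for e in e_true_grid]
    factor_grid = [1.0 / response(e) for e in e_true_grid]
    return e_reco_grid, factor_grid

def calibration_factor(e_reco, e_reco_grid, factor_grid):
    """Linearly interpolate the factor in E_reco; hold it constant outside the grid."""
    if e_reco <= e_reco_grid[0]:
        return factor_grid[0]
    if e_reco >= e_reco_grid[-1]:
        return factor_grid[-1]
    i = bisect_left(e_reco_grid, e_reco)
    x0, x1 = e_reco_grid[i - 1], e_reco_grid[i]
    t = (e_reco - x0) / (x1 - x0)
    return factor_grid[i - 1] + t * (factor_grid[i] - factor_grid[i - 1])

# Grid of E_true values in GeV (illustrative, logarithmic spacing).
e_true_grid = [20.0 * 1.1 ** k for k in range(60)]
e_reco_grid, factor_grid = build_inversion(toy_response, e_true_grid)

# A jet with E_true = 137 GeV is reconstructed, on average, at R * E_true.
e_true = 137.0
e_reco = toy_response(e_true) * e_true
e_calib = calibration_factor(e_reco, e_reco_grid, factor_grid) * e_reco
```

The key point is that the factor is tabulated against the mean reconstructed energy rather than the true energy, so it can be applied to jets for which only \(E_{\text {reco}}\) is known.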

4.4.1 Polynomial fits

Following the procedure outlined in Ref. [4], polynomial fits are defined as a function of \(\log (E)\), where \(N_{\textrm{max}}=8\) is the maximal order of the fitted polynomials. Out of the given \(N_{\textrm{max}}\) fit functions, the best fit function is identified using Pearson’s \(\chi ^2\) test [48]. The calibration factors are usually frozen at an \(\eta \)-dependent energy between 3 and \(4~\text {TeV} \) to reduce statistical fluctuations, while for \(p_{\text {T}} < 8\) \(\text {GeV}\), a linear extrapolation of the calibration factor is performed.
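A minimal sketch of such a scan over polynomial orders in \(\log (E)\) is given below, using NumPy. The toy response values, the uncertainties, and the selection rule (smallest \(\chi ^2\) per degree of freedom) are assumptions made for illustration; the rule is a simple stand-in for the Pearson \(\chi ^2\) test and is not necessarily the criterion used in the calibration.

```python
import numpy as np
from numpy.polynomial import Polynomial

def best_log_polynomial(energy, resp, sigma, n_max=8):
    """Fit polynomials of order 1..n_max in log(E); keep the smallest chi2/ndf."""
    x = np.log(energy)
    best = None
    for order in range(1, n_max + 1):
        ndf = len(x) - (order + 1)
        if ndf <= 0:
            break
        # Weights multiply the unsquared residuals, hence 1 / sigma.
        poly = Polynomial.fit(x, resp, order, w=1.0 / sigma)
        chi2 = float(np.sum(((resp - poly(x)) / sigma) ** 2))
        score = chi2 / ndf
        if best is None or score < best[0]:
            best = (score, order, poly)
    return best[1], best[2]

# Toy response points (hypothetical): a smooth curve in log(E) plus small noise.
rng = np.random.default_rng(1)
energy = np.geomspace(20.0, 4000.0, 40)                      # GeV
truth = 0.80 + 0.05 * np.log(energy) - 0.003 * np.log(energy) ** 2
sigma = np.full_like(energy, 0.002)
resp = truth + rng.normal(0.0, 0.002, size=energy.size)

order, poly = best_log_polynomial(energy, resp, sigma)
fitted = poly(np.log(energy))
```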

4.4.2 Penalised splines

In addition to the polynomial fit functions, a new method using penalised splines is studied. A spline S(x) of degree N is a piecewise polynomial function of degree N, where pieces of the spline meet at points called knots, and the first \(N-1\) derivatives are continuous across the knots. Splines may be defined from b-spline basis functions \(B_i(x,t)\) [49] via

$$\begin{aligned} S(x) = \sum _i^{n-1} a_i B_i(x,t), \end{aligned}$$

where n is the number of data fit points, \(a_i\) are control points weighting the individual basis functions \(B_i\), and t are the knots. Since \(S(x) = 0\) for x outside of the range defined by the knots, an extrapolation to lower (higher) energy values is added using a linear extrapolation based on the first (last) five points for the low (high) end of the spline.

A spline will overfit the data, since the basis function is required to exactly pass through the knots, where in this case, the knots correspond to the energies where the response is determined. For a set of points \(x_i\) and their corresponding values \(y_i\), this can be mitigated by using penalised b-splines (p-splines), which include an additional smoothness penalisation term P, minimising

$$\begin{aligned} L = \chi ^2 + \alpha P = \underbrace{\sum _{i=0}^{n} \left( y_i - S(x_i) \right) ^2}_{\text {least~squares}} + \underbrace{\alpha \int _{a}^{b} \left( S^{\prime \prime }(x)\right) ^2 dx}_{\text {penalisation}} \end{aligned}$$

with a and b corresponding to the range over which the penalisation term is included, with \(a<x_i<b\), and the penalisation parameter \(\alpha \ge 0\) is chosen and fixed. For these studies, the x values correspond to \(E_{\textrm{true}}\), and the y values correspond to the jet energy response. As \(\alpha \) increases from zero to \(\infty \), the result moves from a spline to a linear regression, and this parameter enables a compromise between the curvature penalisation and a close fit to the data. The penalisation parameter \(\alpha \) is defined dynamically for each \(\eta \) bin as

$$\begin{aligned} \alpha = \frac{\lambda }{n} \cdot \sum _{i=0}^{n} w_i, \end{aligned}$$

where i runs over the n data fit points, \(\lambda \) is a regulative parameter, and \(w_i\) are the point weights defined as \(w=\sigma _y^{-1/2}\), where \(\sigma _y\) is the response fit uncertainty from the iterative fit to a Gaussian function.

For these studies, the splines are implemented using the Splinter framework [50], and a spline of degree three is used, with \(\lambda \) empirically set to 0.1. To check for overfitting, the calibration and the closure test are performed on statistically independent events.
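The trade-off controlled by \(\alpha \) can be illustrated with a discrete analogue of the penalised fit, in which the integral of the squared second derivative is replaced by a sum of squared second differences over equally spaced points. This is a sketch under that assumption, written with NumPy; it is not the Splinter implementation, and the function names and toy values are hypothetical.

```python
import numpy as np

def penalisation_strength(sigma_y, lam=0.1):
    """alpha = (lambda / n) * sum_i w_i with w_i = sigma_y_i ** (-1/2)."""
    weights = np.asarray(sigma_y, dtype=float) ** -0.5
    return lam / len(weights) * float(np.sum(weights))

def penalised_smooth(y, alpha):
    """Minimise sum_i (y_i - z_i)^2 + alpha * sum_i (second difference of z)_i^2."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    d2 = np.diff(np.eye(n), n=2, axis=0)   # second-difference operator, shape (n-2, n)
    return np.linalg.solve(np.eye(n) + alpha * d2.T @ d2, y)

# Toy response values on an equally spaced grid (hypothetical numbers).
idx = np.arange(30)
toy = 0.95 + 0.001 * idx + 0.01 * (-1.0) ** idx

unpenalised = penalised_smooth(toy, 0.0)         # alpha = 0 reproduces the points
stiff = penalised_smooth(toy, 1e8)               # very large alpha approaches a straight line
line = np.polyval(np.polyfit(idx, toy, 1), idx)  # ordinary linear regression for comparison

alpha_example = penalisation_strength([0.01, 0.01, 0.01, 0.01])
```

With \(\alpha = 0\) the points are reproduced exactly, and for very large \(\alpha \) the result coincides with the linear regression, mirroring the two limits described in the text.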

4.4.3 Comparison of calibrations

A comparison of the MCJES closure for the fitting techniques at different energy values is presented in Fig. 4. Both strategies provide closure within 1% at high energies, while at low energies, the p-spline approach provides better closure than the polynomial fit. Overall, the p-spline correction provides closure within 1% across the \(p_{\text {T}}\) and \(\eta \) range considered, except for a small number of bins where the calibration becomes difficult due to a quickly changing response and non-Gaussian terms in the energy response. The rest of the studies use the correction determined from the p-spline fit, since it provides the best overall closure.

Fig. 4

Jet response at fixed energies as a function of \(\eta \) a after the polynomial MCJES calibration step, and b after the p-spline MCJES calibration. The square sign shows the response for \(E_{\text {true}}=30~\text {GeV} \), the plus sign shows the response for \(E_{\text {true}}=50~\text {GeV} \), the down-pointing triangle shows the response for \(E_{\text {true}}=110~\text {GeV} \), the up-pointing triangle shows the response for \(E_{\text {true}}=500~\text {GeV} \), and the circle shows the response for \(E_{\text {true}}=1200~\text {GeV} \)

4.4.4 Absolute MC jet \(\eta \) calibration

In addition to the jet energy, the jet pseudorapidity \(\eta \) is calibrated with a similar approach as the JES calibration to correct for biases in the \(\eta \) reconstruction, following the strategy in Ref. [4]. This bias is most pronounced in the transition region between different parts of the calorimeter, where the discrepant response of the different detectors artificially shifts the reconstructed energy on one side of the jet, changing the reconstructed \(\eta \). These \(\eta \) corrections are particularly needed in the barrel-endcap (\(|\eta | \sim 1.4\)) and forward-endcap (\(|\eta | \sim 3.1\)) transitions. The bias in \(\eta \) is defined as \(\eta _\textrm{bias} = \langle \eta ^{\text {reco}} - \eta ^{\text {true}} \rangle \), as determined by an iterative fit to a Gaussian function, and the correction is performed on a jet-by-jet basis via \(\eta ^{\text {calib}} = \eta ^{\text {reco}} - \mathcal {R}_\eta \). This correction is only applied to the \(p_{\text {T}}\) and \(\eta \) of the jet, and is parameterised as a function of \(E^{\textrm{true}}\) and \(\eta ^{\text {det}} \). For this correction, only polynomial fits are studied, using up to an order three polynomial. There are small correlations between the corrections in \(\eta \) and E, so this correction is derived simultaneously with the JES.

4.5 The global property calibration

The absolute MCJES calibration corrects the jet energy response based on the E and \(\eta ^{\text {det}} \) of the jet. However, there are many other factors that contribute to the jet response, including the distribution of energy in the jet, the distribution of energy deposits across different calorimeter layers, and the types of hadrons produced in the jet. Many of these characteristics depend on whether the jet is quark- or gluon-initiated. This can be seen in Fig. 5, which shows an example of the jet response distribution for jets with different initiating partons, and the jet \(p_{\text {T}}\) response as a function of \(p_{\text {T}}^{\text {true}}\), where the parton label is defined by the highest energy parton ghost-associated with the truth jet. Not only are there differences between different jet flavours (i.e. the flavour of the initiating parton), but the behaviours change with the \(p_{\text {T}}\) of the jet. Quark-initiated jets tend to have fewer hadrons, each with a higher fraction of the jet \(p_{\text {T}}\), which typically results in contributions further into the calorimeter. In contrast, gluon-initiated jets typically have more, lower-\(p_{\text {T}}\) hadrons, leading to a lower calorimeter response and a wider transverse profile. These behaviours are further complicated by the use of particle flow reconstruction, which adds further dependence based on the charged particles in the jet.

The jet \(p_{\text {T}}\) response is also impacted by the MC model, as seen in the differences between the jet \(p_{\text {T}}\) responses shown in Fig. 6. Overall, most MC predictions have similar behaviour for quark-initiated jets, while the differences for gluon-initiated jets can be sizeable. This is due to differences among the MC generators in the predicted amount of soft radiation and its topological distribution in the jet. There is some separation between the models that use the Lund string model for hadronisation and the other models: the Lund string model tends to predict a higher gluon \(p_{\text {T}}\) response, with larger differences for jets with \(p_{\text {T}} < 100\) \(\text {GeV}\). This can primarily be attributed to the fraction of jet energy carried by baryons and kaons [51].

The global jet property calibration applies further corrections to jets based on their individual characteristics. While these corrections only have a small effect on the overall closure of the calibration, the closure is significantly improved for different classes of jets, improving the JER. In addition, this calibration reduces differences between MC predictions for the JES, resulting in smaller modelling uncertainties. Two methods for deriving the global calibration are outlined below: the global sequential calibration (GSC), which was described in previous work [4], and a new method, the global neural network calibration (GNNC). Both corrections are derived in \(|\eta ^{\text {det}} |\) bins corresponding to different detector regions, creating a balance between the statistical uncertainty and the generality of the results.

Fig. 5

a The jet \(p_{\text {T}}\) response distribution for different jet flavours for jets with \(20~\text {GeV}< p_{\text {T}}^{\text {true}} < 25~\text {GeV} \), and b the jet \(p_{\text {T}}\) response for several different flavours of jets as a function of \(p_{\text {T}}^{\text {true}}\). The solid line shows the response for gluon jets, the long dashed line shows the response for light quark jets, the short dashed line shows strange jets, the alternating medium and short dashed line shows charm jets, and the alternating long and short dashed line shows bottom jets

Fig. 6

The jet \(p_{\text {T}}\) response as a function of \(p_{\text {T}}^{\text {true}}\) for several different MC predictions for a quark jets, and b gluon jets. The solid line shows Pythia8, the long dashed line shows Herwig with an angular ordered parton shower, the short dashed line shows Herwig with a dipole shower, the alternating medium and short dashed line shows Sherpa with the AHADIC hadronisation model, the alternating long and short dashed line shows Sherpa with the string hadronisation model, and the long dash with three short dashes shows Powheg+Pythia

4.5.1 The global sequential calibration

The GSC is a series of multiplicative corrections to account for the differences between the calorimeter response to different types of jets, which improves the jet resolution without changing the jet energy response. The GSC is based on global jet observables such as the longitudinal profile of the energy deposits in the calorimeters, tracking information matched to the jet, and information related to the activity in the muon chambers behind a jet. Six observables that improve the JER and reduce modelling uncertainties are used as inputs to the GSC. Each GSC correction to the jet four-momentum is derived and applied independently and sequentially, using the following procedure. First, for a given GSC observable, the jet \(p_{\text {T}}\) response distribution is fit for each bin of \(|\eta |\), \(p_{\text {T}}^{\text {true}}\), and the GSC observable, using the same procedure as in the MCJES calibration. Next, these fitted values are divided by the inclusive value of the response in a given \(|\eta |\) and \(p_{\text {T}}^{\text {true}}\) bin to avoid changing the \(p_{\text {T}}\) scale of the jet calibration. Then, for each bin of the GSC observable, the numerical inversion of the jet \(p_{\text {T}}\) response is performed after a linear smoothing in a given bin of (\(p_{\text {T}}^{\text {true}}\), \(|\eta ^{\text {det}} |\)). The resulting responses for a given \(|\eta ^{\text {det}} |\) bin are then smoothed simultaneously in \(p_{\text {T}}\) and the GSC observable using a Gaussian kernel. Because the GSC is applied sequentially, it is possible to validate each GSC correction in a systematic way, testing the impact of any mismodelling of the input variables using data. Such studies were performed to validate the sequential correction procedure.
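The normalisation step of this procedure, which keeps the inclusive \(p_{\text {T}}\) scale unchanged, can be sketched as follows. The response values are hypothetical, and the numerical inversion and smoothing steps are deliberately omitted, so this is an illustration of the idea rather than the full derivation.

```python
def gsc_factors(binned_response, inclusive_response):
    """Correction per observable bin: inverse of the response relative to the inclusive one."""
    return [inclusive_response / r for r in binned_response]

def apply_sequentially(pt, factors):
    """Apply one multiplicative factor per GSC stage, in the order given."""
    for factor in factors:
        pt *= factor
    return pt

# Hypothetical fitted responses in three bins of one GSC observable,
# for a single (pT_true, |eta_det|) bin, and the inclusive response of that bin.
binned_response = [0.95, 1.00, 1.06]
inclusive_response = 1.01

factors = gsc_factors(binned_response, inclusive_response)

# After the correction every observable bin sits at the inclusive response,
# so the dependence on the observable is removed without shifting the scale.
corrected = [r * f for r, f in zip(binned_response, factors)]
```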

The six stages of the GSC, in the order of application, are

  • \(f_{\textrm{charged}}\): the fraction of the jet \(p_{\text {T}}\) carried by charged particles, as measured using ghost-associated tracks with \(p_{\text {T}} > 500~\text {MeV} \), \(|\eta ^{\text {det}} | < 2.5\),

  • \(f_{\textrm{Tile0}}\): the fraction of jet energy (\(E_{\textrm{frac}}\)) measured in the first layer of the hadronic tile calorimeter, \(|\eta ^{\text {det}} | < 1.8\),

  • \(f_{\textrm{LAr3}}\): the \(E_{\textrm{frac}}\) measured in the third layer of the electromagnetic LAr calorimeter, \(|\eta ^{\text {det}} | < 3.5\),

  • \(N_{\textrm{track}}\): the number of tracks with \(p_{\text {T}} > 1~\text {GeV} \) ghost-associated with the jet, \(|\eta ^{\text {det}} | < 2.5\),

  • \(w_{\textrm{track}}\): also known as track width, the average \(p_{\text {T}}\)-weighted transverse distance in the \(\eta \)-\(\phi \) plane, between the jet axis and all tracks of \(p_{\text {T}} > 1~\text {GeV} \) ghost-associated with the jet, \(|\eta ^{\text {det}} | < 2.5\),

  • \(N_{\textrm{segments}}\): the number of muon track segments ghost-associated with the jet, \(|\eta ^{\text {det}} | < 2.8\).

The \(N_{\textrm{segments}}\) correction, also known as the punch-through correction, reduces the tails of the response distribution caused by high-\(p_{\text {T}}\) jets that are not fully contained in the calorimeter. Unlike the other corrections, the \(N_{\textrm{segments}}\) correction is applied as a function of the jet energy instead of the jet \(p_{\text {T}}\), since this effect is more strongly correlated with energy escaping the calorimeters.

The jet \(p_{\text {T}}\) response for PFlow jets in MC simulation after each of the GSC corrections is shown in Fig. 7 for one \(|\eta |\) bin. While the jet energy scale is within 1% at low energies, a small amount of non-closure is introduced when determining the response using \(p_{\text {T}}\) instead of energy. The fractional jet resolution, denoted by \(\sigma \), is used to determine the magnitude of the fluctuations in the jet energy reconstruction, where \(\sigma \) is the width of the fit to a Gaussian function for the jet \(p_{\text {T}}\) response distribution divided by the mean of the fit. This is shown for PFlow jets with \(0.2< |\eta ^{\text {det}} | < 0.7\) in MC simulation in Fig. 7. As more corrections are applied, the fractional jet resolution improves and the jet response dependence on the jet flavour is reduced as the calibration is improved for jets with varying features. The impact of \(f_{\textrm{charged}}\) and \(f_{\textrm{Tile0}}\) is most apparent in Fig. 7, but the relative impact of the different corrections varies as a function of \(|\eta ^{\text {det}} |\). In addition, these corrections reduce effects that are less evident in the inclusive case. For instance, the punch-through correction scales with energy, and so it primarily impacts analyses that are sensitive to high-energy jets, but its impact is not obvious in the inclusive distribution.

Fig. 7

a The jet \(p_{\text {T}}\) response after each stage of the GSC calibration, and b the jet \(p_{\text {T}}\) resolution after the MCJES, after the \(f_{\textrm{charged}}\) correction, after the \(f_{\textrm{Tile0}}\) correction, after the \(f_{\textrm{LAr3}}\) correction, after the \(N_{\textrm{track}}\) correction, after the \(w_{\textrm{track}}\) correction, and after the \(N_{\textrm{segments}}\) correction

4.5.2 The global neural network calibration

The GSC is limited to using relatively uncorrelated variables for the correction, since otherwise, each sequential step would potentially interfere with previous corrections due to correlations between observables. This constraint is fundamental to the method, limiting the set of corrections that may be applied. However, when additional observables are added, a simultaneous calibration is more appropriate because it accounts for their correlations [52]. As an alternative to the sequential calibration, a deep neural network (DNN) is trained to determine a simultaneous correction based on a wide variety of jet properties, enabling the use of correlated variables for determining the global jet property correction. Since analyses make selections based on the jet \(p_{\text {T}}\), the DNN is designed to correct the jet \(p_{\text {T}}\) response, in contrast to the GSC, which leaves the energy response unchanged.

To improve the performance based on the detector geometry, a DNN is trained for each \(|\eta ^{\text {det}} |\) region used to derive the GSC to provide a correction to the jet \(p_{\text {T}}\) based on various jet- and event-level features. The DNNs are trained with Keras [53], using the Adam [54] optimisation algorithm. The network has three hidden layers with swish activation functions [55] and a single-node output layer with linear activation. The number of nodes is optimised for each \(|\eta ^{\text {det}} |\) bin, and ranges between 100 and 300. The network uses the leaky Gaussian kernel (LGK) loss function [56]

$$\begin{aligned} \textrm{Loss}(x^{\textrm{target}}, x^{\textrm{pred}})= & {} - \frac{1}{\sqrt{2\pi }} \textrm{exp}\left( -\frac{(x^{\textrm{target}} - x^{\textrm{pred}})^2}{2\alpha ^2}\right) \nonumber \\{} & {} + \beta |x^{\textrm{target}} - x^{\textrm{pred}} |, \end{aligned}$$

where \(x^{\textrm{target}}\) is the jet \(p_{\text {T}}\) response, \(x^{\textrm{pred}}\) is the corresponding NN prediction, and \(\alpha \) and \(\beta \) are tunable parameters. As \(\alpha \rightarrow 0\), the LGK loss learns the mode, and the second term ensures that the gradient of the error function relative to the current weight does not vanish for large \(x^{\textrm{target}} - x^{\textrm{pred}}\). Learning the mode is less biased by cases where the response is not a perfect Gaussian distribution, resulting in better closure than a loss function that learns the mean of the distribution.
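The mode-seeking behaviour of the LGK loss can be checked with a small Python sketch. The loss follows the expression above; the toy sample of targets and the scan over a single constant prediction are hypothetical and only serve to illustrate the behaviour, they are not part of the training procedure.

```python
import math

def lgk_loss(target, pred, alpha=0.1, beta=1e-6):
    """Leaky Gaussian kernel loss for one (target, prediction) pair."""
    diff = target - pred
    gaussian = math.exp(-diff * diff / (2.0 * alpha * alpha)) / math.sqrt(2.0 * math.pi)
    return -gaussian + beta * abs(diff)

def mean_loss(targets, pred, alpha=0.1, beta=1e-6):
    """Average LGK loss when the same prediction is used for every target."""
    return sum(lgk_loss(t, pred, alpha, beta) for t in targets) / len(targets)

# Toy, asymmetric response sample (hypothetical): most jets near 1.0 with a high tail.
targets = [1.0] * 60 + [1.3] * 25 + [1.6] * 15
sample_mean = sum(targets) / len(targets)

# Scan constant predictions and keep the one with the smallest average loss.
candidates = [0.9 + 0.001 * k for k in range(801)]
best_pred = min(candidates, key=lambda p: mean_loss(targets, p))
```

In this toy sample the minimum of the average loss lies close to the most populated value rather than the sample mean, which is the property exploited for non-Gaussian response distributions.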

The architecture of the network was chosen as the result of a hyperparameter optimisation based on the closure of the result, where hyperparameters are parameters involving the network structure. The training is done with a batch size of \(10^{4}\) jets, and a learning rate of \(10^{-4}\). For the LGK loss, the parameters are chosen to be \(\alpha =10^{-1}\), and \(\beta =10^{-6}\) based on the hyperparameter scan. The training is done to minimise the LGK loss function, and training continues until there are no improvements to the loss for five epochs. Increasing the patience did not have a noticeable effect on the quality of the results. Unweighted events are used because this avoids issues in the training due to large differences between the event weights. Since the target is the \(p_{\text {T}}\) response, not the jet \(p_{\text {T}}\) itself, the uniform weights do not have a large impact on the final result. Only the two leading jets in the event are used in the training, since the events were simulated using a dijet process, and so this avoids potential biases from using jets that originate purely from the parton shower. For each \(|\eta ^{\text {det}} |\) bin, several networks were trained, and the one with the best closure was chosen for the final result.
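The stopping rule described above, where training ends once the loss has not improved for five consecutive epochs, can be sketched in plain Python. The function name and the list of losses are hypothetical; in practice an early-stopping callback of the training framework would typically play this role.

```python
def stopping_epoch(losses, patience=5):
    """Return the index of the epoch at which training would stop."""
    best = float("inf")
    stale = 0                      # epochs since the last improvement
    for epoch, loss in enumerate(losses):
        if loss < best:
            best = loss
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                return epoch
    return len(losses) - 1         # patience never exhausted

# Hypothetical loss history: the best value is reached at epoch 2.
history = [1.0, 0.8, 0.7, 0.71, 0.72, 0.705, 0.73, 0.74, 0.6]
stop = stopping_epoch(history)
```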

Several sets of variables were considered as inputs to the NN, and the final list of variables used in the training is given in Table 1. This list includes all of the variables used in the GSC calibration, with the addition of more information about the jet kinematics, more granular information about the energy deposits in different calorimeter layers, and measures of pile-up. While the residual pile-up correction removes most of the pile-up dependence, some dependence is reintroduced by the absolute MCJES calibration, and so \(N_{\text {PV}}\) and \(\mu \) are included in the training. Some calorimeter layers are not present for certain \(|\eta ^{\text {det}} |\) regions, in which case their \(E_{\textrm{frac}}\) is set to zero. Explicitly removing these observables from the list of input variables used in the NN training had a negligible impact on the results, and so the set of training variables is kept the same for all \(|\eta ^{\text {det}} |\) regions.

Table 1 List of variables used as input to the GNNC. Variables with a * correspond to those that are also used by the GSC

The jet \(p_{\text {T}}\) closure from this calibration is typically better than 1%, but it shows some fluctuations that can slightly exceed this. The magnitude of these fluctuations varies with each DNN training, but they were persistent across different DNN hyperparameters, loss functions, and training targets. To mitigate this, an additional \(p_{\text {T}}\) calibration is derived after the GNNC, using the p-spline method outlined in Sect. 4.4, but using the truth jet \(p_{\text {T}}\) as the target instead of the energy. This is derived in \(|\eta ^{\text {det}} |\) bins with a width of 0.1, which provides better performance than using the same \(|\eta ^{\text {det}} |\) bins as the GNNC correction. This has a negligible effect on the jet \(p_{\text {T}}\) resolution, and only serves to improve the closure and smoothness of the calibration.

4.5.3 Comparison of the methods

Figure 8 shows a comparison of the jet \(p_{\text {T}}\) response after the MCJES, GSC and GNNC for one representative \(|\eta ^{\text {det}} |\) bin. As designed, the GSC does not change the energy response of the jets. Since the JES calibration moves the reconstructed energy scale to match that of the truth scale, this can result in some non-closure in the jet \(p_{\text {T}}\), which is particularly evident at low \(p_{\text {T}}\). The GNNC is designed to change the \(p_{\text {T}}\) scale of the jets to match the truth jets, and so the closure in \(p_{\text {T}}\) is better than that of the GSC. It is worth noting that while the GSC can instead be applied in a way that corrects the jet \(p_{\text {T}}\) scale, this does not impact the resolution. Other \(|\eta ^{\text {det}} |\) bins show similar qualitative features, though the exact non-closure seen in the \(p_{\text {T}}\) response after the MCJES and GSC varies slightly.

Figure 9 shows a comparison of the jet \(p_{\text {T}}\) resolution after the MCJES, GSC and GNNC for several representative \(|\eta ^{\text {det}} |\) bins. In a few cases, the jet \(p_{\text {T}}\) resolution becomes worse in the lowest \(p_{\text {T}}\) bins, but this is also where the \(p_{\text {T}}\) non-closure is most significant, making it difficult to have an accurate estimate of the resolution, particularly since the \(p_{\text {T}}\) scale of the GNNC is different from that of the MCJES and GSC. Since the \(p_{\text {T}}\) scale of the MCJES and GSC is above one and has a negative slope, the measured resolution is slightly underestimated [47] in these bins, while the GNNC resolution is correctly estimated, since the response closes. In the \(0.2< |\eta | < 0.7\) bin, the GNNC has an average improvement in the jet \(p_{\text {T}}\) resolution of over 15%, and maximum improvements of over 25%, when compared with the GSC. Other \(|\eta ^{\text {det}} |\) bins show similar average improvements of around 15–25%, with maximum improvements often over 30%, and the improvement generally becomes more pronounced at higher \(|\eta ^{\text {det}} |\), where the resolution improvements are significant, mostly due to the improvements from the additional detector information. Studies comparing the GNNC performance with only the GSC observables as inputs find a similar performance to the GSC, indicating that the improvement in the resolution of the GNNC compared with the GSC is due to the inclusion of additional observables. This is made possible by a simultaneous correction that accounts for correlations between observables. The GNNC provides a larger improvement to the jet energy resolution than the GSC, and so it is used for the remainder of the paper.

Fig. 8

The jet \(p_{\text {T}}\) closure for \(0.2< |\eta ^{\text {det}} | < 0.7\). The solid line shows the MCJES, the long dashed line shows the GSC, and the short dashed line shows the GNNC

Fig. 9

The jet \(p_{\text {T}}\) resolution for a \(0.2< |\eta ^{\text {det}} | < 0.7\), b \(0.7< |\eta ^{\text {det}} | < 1.3\), c \(1.8< |\eta ^{\text {det}} | < 2.5\), and d \(3.2< |\eta ^{\text {det}} | < 3.5\). The solid line shows the MCJES, the long dashed line shows the GSC, and the short dashed line shows the GNNC

4.5.4 Flavour uncertainties

The two flavour-dependence uncertainties in the JES are derived from simulation and account for relative flavour fractions and differing responses to quark- and gluon-initiated jets. The flavour response uncertainty accounts for the fact that, unlike the quark-initiated jet response \(\mathcal {R}_{q}\), the gluon-initiated jet response \(\mathcal {R}_g\) is found to differ significantly between generators. This uncertainty is defined as

$$\begin{aligned} \sigma _{\textrm{response}} = f_{g} (\mathcal {R}_{g, \mathrm {\textsc {Pythia}8}} - \mathcal {R}_{g,\mathrm {\textsc {Herwig}}}), \end{aligned}$$

where \(f_g\) is the fraction of gluon-initiated jets, and \(\mathcal {R}_{g, \mathrm {\textsc {Pythia}8}}\) and \(\mathcal {R}_{g,\mathrm {\textsc {Herwig}}}\) are the gluon-initiated jet response \(\mathcal {R}_g\) in Pythia8 and Herwig, respectively. The flavour composition uncertainty accounts for the fact that the jet response is different for quark- and gluon-initiated jets. This is determined based on the fraction of gluon-initiated jets \(f_g\), where \(\mathcal {R}_q\) and \(\mathcal {R}_g\) are the quark and gluon jet responses measured in Pythia8, and \(\sigma _g^f\) is the uncertainty in \(f_g\) in the sample, with the uncertainty defined as

$$\begin{aligned} \sigma _{\textrm{composition}} = \sigma _{g}^{f} \frac{\mathcal {R}_{q} - \mathcal {R}_{g}}{f_{g} \mathcal {R}_{g} + (1-f_{g})\mathcal {R}_{q}}. \end{aligned}$$

Figure 10 shows a comparison of the flavour composition and flavour response uncertainties for the MCJES, GNNC and GSC. After the MCJES calibration, \(\mathcal {R}_q - \mathcal {R}_g\) becomes negative for jets above 100 \(\text {GeV}\), which appears as a dip in the flavour composition uncertainty. Both the GSC and GNNC can reduce these uncertainties, with the GNNC providing a greater reduction. For each \(|\eta ^{\text {det}} |\) bin, when compared with the GSC, the GNNC results in an average improvement of around 15–25% in the \(40 \le p_{\text {T}} < 300\) \(\text {GeV}\) range for the flavour response uncertainty, and up to 25% improvements for the flavour composition uncertainty.
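The two flavour uncertainty definitions can be written as short Python functions. The function names and all numerical inputs below are hypothetical and chosen only to make the arithmetic easy to follow; they are not measured values.

```python
def flavour_response_uncertainty(f_g, r_g_pythia, r_g_herwig):
    """sigma_response = f_g * (R_g[Pythia8] - R_g[Herwig])."""
    return f_g * (r_g_pythia - r_g_herwig)

def flavour_composition_uncertainty(sigma_fg, f_g, r_q, r_g):
    """sigma_composition = sigma_fg * (R_q - R_g) / (f_g * R_g + (1 - f_g) * R_q)."""
    return sigma_fg * (r_q - r_g) / (f_g * r_g + (1.0 - f_g) * r_q)

# Hypothetical inputs for one (pT, |eta_det|) bin.
sigma_response = flavour_response_uncertainty(f_g=0.6, r_g_pythia=0.98, r_g_herwig=0.96)
sigma_composition = flavour_composition_uncertainty(sigma_fg=0.1, f_g=0.5, r_q=1.02, r_g=0.98)
```

With these inputs the response term is 0.6 × 0.02 = 0.012, and the composition term is 0.1 × 0.04 / 1.00 = 0.004. The composition term changes sign when \(\mathcal {R}_q - \mathcal {R}_g\) does.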

Fig. 10

a The flavour response uncertainty, and b the flavour composition uncertainty for central jets. The solid line shows the MCJES, the long dashed line shows the GSC, and the short dashed line shows the GNNC

5 In situ analysis

The final calibration step accounts for differences in the jet response between simulation and data. Such differences arise due to the imperfect simulation of detector response and detector material, and the modelling of the physics processes involved: hard scatter, underlying event, pile-up, jet formation and particle interactions with detector material. For the remainder of these studies, a single jet calibration is studied, using the sideband \(\rho \) definition in Sect. 4.2 and the 3D residual calibration in Sect. 4.3, the absolute MC calibration implemented with p-splines in Sect. 4.4, and the GNNC for the global calibration in Sect. 4.5. The in situ calibration is studied to fully understand the impact of these changes, relative to the calibration procedure in Ref. [4], on the calibration and the corresponding uncertainties. The in situ calibration provides important validation of the new MC calibration of jets by comparing the data-to-MC difference between the \(p_{\text {T}}\) balance of a jet against a well-calibrated object or system. In addition, novel studies are done to disentangle the physics effects and detector effects in the \(\eta \)-intercalibration to reduce the systematic uncertainties. Furthermore, the \(b\)-jet JES is evaluated in situ using PFlow jets, which is performed using \(\gamma \)+jet events for the first time.

The in situ calibration response \({\mathcal {R}}_{in~situ}\) is defined as the average ratio of the jet \(p_{\text {T}}\) to the transverse momentum of the reference object \(p_{\text {T}} ^{\text {ref}}\), derived as a function of \(p_{\text {T}} ^{\text {ref}}\). The \({\mathcal {R}}_{in~situ}\) response is susceptible to effects such as the radiation of additional partons or the loss of energy outside the reconstructed jet cone. Dedicated event selections are applied to mitigate these effects. A double ratio, insensitive to these secondary effects provided they are well-modelled in simulations, is defined as

$$\begin{aligned} \mathcal {C} = \frac{{\mathcal {R}}^{\text {data}}_{in~situ}}{{\mathcal {R}}^{\text {MC}}_{in~situ}}. \end{aligned}$$

The calibration factor to the jet four-momentum can be obtained by a numerical inversion of this double ratio as a function of jet \(p_{\text {T}}\), and as a function of \(\eta _{\text {det}} \) in \(\eta \)-intercalibration.
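As a minimal illustration of the double ratio in a single \(p_{\text {T}} ^{\text {ref}}\) bin, the sketch below computes the average balance in data and in simulation and takes their ratio. The function names and the jet and reference values are hypothetical toy numbers; the numerical inversion to jet \(p_{\text {T}}\) is not included.

```python
def mean_balance(jet_pts, ref_pts):
    """Average of pT(jet) / pT(ref) over the events of one pT(ref) bin."""
    ratios = [jet / ref for jet, ref in zip(jet_pts, ref_pts)]
    return sum(ratios) / len(ratios)

def double_ratio(data_jets, data_refs, mc_jets, mc_refs):
    """C = R_in-situ(data) / R_in-situ(MC)."""
    return mean_balance(data_jets, data_refs) / mean_balance(mc_jets, mc_refs)

# Toy events in GeV (hypothetical values).
data_jets, data_refs = [48.0, 97.0], [50.0, 100.0]
mc_jets, mc_refs = [49.0, 99.0], [50.0, 100.0]

c_value = double_ratio(data_jets, data_refs, mc_jets, mc_refs)
```

Effects that shift the balance equally in data and simulation, such as additional radiation that is well modelled, cancel in this ratio.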

Two stages of in situ analyses are done sequentially to assess the performance of MC calibrations. First, a relative in situ calibration referred to as the \(\eta \)-intercalibration is done, which corrects the energy scale of forward jets (\(0.8<|\eta _\text {det}|<4.5\)) to match that of the central jets (\(|\eta _\text {det}|<0.8\)) using the \(p_{\text {T}}\) balance in a dijet system. Second, an absolute calibration is done by measuring the \(p_{\text {T}}\) balance of a central jet against a well-calibrated Z boson or a photon. The missing-\(E_\text {T}\) projection fraction (MPF) method [57] is used in \(Z/\gamma +\)jet events to calculate the \(p_{\text {T}}\) balance between the full hadronic recoil and a Z boson or a photon. The method is less susceptible to effects of pile-up and the threshold of the jet reconstruction than the direct balance method, allowing a reliable measurement of the low-\(p_{\text {T}}\) jet response below 100 \(\text {GeV}\). The direct balance (DB) method measures the balance between a (\(b\)-)jet recoiling against a photon in \(\gamma \)+jet events. By using the DB instead of MPF, the response of the b-jet itself is studied without including the effects of the hadronic recoil.

For each in situ analysis, the main sources of systematic uncertainty arise from the MC modelling of physics processes, the measurement of the reference object, and the dependence of the \(p_{\text {T}}\) balance on the selected event topology. Uncertainties related to the MC modelling of physics effects are assessed by taking the difference between the predictions of two distinct MC event generators. The difference in jet response between simulations depends on the hadronisation models, which lead to different jet compositions [51]. Uncertainties in the reference object are estimated by propagating its own \(\pm 1\sigma \) calibration uncertainties through the analysis. Uncertainties due to the selected event topology are evaluated by varying the event selection criteria and comparing the impact on the response ratios between data and MC simulation. To reduce statistical fluctuations when applying the systematic variations, a rebinning procedure similar to that used in previous publications [4] is employed to obtain statistically significant results using pseudo-experiments. This rebinning procedure is performed only in regions where no sharp variations in the \(p_{\text {T}}\) response are observed, to ensure that no real physics effects are removed.

Events must satisfy a set of selection requirements common to all in situ analyses. Each event is required to have at least one reconstructed primary vertex with at least two matched tracks of \(p_{\text {T}} >500\) \(\text {MeV}\). Jets arising from cosmic rays, non-collision background and calorimeter noise are vetoed by applying data-quality requirements [58]. In addition, jets with \(20<p_{\text {T}} <60\) \(\text {GeV}\) and \(|\eta _{\text {det}} |<2.4\) are required to satisfy the jet vertex tagging (JVT) criteria [59, 60]. The JVT requirement rejects jets from pile-up interactions by matching jets with the primary vertex; it has a selection efficiency of 97% for hard-scatter jets at the nominal operating point.

5.1 \(\eta \)-intercalibration

The jet response in the forward region (\(0.8<|\eta _{\text {det}} |<4.5\)) is typically less well understood because of the more complicated detector structure. The \(\eta \)-intercalibration provides a correction for forward jets (\(0.8<|\eta _{\text {det}} |<4.5\)) to bring them to the same energy scale as central jets (\(|\eta _{\text {det}} |<0.8\)). This calibration uses events with a dijet topology, requiring two jets that are back-to-back in the transverse plane and lie in different \(\eta _{\text {det}} \) regions. To increase the statistical precision, neither of the two jets is required to be in the central reference region; instead, all regions are calibrated relative to one another by solving a set of linear equations. This is referred to as the matrix method [4]. The momentum asymmetry is defined to measure the \(p_{\text {T}}\) balance between the two jets in two distinct detector regions (symbolically labelled left and right for simplicity)

$$\begin{aligned} \mathcal {A} = \frac{p_{\text {T}} ^{\text {left}}-p_{\text {T}} ^{\text {right}}}{p_{\text {T}} ^\text {avg}}, \end{aligned}$$

where \(p_{\text {T}} ^\text {avg}\) is the average of the transverse momentum of the left and right jets (\(p_{\text {T}} ^\text {avg} =(p_{\text {T}} ^{\text {left}}+p_{\text {T}} ^{\text {right}})/2\)). For a narrow bin approximation in \(p_{\text {T}} ^\text {avg}\), the relative response \(\mathcal {R}\) of the left and right jets in terms of the calibration factor for each jet and the mean of \(\mathcal {A}\) can be defined as

$$\begin{aligned} \mathcal {R} = \frac{c^{\textrm{right}}}{c^{\textrm{left}}} = \frac{2 + \left\langle \mathcal {A} \right\rangle }{2 - \left\langle \mathcal {A} \right\rangle } \approx \frac{\left\langle p_{\textrm{T}}^{\textrm{left}} \right\rangle }{\left\langle p_{\textrm{T}}^{\textrm{right}} \right\rangle } \end{aligned}$$

where \(\mathcal {R}\) is measured as a function of \(\eta _{\text {det}}\) of the left and right jets and of \(p_{\text {T}} ^\text {avg}\). The intercalibration factor c is defined as \(c=\frac{c^{\textrm{left}}}{c^{\textrm{right}}}\) and hence the relative response \(\mathcal {R} \) satisfies \(\mathcal {R} =1/c\).
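The relations above can be illustrated with a short Python sketch. It computes the asymmetry and the relative response for a pair of jets, and then solves a toy version of the matrix method for per-region factors \(c_i\). The least-squares solution in \(\log c\) with the central region fixed to unity is a simplification assumed here for illustration; it is not the minimisation procedure of Ref. [4], and all names and numbers are hypothetical.

```python
import numpy as np

def asymmetry(pt_left, pt_right):
    """A = (pT_left - pT_right) / pT_avg, with pT_avg the mean of the two."""
    pt_left = np.asarray(pt_left, dtype=float)
    pt_right = np.asarray(pt_right, dtype=float)
    return (pt_left - pt_right) / (0.5 * (pt_left + pt_right))

def relative_response(mean_asymmetry):
    """R = (2 + <A>) / (2 - <A>), equal to c_right / c_left."""
    return (2.0 + mean_asymmetry) / (2.0 - mean_asymmetry)

def solve_intercalibration(pairs, responses, n_regions, central_regions):
    """Toy matrix method: find c_i such that R_ij = c_j / c_i.

    pairs           : list of (i_left, j_right) region indices
    responses       : measured R_ij for each pair
    central_regions : indices whose mean log(c) is constrained to zero
    The system is linear in log(c) and is solved by least squares.
    """
    rows, rhs = [], []
    for (i_left, j_right), r_ij in zip(pairs, responses):
        row = np.zeros(n_regions)
        row[j_right] += 1.0   # + log c_right
        row[i_left] -= 1.0    # - log c_left
        rows.append(row)
        rhs.append(np.log(r_ij))
    constraint = np.zeros(n_regions)
    constraint[list(central_regions)] = 1.0
    rows.append(constraint)
    rhs.append(0.0)
    log_c, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    return np.exp(log_c)

# Toy closure check with invented factors for three eta regions.
c_true = np.array([1.00, 1.05, 1.10])
pairs = [(0, 1), (1, 2), (0, 2)]
responses = [c_true[j] / c_true[i] for i, j in pairs]
c_fit = solve_intercalibration(pairs, responses, 3, central_regions=[0])
```

For a single event the expression \((2+\mathcal {A})/(2-\mathcal {A})\) reduces exactly to \(p_{\text {T}} ^{\text {left}}/p_{\text {T}} ^{\text {right}}\), which is why the approximation in the equation above holds for narrow bins.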

Dijet events are selected using a combination of forward and central single-jet triggers, where each trigger is used only in the range of \(p_{\text {T}} ^\text {avg}\) in which its efficiency is at least 99%. Prescaled jet triggers are used to accommodate bandwidth limits, and each selected event is weighted accordingly. The trigger combination method [4, 61] is used to maximise the statistical precision. Each event must contain at least two jets, with the two leading jets satisfying \(p_{\text {T}} ^\text {avg} >25\) \(\text {GeV}\) and \(|\eta _{\text {det}} |<4.5\). Events containing a third jet with \(p_{\text {T}} ^{\text {jet 3}}/p_{\text {T}} ^\text {avg} >0.25\) are excluded. The two leading jets must be back-to-back in the transverse plane, satisfying a requirement on their azimuthal angle difference of \(\Delta \phi ^{1,2}>2.5\).

The nominal calibration is estimated by taking the ratio of the simulated response in Powheg+Pythia8 to the measured response in data. The binning in \(\eta _{\text {det}}\) and \(p_{\text {T}} ^\text {avg}\) is chosen to ensure a sufficient sample size in sparsely populated regions while capturing fine-grained variations in the detector response. A two-dimensional Gaussian kernel is optimised to smooth statistical fluctuations while preserving notable detector features.
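The idea of smoothing binned corrections with a two-dimensional Gaussian kernel can be sketched as follows. The paper does not specify the kernel widths, the coordinates or the weighting, so the inverse-variance weighting, the argument names and the toy values below are all assumptions made for illustration.

```python
import numpy as np

def gaussian_kernel_smooth(x, y, values, errors, x0, y0, width_x, width_y):
    """Kernel-weighted average of binned values, evaluated at (x0, y0).

    x, y     : bin centres (for example pT_avg and eta_det)
    values   : measured MC-to-data ratio in each bin
    errors   : statistical uncertainty of each bin
    width_*  : Gaussian kernel widths (hypothetical tuning parameters)
    Each bin is weighted by the Gaussian kernel and by 1/error^2.
    """
    x, y, values, errors = (
        np.asarray(a, dtype=float) for a in (x, y, values, errors)
    )
    kernel = np.exp(
        -0.5 * (((x - x0) / width_x) ** 2 + ((y - y0) / width_y) ** 2)
    )
    weights = kernel / errors**2
    return np.sum(weights * values) / np.sum(weights)

# Two toy bins equidistant from the evaluation point (invented numbers).
smoothed = gaussian_kernel_smooth(
    x=[30.0, 40.0], y=[1.0, 1.0],
    values=[1.00, 1.10], errors=[0.01, 0.01],
    x0=35.0, y0=1.0, width_x=10.0, width_y=0.5,
)
```

Wider kernels suppress statistical fluctuations more strongly but also wash out genuine detector features, which is the trade-off the optimisation mentioned above has to balance.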

The 2017 data sample is representative of high pile-up conditions and is therefore discussed here. Figure 11 shows the relative response as a function of \(\eta _{\text {det}}\) in two \(p_{\text {T}} ^\text {avg}\) regions and as a function of \(p_{\text {T}} ^\text {avg}\) in two \(\eta _{\text {det}}\) regions, for the 2017 data sample and for MC simulations from Powheg+Pythia8 and Powheg+Herwig 7 with an angular-ordered shower. The two MC simulations capture the overall shape of the \(\eta _{\text {det}}\) dependence. However, the response predicted by the simulations is consistently lower than that measured in data in the forward detector regions across all \(p_{\text {T}} ^\text {avg}\) bins.

Fig. 11 Relative jet response, 1/c, for jets calibrated with PFlow+JES as a function of \(\eta _{\text {det}}\) in the ranges (a) \(25~\text {GeV}< p_{\text {T}} ^\text {jet} < 40~\text {GeV} \) and (b) \(400~\text {GeV}< p_{\text {T}} ^\text {jet} < 525~\text {GeV} \), and as a function of \(p_{\text {T}}\) in the ranges (c) \(1.2< \eta _{\text {det}} < 1.4\) and (d) \(3.3< \eta _{\text {det}} < 3.4\). The top panel presents the measured relative response for data (dots), Powheg+Pythia8 (triangles) and Powheg+Herwig 7 (triangles). The bottom panel presents the MC-to-data response ratios, represented by triangles, and the smoothed in situ corrections, represented by overlaid curves, in which the solid line shows the derived calibration and the dashed line shows the calibration extrapolated to sparsely populated detector regions using the two-dimensional Gaussian kernel. Two vertical lines are drawn at \(\eta _{\text {det}} =\pm 0.8\) to separate the central (\(|\eta _{\text {det}} |<0.8\)) and forward (\(0.8<|\eta _{\text {det}} |<4.5\)) detector regions. Three horizontal dashed lines are drawn at 0.97, 1, and 1.03 to provide reference points for the viewer

Uncertainties arise from the imperfect modelling of the physics, of the detector response and of the effect of the dijet topology on the momentum balance. They are evaluated as a function of \(p_{\text {T}} ^\text {avg}\) and \(\eta _{\text {det}}\). The uncertainty arising from MC mismodelling is estimated as the difference between the smoothed residual corrections obtained with Powheg+Pythia8 and with Powheg+Herwig 7 with an angular-ordered shower. Other uncertainties due to the mismodelling of physics and event topology are estimated by varying the requirements on the third-jet veto, the \(\Delta \phi ^{1,2}\) separation, and the JVT.

Further studies are performed at particle level and reconstruction level separately to disentangle physics and detector effects. The particle-level study probes physics effects on the dijet balance, such as additional parton radiation or out-of-cone effects. It follows the same procedure as at reconstruction level, except that no JVT requirements are applied. The matrix method is used as the nominal method, while the central reference method [4] is used as a cross-check. These physics effects induce a smooth and non-trivial structure in the relative response 1/c, with a slight asymmetry between positive and negative \(\eta _{\text {det}}\) in the forward region, as shown in Fig. 12a. A similar structure, with sharper variations due to the convolution with detector effects, is also present at reconstruction level, as shown in Fig. 11a. The systematic uncertainty \(\Delta c\) in the intercalibration factor at particle level is \(\Delta c = \left| \frac{c^{\textrm{syst}}}{c^{\textrm{nominal}}} - 1 \right| \), where \(c^{\textrm{syst}}\) is the intercalibration coefficient obtained with a different event selection, varying either the requirement on \(p_{\text {T}} ^\mathrm {jet~3}/p_{\text {T}} ^{\textrm{avg}}\) or that on \(\Delta \phi ^{1,2}\). A comparison of Fig. 12a and b shows that the magnitude of these physics effects at particle level is similar to that of the systematic uncertainties designed to cover them, which are evaluated by varying the selection criteria for \(p_{\text {T}} ^\mathrm {jet~3}/p_{\text {T}} ^{\textrm{avg}}\) and \(\Delta \phi ^{1,2}\) at reconstruction level in data and MC simulation simultaneously. These uncertainties are therefore not underestimated.

Variations in the parton-shower and hadronisation models can affect the dijet balance, which convolves both physics and detector effects. The MC modelling uncertainty derived at particle level as a function of \(\eta _{\text {det}}\) considers only the physics effects on the dijet balance and excludes effects on the detector response, which are evaluated in the jet flavour response uncertainty using the various MC simulations discussed in Sect. 4.5. This procedure significantly reduces the MC modelling uncertainty, as shown in Fig. 12c, and avoids possible double counting of uncertainties.

Fig. 12 Intercalibration coefficients and uncertainties derived at both particle and reconstruction level, using MC simulated events reconstructed with the conditions of the 2017 data-taking period. (a) Intercalibration coefficients obtained for different generators and different methods. Powheg+Pythia8 is used as the nominal MC generator. Three horizontal dashed lines are drawn at 0.97, 1, and 1.03 to provide reference points for the viewer. (b) Systematic uncertainties obtained with the matrix method and with Powheg+Pythia8 at particle level. The up and down variations are symmetrised by taking the larger of the two. A 2D smoothing (in \(p_{\text {T}} ^{\textrm{avg}}\) and in \(\eta \)) with a Gaussian kernel is applied. (c) MC modelling uncertainty, evaluated either at reconstruction level (dashed line) or at particle level (solid line). A smoothing is applied after the computation of \(\left| \frac{c_{\text {Powheg+Pythia8}}}{c_{\text {Powheg+Herwig 7}}} - 1 \right| \)

Figure 13 shows the fractional uncertainties as a function of \(\eta _{\text {det}}\) for two representative \(p_{\text {T}}\) values. The systematic uncertainty for \(|\eta _{\text {det}} |<0.8\) is set to zero because the uncertainties in this region are determined from the absolute in situ JES measurements, such as the \(Z/\gamma \)+jet analyses. The fractional uncertainties increase with \(|\eta _{\text {det}} |\) for \(|\eta _{\text {det}} |>0.8\) and decrease significantly with increasing \(p_{\text {T}}\). The dominant uncertainties arise from the choice of event generator and from variations in the selection criterion on \(p_{\text {T}} ^\mathrm {jet~3}/p_{\text {T}} ^{\textrm{avg}}\). The total systematic uncertainty is significantly reduced by using the MC modelling uncertainty estimated at particle level instead of reconstruction level. It is worth noting that systematic variations of selection criteria such as \(p_{\text {T}} ^\mathrm {jet~3}/p_{\text {T}} ^{\textrm{avg}}\) are performed simultaneously in data and simulation at reconstruction level, while Fig. 12b shows only the relative impact at particle level. If the up and down variations differ, the systematic uncertainty is taken to be the larger absolute value. Systematic uncertainties are symmetrised around \(\eta _{\text {det}} =0\) between positive and negative \(\eta _{\text {det}}\) values using the most conservative approach, because it is not known whether the asymmetry of the systematic uncertainty in \(\eta _{\text {det}}\) arises from statistical fluctuations or from detector effects.
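The bookkeeping described above can be summarised in a short Python sketch: a fractional shift \(\Delta c\), an envelope of the up and down variations, a symmetrisation in \(\eta _{\text {det}}\), and a quadrature sum of components. Interpreting "the most conservative approach" as taking the larger of the values at \(+\eta _{\text {det}}\) and \(-\eta _{\text {det}}\) is an assumption made here; the function names and numbers are hypothetical.

```python
import numpy as np

def fractional_shift(c_syst, c_nominal):
    """Delta c = |c_syst / c_nominal - 1|."""
    return np.abs(np.asarray(c_syst, float) / np.asarray(c_nominal, float) - 1.0)

def envelope_up_down(delta_up, delta_down):
    """Take the larger absolute value of the up and down variations."""
    return np.maximum(np.abs(delta_up), np.abs(delta_down))

def symmetrise_in_eta(uncertainty):
    """Mirror around eta = 0, keeping the larger of the +eta and -eta bins.

    Assumes bins are ordered in eta and placed symmetrically around zero.
    """
    u = np.asarray(uncertainty, dtype=float)
    return np.maximum(u, u[::-1])

def quadrature_sum(components):
    """Total uncertainty per bin from a list of component arrays."""
    return np.sqrt(np.sum(np.square(components), axis=0))

# Toy example with three eta bins (invented numbers).
veto = symmetrise_in_eta(envelope_up_down([0.030, 0.000, 0.010],
                                          [-0.020, 0.000, 0.020]))
generator = symmetrise_in_eta([0.040, 0.000, 0.030])
total = quadrature_sum([veto, generator])
```

In this toy case the central bin carries zero uncertainty, mirroring the convention that the central region is covered by the absolute in situ measurements.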

Fig. 13 Systematic uncertainties in the \(\eta \)-intercalibration as a function of \(\eta _{\text {det}} \) for PFlow+JES jets with (a) \(p_{\text {T}} =35~\text {GeV} \) and (b) \(p_{\text {T}} =450~\text {GeV} \). The total systematic uncertainty is represented by the middle shaded band, which is the quadrature sum of the different components of the systematic uncertainty, marked by coloured lines. The statistical uncertainty is indicated by the lowest shaded band. The highest shaded band shows the quadrature sum of the different components when the MC modelling uncertainty is estimated at reconstruction level. A smoothing procedure is applied to the systematic uncertainty to suppress statistical fluctuations. The precision is limited by the MC modelling uncertainty estimated at particle level and by variations in the selection criteria for \(p_{\text {T}} ^\mathrm {jet~3}/p_{\text {T}} ^{\textrm{avg}}\)

Fig. 14 The MPF response as a function of \(p_{\text {T}} ^{\text {ref}}\) measured in data and simulations in Z+jet events for (a) \(Z\rightarrow ee\) and (b) \(Z\rightarrow \mu \mu \). The data are represented by the black dots. The MadGraph+Pythia8 predictions are represented by the triangles, while the Sherpa predictions are represented by the inverted triangles. The MC-to-data response ratios are shown in the bottom panel. The error bars correspond to the statistical uncertainties

5.2 \(Z/\gamma +\)jet balance

The next step in the jet calibration brings the absolute jet energy scale in data to the same scale as in simulation by exploiting the \(p_{\text {T}}\) balance between the hadronic recoil and a well-calibrated object such as a Z boson or a photon. The jet used in the in situ analysis is required to be in the central detector region (\(|\eta |<0.8\)); the derived correction can then be applied to jets in the forward region via the \(\eta \)-intercalibration. The \(Z/\gamma +\)jet balance measurement is built upon the precise determination of the energy of the photon or of the \(ee/\mu \mu \) pair from a Z boson decay. These measurements benefit from the accurate knowledge of the energy scale and resolution of the leptons. The calibration of electrons and photons is accurately known through measurements using \(Z\rightarrow ee\) data and other final states [62], while the muon calibration is determined to high precision through studies of \(J/\psi \rightarrow \mu \mu \), \(Z\rightarrow \mu \mu \) and \(\Upsilon \rightarrow \mu \mu \) [63].

Three independent measurements, \(Z+\)jet in the \(Z\rightarrow ee\) and \(Z\rightarrow \mu \mu \) decay channels and \(\gamma \)+jet, are used to perform the absolute in situ calibration. The \(Z+\)jet measurement provides a sufficient sample size at low and medium jet \(p_{\text {T}}\), covering \(17<p_{\text {T}} <800\) \(\text {GeV}\), with limited precision above 800 \(\text {GeV}\). The \(\gamma +\)jet analysis provides a complementary measurement at medium and high \(p_{\text {T}}\), covering \(30<p_{\text {T}} <2000\) \(\text {GeV}\), with limited precision below 100 \(\text {GeV}\) due to prescaled low-\(p_{\text {T}}\) triggers, jets misidentified as photons, and the choice of MC event generator.

The MPF method measures the \(p_{\text {T}}\) balance between the reference object and the full hadronic recoil in \(Z/\gamma +\)jet events. This technique allows the calorimeter response to hadronic showers to be computed directly. It has low sensitivity to pile-up and the underlying event, whose contributions are uniform across the detector and thus cancel out in the MPF method. By conservation of transverse momentum, the transverse momentum of all of the hadronic activity in a \(Z/\gamma +\)jet event, \(p_{\text {T}} ^\text {recoil}\), is equal and opposite to the transverse momentum of the reference boson, \(p_{\text {T}} ^{\text {ref}}\), at particle level, such that

$$\begin{aligned} \vec {p}_{\text {T,truth}}^{\,\text {ref}} + \vec {p}_{\text {T,truth}}^{\,\text {recoil}} = 0. \end{aligned} \tag{1}$$

At the detector level, the well-calibrated reference objects have a response of one, while the calorimeter response to the hadronic recoil, \(r_{\text {MPF}}\), is lower than unity, which can result in missing transverse momentum \(\vec {E}^\mathrm {\,miss}_\textrm{T}\) in the event. Therefore, Eq. (1) can be written as:

$$\begin{aligned} \vec {p}_\text {T}^{\,\mathrm {\text {ref}}}+ r_{\text {MPF}}\,\, \vec {p}_\text {T}^{\,\textrm{recoil}}= - \vec {E}^\mathrm {\,miss}_\textrm{T}. \end{aligned}$$

Projecting the vector terms onto the direction of the reference boson, using a unit vector \(\hat{n}_\text {ref}\) in the transverse plane, shows that \(r_{\text {MPF}}\) depends only on the missing transverse momentum and the transverse momentum of the reference boson. The average of \(r_{\text {MPF}}\), \(\mathcal {R} _\text {MPF}\), is measured as a function of \(p_{\text {T}} ^{\text {ref}}\), the \(p_{\text {T}}\) of the reference \(Z/\gamma \) boson:

$$\begin{aligned} \mathcal {R} _{\text {MPF}} = \bigg \langle 1 + \frac{\hat{n}_\text {ref} \cdot \vec {E}^\mathrm {\,miss}_\textrm{T}}{p_{\text {T}} ^{\text {ref}}} \bigg \rangle \, \end{aligned}$$

where \(\vec {E}^\mathrm {\,miss}_\textrm{T}\) is computed using particle-flow objects calibrated at the EM scale.
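The per-event quantity inside the average above can be computed directly from the transverse components of the reference object and of the missing transverse momentum. The following Python sketch shows this; the function name and the toy event are hypothetical, and a real measurement would average the result in bins of \(p_{\text {T}} ^{\text {ref}}\).

```python
import numpy as np

def mpf_response(ref_px, ref_py, met_x, met_y):
    """Per-event MPF response, r = 1 + (n_ref . E_T^miss) / pT_ref.

    ref_px, ref_py : transverse momentum components of the reference boson
    met_x, met_y   : components of the missing transverse momentum
    n_ref is the unit vector along the reference boson in the transverse plane.
    """
    ref_px = np.asarray(ref_px, dtype=float)
    ref_py = np.asarray(ref_py, dtype=float)
    met_x = np.asarray(met_x, dtype=float)
    met_y = np.asarray(met_y, dtype=float)
    pt_ref = np.hypot(ref_px, ref_py)
    n_dot_met = (ref_px * met_x + ref_py * met_y) / pt_ref
    return 1.0 + n_dot_met / pt_ref

# Toy event: a 50 GeV reference boson along +x and a recoil measured with
# a response of 0.8, i.e. (-40, 0). The missing momentum is minus the sum
# of the measured objects: -((50, 0) + (-40, 0)) = (-10, 0).
r_event = mpf_response(50.0, 0.0, -10.0, 0.0)
```

The toy event recovers the assumed recoil response of 0.8, consistent with the detector-level balance equation given above.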

\(Z+\)jet events are selected using the lowest-\(p_{\text {T}}\) unprescaled dielectron or dimuon trigger. The lowest \(p_{\text {T}}\) threshold of the dielectron trigger is 15 \(\text {GeV}\) for each electron, while that of the dimuon trigger is 14 \(\text {GeV}\) for each muon [64, 65]. Both leptons are required to have \(p_{\text {T}} >20\) \(\text {GeV}\) to ensure that the triggers are fully efficient. Electrons and muons must satisfy loose identification and isolation criteria [62, 63]. Electrons are required to fall within \(|\eta |<2.47\) and are rejected if they fall in the calorimeter crack region \(1.37<|\eta |<1.52\). Muons must fall within \(|\eta |<2.4\). The oppositely charged electron or muon pair is required to have an invariant mass close to the Z boson mass, \(66<m_{ee/\mu \mu }<116\) \(\text {GeV}\). \(\gamma +\)jet events are selected using a combination of prescaled and unprescaled single-photon triggers, in which the lowest prescaled trigger \(E_{\text {T}}\) threshold is 10 \(\text {GeV}\). Photon candidates entering the analysis are required to have \(E_{\text {T}} ^{\gamma }>25\) \(\text {GeV}\) and \(|\eta ^{\gamma }| < 1.37\) and to satisfy the tight identification and isolation criteria [62]. A jet is removed if it falls within \(\Delta R=0.4\) (0.35) of a photon (lepton).

Further selection criteria are imposed in the \(Z/\gamma +\)jet measurements to reduce the impact of pile-up and additional parton radiation. To suppress contamination from pile-up, jets are required to satisfy the cleaning criteria and the JVT requirement. Events must contain a jet with \(p_{\text {T}}\) greater than 10 \(\text {GeV}\) within \(|\eta |<0.8\). To suppress effects from additional parton radiation, further requirements are imposed on the azimuthal angle between the reference boson and the leading jet, \(\Delta \phi ^{\text {ref, jet}} > 2.9\), and on the \(p_{\text {T}}\) of the subleading jet, \(p_{\text {T}} < \text {max}(0.3 \times p_{\text {T}} ^{\text {ref}},12)~\text {GeV} \), where the subleading jet falls within \(|\eta |<4.5\).
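The topology requirements above can be expressed as a simple event filter. The Python sketch below is illustrative only: the tuple layout for jets and the function names are hypothetical, and the jet cleaning, JVT and overlap-removal steps are assumed to have been applied beforehand.

```python
import math

def delta_phi(phi1, phi2):
    """Absolute azimuthal separation, folded into [0, pi]."""
    dphi = abs(phi1 - phi2) % (2.0 * math.pi)
    return 2.0 * math.pi - dphi if dphi > math.pi else dphi

def passes_topology(pt_ref, phi_ref, lead_jet, sublead_jet=None):
    """Apply the Z/gamma+jet topology requirements quoted in the text.

    Jets are (pt, eta, phi) tuples with pt in GeV (hypothetical layout).
    """
    lead_pt, lead_eta, lead_phi = lead_jet
    # Leading jet: pT > 10 GeV and inside the central region.
    if lead_pt <= 10.0 or abs(lead_eta) >= 0.8:
        return False
    # Reference boson and leading jet must be back-to-back.
    if delta_phi(phi_ref, lead_phi) <= 2.9:
        return False
    # Veto on a hard subleading jet within |eta| < 4.5.
    if sublead_jet is not None:
        sub_pt, sub_eta, _ = sublead_jet
        if abs(sub_eta) < 4.5 and sub_pt >= max(0.3 * pt_ref, 12.0):
            return False
    return True

# Toy events (invented numbers): a clean event, and one vetoed by a
# 20 GeV second jet, since max(0.3 * 50, 12) = 15 GeV.
clean = passes_topology(50.0, 0.0, (45.0, 0.3, 3.1))
vetoed = passes_topology(50.0, 0.0, (45.0, 0.3, 3.1), (20.0, 1.0, 1.0))
```

The floor of 12 GeV in the veto keeps the requirement meaningful at low \(p_{\text {T}} ^{\text {ref}}\), where a purely relative threshold would fall below the jet reconstruction threshold.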

The MPF response as a function of the reference boson \(p_{\text {T}}\) is shown in Figs. 14 and 15 for \(Z+\)jet and \(\gamma \)+jet events, respectively, in data and in two distinct MC samples. The nominal calibration is derived using MadGraph+Pythia8 for \(Z+\)jet and Pythia8 for \(\gamma \)+jet. The alternative MC sample, Sherpa, is used to determine the uncertainty from the MC event modelling. The dip in the MPF response at low \(p_{\text {T}} ^{\text {ref}} \) arises from two opposing effects: the jet reconstruction threshold, which tends to increase the response at the lowest jet \(p_{\text {T}}\) values between 17 \(\text {GeV}\) and 20 \(\text {GeV}\), and the rise of the MPF response with \(p_{\text {T}}\). The MC-to-data response ratios are consistent between \(Z+\)jet and \(\gamma +\)jet.

Fig. 15 The MPF response as a function of \(p_{\text {T}} ^{\text {ref}}\) measured in data and simulations for \(\gamma +\)jet. The data are represented by the black dots. The Pythia8 predictions are represented by the triangles, while the Sherpa predictions are represented by the inverted triangles. The MC-to-data response ratios are shown in the bottom panel. The error bars correspond to the statistical uncertainties

Fig. 16 Systematic uncertainty in the MPF response ratios as a function of \(p_{\text {T}}\) for jets calibrated up to, and including, the \(\eta \)-intercalibration for (a) \(Z(\rightarrow ee)+\)jet events and (b) \(Z(\rightarrow \mu \mu )+\)jet events. Uncertainties arise from the JVT, the subleading jet veto and the \(\Delta \phi ^{\text {ref, jet}}\) requirement in the analysis selection. Uncertainties due to the electron and muon energy scale and resolution are propagated through the analysis. The statistical uncertainty of the MC-to-data response ratios and the uncertainties due to the choice of event generator are shown. Each uncertainty is smoothed to suppress statistical fluctuations

Several sources of systematic uncertainty are considered. Uncertainties due to the energy scale and resolution of the reference objects \(e/\mu /\gamma \) are derived from existing calibrations for each object and propagated through the corresponding analysis. The impact of additional parton radiation on the response measurement is evaluated by varying the selection criteria for the subleading jet veto and \(\Delta \phi ^{\text {ref, jet}}\). Uncertainties arising from pile-up suppression are estimated by comparing the response measurements obtained with tighter and looser JVT working points. Uncertainties arising from the photon purity in \(\gamma +\)jet events, where a jet is misreconstructed as a photon, are assessed using the methodology documented in Ref. [66]. Pseudo-experiments are used in the estimation of the uncertainties to reduce statistical fluctuations.

The uncertainties for the calibration are presented for the \(Z\rightarrow ee\) and \(Z\rightarrow \mu \mu \) measurements in Fig. 16 and for the \(\gamma +\)jet measurement in Fig. 17. Uncertainties are dominated by the modelling in MC simulations in the low- and medium-\(p_{\text {T}}\) regions, and by the energy scale of the photon/electron for \(p_{\text {T}} >100\) \(\text {GeV}\).

Fig. 17 Systematic uncertainty in the MPF response ratios as a function of \(p_{\text {T}}\) for jets calibrated up to, and including, the \(\eta \)-intercalibration in \(\gamma \)+jet events. Uncertainties arise from the JVT, the subleading jet veto and the \(\Delta \phi ^{\text {ref, jet}}\) requirement in the analysis selection. Uncertainties due to the photon energy scale and resolution are propagated through the analysis. The statistical uncertainty of the MC-to-data response ratios and the uncertainties due to the choice of event generator and the photon purity are shown. Each uncertainty is smoothed to suppress statistical fluctuations

The derived calibrations are stable over the range of pile-up conditions in Run 2. Figure 18 shows the MC-to-data response ratios as a function of \(\mu \) and \(N_\text {PV}\) in \(\gamma +\)jet events for \(45<p_{\text {T}} ^{\text {ref}} <65\) \(\text {GeV}\). The in situ calibration is consistent as a function of \(\mu \) and \(N_\text {PV}\), demonstrating the expected stability.

5.3 b-quark jet energy scale in \(\gamma \)+jet balance

The measurement of the top-quark mass is limited by the uncertainty in the b-quark jet energy scale (bJES), and a measurement of the bJES can potentially improve its precision. The direct balance (DB) technique is used in \(\gamma \)+jet events to measure the balance of a (b-tagged) jet against a well-calibrated photon. This is the first measurement of the \(b\)-tagged jet energy scale using PFlow jets in this event topology. The reference \(p_{\text {T}} ^{\text {ref}}\) is defined in terms of the reference object \(p_{\text {T}}\) as \(p_{\text {T}} ^{\text {ref}} =p_{\text {T}} \times \cos \Delta \phi \), where \(p_{\text {T}}\) is the transverse momentum of the photon, and \(\Delta \phi \) is the azimuthal angle difference between the photon and the leading jet.
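The definition of \(p_{\text {T}} ^{\text {ref}}\) and the resulting per-event balance can be written as a short Python sketch. The absolute value of \(\cos \Delta \phi \) is used so that the projection is positive for back-to-back configurations; this sign convention, like the function names and numbers, is an assumption made for this illustration.

```python
import math

def reference_pt(pt_photon, dphi_photon_jet):
    """pT_ref = pT(photon) * |cos(delta phi)|.

    The absolute value is an assumed convention: for a photon and jet
    that are back-to-back, cos(delta phi) is close to -1.
    """
    return pt_photon * abs(math.cos(dphi_photon_jet))

def direct_balance(pt_jet, pt_photon, dphi_photon_jet):
    """Per-event direct balance, pT(jet) / pT_ref."""
    return pt_jet / reference_pt(pt_photon, dphi_photon_jet)

# Toy event (invented numbers): a 100 GeV photon exactly opposite to a
# jet reconstructed with 90 GeV.
balance = direct_balance(90.0, 100.0, math.pi)
```

Projecting the photon momentum onto the jet axis reduces the sensitivity of the balance to radiation that tilts the jet away from the exact back-to-back direction.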

The selections are similar to the \(\gamma \)+jet selection in the MPF method unless stated otherwise. Events must have a jet with \(p_{\text {T}} >20\) \(\text {GeV}\), instead of \(p_{\text {T}} >10\) \(\text {GeV}\), in the central detector region (\(|\eta _{\text {det}} |<0.8\)). The higher jet \(p_{\text {T}}\) threshold is due to the tighter requirement on the jet transverse momentum in the \(b\)-tagging algorithm used. To suppress additional radiation, the DB technique requires the subleading jet to satisfy \(p_{\text {T}} ^\text {j2} < \text {max}(0.1 \times p_{\text {T}} ^{\text {ref}},15)~\text {GeV} \) and requires \(\Delta \phi ^{\gamma ,\text {jet}} > 2.8\). Jets within the inner tracker coverage (\(|\eta |<2.5\)) containing \(b\)-hadrons are identified (\(b\)-tagged) by a multivariate algorithm (DL1r) using information about the impact parameters of tracks and displaced vertices [67]. The \(b\)-tagging working points with average efficiencies of 60%, 70%, 77% and 85% are used. The events are classified into inclusive and b-tagged categories. The \(b\)-tagged category is predominantly composed of \(b\)- and \(c\)-quark jets, while light-quark and gluon jets dominate the inclusive category. A jet is labelled as a \(b\)- (\(c\)-)quark jet if any b (c) parton or hadron at particle level is found within a cone of \(\Delta R<0.3\) around the reconstructed jet; otherwise it is labelled as a light-quark or gluon jet. A summary of the jet flavour composition for inclusive and \(b\)-tagged jets is given in Table 2. The sample selected with the 85% \(b\)-tagging working point is dominated by the presence of \(c\)-quark jets, and the measurement can be used, for instance, to constrain the cJES in \(H\rightarrow c\bar{c}\) analyses [68].
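The cone-based flavour labelling described above can be sketched in Python as follows. Giving a b match priority over a c match when both are found within the cone is an assumption of this sketch, as are the function names and the tuple layout of the particle-level record.

```python
import math

def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance in the (eta, phi) plane with phi wrapped to [-pi, pi)."""
    dphi = (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(eta1 - eta2, dphi)

def label_jet(jet_eta, jet_phi, truth_particles, cone=0.3):
    """Label a reconstructed jet as 'b', 'c' or 'light'.

    truth_particles : iterable of (flavour, eta, phi), flavour 'b' or 'c'
                      (hypothetical layout of particle-level partons/hadrons)
    A b match takes priority over a c match (assumed convention).
    """
    matched = {
        flavour
        for flavour, eta, phi in truth_particles
        if delta_r(jet_eta, jet_phi, eta, phi) < cone
    }
    if "b" in matched:
        return "b"
    if "c" in matched:
        return "c"
    return "light"

# Toy jets at (eta, phi) = (0, 0) with invented particle-level content.
label_bc = label_jet(0.0, 0.0, [("c", 0.1, 0.1), ("b", 0.2, 0.0)])
label_far = label_jet(0.0, 0.0, [("b", 1.0, 1.0)])
```

Jets labelled `light` by this procedure include both light-quark and gluon jets, matching the two-way split used in the text.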

Fig. 18 The MPF response for \(45<p_{\text {T}} ^{\text {ref}} <65\) \(\text {GeV}\) measured in data and simulations as a function of (a) \(\mu \) and (b) \(N_\text {PV}\). The data are represented by the black dots. The Pythia8 predictions are represented by the triangles, while the Sherpa predictions are represented by the inverted triangles. The MC-to-data response ratios are shown in the bottom panel. The error bars correspond to the statistical uncertainties

Figure 19 shows the DB response as a function of the reference photon \(p_{\text {T}}\) for inclusive and b-tagged jets, using the \(b\)-tagging working point with an average efficiency of 77%. The MC simulations are in reasonable agreement with data. The MC-to-data response ratios are slightly below one for \(b\)-tagged jets and above one for inclusive jets in almost all bins. The difference in DB response between Pythia8 and Sherpa arises from different \(b\)-quark fragmentation and decay models. The apparent rise of the DB response around 150 \(\text {GeV}\) for \(b\)-tagged jets was investigated by checking the quality of the DB response fit, the \(b\)-tagging scale factors applied in simulation, the jet flavour composition in neighbouring \(p_{\text {T}}\) bins, and a looser second-jet veto with \(p_{\text {T}} ^\text {j2} <0.2 \times p_{\text {T}} ^{\text {ref}} \). None of these checks explains the rise, which suggests that the feature is due to statistical fluctuations. Figure 20 shows the uncertainties for \(b\)-tagged jets, with a precision between 1% and 5%, and for inclusive jets, with a precision of up to 1%, for the chosen \(p_{\text {T}}\) range. For \(b\)-tagged jets, the uncertainties are dominated by the event generator modelling everywhere, while for inclusive jets the precision is limited by the event generator modelling, the photon purity and the subleading jet veto at lower \(p_{\text {T}}\), and by the photon energy scale for \(p_{\text {T}} >70\) \(\text {GeV}\).

Table 2 The average fractions of jet flavours for various \(b\)-tagging working points and for inclusive jets

Fig. 19 The DB response as a function of \(p_{\text {T}} ^{\text {ref}}\) measured in data and simulations for (a) \(\gamma +\)jet and (b) \(\gamma +b\)-tagged jet. The data are represented by the black dots. The Pythia8 predictions are represented by the triangles, while the Sherpa predictions are represented by the inverted triangles. The MC-to-data response ratios are shown in the bottom panel. The error bars correspond to the statistical uncertainties

Fig. 20 Summaries of uncertainties in the MC-to-data response ratio as a function of \(p_{\text {T}} ^{\text {ref}}\) for (a) \(\gamma +\)jet and (b) \(\gamma +b\)-tagged jet. Uncertainties arise from the photon purity, \(b\)-tagging, JVT, \(\Delta \phi \) and subleading jet veto requirements. Uncertainties due to the photon energy scale and resolution are propagated through the analysis

A new observable, \(\tilde{R}_{b\text {JES}}\), is defined as the double ratio of the \(b\)-tagged jet response to the inclusive jet response, in order to measure the energy scale difference between b-tagged and inclusive jets,

$$\begin{aligned} \tilde{R}_{b\text {JES}} = \frac{{\mathcal {R}}^{\text {MC}}_{\text {b-tagged}}/{\mathcal {R}}^{\text {data}}_{\text {b-tagged}}}{{\mathcal {R}}^{\text {MC}}_{\text {inclusive}}/{\mathcal {R}}^{\text {data}}_{\text {inclusive}}}\,. \end{aligned}$$

As the nominal jet calibration is determined relative to inclusive jets, this double ratio can be applied on top of the nominal jet calibration to correct the bJES. The value of \(\tilde{R}_{b\text {JES}}\) is found to be below one for both MC samples, with a slightly higher response in Pythia8 than in Sherpa, as shown in Fig. 21. The difference between the two event generators arises from different fragmentation and decay models. To increase the statistical precision, the ratio \(\tilde{R}_{b\text {JES}}\) is also determined inclusively for photon \(p_{\text {T}} ^{\text {ref}}\) between 85 and 1000 \(\text {GeV}\) for the various b-tagging working points, separately for Pythia8 and Sherpa; the results are given in Table 3. MC-specific calibrations for the bJES are foreseen, to reduce the effects arising from MC modelling. The ratio \(\tilde{R}_{b\text {JES}}\) was measured with unprecedented precision of up to 1%. This in turn will improve the precision of measurements of the top-quark mass.
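The observable \(\tilde{R}_{b\text {JES}}\) is a ratio of two MC-to-data response ratios, which the following Python sketch makes explicit. The function name and the input values are invented for illustration and are not results from this measurement.

```python
def bjes_double_ratio(r_mc_btag, r_data_btag, r_mc_incl, r_data_incl):
    """Double ratio of MC-to-data responses, b-tagged over inclusive.

    R_bJES = (R_MC_btag / R_data_btag) / (R_MC_incl / R_data_incl)
    """
    return (r_mc_btag / r_data_btag) / (r_mc_incl / r_data_incl)

# Toy responses (invented numbers): the MC-to-data ratio is 0.98 for
# b-tagged jets and 1.01 for inclusive jets.
r_bjes = bjes_double_ratio(0.98, 1.00, 1.01, 1.00)
```

Because the inclusive MC-to-data ratio appears in the denominator, any mismodelling common to b-tagged and inclusive jets cancels, leaving only the flavour-specific difference.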

Fig. 21 \(\tilde{R}_{b\text {JES}}\) as a function of the reference photon \(p_{\text {T}}\) determined using either Pythia8 or Sherpa for the b-tagging working points with an efficiency of (a) 60%, (b) 70%, (c) 77% and (d) 85%. The error bars correspond to the statistical uncertainties

6 Conclusion

The determination of the jet energy scale (JES) is presented using data recorded by the ATLAS experiment in pp collisions at \(\sqrt{s} = 13~\text {TeV} \). The calibration scheme used for anti-\(k_{t}\) jets reconstructed using radius parameter \(R = 0.4\) consists of two steps: a Monte-Carlo-based calibration that corrects jets to the truth jet scale, and an in situ calibration correcting the scale of jets in data.

The simulation-based calibration implements several new strategies to improve the pile-up stability at higher \(p_{\text {T}}\), closure, energy resolution, and modelling uncertainties of the jets. Biases related to the determination of the pile-up \(p_{\text {T}}\) density were a dominant source of uncertainty for jets with \(p_{\text {T}}\) below 30 \(\text {GeV}\). The new procedure presented, combined with improvements to the multi-parton interactions model in Monte Carlo simulation, reduces this uncertainty by a factor of seven. Following this, a new residual calibration is applied, which reduces the effects of pile-up by simultaneously correcting for \(\mu \), \(N_{\text {PV}}\), and \(p_{\text {T}}\). For the absolute MCJES, a new fit method based on splines is used, leading to better closure for jets with \(p_{\text {T}}\) below 30 \(\text {GeV}\). Finally, the global calibration, which improves the resolution of jets and reduces the difference between the energy scales of quark- and gluon-initiated jets, uses a new DNN-based method that allows information from correlated observables to be exploited. This DNN improves the JER by around 15% on average compared with previous methods, with a maximum improvement of over 40%.

Table 3 \(\tilde{R}_{b\text {JES}}\) obtained for various \(b\)-tagging working points using Pythia8 and Sherpa separately for \(85<p_{\text {T}} <1000\) \(\text {GeV}\)

Following these simulation-based calibration steps, the full Run 2 data sample is used to perform a residual in situ calibration that corrects for differences between data and MC simulation and constrains the uncertainties. Dijet events are used to calibrate jets in the forward region relative to the central region as a function of jet transverse momentum and pseudorapidity. The precision is improved by up to a factor of two in the forward detector region at low \(p_{\text {T}}\) by evaluating the MC modelling uncertainty at particle level instead of reconstruction level. Central jets are calibrated by exploiting the balance between jets recoiling against either a photon or a Z boson. An unprecedented precision of up to 1% is achieved in the in situ analysis. For the first time, the energy scale of \(b\)-tagged jets relative to inclusive jets is determined with a precision of up to 1% in \(\gamma \)+jet events. This result is important for improving the precision of analyses sensitive to the bJES, such as the top quark mass and \(H\rightarrow b\bar{b}\) measurements.