Calibration of the light-flavour jet mistagging efficiency of the b-tagging algorithms with Z+jets events using 139 fb⁻¹ of ATLAS proton–proton collision data at √s = 13 TeV

The identification of b-jets, referred to as b-tagging, is an important part of many physics analyses in the ATLAS experiment at the Large Hadron Collider, and an accurate calibration of its performance is essential for high-quality physics results. This publication describes the calibration of the light-flavour jet mistagging efficiency in a data sample of proton–proton collision events at √s = 13 TeV corresponding to an integrated luminosity of 139 fb⁻¹. The calibration is performed in a sample of Z bosons produced in association with jets. Because of the low mistagging efficiency for light-flavour jets, a method that uses modified versions of the b-tagging algorithms, referred to as flip taggers, is used in this work. A fit to the jet-flavour-sensitive secondary-vertex mass is performed to extract a scale factor from data that corrects the light-flavour jet mistagging efficiency in Monte Carlo simulations, while simultaneously correcting the b-jet efficiency. With this procedure, uncertainties coming from the modelling of jets from heavy-flavour hadrons are considerably lower than in previous calibrations of the mistagging scale factors, where they were dominant. The scale factors obtained in this calibration are consistent with unity within uncertainties.


Introduction
Many analyses in ATLAS [1], such as measurements or searches involving top quarks or Higgs bosons, rely on the identification of jets containing b-hadrons (b-jets) with high tagging efficiency and low mistagging efficiency for jets containing c-hadrons (c-jets) or containing neither b- nor c-hadrons (light-flavour jets). The relatively long lifetime and high mass of b-hadrons, together with the large track multiplicity of their decay products, are exploited by b-tagging algorithms to identify b-jets.
The b-tagging algorithms are trained using Monte Carlo (MC) simulated events and therefore need to be calibrated in order to correct for efficiency differences between data and simulation that may arise from an imperfect description of the data, e.g. in the parton shower and fragmentation modelling or in the detector and response simulation. The efficiency of identifying a b-jet (ε_b) and the mistagging efficiencies (ε_c and ε_light), which are the probabilities that c-jets and light-flavour jets are wrongly identified by the algorithms as b-jets, are measured in data and compared with the predictions of the simulation. These tagging and mistagging efficiencies, and the corresponding scale factors (SFs), are defined in Eq. (1). The calibration SFs correct the efficiencies and mistagging efficiencies in simulation to the ones in data and are applied to all physics analyses in ATLAS that use b-tagging. The b-tagging efficiencies and the calibration SFs depend on the jet kinematics. The b-jet efficiency (ε_b) is calibrated using the method described in Ref. [2], where the SFs are extracted from a sample of events containing top-quark pairs decaying into a final state with two charged leptons and two b-jets. The c-jet mistagging efficiency (ε_c) is calibrated via the method described in Ref. [3], where the SFs are extracted from events containing top-quark pairs decaying into a final state with exactly one charged lepton and several jets. The events are reconstructed using a kinematic likelihood technique and include a hadronically decaying W boson, whose decay products are rich in c-jets.
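The body of the equation labelled (1) did not survive extraction; a plausible reconstruction, consistent with the surrounding text (N_f denoting the number of jets of flavour f before and after the b-tagging requirement), is:

```latex
\varepsilon_f = \frac{N_f^{\text{tagged}}}{N_f^{\text{total}}},
\qquad
\mathrm{SF}_f = \frac{\varepsilon_f^{\text{data}}}{\varepsilon_f^{\text{MC}}},
\qquad f \in \{b,\, c,\, \text{light}\} .
\tag{1}
```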
This paper describes the measurement of the light-flavour jet mistagging efficiency (ε_light) of the DL1r b-tagger, which is widely used in ATLAS Run 2 physics analyses and is discussed in Section 5. The SF_light values are extracted from particle-flow jets [4] using 139 fb⁻¹ of proton–proton (pp) collision data collected by the ATLAS detector during Run 2 of the LHC.
The mistagging efficiency ε_light is difficult to calibrate because, after applying a b-tagging requirement, the obtained sample of jets is strongly dominated by b-jets, and the fraction of light-flavour jets passing a selection on the b-tagging score is too low to estimate ε_light in data. In order to extract an unbiased and precise SF_light, a sample enriched in mistagged light-flavour jets is required.
In this paper, the Negative Tag method [5,6], which relies on a modified tagger with reduced ε_b and ε_c but similar ε_light with respect to the nominal tagger, is used. The method, described in detail in Section 6, has already been used to calibrate ε_light by using dijet events in 2015–2016 data [7] from the early part of Run 2.
In this work, jets produced in association with Z bosons (Z+jets) are used instead, allowing the use of unprescaled lepton triggers instead of the prescaled single-jet triggers which were used in the previous calibration. The calibration precision is improved by extracting SF_light in a fit that simultaneously corrects ε_b using data. Previously, ε_b was estimated from simulation, resulting in large modelling uncertainties which impacted the precision of the result. An alternative approach [7] to the Negative Tag method exists in which ε_light in data is estimated from simulation by applying data-driven corrections to the input quantities of the low-level b-tagging algorithms. This method is used to assess the extrapolation uncertainty between the modified and nominal tagger, as described in Section 7.
The paper is structured as follows. Section 2 describes the ATLAS detector. Section 3 presents the dataset and simulations used in this calibration. The reconstruction of jets, electrons and muons is summarised in Section 4, while the b-tagging algorithms are described in detail in Section 5. The calibration method is detailed in Section 6. Systematic uncertainties and results are presented in Sections 7 and 8, respectively, and conclusions are given in Section 9.

ATLAS detector
The ATLAS experiment [1] at the LHC is a multipurpose particle detector with a forward–backward symmetric cylindrical geometry and a near 4π coverage in solid angle.¹ It consists of an inner tracking detector (ID) surrounded by a thin superconducting solenoid providing a 2 T axial magnetic field, electromagnetic and hadron calorimeters, and a muon spectrometer with a toroidal magnet system.
The inner tracking detector, which provides full coverage of the pseudorapidity range |η| < 2.5, consists of silicon pixel, silicon microstrip, and transition radiation tracking detectors. The high-granularity silicon pixel detector covers the vertex region and typically provides four measurements per track, the first measurement normally being in the insertable B-layer (IBL) installed before Run 2, primarily to enhance the b-tagging performance [8,9]. Sampling calorimeters, made of lead and liquid argon (LAr), provide electromagnetic (EM) energy measurements with high granularity in the pseudorapidity region |η| < 3.2. A steel/scintillator-tile hadron calorimeter covers the central pseudorapidity range (|η| < 1.7) and a copper/LAr hadron calorimeter covers the range 1.5 < |η| < 3.2. The forward region is instrumented with LAr calorimeters in the range 3.1 < |η| < 4.9, measuring both electromagnetic and hadronic energies in copper/LAr and tungsten/LAr modules. The muon spectrometer surrounds the calorimeters and is based on three large superconducting air-core toroidal magnets with eight coils each. The field integral of the toroids ranges between 2.0 and 6.0 T m across most of the detector. The muon spectrometer includes a system of precision tracking chambers and fast detectors for triggering.
A two-level trigger system is used to select events. The first-level trigger is implemented in hardware and uses a subset of the detector information to accept events at a rate below 100 kHz. This is followed by a software-based high-level trigger that further reduces the accepted event rate.

¹ ATLAS uses a right-handed coordinate system with its origin at the nominal interaction point in the centre of the detector and the z-axis along the beam pipe. The x-axis points from the interaction point to the centre of the LHC ring, and the y-axis points upwards. Cylindrical coordinates (r, φ) are used in the transverse plane, φ being the azimuthal angle around the z-axis. The pseudorapidity is defined in terms of the polar angle θ as η = −ln tan(θ/2). The angular distance is measured in units of ΔR ≡ √((Δη)² + (Δφ)²).

[…] the parton shower, hadronisation, and underlying event, using the A14 set of tuned parameters [31]. This simulation was used to model top-quark pair production with subsequent decays into final states with two charged leptons and into final states with one charged lepton. The latter is used to evaluate the difference between the SFs obtained from the modified (DL1rFlip) and nominal (DL1r) b-tagging algorithms.
The decays of b- and c-hadrons were handled by EvtGen 1.6.0 [32] in all simulations, except for those generated using Sherpa, for which the default configuration recommended by the Sherpa authors was used.

Object and event selection
Tracks of charged particles are reconstructed from hits in the ID [33]. Events are required to contain at least one vertex with two or more associated tracks, each with pT > 500 MeV. Among all vertices, the vertex with the highest Σ pT² of the associated tracks is taken as the primary vertex (PV) [34]. The transverse track impact parameter (IP), d0, is defined as the distance of closest approach of the track trajectory to the PV in the transverse plane. The longitudinal track IP, z0, is defined as the distance in the z-direction between the PV and the track trajectory at the point of closest approach in the transverse plane.
Jets containing b- and c-hadrons are characterised by in-flight decays, due to the relatively long lifetime of the heavy-flavour hadrons, which give rise to secondary vertices (SV) with associated tracks. To exploit the presence of in-flight decays in the jet direction, the signed IP is defined for each track within a jet. As shown in Figure 1, a track has a positive IP if the angle between the jet direction and the line joining the PV to the track's point of closest approach is less than π/2, and negative otherwise. The Single Secondary Vertex Finder (SSVF) algorithm [35] is used for the reconstruction of the SV.
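The sign convention for the track IP can be sketched as follows (a minimal sketch with hypothetical inputs; the actual ATLAS implementation works on reconstructed track and jet objects):

```python
def signed_ip(ip_magnitude, jet_dir, pca_dir):
    """Sign the transverse impact parameter of a track relative to a jet.

    ip_magnitude: |d0| of the track (mm).
    jet_dir, pca_dir: 2D vectors in the transverse plane for the jet axis
    and for the line from the PV to the track's point of closest approach.
    The IP is positive if the angle between them is less than pi/2.
    """
    dot = jet_dir[0] * pca_dir[0] + jet_dir[1] * pca_dir[1]
    sign = 1.0 if dot > 0.0 else -1.0  # dot > 0  <=>  angle < pi/2
    return sign * ip_magnitude

# A track whose PCA direction is roughly aligned with the jet gets a positive IP
print(signed_ip(0.05, (1.0, 0.0), (0.9, 0.1)))   # 0.05
print(signed_ip(0.05, (1.0, 0.0), (-0.9, 0.1)))  # -0.05
```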
Electron candidates are reconstructed from energy clusters in the EM calorimeter matched to an ID track [36]. Candidates must satisfy pT > 18 GeV and |η| < 2.47, excluding the barrel–endcap transition region 1.37 < |η| < 1.52, and their associated tracks must fulfil |d0|/σ_d0 < 5 and |z0| sin θ < 0.5 mm, where σ_d0 is the uncertainty in d0. Electrons are identified using a likelihood-based discriminant that combines information about shower shapes in the EM calorimeter, track properties and the quality of the track-to-cluster matching [36]. Electrons must fulfil a 'Tight' identification selection. The gradient isolation criteria are applied to reject non-prompt electrons, using energy deposits in the ID and calorimeter within a cone around the electron. Their efficiency is 90% (99%) for electrons from Z → ee decays at pT = 25 (60) GeV. Efficiency scale factors measured in data [36] are used to correct for differences in reconstruction, identification, isolation and trigger selection efficiencies between data and simulation.

Muon candidates are reconstructed by combining tracks in the ID with tracks in the muon spectrometer [37]. Only muons with |η| < 2.5 and pT > 18 GeV are used. They have to fulfil the 'Tight' identification selection criteria [38] and satisfy |d0|/σ_d0 < 3 and |z0| sin θ < 0.5 mm. The isolation requirement reduces the contamination from non-prompt muons by placing an upper bound on the amount of energy measured in the tracking detectors and the calorimeter (combined using the particle-flow algorithm [4]) within a cone of variable size (for pT < 50 GeV) or fixed size (for pT > 50 GeV) around the muon. Efficiency scale factors are used to correct for differences in muon reconstruction, identification, vertex association, isolation and trigger efficiencies between simulation and data [38].

Jets are reconstructed from particle-flow objects combining information from the tracker and calorimeter [4], using the anti-kt algorithm [39,40] with a radius parameter of R = 0.4. The jet energy is calibrated to the particle scale by using a sequence of corrections, including simulation-based corrections and in situ calibrations [41]. The jet-vertex tagging technique (JVT) [42], which uses a multivariate likelihood approach, is applied to jets with |η| < 2.4 and pT < 60 GeV to suppress jets from pile-up activity. The 'Tight' selection criterion, corresponding to a JVT score > 0.5, is used. Scale factors are applied to match the JVT MC efficiencies to those in data. All selected jets must have pT > 20 GeV and |η| < 2.5.
The b-tagging algorithms use tracks matched to jets as input. This matching uses the angular separation between the track momenta, defined at the point of closest approach to the PV, and the jet axis, ΔR(track, jet axis). The selection requirement on ΔR(track, jet axis) varies as a function of the jet pT because the b-hadron decay products are more collimated at larger b-hadron pT [2].
The jet-flavour labelling in simulation is based on an angular matching of reconstructed jets to generator-level b-hadrons, c-hadrons and τ-leptons with pT > 5 GeV. If a b-hadron is found within a cone of size ΔR = 0.3 around the jet axis, the jet is labelled as a b-jet. If no matching to any b-hadron is possible, the matching procedure is repeated sequentially for c-hadrons and τ-leptons, and the matched jets are called c-jets and τ-jets, respectively. A jet is labelled as a light-flavour jet by default if no matching to any of these particles was successful.
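The sequential matching cascade can be sketched in a few lines (an illustrative sketch; `label_jet` and its inputs are hypothetical, not ATLAS software):

```python
import math

def label_jet(jet_eta, jet_phi, particles, dr_max=0.3):
    """Sketch of the sequential flavour-labelling cascade: b-hadrons first,
    then c-hadrons, then tau-leptons; 'light' if nothing matches.

    particles: list of (flavour, eta, phi) for generator-level particles
    with pT > 5 GeV; flavour is one of 'b', 'c', 'tau'.
    """
    def delta_r(eta1, phi1, eta2, phi2):
        dphi = math.remainder(phi1 - phi2, 2.0 * math.pi)  # wrap to [-pi, pi]
        return math.hypot(eta1 - eta2, dphi)

    for flavour in ("b", "c", "tau"):  # priority order matters
        for f, eta, phi in particles:
            if f == flavour and delta_r(jet_eta, jet_phi, eta, phi) < dr_max:
                return flavour
    return "light"
```

Because b-hadrons are tried first, a jet close to both a b- and a c-hadron is labelled a b-jet, matching the priority described in the text.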
A series of requirements on the angular separation ΔR between muons, electrons and jets is applied to remove overlaps between objects. If an electron candidate shares an ID track with a muon candidate, the electron candidate is rejected. Jets within a cone of ΔR = 0.2 around a lepton are rejected, unless the lepton is a muon and the jet has more than three associated tracks, in which case the muon is rejected. Finally, lepton candidates that are found within 0.2 < ΔR < 0.4 of any remaining jet are discarded.
A jet sample enriched in light-flavour jets is needed for the calibration. A sample constructed using the leading jet in pT in a Z+jets selection is expected to contain a 7%–8% fraction of b-jets and a 4%–6% fraction of c-jets, depending on the jet pT. The mistagging efficiency calibration is therefore performed on the leading jet in a sample of Z+jets candidate events. The distinct signature of a Z boson decaying into either two electrons or two muons allows a clean sample of Z+jets events to be selected. Events are selected for further analysis using single-lepton triggers, where one of the leptons must be matched to an object that triggered the recording of the event. The event must contain exactly two prompt leptons with opposite charges and the same flavour (i.e. either exactly two electrons or exactly two muons). The leading lepton must have pT > 28 GeV and the invariant mass of the dilepton system, m_ll, must satisfy 81 < m_ll < 101 GeV. Only events with a reconstructed Z boson with pT > 50 GeV are considered because the overall modelling is better in this range [43]. The event must also contain at least one jet with pT > 20 GeV and |η| < 2.5.
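The selection above can be summarised in a short sketch (field names are hypothetical; m_ll and the Z-boson pT are assumed to be precomputed):

```python
def select_z_jets_event(event):
    """Sketch of the Z+jets event selection; returns True if the event is kept."""
    leptons = event["leptons"]  # [{'pt' (GeV), 'charge', 'flavour'}, ...]
    if len(leptons) != 2:
        return False
    same_flavour = leptons[0]["flavour"] == leptons[1]["flavour"]
    opposite_charge = leptons[0]["charge"] * leptons[1]["charge"] < 0
    leading_pt = max(l["pt"] for l in leptons)
    if not (same_flavour and opposite_charge and leading_pt > 28.0):
        return False
    if not 81.0 < event["m_ll"] < 101.0:   # dilepton mass window (GeV)
        return False
    if event["z_pt"] <= 50.0:              # better-modelled region (GeV)
        return False
    # at least one jet with pT > 20 GeV and |eta| < 2.5
    return any(j["pt"] > 20.0 and abs(j["eta"]) < 2.5 for j in event["jets"])
```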
The event yields obtained after applying all selection requirements are listed in Table 2, while comparisons between the predictions of the simulation and the data are shown in Figure 2 for selected variables. The predictions of the MC simulation are normalised to data for these comparisons. The shapes of the distributions predicted by the simulation are consistent with those observed, within statistical uncertainties, for most of the considered kinematic range. The discrepancy observed in the SV mass (m_SV) distribution is expected because it is affected by a known mismodelling of the jet-flavour fractions [44,45]; this is mitigated by the fit described in Section 6. The m_SV observable is calculated using the SSVF algorithm. Negative values of m_SV indicate that no SV was found in the jet. The mismodelling in jet pT does not affect the calibration.

The DL1r b-tagging algorithm
Properties of b-hadrons, such as their relatively high mass, long lifetime and the large multiplicity of reconstructed tracks from their decay products, are exploited to distinguish b-jets from c-jets and light-flavour jets. Individual low-level taggers [46,47] are designed to target the specific properties of b-jets. The IP2D, IP3D and RNNIP algorithms exploit the individual properties of the tracks from charged particles matched to the jet, especially the track d0 and z0. RNNIP is based on a recurrent neural network and exploits the correlations between the IPs of different selected tracks in the jet. The SV1 algorithm exploits the output of the SSVF algorithm, and JetFitter [48] reconstructs secondary and tertiary vertices, following the expected topology of b-hadron decays. The DL1r algorithm is a deep neural network that takes the output of the low-level taggers and the jet kinematics as input and builds a single discriminant [49,50].
ATLAS analyses use selection requirements defining a lower bound on the DL1r discriminant to select b-jets with a certain efficiency. Four of these so-called single-cut operating points (OPs) are defined, corresponding to b-jet selection efficiencies of 85%, 77%, 70% and 60%. The OPs are evaluated in a sample of b-jets from simulated tt̄ events. These selection requirements divide the DL1r score into five intervals to form the pseudo-continuous OPs, where the lower edge of the lowest interval corresponds to 100% DL1r b-tagging efficiency and the upper edge of the highest interval corresponds to 0% efficiency.
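Conceptually, the five pseudo-continuous intervals are just the regions between the four single-cut thresholds; a sketch (the threshold values below are illustrative placeholders, not the calibrated ATLAS cut values):

```python
import bisect

# Illustrative DL1r score thresholds for the 85%, 77%, 70% and 60% OPs,
# in ascending score order (placeholder numbers, not the real cut values)
OP_THRESHOLDS = [0.7, 2.2, 3.2, 4.6]

def pseudo_continuous_interval(score):
    """Map a DL1r score to its pseudo-continuous interval index:
    0 -> the 100%-85% interval (lowest scores), ..., 4 -> the 60%-0% interval."""
    return bisect.bisect_right(OP_THRESHOLDS, score)
```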
The single-cut and pseudo-continuous OPs are calibrated in order to correct for efficiency differences between data and simulation. The calibration SF, defined in Eq. (1), is measured relative to a reference efficiency ε_MC. The performance of the b-tagging algorithm in simulation is affected by the hadronisation and fragmentation model used in the parton shower simulation [43]. To account for these differences, simulation-to-simulation SFs are applied to those simulated samples that have a fragmentation model different from the default. The calculation of these simulation-to-simulation correction factors is described in Ref. [51]. The top-quark pair production sample produced with Powheg Box v2 + Pythia 8.230 is used to define the reference MC efficiencies in the b- and c-jet calibrations. As no equivalent Z+jets simulation produced with the Powheg Box v2 generator is available, the Z+jets MadGraph + Pythia 8 simulation is used for the definition of the reference efficiency in the calculation of SF_light for light-flavour jets. Since the Z+jets MadGraph + Pythia 8 simulation uses the same parton shower model and set of tuned parameters as the top-quark pair production simulation, the light-flavour jet results and the b- and c-jet calibrations are obtained relative to similar reference models.

The Negative Tag method
The DL1r algorithm rejects 97.52% and 99.96% of light-flavour jets at the 85% and 60% OPs, respectively, according to simulated tt̄ events. Therefore, the fraction of light-flavour jets passing the threshold is too low to estimate ε_light in data. The Negative Tag method [5,6] calibrates ε_light in data using a modified version of the DL1r algorithm (called DL1rFlip) that achieves lower ε_b and ε_c without changing ε_light significantly.
Tracks matched to b-jets have relatively large and positively signed IPs due to the long lifetime of b-hadrons and the presence of displaced decay vertices. In contrast, tracks matched to light-flavour jets typically have IP values consistent with zero within the IP resolution, such that a more symmetric IP distribution is expected. The expected IP distributions of the tracks associated with b-jets, c-jets or light-flavour jets are shown in Figure 3. The Negative Tag method assumes that the probability for a light-flavour jet to be mistagged remains almost the same when the IP signs of all tracks and displaced vertices are inverted. This is based on the assumption that light-flavour jets are misidentified as b-jets mainly because of resolution effects in the track reconstruction, which result in tracks and vertices with positive IPs inside the jet. Given the symmetric IP distributions, the fractions of tracks and vertices with positive IPs remain stable after inverting the IP signs of all tracks and vertices. The presence of the positive tail in the IP distribution challenges this assumption, and its impact is taken into account by a dedicated uncertainty, called the DL1rFlip-to-DL1r extrapolation uncertainty in the following. The DL1rFlip algorithm inverts the signs of the track IPs and the SV decay length, while using the same algorithm training and OP-threshold definitions as the DL1r algorithm. The ε_light values obtained by the two algorithms are approximately the same, while the DL1rFlip algorithm selects a smaller fraction of b-jets than the nominal version. The discriminants of the DL1r and DL1rFlip algorithms are compared between data and simulation in Figure 4. These comparisons show that the heavy-flavour fractions are much lower for high DL1rFlip values than for high DL1r values.
At the 85% (60%) single-cut OP, the DL1r algorithm selects a sample of jets in which 25% (1%) are light-flavour jets, while the DL1rFlip algorithm selects a sample in which a higher fraction, 38% (6%), are light-flavour jets at the same OP.
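The input transformation that defines the flip tagger can be sketched as follows (hypothetical field names; the real DL1rFlip operates on the track and vertex inputs of the low-level taggers before evaluating the unchanged network):

```python
def flip_inputs(tracks, sv):
    """Invert the signed track IPs and the SV decay length, leaving
    everything else (and the trained tagger network) untouched."""
    flipped_tracks = [
        {**t, "sip_d0": -t["sip_d0"], "sip_z0": -t["sip_z0"]} for t in tracks
    ]
    flipped_sv = {**sv, "decay_length": -sv["decay_length"]}
    return flipped_tracks, flipped_sv
```

Because the network and OP thresholds are unchanged, only the sign flip of the lifetime-sensitive inputs distinguishes DL1rFlip from DL1r.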
The calibration is performed independently in jet pT intervals in order to account for the pT dependence of ε_light. A simultaneous binned fit to the m_SV distribution in each pseudo-continuous interval of the DL1rFlip discriminant is performed in order to determine ε_b and ε_light in data in the DL1rFlip discriminant intervals. The sensitivity of the fit does not allow the SFs of all three jet flavours to be derived simultaneously. Therefore, ε_c is constrained to the MC prediction and SF_c is fixed to unity within an uncertainty of 30%, as suggested by studies of the c-jet mistagging efficiency calibration [3]. For a given interval of jet pT, the expected number of jets in a DL1rFlip discriminant interval i is given by the product of a global normalisation factor μ and the predicted flavour-inclusive event yield N_i,MC, summed over the jet-flavour fractions f_f,i weighted by the probability density functions P_f,i(m_SV) of m_SV for jet flavour f in the i-th DL1rFlip discriminant interval, which are taken from simulation. The P_f,i(m_SV) are defined in such a way as to include an additional bin (m_SV < 0 GeV) representing the number of events where no secondary vertex is found. The m_SV is obtained with tracks with nominal IP signs as input to the SSVF algorithm.
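The fit-model equation itself did not survive extraction; a plausible form, consistent with the quantities named in the text, is:

```latex
\nu_i(m_{\mathrm{SV}})
  = \mu \, N_{i,\mathrm{MC}}
    \sum_{f \,\in\, \{b,\,c,\,\mathrm{light}\}}
    f_{f,i} \, \mathcal{P}_{f,i}(m_{\mathrm{SV}}) ,
```

where ν_i is the expected jet yield in discriminant interval i and the flavour fractions f_{f,i} sum to one in each interval.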

The μ, SF_b,i and SF_light,i parameters are allowed to float in the fit, while N_i,MC and f_f,i,MC are fixed to the predictions from simulated events and SF_c is set to 1.0 ± 0.3. The constraints Σ_{i=1}^{5} ε_b,i,MC × SF_b,i = 1, where the contributions to the first bin (corresponding to the 100%−85% OP interval) are allowed to vary such that unitarity is preserved, and f_light = 1 − f_b − f_c are applied, reducing the number of free parameters to 11. Figure 5 shows the post-fit m_SV and DL1rFlip discriminant distributions for the 50–100 GeV jet pT interval.

Systematic uncertainties
The measurement of SF_light is affected by four types of uncertainties: those due to experimental effects, the modelling of the Z+jets process, the modelling of the background processes, and the limited number of events in data and simulation. For each source of uncertainty, one parameter of the fit model is varied at a time, and the effect of this variation on SF_light is evaluated. This approach is chosen to prevent the fit from using the data to constrain the impact of individual systematic uncertainties. The probability density functions in m_SV are derived from the Sherpa 2.2.1 Z+jets simulation sample. The probability density function for each flavour component is separately normalised to unity, so only the shape of the m_SV distributions is estimated using the Sherpa 2.2.1 Z+jets simulation. The simulated efficiencies from the MadGraph Z+jets simulation listed in Table 1 are used in the likelihood definition to predict the expected number of events for each flavour category. Given that the flavour fractions and most SFs are extracted from the data, the systematic uncertainty calculation considers mainly effects on the template shapes. An additional uncertainty accounts for differences between the SF_light values from the DL1rFlip and DL1r algorithms, to ensure the applicability of this calibration to the DL1r algorithm.
Uncertainties in the modelling of the detector response have a negligible impact on the calibration results. The impact of the jet energy scale (JES) uncertainties was estimated by globally shifting the JES by 5%, in accord with the most conservative estimate of the JES uncertainty in pT derived in Ref. [41]. This conservative assumption about the magnitude of the JES uncertainties leads to an estimated impact of less than 1% on the calibration results, owing to the weak pT dependence of the scale factors. The impact of the d0 and z0 IP resolution modelling uncertainties on the m_SV templates was estimated by applying data-driven corrections to the simulated d0 and z0 in a sample of simulated top-quark pairs, with a method similar to the one in Ref. [7]. The impact of these corrections is transferred to the fit templates and is symmetrised around the central value. This approach was chosen in order to mitigate the effect of statistical fluctuations in the available Z+jets samples. Other experimental uncertainties were found to be negligible because they are not correlated with the discriminant used in the fit.
Simulations of Z+jets with Sherpa 2.2.1 are used as the nominal model to derive the fit templates for b-jets, c-jets and light-flavour jets in the m_SV distribution. In order to assess the impact of the choice of parton shower model, shower matching scheme and order of the perturbative QCD calculations on the template shapes, SF_light values are derived using an alternative model, provided by the Z+jets MadGraph + Pythia 8 simulation. The difference between the obtained SF_light values is taken as the estimate of the MC modelling uncertainty. The effect of QCD scale uncertainties is estimated by independently doubling or halving the renormalisation and factorisation scales. The impact of choosing a particular PDF set is estimated by propagating its uncertainties. However, the impacts of the QCD scale and PDF set uncertainties are found to be covered by the statistical uncertainties of the MC simulation, and these uncertainties are therefore not included.
The effect of the 30% uncertainty in SF_c, which was fixed to a value of 1.0 in the fit, is estimated by repeating the fit with SF_c set to 1.3 and 0.7 instead. The mean impact of the two variations on SF_light is taken as the impact of the charm calibration uncertainty on the light-flavour jet calibration.
The ε_light value is assumed to be the same for DL1rFlip and DL1r to first order, as described in Section 6. However, mismodelling of the IP resolution, the fake-track rate or the parton shower can have different effects on the tagging performance and SF_light values of the DL1r and DL1rFlip algorithms. A DL1rFlip-to-DL1r extrapolation uncertainty is added to account for residual differences in SF_light between the two algorithms. SF_light cannot be measured in data for the DL1r algorithm. Therefore, a method similar to the one in Ref. [7] is used to derive SF_light for DL1rFlip and DL1r from MC simulation in order to compare the two calibrations. The DL1rFlip-to-DL1r extrapolation uncertainty consists of two components. One component estimates the impact of the modelling of tracking variables on the DL1rFlip and DL1r tagging performance, evaluated in a bottom-up approach correcting the simulation, and the second component estimates the impact of the shower and hadronisation model on the difference in tagging performance between DL1rFlip and DL1r. The impact of the modelling of tracking variables is evaluated by applying data-driven corrections to underlying tracking variables affecting the b-tagging performance in a sample of simulated tt̄ events generated with Powheg Box v2 + Pythia 8.230. The impact on ε_light of correcting these observables is evaluated with respect to the uncorrected simulation. Only corrections which have been shown to have a significant impact are considered [7]. These include the IP resolution and fake-track rate modelling [53]. The total impact on the mistagging efficiency is obtained by multiplying the effects of all corrections. The total impact on ε_light in MC simulation is compared between DL1r and DL1rFlip, and the relative difference is assigned as the first component of the DL1rFlip-to-DL1r extrapolation uncertainty. A comparison of these simulation-based calibration factors is shown in Figure 6. The impact of the parton shower modelling on the difference between the SF_light values of DL1r and DL1rFlip is estimated by multiplying the SF_light for DL1r (DL1rFlip), derived using the corrected Powheg Box v2 + Pythia 8.230 simulation, by the simulation-to-simulation SFs, defined as the ratio of the DL1r (DL1rFlip) ε_light for the Sherpa 2.2.1 and MadGraph + Pythia 8 simulations. The simulation-to-simulation SFs have been derived using the Z+jets samples listed in Table 1. This gives an estimate of the DL1rFlip-to-DL1r extrapolation uncertainty for an alternative shower model.
The difference between this estimate and the estimate obtained using the corrected Powheg Box v2 + Pythia 8.230 simulation is used as the component assessing the impact of the shower and hadronisation model on the difference in tagging performance between DL1rFlip and DL1r. The two components are added in quadrature to obtain the DL1rFlip-to-DL1r extrapolation uncertainty. The envelope over pT of the DL1rFlip-to-DL1r extrapolation uncertainty is used for the final uncertainty estimate. The shower modelling makes the largest contribution to the DL1rFlip-to-DL1r extrapolation uncertainty, ranging up to 10%. The largest uncertainty contribution from correcting the MC simulation comes from the track IP resolution modelling corrections, also ranging up to 10%. The total DL1rFlip-to-DL1r extrapolation uncertainty is 10%–12%, depending on the b-tagging OP, and is the dominant systematic uncertainty overall.
Statistical uncertainties of the data and of the m_SV templates from simulated MC events are taken into account. The statistical uncertainties of the MC-based fit templates are implemented following the light-weight Beeston–Barlow method [54]. As their effect is non-negligible, cross-check studies were performed using a 'toy' approach: the fit template bin entries were fluctuated randomly around the nominal value according to a Gaussian probability distribution with a standard deviation equal to the MC statistical uncertainty in the template bins. SF_light was extracted for each of these new fit templates, and the standard deviation of the resulting SF_light distribution was taken as the statistical uncertainty. The statistical uncertainty estimates from the Beeston–Barlow model and the toy approach were found to be consistent.
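The toy cross-check described above amounts to the following procedure (a minimal sketch; `fit_sf` stands in for the full template fit, and all numbers are invented):

```python
import random
import statistics

def toy_mc_stat_uncertainty(template, template_err, fit_sf, n_toys=500, seed=1):
    """Fluctuate each template bin with a Gaussian of width equal to its MC
    statistical uncertainty, re-extract the scale factor for every toy, and
    take the spread of the results as the MC statistical uncertainty."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_toys):
        toy = [max(0.0, rng.gauss(n, e)) for n, e in zip(template, template_err)]
        results.append(fit_sf(toy))
    return statistics.stdev(results)

# Invented numbers and a trivial stand-in 'fit': SF = data yield / template yield
data_yield = 1000.0
mc_stat_unc = toy_mc_stat_uncertainty(
    template=[400.0, 350.0, 250.0],
    template_err=[20.0, 19.0, 16.0],
    fit_sf=lambda t: data_yield / sum(t),
)
```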

Results
The calibration scale factors SF_light of the DL1r algorithm are presented in this section. The 85%, 77% and 70% OPs were successfully calibrated in data using the Negative Tag method. However, it is not feasible to calibrate the 60% OP in data because of insufficient statistics and the relatively large contamination by heavy-flavour jets.
Given these limitations, and since the measured SFs for the neighbouring 77%-70% and 70%-60% pseudo-continuous OPs do not show any significant deviation from unity within their respective measurement uncertainties, the same is assumed to hold for the 60% OP. The SF for this OP is therefore set to 1. Its dominant uncertainty, the DL1rFlip-to-DL1r extrapolation uncertainty, is derived specifically for the 60% OP and found to be slightly larger than for the 70% OP. The other (sub-leading) uncertainties are assumed to be identical to those of the 70% OP, since their evaluation suffers from large uncertainties. Extracting very precise SF_light values for the 60%-0% pseudo-continuous OP interval is in any case not critical for most physics analyses, as only around 0.5 per mille of all light-flavour jets are b-tagged at this OP [49].
The results for the mistagging efficiency calibration of light-flavour jets from the DL1r tagger, using particle-flow jets in Run 2 data, are shown in Figure 7 for the pseudo-continuous OPs of the DL1r algorithm. The calibration of the DL1r single-cut OPs, derived from the calibration results of the pseudo-continuous OPs, is shown in Figure 8. Furthermore, Figure 9 shows a breakdown of the uncertainties in SF_light for each jet p_T interval and single-cut OP.
The measured SF_light values are consistent with unity within uncertainties, except for the 85%-77% pseudo-continuous OP and the 85% single-cut OP, where the data prefer SF_light values that differ from unity by about one standard deviation. Overall, the results indicate that the MC simulation predicts ε_light to be similar to that observed in data.
Mistags of light-flavour jets are mostly caused by the presence of fake tracks and by the limited impact parameter resolution [7]. The fake-track rate is underestimated by the MC simulation, while the impact parameter resolution is overestimated [7, 53]. An increase in the fake-track rate and a decrease in impact parameter resolution both give rise to larger mistagging efficiencies [7, 55]. Observed SFs slightly larger than unity are therefore expected.
Tables 3-6 present a detailed breakdown of the uncertainties in the results. Overall, the largest systematic uncertainty is the DL1rFlip-to-DL1r extrapolation uncertainty, with values of 10%-12%, depending on the b-tagging OP. The impact of the charm calibration uncertainty amounts to 5%-10%, depending on the jet p_T and the b-tagging OP, and is therefore one of the larger uncertainties. The MC modelling uncertainty is a few percent in most cases, but reaches its maximum of 11% for the 70% single-cut OP in the 20 GeV ≤ p_T ≤ 50 GeV interval.
The effect of the d_0 and z_0 impact parameter resolution modelling uncertainty is generally of the order of a few percent and therefore mostly subdominant. However, it can range up to 14% for low jet p_T and tight OPs. The data statistical uncertainties give subdominant contributions except at high jet p_T and for tight OPs, with contributions ranging from 0.5% to 9.5%. The MC statistical uncertainty is one of the dominant contributions, ranging from 0.4% to 13% depending on the OP and the jet p_T. The total uncertainty in SF_light is between 11% and 23%. In general, higher precision is obtained for the looser OPs, and the size of the dominant uncertainties does not depend significantly on the jet p_T.
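Assuming the individual uncertainty components are independent and combined in quadrature (a common convention; the exact combination used in the fit may differ), a total-uncertainty sketch with invented component sizes loosely inspired by the ranges quoted above:

```python
import math

# Invented component sizes for one illustrative (OP, pT) bin
components = {
    "DL1rFlip-to-DL1r extrapolation": 0.11,
    "charm calibration": 0.07,
    "MC modelling": 0.03,
    "IP resolution modelling": 0.03,
    "data statistics": 0.02,
    "MC statistics": 0.05,
}

# Quadrature sum, assuming independent components
total = math.sqrt(sum(v**2 for v in components.values()))
```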
In the previous calibration with the Negative Tag method, using dijet events in 2015-2016 data, the total uncertainty in SF_light ranged from 14% to 76% for jets with p_T between 20 and 300 GeV. A major improvement in the present calibration is that the b-jet efficiency and the heavy-flavour fractions are determined using data. Previously, they were estimated from simulation, resulting in large modelling uncertainties, which were a limiting factor in the precision of the result.

Conclusions
The light-flavour jet mistagging efficiency ε_light of the DL1r b-tagging algorithm has been measured with a 139 fb−1 sample of √s = 13 TeV proton–proton collision events recorded during 2015-2018 by the ATLAS detector at the LHC. The measurement is based on an improved method applied to a sample of Z+jets events: the Negative Tag method, which uses a modified version of the b-tagging algorithm designed to facilitate the measurement of the light-flavour jet mistagging efficiency. Data-to-simulation scale factors for correcting ε_light in simulation are measured in four jet transverse momentum intervals, ranging from 20 to 300 GeV, for four separate quantiles of the b-tagging discriminant. The scale factors typically exceed unity by around 10%-20%, with total uncertainties ranging from 11% to 23%, and do not exhibit any strong dependence on the jet transverse momentum. These calibration uncertainties are considerably lower than the previous 14% to 76% uncertainties from the Negative Tag method using 2015-2016 data.