Measurement of the $c$-jet mistagging efficiency in $t\bar{t}$ events using $pp$ collision data at $\sqrt{s}=13$ TeV collected with the ATLAS detector

A technique is presented to measure the efficiency with which $c$-jets are mistagged as $b$-jets (mistagging efficiency) using $t\bar{t}$ events, where one of the $W$ bosons decays into an electron or muon and a neutrino and the other decays into a quark-antiquark pair. The measurement utilises the relatively large and known $W\to cs$ branching ratio, which allows a measurement to be made in an inclusive $c$-jet sample. The data sample used was collected by the ATLAS detector at $\sqrt{s} = 13$ TeV and corresponds to an integrated luminosity of 139 fb$^{-1}$. Events are reconstructed using a kinematic likelihood technique which selects the mapping between jets and $t\bar{t}$ decay products that yields the highest likelihood value. The distribution of the $b$-tagging discriminant for jets from the hadronic $W$ decays in data is compared with that in simulation to extract the mistagging efficiency as a function of jet transverse momentum. The total uncertainties are in the range 3%-17%. The measurements generally agree with those in simulation but there are some differences in the region corresponding to the most stringent $b$-jet tagging requirement.


ATLAS detector
The ATLAS detector [1] is a multipurpose particle physics detector with a forward-backward symmetric cylindrical geometry and nearly $4\pi$ coverage in solid angle.¹ The inner tracking detector consists of silicon pixel and microstrip detectors covering the pseudorapidity region $|\eta| < 2.5$, surrounded by a transition radiation tracker which enhances electron identification in the region $|\eta| < 2.0$. Between Run 1 and Run 2, a new inner pixel layer, the insertable B-layer [17,18], was added at a mean sensor radius of 3.3 cm. The inner detector is surrounded by a thin superconducting solenoid providing an axial 2 T magnetic field, and by a fine-granularity lead/liquid-argon (LAr) electromagnetic calorimeter covering $|\eta| < 3.2$. A steel/scintillator-tile calorimeter provides hadronic coverage in the central pseudorapidity range ($|\eta| < 1.7$). The endcap and forward regions ($1.5 < |\eta| < 4.9$) of the hadronic calorimeter are made of LAr active layers with either copper or tungsten as the absorber material. An extensive muon spectrometer (MS) with an air-core toroidal magnet system surrounds the calorimeters. Three layers of high-precision tracking chambers provide coverage in the range $|\eta| < 2.7$, while dedicated fast chambers allow triggering in the region $|\eta| < 2.4$. The ATLAS trigger system consists of a hardware-based level-1 trigger followed by a software-based high-level trigger [19]. An extensive software suite [20] is used in the reconstruction and analysis of real and simulated data, in detector operations, and in the trigger and data acquisition systems of the experiment.

Data and simulated event samples
The data analysed in this paper correspond to 139 fb$^{-1}$ [21,22] of collision data collected by the ATLAS detector between 2015 and 2018 with a centre-of-mass energy of 13 TeV and a 25 ns proton bunch crossing interval. The data sample was collected using a set of single-electron [23] and single-muon [24] triggers with $p_\mathrm{T}$ thresholds in the range of 20-26 GeV depending on the lepton flavour and data-taking period. All detector subsystems were required to be operational during data taking and to fulfil data quality requirements. Events with noise bursts or coherent noise in the calorimeters are removed. The presence of additional interactions in the same bunch crossing, referred to as pile-up, is characterised by the average number of such interactions, $\langle\mu\rangle$, which was 34 for the whole dataset.
Simulated event samples are used to model SM processes and to estimate the expected signal yields. All samples were produced using the ATLAS simulation infrastructure [25] and Geant4 [26]. A subset of samples used a faster simulation based on a parameterisation for the calorimeter response and Geant4 for the other detector systems [25]. The simulated events are reconstructed with the same algorithms as used for data, and contain a realistic modelling of pile-up interactions. The pile-up profiles in the simulation match those of each dataset between 2015 and 2018, and were obtained by overlaying the hard-scatter events with minimum-bias events simulated using the soft QCD processes of Pythia 8 [27] with the NNPDF3.0 set [28] of parton distribution functions (PDFs) [29] and a set of tuned parameters called the A3 tune [30]. For all samples, with the exception of those generated using Sherpa [31], the decays of bottom and charm hadrons were performed by EvtGen [32].

¹ ATLAS uses a right-handed coordinate system with its origin at the nominal interaction point in the centre of the detector. The positive $x$-axis is defined by the direction from the interaction point to the centre of the LHC ring, with the positive $y$-axis pointing upwards, while the beam direction defines the $z$-axis. Cylindrical coordinates $(r, \phi)$ are used in the transverse plane, $\phi$ being the azimuthal angle around the $z$-axis. The component of momentum in the transverse plane is denoted by $p_\mathrm{T}$. The pseudorapidity is defined in terms of the polar angle $\theta$ by $\eta = -\ln \tan(\theta/2)$. Rapidity is defined as $y = 0.5 \ln[(E + p_z)/(E - p_z)]$, where $E$ denotes the energy and $p_z$ is the component of the momentum along the beam direction. The separation of two objects in $\eta$-$\phi$ space is given by $\Delta R = \sqrt{(\Delta\eta)^2 + (\Delta\phi)^2}$.
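The pseudorapidity and $\Delta R$ definitions above can be sketched in a few lines of code (a minimal illustration; the function names are ours, not part of any ATLAS software):

```python
import math

def pseudorapidity(theta):
    """eta = -ln tan(theta/2), with theta the polar angle in radians."""
    return -math.log(math.tan(theta / 2.0))

def delta_r(eta1, phi1, eta2, phi2):
    """Separation in eta-phi space; delta-phi is wrapped into [-pi, pi]."""
    dphi = (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi
    deta = eta1 - eta2
    return math.sqrt(deta ** 2 + dphi ** 2)

# A track perpendicular to the beam (theta = 90 degrees) has eta = 0;
# large |eta| corresponds to directions close to the beam line.
central_eta = pseudorapidity(math.pi / 2)
```

The azimuthal wrapping matters: two objects at $\phi = 3.0$ and $\phi = -3.0$ are separated by $2\pi - 6 \approx 0.28$, not by 6.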
The events used in this study originate mostly from $t\bar{t}$ production. This process was modelled using the Powheg Box v2 [33-36] generator at next-to-leading order (NLO) with the NNPDF3.0 PDF set [28] and the $h_{\mathrm{damp}}$ parameter² set to $1.5 \times m_{\mathrm{top}}$ [37], where $m_{\mathrm{top}}$ denotes the mass of the top quark. The events were interfaced to Pythia 8.230 to model the parton shower, hadronisation, and underlying event, with parameters set according to the A14 tune [38] and using the NNPDF2.3 set of PDFs [28]. The uncertainty due to initial-state radiation (ISR) is estimated by simultaneous variations of the $h_{\mathrm{damp}}$ parameter and the renormalisation and factorisation scales, and choosing the Var3c up/down variants of the A14 tune. The impact of final-state radiation (FSR) is evaluated by varying the renormalisation scale for emissions from the parton shower by a factor of two up or down. The impact of using a different parton shower and hadronisation model is evaluated by comparing the nominal $t\bar{t}$ sample with another $t\bar{t}$ sample generated by Powheg Box v2 using exactly the same parameters but interfaced to Herwig 7.04 [39,40], using the H7UE tune [40] and the MMHT2014 PDF set [41].
In addition to $t\bar{t}$ production, some minor backgrounds contribute to the final event sample used for the calibration. These backgrounds consist mostly of single-top and diboson production, the production of $t\bar{t}$ in association with a vector boson, and the production of a vector boson in association with jets. Details of the modelling of these samples are given in the following.
Single-top $s$-channel production was modelled using the Powheg Box v2 generator at NLO in QCD in the five-flavour scheme with the NNPDF3.0 PDF set. Single-top $t$-channel production was modelled using the Powheg Box v2 generator at NLO in QCD, using the four-flavour scheme and the corresponding NNPDF3.0 set of PDFs. The associated production of a single top quark and a $W$ boson ($tW$) was modelled using the Powheg Box v2 generator at NLO in QCD, using the five-flavour scheme and the NNPDF3.0 set of PDFs. The diagram removal scheme [42] was used to remove interference and overlap with $t\bar{t}$ production. The events for all single-top production channels were interfaced to Pythia 8.230 using the A14 tune and the NNPDF2.3 set of PDFs.
The production of $W$+jets and $Z$+jets events was simulated with the Sherpa 2.2.1 generator. The NNPDF3.0 set of PDFs [28] was used and the samples were normalised to a next-to-next-to-leading-order (NNLO) prediction [52].
Samples of events with diboson final states ($VV$) were simulated with the Sherpa 2.2.1 or 2.2.2 generator depending on the process, including off-shell effects and Higgs boson contributions where appropriate. Fully leptonic final states and semileptonic final states, where one boson decays leptonically and the other hadronically, were generated using matrix elements at NLO accuracy in QCD for up to one additional parton and at LO accuracy for up to three additional parton emissions. Event samples for the loop-induced processes $gg \to VV$ were generated using LO-accurate matrix elements for up to one additional parton emission for both the fully leptonic and semileptonic final states. The matrix element calculations were matched and merged with the Sherpa parton shower based on Catani-Seymour dipole factorisation [43,47] using the MEPS@NLO prescription. The virtual QCD corrections were provided by the OpenLoops library. The NNPDF3.0 set of PDFs was used, along with the dedicated set of tuned parton-shower parameters developed by the Sherpa authors.

² The $h_{\mathrm{damp}}$ parameter is a resummation damping factor and one of the parameters that controls the matching of Powheg matrix elements to the parton shower and thus effectively regulates the high-$p_\mathrm{T}$ radiation against which the $t\bar{t}$ system recoils.
Production of $t\bar{t}$ in association with a vector boson was modelled using the MadGraph5_aMC@NLO 2.3.3 [53] generator at NLO with the NNPDF3.0 PDF set. The events were interfaced to Pythia 8.210 [27] using the A14 tune and the NNPDF2.3 PDF set.

Object reconstruction
Selected events are required to contain at least one vertex having at least two associated tracks with $p_\mathrm{T} > 500$ MeV, and the primary vertex is chosen to be the vertex reconstructed with the largest $\sum p_\mathrm{T}^2$ of its associated tracks.
Electron candidates are reconstructed by matching inner-detector tracks to clusters of energy deposited in the EM calorimeter. Electrons must have $p_\mathrm{T} > 27$ GeV and $|\eta| < 2.47$. The associated track must have $|d_0|/\sigma_{d_0} < 5$ and $|z_0 \sin\theta| < 0.5$ mm, where $d_0$ ($z_0$) is the transverse (longitudinal) impact parameter relative to the primary vertex and $\sigma_{d_0}$ is the uncertainty in $d_0$. Candidates are identified with a likelihood method and must satisfy the 'medium' identification criteria described in Ref. [54]. The likelihood relies on the shape of the EM shower measured in the calorimeter, the quality of the track reconstruction, and the quality of the match between the track and the cluster. To suppress candidates originating from photon conversions, hadron decays, or jets misidentified as electrons, candidates are required to satisfy the gradient isolation criteria based on tracking and calorimeter measurements [54]. The electron energy and reconstruction efficiency are calibrated using $Z \to e^+e^-$ decays [54].
Muon candidates are reconstructed in the range $|\eta| < 2.5$ by combining tracks in the inner detector with tracks in the MS. All muon candidates must have $p_\mathrm{T} > 27$ GeV, $|d_0|/\sigma_{d_0} < 3$, and $|z_0 \sin\theta| < 0.5$ mm. The 'medium' quality requirements described in Ref. [55] are used and muons from hadron decays are suppressed by imposing a track-based isolation requirement. The muon reconstruction efficiency in the simulation is corrected using comparisons with data [56].
Jets are formed using objects from a particle-flow algorithm, which combines energy deposits in the calorimeter with inner-detector tracks [14]. The PFlow objects are combined into jets in the range $|\eta| < 2.5$ and $p_\mathrm{T} > 20$ GeV using the anti-$k_t$ algorithm [57,58] with a radius parameter of $R = 0.4$. A jet-vertex-tagging technique using a multivariate likelihood [59] is applied to jets with $|\eta| < 2.4$ and $p_\mathrm{T} < 60$ GeV to suppress jets that are not associated with the event's primary vertex. Jets are further calibrated according to in situ measurements of the jet energy scale [15].
The labelling scheme used to define the flavour of a jet in a simulated event is applied by matching reconstructed jets to generator-level $b$- or $c$-hadrons with $p_\mathrm{T} > 5$ GeV within a cone of size $\Delta R = 0.3$ around the jet axis. Jets that contain a $b$-hadron are called $b$-jets. Remaining jets containing a $c$-hadron are called $c$-jets. Jets with a hadronically decaying $\tau$-lepton are called $\tau$-jets, and all remaining jets are called light-flavour jets.
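The labelling priority above (a $b$-hadron wins over a $c$-hadron, which wins over a $\tau$) can be sketched as follows. This is a hypothetical helper, not the ATLAS truth-labelling code, and the `jet`/`hadrons` dictionaries are illustrative stand-ins for generator-level records:

```python
import math

def delta_r(eta1, phi1, eta2, phi2):
    """Separation in eta-phi space with azimuthal wrapping."""
    dphi = (phi1 - phi2 + math.pi) % (2 * math.pi) - math.pi
    return math.sqrt((eta1 - eta2) ** 2 + dphi ** 2)

def label_jet(jet, hadrons, max_dr=0.3, min_pt=5.0):
    """Return 'b', 'c', 'tau' or 'light' for a reconstructed jet.

    `hadrons` is a list of dicts with 'kind' ('b', 'c' or 'tau'),
    'pt' (GeV), 'eta' and 'phi' for generator-level particles.
    """
    matched = {
        h["kind"]
        for h in hadrons
        if h["pt"] > min_pt
        and delta_r(jet["eta"], jet["phi"], h["eta"], h["phi"]) < max_dr
    }
    for kind in ("b", "c", "tau"):  # priority order: b, then c, then tau
        if kind in matched:
            return kind
    return "light"
```

A jet matched to both a $b$- and a $c$-hadron is thus labelled a $b$-jet, mirroring the "remaining jets" ordering in the text.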
Overlaps between reconstructed objects are removed using a procedure based on the angular separation between different final-state objects. The procedure is similar to the one described in Ref. [60].
The event's missing transverse momentum, whose magnitude is denoted by $E_\mathrm{T}^{\mathrm{miss}}$, is computed as the negative vectorial sum of the transverse momenta of leptons, jets and a track-based soft term [61] accounting for the contribution from particles from the primary vertex that are not already included. The jets employed in the $E_\mathrm{T}^{\mathrm{miss}}$ calculation include PFlow jets and, in addition, anti-$k_t$ $R = 0.4$ jets with $p_\mathrm{T} > 30$ GeV and $2.5 < |\eta| < 4.5$.
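The negative vectorial sum can be sketched as follows (an illustrative helper under our own naming, not the ATLAS reconstruction tool; real inputs would be calibrated four-vectors rather than bare $(p_\mathrm{T}, \phi)$ pairs):

```python
import math

def missing_et(objects):
    """Magnitude and azimuth of the negative vectorial pT sum.

    `objects` is a list of (pt, phi) pairs for the leptons, jets and
    soft-term tracks entering the sum.
    """
    px = -sum(pt * math.cos(phi) for pt, phi in objects)
    py = -sum(pt * math.sin(phi) for pt, phi in objects)
    return math.hypot(px, py), math.atan2(py, px)

# Two back-to-back objects of equal pT balance each other,
# so the missing transverse momentum vanishes.
met_balanced, _ = missing_et([(50.0, 0.0), (50.0, math.pi)])
```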

Description of $b$-tagging algorithms
Jets that contain a $b$-hadron are distinguished from other jets that contain a $c$-hadron or only light-flavour hadrons mainly by the larger mass and longer lifetime of the $b$-hadron. The $b$-tagging algorithm studied in this paper is called DL1r, and is an updated version of the DL1 tagger described in Ref. [62]. Information from the tracks in the jet, such as their transverse impact parameters and the reconstructed secondary and tertiary vertices, is combined into a set of low-level taggers. The DL1 algorithm combines the output of the low-level taggers into a single discriminant by using a deep neural network. DL1r adds the result of the RNNIP algorithm, which is based on a recurrent neural network exploiting the correlation between the tracks' impact parameters. Current analyses in ATLAS use the DL1r discriminant in intervals defined by the efficiency to tag $b$-jets in simulated $t\bar{t}$ events [10]. The $b$-tagging interval boundaries are 0%, 60%, 70%, 77%, 85% and 100%.
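The working-point boundaries correspond to quantiles of the simulated $b$-jet discriminant distribution: an X% working point keeps the X% of $b$-jets with the highest scores. A sketch of how such cut values and pseudo-continuous interval labels could be derived (the toy Gaussian below stands in for the real DL1r output and is our assumption):

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for the DL1r discriminant of simulated b-jets (arbitrary toy shape).
b_jet_discriminant = rng.normal(loc=3.0, scale=1.5, size=100_000)

# An X% b-jet efficiency working point keeps the X% highest scores,
# i.e. the cut value sits at the (100 - X)th percentile.
working_points = [60, 70, 77, 85]
cuts = {wp: np.percentile(b_jet_discriminant, 100 - wp) for wp in working_points}

labels = ["0-60%", "60-70%", "70-77%", "77-85%", "85-100%"]

def tagging_interval(score):
    """Map a discriminant value to its pseudo-continuous interval label."""
    for wp, label in zip(working_points, labels):
        if score >= cuts[wp]:
            return label
    return labels[-1]  # below the loosest cut: the untagged region
```

By construction the tightest cut (60%) is the largest discriminant value, so the intervals are nested and every jet falls into exactly one of the five bins.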
Measurements of the tagging efficiency SFs for $b$-jets are made as a function of jet $p_\mathrm{T}$ for $|\eta| < 2.5$, using $t\bar{t}$ events where both top quarks decay leptonically and a method similar to that described in Ref. [10]. The SFs for the mistagging efficiency for light-flavour jets are obtained by defining a 'flipped tagger' in which the sign of the track impact parameters that are used in the low-level taggers is inverted [11]. This results in similar efficiencies for light- and heavy-flavour jets, allowing the light-flavour SFs to be determined. Both the $b$- and light-flavour jet SFs have been determined using the full Run 2 dataset. It should be noted that both sets of SFs require a determination of the $c$-jet SFs, which were taken from a preliminary version of the method described in this paper. Since the contamination from $c$-jets is relatively small in these measurements and because the previous SFs were similar to those in the present measurement, any differences only have a small impact.
Analyses that use $b$-tagging in ATLAS apply a weight to each simulated event, derived as the product of the SFs for all jets for which a tagging requirement is made. It is found that there are significant differences between the simulated (mis)tagging efficiencies for different fragmentation models [10]. To account for these differences, simulation-to-simulation SFs are applied to those simulated samples that have a fragmentation and decay model different from the one used to derive the SFs. In the present paper the $t\bar{t}$ simulation (see Section 3) uses Pythia 8 and EvtGen, which is the reference MC fragmentation setup for all the measured SFs.

Event selection and reconstruction
The analysis aims to obtain a pure sample of semileptonic $t\bar{t}$ events. Exactly one electron or one muon, denoted by $\ell$, is required. The events are also required to have $E_\mathrm{T}^{\mathrm{miss}} > 20$ GeV, and the transverse mass $m_\mathrm{T}$ of the $E_\mathrm{T}^{\mathrm{miss}}$ and the lepton must satisfy $E_\mathrm{T}^{\mathrm{miss}} + m_\mathrm{T} > 60$ GeV, where $m_\mathrm{T} = \sqrt{2\,p_\mathrm{T}(\ell)\,E_\mathrm{T}^{\mathrm{miss}}\,(1 - \cos\Delta\phi)}$ and $\Delta\phi = \phi(E_\mathrm{T}^{\mathrm{miss}}) - \phi(\ell)$ is the azimuthal angle between the lepton and $E_\mathrm{T}^{\mathrm{miss}}$. Subsequently, a requirement on the number of PFlow jets is applied such that one of the following selections is true:

• the event contains exactly four jets with $p_\mathrm{T} > 25$ GeV;
• the event contains at least three jets with $p_\mathrm{T} > 25$ GeV and exactly one jet with $20 < p_\mathrm{T} < 25$ GeV;
• the event contains at least five jets with $p_\mathrm{T} > 25$ GeV, at least one of which has $p_\mathrm{T} > 70$ GeV.
These selections are designed to retain a large number of jets covering a wide range in $p_\mathrm{T}$, while reducing the non-$t\bar{t}$ background and the number of jets arising from final-state QCD radiation (FSR). The first requirement is the 'baseline' selection, which has a relatively low rate of FSR jets due to requiring exactly four jets. The second requirement allows measurements to be made for jets with $20 < p_\mathrm{T} < 25$ GeV, while minimising the non-$t\bar{t}$ background, which is more likely to contain multiple low-$p_\mathrm{T}$ jets. The third requirement improves the statistical precision for high-$p_\mathrm{T}$ jets, where the rate of FSR is greater. Although allowing five jets increases the fraction of FSR jets in the selection, the rate of such jets is acceptable for jet $p_\mathrm{T} > 70$ GeV, since the likelihood, described below, is better able to identify the top-quark decay products at higher $p_\mathrm{T}$.
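The three jet-multiplicity selections can be sketched as a single predicate (our own helper; the paper's wording leaves the treatment of 20-25 GeV jets in the first and third selections implicit, so this is one possible reading):

```python
def passes_jet_selection(jet_pts):
    """Return True if a list of jet pT values (GeV, for all jets with
    pT > 20 GeV) satisfies one of the three selections."""
    hard = [pt for pt in jet_pts if pt > 25.0]
    soft = [pt for pt in jet_pts if 20.0 < pt < 25.0]

    baseline = len(hard) == 4                      # exactly four hard jets
    low_pt = len(hard) >= 3 and len(soft) == 1     # exactly one 20-25 GeV jet
    high_pt = len(hard) >= 5 and any(pt > 70.0 for pt in hard)
    return baseline or low_pt or high_pt
```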
The four-vectors of the four highest-$p_\mathrm{T}$ jets, the lepton and the event $E_\mathrm{T}^{\mathrm{miss}}$ are used as inputs to a likelihood-based $t\bar{t}$ event reconstruction algorithm, which minimises deviations in the invariant masses of the top quarks and $W$ bosons from their true values and is described in more detail in Ref. [63]. This algorithm uses a likelihood function to assign the four jets to the $t\bar{t}$ decay topology. In particular, the algorithm assigns one jet to the $b$-jet from the leptonically decaying top quark ($t \to Wb \to \ell\nu b$), another jet to the $b$-jet from the hadronically decaying top quark ($t \to Wb \to q\bar{q}'b$, where $q\bar{q}'$ are the quarks into which the $W$ boson decays) and the remaining two jets to the jets that come from the hadronic $W$ boson decay. The jet assignment does not use any $b$-tagging information. The following notation is used: the jets that are assigned as the decay products of the $W$ boson are referred to as $W$-jets and the remaining two jets are referred to as top-jets.
The four-jet combination that has the highest likelihood is selected for the $t\bar{t}$ reconstruction only if the logarithm of the likelihood value is greater than $-48$. This requirement increases the fraction of events where the top-quark decay products are correctly assigned. Subsequently, the top-jets are required to lie in the tightest (0-60%) DL1r $b$-tagging interval, whereas no $b$-tagging requirement is imposed on the $W$-jets, so as to leave them unbiased. After these requirements it is found that 98.8% of events in the $t\bar{t}$ simulation have two top-jets that are $b$-jets.
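The assignment step can be illustrated with a much-simplified stand-in for the likelihood of Ref. [63]: enumerate jet-to-parton assignments and score each with Gaussian constraints on the hadronic-side $W$ and top masses. The resolutions and the $\chi^2$-style score below are our assumptions; the real algorithm also uses transfer functions and the leptonic side:

```python
import itertools
import math

M_W, M_TOP = 80.4, 172.5           # GeV, approximate world-average masses
SIGMA_W, SIGMA_TOP = 10.0, 15.0    # illustrative mass resolutions

def invariant_mass(jets):
    """Mass of the four-vector sum; jets are (E, px, py, pz) tuples in GeV."""
    E, px, py, pz = (sum(c) for c in zip(*jets))
    return math.sqrt(max(E ** 2 - px ** 2 - py ** 2 - pz ** 2, 0.0))

def best_assignment(jets):
    """Return ((b_lep, b_had, w1, w2), score) minimising the mass constraints.

    Only the hadronic W and top masses are used, and no b-tagging
    information enters, as in the analysis.
    """
    best, best_score = None, float("inf")
    for b_lep, b_had, w1, w2 in itertools.permutations(range(len(jets)), 4):
        if w1 > w2:  # the two W-jets are interchangeable
            continue
        m_w = invariant_mass([jets[w1], jets[w2]])
        m_top = invariant_mass([jets[b_had], jets[w1], jets[w2]])
        score = ((m_w - M_W) / SIGMA_W) ** 2 + ((m_top - M_TOP) / SIGMA_TOP) ** 2
        if score < best_score:
            best, best_score = (b_lep, b_had, w1, w2), score
    return best, best_score
```

With four jets this is only 12 distinct hypotheses, so a full enumeration is cheap.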
The distribution of the logarithm of the likelihood for the selected events is shown in Figure 1. The majority of the events come from $t\bar{t}$ production, with 3% of the events, referred to as 'non-$t\bar{t}$', coming from other processes such as $W$+jets, $Z$+jets or single-top production. The simulated events are categorised according to the flavour of the $W$-jets. The notation '$\ell\ell$' indicates that both $W$-jets are light-flavour jets. Similarly, '$c\ell$' ('$b\ell$') indicates that one of the $W$-jets is a $c$-jet ($b$-jet) and the other is a light-flavour jet. The $W$-jet pairs with flavours other than those discussed above fall into a category called 'other', which also includes events in which at least one of the $W$-jets comes from a hadronically decaying $\tau$-lepton. The $c\ell$ category accounts for 55% of events, the $\ell\ell$ category for 41%, the $b\ell$ category for 1.8% and the 'other' category for 2.5%.
The simulated $t\bar{t}$ yield shown in Figure 1, as well as in all subsequent figures and results in this paper, is corrected for compatibility with ATLAS $t\bar{t}$ measurements [64], which indicate that the simulation underestimates the number of events containing more than two $b$-jets. For this reason, simulated events in which both the top-jets and at least one of the $W$-jets are $b$-jets are scaled by $1.25 \pm 0.25$, where the full difference between data and the prediction from simulation is taken as a systematic uncertainty. The remaining events are normalised using the prediction from simulation. Finally, Figure 1 includes an uncertainty in the simulation estimate (see Section 7.2), derived as the combination of the detector systematic uncertainties, the $t\bar{t}$ modelling uncertainties and the uncertainty due to the limited number of simulated events.

Figure 1: Distribution of the logarithm of the likelihood that is used to reconstruct the $t\bar{t}$ decay. Only events with values of this quantity exceeding $-48$ are shown, which corresponds to the selection used in the measurement. The $t\bar{t}$ simulation is split according to the flavours of the $W$-jets ($\ell\ell$, $c\ell$, $b\ell$ and other). The (mis)tagging efficiency scale factors have not been applied to the simulation. The hashed area shows the total uncertainty, excluding the uncertainties from the (mis)tagging efficiency scale factors and the uncertainty on the jet $p_\mathrm{T}$ distribution, which is derived from the difference between data and simulation.
The $\eta$ and $p_\mathrm{T}$ distributions of the $W$-jets that are selected with the procedure described above are shown in Figure 2. Although the simulation generally describes the data well, it can be seen that it has a slightly harder $p_\mathrm{T}$ spectrum than the data. This difference is included as a systematic uncertainty as described in Section 7.2. Figure 2 also shows the DL1r discriminant, where it can be seen that the simulation describes the data reasonably well, but there are some differences, particularly at high and low DL1r discriminant values.

Measurement of the charm mistagging efficiency

Method
The $c$-jet mistagging efficiency is determined by comparing the fraction of events where either of the $W$-jets is tagged in data with the fraction tagged in simulation. The efficiency is evaluated in four jet-$p_\mathrm{T}$ intervals, with boundaries at 20, 40, 65, 140 and 250 GeV, and for the tagging intervals described in Section 5. Jets are said to be tagged (untagged) if they have DL1r discriminant values greater (less) than the value at the 85% boundary. The method described below is for the 'pseudo-continuous' calibration, where the efficiency is defined as the fraction of $c$-jets that have a DL1r discriminant value that lies between adjacent boundaries, i.e. a value greater than the 60% boundary, a value between the 70% and 60% boundaries, etc.
The light-flavour jet and $b$-jet SFs are taken from other measurements as described in Section 5, and the corresponding SFs are applied to the two $W$-jets and the two top-jets if they are light-flavour jets or $b$-jets. The remaining difference between the numbers of tagged $W$-jet events in data and simulation is used to determine the ratio of the $c$-jet mistagging efficiencies in data and simulation. To simplify the fit, events where both $W$-jets are tagged, making up 0.7% of the total number of events, are rejected. Events where both $W$-jets are untagged are retained to provide a normalisation of the simulation as described below. The data are divided into bins according to the tagging interval of the tagged $W$-jet (labelled $i$, running from 1 to 4), the $p_\mathrm{T}$ bin of the tagged $W$-jet (labelled $j$) and the $p_\mathrm{T}$ bin of the untagged $W$-jet (labelled $k$). The number of events in each bin is denoted by $N_{\mathrm{data}}(i, j, k)$. In addition, the number of untagged data events is recorded in each $W$-jet $p_\mathrm{T}$ bin combination, $N^{\mathrm{untag}}_{\mathrm{data}}(j, k)$, where in this case the higher-$p_\mathrm{T}$ jet is denoted by $j$, so $j \geq k$.
The fit is performed by comparing data with the total expectation, which is dominated by $t\bar{t}$ events. The number of simulated 'signal' events in each bin, $N_{\mathrm{sig}}(i, j, k)$, is made up of those events where the tagged $W$-jet is a $c$-jet; the other jets in the event may be of any flavour. There are two sources of 'background' events, defined as events where the tagged $W$-jet is not a $c$-jet. The first is the number of events where neither of the top-jets is a $c$-jet, $N_{\mathrm{bkg}}(i, j, k)$. The second arises from events where a $c$-jet is misclassified as one of the top-jets. This background requires a specific treatment, since the $c$-jet is in a $p_\mathrm{T}$ bin, labelled $l$, different from those of the $W$-jets and so must be binned in this variable in addition to $i$ and $j$. The background where one of the top-jets is a $c$-jet depends only on $\mathrm{SF}(i{=}4, l)$, since the top-jets are tagged at the 60% operating point. The SFs for $c$-jets that are untagged in each jet $p_\mathrm{T}$ bin, $\mathrm{SF}^{\mathrm{untag}}(j)$, are evaluated by constraining the sum of the measured efficiencies of all tagged bins and the untagged bin to be 1. Technically, this is evaluated by adding additional terms to the $\chi^2$ and treating the SFs $\mathrm{SF}^{\mathrm{untag}}(j)$ as additional free parameters with an arbitrary tolerance of 1%:

$\chi^2_j = \left( \frac{1 - \mathrm{SF}^{\mathrm{untag}}(j)\,\varepsilon^{\mathrm{untag}}_{\mathrm{MC}}(j) - \sum_{i} \mathrm{SF}(i, j)\,\varepsilon_{\mathrm{MC}}(i, j)}{0.01} \right)^2,$

where $\varepsilon^{\mathrm{untag}}_{\mathrm{MC}}(j)$ and $\varepsilon_{\mathrm{MC}}(i, j)$ are the MC untagged and tagged efficiencies, respectively. As part of validating the method it was checked that scale factors of unity are obtained when performing a fit with data replaced by the MC simulation.
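The constraint can be illustrated with a toy version of the fit in a single $p_\mathrm{T}$ bin with no backgrounds, so that the per-interval SFs reduce to simple data/MC ratios and the untagged SF follows from the sum-to-one condition. All efficiencies and "true" scale factors below are invented toy numbers:

```python
import numpy as np

# Toy MC efficiencies for a c-jet in one pT bin: four tagging intervals
# (0-60%, 60-70%, 70-77%, 77-85%) plus the untagged remainder.
eps_mc = np.array([0.08, 0.05, 0.06, 0.10])
eps_mc_untag = 1.0 - eps_mc.sum()

n_cjets = 200_000
rng = np.random.default_rng(1)
# Pseudo-data drawn with "true" scale factors the method should recover.
sf_true = np.array([1.20, 1.05, 0.95, 1.00])
n_data = rng.poisson(n_cjets * eps_mc * sf_true)

# Per-interval scale factors: ratio of data to MC tagged fractions.
sf = n_data / (n_cjets * eps_mc)

# Untagged SF from the sum-to-one constraint on the data efficiencies:
# SF_untag * eps_untag + sum_i SF_i * eps_i = 1.
sf_untag = (1.0 - (sf * eps_mc).sum()) / eps_mc_untag
```

In the real measurement the same condition enters as a penalty term in the $\chi^2$ with a 1% tolerance, rather than being imposed exactly.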

Systematic uncertainties
Several sources of experimental and MC modelling systematic uncertainties affecting the $c$-jet efficiency SFs are considered. The uncertainty from each source is evaluated by making the corresponding change to the MC model and rederiving the SFs for each tagging interval and jet $p_\mathrm{T}$ bin. The total systematic uncertainty for each bin is evaluated by adding the individual contributions in quadrature.
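Adding independent contributions in quadrature is simply the root of the sum of squares; a minimal sketch, with example shift values of the size quoted below:

```python
import math

def total_systematic(shifts):
    """Combine independent fractional systematic shifts in quadrature."""
    return math.sqrt(sum(s * s for s in shifts))

# e.g. a 5% jet-energy-scale shift, a 3% soft-term shift and a
# 0.7% pile-up shift combine to about 5.9%.
total = total_systematic([0.05, 0.03, 0.007])
```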
The uncertainties in the jet energy scale and resolution are based on their measurements in data [15] and result in shifts of up to 5% in the measured SFs due to a single component. Uncertainties in the track-based soft term of the $E_\mathrm{T}^{\mathrm{miss}}$ [65] result in shifts of up to 3%. The uncertainty due to the pile-up modelling [66] leads to shifts of up to 0.7%. Uncertainties originating from the electron [54] and muon [56] reconstruction lead to shifts of up to 0.5%. An additional uncertainty is applied to cover the $W$-jet $p_\mathrm{T}$ spectrum differences between data and simulation, giving a maximum uncertainty in the measured SFs of 0.2%.
The uncertainties in the light-flavour mistagging and $b$-jet tagging SFs are accounted for by shifting the SFs by each component of the total uncertainty and, where appropriate, correlating this with the uncertainty used in the charm SF determination. The most important non-correlated uncertainties all come from the light-flavour SF. They originate from using the flipped tagger rather than the standard tagger, giving uncertainties of up to 7%, and from varying the charm SFs in the light-flavour SF determination, giving uncertainties of up to 4% [11].
Uncertainties in the modelling of $t\bar{t}$ production are estimated by replacing the nominal MC sample with alternative MC samples. The uncertainty due to the choice of parton shower and hadronisation model is estimated by replacing the standard Pythia 8 model with Herwig 7. This is the largest source of uncertainty in the analysis and leads to shifts of up to 13% in the SF. The uncertainty in the modelling of initial- and final-state radiation is assessed by increasing the radiation, i.e. doubling the $h_{\mathrm{damp}}$ parameter [37] and halving the renormalisation and factorisation scales, and by decreasing the radiation, i.e. increasing the scales by a factor of two. The resulting uncertainty is at most 1%. As already mentioned, the uncertainty in the normalisation of the $t\bar{t}+b$-jets process [64] is estimated by scaling the events where both the top-jets and at least one of the $W$-jets are $b$-jets by $\pm 25\%$, giving an uncertainty of up to 9% on the SF. The uncertainty in non-$t\bar{t}$ processes is evaluated by scaling these contributions by $\pm 25\%$, leading to a maximum uncertainty of 1% on the SF.

Results
The measured $c$-jet pseudo-continuous mistagging efficiency SFs, together with the total, statistical and systematic uncertainties, are listed in Table 1 and shown in Figure 3 for four tagging intervals. The fitted normalisation factors lie in the range 0.83 to 1.05. The contributions to the systematic uncertainty are listed in Table 2. The corresponding measured data mistagging efficiencies, shown in Figure 4, are obtained by multiplying the mistagging efficiencies from simulation by the SFs. The SFs do not have a strong jet-$p_\mathrm{T}$ dependence and are consistent with unity for all but the tightest tagging interval, where the efficiency is higher in data than in MC simulation. The total uncertainty varies from 3% to 17% and is dominated by the systematic uncertainty except in the highest $p_\mathrm{T}$ bin. Figure 5 shows the DL1r discriminant distributions for tagged $W$-jets, after applying to simulation the light-flavour mistagging and $b$-tagging efficiency SFs, taken from other measurements as described in Section 5, and the $c$-jet SFs obtained from this analysis. It can be seen that the simulation agrees with the data within uncertainties over the entire distribution.

The mistagging efficiency SFs are also determined for the single-cut operating points, by replacing the numbers of events in the pseudo-continuous tagging intervals with the corresponding numbers of events in which the tagged jet has a DL1r discriminant value above the operating point. The extra terms in Eq. (2) are not used in these fits. Instead, the untagged SFs are evaluated using the results of the fit as $(1 - \mathrm{SF}(j)\,\varepsilon_{\mathrm{MC}}(j))/(1 - \varepsilon_{\mathrm{MC}}(j))$. The resulting single-cut $c$-jet mistagging efficiency SFs are shown in Figure 6.

Figure 6: The $c$-jet single-cut mistagging efficiency scale factors for particle-flow jets, shown as a function of jet $p_\mathrm{T}$ for the four tagging operating points. The scale factors are shown for the Pythia 8 + EvtGen fragmentation model.
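The untagged-SF relation quoted above encodes that the tagged and untagged fractions must still sum to one after the scale factor is applied: if data tag more jets than MC, correspondingly fewer jets are untagged. A minimal sketch (the function name is ours):

```python
def untagged_sf(sf_tagged, eps_mc):
    """Untagged-jet SF implied by a single-cut tagged SF.

    Implements SF_untag = (1 - SF * eps_MC) / (1 - eps_MC), where
    eps_MC is the MC tagging efficiency at the operating point.
    """
    return (1.0 - sf_tagged * eps_mc) / (1.0 - eps_mc)

# A 20% mistagging efficiency with SF = 1.4 implies an untagged SF of 0.90.
example = untagged_sf(1.4, 0.2)
```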

Conclusion
A new technique has been used to measure the inclusive $c$-jet mistagging efficiency, using semileptonic $t\bar{t}$ events from $pp$ collisions at $\sqrt{s} = 13$ TeV at the LHC. The measurement is based on an integrated luminosity of 139 fb$^{-1}$, corresponding to the full Run 2 dataset, collected with the ATLAS detector. The efficiencies were measured as a function of jet $p_\mathrm{T}$ in the range 20-250 GeV, with boundaries corresponding to 60%, 70%, 77% and 85% efficiency to tag $b$-jets in simulated $t\bar{t}$ events. The total uncertainties are in the range 3%-17%. The measured efficiencies generally agree with those in simulation but are higher for the tightest (0-60%) tagging interval.