1 Introduction

The algorithms for the identification of jets containing b-hadrons, also known as \(b\)-tagging algorithms, constitute an important tool for the analysis of the data collected by the ATLAS experiment [1] at the Large Hadron Collider (LHC) [2]. Such algorithms play a crucial role in a large number of Standard Model (SM) precision measurements (e.g. Refs. [3, 4]), Higgs boson measurements (e.g. Refs. [5, 6]), and searches for supersymmetry and other exotic phenomena (e.g. Refs. [7, 8]).

The performance of the \(b\)-tagging algorithms depends crucially on whether the jets that are identified contain b-hadrons (\(b\)-jets), c-hadrons (\(c\)-jets), or neither (light-flavour jets). The Monte Carlo (MC) simulation provides an estimate of the probability that a jet fulfils the requirements of the \(b\)-tagging algorithm, i.e. the tagging efficiency for \(b\)-jets and the mistagging efficiency for \(c\)-jets and light-flavour jets. However, since the simulation is not perfect, the (mis)tagging efficiencies must be measured in data. Simulation-to-data scale factors (SFs), defined as the ratio of the efficiency measured in data to that in simulation, are used to correct the simulation, under the assumption that the SFs are independent of the physics process. Analyses that use \(b\)-tagging apply a weight to each simulated event, derived as the product of the SFs for all jets for which a tagging requirement is made.
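The two definitions above (the SF as a ratio of efficiencies, and the event weight as a product of per-jet SFs) can be illustrated with a minimal sketch. The function names and the numerical efficiencies are hypothetical and serve only to show the arithmetic; they are not ATLAS software interfaces or measured values.

```python
import math
from typing import Sequence


def scale_factor(eff_data: float, eff_mc: float) -> float:
    """Simulation-to-data SF: efficiency measured in data divided by that in simulation."""
    if eff_mc <= 0.0:
        raise ValueError("the simulated efficiency must be positive")
    return eff_data / eff_mc


def event_weight(jet_sfs: Sequence[float]) -> float:
    """Weight of a simulated event: product of the SFs of all jets on which a
    tagging requirement is made (jets without a requirement are simply omitted)."""
    return math.prod(jet_sfs)


# Hypothetical numbers, for illustration only (not measured values).
sf_c = scale_factor(eff_data=0.18, eff_mc=0.20)
print(event_weight([1.0, sf_c, 0.95]))
```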

Measurements of the SFs have been made using \(t\bar{t}\) events for \(b\)-jets [9, 10] and inclusive jet events for light-flavour jets [11]. Measurements of SFs for \(c\)-jets using proton–proton (\(pp\)) collision data at \(\sqrt{s}= 7\) \(\text {Te}\text {V}\), described in Ref. [12], use two methods to measure the mistagging efficiency of \(c\)-jets. One technique uses the production of a W boson in association with a \(c\)-jet that decays semi-muonically and the other identifies a \(D^*\) meson within \(c\)-jets by explicitly reconstructing the meson decay chain \(D^{*+} \rightarrow D^{0} \pi ^+ \rightarrow K^{-}\pi ^+\pi ^+\).

This paper discusses a complementary technique to measure the inclusive \(c\)-jet mistagging efficiency by means of a \(c\)-jet sample that is derived from \(t\bar{t}\) events. The measurement uses a \(pp\) collision dataset which was collected with the ATLAS detector during 2015–2018 and corresponds to an integrated luminosity of 139 fb\(^{-1}\). A likelihood-based method is used to select a high-purity \(t\bar{t}\) sample in which one of the W bosons originating from the top-quark decay \(t\rightarrow W b\) decays leptonically into an electron or a muon plus the corresponding neutrino and the other decays hadronically. The branching ratio of a W boson to final states containing a charm quark, which is approximately 33% [13], allows the \(c\)-jet mistagging efficiency in data to be determined as a function of jet transverse momentum by fitting simulated event samples to the data sample. The technique has the advantage that it does not depend on a specific hadron decay chain topology.

The measurement of the \(c\)-jet mistagging efficiency presented in this paper uses particle-flow (PFlow) jets, which are made by combining calorimeter energy deposits with inner-detector tracks [14], but the same technique is also used to make measurements for jets made only from calorimeter energy deposits [15] and jets that only use tracks (track-jets) [16].

This paper is structured as follows. Section 2 describes the ATLAS detector. Data and simulated samples are discussed in Sect. 3. The reconstruction of electrons, muons, jets and missing transverse momentum is described in Sect. 4, and the \(b\)-tagging algorithms are discussed in Sect. 5. The selection of the \(t\bar{t}\) sample is described in Sect. 6. The measurement of the \(c\)-jet mistagging efficiency is described in Sect. 7, and Sect. 8 gives the conclusions.

2 ATLAS detector

The ATLAS detector [1] is a multipurpose particle physics detector with a forward–backward symmetric cylindrical geometry and nearly 4\(\pi \) coverage in solid angle. The inner tracking detector consists of silicon pixel and microstrip detectors covering the pseudorapidity region \(|\eta | < 2.5\), surrounded by a transition radiation tracker which enhances electron identification in the region \(|\eta | < 2.0\). Between Run 1 and Run 2, a new inner pixel layer, the insertable B-layer [17, 18], was added at a mean sensor radius of 3.3 cm. The inner detector is surrounded by a thin superconducting solenoid providing an axial 2 T magnetic field, and by a fine-granularity lead/liquid-argon (LAr) electromagnetic calorimeter covering \(|\eta | < 3.2\). A steel/scintillator-tile calorimeter provides hadronic coverage in the central pseudorapidity range (\(|\eta | < 1.7\)). The endcap and forward regions (\(1.5< |\eta | < 4.9\)) of the hadronic calorimeter are made of LAr active layers with either copper or tungsten as the absorber material. An extensive muon spectrometer (MS) with an air-core toroidal magnet system surrounds the calorimeters. Three layers of high-precision tracking chambers provide coverage in the range \(|\eta | < 2.7\), while dedicated fast chambers allow triggering in the region \(|\eta | < 2.4\). The ATLAS trigger system consists of a hardware-based level-1 trigger followed by a software-based high-level trigger [19]. An extensive software suite [20] is used in the reconstruction and analysis of real and simulated data, in detector operations, and in the trigger and data acquisition systems of the experiment.

3 Data and simulated event samples

The data analysed in this paper correspond to 139 fb\(^{-1}\) [21, 22] of \(pp\) collision data collected by the ATLAS detector between 2015 and 2018 with a centre-of-mass energy of 13 \(\text {Te}\text {V}\) and a 25 ns proton bunch crossing interval. The data sample was collected using a set of single-electron [23] and single-muon [24] triggers with \(p_{\mathrm {T}}\) thresholds in the range of 20–26 \(\text {Ge}\text {V}\) depending on the lepton flavour and data-taking period. All detector subsystems were required to be operational during data taking and to fulfil data quality requirements. Events with noise bursts or coherent noise in the calorimeters are removed. The presence of additional interactions in the same bunch crossing, referred to as pile-up, is characterised by the average number of such interactions, \(\langle \mu \rangle \), which was 34 for the whole dataset.

Simulated event samples are used to model SM processes and to estimate the expected signal yields. All samples were produced using the ATLAS simulation infrastructure [25] and \(\textsc {Geant} 4\) [26]. A subset of samples used a faster simulation based on a parameterisation for the calorimeter response and \(\textsc {Geant} 4\) for the other detector systems [25]. The simulated events are reconstructed with the same algorithms as used for data, and contain a realistic modelling of pile-up interactions. The pile-up profiles in the simulation match those of each dataset between 2015 and 2018, and were obtained by overlaying the hard-scatter events with minimum-bias events simulated using the soft QCD processes of Pythia 8 [27] with the NNPDF3.0lo set [28] of parton distribution functions (PDFs) [29] and a set of tuned parameters called the A3 tune [30]. For all samples, with the exception of those generated using Sherpa  [31], the decays of bottom and charm hadrons were performed by EvtGen  [32].

The events used in this study originate mostly from \(t\bar{t}\) production. This process was modelled using the Powheg Box v2 [33,34,35,36] generator at next-to-leading order (NLO) with the NNPDF3.0nlo PDF set [28] and the \(h_{\mathrm {damp}}\) parameter set to \(1.5 \times m_{\mathrm {top}} \) [37], where \(m_{\mathrm {top}}\) denotes the mass of the top quark. The events were interfaced to Pythia 8.230 to model the parton shower, hadronisation, and underlying event, with parameters set according to the A14 tune [38] and using the NNPDF2.3lo set of PDFs [28]. The uncertainty due to initial-state radiation (ISR) is estimated by simultaneous variations of the \(h_{\mathrm {damp}}\) parameter and the renormalisation and factorisation scales, and choosing the Var3c up/down variants of the A14 tune. The impact of final-state radiation (FSR) is evaluated by varying the renormalisation scale for emissions from the parton shower by a factor of two up or down. The impact of using a different parton shower and hadronisation model is evaluated by comparing the nominal \(t\bar{t}\) sample with another \(t\bar{t}\) sample generated by Powheg Box v2 using exactly the same parameters but interfaced to Herwig 7.04 [39, 40], using the H7UE tune [40] and the MMHT2014lo  PDF set [41].

In addition to \(t\bar{t}\) production, some minor backgrounds contribute to the final event sample used for the calibration. These backgrounds consist mostly of single-top and diboson production, the production of \(t\bar{t}\) in association with a vector boson and the production of a vector boson in association with jets. Details of the modelling of these samples are given in the following.

Single-top s-channel production was modelled using the Powheg Box v2 generator at NLO in QCD in the five-flavour scheme with the NNPDF3.0nlo PDF set. Single-top t-channel production was modelled using the Powheg Box v2 generator at NLO in QCD, using the four-flavour scheme and the corresponding NNPDF3.0nlo set of PDFs. The associated production of a single top quark and a W boson (tW) was modelled using the Powheg Box v2 generator at NLO in QCD, using the five-flavour scheme and the NNPDF3.0nlo set of PDFs. The diagram removal scheme [42] was used to remove interference and overlap with \(t\bar{t}\) production. The events for all single-top production channels were interfaced to Pythia 8.230 using the A14 tune and the NNPDF2.3lo set of PDFs.

The production of \(Z+\)jets and W+jets events was simulated with the Sherpa 2.2.1 generator using NLO matrix elements for up to two partons, and leading-order (LO) matrix elements for up to four partons calculated with the Comix [43] and OpenLoops  [44,45,46] libraries. They were matched with the Sherpa parton shower [47] using the MEPS@NLO prescription [48,49,50,51] and the set of tuned parameters developed by the Sherpa authors. The NNPDF3.0nnlo set of PDFs [28] was used and the samples were normalised to a next-to-next-to-leading-order (NNLO) prediction [52].

Samples of events with diboson final states (VV) were simulated with the Sherpa 2.2.1 or 2.2.2 generator depending on the process, including off-shell effects and Higgs boson contributions where appropriate. Fully leptonic final states and semileptonic final states, where one boson decays leptonically and the other hadronically, were generated using matrix elements at NLO accuracy in QCD for up to one additional parton and at LO accuracy for up to three additional parton emissions. Event samples for the loop-induced processes \(gg \rightarrow VV\) were generated using LO-accurate matrix elements for up to one additional parton emission for both the fully leptonic and semileptonic final states. The matrix element calculations were matched and merged with the Sherpa parton shower based on Catani–Seymour dipole factorisation [43, 47] using the MEPS@NLO prescription. The virtual QCD corrections were provided by the OpenLoops library. The NNPDF3.0nnlo set of PDFs was used, along with the dedicated set of tuned parton-shower parameters developed by the Sherpa authors.

Production of \(t\bar{t}\) in association with a vector boson was modelled using the MadGraph5_aMC@NLO 2.3.3 [53] generator at NLO with the NNPDF3.0nlo PDF set. The events were interfaced to Pythia 8.210 [27] using the A14 tune and the NNPDF2.3lo PDF set.

4 Object reconstruction

Selected events are required to contain at least one vertex having at least two associated tracks with \(p_{\text {T}} > 500\) \(\text {Me}\text {V}\), and the primary vertex is chosen to be the vertex reconstructed with the largest \(\Sigma p_{\mathrm {T}}^2\) of its associated tracks.
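A minimal sketch of the vertex requirement and primary-vertex choice, assuming each vertex is represented simply by the list of its track transverse momenta in MeV (the `Vertex` class is hypothetical). Whether the \(\Sigma p_{\mathrm {T}}^2\) runs over all associated tracks or only over those above 500 MeV is not specified here; the sketch assumes all associated tracks.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Vertex:
    # Transverse momenta of the tracks associated with this vertex, in MeV.
    track_pts_mev: List[float] = field(default_factory=list)


def select_primary_vertex(vertices: List[Vertex]) -> Optional[Vertex]:
    """Return the vertex with the largest sum of squared track pT among the vertices
    that have at least two associated tracks with pT > 500 MeV, or None if there is none."""
    candidates = [v for v in vertices
                  if sum(pt > 500.0 for pt in v.track_pts_mev) >= 2]
    if not candidates:
        return None  # the event fails the vertex requirement
    # Assumption: the sum runs over all tracks associated with the vertex.
    return max(candidates, key=lambda v: sum(pt ** 2 for pt in v.track_pts_mev))
```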

Electron candidates are reconstructed by matching inner-detector tracks to clusters of energy deposited in the EM calorimeter. Electrons must have \(p_{\mathrm {T}}>27\) \(\text {Ge}\text {V}\) and \(|\eta |<2.47\). The associated track must have \(|d_0|/\sigma _{d_0}<5\) and \(|z_0|\sin \theta <0.5\) mm, where \(d_0\) (\(z_0\)) is the transverse (longitudinal) impact parameter relative to the primary vertex and \(\sigma _{d_0}\) is the uncertainty in \(d_0\). Candidates are identified with a likelihood method and must satisfy the ‘medium’ identification criteria described in Ref. [54]. The likelihood relies on the shape of the EM shower measured in the calorimeter, the quality of the track reconstruction, and the quality of the match between the track and the cluster. To suppress candidates originating from photon conversions, hadron decays, or jets misidentified as electrons, candidates are required to satisfy the gradient isolation criteria based on tracking and calorimeter measurements [54]. The electron energy and reconstruction efficiency are calibrated using \(Z \rightarrow e^+e^-\) decays [54].

Muon candidates are reconstructed in the range \(|\eta |<2.5\) by combining tracks in the inner detector with tracks in the MS. All muon candidates must have \(p_{\mathrm {T}}>27\) \(\text {Ge}\text {V}\), \(|d_0|/\sigma _{d_0}<3\), and \(|z_0|\sin \theta <0.5\) mm. The ‘medium’ quality requirements described in Ref. [55] are used and muons from hadron decays are suppressed by imposing a track-based isolation requirement. The muon reconstruction efficiency in the simulation is corrected using comparisons with data [56].
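The kinematic and impact-parameter requirements on electrons and muons can be sketched as below. Identification, isolation and the efficiency corrections are deliberately left out, a single \(\eta \) is used for both the acceptance cut and the track, and \(\sin \theta \) is computed from the pseudorapidity as \(1/\cosh \eta \); these are simplifying assumptions of the sketch, and the function names are hypothetical.

```python
import math


def _passes_track_requirements(d0_mm: float, sigma_d0_mm: float,
                               z0_mm: float, eta: float, max_d0_sig: float) -> bool:
    """Impact-parameter requirements on the lepton track.
    sin(theta) is obtained from the pseudorapidity as 1/cosh(eta)."""
    sin_theta = 1.0 / math.cosh(eta)
    return abs(d0_mm) / sigma_d0_mm < max_d0_sig and abs(z0_mm) * sin_theta < 0.5


def is_selected_electron(pt_gev, eta, d0_mm, sigma_d0_mm, z0_mm) -> bool:
    # Kinematic and track requirements only; identification and isolation are not modelled.
    return (pt_gev > 27.0 and abs(eta) < 2.47
            and _passes_track_requirements(d0_mm, sigma_d0_mm, z0_mm, eta, 5.0))


def is_selected_muon(pt_gev, eta, d0_mm, sigma_d0_mm, z0_mm) -> bool:
    # Same structure as for electrons, with |eta| < 2.5 and |d0|/sigma(d0) < 3.
    return (pt_gev > 27.0 and abs(eta) < 2.5
            and _passes_track_requirements(d0_mm, sigma_d0_mm, z0_mm, eta, 3.0))
```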

Jets are formed using objects from a particle-flow algorithm, which combines energy deposits in the calorimeter with inner detector tracks [14]. The PFlow objects are combined into jets in the range \(|\eta |<2.5\) and \(p_{\mathrm {T}}>20\) \(\text {Ge}\text {V}\) using the anti-\(k_t\) algorithm [57, 58] with a radius parameter R of 0.4. A jet-vertex-tagging technique using a multivariate likelihood [59] is applied to jets with \(|\eta |<2.4\) and \(p_{\mathrm {T}}<60\) \(\text {Ge}\text {V}\) to suppress jets that are not associated with the event’s primary vertex. Jets are further calibrated according to in situ measurements of the jet energy scale [15].
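A sketch of the jet acceptance and pile-up suppression requirements follows. The JVT output field and its threshold `jvt_cut` are placeholders: the working point is not quoted in this section, so no particular value is assumed, and the `Jet` class is hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Jet:
    pt_gev: float
    eta: float
    jvt: float  # hypothetical field: output of the jet-vertex-tagging discriminant


def select_jets(jets: List[Jet], jvt_cut: float) -> List[Jet]:
    """Keep jets with pT > 20 GeV and |eta| < 2.5; for |eta| < 2.4 and pT < 60 GeV,
    additionally require the JVT output to exceed jvt_cut (a hypothetical threshold)."""
    selected = []
    for jet in jets:
        if jet.pt_gev <= 20.0 or abs(jet.eta) >= 2.5:
            continue
        in_jvt_region = abs(jet.eta) < 2.4 and jet.pt_gev < 60.0
        if in_jvt_region and jet.jvt <= jvt_cut:
            continue  # likely a pile-up jet
        selected.append(jet)
    return selected
```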

The labelling scheme used to define the flavour of a jet in a simulated event is applied by matching reconstructed jets to generator-level b- or c-hadrons with \(p_{\mathrm {T}}> 5\) \(\text {Ge}\text {V}\) within a cone of size \(\Delta R = 0.3\) around the jet axis. Jets that contain a b-hadron are called \(b\)-jets. Of the remaining jets, those containing a c-hadron are called \(c\)-jets, and of those still remaining, jets with a hadronically decaying \(\tau \)-lepton are called \(\tau \)-jets. All other jets are called light-flavour jets.
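The labelling priority (b before c before \(\tau \) before light) can be written as a short function. Hadrons and \(\tau \)-leptons are given as hypothetical `(pT, eta, phi)` tuples; applying no \(p_{\mathrm {T}}\) threshold and the same \(\Delta R\) cone to the \(\tau \)-lepton match is an assumption of the sketch, as is the strict inequality \(\Delta R < 0.3\).

```python
import math
from typing import Sequence, Tuple

Particle = Tuple[float, float, float]  # (pT [GeV], eta, phi)


def delta_r(eta1: float, phi1: float, eta2: float, phi2: float) -> float:
    dphi = math.remainder(phi1 - phi2, 2.0 * math.pi)  # wrapped into [-pi, pi]
    return math.hypot(eta1 - eta2, dphi)


def _matched(jet_eta, jet_phi, particles: Sequence[Particle], cone, min_pt) -> bool:
    return any(pt > min_pt and delta_r(jet_eta, jet_phi, eta, phi) < cone
               for pt, eta, phi in particles)


def label_jet(jet_eta, jet_phi, b_hadrons, c_hadrons, hadronic_taus,
              cone=0.3, min_pt=5.0) -> str:
    """Flavour label with the priority b > c > tau > light."""
    if _matched(jet_eta, jet_phi, b_hadrons, cone, min_pt):
        return "b"
    if _matched(jet_eta, jet_phi, c_hadrons, cone, min_pt):
        return "c"
    # Assumption: taus use the same cone and no pT threshold.
    if _matched(jet_eta, jet_phi, hadronic_taus, cone, 0.0):
        return "tau"
    return "light"
```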

Overlaps between reconstructed objects are removed using a procedure based on the angular separation between different final-state objects. The procedure is similar to the one described in Ref. [60].

The event’s missing transverse momentum, whose magnitude is denoted by \(E_{\mathrm {T}}^{\mathrm {miss}}\), is computed as the negative vectorial sum of the transverse momenta of leptons, jets and a track-based soft term [61] accounting for the contribution from particles from the primary vertex that are not already included. The jets employed in the \(E_{\mathrm {T}}^{\mathrm {miss}}\) calculation include PFlow jets and, in addition, anti-\(k_t\) \(R = 0.4\) jets with \(p_{\mathrm {T}}>30\) \(\text {Ge}\text {V}\) and \(2.5<|\eta |<4.5\).
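The negative vector sum can be sketched as follows, with every object represented by a hypothetical `(pT, phi)` pair. Which objects enter the sum (leptons, the PFlow jets plus the forward jets listed above, and the soft-term tracks) is left to the caller.

```python
import math
from typing import Iterable, Tuple

PtPhi = Tuple[float, float]  # (pT [GeV], phi)


def missing_et(leptons: Iterable[PtPhi], jets: Iterable[PtPhi],
               soft_term: Iterable[PtPhi]) -> Tuple[float, float]:
    """Return (E_T^miss, phi) from the negative vector sum of the transverse momenta
    of leptons, jets and soft-term tracks."""
    px = py = 0.0
    for collection in (leptons, jets, soft_term):
        for pt, phi in collection:
            px -= pt * math.cos(phi)
            py -= pt * math.sin(phi)
    return math.hypot(px, py), math.atan2(py, px)
```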

5 Description of \(b\)-tagging algorithms

Jets that contain a b-hadron are distinguished from other jets that contain a c-hadron or only light-flavour hadrons mainly by the larger mass and longer lifetime of the b-hadron. The \(b\)-tagging algorithm studied in this paper is called DL1r, and is an updated version of the DL1 tagger described in Ref. [62]. Information from the tracks in the jet, such as their transverse impact parameters, and from the reconstructed secondary and tertiary vertices is combined into a set of low-level taggers. The DL1 algorithm combines the output of the low-level taggers into a single discriminant by using a deep neural network. DL1r adds the result of the RNNIP algorithm, which is based on a recurrent neural network exploiting the correlation between the tracks’ impact parameters. Current analyses in ATLAS use the DL1r discriminant in intervals defined by the efficiency to tag \(b\)-jets in simulated \(t\bar{t}\) events [10]. The \(b\)-tagging interval boundaries are 0, 60, 70, 77, 85 and 100%.
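A sketch of how a DL1r discriminant value maps onto the five intervals. The discriminant thresholds that correspond to the 85%, 77%, 70% and 60% working points are not given here, so the `cuts` argument is a placeholder; the labelling from 0 (untagged) to 4 (0–60%) is chosen to match the convention in Sect. 7.1 that \(t=4\) denotes the tightest interval.

```python
import bisect
from typing import Sequence

# Interval labels: 0 = untagged (85-100%), 1 = 77-85%, 2 = 70-77%, 3 = 60-70%, 4 = 0-60%.
INTERVAL_NAMES = ("85-100%", "77-85%", "70-77%", "60-70%", "0-60%")


def tagging_interval(discriminant: float, cuts: Sequence[float]) -> int:
    """cuts: DL1r thresholds of the 85%, 77%, 70% and 60% working points, in increasing
    order. A jet passes a boundary only if its discriminant is strictly greater."""
    if len(cuts) != 4 or list(cuts) != sorted(cuts):
        raise ValueError("expected four increasing thresholds")
    # bisect_left counts the thresholds strictly below the discriminant value.
    return bisect.bisect_left(cuts, discriminant)
```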

Measurements of the tagging efficiency SFs for \(b\)-jets are made as a function of jet \(p_{\mathrm {T}}\) for \(|\eta |<2.5\), using \(t\bar{t}\) events where both top quarks decay leptonically and a method similar to that described in Ref. [10]. The SFs for the mistagging efficiency for light-flavour jets are obtained by defining a ‘flipped tagger’ in which the sign of the track impact parameters that are used in the low-level taggers is inverted [11]. This reduces the tagging efficiency for heavy-flavour jets to a level similar to that for light-flavour jets, so that the sample of jets tagged by the flipped tagger is dominated by light-flavour jets, allowing the light-flavour SFs to be determined. Both the b- and light-flavour jet SFs have been determined using the full Run 2 dataset. It should be noted that both sets of SFs require a determination of the \(c\)-jet SFs, which were taken from a preliminary version of the method described in this paper. Since the contamination from \(c\)-jets is relatively small in these measurements, and the previous SFs were similar to those in the present measurement, any differences have only a small impact.

Analyses that use \(b\)-tagging in ATLAS apply a weight to each simulated event, derived as the product of the SFs for all jets for which a tagging requirement is made. Significant differences are found between the simulated (mis)tagging efficiencies obtained with different fragmentation models [10]. To account for these differences, simulation-to-simulation SFs are applied to those simulated samples whose fragmentation and decay model differs from the one used to derive the SFs. In the present paper the \(t\bar{t}\) simulation (see Sect. 3) uses Pythia 8 and EvtGen, which constitute the reference MC fragmentation and decay model for all the measured SFs.

6 Event selection and reconstruction

The analysis aims to obtain a pure sample of semileptonic \(t\bar{t}\) events. Exactly one electron or one muon, denoted by \(\ell \), is required. The events are also required to have \(E_{\mathrm {T}}^{\mathrm {miss}}> 20\) \(\text {Ge}\text {V}\), and the transverse mass \(m_{\mathrm {T}}\) of the \(E_{\mathrm {T}}^{\mathrm {miss}}\) and the lepton must satisfy

$$\begin{aligned} m_{\mathrm {T}}= \sqrt{2 p_{\mathrm {T}}^\ell E_{\mathrm {T}}^{\mathrm {miss}}(1-\cos \Delta \phi )} > 40~\text {Ge}\text {V}, \end{aligned}$$

where \(\Delta \phi = \phi (E_{\mathrm {T}}^{\mathrm {miss}})-\phi (\ell )\) is the azimuthal angle between the lepton and \(E_{\mathrm {T}}^{\mathrm {miss}}\). Subsequently, a requirement on the number of PFlow jets is applied such that one of the following selections is true:

  • the event contains exactly four jets with \(p_{\mathrm {T}}> 25\) \(\text {Ge}\text {V}\)

  • the event contains at least three jets with \(p_{\mathrm {T}}> 25\) \(\text {Ge}\text {V}\) and exactly one jet with \(20< p_{\mathrm {T}}< 25\) \(\text {Ge}\text {V}\)

  • the event contains at least five jets with \(p_{\mathrm {T}}> 25\) \(\text {Ge}\text {V}\) and at least one of these jets has \(p_{\mathrm {T}}> 70\) \(\text {Ge}\text {V}\).
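The lepton, \(E_{\mathrm {T}}^{\mathrm {miss}}\), \(m_{\mathrm {T}}\) and jet-multiplicity requirements can be combined into one sketch. The function names are hypothetical, jets are passed as a list of \(p_{\mathrm {T}}\) values in GeV for jets already passing the \(p_{\mathrm {T}}>20\) GeV selection, and the first bullet is read literally as "exactly four jets above 25 GeV", without a restriction on additional 20–25 GeV jets; that reading is an assumption.

```python
import math
from typing import Sequence


def transverse_mass(lep_pt: float, met: float, dphi: float) -> float:
    """m_T = sqrt(2 pT^l E_T^miss (1 - cos dphi)), all in GeV."""
    return math.sqrt(2.0 * lep_pt * met * (1.0 - math.cos(dphi)))


def passes_jet_multiplicity(jet_pts: Sequence[float]) -> bool:
    """The three alternative jet requirements; jet_pts are the selected jets (pT > 20 GeV)."""
    n_above_25 = sum(pt > 25.0 for pt in jet_pts)
    n_20_to_25 = sum(20.0 < pt < 25.0 for pt in jet_pts)
    baseline = n_above_25 == 4
    low_pt = n_above_25 >= 3 and n_20_to_25 == 1
    high_pt = n_above_25 >= 5 and any(pt > 70.0 for pt in jet_pts)
    return baseline or low_pt or high_pt


def passes_event_selection(n_leptons: int, lep_pt: float, lep_phi: float,
                           met: float, met_phi: float, jet_pts: Sequence[float]) -> bool:
    if n_leptons != 1:  # exactly one electron or muon
        return False
    if met <= 20.0 or transverse_mass(lep_pt, met, met_phi - lep_phi) <= 40.0:
        return False
    return passes_jet_multiplicity(jet_pts)
```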

These selections are designed to retain a large number of jets covering a wide range in \(p_{\mathrm {T}}\), while reducing the non-\(t\bar{t}\) background and the number of jets arising from final-state QCD radiation (FSR). The first requirement is the ‘baseline’ selection, which has a relatively low rate of FSR jets because exactly four jets are required. The second requirement allows measurements to be made for jets with \(20< p_{\mathrm {T}}< 25\) \(\text {Ge}\text {V}\), while minimising the non-\(t\bar{t}\) background, which is more likely to have multiple low-\(p_{\mathrm {T}}\) jets. The third requirement increases the number of high-\(p_{\mathrm {T}}\) jets, in the region where the rate of FSR is greater. Although allowing five or more jets increases the fraction of FSR jets included in the selection, this fraction remains acceptable when a jet with \(p_{\mathrm {T}}> 70\) \(\text {Ge}\text {V}\) is required, since the likelihood, described below, is better able to identify the top-quark decay products at higher \(p_{\mathrm {T}}\).

The four-vectors of the four highest-\(p_{\mathrm {T}}\) jets, the lepton and the event \(E_{\mathrm {T}}^{\mathrm {miss}}\) are used as inputs to a likelihood-based \(t\bar{t}\) event reconstruction algorithm, which minimises deviations in the invariant masses of the top quarks and W bosons from their true values and is described in more detail in Ref. [63]. This algorithm uses a likelihood function to assign the four jets to the \(t\bar{t}\) decay topology. In particular, the algorithm assigns one jet to the \(b\)-jet from the leptonically decaying top quark (\(t\rightarrow Wb \rightarrow \ell \nu b\)), another jet to the \(b\)-jet from the hadronically decaying top quark (\(t\rightarrow Wb \rightarrow qq^\prime b\), where \(qq^\prime \) are the quarks into which the W boson decays) and the remaining two jets to the jets that come from the hadronic W boson decay. The jet assignment does not use any \(b\)-tagging information. The following notation is used: the jets that are assigned as the decay products of the W boson are referred to as W-jets and the remaining two jets are referred to as top-jets.
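The likelihood of Ref. [63], which includes detector transfer functions and the leptonic top-quark decay, is not reproduced here. As a much simpler stand-in, the sketch below scores each assignment with a \(\chi ^2\)-like sum of Gaussian terms on the hadronic W-boson and top-quark masses only, so the leptonic-side \(b\)-jet is simply the leftover jet; the mass values are approximate and the widths are hypothetical. It does illustrate the combinatorics: choosing the two top-jets (ordered) among four jets leaves an unordered W-jet pair, giving 12 distinct assignments, and no \(b\)-tagging information is used.

```python
import itertools
import math
from typing import List, Optional, Sequence, Tuple

M_W, M_TOP = 80.4, 172.5          # GeV, approximate values
SIGMA_W, SIGMA_TOP = 10.0, 15.0   # GeV, hypothetical resolutions


def four_vector(pt: float, eta: float, phi: float, m: float = 0.0) -> Tuple[float, ...]:
    px, py, pz = pt * math.cos(phi), pt * math.sin(phi), pt * math.sinh(eta)
    return (math.sqrt(px * px + py * py + pz * pz + m * m), px, py, pz)


def invariant_mass(*vectors: Sequence[float]) -> float:
    e, px, py, pz = (sum(components) for components in zip(*vectors))
    return math.sqrt(max(e * e - px * px - py * py - pz * pz, 0.0))


def best_assignment(jets: List[Tuple[float, ...]]):
    """jets: four (pt, eta, phi[, m]) tuples. Returns ((b_lep, b_had, (q1, q2)), score)
    for the assignment with the smallest toy chi^2."""
    vectors = [four_vector(*jet) for jet in jets]
    best: Optional[tuple] = None
    # Ordered choice of the two top-jets; the remaining unordered pair are the W-jets.
    for b_lep, b_had in itertools.permutations(range(4), 2):
        q1, q2 = (i for i in range(4) if i not in (b_lep, b_had))
        m_w = invariant_mass(vectors[q1], vectors[q2])
        m_top = invariant_mass(vectors[q1], vectors[q2], vectors[b_had])
        score = ((m_w - M_W) / SIGMA_W) ** 2 + ((m_top - M_TOP) / SIGMA_TOP) ** 2
        if best is None or score < best[1]:
            best = ((b_lep, b_had, (q1, q2)), score)
    return best
```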

The four-jet combination that has the highest likelihood is used for the \(t\bar{t}\) reconstruction, and the event is retained only if the negative logarithm of the likelihood value is greater than \(-48\). This requirement increases the fraction of events where the top-quark decay products are correctly assigned. Subsequently, the top-jets are required to lie in the tightest (0–60%) DL1r \(b\)-tagging interval, whereas no \(b\)-tagging requirement is imposed on the W-jets, so as to leave them unbiased. After these requirements it is found that 98.8% of events in the \(t\bar{t}\) simulation have two top-jets that are \(b\)-jets.

The distribution of the value of the negative logarithm of the likelihood for the selected events is shown in Fig. 1. The majority of the events come from \(t\bar{t}\) production, with 3% of the events, referred to as ‘non-\(t\bar{t}\)’, coming from other processes such as W+jets, Z+jets or single-top production. The simulated \(t\bar{t}\) events are categorised according to the flavour of the W-jets. The notation ‘ll’ indicates that both W-jets are light-flavour jets. Similarly, ‘cl’ (‘bl’) indicates that one of the W-jets is a \(c\)-jet (\(b\)-jet) and the other is a light-flavour jet. The W-jet pairs with flavours other than those discussed above fall into a category called ‘other’, which also includes events in which at least one of the W-jets comes from a hadronically decaying \(\tau \)-lepton. The ll category accounts for 55% of events, the cl category for 41%, the bl category for 1.8% and the ‘other’ category for 2.5%.
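The W-jet pair categories can be sketched as a small classification function; the flavour strings follow the labelling scheme of Sect. 4 and the function names are hypothetical. The fractions here are unweighted counts, whereas an actual estimate would use the simulated event weights.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def wjet_category(flavour1: str, flavour2: str) -> str:
    """Category of a W-jet pair; flavours are 'b', 'c', 'tau' or 'light'."""
    pair = tuple(sorted((flavour1, flavour2)))
    if pair == ("light", "light"):
        return "ll"
    if pair == ("c", "light"):
        return "cl"
    if pair == ("b", "light"):
        return "bl"
    return "other"  # e.g. cc, bc, bb, or any pair containing a tau-jet


def category_fractions(pairs: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    counts = Counter(wjet_category(a, b) for a, b in pairs)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}
```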

Fig. 1

Distribution of the negative logarithm of the likelihood that is used to reconstruct the \(t\bar{t}\) decay. Only events with values of this quantity exceeding \(-48\) are shown, which corresponds to the selection used in the measurement. The \(t\bar{t}\) simulation is split according to the flavours of the W-jets (ll, cl, bl and other). The (mis)tagging efficiency scale factors have not been applied to the simulation. The hashed area shows the total uncertainty, excluding the uncertainties from the (mis)tagging efficiency scale factors and the uncertainty on the jet \(p_{\mathrm {T}}\) distribution, which is derived from the difference between data and simulation

Fig. 2

Distributions of (a) \(\eta \), (b) \(p_{\mathrm {T}}\) and (c) the DL1r discriminant for the particle-flow jets that are associated with the W boson decay by the likelihood-based \(t\bar{t}\) event reconstruction algorithm (W-jets). The \(t\bar{t}\) simulation is split according to the flavours of the W-jets (ll, cl, bl and other). The (mis)tagging efficiency scale factors have not been applied to the simulation. The hashed area shows the total uncertainty, excluding the uncertainties from the (mis)tagging efficiency scale factors and the uncertainty on the jet \(p_{\mathrm {T}}\) distribution, which is derived from the difference between data and simulation. The vertical dashed lines in (c) indicate the DL1r discriminant tagging intervals

The simulated \(t\bar{t}\) yield shown in Fig. 1, as well as in all subsequent figures and results in this paper, is corrected for compatibility with ATLAS \(t\bar{t}\) measurements [64], which indicate that the simulation underestimates the number of events containing more than two \(b\)-jets. For this reason, simulated events in which both the top-jets and at least one of the W-jets are \(b\)-jets are scaled by \(1.25 \pm 0.25\), where the full difference between data and the prediction from simulation is taken as a systematic uncertainty. The remaining events are normalised using the prediction from simulation. Finally, Fig. 1 includes an uncertainty in the simulation estimate (see Sect. 7.2), derived as the combination of the detector systematic uncertainties, the \(t\bar{t}\) modelling uncertainties and the uncertainty due to the limited number of simulated events.
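The rescaling of events with additional \(b\)-jets, including its up and down variations, amounts to a per-event weight. A minimal sketch, with hypothetical function and argument names and flavour strings as in the labelling scheme of Sect. 4:

```python
from typing import Sequence

MULTI_B_SCALE = 1.25        # central scale for events with more than two b-jets
MULTI_B_SCALE_UNC = 0.25    # full data-simulation difference, taken as uncertainty


def multi_b_weight(top_jet_flavours: Sequence[str], w_jet_flavours: Sequence[str],
                   variation: int = 0) -> float:
    """Weight for a simulated ttbar event; variation is 0 (nominal), +1 (up) or -1 (down)."""
    if variation not in (-1, 0, 1):
        raise ValueError("variation must be -1, 0 or +1")
    both_top_jets_b = all(f == "b" for f in top_jet_flavours)
    any_w_jet_b = any(f == "b" for f in w_jet_flavours)
    if both_top_jets_b and any_w_jet_b:
        return MULTI_B_SCALE + variation * MULTI_B_SCALE_UNC
    return 1.0  # remaining events keep the simulation prediction
```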

The \(\eta \) and \(p_{\mathrm {T}}\) distributions of the W-jets that are selected with the procedure described above are shown in Fig. 2. Although the simulation generally describes the data well, it has a slightly harder \(p_{\mathrm {T}}\) spectrum than the data. This difference is included as a systematic uncertainty, as described in Sect. 7.2. Figure 2 also shows the DL1r discriminant distribution, which the simulation describes reasonably well, although some differences are visible, particularly at high and low DL1r discriminant values.

7 Measurement of the charm mistagging efficiency

7.1 Method

The \(c\)-jet mistagging efficiency is determined by comparing the fraction of events where either of the W-jets is tagged in data with the corresponding fraction in simulation. The efficiency is evaluated in four jet-\(p_{\mathrm {T}}\) intervals, with boundaries at 20, 40, 65, 140 and 250 \(\text {Ge}\text {V}\), and for the tagging intervals described in Sect. 5. Jets are said to be tagged (untagged) if they have DL1r discriminant values greater (less) than the value at the 85% boundary. The method described below is for the ‘pseudo-continuous’ calibration, where the efficiency is defined as the fraction of \(c\)-jets that have a DL1r discriminant value that lies between the boundaries, i.e. a value greater than the 60% boundary, a value between the 70% and 60% boundaries, etc.
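A sketch of the pseudo-continuous efficiency for simulated \(c\)-jets: within each \(p_{\mathrm {T}}\) bin, the fraction of jets falling into each tagging interval. Jets are given as hypothetical `(pT, interval)` pairs, with interval 0 meaning untagged and 4 the tightest (0–60%) interval; ignoring jets outside 20–250 GeV and using unweighted counts are assumptions of the sketch.

```python
import bisect
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

PT_EDGES = (20.0, 40.0, 65.0, 140.0, 250.0)  # GeV, bin boundaries from the text
N_INTERVALS = 5  # 0 = untagged, 1..4 = tagging intervals (4 = 0-60%)


def pt_bin(pt: float) -> Optional[int]:
    """Index 0..3 of the jet-pT bin, or None outside [20, 250) GeV."""
    if not PT_EDGES[0] <= pt < PT_EDGES[-1]:
        return None
    return bisect.bisect_right(PT_EDGES, pt) - 1


def pseudo_continuous_efficiencies(
        c_jets: Iterable[Tuple[float, int]]) -> Dict[int, List[float]]:
    """Fraction of c-jets in each tagging interval, per pT bin (each list sums to 1)."""
    counts: Dict[int, List[int]] = defaultdict(lambda: [0] * N_INTERVALS)
    for pt, interval in c_jets:
        b = pt_bin(pt)
        if b is not None:
            counts[b][interval] += 1
    return {b: [n / sum(row) for n in row] for b, row in counts.items()}
```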

Table 1 The measured \(c\)-jet pseudo-continuous mistagging efficiency scale factors listed with the total, statistical and systematic uncertainties for particle-flow jets shown for each tagging and jet-\(p_{\mathrm {T}}\) interval. The scale factors are listed for the Pythia 8+EvtGen fragmentation model

The light-flavour jet and \(b\)-jet SFs are taken from other measurements, as described in Sect. 5, and are applied to the two W-jets and the two top-jets whenever these are light-flavour jets or \(b\)-jets. The remaining difference between the numbers of events with a tagged W-jet in data and simulation is used to determine the ratio of the \(c\)-jet mistagging efficiencies in data and simulation.

Fig. 3

The \(c\)-jet pseudo-continuous mistagging efficiency scale factors for particle-flow jets shown as a function of jet \(p_{\mathrm {T}}\) for four tagging intervals. The scale factors are shown for the Pythia 8+EvtGen fragmentation model

Table 2 The contributions to the \(c\)-jet pseudo-continuous mistagging efficiency scale factor systematic uncertainties for particle-flow jets. Listed are the uncertainties related to the \(t\bar{t}\) modelling, the jets and \(E_{\mathrm {T}}^{\mathrm {miss}}\), the light-flavour jet scale factor, the \(b\)-jet scale factor, and all other sources
Fig. 4

The \(c\)-jet pseudo-continuous mistagging efficiencies for particle-flow jets shown as red points as a function of tagging interval for the four jet-\(p_{\mathrm {T}}\) ranges. Included in the plots are the corresponding MC efficiencies using the Pythia 8+EvtGen fragmentation model

To simplify the fit, events where both W-jets are tagged, which make up 0.7% of the total number of events, are rejected. Events where both W-jets are untagged are retained to provide a normalisation of the simulation, as described below. The data are divided into bins according to the tagging interval (labelled t, running from 1 to 4), the \(p_{\mathrm {T}}\) bin of the tagged W-jet (labelled i) and the \(p_{\mathrm {T}}\) bin of the untagged W-jet (labelled j). The number of events in each bin is denoted by \(N^{t}_{\mathrm {data}}(i,j)\). In addition, the number of data events in which both W-jets are untagged is recorded for each W-jet \(p_{\mathrm {T}}\) bin combination as \(N^{\mathrm {untag}}_{\mathrm {data}}(i,j)\), where in this case j denotes the bin of the higher-\(p_{\mathrm {T}}\) jet, so that \(j\ge i\).
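The binning described above can be sketched as follows. Each event is represented by its two W-jets as hypothetical `(pT bin, interval)` pairs, with bins indexed from 0 and interval 0 meaning untagged; the \(p_{\mathrm {T}}\) bins are assumed to be ordered in \(p_{\mathrm {T}}\), so sorting the bin indices puts the higher-\(p_{\mathrm {T}}\) jet in j.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

WJet = Tuple[int, int]  # (pT bin index 0..3, tagging interval: 0 = untagged, 1..4)


def fill_event_bins(events: Iterable[Tuple[WJet, WJet]]):
    """Return (n_tag, n_untag): n_tag[t][(i, j)] counts events whose tagged W-jet is in
    interval t and pT bin i, with the untagged W-jet in pT bin j; n_untag[(i, j)]
    counts events with both W-jets untagged, with j >= i."""
    n_tag: Dict[int, Dict[Tuple[int, int], int]] = defaultdict(lambda: defaultdict(int))
    n_untag: Dict[Tuple[int, int], int] = defaultdict(int)
    for (bin1, t1), (bin2, t2) in events:
        if t1 > 0 and t2 > 0:
            continue  # both W-jets tagged: rejected to simplify the fit
        if t1 == 0 and t2 == 0:
            i, j = sorted((bin1, bin2))  # j is the bin of the higher-pT jet
            n_untag[(i, j)] += 1
        elif t1 > 0:
            n_tag[t1][(bin1, bin2)] += 1
        else:
            n_tag[t2][(bin2, bin1)] += 1
    return n_tag, n_untag
```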

The fit is performed by comparing data with the total expectation, which is dominated by \(t\bar{t}\) events. The number of simulated ‘signal’ events in each bin, \(N^{t}_{C}(i,j)\), is made up of those events where the tagged W-jet is a \(c\)-jet; the other jets in the event may be of any flavour. There are two sources of ‘background’ events, defined as events where the tagged W-jet is not a \(c\)-jet. The first is the number of events where neither of the top-jets is a \(c\)-jet, \(N^{t}_{J}(i,j)\). The second arises from events where a \(c\)-jet is misassigned as one of the top-jets. This background requires a specific treatment, since the \(c\)-jet lies in a \(p_{\mathrm {T}}\) bin, labelled k, that in general differs from those of the W-jets, so these events must be binned in k in addition to i and j. The number of events from this background is denoted by \(N^{t}_{X}(i,j,k)\). The number of untagged simulated events is recorded in the same way as for the data, as \(N^{\mathrm {untag}}_{\mathrm {MC}}(i,j)\).

The free parameters in the fit are the \(c\)-jet mistagging SF in each \(p_{\mathrm {T}}\) bin and tagging interval, \(c^{t}(i)\), and the overall MC normalisation in each W-jet \(p_{\mathrm {T}}\) bin combination, \(p(i,j)\).

A fit is performed by minimising the \(\chi ^2\) defined as:

$$\begin{aligned} \chi ^2= & {} \sum _{t=1}^4 \sum _{i=1}^4 \sum _{j=1}^4 \left[ N^{t}_{\mathrm {data}}(i,j)- p(i,j) \Bigg ( c^{t}(i)N^{t}_{C}(i,j)\right. \nonumber \\&\left. +N^{t}_{J}(i,j)+\sum _{k=1}^4 c^{t=4}(k) N^{t}_{X}(i,j,k)\Bigg )\right] ^2/N^{t}_{\mathrm {data}}(i,j)\nonumber \\&+ \sum _{i=1}^4 \sum _{j=i}^4 \left( N^{\mathrm {untag}}_{\mathrm {data}}(i,j)-p(i,j)N^{\mathrm {untag}}_{\mathrm {MC}}(i,j)\right) ^2\nonumber \\&\quad /N^{\mathrm {untag}}_{\mathrm {data}}(i,j). \end{aligned}$$
(1)
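Eq. (1) can be evaluated directly for given parameter values; the minimisation itself (for example with MINUIT) is not shown. The sketch uses nested lists indexed from 0, so that `t = 0..3` stands for the intervals 1 to 4 and `c[3][k]` plays the role of \(c^{t=4}(k)\). Skipping bins with no data, to avoid dividing by zero, is an assumption and not something stated in the text. The test mirrors the closure check described in the text: with data equal to the nominal prediction, SFs and normalisations of unity give \(\chi ^2 = 0\).

```python
def chi2_eq1(c, p, n_data, n_c, n_j, n_x, n_untag_data, n_untag_mc) -> float:
    """chi^2 of Eq. (1).
    c[t][i]: c-jet SF; p[i][j]: MC normalisation; n_data, n_c, n_j: [t][i][j];
    n_x: [t][i][j][k]; n_untag_data, n_untag_mc: [i][j] (only j >= i is used)."""
    n_t, n_pt = len(c), len(p)
    chi2 = 0.0
    for t in range(n_t):
        for i in range(n_pt):
            for j in range(n_pt):
                nd = n_data[t][i][j]
                if nd <= 0:
                    continue  # assumption: empty bins are skipped
                # c-jet assigned as a top-jet: scaled by the tightest-interval SF c[n_t-1][k]
                bkg_x = sum(c[n_t - 1][k] * n_x[t][i][j][k] for k in range(n_pt))
                pred = p[i][j] * (c[t][i] * n_c[t][i][j] + n_j[t][i][j] + bkg_x)
                chi2 += (nd - pred) ** 2 / nd
    # Normalisation terms from events with both W-jets untagged.
    for i in range(n_pt):
        for j in range(i, n_pt):
            nd = n_untag_data[i][j]
            if nd > 0:
                chi2 += (nd - p[i][j] * n_untag_mc[i][j]) ** 2 / nd
    return chi2
```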
Fig. 5

The DL1r discriminant distribution for tagged particle-flow jets associated with the W boson decay (W-jets) after applying the b-, c- and light-flavour jet tagging scale factors to the simulation. The red dashed lines (bottom) show the uncertainty from the jet tagging scale factors, while the hashed area shows all other uncertainties excluding the uncertainty on the jet \(p_{\mathrm {T}}\) distribution, which is derived from the difference between data and simulation. The vertical dashed lines indicate the DL1r discriminant tagging intervals

Fig. 6 The \(c\)-jet single-cut mistagging efficiency scale factors for particle-flow jets as a function of jet \(p_{\mathrm {T}}\) for the four tagging operating points, shown for the Pythia 8+EvtGen fragmentation model.

The background in which one of the top-jets is a \(c\)-jet depends only on \(c^{t=4}(k)\), since the top-jets are tagged at the 60% operating point. The SFs for untagged \(c\)-jets in each jet \(p_{\mathrm {T}}\) bin, \(c^{\mathrm {untag}}(i)\), are evaluated by constraining the sum of the measured efficiencies of all tagging intervals and the untagged bin to be 1. In practice, this is implemented by treating the SFs \(c^{\mathrm {untag}}(i)\) as additional free parameters and adding the following terms to the \(\chi ^2\), with an arbitrarily chosen tolerance of \(1\%\):

$$\begin{aligned} \sum _{i=1}^4 \left( 1-c^{\mathrm {untag}}(i)\epsilon ^{{\mathrm {untag}}}_{{\mathrm {MC}}}(i) -\sum _{t=1}^4 c^{t}(i) \epsilon _{{\mathrm {MC}}}^t(i)\right) ^2/(1\%)^2, \end{aligned} \tag{2}$$

where \(\epsilon ^{{\mathrm {untag}}}_{{\mathrm {MC}}}(i)\) and \(\epsilon _{{\mathrm {MC}}}^t(i)\) are the simulated efficiencies for a \(c\)-jet in \(p_{\mathrm {T}}\) bin i to be untagged and to fall in tagging interval t, respectively. As part of the validation of the method, it was checked that scale factors of unity are obtained when the fit is performed with the data replaced by the MC simulation.
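The constraint of Eq. (2) can be sketched in the same way. The hypothetical function below returns the penalty added to the \(\chi ^2\), and a second helper gives the value of \(c^{\mathrm {untag}}(i)\) that satisfies the sum rule exactly in each \(p_{\mathrm {T}}\) bin. The array layout is an assumption and the efficiencies are toy numbers chosen only to keep the arithmetic easy to follow.

```python
import numpy as np

TOLERANCE = 0.01  # the 1% tolerance of Eq. (2)


def unitarity_penalty(c_untag, eps_untag_mc, c_tag, eps_tag_mc, tol=TOLERANCE):
    """Extra chi-square terms of Eq. (2).

    c_untag, eps_untag_mc : shape (n_pt,)        indexed [i]
    c_tag, eps_tag_mc     : shape (n_int, n_pt)  indexed [t, i]
    """
    # Per pT bin i: 1 - c^untag(i) eps^untag(i) - sum_t c^t(i) eps^t(i)
    residual = 1.0 - c_untag * eps_untag_mc - np.sum(c_tag * eps_tag_mc, axis=0)
    return np.sum(residual ** 2) / tol ** 2


def c_untag_at_zero_penalty(eps_untag_mc, c_tag, eps_tag_mc):
    """c^untag(i) for which every residual in Eq. (2) is exactly zero."""
    return (1.0 - np.sum(c_tag * eps_tag_mc, axis=0)) / eps_untag_mc


# Toy MC efficiencies for four tagging intervals and two pT bins (illustrative only)
eps_tag_mc = np.array([[0.10, 0.10],
                       [0.10, 0.10],
                       [0.20, 0.20],
                       [0.10, 0.10]])
eps_untag_mc = 1.0 - eps_tag_mc.sum(axis=0)  # about 0.5 in each bin
c_tag = np.full_like(eps_tag_mc, 1.2)         # all tagged SFs set to 1.2
c_untag = c_untag_at_zero_penalty(eps_untag_mc, c_tag, eps_tag_mc)
print(c_untag)  # approximately [0.8, 0.8]
print(unitarity_penalty(np.ones(2), eps_untag_mc, c_tag, eps_tag_mc))
```

Because each residual is divided by \((1\%)^2\), a violation of the sum rule of only 0.1 in one bin already adds 100 to the \(\chi ^2\), so in this sketch the fit is strongly pulled towards the exact solution.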

7.2 Systematic uncertainties

Several sources of experimental and MC modelling systematic uncertainties affecting the \(c\)-jet efficiency SFs are considered. The uncertainty from each source is evaluated by making the corresponding change to the MC model and rederiving the SFs for each tagging interval and jet \(p_{\mathrm {T}}\) bin. The total systematic uncertainty for each bin is evaluated by adding the individual contributions in quadrature.
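The combination step can be illustrated with a short sketch. It assumes one signed shift per source and bin (the rederived SF minus the nominal SF); the text does not specify how up and down variations are merged into a single shift, so that step is left out. The source names and values are hypothetical.

```python
import numpy as np


def shift(sf_varied, sf_nominal):
    """Per-bin change of the SF when one systematic variation is applied."""
    return np.asarray(sf_varied, float) - np.asarray(sf_nominal, float)


def total_systematic(shifts):
    """Add per-source SF shifts in quadrature, bin by bin.

    shifts: dict mapping a (hypothetical) source name to an array of shifts,
            one entry per (tagging interval, pT) bin.
    """
    stacked = np.array([np.asarray(s, float) for s in shifts.values()])
    return np.sqrt(np.sum(stacked ** 2, axis=0))


# Toy example with two sources and two bins (illustrative numbers only)
sf_nominal = np.array([1.00, 1.10])
shifts = {
    "source_A": shift([1.03, 1.14], sf_nominal),
    "source_B": shift([0.96, 1.07], sf_nominal),
}
print(total_systematic(shifts))  # approximately [0.05, 0.05]
```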

The uncertainties in the jet energy scale and resolution are based on their measurements in data [15]; the largest shift in the measured SFs from any single component is 5%. Uncertainties in the track-based soft term of the \(E_{\mathrm {T}}^{\mathrm {miss}}\) [65] result in shifts of up to 3%. The uncertainty in the pile-up modelling [66] leads to shifts of up to 0.7%. Uncertainties originating from the electron [54] and muon [56] reconstruction lead to shifts of up to 0.5%. An additional uncertainty is applied to cover differences in the W-jet \(p_{\mathrm {T}}\) spectrum between data and simulation, giving a maximum uncertainty in the measured SFs of 0.2%.

The uncertainties in the light-flavour mistagging and \(b\)-jet tagging SFs are accounted for by shifting the SFs by each component of their total uncertainty and, where appropriate, correlating these shifts with the uncertainties used in the charm SF determination. The most important uncorrelated uncertainties all come from the light-flavour SF. They originate from using the flipped tagger rather than the standard tagger, giving uncertainties of up to 7%, and from varying the charm SFs in the light-flavour SF determination, giving uncertainties of up to 4% [11].

Uncertainties in the modelling of the \(t\bar{t}\) background are estimated by replacing the nominal MC sample with alternative MC samples. The uncertainty due to the choice of parton shower and hadronisation model is estimated by replacing the standard Pythia 8 model with Herwig. This is the largest source of uncertainty in the analysis and leads to shifts of up to 13% in the SF. The uncertainty in the modelling of initial- and final-state radiation is assessed with two variations: the radiation is increased by doubling the \(h_{\mathrm {damp}}\) parameter [37] and halving the renormalisation and factorisation scales, and decreased by doubling these scales. The resulting uncertainty is at most 1%. As already mentioned, the uncertainty in the normalisation of the \(t\bar{t}+b\bar{b}\) process [64] is estimated by scaling by \(\pm 25\)% the events in which both top-jets and at least one of the W-jets are \(b\)-jets, giving an uncertainty of up to 9% in the SF. The uncertainty in non-\(t\bar{t}\) processes is evaluated by scaling these contributions by \(\pm 25\%\), leading to a maximum uncertainty of 1% in the SF.
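The \(t\bar{t}+b\bar{b}\) normalisation variation amounts to an event reweighting. The sketch below assumes per-event truth-flavour labels for the two top-jets and the two W-jets, with \(b\)-jets labelled 5 (an assumed convention, not taken from the text), and scales the weights of the selected events by a chosen factor. The function name and inputs are hypothetical; the same pattern, with a mask selecting non-\(t\bar{t}\) events, would apply to the \(\pm 25\%\) variation of the other processes.

```python
import numpy as np

B_LABEL = 5  # assumed truth-flavour label for b-jets (hypothetical convention)


def scale_ttbb_like(weights, top_flav, w_flav, factor):
    """Scale weights of events where both top-jets and >= 1 W-jet are b-jets.

    weights  : (n_events,)    nominal event weights
    top_flav : (n_events, 2)  truth flavour labels of the two top-jets
    w_flav   : (n_events, 2)  truth flavour labels of the two W-jets
    factor   : e.g. 1.25 or 0.75 for the +/-25% variation
    """
    selected = np.all(top_flav == B_LABEL, axis=1) & np.any(w_flav == B_LABEL, axis=1)
    # Returns a new array; the nominal weights are left unchanged
    return np.where(selected, weights * factor, weights)


# Three toy events: only the first has two b top-jets and a b W-jet
weights = np.ones(3)
top_flav = np.array([[5, 5], [5, 5], [5, 0]])
w_flav = np.array([[5, 0], [4, 0], [5, 5]])
up = scale_ttbb_like(weights, top_flav, w_flav, 1.25)
down = scale_ttbb_like(weights, top_flav, w_flav, 0.75)
print(up, down)
```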

7.3 Results

The measured \(c\)-jet pseudo-continuous mistagging efficiency SFs, together with their total, statistical and systematic uncertainties, are listed in Table 1 and shown in Fig. 3 for the four tagging intervals. The fitted normalisation factors p(i,j) lie in the range 0.83 to 1.05. The contributions to the systematic uncertainty are listed in Table 2. The corresponding measured mistagging efficiencies in data, shown in Fig. 4, are obtained by multiplying the mistagging efficiencies from simulation by the SFs. The SFs do not depend strongly on jet \(p_{\mathrm {T}}\) and are consistent with unity for all but the tightest tagging interval, where the efficiency is higher in data than in MC simulation. The total uncertainty ranges from 3% to 17% and is dominated by the systematic uncertainty except in the highest-\(p_{\mathrm {T}}\) bin.
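The conversion from SFs to data mistagging efficiencies is a per-bin multiplication. In the sketch below, propagating only the SF uncertainty and neglecting the statistical uncertainty of the simulated efficiency is an assumption made for simplicity; the function name is hypothetical and the numbers are toy values, not those of Table 1.

```python
import numpy as np


def data_efficiency(eps_mc, sf, sf_unc):
    """Data mistagging efficiency = MC efficiency x SF, per bin.

    The uncertainty is propagated from the SF only, assuming (for this sketch)
    that the MC efficiency has negligible uncertainty.
    """
    eps_mc = np.asarray(eps_mc, float)
    eps_data = eps_mc * np.asarray(sf, float)
    eps_data_unc = eps_mc * np.asarray(sf_unc, float)
    return eps_data, eps_data_unc


# Toy numbers for two bins (illustrative only)
eff, eff_unc = data_efficiency([0.10, 0.20], [1.10, 0.90], [0.05, 0.10])
print(eff, eff_unc)  # approximately [0.11, 0.18] and [0.005, 0.02]
```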

Figure 5 shows the DL1r discriminant distributions for tagged W-jets after applying to the simulation the light-flavour mistagging and \(b\)-tagging efficiency SFs, taken from other measurements as described in Sect. 5, together with the \(c\)-jet SFs obtained in this analysis. The simulation agrees with the data within uncertainties over the entire distribution.

Many ATLAS analyses also use the SFs in a simplified form called ‘single cut’, in which jets are classified as tagged (untagged) if their DL1r discriminant lies above (below) one of the operating point boundaries. The corresponding SFs are determined in a similar way to the \(c\)-jet pseudo-continuous mistagging efficiency SFs by using Eq. (1), but replacing \(N^{t}_{\mathrm {data}}\), \(N^{t}_{C}\), \(N^{t}_{J}\) and \(N^{t}_{X}\) with the corresponding numbers of events in which the tagged jet has a DL1r discriminant value above the operating point. The extra terms in Eq. (2) are not used in these fits. Instead, the untagged SFs are evaluated from the fit results as \((1-c^t(i)\epsilon ^t_{{\mathrm {MC}}}(i))/(1-\epsilon ^t_{{\mathrm {MC}}}(i))\). The resulting single-cut \(c\)-jet mistagging efficiency SFs are shown in Fig. 6.
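The expression for the single-cut untagged SF follows from requiring that the tagged and untagged efficiencies in data still sum to one, \(c^t(i)\epsilon ^t_{\mathrm {MC}}(i) + c^{\mathrm {untag}}(i)\,(1-\epsilon ^t_{\mathrm {MC}}(i)) = 1\), and solving for \(c^{\mathrm {untag}}(i)\). A minimal Python version, with a hypothetical function name and toy inputs, is:

```python
import numpy as np


def single_cut_untagged_sf(c_tag, eps_tag_mc):
    """Untagged SF from a single-cut fit: (1 - c^t(i) eps^t_MC(i)) / (1 - eps^t_MC(i))."""
    c_tag = np.asarray(c_tag, float)
    eps = np.asarray(eps_tag_mc, float)
    return (1.0 - c_tag * eps) / (1.0 - eps)


# Toy values: tagged SF 1.2, MC efficiency 0.2 to pass the operating point
c_untag = single_cut_untagged_sf(1.2, 0.2)
print(c_untag)  # approximately 0.95

# The tagged and untagged data efficiencies still sum to one
print(1.2 * 0.2 + c_untag * (1.0 - 0.2))  # approximately 1.0
```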

8 Conclusion

A new technique has been used to measure the inclusive \(c\)-jet mistagging efficiency, using semileptonic \(t\bar{t}\) events from \(pp\) collisions at \(\sqrt{s}=13\) \(\text {Te}\text {V}\) at the LHC. The measurement is based on an integrated luminosity of 139 fb\(^{-1}\), corresponding to the full Run 2 dataset collected with the ATLAS detector. The efficiencies were measured as a function of jet \(p_{\mathrm {T}}\) in the range 20–250 GeV, in tagging intervals with boundaries corresponding to 60, 70, 77 and 85% efficiency to tag \(b\)-jets in simulated \(t\bar{t}\) events. The total uncertainties are in the range 3–17%. The measured efficiencies generally agree with those in simulation but are higher in data for the tightest (0–60%) tagging interval.