ATLAS flavour-tagging algorithms for the LHC Run 2 pp collision dataset

The flavour-tagging algorithms developed by the ATLAS Collaboration and used to analyse its dataset of $\sqrt{s} = 13$ TeV pp collisions from Run 2 of the Large Hadron Collider are presented. These new tagging algorithms are based on recurrent and deep neural networks, and their performance is evaluated in simulated collision events. These developments yield considerable improvements over previous jet-flavour identification strategies. At the 77% b-jet identification efficiency operating point, light-jet (charm-jet) rejection factors of 170 (5) are achieved in a sample of simulated Standard Model $t\bar{t}$ events; similarly, at a c-jet identification efficiency of 30%, a light-jet (b-jet) rejection factor of 70 (9) is obtained.


Introduction
The separation of jets containing b- and c-hadrons (b-jets and c-jets, respectively) from jets containing neither b- nor c-hadrons (light-flavour jets) is of major importance in many areas of the physics programme of the ATLAS experiment [1] at the Large Hadron Collider (LHC) [2]. Flavour-tagging has been decisive in observations of the Higgs boson decay into bottom quarks [3] and of its production in association with a top-quark pair [4], and plays a crucial role in a large number of Standard Model (SM) precision measurements, studies of Higgs boson properties, and searches for new phenomena.
The ATLAS Collaboration uses various algorithms to identify b- and c-jets [5], referred to as flavour-tagging algorithms, when analysing data from pp collisions recorded during Run 2 of the LHC (2015-2018) at $\sqrt{s} = 13$ TeV. These algorithms exploit the long lifetime, high mass and high decay multiplicity of b- and c-hadrons as well as the properties of heavy-quark fragmentation. Given a lifetime of the order of 1.5 ps ($\langle c\tau \rangle \approx 450$ $\mu$m), energetic b-hadrons have a significant mean flight length, $\langle l \rangle = \beta\gamma c\tau$, in the detector before decaying, generally leading to at least one vertex displaced by a few mm from the hard-scatter collision point.
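As a rough numerical illustration of the flight-length relation $\langle l \rangle = \beta\gamma c\tau$, the sketch below evaluates it for a few b-hadron momenta using $\beta\gamma = p/m$. The hadron mass (5.28 GeV, roughly that of a B meson) and the chosen momenta are illustrative assumptions and are not taken from this paper; only the value $\langle c\tau \rangle \approx 450$ $\mu$m comes from the text.

```python
C_TAU_UM = 450.0           # <c tau> in micrometres, as quoted in the text
B_HADRON_MASS_GEV = 5.28   # assumption: approximate B-meson mass


def mean_flight_length_mm(p_gev, mass_gev=B_HADRON_MASS_GEV, c_tau_um=C_TAU_UM):
    """Mean flight length <l> = beta*gamma*c*tau, with beta*gamma = p/m."""
    beta_gamma = p_gev / mass_gev
    return beta_gamma * c_tau_um / 1000.0  # convert micrometres to millimetres


if __name__ == "__main__":
    # Illustrative momenta only; they are not values used in the paper.
    for p in (20.0, 50.0, 100.0):
        print(f"p = {p:5.1f} GeV -> <l> = {mean_flight_length_mm(p):.2f} mm")
```

With these assumptions, momenta of a few tens of GeV give flight lengths of a few mm, consistent with the displaced-vertex scale mentioned above.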
The strategy developed by the ATLAS Collaboration is based on a two-stage approach. Low-level algorithms reconstruct the characteristic features of the heavy-flavour jets via two complementary approaches: one that uses the properties of individual charged-particle tracks (referred to as 'tracks') associated with a hadronic jet, and a second which combines the tracks to explicitly reconstruct displaced vertices. Then, in order to maximise performance, the results of low-level algorithms are combined in high-level algorithms consisting of multivariate classifiers. The analysis of the data from Run 2 of the LHC is marked by improvements and retuning of the low-level algorithms [6], first introduced during Run 1, but also by the introduction of new low- and high-level algorithms based on recurrent and deep neural networks, respectively. This yields considerable improvements over previous work, which was based on boosted decision trees or likelihood discriminants.
This paper is organised as follows. Section 2 introduces the ATLAS detector. The simulated Monte Carlo events used in this work are described in Section 3. Section 4 contains the description of the objects reconstructed in the detector which are key inputs to flavour-tagging algorithms, while Sections 5 and 6 describe the low- and high-level tagging algorithms, respectively. Finally, their performance, evaluated on simulated event samples, is presented in Section 7.

The ATLAS detector
The ATLAS detector [1] at the LHC covers nearly the entire solid angle around the collision point. It consists of an inner tracking detector (ID) surrounded by a superconducting solenoid, electromagnetic and hadronic calorimeters and a muon spectrometer incorporating three large superconducting toroid magnets.
The ID consists of a high-granularity silicon pixel detector which covers the vertex region and typically provides four measurements per track. The innermost layer, known as the insertable B-layer (IBL) [7], was added in 2014 and provides high-resolution hits at small radius to improve the tracking performance. For a fixed b-jet efficiency, the incorporation of the IBL improves the light-flavour jet rejection of the b-tagging algorithms by up to a factor of four [8]. The silicon pixel detector is followed by a silicon microstrip tracker that typically provides eight measurements from four strip double layers. These silicon detectors are complemented by a transition radiation tracker (TRT), which enables radially extended track reconstruction up to a pseudorapidity$^{1}$ of $|\eta| = 2.0$. The TRT also provides electron identification information based on the fraction of hits (typically 33 in the barrel and up to an average of 38 in the endcaps) above a higher energy-deposit threshold corresponding to transition radiation. The ID is immersed in a 2 T axial magnetic field and provides charged-particle tracking in the pseudorapidity range $|\eta| < 2.5$.
The calorimeter system covers the pseudorapidity range $|\eta| < 4.9$. Within the region $|\eta| < 3.2$, electromagnetic calorimetry is provided by barrel and endcap high-granularity lead/liquid-argon (LAr) sampling calorimeters, with an additional thin LAr presampler covering $|\eta| < 1.8$ to correct for energy loss in material upstream of the calorimeters. Hadronic calorimetry is provided by a steel/scintillator-tile calorimeter, segmented into three barrel structures within $|\eta| < 1.7$, and two copper/LAr hadronic endcap calorimeters. The solid angle coverage is completed with forward copper/LAr and tungsten/LAr calorimeter modules optimised for electromagnetic and hadronic measurements, respectively.
The muon spectrometer comprises separate trigger and high-precision tracking chambers measuring the deflection of muons in a magnetic field generated by the superconducting air-core toroids. The precision chamber system covers the region $|\eta| < 2.7$ with three layers of monitored drift tubes, complemented by cathode-strip chambers in the forward region. The muon trigger system covers the range $|\eta| < 2.4$ with resistive-plate chambers in the barrel and thin-gap chambers in the endcap regions.
A two-level trigger system [9] is used to select interesting events. The first level of the trigger is implemented in hardware and uses a subset of detector information to reduce the event rate to a design value of at most 100 kHz. It is followed by a software-based trigger that reduces the event rate to a maximum of around 1 kHz for offline storage.
An extensive software suite [10] is used in data simulation, in the reconstruction and analysis of real and simulated data, in detector operations, and in the trigger and data acquisition systems of the experiment.

Monte Carlo Samples
The optimisation of the ATLAS Run 2 b-tagging algorithms is performed with jets from a 'hybrid' sample composed of a mixture of simulated SM $t\bar{t}$ and high-mass $Z' \to q\bar{q}$ events. The $Z'$ events do not correspond to a single resonance but have a broad $Z'$ mass spectrum in order to optimise the b-tagging performance at high jet momentum transverse to the beam-line ($p_{\mathrm{T}}$). The final hybrid sample for training is obtained by mixing all b-jets from the available $t\bar{t}$ events, if the corresponding b-hadron $p_{\mathrm{T}}$ is below 250 GeV, with all jets containing a b-hadron with $p_{\mathrm{T}} > 250$ GeV from the $Z'$ sample. A similar strategy, based on the jet $p_{\mathrm{T}}$, is applied for c-jets and light-flavour jets. No attempt is made to distinguish quark-initiated jets from gluon-initiated jets in these samples.
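The hybrid-sample composition described above can be sketched as a simple per-jet selection. This is a minimal illustration, not the ATLAS implementation: the `Jet` record, its field names and the sample labels `"ttbar"` and `"zprime"` are hypothetical, and only the 250 GeV switching point and the choice of reference $p_{\mathrm{T}}$ (b-hadron $p_{\mathrm{T}}$ for b-jets, jet $p_{\mathrm{T}}$ otherwise) come from the text.

```python
from dataclasses import dataclass
from typing import Optional

PT_SWITCH_GEV = 250.0  # switching point quoted in the text


@dataclass
class Jet:
    sample: str                        # hypothetical labels: "ttbar" or "zprime"
    flavour: str                       # "b", "c" or "light"
    jet_pt: float                      # jet pT in GeV
    hadron_pt: Optional[float] = None  # b-hadron pT in GeV (b-jets only)


def reference_pt(jet: Jet) -> float:
    """b-jets are split on the b-hadron pT, other flavours on the jet pT."""
    if jet.flavour == "b" and jet.hadron_pt is not None:
        return jet.hadron_pt
    return jet.jet_pt


def in_hybrid_sample(jet: Jet) -> bool:
    """Keep ttbar jets below the switching point and Z' jets above it."""
    pt = reference_pt(jet)
    if jet.sample == "ttbar":
        return pt < PT_SWITCH_GEV
    if jet.sample == "zprime":
        return pt > PT_SWITCH_GEV
    return False


if __name__ == "__main__":
    jets = [
        Jet("ttbar", "b", jet_pt=120.0, hadron_pt=90.0),
        Jet("ttbar", "b", jet_pt=400.0, hadron_pt=310.0),
        Jet("zprime", "c", jet_pt=800.0),
        Jet("zprime", "light", jet_pt=100.0),
    ]
    print([in_hybrid_sample(j) for j in jets])
```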
The $t\bar{t}$ simulation sample was produced using Powheg Box v2 [11-14], which yields matrix elements at next-to-leading order (NLO) in the strong coupling constant $\alpha_{\mathrm{s}}$ for top-quark pair production. The first-gluon-emission cut-off scale parameter $h_{\mathrm{damp}}$ was set to $1.5\,m_t$, with $m_t = 172.5$ GeV used for the top-quark mass. Powheg Box was interfaced to Pythia 8.230 [15] with the A14 set of tuned parameters [16] and NNPDF3.0nnlo (NNPDF2.3lo) [17,18] parton distribution functions in the matrix elements (parton shower). This set-up was found to produce the best modelling, out of a number of available generator configurations, of the multiplicity of additional jets and of both the individual top-quark and $t\bar{t}$ system $p_{\mathrm{T}}$ [19, 20]. Only $t\bar{t}$ events with at least one leptonic W-boson decay are considered, which ensures that a sufficiently large fraction of b-jets, c-jets, and light-flavour jets are present in the jet population.
To train and to evaluate the performance of the b-tagging algorithms at high jet $p_{\mathrm{T}}$, a $Z'$ sample was generated using Pythia 8.212 with the A14 set of tuned parameters for the underlying event and the leading-order NNPDF2.3lo [18] parton distribution functions. The cross-section of the hard-scattering process was modified by applying an event-by-event weighting factor to broaden the natural width of the resonance. The branching fractions of the decays were set to be one-third each for the $b\bar{b}$, $c\bar{c}$ and light-flavour quark pairs to obtain a sample uniformly populated by jets of each flavour. This results in a fairly flat jet $p_{\mathrm{T}}$ spectrum between 250 GeV and 1.5 TeV for b-jets, c-jets, and light-flavour jets, with the falling tail of the $p_{\mathrm{T}}$ distribution populated to 3 TeV for each flavour.
The EvtGen [21] package was used to simulate the decay of heavy-flavour hadrons. All simulated events have additional overlaid minimum-bias interactions generated with Pythia 8.160 with the A3 set of tuned parameters [22] and NNPDF2.3lo parton distribution functions to simulate pile-up background.$^{2}$ These events are weighted to reproduce the observed distribution of the average number of interactions per bunch crossing in the corresponding data sample. The simulated events were processed through the full ATLAS detector simulation [23] based on Geant4 [24]. Interactions of b-hadrons with the detector material were not simulated; c-hadrons and $\tau$-leptons are similarly affected. This may result in sub-optimal identification algorithm performance in data for jets with very energetic particles that decay beyond the IBL. ATLAS analyses apply dedicated correction factors and corresponding uncertainties to account for the difference in performance between data and simulation.
$^{1}$ ATLAS uses a right-handed coordinate system with its origin at the nominal interaction point (IP) in the centre of the detector and the $z$-axis along the beam pipe. The $x$-axis points from the IP to the centre of the LHC ring, and the $y$-axis points upwards. Cylindrical coordinates $(r, \phi)$ are used in the transverse plane, $\phi$ being the azimuthal angle around the $z$-axis. The pseudorapidity is defined in terms of the polar angle $\theta$ as $\eta = -\ln \tan(\theta/2)$. Angular distance is measured in units of $\Delta R \equiv \sqrt{(\Delta\eta)^2 + (\Delta\phi)^2}$.
while the transverse resolution ranges from 10 to 12 $\mu$m, depending on the LHC running conditions that determine the beam-spot size. Flavour tagging requires at least one PV in each event, and the PV with the highest sum of squared transverse momenta of contributing tracks is selected as the primary interaction point or 'hard-scatter vertex'; all tracks contributing to a vertex are used in the hard-scatter vertex determination, and they need not be associated to a hadronic jet. Charged-particle tracks originating from b-hadron decays often have large transverse and longitudinal impact parameters, $d_0$ and $z_0$ respectively, where $d_0$ is the distance of closest approach of the track to the PV in the transverse plane, and $z_0$ is the longitudinal separation between the PV and the point on the track where $d_0$ is measured, referred to below as the 'perigee'.
Hadronic jets are built from 'particle-flow objects', which are constructed from signals in the ATLAS calorimeters and ID; particle-flow objects take advantage of the better resolution of particle tracking when low-$p_{\mathrm{T}}$ charged hadrons result in geometrically-matched calorimeter deposits and a charged-particle track [29]. The anti-$k_t$ algorithm with radius parameter $R = 0.4$ [30], implemented in FastJet [31], is used for jet finding. Jet transverse momenta are further corrected to the corresponding particle-level jet $p_{\mathrm{T}}$, based on the simulation [32]. Remaining differences between simulated events and observed data are evaluated using in situ techniques, which exploit the transverse momentum balance between a jet and a reference object such as a photon, Z boson, or multi-jet system in data; corrections are applied to simulated jets to bring them in line with data [32]. Jets with $p_{\mathrm{T}} < 20$ GeV or $|\eta| \geq 2.5$ are not considered for jet flavour identification, as low-$p_{\mathrm{T}}$ jets are outside the valid calibration range and high-$|\eta|$ jets are outside the tracking fiducial volume. In order to reduce the number of jets with large energy fractions from pile-up collision vertices, the 'jet vertex tagger' (JVT) algorithm is used [33]. The JVT procedure builds a multivariate discriminant for each jet within $|\eta| < 2.4$ based on the ID tracks ghost-associated$^{3}$ with the jet [34]; in particular, jets with a large fraction of high-momentum tracks from pile-up vertices are less likely to pass the JVT requirement. The JVT efficiency for jets originating from the hard scattering is 92% in the simulation, but the rate of pile-up jets with $p_{\mathrm{T}} \geq 60$ GeV is sufficiently small that the JVT requirement is removed above this threshold. The reconstructed jet direction and transverse momentum are especially important inputs to flavour-tagging, as they determine which charged-particle tracks should be considered for jet flavour identification.
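The jet selection described above can be summarised in a few lines. The sketch below is illustrative only: the function name and the boolean `passes_jvt` input are hypothetical, and the JVT discriminant and its working point are not modelled. The numerical thresholds are those quoted in the text.

```python
def is_considered_for_tagging(pt_gev, eta, passes_jvt):
    """Apply the kinematic and pile-up requirements quoted in the text.

    `passes_jvt` is a hypothetical precomputed flag; the JVT discriminant
    itself is not modelled here.
    """
    # Jets outside the calibration range or tracking acceptance are dropped.
    if pt_gev < 20.0 or abs(eta) >= 2.5:
        return False
    # The JVT requirement is only applied within |eta| < 2.4 and below 60 GeV.
    jvt_applies = abs(eta) < 2.4 and pt_gev < 60.0
    return passes_jvt if jvt_applies else True


if __name__ == "__main__":
    print(is_considered_for_tagging(35.0, 1.2, passes_jvt=True))
    print(is_considered_for_tagging(35.0, 1.2, passes_jvt=False))
    print(is_considered_for_tagging(80.0, 1.2, passes_jvt=False))
```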
Tracks are matched to jets by setting a maximum allowed angular separation $\Delta R$ between the track momenta, defined at the perigee, and the jet axis. Given that the decay products from higher-$p_{\mathrm{T}}$ b-hadrons are more collimated, the $\Delta R$ requirement varies as a function of jet $p_{\mathrm{T}}$, being wider for low-$p_{\mathrm{T}}$ jets (0.45 for jet $p_{\mathrm{T}} = 20$ GeV) and narrower for high-$p_{\mathrm{T}}$ jets (0.26 for jet $p_{\mathrm{T}} = 150$ GeV); if more than one jet fulfils the matching criteria, the closest jet is preferred. The jet axis is also used to assign signed impact parameters to tracks, where the sign is defined to be positive if the track intersects the jet axis in the transverse plane in front of the primary vertex, and negative if the intersection lies behind the primary vertex [5].
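A minimal sketch of the track-to-jet matching is given below. The $\Delta R$ definition and the closest-jet rule follow the text; the cone-size function `toy_cone` is a hypothetical placeholder that merely interpolates linearly between the two values quoted above (0.45 at 20 GeV, 0.26 at 150 GeV) and is not the parametrisation used by ATLAS. The tuple layouts for tracks and jets are likewise assumptions.

```python
import math


def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance, with the azimuthal difference wrapped to [-pi, pi)."""
    dphi = (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(eta1 - eta2, dphi)


def toy_cone(jet_pt_gev):
    """Placeholder cone size: linear between the two quoted points, clamped."""
    lo_pt, lo_dr, hi_pt, hi_dr = 20.0, 0.45, 150.0, 0.26
    frac = min(max((jet_pt_gev - lo_pt) / (hi_pt - lo_pt), 0.0), 1.0)
    return lo_dr + frac * (hi_dr - lo_dr)


def match_track_to_jet(track, jets, max_delta_r=toy_cone):
    """Return the index of the closest jet satisfying the cone cut, or None.

    track: (eta, phi); jets: list of (eta, phi, pt_gev).
    """
    best_index, best_dr = None, math.inf
    for index, (eta, phi, pt) in enumerate(jets):
        dr = delta_r(track[0], track[1], eta, phi)
        if dr <= max_delta_r(pt) and dr < best_dr:
            best_index, best_dr = index, dr
    return best_index


if __name__ == "__main__":
    jets = [(0.0, 0.0, 20.0), (0.3, 0.0, 150.0)]
    print(match_track_to_jet((0.2, 0.0), jets))  # both cones match; closest wins
    print(match_track_to_jet((1.0, 0.0), jets))  # no jet within its cone
```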
The flavour of a jet in simulation is determined by the nature of the hadrons it contains. Jets are labelled as b-jets if at least one weakly decaying b-hadron having $p_{\mathrm{T}} \geq 5$ GeV is found within a cone of size $\Delta R = 0.3$ around the jet axis. If no b-hadrons are found, c-hadrons and then $\tau$-leptons are searched for, based on the same selection criteria. The jets matched to a c-hadron ($\tau$-lepton) are labelled as c-jets ($\tau$-jets). The remaining jets are labelled as light-flavour jets. For jets with more than one heavy-flavour hadron, e.g. from gluon splitting to $b\bar{b}$ or $c\bar{c}$, the procedure above is still followed, and $b\bar{b}$ ($c\bar{c}$) jets will receive a b (c) label.
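The labelling hierarchy above (b, then c, then $\tau$, otherwise light) can be sketched as follows. The input format is an assumption: each truth particle is given as a hypothetical tuple `(kind, pt_gev, delta_r_to_jet)`, and the list is assumed to contain only weakly decaying hadrons and $\tau$-leptons.

```python
LABEL_PRIORITY = ("b", "c", "tau")  # search order described in the text


def label_jet(truth_particles, cone=0.3, min_pt_gev=5.0):
    """Return the jet label from the truth particles near the jet axis.

    truth_particles: iterable of (kind, pt_gev, delta_r_to_jet) tuples
    (a hypothetical format), with kind in {"b", "c", "tau"}.
    """
    matched = {
        kind
        for kind, pt, dr in truth_particles
        if pt >= min_pt_gev and dr < cone
    }
    for kind in LABEL_PRIORITY:
        if kind in matched:
            return kind
    return "light"


if __name__ == "__main__":
    # A jet containing both a b- and a c-hadron receives the b label.
    print(label_jet([("c", 10.0, 0.1), ("b", 6.0, 0.2)]))
    # A b-hadron below the pT threshold is ignored.
    print(label_jet([("b", 4.0, 0.1), ("c", 8.0, 0.1)]))
```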

Low-level b-taggers
This section describes the different low-level algorithms used for b-jet identification in the ATLAS Run 2 dataset. These algorithms, designed to reconstruct the characteristic features of b-jets, fall into two broad categories. The first approach, implemented in the IP2D and IP3D algorithms [35] and described in Section 5.1, is inclusive and based on exploiting the large impact parameters of the tracks originating from the b-hadron decays. The new RNNIP [36] algorithm, developed during Run 2 and described in Section 5.2, exploits a recurrent neural network [37] to learn track impact-parameter correlations in order to further improve the jet flavour discrimination. The second approach explicitly reconstructs displaced vertices. The SV1 algorithm [38], discussed in Section 5.3, attempts to reconstruct an inclusive secondary vertex, while the JetFitter algorithm [39], presented in Section 5.4, aims to reconstruct the full b- to c-hadron decay chain.

Algorithms based on impact parameters
Two impact-parameter-based (IP-based) algorithms, IP2D and IP3D [35], are used by ATLAS. The IP2D tagger makes use of the signed transverse impact parameter significance of tracks to construct a discriminating variable, whereas IP3D uses both the signed transverse and signed longitudinal impact parameter significances$^{4}$ in a two-dimensional template to account for their correlation. The signed transverse and longitudinal impact parameter significances are shown in Figure 1. Probability density functions (pdfs) obtained from reference histograms of the signed transverse and longitudinal impact parameter significances of tracks associated with b-jets, c-jets and light-flavour jets are derived from MC simulation. The pdfs are computed in exclusive categories that depend on the hit multiplicities of the tracks in the different ID layers to increase the discriminating power. In particular, the IBL hit pattern expectations and the second-innermost layer's information are fully exploited to improve the b-tagging performance in LHC Run 2. A set of template pdfs is produced using an equal mix of simulated $t\bar{t}$ and $Z'$ events for track categories with no hits in the first two layers, which are populated by long-lived b-hadrons traversing the first layers before they decay. The $t\bar{t}$ sample is used to populate all remaining categories. The pdfs are used to calculate ratios of the b-jet, c-jet and light-flavour jet probabilities on a per-track basis. Log-likelihood ratio (LLR) discriminants are then defined as the sum of the logarithms of the per-track probability ratios for each jet-flavour hypothesis, e.g. $\sum_{i=1}^{N} \ln\left(p_b(S_i)/p_{\mathrm{light}}(S_i)\right)$ for the b-jet and light-flavour jet hypotheses, where $N$ is the number of tracks associated with the jet and $p_b(S_i)$ ($p_{\mathrm{light}}(S_i)$) is the template pdf for the b-jet (light-flavour jet) hypothesis, with $S_i$ the impact parameter significance of track $i$. The flavour probabilities of the different tracks contributing to the sum are assumed to be independent of each other.
The log-likelihood ratios separating b-jets from light-flavour jets for the IP2D and IP3D b-tagging algorithms are shown in Figure 2. In addition to the LLR separating b-jets from light-flavour jets, two extra LLR discriminants are defined to separate b-jets from c-jets and c-jets from light-flavour jets, respectively. These three likelihood discriminants for both the IP2D and IP3D algorithms are used as inputs to the high-level taggers.
Figure 2: The log-likelihood ratio for the (a) IP2D and (b) IP3D b-tagging algorithms for b-jets, c-jets and light-flavour jets in $t\bar{t}$ events. The log-likelihood ratio shown here is computed as the ratio of the b-jet to light-flavour jet hypothesis probabilities. Jets with no tracks are not included in the plot, but assigned a large negative value. The first (last) bin in the distribution does not account for underflow (overflow).
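To make the per-track log-likelihood-ratio construction concrete, the sketch below sums $\ln(p_b/p_{\mathrm{light}})$ over the tracks of a jet using one-dimensional histogram templates, in the spirit of IP2D. The bin edges and template contents are invented toy numbers, the track categories in hit multiplicity are omitted, and jets without tracks (which receive a special value in the paper) are simply rejected here.

```python
import bisect
import math


def normalise(counts):
    """Turn histogram contents into per-bin probabilities."""
    total = float(sum(counts))
    return [c / total for c in counts]


def template_value(edges, probs, significance):
    """Look up the template bin for a significance, clamping to the end bins."""
    index = bisect.bisect_right(edges, significance) - 1
    index = min(max(index, 0), len(probs) - 1)
    return probs[index]


def log_likelihood_ratio(significances, edges, probs_num, probs_den):
    """Sum over tracks of ln(p_num(S_i) / p_den(S_i)); tracks are assumed independent."""
    if not significances:
        raise ValueError("jets without tracks are handled separately")
    return sum(
        math.log(template_value(edges, probs_num, s) / template_value(edges, probs_den, s))
        for s in significances
    )


# Toy templates (invented numbers, for illustration only).
EDGES = [-10.0, 0.0, 2.0, 5.0, 40.0]
P_B = normalise([20, 30, 30, 20])
P_LIGHT = normalise([45, 45, 8, 2])

if __name__ == "__main__":
    print(log_likelihood_ratio([3.0, 7.5, 0.5], EDGES, P_B, P_LIGHT))
    print(log_likelihood_ratio([-1.0, 0.3], EDGES, P_B, P_LIGHT))
```

Tracks with large positive significance push the sum towards positive values, while tracks compatible with the primary vertex pull it negative.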

Track-based recurrent neural network tagger
In the case of a b-hadron decay, several charged particles can emerge from the secondary (or tertiary) decay vertex with large impact parameters. The impact parameters of these hadron-decay tracks are intrinsically correlated: if one track is found with a large impact parameter then finding a second track with a large impact parameter is more likely. If no displaced decay is present, as in light-flavour jets, then such a correlation for large impact parameter significance is not expected. The baseline IP-based b-tagging algorithms, described in Section 5.1, use likelihood templates to compute per-flavour conditional likelihoods. Because a large event sample is needed to compute such templates, it is difficult to exploit more input variables as additional histogram axes; IP-based b-tagging algorithms therefore assume that the properties of each track in a jet are independent of all other tracks, which limits their ability to fully model the properties of b-jets. Recurrent neural networks (RNN) can be used to overcome this challenge by directly learning sequential dependencies between the variables in sequences of arbitrary length, as is the case for the tracks in a jet. For each selected track, the lifetime-correlated signed transverse ($S_{d_0}$) and longitudinal ($S_{z_0}$) impact parameter significances, the fraction of transverse momentum carried by the track relative to the jet $p_{\mathrm{T}}$ ($p_{\mathrm{T}}^{\mathrm{frac}}$), the angular distance between the track and the jet axis ($\Delta R$(track, jet-axis)) and the hit multiplicity of the track in the different ID layers are fed into the network [36]. The tracks are ordered by their $|S_{d_0}|$ values to form a sequence emphasising the particular importance of this kinematic feature. This sequence is then passed to the neural network cells as a vector of the ordered-track features.
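The construction of the ordered track sequence can be sketched as below. The dictionary keys are hypothetical names, the hit-multiplicity inputs are omitted for brevity, and ordering by decreasing $|S_{d_0}|$ is an assumption, since the text states the ordering variable but not its direction.

```python
def build_track_sequence(tracks, jet_pt_gev):
    """Return per-track feature vectors ordered by |S_d0|.

    tracks: list of dicts with hypothetical keys "s_d0", "s_z0",
    "pt" (GeV) and "dr" (angular distance to the jet axis).
    Assumption: most displaced track first (descending |S_d0|).
    """
    ordered = sorted(tracks, key=lambda t: abs(t["s_d0"]), reverse=True)
    return [
        [t["s_d0"], t["s_z0"], t["pt"] / jet_pt_gev, t["dr"]]
        for t in ordered
    ]


if __name__ == "__main__":
    tracks = [
        {"s_d0": 1.0, "s_z0": 0.2, "pt": 5.0, "dr": 0.10},
        {"s_d0": -5.0, "s_z0": 1.5, "pt": 10.0, "dr": 0.05},
        {"s_d0": 3.0, "s_z0": -0.7, "pt": 2.5, "dr": 0.20},
    ]
    for features in build_track_sequence(tracks, jet_pt_gev=50.0):
        print(features)
```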
During the training, the b-jet and c-jet $p_{\mathrm{T}}$ spectra are separately reweighted to the light-flavour jet spectrum so as to prevent the RNN from learning to discriminate directly from sample- and flavour-specific momentum distributions. The outputs provided by the network correspond to the b-jet, c-jet, and light-flavour jet probabilities. Figure 3 shows a schematic of the network architecture used for this algorithm, known as RNNIP.
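One simple way to realise such a spectrum reweighting is with binned ratio weights, as sketched below. This is an assumed procedure for illustration; the text does not specify how the reweighting is implemented, and the bin edges in the example are arbitrary.

```python
import bisect


def bin_index(value, edges):
    """Index of the bin containing value, or None if outside the range."""
    index = bisect.bisect_right(edges, value) - 1
    return index if 0 <= index < len(edges) - 1 else None


def histogram(values, edges):
    counts = [0] * (len(edges) - 1)
    for value in values:
        index = bin_index(value, edges)
        if index is not None:
            counts[index] += 1
    return counts


def pt_weights(flavour_pts, light_pts, edges):
    """Per-jet weights reshaping a flavour's pT spectrum to the light-jet one."""
    n_flav, n_light = histogram(flavour_pts, edges), histogram(light_pts, edges)
    tot_flav, tot_light = sum(n_flav), sum(n_light)
    if tot_light == 0:
        raise ValueError("light-flavour reference spectrum is empty")
    weights = []
    for pt in flavour_pts:
        index = bin_index(pt, edges)
        if index is None or n_flav[index] == 0:
            weights.append(0.0)  # jets outside the binned range get no weight
        else:
            weights.append((n_light[index] / tot_light) / (n_flav[index] / tot_flav))
    return weights


if __name__ == "__main__":
    edges = [20.0, 50.0, 100.0, 250.0]  # arbitrary example binning in GeV
    print(pt_weights([30.0, 40.0, 60.0, 150.0], [25.0, 30.0, 35.0, 70.0], edges))
```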
The outputs of the RNN are combined into the b-tagging discriminant function, defined as:
$$D_{\mathrm{RNNIP}} = \ln\left(\frac{p_b}{f_c \cdot p_c + (1 - f_c) \cdot p_{\mathrm{light}}}\right)$$
where $p_b$, $p_c$ and $p_{\mathrm{light}}$ are the three jet-flavour probabilities, and $f_c$ denotes the c-jet fraction. The relative importance of c-jet and light-flavour-jet rejection can be changed by varying $f_c$. In this paper an optimised c-jet fraction of 0.07 is used in order to evaluate the performance of the RNNIP b-tagging algorithm in $t\bar{t}$ and $Z' \to q\bar{q}$ events. This value is chosen as a compromise to ensure good rejection factors for both c-jets and light-flavour jets in a large b-tagging efficiency range. The distribution of the RNNIP b-tagging discriminant for b-jets, c-jets and light-flavour jets in the baseline $t\bar{t}$ simulated events is shown in Figure 4.

Secondary-vertex-tagging algorithm
The secondary-vertex-tagging algorithm, SV1 [38], reconstructs a single displaced secondary vertex in a jet. The reconstruction starts by identifying the possible two-track vertices built with all tracks associated with the jet, while rejecting any tracks making two-track vertices compatible with $K^0_{\mathrm{S}}$ or $\Lambda$ decays, photon conversions or hadronic interactions with the detector material. The SV1 algorithm runs iteratively on all tracks contributing to the selected two-track vertices, trying to fit one secondary vertex. In each iteration, the track-to-vertex matching is evaluated using a $\chi^2$ test. The track with the largest $\chi^2$ is removed and the vertex fit is repeated until an acceptable vertex $\chi^2$ and a vertex invariant mass less than 6 GeV are obtained. With this approach, the decay products from b- and c-hadrons are typically assigned to a single common secondary vertex. The SV1 algorithm also benefits from several improvements [38] introduced during Run 2, resulting in increased pile-up rejection and an overall enhancement of the performance at high jet $p_{\mathrm{T}}$. Among the various algorithm improvements, additional track-cleaning requirements based on silicon detector hit multiplicity are applied to jets in the high-pseudorapidity region ($|\eta| \geq 1.5$) in order to improve the quality of the selected tracks. The fake-vertex rate is also better controlled by limiting the algorithm to only consider the 25 highest-$p_{\mathrm{T}}$ tracks in the jets. Finally, eight discriminating variables, including the jet $p_{\mathrm{T}}$ and $\eta$, the number of tracks associated with the SV1 vertex, the invariant mass of the secondary vertex, its energy fraction (defined as the total energy of all the tracks associated with the secondary vertex divided by the energy of all the tracks associated with the jet), and the three-dimensional decay length significance, are used as inputs to the high-level taggers. Six of these variables are illustrated in Figure 5.
In order to quantify the flavour-tagging performance of the SV1 algorithm, a simple feed-forward neural network was trained by exclusively using the outputs of the algorithm and the $p_{\mathrm{T}}$ and $\eta$ of the input jets. The network was trained in the same way as the main high-level tagger and on identical training samples, as explained in Section 6. This new low-level tagger, defined only to illustrate the performance of SV1 relative to other algorithms described in this paper, is referred to as SVKine.

Topological multi-vertex finding algorithm
The topological multi-vertex finding algorithm, JetFitter [39], exploits the topological structure of weak b- and c-hadron decays inside the jet and tries to reconstruct the full b-hadron decay chain. A modified Kalman filter [40] is used to find a common line on which the primary, b- and c-hadron decay vertices lie, approximating the b-hadron flight path as well as the vertex positions. With this approach, it is possible to resolve the b- or c-hadron decay vertex even if only a single track is attached to it. The JetFitter algorithm also benefits from several improvements [39] introduced during Run 2. These include a reoptimisation of the track selection to better mitigate the effect of pile-up tracks, an improvement in the rejection of interactions with detector material, and the introduction of a vertex-mass-dependent selection during the decay chain fit to increase the efficiency for tertiary vertex reconstruction. Eight discriminating variables, including the track multiplicity at the JetFitter displaced vertices, the invariant mass of tracks associated with these vertices, their energy fraction and their average three-dimensional decay length significance, are used as inputs to the high-level taggers. Six of these variables are illustrated in Figure 6. Finally, the discrimination of c-jets from b-jets and light-flavour jets is further improved by more specifically exploiting jets for which only a single secondary vertex is reconstructed with intermediate charged decay multiplicity and comparable decay distance to b-hadrons in jets. A set of nine additional variables [35], among which are the number, the invariant mass and the energy of the tracks associated with the secondary vertex as well as their rapidity, computed with respect to the jet axis and the vector defined between the primary and secondary vertices, are used as inputs to the high-level tagging algorithms.
Similarly to the SV1 algorithm, the flavour-tagging performance of JetFitter is assessed from a simple feed-forward neural network trained by exclusively using the outputs of JetFitter and the input jets' $p_{\mathrm{T}}$ and $\eta$. This training was performed in the same way as for the main high-level tagger and on identical training samples, as explained in Section 6. This new low-level tagger, defined only to illustrate the performance of JetFitter relative to other algorithms described in this paper, is referred to as JFKine.
Figure 5: Properties of secondary vertices reconstructed by the SV1 algorithm for b-jets, c-jets and light-flavour jets in the baseline $t\bar{t}$ simulated events: (a) the number of two-track vertices reconstructed within the jet, (b) the transverse decay length, (c) the 3D decay length significance defined as the significance of the distance between the primary vertex and displaced vertex, (d) the energy fraction, defined as the energy of the tracks in the displaced vertex relative to the energy of all tracks reconstructed within the jet, (e) the invariant mass and (f) the number of tracks associated with the vertex. The increased rate of light-flavour jets at high transverse decay length values is due to residual interactions with detector material. The jumps in the frequency of reconstructed two-track vertices (a) originate from combinatorics: a set of $N$ tracks yields $N(N-1)/2$ possible track pairs, and this number is reduced by the track selection criteria, resulting in low-side tails on each spike. The first (last) bin in the distribution does not account for underflow (overflow).

High-level flavour-taggers, the DL1 series
To maximise the flavour-tagging performance for Run 2, the output quantities of the low-level algorithms are combined using deep-learning classifiers, based on fully connected multi-layer feed-forward neural networks (NN) [41], forming the so-called DL1 algorithm series.
These algorithms are trained with a hybrid training sample, for which 70% of the jets in the sample are from $t\bar{t}$ events and the remaining 30% are from $Z' \to q\bar{q}$ events, using TensorFlow [42] with the Keras [42] front-end and the Adam optimiser [43]. The DL1 algorithm, introduced at the beginning of Run 2 in Ref. [6], exploits as input the IP2D, IP3D, SV1 and JetFitter algorithm outputs, while the DL1r algorithm also includes the jet RNNIP output probabilities.
The level of correlation between the different low-level algorithm outputs varies as a function of the jet flavour and the kinematic range. In general, large correlations between the IP2D, IP3D, SV1 and JetFitter algorithms are observed for heavy-flavour jets. However, these correlations are significantly reduced in the case of light-flavour jets. In addition, such correlations are further reduced in high-$p_{\mathrm{T}}$ regimes. On the other hand, the RNNIP algorithm contributes a set of input variables which are not strongly correlated. The DL1r algorithm training exploits these correlation differences to reach the best tagging performance.
In addition, the kinematic properties of the jets, namely $p_{\mathrm{T}}$ and $|\eta|$, are included in the training in order to take advantage of the correlations with the other input variables. However, to avoid differences between the kinematic distributions of signal and background jets being used to discriminate between the different jet flavours, the input training dataset is resampled. The resampling procedure ensures that jets in the training sample are uniformly distributed in jet $p_{\mathrm{T}}$ and $\eta$ for each flavour class. No kinematic resampling is applied at the evaluation stage of the algorithms. Table 1 presents a detailed list of input variables used by each algorithm.
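A kinematic resampling of this kind could, for example, be realised by downsampling every populated ($p_{\mathrm{T}}$, $|\eta|$) bin of a given flavour to a common size. The sketch below shows this assumed approach; the actual resampling procedure, binning and random-number handling used for DL1r are not specified in the text, and the function would be applied separately to each flavour class.

```python
import bisect
import random
from collections import defaultdict


def bin_index(value, edges):
    """Index of the bin containing value, or None if outside the range."""
    index = bisect.bisect_right(edges, value) - 1
    return index if 0 <= index < len(edges) - 1 else None


def resample_uniform(jets, pt_edges, eta_edges, seed=0):
    """Downsample so that every populated (pT, |eta|) bin holds equally many jets.

    jets: list of (pt_gev, abs_eta) tuples for a single flavour class.
    Assumption: uniformity is obtained by downsampling to the smallest bin.
    """
    bins = defaultdict(list)
    for jet in jets:
        key = (bin_index(jet[0], pt_edges), bin_index(jet[1], eta_edges))
        if None not in key:
            bins[key].append(jet)
    if not bins:
        return []
    target = min(len(content) for content in bins.values())
    rng = random.Random(seed)
    selected = []
    for key in sorted(bins):
        selected.extend(rng.sample(bins[key], target))
    return selected


if __name__ == "__main__":
    jets = [(30.0, 0.5)] * 5 + [(120.0, 1.8)] * 2
    print(len(resample_uniform(jets, [20.0, 100.0, 250.0], [0.0, 1.0, 2.5])))
```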
Table 1 (excerpt, IP2D/IP3D inputs): likelihood ratio of the b-jet to light-flavour jet hypotheses; likelihood ratio of the b-jet to c-jet hypotheses; likelihood ratio of the c-jet to light-flavour jet hypotheses.
The DL1r NN has a multidimensional output corresponding to the probabilities for a jet to be a b-jet, a c-jet or a light-flavour jet. The use of a multi-class network architecture provides the algorithm with a smaller memory footprint than the previous ATLAS MV2c10 algorithm [6] based on boosted decision trees (BDTs). The topology of the network consists of a mixture of fully connected hidden layers. The DL1r algorithm parameters, listed in Table 2, include the architecture of the NN, the number of training epochs, the learning rates and training batch size. Each of these parameters is optimised in order to maximise the b-tagging performance. Batch normalisation [44] is added by default since it is found to improve the performance.
Training with multiple output nodes offers additional flexibility when constructing the final output discriminant by combining the -jet, -jet and light-flavour jet probabilities. Since all flavours are treated equally during training, the trained network can be used for both -jet and -jet tagging. The final DL1r where , and light represent respectively the -jet, -jet and light-flavour jet probabilities, and denotes the effective -jet fraction in the background hypothesis. Using this approach, the -jet fraction in the background can be chosen a posteriori in order to optimise the performance of the algorithm at physics analysis level. In this paper, an optimised -jet fraction of 0.018 is used to evaluate the performance of the DL1r -tagging algorithm in simulated¯and ′ →¯events. This value is chosen as a compromise to ensure good rejection factors for both -jet and light-flavour jets in a large -tagging efficiency range across a number of analyses. In particular, the ATLAS measurements of , →¯production [45] and¯, →¯production [46] were considered in this optimisation.
Similarly, the DL1r c-tagging discriminant is defined as:
$$D^{c}_{\mathrm{DL1r}} = \ln\left(\frac{p_c}{f_b \cdot p_b + (1 - f_b) \cdot p_{\mathrm{light}}}\right),$$
where $f_b$ represents the effective b-jet fraction in the background training sample. A b-jet fraction of $f_b = 0.2$ is used to evaluate the performance of the DL1r c-tagging algorithm in this paper in simulated $t\bar{t}$ and $Z' \to q\bar{q}$ events. Larger than the $f_c$ fraction presented above, the $f_b$ fraction is chosen here to maximise the b-jet rejection factor at the given c-tagging efficiency rates of 20% and 30%. The output probabilities of the DL1r tagging algorithms for b-jets, c-jets and light-flavour jets in the baseline $t\bar{t}$ simulated events are shown in Figures 7(a)-(c); the corresponding b-tagging and c-tagging discriminants are shown in Figures 7(d) and 7(e).
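As a short illustration, the two discriminants can be computed from the three network outputs as follows. The sketch assumes the log-likelihood-ratio form in which the background hypothesis is a weighted mixture of the two non-signal flavours; the function names are hypothetical, and the default fractions are the values quoted in the text.

```python
import math


def dl1r_b_discriminant(p_b, p_c, p_light, f_c=0.018):
    """b-tagging discriminant: log ratio of p_b to the mixed background."""
    return math.log(p_b / (f_c * p_c + (1.0 - f_c) * p_light))


def dl1r_c_discriminant(p_b, p_c, p_light, f_b=0.2):
    """c-tagging discriminant: log ratio of p_c to the mixed background."""
    return math.log(p_c / (f_b * p_b + (1.0 - f_b) * p_light))


# Example: a jet that the network considers most likely to be a b-jet.
print(dl1r_b_discriminant(p_b=0.85, p_c=0.10, p_light=0.05))
```

Because $f_c$ and $f_b$ enter only at this stage, they can be changed without retraining the network.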

Flavour-tagging performance
The performance of a flavour-tagging algorithm is characterised by the probability or efficiency of correctly tagging a signal jet, $\varepsilon$, and the probability of mistakenly identifying a background jet, referred to as the mis-tag rate. In this paper, the performance of the algorithms is quantified in terms of background-jet rejection factors, defined as $1/\varepsilon$ evaluated for background jets, that is, the inverse of the mis-tag rate.
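These two figures of merit can be written down in a few lines. The sketch below assumes that a jet is tagged when its discriminant value is at or above a threshold; the function names are illustrative only.

```python
def efficiency(scores, threshold):
    """Fraction of jets whose discriminant is at or above the threshold."""
    if not scores:
        raise ValueError("need at least one jet")
    return sum(1 for s in scores if s >= threshold) / len(scores)


def rejection(background_scores, threshold):
    """Inverse of the mis-tag rate; infinite if no background jet is tagged."""
    mistag_rate = efficiency(background_scores, threshold)
    return float("inf") if mistag_rate == 0 else 1.0 / mistag_rate
```

For instance, a mis-tag rate of 0.5% corresponds to a rejection factor of 200.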

b-tagging performance
When analysis of LHC data requires the identification of b-jets, the tagging efficiency of a given requirement on the b-tagging discriminant is denoted by $\varepsilon_b$, and the charm-jet and light-flavour jet rejection factors are $1/\varepsilon_c$ and $1/\varepsilon_{\mathrm{light}}$, respectively. Many ATLAS analyses of Run 2 LHC data use a requirement on the DL1r discriminant, or bins built from several such requirements, that does not vary with jet kinematics. These are referred to as 'fixed-cut operating points', and they are labelled according to their inclusive efficiency for the population of b-jets present in the $t\bar{t}$ sample used to train DL1r; for example, the DL1r discriminant value for which 77% of the b-jets in a $t\bar{t}$ sample have a higher score is called the '77% operating point'. Figures 8 and 9 show the light-flavour jet and c-jet rejection factors as a function of $\varepsilon_b$ for a variety of low- and high-level b-taggers. At high-efficiency operating points, RNNIP provides the best rejection among the low-level taggers, while at low efficiency the secondary-vertex finders SVKine and JFKine achieve the highest background rejection. DL1r substantially outperforms all low-level taggers across the $\varepsilon_b$ range: the low-level algorithms exploit different jet properties, and combining these produces large gains. DL1r also exceeds the performance of the BDT-based MV2c10 tagger and the DL1 tagger [6].
It is also important to gauge the b-tagging performance across a broad $p_{\mathrm{T}}$ range, because the ATLAS physics programme relies on excellent background rejection in a variety of situations, depending on the needs of a given analysis of the LHC data. In Figure 10, the background rejection achieved at a fixed b-tagging efficiency of 77% is shown in bins of jet $p_{\mathrm{T}}$; this fixed efficiency is obtained by choosing the appropriate b-tagging discriminant requirement in each $p_{\mathrm{T}}$ bin.
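The difference between a fixed-cut operating point and a fixed efficiency per $p_{\mathrm{T}}$ bin can be sketched as follows. This is an illustration rather than the ATLAS procedure: the helper names and the `(pt, score)` tuple layout are assumptions, and a jet is taken to pass when its score is at or above the threshold.

```python
def fixed_cut_threshold(b_scores, target_eff):
    """Discriminant value at or above which about target_eff of the b-jets lie."""
    ordered = sorted(b_scores, reverse=True)
    n_pass = round(target_eff * len(ordered))
    if not 1 <= n_pass <= len(ordered):
        raise ValueError("too few b-jets for the requested efficiency")
    # The score of the last b-jet that should still pass defines the cut.
    return ordered[n_pass - 1]


def per_bin_thresholds(b_jets, pt_edges, target_eff):
    """One threshold per pT bin, giving the same b-jet efficiency in every bin.

    b_jets: list of (pt, score) tuples for true b-jets (hypothetical layout).
    Raises ValueError if a bin contains too few b-jets.
    """
    thresholds = []
    for lo, hi in zip(pt_edges[:-1], pt_edges[1:]):
        scores = [score for pt, score in b_jets if lo <= pt < hi]
        thresholds.append(fixed_cut_threshold(scores, target_eff))
    return thresholds
```

A single call to `fixed_cut_threshold` on the inclusive b-jet population corresponds to a fixed-cut operating point, whose efficiency then varies with jet kinematics, whereas `per_bin_thresholds` holds the efficiency constant and lets the cut value vary.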
Figures 11 and 12 show the $\varepsilon_b$ values and background rejection factors for jets from simulated SM $t\bar{t}$ and flattened $Z'$ samples, respectively, in bins of jet $p_{\mathrm{T}}$; several high-level b-taggers are compared at the 77% operating point. DL1r performs significantly better than previous ATLAS b-taggers across a broad range of jet $p_{\mathrm{T}}$, although some common patterns are worth noting: (1) $\varepsilon_b$ peaks around $p_{\mathrm{T}} \approx 175$ GeV and falls with $p_{\mathrm{T}}$ above this point, and (2) the light-flavour jet rejection falls until about 1 TeV, above which it is approximately constant. However, while MV2c10 maintains a nearly flat c-jet rejection versus jet $p_{\mathrm{T}}$, the DL1 and DL1r rejection factors improve with $p_{\mathrm{T}}$. The enhanced performance for highly energetic jets has yielded substantially stronger tests of the Standard Model with the ATLAS data. For example, recent searches for new resonances decaying into $b\bar{b}$ pairs using the DL1r b-tagger achieved limits on new narrow resonances decaying into $b\bar{b}$ that are about a factor of 3 stronger than predicted via luminosity-scaling of previous results using MV2c10 [47].
Similarly, the b-tagging efficiencies and background-jet rejection factors vary with the jet pseudorapidity $\eta$, in large part due to the poorer track $d_0$ and $z_0$ resolutions at high $|\eta|$ [48]. Figure 13 shows $\varepsilon_b$ and the background-jet rejection as a function of jet $\eta$. The b-tagging efficiency and c-jet mis-tag rates are higher for all compared high-level taggers in the central region than at high $|\eta|$, in part due to inefficiency of secondary-vertexing in the forward region [38,39]. The light-flavour jet rejection performance also deteriorates for more forward jets. However, DL1r consistently outperforms DL1 and MV2c10 across jet $\eta$ ranges. The ATLAS flavour-tagging algorithms are stable versus the number of pile-up interactions accompanying the hard-scatter collisions, as shown in Figure 14. While there is a small slope in the b-tagging efficiency versus the average number of interactions per bunch crossing $\langle\mu\rangle$, $\varepsilon_b$ only changes by about 2% over the range $10 < \langle\mu\rangle < 70$ for the inclusive 77% efficiency operating point. The light-flavour jet and c-jet rejection factors also show little dependence on $\langle\mu\rangle$.
The 60%, 70%, 77%, and 85% operating points are commonly used in ATLAS physics analyses interpreting the LHC Run 2 dataset [6]. Figure 15 shows the b-tagging efficiency and background-jet rejection factors versus jet $p_{\mathrm{T}}$ for DL1r at these commonly used operating points. It is worth noting that relatively small changes in the b-tagging efficiency operating point, of the order of 10%, result in very different background rejection factors, which range from ∼10 to ∼$10^3$ for light-flavour jets and from ∼3 to ∼40 for charm jets.

Charm-tagging performance
A growing number of ATLAS analyses of LHC data require identification of c-jets in order to probe physics processes involving final-state charm quarks. For example, the search for Higgs boson decays into charm quarks aims to obtain direct evidence of the charm-quark Yukawa coupling parameter by examining a sample of events with two c-tagged jets [49]. An algorithm's performance is indicated by its b-jet and light-flavour jet rejection factors, $1/\varepsilon_b$ and $1/\varepsilon_{\mathrm{light}}$, at a given charm-tagging efficiency, $\varepsilon_c$. Because charmed hadrons have shorter lifetimes and smaller masses than b-hadrons, the identification of charm jets is challenging, and the efficiency of the optimal operating point tends to be much lower than for b-tagging. Figure 16 presents the b-jet and light-flavour jet rejection factors as a function of the c-tagging efficiency, evaluated in a population of jets taken from a sample of simulated $t\bar{t}$ events. Figure 17 shows the light-flavour jet and b-jet rejection factors attained by the DL1 and DL1r algorithms for a fixed charm-tagging efficiency, again evaluated in jets from a $t\bar{t}$ sample. These 'iso-efficiency' curves are obtained by varying the $f_b$ parameter used to define the c-tagging discriminant introduced in Eq. (1).
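An iso-efficiency curve of this kind can be sketched by scanning the b-jet fraction, re-deriving the cut that gives the target charm-tagging efficiency at each step, and recording both rejection factors. The code below is a toy illustration under stated assumptions: the log-likelihood-ratio form of the c-tagging discriminant, a hypothetical `(flavour, p_b, p_c, p_light)` tuple layout, at least one c-jet in the input, and a jet counted as tagged when its score is at or above the cut.

```python
import math


def c_discriminant(p_b, p_c, p_light, f_b):
    """Log ratio of p_c to a background mixing b-jets and light-flavour jets."""
    return math.log(p_c / (f_b * p_b + (1.0 - f_b) * p_light))


def rejection_at(scores, cut):
    """Inverse of the fraction of jets tagged at this cut."""
    n_tagged = sum(1 for s in scores if s >= cut)
    return float("inf") if n_tagged == 0 else len(scores) / n_tagged


def iso_efficiency_curve(jets, target_c_eff, f_b_values):
    """Return (f_b, b-jet rejection, light-jet rejection) for each f_b value."""
    curve = []
    for f_b in f_b_values:
        scores = {"b": [], "c": [], "light": []}
        for flavour, p_b, p_c, p_light in jets:
            scores[flavour].append(c_discriminant(p_b, p_c, p_light, f_b))
        # Re-derive the cut so that the c-jet efficiency stays at the target.
        ordered = sorted(scores["c"], reverse=True)
        n_pass = max(1, round(target_c_eff * len(ordered)))
        cut = ordered[n_pass - 1]
        curve.append(
            (f_b, rejection_at(scores["b"], cut), rejection_at(scores["light"], cut))
        )
    return curve
```

Small values of $f_b$ favour light-flavour jet rejection and large values favour b-jet rejection, which is the trade-off traced out along each curve.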

MC generator dependence
A variety of MC event generators are used in ATLAS analyses to model various production processes at the LHC. Pythia [15],
Figure 17: The light-flavour jet rejection as a function of b-jet rejection for inclusive 20%, 30%, and 40% c-jet efficiency operating points for the DL1 and DL1r high-level c-taggers. Each point on a curve corresponds to a particular choice of $f_b$, the b-jet background fraction in the log-likelihood ratio that defines the tagging discriminant; the star symbols indicate the $f_b = 0.2$ point.

Overtraining checks
In order to correct the tagging rates of jets in simulation, the corresponding rates are measured in data through a variety of procedures [6,53-56], reaching a precision as good as 1% in b-tagging efficiency for b-jets with $p_{\mathrm{T}} \sim 100$ GeV; the most precise measurements of the c-jet (light-flavour jet) mis-tag rate have uncertainties of approximately 5% (15%). Efficiencies and mis-tag rates in simulation are adjusted via per-jet weights or 'scale factors' such that they reflect the performance measured in data. However, the $t\bar{t}$ MC events used to train the RNNIP and DL1r algorithms are also used in the efficiency measurements and in physics analyses utilising flavour-tagging. For the calibration procedure to result in a properly corrected simulation, the DL1r performance on the 'training' sample, used to optimise the tagging algorithms, and on a 'validation' sample, comprising MC events not contained in the training sample, must be consistent to within the uncertainty associated with the corresponding data-based measurement. Figure 18 shows the background rejection factors versus the b-tagging efficiency separately for the training and validation samples of $t\bar{t}$ events. Figure 19 presents background rejection factors and b-tagging efficiency versus the jet $p_{\mathrm{T}}$, and the expected performance in the training and validation samples is again compared. No discrepancy is observed that is significant relative to the precision of efficiency measurements, indicating that it is safe to re-use the events used in DL1r training in physics analyses.
Figure 18: [...] The statistical uncertainties of the rejection are calculated using binomial uncertainties and are indicated as coloured bands. No difference in performance is observed above the 2% level, which is below the precision of the calibration measurements for these quantities.
Figure 19: A comparison of performance in jet $p_{\mathrm{T}}$ bins between the training and validation $t\bar{t}$ samples for DL1r. The training sample contains events used to optimise the DL1r algorithm, while the validation sample comprises a statistically distinct set of events that were not used during training. The (a) b-tagging efficiency, (b) c-jet rejection and (c) light-flavour jet rejection are shown for a broad $p_{\mathrm{T}}$ range at the inclusive 77% efficiency operating point, commonly used in ATLAS analyses of LHC Run 2 data. This operating point is derived using the combined training and validation samples. The lower panels show the ratio of training sample performance to validation sample performance. No statistically significant difference in performance is observed at the level of the precision of data-based efficiency measurements, which is ∼1% for b-jets, ∼5% for c-jets, and ∼15% for light-flavour jets. The statistical uncertainties of the efficiency (rejection) are calculated using binomial uncertainties and are indicated as coloured bands.
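The comparison between training and validation samples can be sketched as below. The binomial uncertainty formula is standard; the propagation to the ratio assumes statistically independent samples, and the consistency criterion in `is_consistent` is only one possible reading of the requirement stated in the text, not the ATLAS definition. All function names are hypothetical.

```python
import math


def binomial_efficiency(n_pass, n_total):
    """Efficiency and its binomial uncertainty, sqrt(eff * (1 - eff) / N)."""
    eff = n_pass / n_total
    err = math.sqrt(eff * (1.0 - eff) / n_total)
    return eff, err


def train_over_validation(n_pass_train, n_train, n_pass_val, n_val):
    """Ratio of training to validation efficiency with its uncertainty.

    Requires n_pass_train > 0 and n_pass_val > 0.
    """
    eff_t, err_t = binomial_efficiency(n_pass_train, n_train)
    eff_v, err_v = binomial_efficiency(n_pass_val, n_val)
    ratio = eff_t / eff_v
    # Independent samples: relative uncertainties add in quadrature.
    err = ratio * math.hypot(err_t / eff_t, err_v / eff_v)
    return ratio, err


def is_consistent(ratio, err, calib_precision):
    """Assumed criterion: the deviation from unity, beyond its statistical
    uncertainty, does not exceed the precision of the data-based calibration."""
    return abs(ratio - 1.0) - err <= calib_precision
```

With the precisions quoted in the text, `calib_precision` would be about 0.01 for b-jets, 0.05 for c-jets and 0.15 for light-flavour jets.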

Conclusion
Several flavour-tagging algorithms are used to identify jets containing heavy-flavour hadrons in data recorded by the ATLAS experiment during Run 2 of the LHC. The recent ATLAS strategy combines the results of low-level algorithms (IP2D, IP3D, SV1, JetFitter and RNNIP) into high-level algorithms based on the DL1r feed-forward neural network classifier. The low-level algorithms either exploit the large impact parameters of tracks left by heavy-flavour hadron decay products or attempt to directly reconstruct their decay vertices. Large increases in background-jet rejection are obtained by the DL1r algorithms compared to each individual low-level algorithm and to previous tagging algorithms, illustrating the high complementarity of the low-level inputs. In a sample of simulated Standard Model $t\bar{t}$ events, light-flavour jet (charm-jet) rejection factors of 170 (5) are achieved at a b-jet identification efficiency of 77%; similarly, at a c-jet efficiency of 30%, the obtained light-flavour jet (b-jet) rejection factor is 70 (9).