Search for third-generation vector-like leptons in $pp$ collisions at $\sqrt{s} = 13\,\text{TeV}$ with the ATLAS detector

A search for vector-like leptons in multilepton (two, three, or four-or-more electrons plus muons) final states with zero or more hadronic $\tau$-lepton decays is presented. The search is performed using a dataset corresponding to an integrated luminosity of 139 fb$^{-1}$ of proton-proton collisions at a centre-of-mass energy of 13 TeV recorded by the ATLAS detector at the LHC. To maximize the separation of signal and background, a machine-learning classifier is used. No excess of events is observed beyond the Standard Model expectation. Using a doublet vector-like lepton model, vector-like leptons coupling to third-generation Standard Model leptons are excluded in the mass range from 130 GeV to 900 GeV at the 95% confidence level, while the highest excluded mass is expected to be 970 GeV.


Introduction
The predictions of the Standard Model (SM) of particle physics are in excellent agreement with measurements of proton-proton (pp) collisions at the Large Hadron Collider (LHC) [1]. Nonetheless, some aspects of our universe cannot be explained within the framework of the SM, such as the excess of matter over antimatter, the origin of the neutrino masses, the nature of dark matter and dark energy, and the hierarchy and fine-tuning problems [2,3]. Many possible solutions have been proposed, including models based on supersymmetry, which help to explain why the Higgs boson's mass is far below the Planck scale. However, the measured Higgs boson branching fractions are in good agreement with the SM predictions, so it is not easy to accommodate new particles whose masses are generated via the Higgs mechanism [4][5][6]. One class of particles motivated by a variety of phenomenological models, based for example on string theory or large extra dimensions, is vector-like fermions, which transform as non-chiral representations of the unbroken SM gauge group [7]. They therefore have Dirac masses and decouple from the electroweak scale in the large-mass limit.
A large number of searches for vector-like quarks have been performed at the Tevatron [8,9] and the LHC [10][11][12][13][14][15][16][17][18][19][20][21][22][23][24][25][26][27]. In addition, the CMS Collaboration has performed a search for vector-like leptons (VLL) in the ττ plus three b-jet final state [28]. The search was performed in the context of the '4321 model' [29][30][31] over a mass range of 500 to 1050 GeV. Following the suggestion in Refs. [32,33], a search for VLL in a doublet model has been performed by ATLAS and is presented in this article. A similar search was performed by the CMS Collaboration using 138 fb−1 of pp collisions at √s = 13 TeV and excludes vector-like τ-lepton masses below 1045 GeV at the 95% confidence level (CL) [34]. The VLL doublet L′ = (ν′, τ′) comprises two fermions of approximately equal mass that couple only to the third-generation leptons. The VLL production cross section is dominated by pp → ν′τ′, which is approximately 3.7 times greater than either the pp → τ′+τ′− or pp → ν′ν̄′ modes, which have approximately equal cross sections. The ν′ decays exclusively into W+τ−, while the τ′ decays are τ′− → Zτ− and τ′− → Hτ−, where the branching fraction of the former is larger, but they asymptotically approach each other with increasing τ′ mass (mτ′) due to the Goldstone equivalence principle [35]. Examples of leading-order (LO) Feynman diagrams for VLL production and decay are displayed in Figure 1. Given the possible decays, the search is performed by selecting events containing at least two charged light leptons, e± or μ±, zero or more τ-leptons decaying hadronically, and a momentum imbalance transverse to the beam. To achieve better background rejection than is possible with an event selection based on kinematic and topological variables, a boosted decision tree (BDT) algorithm is utilized as an event classifier [36,37].

Figure 1: Examples of LO Feynman diagrams for VLL production and decay: (a) ν′τ̄′ production followed by ν′ decay into Wτ and τ̄′ decay into Zτ, where qu (qd) represents a weak-isospin +1/2 (−1/2) quark, (b) production of τ′τ̄′ followed by the decay into Zτ− and Hτ+, and (c) ν′ν̄′ production followed by ν′ decay into Wτ.
This article is organized as follows. A brief description of the ATLAS detector is given in Section 2. Section 3 presents the data and simulation samples used in this search. The reconstruction of objects used in the search for a VLL signal is delineated in Section 4. Section 5 describes techniques used to perform the event selection, while Section 6 outlines the method used to estimate the backgrounds. A discussion of the systematic uncertainties is given in Section 7. The statistical method used to arrive at the 95% CL upper limit on the VLL production cross section, and hence the mass exclusion region, is described in Section 8. Finally, the analysis and results are summarized in Section 9.

ATLAS detector
The ATLAS detector [38] at the LHC covers nearly the entire solid angle around the collision point. 1 It consists of an inner tracking detector surrounded by a thin superconducting solenoid, electromagnetic and hadron calorimeters, and a muon spectrometer incorporating three large superconducting air-core toroidal magnets.
The inner-detector system (ID) is immersed in a 2 T axial magnetic field and provides charged-particle tracking in the range |η| < 2.5. The high-granularity silicon pixel detector covers the vertex region and provides up to four measurements per track, the first hit normally being in the insertable B-layer (IBL) installed before Run 2 [39,40]. It is followed by the silicon microstrip tracker (SCT), which usually provides eight measurements per track. These silicon detectors are complemented by the transition radiation tracker (TRT), which enables radially extended track reconstruction up to |η| = 2.0. The TRT also provides electron identification information based on the fraction of hits (typically 30 in total) above a higher energy-deposit threshold corresponding to transition radiation.

1 ATLAS uses a right-handed coordinate system with its origin at the nominal interaction point (IP) in the centre of the detector and the z-axis along the beam pipe. The x-axis points from the IP to the centre of the LHC ring, and the y-axis points upwards. Cylindrical coordinates (r, φ) are used in the transverse plane, φ being the azimuthal angle around the z-axis. The pseudorapidity is defined in terms of the polar angle θ as η = −ln tan(θ/2). Angular distance is measured in units of ΔR = √((Δη)² + (Δφ)²).
The calorimeter system covers the pseudorapidity range |η| < 4.9. Within the region |η| < 3.2, electromagnetic calorimetry is provided by barrel and endcap high-granularity lead/liquid-argon (LAr) calorimeters, with an additional thin LAr presampler covering |η| < 1.8 to correct for energy loss in material upstream of the calorimeters. Hadron calorimetry is provided by the steel/scintillator-tile calorimeter, segmented into three barrel structures within |η| < 1.7, and two copper/LAr hadron endcap calorimeters. The solid angle coverage is completed with forward copper/LAr and tungsten/LAr calorimeter modules optimized for electromagnetic and hadronic energy measurements, respectively.
The muon spectrometer (MS) comprises separate trigger and high-precision tracking chambers measuring the deflection of muons in a magnetic field generated by the superconducting air-core toroidal magnets. The field integral of the toroids ranges between 2.0 and 6.0 T m across most of the detector. A set of precision chambers covers the region |η| < 2.7 with three layers of monitored drift tubes, complemented by cathode-strip chambers in the forward region, where the particle flux is highest. The muon trigger system covers the range |η| < 2.4 with resistive-plate chambers in the barrel, and thin-gap chambers in the endcap regions.
Interesting events are selected by the first-level trigger system implemented in custom hardware, followed by selections made by algorithms implemented in software in the high-level trigger [41]. The first-level trigger accepts events from the 40 MHz bunch crossings at a rate below 100 kHz, which the high-level trigger further reduces in order to record events to disk at about 1 kHz.
An extensive software suite [42] is used in data simulation, in the reconstruction and analysis of real and simulated data, in detector operations, and in the trigger and data acquisition systems of the experiment.

Data and simulation samples
The data sample used in this article corresponds to an integrated luminosity of 139 fb−1 of pp collisions at a centre-of-mass energy (√s) of 13 TeV collected by the ATLAS detector during the 2015-2018 data-taking periods after requiring stable LHC beams and that all detector subsystems were operational [43]. The primary luminosity measurement was performed using the LUCID-2 detector [44]. Candidate events are required to satisfy at least one of the dilepton triggers (ee, μμ, eμ) [41,45,46]. These triggers have looser identification and isolation requirements than the single-lepton triggers but have comparable signal efficiency for events satisfying the analysis selection. The lowest pT threshold for the leading lepton ranged from 12 to 22 GeV, and for the sub-leading lepton it ranged from 8 to 17 GeV.
To evaluate the effects of the detector resolution and acceptance on the signal and background, and to estimate the SM backgrounds, simulated event samples were produced using dedicated event generators. The detector response to the final-state particles was then modelled using a Geant4-based Monte Carlo (MC) detector simulation [47,48]. The simulated data must account for the fact that significantly more than one inelastic pp collision occurs per bunch crossing, with the average number ranging from 13 to 38 across the 2015-2018 data-taking periods. Inelastic collisions were simulated using Pythia 8.186 [49] with the A3 set of tuned parameters [50] and the NNPDF2.3lo [51] set of parton distribution functions (PDFs), and overlaid on the signal and background MC samples. These simulated events were reweighted to match the conditions of the collision data, specifically the number of additional pp interactions (pile-up).
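As an illustration of this reweighting step, a minimal sketch in Python (the histograms, binning, and pile-up range are invented placeholders, not the ATLAS inputs):

```python
import numpy as np

# Hypothetical normalized distributions of the average number of
# inelastic pp interactions per bunch crossing (mu) in data and MC.
mu_edges = np.arange(0, 71)          # 70 one-unit-wide mu bins
rng = np.random.default_rng(0)
data_mu = rng.random(70); data_mu /= data_mu.sum()
mc_mu = rng.random(70) + 0.01; mc_mu /= mc_mu.sum()

# Per-bin weight: ratio of the data and simulation mu distributions.
pileup_weight = data_mu / mc_mu

def event_weight(mu_value: float) -> float:
    """Look up the pile-up weight for one simulated event."""
    i = np.clip(np.digitize(mu_value, mu_edges) - 1, 0, len(pileup_weight) - 1)
    return float(pileup_weight[i])

print(event_weight(30.0))
```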
The tt+W, t+Z, t+WZ, ttt, tttt and tt+WW events were generated by MadGraph5_aMC@NLO 2.3.3 [52] at NLO, using the NNPDF3.0nlo PDF set for tt+W, t+Z, t+WZ, ttt and tt+WW, and the NNPDF3.1nlo PDF set for tttt. Parton hadronization and showering was performed by Pythia 8.210 using the A14 tune and the NNPDF2.3lo PDF set. A summary of simulated signal and background samples is provided in Table 1.

2 The hdamp parameter is a resummation damping factor and one of the parameters that controls the matching of Powheg matrix elements to the parton shower. It effectively regulates the high-pT radiation against which the tt system recoils.

Object reconstruction
All events used in this analysis are required to contain a primary vertex [71]. It is selected as the pp collision vertex candidate with the highest sum of the squared transverse momenta of all associated tracks with pT > 500 MeV. In addition, there must be at least two tracks associated with that vertex.
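Schematically, this vertex choice is an argmax over the sum of squared track transverse momenta; a sketch with a hypothetical Vertex container:

```python
from dataclasses import dataclass, field

@dataclass
class Vertex:
    track_pts: list = field(default_factory=list)  # track pT values in GeV

def select_primary_vertex(vertices):
    """Return the candidate with the largest sum of squared track pT,
    counting only tracks with pT > 0.5 GeV and requiring at least two."""
    candidates = []
    for v in vertices:
        pts = [pt for pt in v.track_pts if pt > 0.5]
        if len(pts) >= 2:
            candidates.append((sum(pt**2 for pt in pts), v))
    return max(candidates, key=lambda c: c[0])[1] if candidates else None

print(select_primary_vertex([Vertex([0.6, 2.0]), Vertex([5.0, 1.2, 0.3])]))
```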
Electron candidates are reconstructed from energy clusters in the EM calorimeter that match a reconstructed track [72]. They are required to have pT > 30 GeV in order to suppress the number of fake electrons. In addition, they are required to be within |η| < 2.47, with the region 1.37 < |η| < 1.52 being excluded since it contains a significant amount of non-sensitive material in front of the calorimeter. To match these objects to the primary vertex, the track's transverse and longitudinal impact parameters (d0 and z0, respectively) are required to satisfy |d0|/σ(d0) < 5 and |z0 sin θ| < 0.5 mm. Furthermore, through the application of one of several working points, these candidates must satisfy object identification criteria [72]. Each working point offers a trade-off between identification efficiency and misidentification rate. A likelihood-based discriminant is constructed from a set of variables that enhance electron selection, while rejecting photon conversions and hadrons misidentified as electrons [72]. An η- and pT-dependent selection is applied to the likelihood discriminant in order to define specific working points. For this search, the Tight likelihood working point is used, which has a 75% efficiency at pT = 30 GeV, increasing to 88% at pT = 100 GeV [72], when used to identify electrons from Z-boson decays. Electrons are also required to be isolated using criteria based on ID tracks and topological clusters in the calorimeter; the Loose isolation working point is applied and has an efficiency of approximately 99% [72]. Correction factors are applied to simulated electrons to take into account the small differences in reconstruction, identification and isolation efficiencies between data and MC simulation.
Muon candidates are reconstructed by combining a reconstructed track from the ID with one from the muon spectrometer [73], with the requirement that pT > 20 GeV and |η| < 2.5. In addition, the transverse and longitudinal impact parameters of the track are required to satisfy |d0|/σ(d0) < 3 and |z0 sin θ| < 0.5 mm.
To reject misidentified muon candidates, primarily originating from pion and kaon decays, several quality requirements are imposed on the muon candidate. The Medium working point is used to select muons with pT < 300 GeV and the HighPt working point is used for those with pT > 300 GeV [73]. This choice ensures a 97% efficiency at pT = 30 GeV for the Medium working point, and 76% at pT = 500 GeV for the HighPt working point [73]. An isolation requirement based on ID tracks and topological clusters in the calorimeter is imposed. The TightTrackOnly isolation working point is used, resulting in an efficiency between 94% and 99% for muons from W-boson decays in simulated tt events [73]. Similarly to electrons, correction factors are applied to simulated muons to account for the small differences between data and simulation.
Particle-flow (PFlow) jets within |η| < 4.5 are reconstructed using the anti-kt algorithm [74,75] with a radius parameter R = 0.4 [76], using neutral PFlow constituents and charged constituents associated with the primary vertex as input [77]. These jets are then calibrated to the particle level by applying a jet energy scale derived from simulation [78]. Furthermore, in situ corrections based on the collected data are applied [78]. A cleaning procedure is used to identify and remove jets arising from calorimeter noise or non-collision backgrounds. To suppress jets arising from pile-up, a discriminant called the 'jet vertex tagger' (JVT) is constructed using a two-dimensional likelihood method [79]. The Medium JVT working point is used, which has an average efficiency of 95%. Jets used in this analysis are required to have pT > 20 GeV and |η| < 2.5. Jets originating from b-quarks are identified using the DL1r b-tagging algorithm [80,81]. The working point used corresponds to a b-tagging efficiency of 77% [80,81], measured in a sample of simulated tt events. The corresponding rejection factors are approximately 130, 5, and 14 for light-quark and gluon jets, c-jets, and hadronically decaying τ-leptons, respectively. Correction factors are applied to the simulated jets to take account of the small differences in reconstruction and identification efficiencies, and the energy scale and resolution differences, between data and MC simulation.
The event selection for this analysis considers only those τ-leptons that decay into final states containing a τ-neutrino and one or more hadrons, denoted by τhad. Since the ντ escapes the detector volume undetected, only the hadronic decay products consisting of one or three charged hadrons and up to two neutral pions are visible, and they are denoted by τhad-vis. Reconstruction of τhad-vis [82,83] is seeded by jets reconstructed from topological clusters by the anti-kt algorithm with the radius parameter set to 0.4. The tracks associated with the jet are required to originate from the primary vertex by satisfying the impact parameter requirements |d0| < 1 mm and |z0 sin θ| < 1.5 mm. If these requirements are satisfied, the tracks are then required to be within a cone of size ΔR = 0.2 around the jet axis, surrounded by a conical isolation region covering 0.2 < ΔR < 0.4, in order to be considered a τhad-vis candidate. The direction of the τhad-vis candidate in (η, φ) is calculated as the vector sum of the topological clusters within ΔR = 0.2 of the jet axis, using the τhad-vis vertex as the origin. A multivariate discriminant is used to select tracks that were produced by charged τhad decay products [82,83]. Reconstructed τhad-vis objects are selected for the analysis if they have exactly one or three associated tracks (1- or 3-prong) with a total charge equal to ±1. The τhad-vis objects must also satisfy the requirements pT > 20 GeV and |η| < 2.47, excluding the region 1.37 < |η| < 1.52. These requirements have an efficiency of about 85% for 1-prong and 70% for 3-prong τhad-vis objects [82,83], as estimated from simulated Z/γ* → τ+τ− events. A multivariate regression technique trained on MC samples is used to determine the τhad-vis energy scale using information from associated tracks, calorimeter energy clusters, and reconstructed neutral pions [82,83].
A recurrent neural network (RNN) classifier [84] is employed to select τhad-initiated jets and reject those initiated by quarks or gluons. The RNN is trained on simulated Z/γ* → τ+τ− (for signal) and simulated dijet events (for background). The training variables are single-track variables, and reconstructed kinematic and topological variables. This analysis uses the Medium working point with an efficiency of 75% (60%) for 1-prong (3-prong) candidates and a background rejection factor of 35 (240). A boosted decision tree [82,83] is used to reject electron backgrounds misidentified as 1-prong τhad-vis objects. Variables used for its training include information from the calorimeter, the tracking detector, and the visible momentum measured from the reconstructed tracks. The Tight working point with an efficiency of 75% is used. To assess the contribution from misidentified leptons (Section 6), less stringent (Loose) object identification requirements are applied. When selecting Loose τhad-vis there is an additional requirement for the RNN score to be at least 0.01. In addition, a dedicated muon-veto criterion is used to reject muons reconstructed as τhad-vis. Correction factors are applied to simulated τhad objects to take into account the small differences in reconstruction and identification efficiencies between data and MC simulation. The energy scale and resolution differences between data and MC simulation are also accounted for by applying scale factors. A summary of the lepton object definitions is presented in Table 2.

Table 2: Object definitions used in the analysis for leptons. Tight objects are those selected for the nominal analysis and pass the prescribed selection requirements. Loose objects are those used to assess the contribution from the fake-lepton background and fail one or more selection requirements.

The missing transverse momentum (with magnitude ETmiss) [85] is reconstructed as the negative vector sum of the pT of all the selected electrons, muons, jets, and τhad-vis. An extra track-based 'soft term' is built using additional tracks associated with the primary vertex, but not with any reconstructed object. The use of this track-based soft term is motivated by improved performance of the ETmiss reconstruction in a high pile-up environment.
To avoid cases where the detector response to a single physical object is reconstructed as two separate final-state objects, an overlap removal procedure is used. If electron and muon candidates share a track, the electron candidate is removed. After that, if the ΔRy,φ distance3 between a jet and an electron candidate is less than 0.2, the jet is discarded. If multiple jets satisfy this requirement, only the closest jet is removed. For jet-electron distances between 0.2 and 0.4, the electron candidate is removed. If the distance between a jet and a muon candidate is less than 0.4, the muon candidate is removed if the jet has more than two associated tracks; otherwise the jet is removed. The τhad candidates are seeded from jets, so this procedure removes any ambiguity in their selection.

3 ΔRy,φ is the Lorentz-invariant distance in the rapidity-azimuthal-angle plane, defined as ΔRy,φ = √((Δy)² + (Δφ)²), where the rapidity is y = (1/2) ln[(E + pz)/(E − pz)].
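The sequential logic above lends itself to a short sketch (simplified: dictionary-based objects stand in for the ATLAS event model, and a plain (y, φ) distance stands in for ΔRy,φ):

```python
import math

def delta_r(a, b):
    """Rapidity-azimuth distance between two objects."""
    dphi = math.remainder(a["phi"] - b["phi"], 2 * math.pi)
    return math.hypot(a["y"] - b["y"], dphi)

def overlap_removal(electrons, muons, jets):
    # 1) Drop electrons that share a track with a muon.
    electrons = [e for e in electrons
                 if not any(e["track_id"] == m["track_id"] for m in muons)]
    # 2) Drop the closest jet within dR < 0.2 of an electron.
    for e in electrons:
        close = [j for j in jets if delta_r(j, e) < 0.2]
        if close:
            jets.remove(min(close, key=lambda j: delta_r(j, e)))
    # 3) Drop electrons within 0.2 <= dR < 0.4 of a surviving jet.
    electrons = [e for e in electrons
                 if not any(0.2 <= delta_r(j, e) < 0.4 for j in jets)]
    # 4) For jets within dR < 0.4 of a muon: drop the muon if the jet has
    #    more than two associated tracks, otherwise drop the jet.
    kept_muons = []
    for m in muons:
        drop_muon = False
        for j in [j for j in jets if delta_r(j, m) < 0.4]:
            if j["n_tracks"] > 2:
                drop_muon = True
            else:
                jets.remove(j)
        if not drop_muon:
            kept_muons.append(m)
    return electrons, kept_muons, jets
```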

Analysis strategy
Based on the production and decay modes of the ν′ and τ′ given in Section 1, the multilepton final states are expected to maximize the signal sensitivity, and hence they are used to search for the VLL signal. To further optimize the signal sensitivity, the different decay modes are targeted by splitting the data into seven training regions (given in Table 3) based on the numbers of light leptons and τhad, and an ETmiss requirement. In all regions at least one jet is required. Additional requirements are derived by maximizing the signal significance through the application of a BDT [36,37]. This leads to the seven signal regions (SRs) defined in Table 4. Every region has a requirement that it contain at least one jet (Njet > 0).

In addition to the seven SRs, three control regions (CRs) are used in order to normalize the dominant physics background (tt+Z, WZ, and ZZ) estimates to data, and a fourth CR is used to assess fake τhad objects originating from gluon-initiated jets and pile-up. These CRs are defined in Table 5. Since events in the CR do not have a τ-lepton, its kinematic variables are set to zero when calculating the BDT score. To confirm that the CRs are modelled correctly and that the obtained background normalization factors are also valid in the regions with different numbers of light leptons and τ-leptons, three validation regions (VRs) are defined, as also shown in Table 5. An additional seven VRs are used to confirm that the BDT models the data correctly. These VRs are selected where the BDT distribution is expected to primarily contain background events, as shown in Figure 5. The SRs, CRs, and VRs are selected such that there is no overlap of events between the regions. The BDT distribution shape in the CRs is shown in Figure 2, which demonstrates good agreement between data and the background simulation.
To classify the events as signal or background, the AdaBoost BDT algorithm [36,37] is used as implemented in the scikit-learn package [86]. The training and optimization of the BDT are performed in two steps. The first step is the optimization of the BDT hyperparameters and the second step is the optimization of the training through the selection of the variables used for the training.

Table 5: The definition of the CRs used to determine the normalization of the largest backgrounds, as well as the CR used to assess fake τ-leptons originating from gluon-initiated jets and pile-up. In addition, the VRs used to validate the CRs are also defined. Both the CRs and VRs are selected so that they do not overlap with the SRs or with each other, but are similar enough to avoid problems when extrapolating between the regions. Like the SRs, the CRs and VRs have the requirement that Njet > 0. A dash indicates that the variable is not used for the corresponding CR or VR.

To optimize the hyperparameters (maximum tree depth, maximum features per split, minimum samples per leaf, minimum samples per split, number of estimators, and learning rate), the set of 34 kinematic and topological variables listed in Table 6 was used. These variables are chosen as they are expected to provide good separation between the background and signal topologies. To avoid biasing the search by selecting a specific mass, the simulated signal samples described in Section 3 are combined with equal weight for mτ′ set to 800, 900 and 1000 GeV to train the BDT. All of the background samples are considered in the training, including all fake leptons. A scan over the hyperparameters is performed using the 5-fold cross-validation procedure to train the BDT, where the simulated data samples are split into five equal randomized samples, with four being used in the training and the fifth used as a testing sample. The training is performed five times so that each of the five samples is used as a testing sample. The set of hyperparameters with the highest receiver operating characteristic (ROC) score is selected.
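A minimal sketch of such a scan with scikit-learn's GridSearchCV (the grid values and inputs are illustrative placeholders; the actual grid is not given in the text):

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Placeholder inputs: rows are events, columns the 34 training variables;
# y is 1 for simulated signal and 0 for simulated background.
rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 34))
y = rng.integers(0, 2, size=1000)

# Note: the base-tree prefix is "base_estimator__..." in scikit-learn < 1.2.
param_grid = {
    "estimator__max_depth": [2, 3, 4],
    "estimator__min_samples_leaf": [0.01, 0.05],
    "n_estimators": [200, 400],
    "learning_rate": [0.1, 0.5],
}
search = GridSearchCV(
    AdaBoostClassifier(estimator=DecisionTreeClassifier()),
    param_grid,
    scoring="roc_auc",  # pick the configuration with the best ROC score
    cv=5,               # 5-fold cross-validation, as described above
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```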
To select the optimal set of training variables, a BDT is trained in each analysis region defined by lepton multiplicity using the full set of 34 variables. The ranking of the variables is evaluated using the procedure provided by the scikit-learn package. The lowest-ranked training variable is then removed and the BDT is retrained. This procedure is repeated until the ROC score decreases by more than 1%. This leads to each SR having its own unique set of training variables, which are listed in Table 7.
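The backward-elimination loop can be sketched as follows, using scikit-learn's feature_importances_ as the ranking (a simplified stand-in for the procedure described):

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score

def prune_variables(X, y, names, max_drop=0.01):
    """Drop the lowest-ranked variable until the ROC score falls by
    more than max_drop (1%) relative to the best score seen so far."""
    keep = list(range(X.shape[1]))
    clf = AdaBoostClassifier()
    best = cross_val_score(clf, X[:, keep], y, scoring="roc_auc", cv=5).mean()
    while len(keep) > 1:
        clf.fit(X[:, keep], y)
        worst = int(np.argmin(clf.feature_importances_))
        trial = keep[:worst] + keep[worst + 1:]
        score = cross_val_score(clf, X[:, trial], y, scoring="roc_auc", cv=5).mean()
        if score < best * (1 - max_drop):
            break  # removing this variable costs too much separation power
        keep, best = trial, max(best, score)
    return [names[i] for i in keep]
```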
The final training variables are individually compared with data to confirm that they are well modelled. To avoid any bias in the analysis, only events with a BDT score not satisfying the SR criteria (Table 4) are used in the comparison, since they are background dominated. Through the use of a χ² test, a probability of > 5% is found for agreement between data and simulation for all variables used in the BDT training. Figure 3 shows the distributions of some highly ranked variables in each of the BDT training regions.
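Such a per-variable compatibility check can be illustrated with a simple binned χ² probability (statistical uncertainties only; placeholder counts):

```python
import numpy as np
from scipy.stats import chi2

def chi2_probability(data_counts, mc_counts):
    """p-value that the binned data are compatible with the simulated
    distribution (statistical uncertainties only)."""
    data = np.asarray(data_counts, dtype=float)
    mc = np.asarray(mc_counts, dtype=float)
    mask = mc > 0
    chi2_value = np.sum((data[mask] - mc[mask]) ** 2 / mc[mask])
    return chi2.sf(chi2_value, df=int(mask.sum()))

# A variable would pass the modelling check if the probability exceeds 5%.
print(chi2_probability([98, 203, 151, 49], [100, 200, 150, 50]) > 0.05)
```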

Background estimation
There are two basic categories of backgrounds to the signal. One category, the irreducible backgrounds, is defined by those processes that yield the same final state as the signal. The other category, the reducible backgrounds, is defined by those that mimic the final state because of misidentified leptons or non-prompt leptons as well as misidentified lepton charge. The irreducible backgrounds are estimated from the simulated samples discussed in Section 3.
Similarly to Ref. [87], the simulated background in the WZ CR is found not to agree with data when examined as a function of the number of jets in the event. Since inverting the BDT score criteria for this CR yielded a similar mismodelling, the data in this region are used to calculate a scale factor that corrects the MC simulation to the data in the WZ CR.
Charge misidentification for electrons arises from photon conversions or bremsstrahlung, and its rate is challenging to describe through detector simulation. Therefore, scale factors are derived and applied to simulated background events to match the charge misidentification probabilities observed in data. The scale factors are derived using a Z → e+e− data sample and are parameterized as a function of pT and η [72].
The reducible backgrounds from misidentified leptons (electrons, muons and τhad) are estimated from data by using the fake-factor (FF) method to derive a transfer function from a background-dominated region to the SR [88]. The transfer function is the ratio of events passing the tight lepton selection to events passing a loose lepton selection in the background-dominated CR. The FF is used to estimate the probability of an object being either a misidentified lepton or a non-prompt lepton from an in-flight decay. The FF is calculated as a function of the lepton pT.

Figure 3: A comparison of the total background and signal distributions for mτ′ = 800, 900 and 1000 GeV in variables that are highly ranked in the BDT training. The signal distribution is scaled by the value indicated in the legend. The background prediction is taken after performing the fit described in Section 8, while the signal prediction is taken before the fit. The hatched band represents the combined statistical and systematic uncertainties. The last bin contains overflow events. The arrows in the ratio plot are for points that are outside the range. The panels correspond to the 2ℓ SSSF 1τ, 2ℓ SSOF 1τ, 2ℓ OSSF 1τ, 2ℓ OSOF 1τ, 2ℓ ≥2τ, 3ℓ ≥1τ, and ≥4ℓ ≥0τ training regions.

Non-prompt or fake light leptons can originate from decays of bottom or charm hadrons, pion or kaon decays, jets misidentified as electrons, and electrons from photon conversions. To calculate the transfer function for electrons, the background CR requires exactly one electron with a loose selection and no other leptons, ETmiss < 40 GeV, Njet ≥ 2, and no reconstructed b-jets. The background CR for muons requires exactly one muon with a loose selection and zero electrons and τ-leptons, ETmiss < 40 GeV, and at least two jets with the leading jet pT > 35 GeV.
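A sketch of the fake-factor bookkeeping for light leptons (invented yields; binned in lepton pT as described above):

```python
import numpy as np

# Invented yields of tight and loose leptons in the background-dominated
# control region, binned in lepton pT (GeV).
pt_edges = np.array([20.0, 30.0, 40.0, 60.0, 100.0])
n_tight = np.array([120.0, 80.0, 35.0, 10.0])
n_loose = np.array([480.0, 260.0, 100.0, 25.0])

fake_factor = n_tight / n_loose  # the transfer function, per pT bin

def fake_weight(lepton_pt: float) -> float:
    """Weight applied to a loose-lepton event to extrapolate the
    misidentified-lepton background into the tight selection."""
    i = np.clip(np.digitize(lepton_pt, pt_edges) - 1, 0, len(fake_factor) - 1)
    return float(fake_factor[i])

print(fake_weight(33.0))
```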
Since only τhad decays are selected, misidentified τ-leptons originate from jets that can arise from several sources: light- and heavy-flavour jets, gluon radiation, and jets from pile-up. The SRs are dominated by light- and heavy-flavour jets misidentified as τ-leptons. Therefore, FFs are calculated independently in dedicated CRs corresponding to light- and heavy-flavour jets. The Z + jets FF CR uses a Z + jets sample, which is enriched in light-flavour fake τ-leptons, with a requirement of a Z → ℓ+ℓ− decay and |mℓℓ − mZ| < 15 GeV. In addition, the sample is required to have zero reconstructed b-jets and ETmiss < 60 GeV. The tt FF CR is a dilepton tt sample, enriched in heavy-flavour fake τ-leptons, with a requirement of exactly two light leptons that satisfy |mℓℓ − mZ| > 10 GeV, exactly one τ-lepton that satisfies the loose criteria, and at least two jets with at least one satisfying the b-jet requirements. The FF is calculated separately for 1- and 3-prong τ-leptons. The FFs calculated in the Z + jets and tt CRs are combined by performing a template fit of the BDT score in each training region. The fit uses MC 'truth'-matched fake τ-leptons to compare the fake composition in the analysis region and the Z + jets and tt CRs. The modelling of the background originating from fake leptons is verified by performing closure tests in the regions where the FFs are derived as well as by comparing the background estimates to data in the VRs.

Systematic uncertainties
The systematic uncertainties considered in this analysis come from instrumental and theoretical sources, affecting both the overall event yield and the shape of the distribution. They are evaluated by varying each source around its nominal value as described below.
The uncertainties in the theoretical production cross sections used to simulate the background events are calculated following the same approach as in Ref. [89]. These uncertainties are assigned to every background process whose normalization is not determined by the fit. These backgrounds come from Z + jets, WW, triple vector boson, and top-quark processes (excluding tt + Z).
Uncertainties from missing higher-order contributions are evaluated [90] by applying seven independent variations of the QCD factorization and renormalization scales by factors of one-half and two in the matrix elements, after removing combinations that differ by a factor of four. The effect of the uncertainty in the strong coupling constant αs(mZ) = 0.118, as well as the uncertainties in the nominal PDF set used in the generation of simulated events, is evaluated by following the PDF4LHC recommendations [91]. In addition, the modelling uncertainty due to the choice of generator for tt+Z production is evaluated by comparing samples from the nominal generator MadGraph5_aMC@NLO + Pythia 8 and the alternative generator Sherpa 2.2.1. This gives an uncertainty of up to 7.9% in the tt+Z background estimate.
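Concretely, the seven-point scale-variation prescription mentioned above keeps the following (μR, μF) multipliers; a small sketch:

```python
from itertools import product

# Keep all (muR, muF) multipliers of {1/2, 1, 2}, dropping the two
# opposite variations whose scales differ by a factor of four.
variations = [(r, f) for r, f in product([0.5, 1.0, 2.0], repeat=2)
              if r / f not in (4.0, 0.25)]
print(variations)
# [(0.5, 0.5), (0.5, 1.0), (1.0, 0.5), (1.0, 1.0),
#  (1.0, 2.0), (2.0, 1.0), (2.0, 2.0)]
# The envelope of the yield changes over the non-nominal points is taken
# as the missing-higher-order uncertainty.
```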
The uncertainty in the full integrated luminosity as obtained from the LUCID-2 detector is 1.7% [44] and is applied to the background and signal processes that are normalized to the theoretical predictions. Uncertainties associated with the pile-up reweighting procedure range from 0.2% to 3.5% [92].
Instrumental uncertainties are evaluated for each object that is considered in the analysis. For the selected leptons, they originate from the reconstruction and identification efficiency, the energy (momentum) scale and resolution, and the isolation efficiency [72,73,93]. The uncertainty related to trigger efficiencies is also included. For jets, the uncertainties originate from the jet energy scale and resolution [77,94].

Uncertainties from several sources are evaluated for the FF derived for τ-leptons. Most notably, to account for the limited numbers of simulated events in the regions used to determine the composition of fake τ-lepton samples, the uncertainty of the fitted fractions from the template fit to the Z + jets and tt FF CRs is evaluated as the difference between the two extreme values, 0% and 100%, for each SR. Furthermore, a systematic uncertainty to account for the gluon- and pile-up-initiated fake τ-lepton contribution is applied. To assess this uncertainty, a new CR is defined with a less restrictive RNN score requirement on the loose τhad-vis in the Z + jets FF CR, which increases this contribution from roughly 40% to 60% in the CR. The systematic uncertainty in the estimated number of fake τ-leptons is then taken as the difference between applying the nominal FF in each SR and the FF calculated with the less restrictive RNN requirement. The nominal FF is calculated by applying a minimum RNN requirement of 0.01, while for the assessment of its systematic uncertainty a minimum value of 0.005 is used. Since gluon- and pile-up-initiated fake τ-leptons typically have low pT, the τ-lepton FF systematic uncertainties are split into pT < 40 GeV and pT > 40 GeV regions and assessed independently. The gluon- and pile-up-initiated contribution is only taken for fake τ-leptons with pT less than 40 GeV, and the large uncertainty of 28% is constrained in the Fake τhad CR (Table 5), which is a highly populated region dominated by gluon- and pile-up-initiated fake τ-leptons. However, the gluon- and pile-up-initiated fake τ-lepton fraction is less than 20% in the SRs, where the BDTs tend to select τ-leptons with pT greater than 40 GeV, so this uncertainty does not have a significant impact on them. Additionally, uncertainties are derived for the τ-lepton FFs by accounting for the numbers of events in the Z + jets and tt FF CRs.

Results
In order to test for the presence of a VLL signal, the BDT score templates for signal and background events are fitted to the data using a binned maximum-likelihood (ML) approach in the RooFit and RooStats frameworks [96,97]. The normalizations to data of the main background components modelled in the CR templates (WZ, ZZ, tt+Z) and of the SR VLL templates are allowed to vary without constraint. The other backgrounds' normalizations are assigned Gaussian constraints based on their normalization uncertainties. In addition, the systematic uncertainties are included in the fit as nuisance parameters, with correlations across regions and processes taken into account. To quantify the statistical significance of the fit and its resulting power to reject the background-only hypothesis, a test statistic is constructed using the profile likelihood ratio [98].
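As a toy illustration of a binned maximum-likelihood fit with a profile-likelihood test statistic, here is a single-channel sketch using the pyhf package (not the RooFit/RooStats implementation used in the analysis; all yields invented):

```python
import pyhf

# One BDT-score channel with invented signal and background templates.
model = pyhf.simplemodels.uncorrelated_background(
    signal=[5.0, 12.0, 25.0],
    bkg=[120.0, 60.0, 15.0],
    bkg_uncertainty=[12.0, 8.0, 4.0],
)
data = [123.0, 66.0, 18.0] + model.config.auxdata

# CLs for signal strength mu = 1, using a profile-likelihood-ratio test
# statistic in the asymptotic approximation.
cls_obs, cls_exp = pyhf.infer.hypotest(
    1.0, data, model, test_stat="qtilde", return_expected=True
)
print(f"observed CLs = {cls_obs:.3f}, expected CLs = {cls_exp:.3f}")
```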
After performing the simultaneous fit of the seven SRs and four CRs to the data, the fitted normalization factors relative to the theoretical expectations for the main backgrounds are 1.06 ± 0.14 for WZ, 1.02 ± 0.07 for ZZ, and 1.18 ± 0.18 for tt+Z. Figure 4 shows a comparison between the data and the signal and background yields for the SRs, CRs, and corresponding VRs. Figure 5 shows the templates and data versus BDT score for each of the analysis regions after applying the selection requirements in Table 4 but before the SR's BDT requirement is applied. The regions with BDT score values less than the SR requirement are used as VRs. In all SRs the number of observed events is compatible with the background hypothesis. Tables 8 and 9 show the total background and signal yields in all SRs and CRs after fitting to data. Table 10 shows the impact of each source of systematic uncertainty on the signal strength μ in the signal-plus-background fit. The signal strength is defined as the ratio of the signal cross section estimated using the data to the predicted signal cross section. The nuisance parameters are grouped according to their origin. To evaluate the impact of each source of systematic uncertainty, the source is removed from the full fit and the signal strength and its uncertainty are recalculated. The square of the impact is defined as the decrease in the squared signal-strength uncertainty. The nuisance parameters associated with the background normalization have the highest impact on μ, while the systematic uncertainties associated with the fake-lepton estimation have the second-highest impact.
Using the VLL doublet model in Refs. [32,33], the predicted significance of the signal is expected to be greater than 5 standard deviations for mτ′ in the range from 130 to 600 GeV and greater than 3 standard deviations for values of mτ′ up to 800 GeV. Above this mass the expected significance decreases to ≈ 1 standard deviation at 1 TeV. The observed significance is found to be ≲ 1 standard deviation over the entire mτ′ range probed, as shown in Table 11.
No significant deviation from the SM prediction is observed. Therefore, the 95% CL exclusion limit on the VLL production cross section as a function of mτ′ is calculated and shown in Figure 6. In order to estimate the 95% CL upper limit on the VLL cross section, a simultaneous binned likelihood fit of the seven SRs and four CRs is performed, using the CLs method [99] with the asymptotic approximation. The expected limit is shown with the black dashed line, and the shaded regions correspond to its one and two standard-deviation uncertainty bands. The observed 95% CL exclusion limit is shown with the solid black line. The expected lower limit on mτ′ is found to be 970 GeV, and the observed limit to be 900 GeV, by comparing the NLO theory prediction with the expected and observed 95% CL cross-section limits.

Table 9: Total observed yields as computed by the fit for the control regions. The uncertainty contains both the systematic and statistical uncertainties. The prediction for each background sample is taken after a likelihood fit is performed to measure the VLL production cross section. Background normalization factors are also applied. The 'Other Top' sample includes contributions from single top, tt, tt+W, t+Z, t+WZ, tt+H, tt+WW, tttt, and ttt. The prediction from the signal samples is taken before the likelihood fit is performed. The background contributions may not add up to equal the total background due to rounding.

Figure 5: The post-fit BDT score distributions of data, background, and pre-fit signal modelling. The arrows indicate the point where there is a break between the regions and indicate the SRs used in the likelihood fit. The remaining distribution is treated as a VR and is not used in the fit (with the exception of (c) 2ℓ OSSF, where the low BDT score region is used as a CR and is included in the fit); however, fitted nuisance parameters are propagated to these regions. The uncertainty bands contain both the systematic and statistical components post-fit. The first bin contains underflow events and the last bin contains overflow events. The arrows in the ratio plot are for points that are outside the range. Panels: (a) 2ℓ SSSF, 1τ; (b) 2ℓ SSOF, 1τ; (c) 2ℓ OSSF, 1τ; (d) 2ℓ OSOF, 1τ; (e) 2ℓ, ≥2τ; (f) 3ℓ, ≥1τ; (g) ≥4ℓ, ≥0τ.

Figure 6: The 95% CL exclusion limit on the VLL production cross section as a function of VLL mass. The black dashed line represents the expected limit while the shaded regions are its one and two standard-deviation uncertainty bands. The solid black line is the observed limit as a function of VLL mass. The red curve is the NLO theory prediction along with its uncertainty.
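The quoted mass limits follow from the crossing point of the theory cross-section curve and the limit curve; a minimal interpolation sketch with invented numbers:

```python
import numpy as np

masses = np.array([700.0, 800.0, 900.0, 1000.0])   # GeV (invented grid)
xsec_theory = np.array([20.0, 9.0, 4.0, 1.8])      # fb, falling prediction
xsec_limit = np.array([5.0, 4.2, 4.0, 3.9])        # fb, 95% CL upper limit

# Masses where the predicted cross section exceeds the limit are excluded;
# the quoted mass limit is where the two curves cross (ratio = 1).
ratio = xsec_theory / xsec_limit
crossing = np.interp(1.0, ratio[::-1], masses[::-1])  # ratio falls with mass
print(f"masses below ~{crossing:.0f} GeV are excluded")
```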

Conclusions
A search for vector-like leptons in a doublet model is performed using 139 fb−1 of pp collision data recorded at √s = 13 TeV by the ATLAS detector at the LHC. The search is performed using events with final states containing multiple light leptons and τhad. Observing no excess of events above the SM expectation, a 95% CL upper limit on the cross section is calculated using the CLs method. Using a doublet model where the vector-like leptons couple to the third-generation SM leptons, masses in the range from 130 to 900 GeV are excluded at the 95% CL, while the highest excluded mass is expected to be 970 GeV.