Search for pair-produced vector-like top and bottom partners in events with large missing transverse momentum in pp collisions with the ATLAS detector

A search for pair-produced vector-like quarks using events with exactly one lepton (e or μ), at least four jets including at least one b-tagged jet, and large missing transverse momentum is presented. Data from proton–proton collisions at a centre-of-mass energy of √s = 13 TeV, recorded by the ATLAS detector at the LHC from 2015 to 2018 and corresponding to an integrated luminosity of 139 fb⁻¹, are analysed. Vector-like partners T and B of the top and bottom quarks are considered, as is a vector-like X with charge +5/3, assuming their decay into a W, Z, or Higgs boson and a third-generation quark. No significant deviations from the Standard Model expectation are observed.
Upper limits on the production cross-section of T and B quark pairs as a function of their mass are derived for various decay branching ratio scenarios. The strongest lower limits on the masses are 1.59 TeV assuming mass-degenerate vector-like quarks and branching ratios corresponding to the weak-isospin doublet model, and 1.47 TeV (1.46 TeV) for exclusive T → Zt (B/X → Wt) decays. In addition, lower limits on the T and B quark masses are derived for all possible branching ratios.


Introduction
The fine-tuning or naturalness problem [1] in particle physics arises from loop corrections to the Higgs boson mass that are quadratically divergent. If the Standard Model were complete, these corrections would be of the order of the Planck scale, 10^19 GeV, and a finely tuned bare Higgs mass of a similar value would be needed to arrive at the measured mass of about 125 GeV [2].
Vector-like quarks (VLQs) [3][4][5][6] could dampen the unnaturally large quadratic corrections to the Higgs boson mass by contributing significantly to loop corrections. They are hypothetical spin-1/2 coloured particles whose left-handed and right-handed states have the same electroweak coupling. They appear in a number of theories beyond the Standard Model (SM) of particle physics, mainly in the 'Little Higgs' [7][8][9] and 'Composite Higgs' [10,11] classes of models.
At the Large Hadron Collider (LHC), VLQs could be produced singly via electroweak interactions or in pairs, mainly via the strong interaction. While the cross-section for the latter depends only on the VLQ mass, the former has an additional dependence on the unknown coupling strength between the electroweak bosons and the VLQ. VLQs are expected to couple preferentially to third-generation quarks [3,12]. Therefore the up-type VLQ T with charge +2/3 is in the following assumed to have only the three possible decay modes T → Wb, T → Zt, and T → Ht. Similarly, the down-type VLQ B with charge −1/3 can decay into Wt, Zb, or Hb. Vector-like X quarks with charge +5/3 also appear in multiplets with T partners [4,13] and decay via X → Wt only.
This analysis investigates all possible decay modes and combinations of branching ratios for the pair-produced vector-like T (VLT) quark and B (VLB) quark, shown in Figure 1. However, it is most sensitive to the T → Zt and B → Wt decays. Since the analysis does not distinguish between particles and antiparticles, the limits for B → Wt also apply to the vector-like X quark, given that it exclusively decays into Wt. Particular combinations of branching ratios correspond to the weak-isospin singlet and doublet models. For T and B quarks, the branching ratio for each decay mode depends on the VLQ mass and weak-isospin quantum numbers [4]. The branching ratios given in the following are for VLQ masses above 800 GeV, where they are approximately independent of the VLQ mass. For a singlet T, all decay modes have sizeable branching ratios (B(Ht, Zt, Wb) ≈ (0.25, 0.25, 0.5)), whereas if T is in either an (X, T) doublet or a (T, B) doublet, it decays only into Zt or Ht with equal branching ratios as long as the generalised Cabibbo-Kobayashi-Maskawa (CKM) matrix elements fulfil |V_Tb| ≪ |V_tB| [4]. Similarly, for a singlet B the branching ratios of all decay modes are sizeable (B(Hb, Zb, Wt) ≈ (0.25, 0.25, 0.5)), while for the (T, B) doublet scenario with |V_Tb| ≪ |V_tB| the B → Wt decay is the only possibility.

Figure 1: Representative Feynman diagrams for (a) TT̄ and (b) BB̄ production and decay. In the analysis, no distinction is made between particles and antiparticles, leading to sensitivity to the XX̄ → WtWt final state as well.
The analysis is based on a final-state signature with high missing transverse momentum E_T^miss, one lepton ℓ (e or μ), and at least four jets including a b-tagged jet. The previous ATLAS search in this final state is based on a subset of the Run 2 data [16], yielding lower limits on the T quark mass of 1.16 TeV for B(T → Zt) = 100% and 0.87 TeV (1.05 TeV) for the T in the singlet (doublet) model. Here the analysis is extended mainly by also investigating vector-like B quarks and by using neural networks (NNs) trained at several branching ratios in order to separate signal from background, instead of using a cut-and-count analysis with a single signal region (SR). The training is done separately for T and B in a common training region using simulated events. The SRs are each defined by a subset of the training region passing a selection on the corresponding NN output. Control regions (CRs) are defined so as to be enriched in the various background processes. They are orthogonal to the training region, and thus to the SRs, and orthogonal to each other. The statistical interpretation is based on a simultaneous fit to the CRs and SR for T or B, in which the normalisations of the tt̄, W+jets, and single-top-quark backgrounds and a possible signal contribution are determined.

ATLAS detector
The ATLAS experiment [27] at the LHC is a multipurpose particle detector with a forward-backward symmetric cylindrical geometry and a near 4π coverage in solid angle.^1 It consists of an inner tracking detector (ID) surrounded by a thin superconducting solenoid providing a 2 T axial magnetic field, electromagnetic and hadron calorimeters, and a muon spectrometer. The inner tracking detector covers the pseudorapidity range |η| < 2.5. It consists of silicon pixel, silicon microstrip, and transition radiation tracking detectors. Lead/liquid-argon (LAr) sampling calorimeters provide electromagnetic (EM) energy measurements with high granularity. A steel/scintillator-tile hadron calorimeter covers the central pseudorapidity range (|η| < 1.7). The endcap and forward regions are instrumented with LAr calorimeters for both the EM and hadronic energy measurements up to |η| = 4.9. The muon spectrometer surrounds the calorimeters and is based on three large superconducting air-core toroidal magnets with eight coils each. The field integral of the toroids ranges between 2.0 and 6.0 T m across most of the detector. The muon spectrometer includes a system of precision tracking chambers and fast detectors for triggering. A two-level trigger system is used to select events. The first-level trigger is implemented in hardware and uses a subset of the detector information to accept events at a rate below 100 kHz. This is followed by a software-based high-level trigger (HLT) that reduces the accepted event rate to 1 kHz on average, depending on the data-taking conditions. An extensive software suite [28] is used in data simulation, in the reconstruction and analysis of real and simulated data, in detector operations, and in the trigger and data acquisition systems of the experiment.

Data and simulated event samples
The analysis uses data from proton–proton (pp) collisions at √s = 13 TeV recorded with the ATLAS detector at the LHC in the years 2015 to 2018. The dataset, collected during stable beam conditions and with all detector subsystems operational [29], corresponds to an integrated luminosity of 139 fb⁻¹ with an uncertainty of 1.7% [30]. At the high luminosities reached at the LHC, events are affected by additional inelastic pp collisions in the same or neighbouring bunch crossings, referred to as pile-up. The average number of interactions per bunch crossing was 33.7. Events were selected online during data-taking by E_T^miss triggers [31], with an E_T^miss threshold of 70 GeV in the HLT in 2015 and a threshold rising from 90 GeV to 110 GeV during the later years.
Monte Carlo (MC) simulated events are used for the modelling of the background processes and the VLQ signals. Details of the simulated nominal samples, including the matrix-element (ME) generator and the parton distribution function (PDF) set, the parton shower (PS) and hadronisation model, and the set of tuned parameters (tune), are summarised in Table 1.
The generated events were processed through a simulation [32] of the ATLAS detector geometry and response using Geant4 [33]. A faster simulation, which employed a parameterisation of the calorimeter response, was used in some cases to estimate systematic uncertainties. In these cases, the systematically varied samples were compared with versions of the nominal samples that were also processed through the fast simulation. In order to model pile-up effects, minimum-bias pp interactions were generated with Pythia 8.186 [34] using the A3 [35] set of tuned parameters and overlaid on the simulated hard-scatter events. The resulting events were weighted to match the pile-up profile of the recorded data.

^1 ATLAS uses a right-handed coordinate system with its origin at the nominal interaction point (IP) in the centre of the detector and the z-axis along the beam pipe. The x-axis points from the IP to the centre of the LHC ring, and the y-axis points upwards. Cylindrical coordinates (r, φ) are used in the transverse plane, φ being the azimuthal angle around the z-axis. The pseudorapidity is defined in terms of the polar angle θ as η = −ln tan(θ/2). Angular distance is measured in units of ΔR ≡ √((Δη)² + (Δφ)²).
Finally, the simulated events were reconstructed using the same software as the collision data. Corrections were applied to the simulated events in order to match object identification efficiencies, energy scales, and resolutions to those determined from data in auxiliary measurements.
Signal samples for the pair production of vector-like T and B quarks were generated at leading order (LO) with Protos v2.2 [36] using the NNPDF2.3lo PDF set [37], interfaced with Pythia 8.186 to model the parton shower, hadronisation, and underlying event. Using the narrow-width approximation, the samples were produced for masses from 800 GeV up to 2 TeV, with a mass spacing of 100 GeV from 1 TeV to 1.8 TeV. The chirality-dependent couplings of the VLQs were set to those in the weak-isospin singlet model, but with equal branching ratios into the three decay modes (Wb, Zt, Ht) for the vector-like T quark and (Wt, Zb, Hb) for the vector-like B quark. Dedicated signal samples in the doublet model were produced for the 1.2 TeV mass point in order to test for potential kinematic biases from the assumed singlet couplings. For the T quark, this choice is conservative because the acceptance is higher in the doublet case, while for the B quark the acceptances are similar for singlet and doublet couplings. In order to obtain the desired branching ratios, an event-by-event reweighting based on generator information is performed. The signal sample cross-sections were calculated with Top++ 2.0 [38] at next-to-next-to-leading order (NNLO) in QCD, including the resummation of next-to-next-to-leading logarithmic (NNLL) soft-gluon terms.
The production of tt̄ events was modelled using the Powheg Box v2 [39][40][41][42] generator at next-to-leading order (NLO) with the NNPDF3.0nlo set [43] of PDFs and the h_damp parameter set to 1.5 m_top [44], with m_top = 172.5 GeV. The events were interfaced to Pythia 8.230. The cross-section was corrected to the theory prediction at NNLO, including NNLL soft-gluon terms, calculated using Top++ 2.0.
Samples of single-top-quark events were produced with the Powheg Box v2 generator at NLO in QCD using the NNPDF3.0nlo set of PDFs, with the five-flavour scheme for tW and s-channel single-top production, and the four-flavour scheme for t-channel single-top events. The tW sample was modelled using the diagram removal scheme [45] to remove interference and overlap with tt̄ production. The events were interfaced with either Pythia 8.230 or Pythia 8.235. The samples were normalised to their NLO QCD cross-sections [46,47], with additional NNLL soft-gluon terms for tW production [48,49].
The production of V+jets (V = W, Z) events was simulated with the Sherpa 2.2.1 generator using NLO-accurate matrix elements for up to two partons, and LO matrix elements for up to four partons, calculated with the Comix [50] and OpenLoops [51][52][53] libraries. They were matched with the Sherpa parton shower [54] using the MEPS@NLO prescription [55][56][57][58] and the set of tuned parameters developed by the Sherpa authors. The NNPDF3.0nnlo set of PDFs was used, and the samples were normalised to an NNLO prediction [59].
Samples of diboson final states (VV) were simulated with the Sherpa 2.2.1 or 2.2.2 generator, depending on the process, including off-shell effects and Higgs boson contributions where appropriate. Fully leptonic final states and semileptonic final states, where one boson decays leptonically and the other one hadronically, were generated using matrix elements at NLO accuracy in QCD for up to one additional parton and at LO accuracy for up to three additional parton emissions. The matching of NLO matrix elements to the PS and the merging of different jet multiplicities were done in the same way as for V+jets production. The NNPDF3.0nnlo set of PDFs was used, along with the Sherpa-internal tune. The diboson event samples were normalised to the total cross-section calculated by Sherpa at NLO in QCD.
The production of tt̄W and tt̄Z events was modelled using the MadGraph5_aMC@NLO 2.3.3 [60] generator at NLO with the NNPDF3.0nlo PDF set. The events were interfaced to Pythia 8.210. Similarly, the production of tWZ events was modelled using MadGraph5_aMC@NLO 2.6.8 at NLO with the NNPDF3.0nlo PDF set, interfaced to Pythia 8.244. The diagram removal scheme was employed to handle the interference between tWZ and tt̄Z production, and was applied to the tWZ sample. Samples for the production of tt̄H events were generated using the Powheg Box v2 generator at NLO with the NNPDF3.0nlo PDF set, interfaced to Pythia 8.230. The generated samples for tt̄W, tt̄Z, tWZ, and tt̄H production were normalised to NLO cross-section predictions calculated by MadGraph5_aMC@NLO.
All simulated samples, except those produced with the Sherpa [61] event generator, utilised the EvtGen [62] program to model the decay of heavy-flavour hadrons. While EvtGen 1.2.0 was used for the VLQ signal samples and the tt̄ samples, EvtGen 1.6.0 was used in all other cases. For all nominal samples where Pythia 8 [63] was utilised for the showering and hadronisation, Pythia was used with the A14 [64] set of tuned parameters and the NNPDF2.3lo set of PDFs.

Event reconstruction and object selection
Events are required to have at least one pp collision vertex candidate with at least two associated tracks with transverse momentum p_T > 0.5 GeV. The primary vertex is defined to be the vertex candidate with the largest scalar sum of transverse momenta of all associated tracks. In this analysis, electrons, muons, and jets are the calibrated physics objects used. For the charged leptons, two sets of quality and kinematic requirements are imposed, where the selection for signal leptons is tighter than for baseline leptons.
Electron candidates are reconstructed from energy deposits in the EM calorimeter matched to charged-particle tracks in the ID. Baseline electrons are required to have p_T > 10 GeV and to be reconstructed within |η| < 2.47, excluding the barrel-endcap transition region 1.37 < |η| < 1.52. They must fulfil 'loose' identification criteria, using a likelihood-based discriminant that combines information about tracks in the ID and energy deposits in the calorimeter system [65], and are required to have a hit in the innermost layer of the pixel detector. Furthermore, isolation requirements in both the calorimeter and the ID are imposed [65]. An electron does not meet the isolation criteria if, after subtracting contributions from pile-up and the electron itself, the transverse energy deposited in the calorimeter within a surrounding cone of radius ΔR = 0.2 exceeds 20% of the transverse energy of the electron. Similarly, electron candidates are excluded if the scalar sum of the transverse momenta of tracks within a cone of radius ΔR = min(10 GeV/p_T(e), 0.2), excluding the track matched to the electron, is larger than 15% of the electron p_T. In addition, each electron candidate's track must be matched to the primary vertex. This requires that the significance of its transverse impact parameter, d_0, satisfies |d_0|/σ(d_0) < 5, where σ(d_0) is the uncertainty in d_0, and that the longitudinal distance z_0 from the primary vertex to the point where d_0 is measured satisfies |z_0 sin θ| < 0.5 mm. In order to suppress backgrounds due to hadrons misidentified as electrons, signal electrons must satisfy all baseline criteria, fulfil in addition 'tight' identification criteria [65], and have p_T > 28 GeV.
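As an illustration of the track-based isolation described above, the following minimal Python sketch (not ATLAS software; the object representation is a simplifying assumption) applies the p_T-dependent cone radius min(10 GeV/p_T(e), 0.2) and the 15% threshold:

```python
import math

def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance between two objects, with phi wrapped into [0, pi]."""
    dphi = abs(phi1 - phi2)
    if dphi > math.pi:
        dphi = 2 * math.pi - dphi
    return math.hypot(eta1 - eta2, dphi)

def track_isolated(ele_pt, ele_eta, ele_phi, tracks):
    """tracks: list of (pt, eta, phi) excluding the electron's own track;
    momenta in GeV. Cone radius shrinks with electron pT, threshold is 15%."""
    cone = min(10.0 / ele_pt, 0.2)
    iso_sum = sum(pt for pt, eta, phi in tracks
                  if delta_r(ele_eta, ele_phi, eta, phi) < cone)
    return iso_sum < 0.15 * ele_pt
```

For a 40 GeV electron the cone radius is 0.2 and the allowed track sum is 6 GeV, so a single 10 GeV track inside the cone fails the requirement.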
Muon candidates are reconstructed by combining charged-particle tracks formed in the ID and in the muon spectrometer, or by matching ID tracks to an energy deposit in the calorimeter compatible with a minimum-ionising particle [66]. Baseline muons are required to have p_T > 10 GeV and |η| < 2.5, and to satisfy the 'loose' identification criteria [66]. Track-to-vertex matching is ensured by requiring the muon track to satisfy |d_0|/σ(d_0) < 3 and |z_0 sin θ| < 0.5 mm. Signal muons must satisfy 'medium' identification criteria and are required to have p_T > 28 GeV. Additionally, signal muons must be isolated, requiring that the scalar p_T sum of all tracks within a cone of radius ΔR = min(10 GeV/p_T(μ), 0.3) around the muon is less than 6% of the muon p_T.
Small-radius (small-R) jet candidates are built from particle-flow objects [67,68], using the anti-kt algorithm [69,70] with a radius parameter of R = 0.4. The particle-flow algorithm combines information about tracks in the ID and energy deposits in the calorimeters to form the input for the jet reconstruction.
Jets are required to have p_T > 25 GeV and |η| < 2.5. To reject jets originating from pile-up interactions, jet candidates with |η| < 2.4 and p_T < 60 GeV are required to satisfy the 'tight' jet vertex tagger (JVT) criterion [71]. Small-R jets containing a b-hadron decay are b-tagged using a multivariate algorithm, called DL1r, operating at a tagging efficiency of 77% as determined in simulated tt̄ events [72,73].
An overlap removal procedure is applied to prevent double counting of ambiguous reconstructed objects, using the baseline lepton definitions. First, electron-muon overlap is handled by removing muons sharing a track in the ID with an electron if the muon is calorimeter-tagged, and otherwise removing the electron. Subsequently, overlap between jets and leptons is removed by rejecting any jets within ΔR = 0.2 of an electron and afterwards rejecting any electrons within ΔR = 0.4 of a jet. Similarly, jets are discarded if they have fewer than three associated tracks and are within ΔR = 0.2 of a muon candidate. Otherwise, the muon is rejected if it lies within ΔR = min(0.4, 0.04 + 10 GeV/p_T(μ)) of a jet.
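The sequential overlap-removal logic can be sketched as follows. This is an illustrative toy implementation, not the ATLAS reconstruction code; objects are assumed to be simple dictionaries and track matching is reduced to comparing a `track_id` field:

```python
import math

def dr(a, b):
    """Angular distance between two objects given as dicts with eta/phi."""
    dphi = abs(a["phi"] - b["phi"])
    if dphi > math.pi:
        dphi = 2 * math.pi - dphi
    return math.hypot(a["eta"] - b["eta"], dphi)

def overlap_removal(electrons, muons, jets):
    # Step 1: shared ID track -> drop calo-tagged muon, otherwise the electron.
    muons = [m for m in muons if not (m.get("calo_tagged") and
             any(m.get("track_id") == e.get("track_id") for e in electrons))]
    electrons = [e for e in electrons if not
                 any(e.get("track_id") == m.get("track_id") for m in muons)]
    # Step 2: jets within dR < 0.2 of an electron are removed ...
    jets = [j for j in jets if not any(dr(j, e) < 0.2 for e in electrons)]
    # ... then electrons within dR < 0.4 of a remaining jet are removed.
    jets_kept = jets
    electrons = [e for e in electrons
                 if not any(dr(e, j) < 0.4 for j in jets_kept)]
    # Step 3: jets with < 3 tracks within dR < 0.2 of a muon are removed ...
    jets = [j for j in jets if not (j.get("ntracks", 3) < 3 and
            any(dr(j, m) < 0.2 for m in muons))]
    # ... then muons within dR < min(0.4, 0.04 + 10 GeV/pT) of a jet.
    muons = [m for m in muons if not any(
        dr(m, j) < min(0.4, 0.04 + 10.0 / m["pt"]) for j in jets)]
    return electrons, muons, jets
```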
The missing transverse momentum, with magnitude E_T^miss, is defined as the negative vectorial sum of the transverse momenta of all calibrated objects in an event, plus a track-based soft term which takes into account energy depositions associated with the primary vertex but not with any calibrated object [74].
Finally, large-radius (large-R) jets are constructed from the selected small-R jets using the anti-kt algorithm with R = 1.0. In order to reduce the impact of soft radiation, constituent small-R jets with p_T less than 5% of the large-R jet p_T are removed. These reclustered large-R jets are required to have p_T > 150 GeV and a mass larger than 50 GeV.
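The soft-constituent removal and the subsequent kinematic selection can be sketched as follows (illustrative only; the clustering itself is omitted, and constituents are assumed to be small-R jet four-vectors (px, py, pz, E) in GeV):

```python
import math

def combine(constituents):
    """Four-vector sum of the constituents; returns (pT, mass) in GeV."""
    px = sum(c[0] for c in constituents)
    py = sum(c[1] for c in constituents)
    pz = sum(c[2] for c in constituents)
    e = sum(c[3] for c in constituents)
    m2 = max(e * e - px * px - py * py - pz * pz, 0.0)
    return math.hypot(px, py), math.sqrt(m2)

def trim_and_select(constituents, f_soft=0.05, pt_min=150.0, m_min=50.0):
    """Drop constituents with pT below f_soft of the large-R jet pT,
    then apply the pT > 150 GeV and mass > 50 GeV requirements."""
    pt_jet, _ = combine(constituents)
    kept = [c for c in constituents
            if math.hypot(c[0], c[1]) >= f_soft * pt_jet]
    pt_new, m_new = combine(kept)
    return kept, (pt_new > pt_min and m_new > m_min)
```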

Event selection and categorisation
All events considered in this analysis must be selected by an E_T^miss trigger. Since the trigger thresholds were raised during Run 2, a requirement of E_T^miss > 250 GeV is imposed to ensure full trigger efficiency over all data-taking periods. Events are also required to have exactly one signal lepton (e, μ) and at least four small-R jets, of which at least one is b-tagged. A veto on a second lepton, fulfilling the baseline requirements, is used to suppress tt̄ events with two leptons in the final state. To reject events with E_T^miss arising from mismeasured jets, the azimuthal angle between the missing transverse momentum vector and both the leading (j_1) and subleading (j_2) jets, ordered in p_T, must satisfy the condition |Δφ(j_i, E_T^miss)| > 0.4, with i ∈ {1, 2}. In addition, events must have a transverse mass m_T > 30 GeV, where m_T is defined as

m_T = √( 2 p_T(ℓ) E_T^miss [1 − cos Δφ(ℓ, E_T^miss)] ).

After applying these requirements, referred to as 'preselection' in Table 2, the dominant backgrounds come from tt̄ and W+jets production. A training region for the NNs is defined by applying further requirements listed in Table 2, which reduce the amount of background without decreasing the sensitivity to the signal.
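The standard transverse-mass formula used in the preselection can be written as a one-line sketch (plain Python, illustrative; variable names are not taken from the analysis software):

```python
import math

def m_t(lep_pt, lep_phi, met, met_phi):
    """m_T = sqrt(2 pT(lep) ET_miss (1 - cos dphi)), momenta in GeV."""
    dphi = lep_phi - met_phi
    return math.sqrt(2.0 * lep_pt * met * (1.0 - math.cos(dphi)))
```

The transverse mass vanishes when the lepton and the missing momentum are aligned and is maximal when they are back to back, which is why a lower cut suppresses events where both originate from a single leptonic W decay only mildly, while the harder m_T > 120 GeV cut used later removes them efficiently.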
Requiring m_T to be well above the W boson mass peak, m_T > 120 GeV, strongly reduces the W+jets and semileptonic tt̄ background. In these background processes the leptonic W boson decay is the only source of E_T^miss, while in VLQ pair production additional sources of E_T^miss can arise from, e.g., the Z boson decay to neutrinos or the W boson decay to a hadronically decaying tau-lepton and a neutrino. The remaining tt̄ background originates from dileptonic tt̄ events where one lepton is not detected. This type of tt̄ background is suppressed by requirements on the asymmetric transverse mass, am_T2 [75,76], which is a variant of the m_T2 [77,78] variable. The latter is applied to signatures where two or more particles are not detected directly (e.g. dileptonic tt̄ events, where the two neutrinos are not detected), and it is defined as

m_T2 = min_{q_Ta + q_Tb = E_T^miss} max(m_Ta, m_Tb),

where the minimisation runs over all splittings of the missing transverse momentum into two transverse-momentum vectors q_Ta and q_Tb. In this formula, m_Ta and m_Tb are transverse masses calculated using two sets of one or more visible particles, denoted by a and b, respectively, together with the corresponding missing transverse momenta q_Ta and q_Tb. In the calculation of am_T2, the two sets of visible particles are asymmetric, as one consists of the identified signal lepton together with one of the two jets with the highest b-tagging score, and the other of the remaining such jet. Given the two possible lepton-jet pairings, the combination with the lowest am_T2 value is taken. For dileptonic tt̄ events, the am_T2 distribution has a kinematic endpoint at the top-quark mass, while additional sources of E_T^miss extend the distribution towards higher am_T2 values. Events in the training region have to fulfil am_T2 > 200 GeV. At least one hadronic decay of a high-p_T top quark or SM boson is expected for the considered signal. Thus, at least one large-R jet is required in the training region. About 4% of the simulated signal events are reconstructed in the training region for TT̄ or BB̄ production with pure T → Zt and B → Wt decays and a VLQ mass of 1.2 TeV.

The tt̄ background, which is a major background in this analysis, is not modelled accurately at high transverse momenta [79,80]. Therefore, a reweighting procedure, referred to as 'top reweighting' in the following, is applied. Reweighting factors are derived in bins of the jet multiplicity (4, 5, 6, ≥7) as a function of the effective mass m_eff, defined as the scalar sum of the p_T of all reconstructed objects and E_T^miss. The reweighting factors are determined for the sum of the tt̄ and single-top backgrounds using their nominal prediction and are parameterised with a linear function. They are derived from a comparison between data and MC simulation in a dedicated top reweighting region (see Table 2), which is defined in the same way as the training region except for an inverted am_T2 requirement. This requirement is tightened to am_T2 < 180 GeV in order to obtain a higher tt̄ purity of about 90% and less signal contamination in the tails of the m_eff distribution. The resulting reweighting factors are applied to tt̄ and single-top-quark events in each of the defined analysis regions, which changes the event yields and leads to improved modelling. This can be seen in Figure 2, which compares the data with MC simulation in the top reweighting region after reweighting, and also shows the MC expectation before reweighting.
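The m_T2 minimisation described above can be approximated numerically by a brute-force scan over splittings of the missing transverse momentum. The sketch below is a coarse illustration only, assuming massless visible objects and scanning splitting vectors up to the magnitude of E_T^miss; dedicated algorithms are used in practice:

```python
import math

def mt(vis, inv):
    """Transverse mass of a massless visible object (px, py) and an
    invisible transverse-momentum vector (px, py)."""
    pt_v = math.hypot(*vis)
    pt_i = math.hypot(*inv)
    arg = 2.0 * (pt_v * pt_i - vis[0] * inv[0] - vis[1] * inv[1])
    return math.sqrt(max(arg, 0.0))

def mt2_scan(vis_a, vis_b, met, steps=200):
    """Minimise max(mT(a, qa), mT(b, met - qa)) over a polar grid of qa."""
    best = float("inf")
    met_mag = math.hypot(*met)
    for i in range(steps):
        frac = i / (steps - 1)
        for j in range(steps):
            phi = 2.0 * math.pi * j / steps
            qa = (frac * met_mag * math.cos(phi),
                  frac * met_mag * math.sin(phi))
            qb = (met[0] - qa[0], met[1] - qa[1])
            best = min(best, max(mt(vis_a, qa), mt(vis_b, qb)))
    return best
```

For two visible objects collinear with the missing momentum the minimum is zero, consistent with the fact that m_T2 only exceeds the parent mass when the event cannot be explained by two semi-invisible decays.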
Control regions are defined for the W+jets and single-top-quark backgrounds so as to be enriched in the respective background and have negligible contamination from signal. Both CRs are defined to be orthogonal to the top reweighting region and the training region by modifying the requirement on m_T, using a window of 30 GeV < m_T < 120 GeV around the W boson mass. In order to reduce the tt̄ background contribution in these regions, the large-R jet multiplicity is required to be less than two, and if a large-R jet is present its mass has to be below 150 GeV. For the W+jets CR, the contribution from tt̄ events is further reduced by selecting only events with exactly one b-tagged jet. Since the cross-section for W+ production is larger than for W− production, higher W+jets purity is achieved by selecting only events with a positively charged lepton. In the single-top CR, the contribution from W+jets is reduced by requiring at least two b-tagged jets with an angular separation of ΔR(b_1, b_2) > 1.4 between the two highest-p_T b-jets. Table 2 summarises the selection criteria for both CRs, and Figure 3 compares the effective mass distribution in data with that in MC simulation after top reweighting, and also shows the MC expectation before reweighting.
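The linear top-reweighting parameterisation described above can be illustrated with a toy sketch. The jet-multiplicity binning follows the text, but the fit inputs below are invented toy numbers, and the closed-form least-squares fit stands in for the actual fitting machinery:

```python
def fit_linear_ratio(meff, ratio):
    """Closed-form least-squares fit of ratio = a + b * m_eff (GeV)."""
    n = len(meff)
    sx, sy = sum(meff), sum(ratio)
    sxx = sum(x * x for x in meff)
    sxy = sum(x * y for x, y in zip(meff, ratio))
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    a = (sy - b * sx) / n
    return a, b

def top_weight(meff, njet, params):
    """Per-event weight; params maps jet bins 4, 5, 6, 7 (for >=7)
    to fitted (a, b) pairs."""
    a, b = params[min(njet, 7)]
    return a + b * meff
```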

Neural network training
To enhance the separation between signal and background events, NNs combining several input variables into a single discriminant are employed. They are trained for various signal hypotheses using the simulated signal and background events in the training region. For TT̄ production, four NNs are trained, and for BB̄ production three, each corresponding to a different branching-ratio scenario. The NNs are implemented using the NeuroBayes package [81,82], which combines a three-layer feed-forward NN with preprocessing of the input variables prior to their presentation to the NN. The main purpose of the preprocessing is to facilitate optimal network training by ordering the input variables according to their separation power, taking correlations into account, and removing all but the most relevant ones. Sets of input variables are selected for their ability to discriminate between signal and background. Table 3 lists the input variables that are used to train at least one NN. Each set of input variables is composed of observables reflecting the signal topology, e.g. the high VLQ mass via m_eff or the properties of the reclustered large-R jets. Other important variables are the b-jet multiplicity and the transverse masses, m_T and am_T2, that are used to define the CRs and training region. It is checked that all input variables are modelled well. As an example, the distributions of four important input variables in the training region are shown in Figure 4.
NeuroBayes uses Bayesian regularisation techniques for the training process to improve the generalisation performance and to avoid overtraining. In general, the network infrastructure consists of one input node for each input variable, plus one bias node, an arbitrary, user-defined number of hidden nodes, and one output node which gives a continuous NN output score (N_out) in the interval (0, +1), where N_out values close to zero indicate background-like events and values close to one correspond to signal-like events. For the NNs in this analysis, 15 nodes are used in the hidden layer and the ratio of signal to background events in the training is chosen to be 1:1. The different background processes are weighted according to their expected event contribution. All the main backgrounds, tt̄, W+jets, single-top-quark, and tt̄V production, are used in the training. For the signal process, VLQ masses from 1 TeV to 1.5 TeV are combined in each training. Events at different signal masses enter with the same cross-section when composing the training sample, in order to prevent the lower masses with higher cross-sections from dominating. As a check for potential overtraining, only 80% of the simulated events serve as input to the training, while the remaining 20% are used as a test sample. No signs of overtraining are observed. After the training step, all simulated signal and background events, as well as the observed data events, are processed by the NNs in order to obtain an N_out value for each event. For each NN, the training region is divided into a low-N_out CR with N_out < 0.5, and the SR with N_out > 0.5.
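The 80/20 overtraining check and the split of the training region by NN output score can be sketched generically (a toy illustration, not the NeuroBayes workflow; the seeded shuffle is an assumption made for reproducibility):

```python
import random

def split_train_test(events, frac=0.8, seed=0):
    """Shuffle and split simulated events into training and test samples."""
    rng = random.Random(seed)
    shuffled = list(events)
    rng.shuffle(shuffled)
    n = int(frac * len(shuffled))
    return shuffled[:n], shuffled[n:]

def in_signal_region(n_out, threshold=0.5):
    """True if the event falls in the SR, False for the low-N_out CR."""
    return n_out > threshold
```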

Systematic uncertainties
Several sources of experimental and theoretical systematic uncertainty are considered. The experimental uncertainties are mainly related to the reconstruction and calibration of the final-state physics objects, while the theoretical uncertainties are associated with the modelling of the various processes by the MC event generators. The largest systematic uncertainties in the analysis are related to the modelling of the major background processes and to the jet energy resolution.
For tt̄ and single-top production the following systematic uncertainties related to the event modelling are applied. The uncertainty in the matching procedure between the ME generator and the parton shower is assessed by comparing the nominal Powheg+Pythia 8 samples with alternative samples generated by MadGraph5_aMC@NLO and showered by Pythia 8. In order to estimate the uncertainties in the modelling of the underlying event, the parton shower, and the hadronisation, the nominal samples are compared with a Powheg+Herwig 7 [83] prediction. Uncertainties related to the choice of renormalisation and factorisation scales of the matrix-element calculation are considered by independently doubling and halving the scales. The impact of initial-state radiation (ISR) is estimated by varying α_s in the A14 tune. Similarly, the uncertainty related to final-state radiation (FSR) is assessed by varying the renormalisation scale for final-state parton-shower emissions by a factor of two. The uncertainty related to the choice of scale for the matching of the matrix-element calculation for the tt̄ process to the parton shower is evaluated by comparing the nominal samples with an alternative sample produced with the h_damp parameter set to 3.0 m_top. Uncertainties due to PDFs are obtained using the PDF4LHC15 combined PDF set [84]. A dominant systematic uncertainty in the modelling of the single-top processes stems from the handling of the interference between tW and tt̄ production at NLO. This uncertainty is estimated by comparing the nominal tW sample, generated using the diagram-removal scheme, with an alternative sample using the diagram-subtraction scheme [45,85]. Finally, an additional 30% normalisation uncertainty is assigned to events from tt̄ + heavy-flavour jets production [86].
Uncertainties in the top reweighting procedure arise from the chosen form of the parameterised function and the statistical uncertainties of the events in the reweighting region. These are accounted for by varying the parameterised function by ±1σ from its nominal value, using the uncertainties of the fit parameters and taking their correlations into account. Each of the four jet bins for which the reweighting is determined is treated as an independent source of uncertainty.
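Varying a fitted function by ±1σ while taking parameter correlations into account can be sketched as follows (a hypothetical helper for illustration, not the ATLAS implementation): diagonalising the covariance matrix of the fit parameters yields a set of independent variation directions, one per parameter.

```python
import numpy as np

def eigenvariations(params, cov):
    """Return +1sigma and -1sigma parameter sets that account for the
    correlations encoded in the covariance matrix, via its
    eigen-decomposition. One independent variation per eigenvector."""
    params = np.asarray(params, dtype=float)
    vals, vecs = np.linalg.eigh(cov)       # cov is symmetric positive semi-definite
    shifts = vecs * np.sqrt(vals)          # columns are 1-sigma shift vectors
    ups = (params[:, None] + shifts).T     # one row per independent variation
    downs = (params[:, None] - shifts).T
    return ups, downs

# Example with two uncorrelated parameters (diagonal covariance)
params = np.array([1.0, 2.0])
cov = np.array([[0.04, 0.0],
                [0.0, 0.09]])
ups, downs = eigenvariations(params, cov)
```

Summing the squared shifts over the variations reproduces the diagonal of the covariance matrix for each parameter, which is the consistency condition such eigenvariations must satisfy.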
For all other considered processes, namely Z+jets, diboson, tt̄W, tt̄Z, and tt̄H production, the renormalisation and factorisation scales are independently varied by a factor of two. A 30% uncertainty is assigned to the heavy-flavour component of the W+jets process, based on a comparison between Sherpa 2.2.1 and data [87].
Backgrounds without a free-floating normalisation parameter in the profile-likelihood fit are assigned a theoretical cross-section uncertainty. For the tt̄Z process, an 11% [88] uncertainty is assigned, and for tt̄W and tt̄H the uncertainty amounts to 15% and 12% [88], respectively. The cross-section uncertainty is taken to be 6% [89] for diboson production and 5% [90] for the Z+jets process.
Besides the theoretical systematic uncertainties, detector-related uncertainties are considered in the analysis, the dominant one being the jet energy resolution [68]. Additional jet-related uncertainties are due to the jet energy scale, the jet mass scale and resolution, the efficiency of the JVT requirements, and the b-jet identification [72,91]. Uncertainties associated with leptons arise from the efficiencies of the lepton identification, isolation, and reconstruction, as well as the lepton energy scale and resolution [65,66]. Further experimental uncertainties are related to the scale and resolution of the track soft term in the E_T^miss calculation [92]. Additional contributions to the total systematic uncertainty come from the uncertainties in the integrated luminosity and the pile-up profile.

Statistical analysis
The signal-enriched part of the binned NN output distribution (NN output > 0.5) and the overall numbers of events in the low-NN-output, W+jets, and single-top CRs are used to test for the presence of a signal. For hypothesis testing, binned profile-likelihood fits are performed for each of the seven NNs separately, following a modified frequentist method [93] implemented in RooStats [94] and taking the systematic uncertainties affecting the signal and background expectations into account as nuisance parameters.
The binned likelihood function L(μ, θ) is constructed as the product of Poisson probability terms over all bins. It depends on the signal-strength parameter μ, a factor multiplying the theoretical signal production cross-section, and on θ, a set of nuisance parameters constrained in the likelihood function by Gaussian or log-normal priors. The low-NN-output, W+jets, and single-top CRs mainly control the normalisations of the tt̄, W+jets, and single-top backgrounds, for which additional unconstrained normalisation factors are included in the likelihood function. The number of events expected in a bin depends on these normalisation factors as well as on the nuisance parameters. The nuisance parameters θ adjust the expectations for signal and background according to the corresponding systematic uncertainties, and their fitted values correspond to the amounts that best fit the data.
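Schematically, the likelihood described above takes the following form (a sketch with simplified notation: the per-bin signal and background templates s_i and b_{k,i}, the free normalisation factors N_k, and the constraint terms G are placeholders, not the paper's exact symbols):

```latex
\mathcal{L}(\mu,\boldsymbol{\theta})
  = \prod_{i \in \text{bins}}
    \operatorname{Pois}\!\left( n_i \,\middle|\,
      \mu\, s_i(\boldsymbol{\theta})
      + \sum_{k} N_k\, b_{k,i}(\boldsymbol{\theta}) \right)
    \times \prod_{j} G(\theta_j)
```

Here n_i is the observed yield in bin i, the sum over k runs over the background processes, N_k are the unconstrained normalisation factors for the free-floating tt̄, W+jets, and single-top backgrounds (fixed to unity for the others), and G(θ_j) denotes the Gaussian or log-normal constraint for nuisance parameter θ_j.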
In order to avoid double-counting of normalisation uncertainties for the free-floating background processes, only shape effects and acceptance differences between the CRs and the SR are included when considering the systematic uncertainties in their modelling. A smoothing algorithm is applied to the templates of the systematic variations if the statistical fluctuations between bins in the signal region are large. Furthermore, the templates for all systematic variations are symmetrised. Some of the dominant systematic uncertainties related to the modelling of the major background processes are one-sided; these are symmetrised by mirroring the variation around the nominal template. To simplify the fitting procedure, nuisance parameters are only included for systematic uncertainties that change the event yield of a process by more than 1% in at least one bin. The normalisation and shape components of a source of systematic uncertainty are treated separately in this procedure.
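The mirroring of a one-sided variation can be illustrated with a minimal sketch (an assumed helper, not the analysis code): the "down" template is obtained by reflecting the one-sided "up" template about the nominal prediction, bin by bin.

```python
import numpy as np

def symmetrise(nominal, varied):
    """Mirror a one-sided template variation around the nominal template.
    Returns (up, down) where down_i = 2 * nominal_i - up_i in each bin."""
    nominal = np.asarray(nominal, dtype=float)
    up = np.asarray(varied, dtype=float)
    down = 2.0 * nominal - up
    return up, down

# Example: a one-sided +5% variation in every bin becomes a ±5% pair
nom = np.array([100.0, 40.0, 10.0])
up, down = symmetrise(nom, 1.05 * nom)
```

By construction the symmetrised pair is centred on the nominal template, so the associated nuisance parameter acts symmetrically in the fit.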
The test statistic q_μ is defined as the profile likelihood ratio, q_μ = −2 ln[L(μ, θ̂_μ)/L(μ̂, θ̂)], where μ̂ and θ̂ are the values of the parameters that simultaneously maximise the likelihood function, and θ̂_μ are the values of the nuisance parameters that maximise the likelihood function for a fixed value of μ. The compatibility of the observed data with the background-only hypothesis is tested by setting μ = 0 in the test statistic, giving q_0. Upper limits on the signal production cross-section for each considered signal scenario are computed using q_μ in the CL_s method [93] with the asymptotic approximation [95]. A given signal scenario is considered to be excluded at ≥ 95% confidence level (CL) if the value of the signal production cross-section (parameterised by μ) yields a CL_s value ≤ 0.05.
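The construction of q_μ and of the asymptotic CL_s value can be sketched for the simplest case of a single-bin counting experiment with no nuisance parameters (an illustration of the method of Refs. [93,95], not the analysis implementation; here s and b are the expected signal and background yields and n the observed count):

```python
import numpy as np
from scipy.stats import norm

def q_mu(n, mu, s, b):
    """Profile-likelihood-ratio test statistic q_mu for a single-bin
    Poisson counting experiment without nuisance parameters."""
    mu_hat = max(0.0, (n - b) / s)   # best-fit signal strength, bounded at 0
    if mu_hat > mu:                  # q_mu is defined to be zero for mu_hat > mu
        return 0.0
    def nll2(m):                     # -2 ln L, up to an n-dependent constant
        lam = m * s + b
        return 2.0 * (lam - n * np.log(lam))
    return nll2(mu) - nll2(mu_hat)

def cls_asymptotic(n, mu, s, b):
    """CL_s in the asymptotic approximation, using the background-only
    Asimov dataset n_A = b (Cowan et al. formulae)."""
    q = q_mu(n, mu, s, b)
    q_A = q_mu(b, mu, s, b)                           # Asimov value of q_mu
    p_sb = 1.0 - norm.cdf(np.sqrt(q))                 # p-value of the s+b hypothesis
    one_minus_pb = norm.cdf(np.sqrt(q_A) - np.sqrt(q))  # 1 - p_b
    return p_sb / one_minus_pb
```

A signal point with cls_asymptotic(n, 1.0, s, b) ≤ 0.05 would be excluded at 95% CL; scanning μ instead gives the upper limit on the cross-section in units of its theoretical value.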

Results
Background-only likelihood fits are performed for each NN. The obtained normalisation factors for the tt̄, W+jets, and single-top processes vary between the fits for the different NNs: between 1.00 ± 0.28 and 1.14 ± 0.27 for tt̄, between 0.91 ± 0.19 and 1.08 ± 0.17 for W+jets, and between 0.53 ± 0.30 and 0.60 ± 0.23 for single top. The reduction of the single-top contribution appears large, but it is smaller than the difference between the nominal and alternative schemes for modelling the interference between the tt̄ and Wt processes, which are described in Sections 3 and 7. The NN output distributions after the fit are validated in the CRs by comparing data with simulation. As an example, the plots for the NN trainings with a VLT signal with B(T→Wb, T→Zt, T→Ht) = (0.8, 0.1, 0.1) and a VLB signal with B(B→Wt, B→Zb, B→Hb) = (0.1, 0.1, 0.8) are shown in Figure 5. For the training with the VLT signal, the number of events expected from each process in the three CRs and in the signal region is shown in Table 4, together with the number of observed events and the expected signal yield for a mass of 1.2 TeV. The large uncertainty in the single-top yield due to the different schemes for modelling the interference can also be observed here.
The uncertainty in the total background is less than the uncertainty in the separate processes because of strong (anti-)correlations between various systematic uncertainties.
The NN output distributions in the signal region for the two training cases shown in Figure 5, and for another training for a VLT signal with B(T→Wb, T→Zt, T→Ht) = (0.2, 0.4, 0.4), are shown in Figure 6. No significant deviations from the SM expectation are observed for these three cases or when using the other trained NNs.
Upper limits on the pair-production cross-sections for T and B quarks are calculated at the 95% CL. For each signal mass and branching-ratio point, the NN giving the most stringent expected limit is selected. The obtained cross-section limits are compared with the theoretical cross-section to set exclusion limits on the signal mass. The limits are calculated for T and B quarks in the weak-isospin singlet and doublet representations, with mass-dependent branching ratios, as well as for pure T→Zt and B→Wt decays, where the latter corresponds to X→Wt as well as to the (T, B) doublet, as mentioned before. For the doublet scenarios, the contribution from the VLQ partner is either not considered, leading to conservative limits, or considered assuming mass-degenerate VLQs. Mass differences of at most a few GeV are allowed, so that decays from one member of the doublet to the other remain suppressed [4,6]. Also for the mass-degenerate doublet scenario the seven NNs described in Section 6 are used, and the one with the most stringent expected limit is selected as described above; i.e., no additional NN is trained to potentially increase the sensitivity to the added yield from both doublet members.
The expected and observed lower limits on the VLQ mass in the aforementioned models are listed in Table 5 and shown in Figure 7. The impact of the statistical uncertainties on the mass limits is larger than that of the systematic uncertainties. However, the latter is not negligible: for the case of pure T→Zt (B→Wt) decays it reduces the expected lower limit by about 40 GeV (70 GeV) to a value of 1.45 TeV (1.42 TeV). For the three T-quark scenarios in Table 5, the obtained mass limits are 300 to 400 GeV higher than in the earlier ATLAS analysis in the same final state using a subset of the Run 2 data [16]. This improvement is only partially due to the larger dataset, as the expected limits on the cross-section for a VLT mass of 1.4 TeV improved by between a factor of 4.5 for the pure T→Zt case and a factor of 7.7 for the SU(2) singlet. Especially when the branching fraction into Zt becomes smaller, the major effect stems from the training of neural networks at several branching ratios instead of using a cut-and-count analysis with a single SR as done previously. The obtained mass limits for the first five scenarios in Table 5 are also better than the corresponding limits in the combination of all ATLAS results using 36 fb−1 [26], apart from the T singlet scenario, where the observed limit is weaker than the expected limit. The strongest lower limits on the VLQ masses, 1.59 TeV, are derived for the weak-isospin doublets assuming mass-degenerate VLQs.

Table 5: Expected (Exp.) and observed (Obs.) mass limits for the pair production of specific VLQs (T, B, X) in certain decay scenarios corresponding to SU(2) singlet or doublet representations or to the decay into just one specific final state. In the doublet scenarios, contributions from the VLQ partner are not considered, leading to conservative limits, except for the last row, where the VLQs in the doublet are assumed to be mass degenerate. Since the analysis does not distinguish between particles and antiparticles, the limits for B→Wt also apply to the vector-like X because it decays exclusively into Wt. Similarly, the (T, B) doublet scenarios correspond to (X, T) doublet scenarios.

Apart from the limits for specific models and branching ratios, lower limits on the signal mass are set as a function of the T and B branching ratios. The resulting expected and observed mass limits are shown in Figure 8. As expected, the highest sensitivity is found in the regions near B(T→Zt) = 100% and B(B→Wt) = 100%. For the T quark, the sensitivity for the mixed decay modes is larger than for the pure Wb decay. In the case of the B quark, the sensitivity decreases if the branching fraction for the decay into a Higgs or Z boson and a bottom quark increases. The differences between the observed and expected limits for a vector-like T quark around the SU(2) singlet branching ratios are not significant, as can be seen in Figure 7(c). They result from the NN output distribution obtained from the NN trained for a branching ratio B(T→Wb, T→Zt, T→Ht) = (0.2, 0.4, 0.4). In the last bin of the corresponding signal region in Figure 6(b), the data slightly exceed the predicted SM background.

Figure 8: Expected (left) and observed (right) mass limits for TT̄ (upper row) and BB̄ (lower row) production. The mass limit is calculated using the NN giving the most stringent expected limit at each signal mass and branching-ratio point. The white lines indicate mass exclusion contours. The black markers indicate the branching ratios for the SU(2) singlet and doublet scenarios for masses above 800 GeV, where they are approximately independent of the VLQ mass. Since the analysis does not distinguish between particles and antiparticles, the mass exclusion for the B quark in the (T, B) doublet is equivalent to the exclusion for the X quark in the (X, T) doublet. The white areas indicate that the mass limit is below 800 GeV.

Conclusion
A search for pair-produced vector-like T and B quarks, with electric charges +2/3 and −1/3, respectively, is performed in events with exactly one isolated lepton, at least four jets including at least one b-tagged jet, and large missing transverse momentum. The analysis is based on data collected by the ATLAS experiment in √s = 13 TeV proton-proton collisions at the LHC, corresponding to an integrated luminosity of 139 fb−1. Several neural networks are trained for various branching ratios of the T and B quarks, assuming decays into a W, Z, or Higgs boson and a third-generation quark. The analysis considers all possible decays of the vector-like quarks, but it is most sensitive to the T→Zt and B→Wt decay modes. Since the analysis does not distinguish between particles and antiparticles, the limits for B→Wt also apply to a vector-like X with electric charge +5/3.
No significant deviations from the Standard Model expectation are observed, and 95% CL upper limits on the pair-production cross-sections for T and B quarks as a function of their mass are derived for various decay branching-ratio scenarios. The lower limits on the masses of the T and B quarks in the weak-isospin singlet model are 1.26 TeV and 1.33 TeV, respectively, and 1.41 TeV for the T quark in the doublet representation. For the doublet, the contributions from the VLQ partner are not considered, leading to a conservative limit. Stronger lower limits of 1.47 TeV and 1.46 TeV are set on the masses when considering pure T→Zt and B→Wt decays, where the latter corresponds to the (T, B) or (X, T) doublet and also applies to X→Wt decays. For the three discussed T-quark scenarios, the obtained mass limits are 300 to 400 GeV higher than in the earlier ATLAS analysis in the same final state using a subset of the Run 2 data. The strongest lower limits for T, B, and X are 1.59 TeV, obtained for the (T, B) and (X, T) weak-isospin doublets where both VLQ partners are considered and assumed to be mass degenerate. Finally, lower limits on the T- and B-quark masses are derived for all possible branching ratios.
Catalunya and PROMETEO and GenT Programmes Generalitat Valenciana, Spain; Göran Gustafssons Stiftelse, Sweden; The Royal Society and Leverhulme Trust, United Kingdom.

Figure 2: Distributions of (a) m_eff and (b) lepton p_T in the top reweighting region after applying the reweighting factors to the tt̄ and single-top background. The dashed line indicates the total background before the reweighting. The band includes statistical and systematic uncertainties. Minor background contributions from tt̄V, diboson, and Z+jets are combined into "Others". The ratios of the data to the expected background are shown in the bottom panels of the plots. The last bin in each distribution contains the overflow.

Figure 3: Distributions of m_eff in (a) the W+jets CR and (b) the single-top CR after applying the top reweighting factors to the simulated tt̄ and single-top-quark events. The dashed line indicates the total background before the reweighting. Minor background contributions from tt̄V, diboson, and Z+jets are combined into "Others". The band includes statistical and systematic uncertainties. The ratios of the data to the expected background are shown in the bottom panels of the plots. The last bin in each distribution contains the overflow.

Figure 4: Distributions of NN input variables in the training region: (a) m_eff, (b) m_T, (c) m_T2, and (d) E_T^miss. Minor background contributions from tt̄V, diboson, and Z+jets are combined into "Others". The signal distributions for T→Zt and B→Wt, assuming a VLQ mass of 1.2 TeV, are overlaid and normalised to the total background prediction. The band includes statistical and systematic uncertainties. The ratios of data to the expected background are shown in the bottom panels of the plots. The last bin in each distribution contains the overflow.

Figure 5: Data and background expectation in the W+jets CR (left panels), the single-top CR (middle panels), and the low-NN-output CR (right panels) after a background-only fit to data (Post-Fit) for a NN training considering a VLT signal with B(T→Wb, T→Zt, T→Ht) = (0.8, 0.1, 0.1) (upper panels) and a VLB signal with B(B→Wt, B→Zb, B→Hb) = (0.1, 0.1, 0.8) (lower panels). Minor background contributions from tt̄V, diboson, and Z+jets are combined into "Others". The band indicates the post-fit uncertainty. The bottom panels show the ratio of data to the background expectation.

Figure 6: Data and background expectation in the signal region after the simultaneous background-only fit to data (Post-Fit) for a NN training for (a) a VLT signal with B(T→Wb, T→Zt, T→Ht) = (0.8, 0.1, 0.1), (b) a VLT signal with B(T→Wb, T→Zt, T→Ht) = (0.2, 0.4, 0.4), and (c) a VLB signal with B(B→Wt, B→Zb, B→Hb) = (0.1, 0.1, 0.8). Contributions from tt̄V, diboson, and Z+jets are combined into "Others". Expected pre-fit signal distributions with the signal branching ratios corresponding to the respective training are added on top of the background expectation, using a signal mass of 1.2 TeV. The band indicates the statistical and systematic uncertainties. The ratio of data to the background expectation is shown in the bottom panels of the plots.

Figure 7: Expected and observed upper limits on the signal cross-section for (a) the case B(T→Zt) = 100%, (b) the case B(B→Wt) = 100%, (c) a T quark in the SU(2) singlet representation, (d) a B quark in the SU(2) singlet representation, and (e) a T quark in an SU(2) doublet. In the doublet scenario, contributions from the not-considered vector-like quark are neglected, leading to conservative limits. The SU(2) (T, B) doublet scenario considering contributions from both the T and B quarks is shown in (f), assuming mass-degenerate VLQs. The thickness of the theory curve represents the theoretical uncertainty from the PDFs, the scales, and the strong coupling constant α_s.

Table 1: List of the ME generator, PDF set, PS model, and tune for the signal and the different background processes.

Table 2: Overview of the event selections for the training region, which is subdivided into a low-NN-output control region and a signal region, for the top reweighting region, and for the control regions for W+jets events and single-top-quark events.

Table 3: Input variables to the NN training, sorted in descending order of discriminating power between signal and background. The order is not strict, as it depends on the VLQ type and branching ratio the NN is trained for.

Table 4: Observed data event yields and expected background event yields with their total uncertainties in the control and signal regions after the background-only fit considering a NN trained for a VLT signal with branching ratio B(T→Wb, T→Zt, T→Ht) = (0.8, 0.1, 0.1). For comparison, the event yields for a VLT signal with a mass of 1.2 TeV and the same branching ratio are given.