1 Introduction

The fine-tuning or naturalness problem [1] in particle physics arises from loop corrections to the Higgs boson mass that are quadratically divergent. If the Standard Model were complete, these corrections would be of the order of the Planck scale, \(10^{19}\) \(\text {GeV}\), and a finely tuned bare Higgs mass of similar magnitude would be needed to arrive at the measured mass of about 125 \(\text {GeV}\) [2].

Vector-like quarks (VLQs) [3,4,5,6] could dampen the unnaturally large quadratic corrections to the Higgs boson mass by contributing significantly to loop corrections. They are hypothetical spin-1/2 coloured particles whose left-handed and right-handed states have the same electroweak coupling. They appear in a number of theories beyond the Standard Model (SM) of particle physics, mainly in the ‘Little Higgs’ [7,8,9] and ‘Composite Higgs’ [10, 11] classes of models.

At the Large Hadron Collider (LHC), VLQs could be produced singly via electroweak interactions or in pairs, mainly via the strong interaction. While the cross-section for the latter depends only on the VLQ mass, the former depends additionally on the unknown coupling strength between the electroweak bosons and the VLQ. VLQs are expected to couple preferentially to third-generation quarks [3, 12]. Therefore, the up-type VLQ T with charge \(+2/3\) is assumed in the following to have only three possible decay modes: \(T\rightarrow Zt\), \(T\rightarrow Ht\), and \(T\rightarrow Wb\). Similarly, the down-type VLQ B with charge \(-1/3\) can decay into Zb, Hb, or Wt. Vector-like X quarks with charge \(+5/3\) also appear in multiplets with T partners [4, 13] and decay via \(X \rightarrow Wt\) only.

Fig. 1 Representative Feynman diagrams for (a) \(T\bar{T}\) and (b) \(B\bar{B}\) production and decay. In the analysis, no distinction is made between particles and antiparticles, leading to sensitivity to the \(X\bar{X} \rightarrow W t W t\) final state as well

This analysis investigates all possible decay modes and combinations of branching ratios for the pair-produced vector-like T (VLT) quark and B (VLB) quark, shown in Fig. 1. However, it is most sensitive to the \(T\rightarrow Zt\) and \(B\rightarrow Wt\) decays. Since the analysis does not distinguish between particles and antiparticles, the limits for \(B\rightarrow Wt\) also apply to the vector-like X quark, given that it exclusively decays into Wt. Particular combinations of branching ratios correspond to the weak-isospin singlet and doublet models. For T and B quarks, the branching ratio for each decay mode depends on the VLQ mass and weak-isospin quantum numbers [4]. The branching ratios given in the following are for VLQ masses above 800 GeV, where they are approximately independent of the VLQ mass. For a singlet T, all decay modes have sizeable branching ratios (\(\mathcal {B}(Zt,Ht,Wb) \approx (0.25,0.25,0.5)\)), whereas if T is in either an (XT) doublet or a (TB) doublet, it decays only into Zt or Ht with equal branching ratios as long as the generalised Cabibbo–Kobayashi–Maskawa (CKM) matrix elements fulfil \(|V_{Tb}| \ll |V_{Bt}|\) [4]. Similarly, for a singlet B the branching ratios of all decay modes are sizeable (\(\mathcal {B}(Zb,Hb,Wt) \approx (0.25,0.25,0.5)\)), while for the (TB) doublet scenario with \(|V_{Tb}| \ll |V_{Bt}|\) the \(B\rightarrow Wt\) decay is the only possibility.

The results are based on the full Run 2 (2015–2018) dataset collected by the ATLAS experiment in \(\sqrt{s} = {13}\,{\text {TeV}}\) proton–proton collisions at the LHC, corresponding to an integrated luminosity of 139 fb\(^{-1}\). Several searches for pair-produced VLQs targeting various final states have already been performed by ATLAS [14,15,16,17,18,19,20,21] and CMS [22,23,24,25]. Those results are based on a subset of the Run 2 data of about 36 fb\(^{-1}\), except for Refs. [21, 25]. A combination of the ATLAS results using the 36 fb\(^{-1}\) dataset is also available [26], excluding T (B) masses below 1.31 \(\text {TeV}\) (1.03 \(\text {TeV}\)) for any combination of the three decay modes per VLQ discussed above.

The analysis is based on a final-state signature with high missing transverse momentum \(E_{\text {T}}^{\text {miss}}\), one lepton \(\ell \) (e or \(\mu \)), and at least four jets including a b-tagged jet. The previous ATLAS search in this final state is based on a subset of the Run 2 data [16], yielding lower limits on the T quark mass of 1.16 \(\text {TeV}\) for \(\mathcal {B}(T\rightarrow Zt) = 100\)% and 0.87 \(\text {TeV}\) (1.05 \(\text {TeV}\)) for the T in the singlet (doublet) model. Here the analysis is extended mainly by also investigating vector-like B quarks and by using neural networks (NNs) trained at several branching ratios in order to separate signal from background, instead of using a cut-and-count analysis with a single signal region (SR). The training is done separately for T and B in a common training region using simulated events. The SRs are each defined by a subset of the training region passing a selection on the corresponding NN output. Control regions (CRs) are defined so as to be enriched in the various background processes. They are orthogonal to the training region, and thus to the SRs, and orthogonal to each other. The statistical interpretation is based on a simultaneous fit to the CRs and SR for T or B, in which the normalisations for \(t\bar{t}\), \(W\)+jets and single-top-quark backgrounds and a possible signal contribution are determined.

Table 1 List of ME generator, PDF set, PS model, and tune for the signal and different background processes

2 ATLAS detector

The ATLAS experiment [27] at the LHC is a multipurpose particle detector with a forward–backward symmetric cylindrical geometry and a near \(4\pi \) coverage in solid angle. It consists of an inner tracking detector (ID) surrounded by a thin superconducting solenoid providing a 2 T axial magnetic field, electromagnetic and hadron calorimeters, and a muon spectrometer. The inner tracking detector covers the pseudorapidity range \(|\eta | < 2.5\). It consists of silicon pixel, silicon microstrip, and transition radiation tracking detectors. Lead/liquid-argon (LAr) sampling calorimeters provide electromagnetic (EM) energy measurements with high granularity. A steel/scintillator-tile hadron calorimeter covers the central pseudorapidity range (\(|\eta | < 1.7\)). The endcap and forward regions are instrumented with LAr calorimeters for both the EM and hadronic energy measurements up to \(|\eta | = 4.9\). The muon spectrometer surrounds the calorimeters and is based on three large superconducting air-core toroidal magnets with eight coils each. The field integral of the toroids ranges between 2.0 and 6.0 T m across most of the detector. The muon spectrometer includes a system of precision tracking chambers and fast detectors for triggering. A two-level trigger system is used to select events. The first-level trigger is implemented in hardware and uses a subset of the detector information to accept events at a rate below 100 kHz. This is followed by a software-based high-level trigger (HLT) that reduces the accepted event rate to 1 kHz on average depending on the data-taking conditions. An extensive software suite [28] is used in data simulation, in the reconstruction and analysis of real and simulated data, in detector operations, and in the trigger and data acquisition systems of the experiment.

3 Data and simulated event samples

The analysis uses data from proton–proton (pp) collisions at \(\sqrt{s} = {13}\,{\text {TeV}}\) recorded with the ATLAS detector at the LHC in the years 2015 to 2018. The dataset, collected during stable beam conditions and with all detector subsystems operational [29], corresponds to an integrated luminosity of 139 fb\(^{-1}\) with an uncertainty of 1.7% [30]. At the high luminosities reached at the LHC, events are affected by additional inelastic pp collisions in the same or neighbouring bunch crossings, referred to as pile-up. The average number of interactions per bunch crossing was 33.7. Events were selected online during data-taking by \(E_{\text {T}}^{\text {miss}}\) triggers [31] with an \(E_{\text {T}}^{\text {miss}}\) threshold of 70 \(\text {GeV}\) in the HLT in 2015 and a threshold rising from 90 \(\text {GeV}\) to 110 \(\text {GeV}\) during the later years.

Monte Carlo (MC) simulated events are used for the modelling of the background processes and the VLQ signals. Details of the simulated nominal samples, including the matrix-element (ME) generator and the parton distribution function (PDF) set, the parton shower (PS) and hadronisation model, and the set of tuned parameters (tune), are summarised in Table 1.

The generated events were processed through a simulation [32] of the ATLAS detector geometry and response using Geant4  [33]. A faster simulation, which employed a parameterisation of the calorimeter response, was used in some cases to estimate systematic uncertainties. In these cases, the systematically varied samples were compared with versions of the nominal samples that were also processed through the fast simulation. In order to model pile-up effects, minimum-bias pp interactions were generated with Pythia 8.186 [34] using the A3 [35] set of tuned parameters and overlaid on the simulated hard-scatter events. The resulting events were weighted to match the pile-up profile of the recorded data.

Finally, the simulated events were reconstructed using the same software as the collision data. Corrections were applied to the simulated events in order to match object identification efficiencies, energy scales and resolution to those determined from data in auxiliary measurements.

Signal samples for the pair-production of vector-like T and B quarks were generated at leading order (LO) with Protos v2.2 [36] using the NNPDF 2.3lo PDF set [37], interfaced with Pythia 8.186 to model the parton shower, hadronisation, and underlying event. Using the narrow-width approximation, the samples were produced for masses from 800 \(\text {GeV}\) up to 2 \(\text {TeV}\), with a mass spacing of 100 \(\text {GeV}\) from 1 to 1.8 \(\text {TeV}\). The chirality-dependent couplings of the VLQs were set to those in the weak-isospin singlet model, but with equal branching ratios into the three decay modes (Zt, Ht, Wb) for the vector-like T quark and (Zb, Hb, Wt) for the vector-like B quark. Dedicated signal samples in the doublet model were produced for the 1.2 \(\text {TeV}\) mass point in order to test for potential kinematic biases from the assumed singlet couplings. For the T quark, this choice is conservative because the acceptance is higher in the doublet case, while for the B quark the acceptances are similar for singlet and doublet couplings. In order to obtain the desired branching ratios, an event-by-event reweighting based on generator information is performed. The signal sample cross-sections were calculated with Top++ 2.0 [38] at next-to-next-to-leading order (NNLO) in QCD including the resummation of next-to-next-to-leading logarithmic (NNLL) soft-gluon terms.
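As an illustration of this event-by-event reweighting, the minimal sketch below forms the weight as the product over the two VLQ decays of the ratio of target to generated branching ratio. The decay-mode labels and the way they are read from the generator record are hypothetical; the samples are assumed to be generated with equal branching ratios of 1/3 per mode, as stated above.

```python
# Minimal sketch of the branching-ratio reweighting described above.
# The accessor for the generator-level decay modes is hypothetical.

GENERATED_BR = 1.0 / 3.0  # Zt : Ht : Wb (or Zb : Hb : Wt) = 1 : 1 : 1

def branching_ratio_weight(decay_modes, target_br):
    """decay_modes: generator-level decay modes of the two pair-produced
    VLQs, e.g. ('Zt', 'Wb'); target_br: map of target branching ratios,
    e.g. {'Zt': 0.8, 'Ht': 0.1, 'Wb': 0.1}."""
    weight = 1.0
    for mode in decay_modes:               # two VLQ decays per event
        weight *= target_br[mode] / GENERATED_BR
    return weight

# Example: a T Tbar event with decays (Zt, Ht) reweighted to the
# (0.8, 0.1, 0.1) scenario: (0.8 / (1/3)) * (0.1 / (1/3)) = 0.72
print(branching_ratio_weight(('Zt', 'Ht'), {'Zt': 0.8, 'Ht': 0.1, 'Wb': 0.1}))
```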

The production of \(t\bar{t}\) events was modelled using the Powheg Box v2 [39,40,41,42] generator at next-to-leading order (\(\text {NLO}\)) with the NNPDF 3.0nlo set [43] of PDFs and the \(h_\text {damp}\) parameter set to \(1.5\,m_{t}\) [44], with \(m_{t}\,{=}\, {172.5}\,{\text {GeV}}\). The events were interfaced to Pythia 8.230. The cross-section was corrected to the theory prediction at \(\text {NNLO}\) including \(\text {NNLL}\) soft-gluon terms calculated using Top++ 2.0.

Samples of single-top-quark events were produced with the Powheg Box v2 generator at \(\text {NLO}\) in QCD using the NNPDF 3.0nlo set of PDFs with the five-flavour scheme for tW production and s-channel single-top production, and the four-flavour scheme for t-channel single-top events. The tW sample was modelled using the diagram removal scheme [45] to remove interference and overlap with \(t\bar{t}\) production. The events were interfaced with either Pythia 8.230 or Pythia 8.235. The samples were normalised to their \(\text {NLO}\) QCD cross-sections [46, 47], with additional \(\text {NNLL}\) soft-gluon terms for tW production [48, 49].

The production of \(V\)+jets (\(V=W,Z\)) was simulated with the Sherpa 2.2.1 generator using \(\text {NLO}\)-accurate matrix elements for up to two partons, and LO matrix elements for up to four partons, calculated with the Comix [50] and OpenLoops [51,52,53] libraries. The matrix elements were matched with the Sherpa parton shower [54] using the MEPS@NLO prescription [55,56,57,58] and the set of tuned parameters developed by the Sherpa authors. The NNPDF 3.0nnlo set of PDFs was used and the samples were normalised to an \(\text {NNLO}\) prediction [59].

Samples of diboson final states (VV) were simulated with the Sherpa 2.2.1 or 2.2.2 generator, depending on the process, including off-shell effects and Higgs boson contributions where appropriate. Fully leptonic final states and semileptonic final states, where one boson decays leptonically and the other one hadronically, were generated using matrix elements at \(\text {NLO}\) accuracy in QCD for up to one additional parton and at \(\text {LO}\) accuracy for up to three additional parton emissions. The matching of \(\text {NLO}\) matrix elements to the PS and the merging of different jet multiplicities was done in the same way as for \(V\)+jets production. The NNPDF 3.0nnlo set of PDFs was used, along with the Sherpa-internal tune. The diboson event samples were normalised to the total cross-section calculated by Sherpa at \(\text {NLO}\) in QCD.

The production of \(t\bar{t} W\) and \(t\bar{t} Z\) events was modelled using the MadGraph5_aMC@NLO 2.3.3 [60] generator at \(\text {NLO}\) with the NNPDF 3.0nlo PDF set. The events were interfaced to Pythia 8.210. Similarly, the production of \(tWZ\) events was modelled using MadGraph5_aMC@NLO 2.6.8 at \(\text {NLO}\) with the NNPDF 3.0nlo PDF set, interfaced to Pythia 8.244. The diagram removal scheme was applied to the \(tWZ\) sample in order to handle the interference between \(tWZ\) and \(t\bar{t} Z\) production. Samples for the production of \(t\bar{t} H\) events were generated using the Powheg Box v2 generator at \(\text {NLO}\) with the NNPDF 3.0nlo PDF set, interfaced to Pythia 8.230. The generated samples for \(t\bar{t} W\), \(t\bar{t} Z\), \(tWZ\), and \(t\bar{t} H\) production were normalised to \(\text {NLO}\) cross-section predictions calculated by MadGraph5_aMC@NLO.

All simulated samples, except those produced with the Sherpa [61] event generator, utilised the EvtGen [62] program to model the decay of heavy-flavour hadrons. While EvtGen 1.2.0 was used for the VLQ signal samples and the \(t\bar{t} V\) samples, EvtGen 1.6.0 was used in all other cases. For all nominal samples where Pythia 8 [63] was utilised for the showering and hadronisation, Pythia was used with the A14 [64] set of tuned parameters and the NNPDF 2.3lo set of PDFs.

4 Event reconstruction and object selection

Events are required to have at least one pp collision vertex candidate with at least two associated tracks with transverse momentum \(p_{\text {T}} > {0.5}\,{\text {GeV}}\). The primary vertex is defined to be the vertex candidate with the largest scalar sum of transverse momenta of all associated tracks. In this analysis, electrons, muons, and jets are the calibrated physics objects used. For the charged leptons, two sets of quality and kinematic requirements are imposed, where the selection for signal leptons is tighter than for baseline leptons.

Electron candidates are reconstructed from energy deposits in the EM calorimeter matched to charged-particle tracks in the ID. Baseline electrons are required to have \(p_{\text {T}} > {10}\,{\text {GeV}}\) and to be reconstructed within \(|\eta | < 2.47\), excluding the barrel–endcap transition region \(1.37< |\eta | < 1.52\). They must fulfil ‘loose’ identification criteria, using a likelihood-based discriminant that combines information about tracks in the ID and energy deposits in the calorimeter system [65], and are required to have a hit in the innermost layer of the pixel detector. Furthermore, isolation requirements in both the calorimeter and the ID are imposed [65]. An electron does not meet the isolation criteria if, after subtracting contributions from pile-up and the electron itself, the transverse energy deposited in the calorimeter within a surrounding cone of radius \(\Delta R = 0.2\) exceeds 20% of the transverse energy of the electron. Similarly, electron candidates are excluded if the scalar sum of the transverse momenta of tracks within a cone of radius \(\Delta R = \min ({10}\,{\text {GeV}}/p_{\text {T}} (e), 0.2)\), excluding the track matched to the electron, is larger than 15% of the electron \(p_{\text {T}}\). In addition, each electron candidate’s track must be matched to the primary vertex. This requires that the significance of its transverse impact parameter, \(d_0\), satisfies \(|d_0|/\sigma _{d_0} < 5\), where \(\sigma _{d_0}\) is the uncertainty in \(d_0\), and that the longitudinal distance \(z_0\) from the primary vertex to the point where \(d_0\) is measured satisfies \(|z_0 \sin \theta | < {0.5}\,{\hbox {mm}}\). In order to suppress backgrounds due to hadrons misidentified as electrons, signal electrons must satisfy all baseline criteria and in addition ‘tight’ identification criteria [65], and have \(p_{\text {T}} > {28}\,{\text {GeV}}\).

Muon candidates are reconstructed by combining charged-particle tracks formed in the ID and in the muon spectrometer or by matching ID tracks to an energy deposit in the calorimeter compatible with a minimum ionising particle [66]. Baseline muons are required to have \(p_{\text {T}} > {10}\,{\text {GeV}}\) and \(|\eta | < 2.5\), and to satisfy the ‘loose’ identification criteria [66]. Track-to-vertex matching is ensured by requiring the muon track to satisfy \(|d_0|/\sigma _{d_0} < 3\) and \(|z_0 \sin \theta | < {0.5}\,{\hbox {mm}}\). Signal muons must satisfy ‘medium’ identification criteria and are required to have \(p_{\text {T}} > {28}\,{\text {GeV}}\). Additionally, signal muons must be isolated, requiring that the scalar \(p_{\text {T}}\) sum of all tracks within a cone of radius \(\Delta R = \min \left( {10}\,{\text {GeV}}/p_{\text {T}} (\mu ), 0.3\right) \) around the muon is less than 6% of the muon \(p_{\text {T}}\).
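The variable-cone track-isolation requirements quoted above for electrons and muons can be summarised in a short sketch. The track list (already excluding the lepton's own track, with pile-up handled upstream) is an assumed input, and the function is illustrative rather than the exact ATLAS implementation.

```python
# Illustrative check of the variable-cone track isolation described
# above (momenta in GeV). 'tracks' is assumed to be a list of
# (pt, delta_r_to_lepton) pairs excluding the lepton's own track.

def passes_track_isolation(pt_lep, tracks, r_max, max_fraction):
    cone = min(10.0 / pt_lep, r_max)   # variable cone radius
    sum_pt = sum(pt for pt, dr in tracks if dr < cone)
    return sum_pt < max_fraction * pt_lep

# Electrons: cone min(10 GeV/pT, 0.2), track pT sum below 15% of pT(e).
# Muons:     cone min(10 GeV/pT, 0.3), track pT sum below  6% of pT(mu).
```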

Small-radius (small-R) jet candidates are built from particle-flow objects [67, 68], using the anti-\(k_{t}\) algorithm [69, 70] with a radius parameter of \(R=0.4\). The particle-flow algorithm combines information about tracks in the ID and energy deposits in the calorimeters to form the input for the jet reconstruction. Jets are required to have \(p_{\text {T}} > {25}\,{\text {GeV}}\) and \(|\eta | < 2.5\). To reject jets originating from pile-up interactions, jet candidates with \(|\eta | < 2.4\) and \(p_{\text {T}} < {60}\,{\text {GeV}}\) are required to satisfy the ‘tight’ jet vertex tagger (JVT) criterion [71]. Small-R jets containing a b-hadron decay are \(b\text {-tagged}\) using a multivariate algorithm, called DL1r, operating at a tagging efficiency of \(77\%\) as determined in simulated \(t\bar{t}\) events [72, 73].

An overlap removal procedure is applied to prevent double counting of ambiguous reconstructed objects, using the baseline lepton definitions. First, electron–muon overlap is handled by removing muons sharing a track in the ID with an electron if the muon is calorimeter-tagged, and otherwise removing the electron. Subsequently, overlap between jets and leptons is removed by rejecting any jets within \(\Delta R = 0.2\) of an electron and afterwards rejecting any electrons within \(\Delta R = 0.4\) of a jet. Similarly, jets are discarded if they have fewer than three associated tracks and are within \(\Delta R = 0.2\) of a muon candidate. Otherwise, the muon is rejected if it lies within \(\Delta R = \text {min}(0.4, 0.04 + {10}\,{\text {GeV}}/p_{\text {T}} (\mu ))\) of a jet.

The missing transverse momentum, with magnitude \(E_{\text {T}}^{\text {miss}}\), is defined as the negative vectorial sum of the transverse momenta of all calibrated objects in an event, plus a track-based soft-term which takes into account energy depositions associated with the primary vertex but not with any calibrated object [74].

Finally, large-radius (large-R) jets are constructed from the selected small-R jets using the anti-\(k_{t}\) algorithm with \(R=1.0\). In order to reduce the impact of soft radiation, constituent small-R jets with \(p_{\text {T}}\) less than 5% of the large-R jet \(p_{\text {T}}\) are removed. These reclustered large-R jets are required to have \(p_{\text {T}} >{150}\,{\text {GeV}}\) and a mass larger than \({50}\,{\text {GeV}}\).
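A minimal sketch of this reclustering step is given below, with jets represented as four-momentum dictionaries in GeV. The anti-\(k_t\) clustering itself is delegated to a caller-supplied helper (in practice a jet library such as FastJet); only the soft-constituent removal and the final selection are spelled out.

```python
# Sketch of the reclustered large-R jet selection described above.
# 'antikt_recluster' is a hypothetical helper returning, for each
# large-R jet, the list of its constituent small-R jets; each jet is
# a dict with keys 'E', 'px', 'py', 'pz' (GeV).
import math

def vector_pt(jets):
    return math.hypot(sum(j['px'] for j in jets), sum(j['py'] for j in jets))

def mass(jets):
    E = sum(j['E'] for j in jets)
    p2 = sum(sum(j[k] for j in jets) ** 2 for k in ('px', 'py', 'pz'))
    return math.sqrt(max(E * E - p2, 0.0))

def select_large_r_jets(small_r_jets, antikt_recluster):
    selected = []
    for constituents in antikt_recluster(small_r_jets, radius=1.0):
        jet_pt = vector_pt(constituents)
        # remove soft constituents below 5% of the large-R jet pT
        trimmed = [j for j in constituents
                   if math.hypot(j['px'], j['py']) > 0.05 * jet_pt]
        # final requirements: pT > 150 GeV and mass > 50 GeV
        if trimmed and vector_pt(trimmed) > 150.0 and mass(trimmed) > 50.0:
            selected.append(trimmed)
    return selected
```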

5 Event selection and categorisation

All events considered in this analysis must be selected by an \(E_{\text {T}}^{\text {miss}}\) trigger. Since the trigger thresholds were raised during Run 2, a requirement of \(E_{\text {T}}^{\text {miss}} > {250}\,{\text {GeV}}\) is imposed to ensure full trigger efficiency over all data-taking periods. Events are also required to have exactly one signal lepton (e, \(\mu \)) and at least four small-R jets of which at least one is b-tagged. A veto on a second lepton, fulfilling the baseline requirements, is used to suppress \(t\bar{t}\) events with two leptons in the final state. To reject events with \(E_{\text {T}}^{\text {miss}}\) arising from mismeasured jets, the azimuthal angle between the missing transverse momentum vector \(\vec {E}_\textrm{T}^\textrm{miss}\) and both the leading (\(j_1\)) and subleading (\(j_2\)) jets, ordered in \(p_{\text {T}}\), must satisfy the condition \(|\Delta \phi (j_{i}, \vec {E}_\textrm{T}^\textrm{miss})| > 0.4\), with \(i \in \{1,2\}\). In addition, events must have a transverse mass \(m_\text {T}^W > {30}\,{\text {GeV}}\), where \(m_\text {T}^W \) is defined as

$$\begin{aligned} m_\text {T}^W = \sqrt{2 p_{\text {T}} (\ell ) E_{\text {T}}^{\text {miss}} \left[ 1-\cos \left( \Delta \phi (\ell , \vec {E}_\textrm{T}^\textrm{miss})\right) \right] }. \end{aligned}$$
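For illustration, the definition above corresponds to the following short function (momenta in GeV; the angle argument is the azimuthal angle between the lepton and \(\vec {E}_\textrm{T}^\textrm{miss}\)):

```python
# Transverse mass m_T^W as defined above (GeV).
import math

def mtw(pt_lep, met, delta_phi):
    return math.sqrt(2.0 * pt_lep * met * (1.0 - math.cos(delta_phi)))

# The preselection requires mtw(...) > 30 GeV; the training region
# defined below tightens this to > 120 GeV (see Table 2).
```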
Table 2 Overview of the event selections for the training region that is subdivided into a low-\(N\hspace{-1.00006pt}N_{\text {out}}\) control region and a signal region, for the top reweighting region, and for the control regions for \(W\)+jets events and single-top-quark events

After applying these requirements, referred to as ‘preselection’ in Table 2, the dominant backgrounds come from \(t\bar{t}\) and \(W\)+jets production. A training region for the NNs is defined by applying further requirements listed in Table 2, which reduce the amount of background without decreasing the sensitivity to the signal. Requiring \(m_\text {T}^W \) to be well above the W boson mass peak, \(m_\text {T}^W > {120}\,{\text {GeV}}\), strongly reduces the \(W\)+jets and semileptonic \(t\bar{t}\) background. In these background processes the leptonic W boson decay is the only source of \(E_{\text {T}}^{\text {miss}}\), while in VLQ pair production additional sources of \(E_{\text {T}}^{\text {miss}}\) can arise, e.g., from the Z boson decay to neutrinos or the W boson decay to a hadronically decaying tau-lepton and a neutrino. The remaining \(t\bar{t}\) background originates from dileptonic \(t\bar{t}\) events where one lepton is not detected. This type of \(t\bar{t}\) background is suppressed by requirements on the asymmetric transverse mass, \(am_{\textrm{T2}}\)  [75, 76], which is a variant of the \(m_\textrm{T2}\)  [77, 78] variable. The latter is applied to signatures where two or more particles are not detected directly (e.g. dileptonic \(t\bar{t}\) events, where the two neutrinos are not detected), and it is defined as

$$\begin{aligned} m_\textrm{T2} = \min _{\vec {q}_{\textrm{T}a}+\vec {q}_{\textrm{T}b} = \vec {E}_{\textrm{T}}^{\textrm{miss}}}\left[ \max (m_{\textrm{T}a},m_{\textrm{T}b})\right] . \end{aligned}$$

In this formula, \(m_{\textrm{T}a}\) and \(m_{\textrm{T}b}\) are transverse masses calculated using two sets of one or more visible particles, denoted by a and b, respectively, and all possible combinations of missing transverse momenta \(\vec {q}_{\textrm{T}a}\) and \(\vec {q}_{\textrm{T}b}\), with \(\vec {q}_{\textrm{T}a}+\vec {q}_{\textrm{T}b}=\vec {E}_\textrm{T}^\textrm{miss}\). In the calculation of \(am_{\textrm{T2}}\), the two sets of visible particles are asymmetric: one set consists of the identified signal lepton together with one of the two jets with the highest b-tagging score, while the other set consists of the remaining one of these two jets. Of the two possible lepton–jet pairings, the combination giving the lower \(am_{\textrm{T2}}\) value is taken. For dileptonic \(t\bar{t}\) events, the \(am_{\textrm{T2}}\) distribution has a kinematic endpoint at the top-quark mass, while additional sources of \(E_{\text {T}}^{\text {miss}}\) extend the distribution towards higher \(am_{\textrm{T2}}\) values. Events in the training region have to fulfil \(am_{\textrm{T2}} >{200}\,{\text {GeV}}\). At least one hadronic decay of a high-\(p_{\text {T}} \) top quark or SM boson is expected for the considered signal. Thus, at least one large-R jet is required in the training region. About 4% of the simulated signal events are reconstructed in the training region for \(T\bar{T}\) or \(B\bar{B}\) production with pure \(T\rightarrow Zt\) and \(B\rightarrow Wt\) decays and a VLQ mass of 1.2 \(\text {TeV}\).
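As an illustration of these definitions, the sketch below evaluates \(m_{\textrm{T2}}\) with a brute-force grid scan over the splittings of the missing transverse momentum and builds \(am_{\textrm{T2}}\) from the two lepton–jet pairings. Production code would use a dedicated minimiser (e.g. a bisection algorithm) instead of a grid, and the visible-system masses and transverse momenta are assumed to be computed upstream from the full four-vectors.

```python
# Illustrative brute-force m_T2 and am_T2 (momenta in GeV). Invisible
# particles are taken to be massless.
import math

def transverse_mass(m_vis, p_vis, q):
    """m_vis, p_vis = (px, py): visible system; q = (qx, qy): assumed
    invisible transverse momentum."""
    et_vis = math.sqrt(m_vis ** 2 + p_vis[0] ** 2 + p_vis[1] ** 2)
    qt = math.hypot(q[0], q[1])
    mt_sq = m_vis ** 2 + 2.0 * (et_vis * qt - p_vis[0] * q[0] - p_vis[1] * q[1])
    return math.sqrt(max(mt_sq, 0.0))

def mt2(sys_a, sys_b, met, n=200):
    """sys_a, sys_b: (m_vis, (px, py)); met: (px, py). Scans q_Ta over a
    grid and sets q_Tb = met - q_Ta, minimising the larger of the two
    transverse masses."""
    lim = 2.0 * max(math.hypot(met[0], met[1]), 1.0)
    best = float('inf')
    for i in range(n + 1):
        for j in range(n + 1):
            qa = (-lim + 2.0 * lim * i / n, -lim + 2.0 * lim * j / n)
            qb = (met[0] - qa[0], met[1] - qa[1])
            best = min(best, max(transverse_mass(sys_a[0], sys_a[1], qa),
                                 transverse_mass(sys_b[0], sys_b[1], qb)))
    return best

def amt2(pairing_1, pairing_2, met):
    """Each pairing: (lepton+jet system, other jet), both given as
    (m_vis, (px, py)). The lower of the two m_T2 values is kept."""
    return min(mt2(pairing_1[0], pairing_1[1], met),
               mt2(pairing_2[0], pairing_2[1], met))
```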

The \(t\bar{t}\) background, which is a major background in this analysis, is not modelled accurately at high transverse momenta [79, 80]. Therefore, a reweighting procedure, referred to as ‘top reweighting’ in the following, is applied. Reweighting factors are derived in bins of the jet multiplicity (4, 5, 6, \({\ge }7\)) as a function of the effective mass \(m_\text {eff}\), defined as the scalar sum of the \(p_{\text {T}}\) of all reconstructed objects and \(E_{\text {T}}^{\text {miss}}\). The reweighting factors are determined for the sum of the \(t\bar{t}\) and single-top backgrounds using their nominal prediction and are parameterised with a linear function. They are derived from a comparison between data and MC simulation in a dedicated top reweighting region (see Table 2), which is defined in the same way as the training region except for an inverted \(am_{\textrm{T2}}\) requirement. This requirement is strengthened to \(am_{\textrm{T2}} < {180}\,{\text {GeV}}\) in order to have a higher \(t\bar{t}\) purity of about 90% and less signal contamination in the tails of the \(m_\text {eff}\) distribution. The resulting reweighting factors are applied to \(t\bar{t}\) and single-top-quark events in each of the defined analysis regions, which changes the event yields and leads to an improved modelling. This can be seen in Fig. 2, which compares the data with MC simulation in the top reweighting region after reweighting, and also shows the MC expectation before reweighting.
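The derivation and application of these factors can be sketched as follows, assuming binned \(m_\text {eff}\) histograms in the top reweighting region with the non-top backgrounds already subtracted from the data; the helper names and inputs are illustrative.

```python
# Sketch of the top-reweighting derivation and application described
# above: one linear function of m_eff per jet-multiplicity bin
# (4, 5, 6, >=7 jets), fitted to the data/MC ratio.
import numpy as np

def derive_reweighting(meff_centres, data_minus_other, top_prediction):
    """Inputs: numpy arrays of bin centres, background-subtracted data
    yields, and nominal ttbar + single-top yields. Returns (slope,
    offset) of a first-order polynomial fit to the ratio."""
    ratio = data_minus_other / top_prediction
    # Poisson-motivated weights (1/sigma of the ratio); a full
    # treatment would also propagate MC statistical uncertainties
    weights = top_prediction / np.sqrt(np.maximum(data_minus_other, 1.0))
    slope, offset = np.polyfit(meff_centres, ratio, deg=1, w=weights)
    return slope, offset

def apply_top_weight(event_weight, meff, njet_bin, reweighting):
    """reweighting: {njet_bin: (slope, offset)} from derive_reweighting."""
    slope, offset = reweighting[njet_bin]
    return event_weight * (slope * meff + offset)
```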

Fig. 2 Distributions of (a) \(m_\text {eff}\) and (b) lepton \(p_{\text {T}}\) in the top reweighting region after applying the reweighting factors to the \(t\bar{t}\) and single-top background. The dashed line indicates the total background before the reweighting. The band includes statistical and systematic uncertainties. Minor background contributions from \(t\bar{t} H\), \(tWZ\), and \(Z\)+jets are combined into Others. The ratios of the data to the expected background are shown in the bottom panels of the plots. The last bin in each distribution contains the overflow

Fig. 3 Distributions of \(m_\text {eff}\) in (a) the \(W\)+jets CR and (b) the single-top CR after applying the top reweighting factors to the simulated \(t\bar{t}\) and single-top-quark events. The dashed line indicates the total background before the reweighting. Minor background contributions from \(t\bar{t} H\), \(tWZ\), and \(Z\)+jets are combined into Others. The band includes statistical and systematic uncertainties. The ratios of the data to the expected background are shown in the bottom panels of the plots. The last bin in each distribution contains the overflow

Control regions are defined for the \(W\)+jets and single-top-quark backgrounds so as to be enriched in the respective background and have negligible contamination from signal. Both CRs are defined to be orthogonal to the top reweighting region and the training region by modifying the requirement on \(m_\text {T}^W \), using a window of \({30}\,{\text {GeV}}<m_\text {T}^W < {120}\,{\text {GeV}}\) around the W boson mass. In order to reduce the \(t\bar{t}\) background contribution in these regions, the large-R jet multiplicity is required to be less than two, and if a large-R jet is present its mass has to be below 150 \(\text {GeV}\). For the \(W\)+jets CR, the contribution from \(t\bar{t}\) events is further reduced by selecting only events with exactly one b-tagged jet. Since the cross-section for \(W^+\) production is larger than for \(W^-\) production, higher \(W\)+jets purity is achieved by selecting only events with a positively charged lepton. In the single-top CR, the contribution from \(W\)+jets is reduced by requiring at least two b-tagged jets with an angular separation of \(\Delta R (b_{1},b_{2})> 1.4\) between the two highest-\(p_{\text {T}}\) b-jets. Table 2 summarises the selection criteria for both CRs, and Fig. 3 compares the effective mass distribution in data with that in MC simulation after top reweighting, and also shows the MC expectation before reweighting.

6 Neural network training

To enhance the separation between signal and background events, NNs with several input variables combined into a single discriminant are employed. They are trained for various signal hypotheses using the simulated signal and background events in the training region. For \(T\bar{T}\) production, four NNs are trained for different branching ratios \(\mathcal {B}(Zt,Ht,Wb)\) covering the region of the branching ratio plane where this analysis is sensitive: (0.8, 0.1, 0.1), (0.2, 0.4, 0.4), (0.4, 0.1, 0.5), (0.4, 0.5, 0.1). Similarly, three NNs are trained for \(B\bar{B}\) production, considering the branching ratios \(\mathcal {B}(Zb,Hb,Wt) =\) (0.1, 0.1, 0.8), (0.4, 0.1, 0.5), and (0.1, 0.4, 0.5).

Table 3 Input variables to the NN training, sorted in descending order of discriminating power between signal and background. The order is approximate, as it depends on the VLQ type and branching ratio for which the NN is trained

The NNs are implemented using the NeuroBayes package [81, 82], which combines a three-layer feed-forward NN with preprocessing of the input variables prior to their presentation to the NN. The main purpose of the preprocessing is to facilitate optimal network training by ordering the input variables according to their separation power, taking correlations into account, and removing all but the most relevant ones. Sets of input variables are selected for their ability to discriminate between signal and background. Table 3 lists the input variables that are used to train at least one NN. Each set of input variables is composed of observables reflecting the signal topology, e.g. the high VLQ mass via \(m_\text {eff}\) or the properties of the reclustered large-R jets. Other important variables are the b-jet multiplicity and the transverse masses, \(m_\text {T}^W \) and \(am_{\textrm{T2}}\), that are used to define the CRs and training region. It is checked that all input variables are modelled well. As an example, the distributions of four important input variables in the training region are shown in Fig. 4.

Fig. 4 Distributions of NN input variables in the training region: (a) \(m_\text {eff}\), (b) \(m_\text {T}^W \), (c) \(am_{\textrm{T2}}\), and (d) \(E_{\text {T}}^{\text {miss}}\). Minor background contributions from \(t\bar{t} H\), \(tWZ\), and \(Z\)+jets are combined into Others. The signal distributions for \(T\bar{T}\rightarrow ZtZ\bar{t}\) and \(B\bar{B}\rightarrow WtW\bar{t}\), assuming a VLQ mass of 1.2 \(\text {TeV}\), are overlaid and normalised to the total background prediction. The band includes statistical and systematic uncertainties. The ratios of data to the expected background are shown in the bottom panels of the plots. The last bin in each distribution contains the overflow

NeuroBayes uses Bayesian regularisation techniques for the training process to improve the generalisation performance and to avoid overtraining. In general, the network infrastructure consists of one input node for each input variable, plus one bias node, an arbitrary, user-defined number of hidden nodes, and one output node which gives a continuous NN output score (\(N\hspace{-1.00006pt}N_{\text {out}}\)) in the interval \((0,+1)\), where \(N\hspace{-1.00006pt}N_{\text {out}}\) values close to zero indicate background-like events and values close to one correspond to signal-like events. For the NNs in this analysis, 15 nodes are used in the hidden layer and the ratio of signal to background events in the training is chosen to be 1:1. The different background processes are weighted according to their expected event contribution. All the main backgrounds, \(t\bar{t}\), \(W\)+jets, single top quark, and \(t\bar{t} V\), are used in the training. For the signal process, VLQ masses from 1 \(\text {TeV}\) to 1.5 \(\text {TeV}\) are combined in each training. Events at different signal masses enter with the same cross-section when composing the training sample in order to prevent the lower masses with higher cross-sections from dominating. As a check for potential overtraining, only 80% of the simulated events serve as input to the training, while the remaining 20% are used as a test sample. No signs of overtraining are observed. After the training step, all simulated signal and background events, as well as the observed data events, are processed by the NNs in order to get an \(N\hspace{-1.00006pt}N_{\text {out}}\) value for each event. For each NN, the training region is divided into a low-\(N\hspace{-1.00006pt}N_{\text {out}}\) CR with \(N\hspace{-1.00006pt}N_{\text {out}} < 0.5\), and the SR with \(N\hspace{-1.00006pt}N_{\text {out}} > 0.5\).
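Since NeuroBayes is a proprietary package, the stand-in sketch below sets up a comparable three-layer feed-forward network with scikit-learn on toy inputs. The toy data, the 1:1 signal-to-background balance, and the 80%/20% split mimic the setup described above; the per-process background weighting, the equal-cross-section mixing of mass points, and the NeuroBayes preprocessing are assumed to be handled when preparing the real training sample.

```python
# Illustrative stand-in for the NN setup described above, using
# scikit-learn instead of the proprietary NeuroBayes package.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
# Toy stand-in for the prepared input variables (cf. Table 3),
# with a 1:1 signal-to-background ratio as in the training.
X = np.concatenate([rng.normal(0.0, 1.0, (5000, 4)),    # background
                    rng.normal(1.0, 1.0, (5000, 4))])   # signal
y = np.concatenate([np.zeros(5000), np.ones(5000)])

# 80% of the events for training, 20% as a test sample to check
# for overtraining.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

nn = MLPClassifier(hidden_layer_sizes=(15,),   # one hidden layer, 15 nodes
                   activation='logistic', max_iter=500, random_state=0)
nn.fit(X_train, y_train)

# NN_out in (0, 1): values near 1 are signal-like. The training region
# is then split at NN_out = 0.5 into the low-NN_out CR and the SR.
nn_out = nn.predict_proba(X_test)[:, 1]
```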

7 Systematic uncertainties

Several sources of experimental and theoretical systematic uncertainty are considered. The experimental uncertainties are mainly related to the reconstruction and calibration of the final-state physics objects, while the theoretical uncertainties are associated with the modelling of the various processes by the MC event generators. The sources of the largest systematic uncertainties in the analysis are related to the modelling of the major background processes and to the jet energy resolution.

For \(t\bar{t}\) and single-top production the following systematic uncertainties related to the event modelling are applied. The uncertainty in the matching procedure between the ME generator and parton shower is assessed by comparing the nominal Powheg+Pythia 8 samples with alternative samples generated with MadGraph5_aMC@NLO and showered by Pythia 8. In order to estimate the uncertainties in the modelling of the underlying event, the parton shower, and the hadronisation, the nominal samples are compared with a Powheg+Herwig 7 [83] prediction. Uncertainties related to the choice of renormalisation and factorisation scales of the matrix-element calculation are considered by independently doubling and halving the scales. The impact of initial-state radiation (ISR) is estimated by varying \(\alpha _{\text {s}} \) in the A14 tune. Similarly, the uncertainty related to final-state radiation (FSR) is assessed by varying the renormalisation scale for final-state parton-shower emissions by a factor of two. The uncertainty related to the choice of scale for the matching of the matrix-element calculation for the \(t\bar{t}\) process to the parton shower is evaluated by comparing the nominal samples with an alternative sample produced with the \(h_\text {damp}\) parameter set to \(h_\text {damp} = 3.0 \, m_t\). Uncertainties due to PDFs are obtained using the PDF4LHC15 combined PDF set [84]. A dominant systematic uncertainty in the modelling of the single-top processes stems from the handling of the interference between \(t\bar{t}\) and \(tW\) production at NLO. This uncertainty is estimated by comparing the nominal sample for \(tW\) production, generated using the diagram-removal scheme, with an alternative sample using the diagram-subtraction scheme [45, 85]. Finally, an additional 30% normalisation uncertainty is assigned to events from \(t\bar{t}\) + heavy-flavour jets production [86].

Uncertainties in the top reweighting procedure arise from the chosen form of the parameterised function and the statistical uncertainties of events in the reweighting region. These are accounted for by varying the parameterised function by \(\pm 1\sigma \) from its nominal value, using the uncertainties of the fit parameters and taking their correlations into account. Each of the four jet bins for which the reweighting is determined is treated as an independent source of uncertainty.

For all other considered processes, namely \(V\)+jets, diboson, \(t\bar{t} V\), \(t\bar{t} H\), and \(tWZ\) production, the renormalisation and factorisation scales are independently varied by a factor of two. A 30% uncertainty is assigned to the heavy-flavour component of the \(W\)+jets process, based on a comparison between Sherpa 2.2.1 and data [87].

Backgrounds without a free-floating normalisation parameter in the profile-likelihood fit are assigned a theoretical cross-section uncertainty. For the \(t\bar{t} H\) process, an 11% [88] uncertainty is assigned and for \(t\bar{t} V\) and \(tWZ\) the uncertainty amounts to 15% and 12% [88], respectively. The cross-section uncertainty is taken to be 6% [89] for diboson production and 5% [90] for the \(Z\)+jets process.

Besides theoretical systematic uncertainties, detector-related uncertainties are considered in the analysis, the dominant one being the jet energy resolution [68]. Additional jet-related uncertainties are due to the jet energy scale, the jet mass scale and resolution, the efficiency of the JVT requirements, and the b-jet identification [72, 91]. Uncertainties associated with leptons arise from the efficiencies of the lepton identification, isolation, and reconstruction, as well as the lepton energy scale and resolution [65, 66]. Further experimental uncertainties are related to the scale and resolution of the track soft-term in the \(E_{\text {T}}^{\text {miss}}\) calculation [92]. Additional contributions to the total systematic uncertainty come from the uncertainty in the integrated luminosity and the pile-up profile.

8 Statistical analysis

The signal-enriched part of the binned NN output distribution (\(N\hspace{-1.00006pt}N_{\text {out}} > 0.5\)) and the overall numbers of events in the low-\(N\hspace{-1.00006pt}N_{\text {out}}\), \(W\)+jets, and single-top CRs are used to test for the presence of a signal. For hypothesis testing, binned profile-likelihood fits are performed for each of the seven NNs separately, following a modified frequentist method [93] implemented in RooStats [94], and taking the systematic uncertainties affecting the signal and background expectations into account as nuisance parameters.

The binned likelihood function \(\mathcal {L}(\mu ,\theta )\) is constructed as the product of Poisson probability terms over all bins. It depends on the signal strength parameter \(\mu \), a factor multiplying the theoretical signal production cross-section, and \(\theta \), a set of nuisance parameters, constrained in the likelihood function by Gaussian or log-normal priors. The low-\(N\hspace{-1.00006pt}N_{\text {out}}\), \(W\)+jets, and single-top CRs are used to mainly control the normalisations of \(t\bar{t}\), \(W\)+jets, and single-top backgrounds, for which additional unconstrained normalisation factors (\(\mu _{t\bar{t}}\), \(\mu _{W+\text {jets}}\), and \(\mu _\text {single top}\)) are included in the likelihood function. The number of events expected in a bin depends on the normalisation factors as well as on the nuisance parameters. The nuisance parameters \(\theta \) adjust the expectations for signal and background according to the corresponding systematic uncertainties, and their fitted values correspond to the amounts that best fit the data.
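Schematically, with \(n_i\) denoting the observed yield in bin i, the likelihood described above takes the form

$$\begin{aligned} \mathcal {L}(\mu ,\theta ) = \prod _{i \in \text {bins}} \textrm{Pois}\left( n_i \,\big |\, \mu \, s_i(\theta ) + \mu _{t\bar{t}}\, b_i^{t\bar{t}}(\theta ) + \mu _{W+\text {jets}}\, b_i^{W+\text {jets}}(\theta ) + \mu _\text {single top}\, b_i^\text {single top}(\theta ) + b_i^\text {other}(\theta ) \right) \prod _{j} G(\theta _j), \end{aligned}$$

where \(s_i\) and the \(b_i\) terms are the expected signal and per-process background yields in bin i, and \(G(\theta _j)\) are the Gaussian or log-normal constraint terms.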

In order to avoid double-counting of normalisation uncertainties for the free-floating background processes, only shape effects and acceptance differences between the CRs and the SR are included when considering the systematic uncertainties in their modelling. A smoothing algorithm is applied to the templates for the systematic variations in case the statistical fluctuations between bins in the signal region are large. Furthermore, the templates for all systematic variations are symmetrised. Some of the dominant systematic uncertainties related to the modelling of the major background processes are one-sided and are symmetrised by mirroring the uncertainty. To simplify the fitting procedure, nuisance parameters are only included for systematic uncertainties that affect the event yield by more than 1% for a process in any bin. Normalisation and shape components for a source of systematic uncertainty are treated separately in this procedure.
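The pruning criterion can be expressed compactly; in the minimal sketch below (with illustrative inputs), a systematic variation is retained for a process only if it changes the yield in at least one bin by more than 1% relative to the nominal template.

```python
# Sketch of the >1% pruning criterion described above.
def keep_systematic(nominal, varied, threshold=0.01):
    """nominal, varied: per-bin yields of a process for the nominal and
    systematically varied templates."""
    return any(abs(v - n) > threshold * n
               for n, v in zip(nominal, varied) if n > 0.0)
```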

The test statistic \(q_{\mu }\) is defined as the profile likelihood ratio, \(q_{\mu } = -2 \ln {\mathcal {L}(\mu ,\hat{\hat{\theta }})/\mathcal {L}(\hat{\mu },\hat{\theta })}\), where \(\hat{\mu }\) and \(\hat{\theta }\) are the values of the parameters that simultaneously maximise the likelihood function, and \(\hat{\hat{\theta }}\) are the values of the nuisance parameters that maximise the likelihood function for a fixed value of \(\mu \). The compatibility of the observed data with the background-only hypothesis is tested by setting \(\mu = 0\) in the test statistic \(q_0\). Upper limits on the signal production cross-section for each considered signal scenario are computed using \(q_\mu \) in the CL\(_\text {s}\) method [93] with the asymptotic approximation [95]. A given signal scenario is considered to be excluded at \(\ge 95\%\) confidence level (CL) if the value of the signal production cross-section (parameterised by \(\mu \)) yields a CL\(_\text {s}\) value \(\le 0.05\).
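For illustration, the asymptotic CL\(_\text {s}\) value can be computed from the observed test statistic \(q_\mu \) and its value on the background-only Asimov dataset, following the formulae of Ref. [95]; in the analysis itself this is handled by the RooStats implementation.

```python
# Minimal sketch of the asymptotic CL_s computation, assuming the
# observed q_mu and its background-only Asimov value are available
# from the profile-likelihood fits.
from math import sqrt
from scipy.stats import norm

def cls_asymptotic(q_mu, q_mu_asimov):
    p_s_plus_b = 1.0 - norm.cdf(sqrt(q_mu))             # CL_{s+b}
    cl_b = norm.cdf(sqrt(q_mu_asimov) - sqrt(q_mu))     # 1 - p_b
    return p_s_plus_b / cl_b

# A signal point is excluded at >= 95% CL if cls_asymptotic(...) <= 0.05.
```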

9 Results

Background-only likelihood fits are performed for each NN. The obtained normalisation factors for the \(t\bar{t}\), \(W\)+jets, and single-top processes vary between the fits for the different NNs: between \(1.00\pm 0.28\) and \(1.14 \pm 0.27\) for \(t\bar{t}\), between \(0.91\pm 0.19\) and \(1.08\pm 0.17\) for \(W\)+jets, and between \(0.53\pm 0.30\) and \(0.60\pm 0.23\) for single top. The reduction of the single-top contribution appears large, but it is less than the difference between the nominal and alternative schemes for modelling the interference between the \(t\bar{t}\) and tW processes, which are described in Sects. 3 and 7. The \(N\hspace{-1.00006pt}N_{\text {out}}\) distributions after the fit are validated in the CRs by comparing data with simulation. As an example, the plots for the NN training with a VLT signal with \(\mathcal {B}(Zt,Ht,Wb) = (0.8,0.1,0.1)\) and a VLB signal with \(\mathcal {B}(Zb,Hb,Wt) = (0.1,0.1,0.8)\) are shown in Fig. 5. For the training with the VLT signal, the corresponding number of events expected from each process in the three CRs and also the signal region is shown in Table 4, together with the number of observed events and the expected signal yield for a mass of 1.2 \(\text {TeV}\). The large uncertainty in the single-top yield due to the different schemes for modelling the interference can also be observed here. The uncertainty in the total background is less than the uncertainty in the separate processes because of strong (anti-)correlations between various systematic uncertainties.

The \(N\hspace{-1.00006pt}N_{\text {out}}\) distributions in the signal region for the two training cases shown in Fig. 5, and for another training for a VLT signal with \(\mathcal {B}(Zt,Ht,Wb) = (0.2,0.4,0.4)\), are shown in Fig. 6. No significant deviations from the SM expectation are observed for these three cases or when using other trained NNs.

Fig. 5 Data and background expectation in the \(W\)+jets CR (left panels), the single-top CR (middle panels), and the low-\(N\hspace{-1.00006pt}N_{\text {out}}\) CR (right panels) after a background-only fit to data (Post-Fit) for a NN training considering a VLT signal with \(\mathcal {B}(Zt,Ht,Wb) = (0.8,0.1,0.1)\) (upper row) and a VLB signal with \(\mathcal {B}(Zb,Hb,Wt) = (0.1,0.1,0.8)\) (lower row). Minor background contributions from \(t\bar{t} H\), \(tWZ\), and \(Z\)+jets are combined into Others. The band indicates the post-fit uncertainty. The bottom panels show the ratio of data to the background expectation

Table 4 Observed data event yields and the expected background event yields with their total uncertainties in the control and signal regions after the background-only fit considering a NN trained for a VLT signal with branching ratio \(\mathcal {B}(Zt,Ht,Wb) =(0.8,0.1,0.1)\). For comparison, the event yields for a VLT signal with a mass of 1.2 \(\text {TeV}\) and a branching ratio of \(\mathcal {B}(Zt,Ht,Wb) = (0.8,0.1,0.1)\) are given
Fig. 6 Data and background expectation in the signal region after the simultaneous background-only fit to data (Post-Fit) for a NN training for (a) a VLT signal with \(\mathcal {B}(Zt,Ht,Wb) = (0.8,0.1,0.1)\), (b) a VLT signal with \(\mathcal {B}(Zt,Ht,Wb) = (0.2,0.4,0.4)\), and (c) a VLB signal with \(\mathcal {B}(Zb,Hb,Wt) = (0.1,0.1,0.8)\). Contributions from \(t\bar{t} H\), \(tWZ\), and \(Z\)+jets are combined into Others. Expected pre-fit signal distributions with the signal branching ratio corresponding to the respective training are added on top of the background expectation, using a signal mass of 1.2 \(\text {TeV}\). The band indicates the statistical and systematic uncertainties. The ratio of data to the background expectation is shown in the bottom panels of the plots

Upper limits on the pair-production cross-sections for T and B quarks are calculated at the 95% CL. For each signal mass and branching ratio, the NN giving the most stringent expected limit is selected. These obtained cross-section limits are compared with the theoretical cross-section to set exclusion limits on the signal mass. The limits are calculated for T and B quarks in the weak-isospin singlet and doublet representations, with mass-dependent branching ratios, as well as for pure \(T\rightarrow Zt\) and \(B\rightarrow Wt\) decays, where the latter corresponds to \(X \rightarrow Wt\) as well as to the (TB) doublet as mentioned before. For the doublet scenarios, the contribution from the VLQ partner is either not considered, leading to conservative limits, or considered assuming mass-degenerate VLQs. Mass differences of at most a few \(\text {GeV}\) are allowed, so that decays from one member of the doublet to the other remain suppressed [4, 6]. Also for the mass-degenerate doublet scenario the seven NNs described in Sect. 6 are used and the one with the most stringent expected limit is selected as described above, i.e., no additional NN is trained to potentially increase the sensitivity to the added yield from both doublet members.

The expected and observed lower limits on the VLQ mass in the aforementioned models are listed in Table 5 and shown in Fig. 7. The impact of the statistical uncertainties on the mass limits is larger than that of the systematic uncertainties. However, the latter is not negligible and for the case of pure \(T \rightarrow Zt\) (\(B\rightarrow Wt\)) decays it reduces the expected lower limit by about 40 \(\text {GeV}\) (70 \(\text {GeV}\)) to a value of 1.45 \(\text {TeV}\) (1.42 \(\text {TeV}\)). For the three T-quark scenarios in Table 5, the obtained mass limits are 300–400 \(\text {GeV}\) higher than in the earlier ATLAS analysis in the same final state using a subset of the Run 2 data [16]. This improvement is only partially due to the larger dataset: the expected limits on the cross-section for a VLT mass of 1.4 \(\text {TeV}\) improved by a factor of between 4.5 for pure \(T \rightarrow Zt\) decays and 7.7 for the SU(2) singlet. Especially when the branching fraction into Zt becomes smaller, the major improvement stems from the training of neural networks at several branching ratios instead of using a cut-and-count analysis with a single SR as done previously. The obtained mass limits for the first five scenarios in Table 5 are also better than the corresponding limits in the combination of all ATLAS results using 36 fb\(^{-1}\) [26], apart from the T singlet scenario where the observed limit is weaker than the expected limit. The strongest lower limits on the VLQ masses, 1.59 \(\text {TeV}\), are derived for the weak-isospin doublets assuming mass-degenerate VLQs.

Table 5 Expected (Exp.) and observed (Obs.) mass limits for the pair production of specific VLQs (T, B, X) in certain decay scenarios corresponding to SU(2) singlet or doublet representations or to the decay into just one specific final state. In the doublet scenarios, contributions from the VLQ partner are not considered, leading to conservative limits, except for the last row where the VLQs in the doublet are assumed to be mass degenerate. Since the analysis does not distinguish between particles and antiparticles, the limits for \(B\rightarrow Wt\) also apply to the vector-like X because it decays exclusively into Wt. Similarly, the (TB) doublet scenarios correspond to (XT) doublet scenarios
Fig. 7 Expected and observed upper limits on the signal cross-section for (a) the case \(\mathcal {B}(T\rightarrow Zt) = {100}{\%}\), (b) the case \(\mathcal {B}(B\rightarrow Wt) = {100}{\%}\), (c) a T quark in the SU(2) singlet representation, (d) a B quark in the SU(2) singlet representation, and (e) a T quark in an SU(2) doublet. In the doublet scenario, contributions from the partner vector-like quark are not considered, leading to conservative limits. The SU(2) (TB) doublet scenario considering contributions from both the T and B quarks is shown in (f), assuming mass-degenerate VLQs. The thickness of the theory curve represents the theoretical uncertainty from the PDFs, scales, and strong coupling constant \(\alpha _{\text {s}} \)

Fig. 8 Expected (left) and observed (right) mass limits for \(T\bar{T}\) (upper row) and \(B\bar{B}\) (lower row) production. The mass limit is calculated using the NN giving the most stringent expected limit at each signal mass and branching ratio point. The white lines indicate mass exclusion contours. The black markers indicate the branching ratios for the SU(2) singlet and doublet scenarios for masses above 800 \(\text {GeV}\), where they are approximately independent of the VLQ mass. Since the analysis does not distinguish between particles and antiparticles, the mass exclusion for the B quark in the (TB) doublet is equivalent to the exclusion for the X quark in the (XT) doublet. The white areas indicate that the mass limit is below 800 \(\text {GeV}\)

Apart from limits for specific models and branching ratios, lower limits on the signal mass are set as a function of the T and B branching ratios. The resulting expected and observed mass limits are shown in Fig. 8. As expected, the highest sensitivity is found in the regions near \(\mathcal {B}(T\rightarrow Zt)={100}\%\) and \(\mathcal {B}(B\rightarrow Wt)={100}\%\). For the T quark, the sensitivity for the mixed ZtHt decay mode is larger than for the ZtWb decay. In the case of the B quark, the sensitivity decreases if the branching fraction for B decay into a Higgs or Z boson and a bottom quark increases. The differences between the observed and expected limits for a vector-like T quark around the SU(2) singlet branching ratio are not significant, as can be seen in Fig. 7(c). They result from the \(N\hspace{-1.00006pt}N_{\text {out}}\) distribution obtained from the NN trained for a branching ratio \(\mathcal {B}(Zt,Ht,Wb) = (0.2,0.4,0.4)\). In the last bin of the corresponding signal region in Fig. 6(b), the data slightly exceeds the predicted SM background.

10 Conclusion

A search for pair-produced vector-like T and B quarks, with electric charge \(+2/3\) and \(-1/3\), respectively, is performed in events with exactly one isolated lepton, at least four jets including one that is b-tagged, and high missing transverse momentum. The analysis is based on data collected by the ATLAS experiment in \(\sqrt{s}=13\,\text {TeV} \) proton–proton collisions at the LHC, corresponding to an integrated luminosity of 139 fb\(^{-1}\). Several neural networks are trained for various branching ratios of the T and B quarks, assuming decays into a W, Z, or Higgs boson and a third-generation quark. The analysis considers all possible decays of the vector-like quarks, but it is most sensitive to the \(T\rightarrow Zt\) and \(B\rightarrow Wt\) decay modes. Since the analysis does not distinguish between particles and antiparticles, the limits for \(B\rightarrow Wt\) also apply to a vector-like X with electric charge \(+5/3\).

No significant deviations from the Standard Model expectation are observed, and 95% CL upper limits on the pair-production cross-sections for T and B quarks as a function of their mass are derived for various decay branching-ratio scenarios. The lower limits on the masses of the T and B quarks in the weak-isospin singlet model are 1.26 \(\text {TeV}\) and 1.33 \(\text {TeV}\), respectively, and 1.41 \(\text {TeV}\) for the T quark in the doublet representation. For the doublet, the contributions from the VLQ partner are not considered, leading to a conservative limit. Stronger lower limits of 1.47 \(\text {TeV}\) and 1.46 \(\text {TeV}\) are set on the masses when considering pure \(T \rightarrow Zt\) and \(B\rightarrow Wt\) decays, where the latter corresponds to the (TB) or (XT) doublet and also applies to \(X \rightarrow Wt\) decays. For the three discussed T-quark scenarios, the obtained mass limits are 300 to 400 \(\text {GeV}\) higher than in the earlier ATLAS analysis in the same final state using a subset of the Run 2 data. The strongest lower limits for T, B, and X are 1.59 \(\text {TeV}\), obtained for the (TB) and (XT) weak-isospin doublets where both VLQ partners are considered and assumed to be mass degenerate. Finally, lower limits on the T and B quark masses are derived for all possible branching ratios.