Search for supersymmetry in final states with missing transverse momentum and three or more b-jets in 139 fb$^{-1}$ of proton–proton collisions at $\sqrt{s} = 13$ TeV with the ATLAS detector

A search for supersymmetry involving the pair production of gluinos decaying via off-shell third-generation squarks into the lightest neutralino ($\tilde{\chi}^0_1$) is reported. It exploits LHC proton–proton collision data at a centre-of-mass energy $\sqrt{s} = 13$ TeV with an integrated luminosity of 139 fb$^{-1}$ collected with the ATLAS detector from 2015 to 2018. The search uses events containing large missing transverse momentum, up to one electron or muon, and several energetic jets, at least three of which must be identified as containing b-hadrons. Both a simple kinematic event selection and an event selection based upon a deep neural network are used. No significant excess above the predicted background is found.
In simplified models involving the pair production of gluinos that decay via off-shell top (bottom) squarks, gluino masses less than 2.44 TeV (2.35 TeV) are excluded at 95% CL for a massless $\tilde{\chi}^0_1$. Limits are also set on the gluino mass in models with variable branching ratios for gluino decays to $b\bar{b}\tilde{\chi}^0_1$, $t\bar{t}\tilde{\chi}^0_1$ and $t\bar{b}\tilde{\chi}^-_1/\bar{t}b\tilde{\chi}^+_1$.


Introduction
Supersymmetry (SUSY) [1][2][3][4][5][6] is a generalisation of space-time symmetries that predicts new bosonic partners for the fermions of the Standard Model (SM) and new fermionic partners for its bosons. In SUSY models, if R-parity is conserved [7], SUSY particles are produced in pairs and the lightest supersymmetric particle (LSP) is stable. The scalar partners of the left- and right-handed quarks, the squarks $\tilde{q}_{\mathrm{L}}$ and $\tilde{q}_{\mathrm{R}}$, can mix to form two mass eigenstates $\tilde{q}_1$ and $\tilde{q}_2$, ordered by increasing mass. SUSY can reduce unnatural tuning in the Higgs sector by orders of magnitude [8][9][10][11] provided that the superpartners of the top quark (the top squarks, $\tilde{t}_1$ and $\tilde{t}_2$) have masses not too far above the weak scale [12]. Because of the SM weak-isospin symmetry, the mass of the lighter bottom squark $\tilde{b}_1$ is also expected to be close to the weak scale. The fermionic partners of the gluons, the gluinos ($\tilde{g}$), are also motivated by naturalness [13] to have a mass around the TeV scale in order to limit their contributions to the radiative corrections to the top squark masses. For these reasons, and because gluinos are expected to be pair-produced with a high cross-section at the Large Hadron Collider (LHC) [14], the search for gluino production with decays via top and bottom squarks is highly motivated at the LHC. This paper presents a search for pair-produced gluinos decaying via top or bottom squarks in events with multiple jets containing b-hadrons (b-jets in the following), high missing transverse momentum of magnitude $E_{\mathrm{T}}^{\mathrm{miss}}$, and potentially additional jets and/or an isolated electron or muon (referred to as 'leptons' hereafter). The results constitute an update of those obtained using 36.1 fb$^{-1}$ of proton–proton ($pp$) collision data [15] from the ATLAS detector [16]. They exploit an expanded dataset of 139 fb$^{-1}$ of $pp$ collision data acquired at a centre-of-mass energy of 13 TeV.
To make best use of the expanded dataset, the simple kinematic selections used in the earlier analysis have been re-optimised, and a new event selection based upon a deep neural network, optimised to discern gluino signatures from background, is employed. The latter optimally combines selections requiring zero or one lepton.
Interpretations are provided in the context of several simplified models [17][18][19] probing gluino decays into the LSP via off-shell top or bottom squarks. In these models, the LSP is assumed to be the lightest neutralino $\tilde{\chi}^0_1$, a linear superposition of the superpartners of the neutral electroweak and Higgs bosons. One model also features the lighter charginos $\tilde{\chi}^\pm_1$, which are linear superpositions of the superpartners of the charged Higgs and SM electroweak bosons. Several benchmark simplified model scenarios studied in the earlier instances of the analysis [15,20] are considered: two models, referred to as 'Gtt' and 'Gbb' respectively, which feature exclusively gluino decays to the LSP via off-shell top or bottom squarks (Figure 1), and a third model, referred to as 'Gtb', with variable branching ratios for $\tilde{g} \to b\bar{b}\tilde{\chi}^0_1$, $\tilde{g} \to t\bar{t}\tilde{\chi}^0_1$ and $\tilde{g} \to t\bar{b}\tilde{\chi}^-_1/\tilde{g} \to \bar{t}b\tilde{\chi}^+_1$ (Figure 2 shows the additional decay processes that this model permits). In the Gtb models the mass difference between the $\tilde{\chi}^\pm_1$ and the $\tilde{\chi}^0_1$ is fixed to a small value (2 GeV), motivated by natural SUSY models in which the $\tilde{\chi}^0_1$ is an almost pure higgsino (see e.g. Ref. [12]).
Pair-produced gluinos with top-squark-mediated decays have also been sought in events containing either pairs of same-sign leptons or three leptons [21,22]. The same-sign/three-lepton search is comparable in sensitivity to the search presented in this paper only when the masses of the gluino and the LSP are of similar magnitude. Sensitivity to such scenarios is also obtained by searching for events with large jet multiplicity, $N_{\mathrm{jet}} \ge 7$–$12$ [23]. Similar searches for pair-produced gluinos by the CMS experiment with 36–137 fb$^{-1}$ of 13 TeV $pp$ collisions [24][25][26][27][28][29][30] produced results comparable to the previous ATLAS results. This paper is organised as follows. Section 2 describes the ATLAS detector, and Section 3 describes the data and simulated event samples used in the analysis. Section 4 introduces the event reconstruction methodology, and Section 5 introduces the analysis strategy. The event selection is discussed in Section 6, and systematic uncertainties in Section 7. The results of the analysis are presented and interpreted in Section 8. Section 9 gives the conclusions.

Figure 2: The additional decay processes permitted by the variable gluino branching ratio (Gtb) model, in addition to those shown in Figure 1. In diagram (a), both gluinos decay via $\tilde{g} \to t\bar{b}\tilde{\chi}^-_1$, with the $\tilde{\chi}^-_1$ decaying into the $\tilde{\chi}^0_1$ and soft fermions. In diagrams (b) and (c), only one gluino decays via the $\tilde{\chi}^-_1$ while the other decays via $\tilde{g} \to t\bar{t}\tilde{\chi}^0_1$ or $\tilde{g} \to b\bar{b}\tilde{\chi}^0_1$, respectively. In diagram (d), one gluino decays via $\tilde{g} \to t\bar{t}\tilde{\chi}^0_1$ and the other via $\tilde{g} \to b\bar{b}\tilde{\chi}^0_1$. In each case the charge-conjugate processes are implied. The fermions originating from the $\tilde{\chi}^\pm_1$ decay have low momentum and are not detected because the mass difference between the $\tilde{\chi}^\pm_1$ and the $\tilde{\chi}^0_1$ is fixed to 2 GeV.

ATLAS detector
The ATLAS detector is a multipurpose particle physics detector with a forward–backward symmetric cylindrical geometry and nearly 4π coverage in solid angle.¹ The inner tracking detector (ID) consists of silicon pixel and microstrip detectors covering the pseudorapidity region $|\eta| < 2.5$, surrounded by a transition radiation tracker, which enhances electron identification in the region $|\eta| < 2.0$. The ID is surrounded by a thin superconducting solenoid providing an axial 2 T magnetic field and by a fine-granularity lead/liquid-argon (LAr) electromagnetic calorimeter covering $|\eta| < 3.2$. A stainless-steel/scintillator tile calorimeter provides coverage for hadronic showers in the central pseudorapidity range ($|\eta| < 1.7$). The endcaps ($1.5 < |\eta| < 3.2$) of the hadronic calorimeter are made of LAr active layers with copper as the absorber material. The forward region ($3.1 < |\eta| < 4.9$) is instrumented with a LAr calorimeter for both the electromagnetic and hadronic measurements. A muon spectrometer with an air-core toroidal magnet system surrounds the calorimeters. Three layers of high-precision tracking chambers provide coverage in the range $|\eta| < 2.7$, while dedicated fast chambers allow triggering in the region $|\eta| < 2.4$. The ATLAS trigger system [31] consists of a hardware-based level-1 trigger followed by a software-based high-level trigger (HLT). An extensive software suite [32] is used in data simulation, in the reconstruction and analysis of real and simulated data, in detector operations, and in the trigger and data acquisition systems of the experiment.

¹ ATLAS uses a right-handed coordinate system with its origin at the nominal interaction point in the centre of the detector. The positive $x$-axis is defined by the direction from the interaction point to the centre of the LHC ring, with the positive $y$-axis pointing upwards, while the beam direction defines the $z$-axis.
Cylindrical coordinates $(r, \phi)$ are used in the transverse plane, $\phi$ being the azimuthal angle around the $z$-axis. The pseudorapidity is defined in terms of the polar angle $\theta$ by $\eta = -\ln \tan(\theta/2)$. Rapidity is defined as $y = 0.5 \ln[(E + p_z)/(E - p_z)]$, where $E$ denotes the energy and $p_z$ is the component of the momentum along the beam direction.

Table 1: List of MC generators used to simulate different signal and background processes. The third column ('Tune') describes the tuned set of underlying-event and hadronisation parameters, the fourth column the PDF set used, and the fifth column the perturbative accuracy in the strong coupling constant used for the calculation of the cross-section used to normalise the sample.

The $h_{\mathrm{damp}}$ parameter in Powheg, which controls the $p_{\mathrm{T}}$ of the first additional emission beyond the Born level and thus regulates the $p_{\mathrm{T}}$ of the recoil emission against the $t\bar{t}$ system, was set to 1.5 times the mass of the top quark (assumed to be $m_{\mathrm{top}} = 172.5$ GeV) [57].
Other changes in generator settings for the modelling of sources of minor backgrounds were found not to affect the sensitivity of this analysis significantly. All simulated background processes are normalised using the best available theoretical calculation for their respective cross-sections.
The SUSY signal samples are normalised using cross-section calculations at next-to-leading order (NLO) in the strong coupling constant, adding the resummation of soft gluon emission at next-to-leading-logarithm (NLL) accuracy [40][41][42][43][44]. The masses of the top and bottom squarks are assumed to be much greater than that of the gluino, such that the gluinos undergo three-body decays via off-shell squarks according to phase space. For the Gtb benchmark models, the mass difference between the $\tilde{\chi}^\pm_1$ and the $\tilde{\chi}^0_1$ is fixed to 2 GeV. For each signal model, the nominal cross-section and its uncertainty are taken from an envelope of cross-section predictions using different parton shower models and factorisation and renormalisation scales, as described in Ref. [45].
The EvtGen 1.6.0 program [58] was used to describe the properties of the b- and c-hadron decays in the signal samples and in the background samples, except those produced with Sherpa. For all samples the response of the detector to particles was modelled with the full ATLAS detector simulation [59] based on Geant4 [60]. All simulated events were overlaid with multiple $pp$ collisions simulated with Pythia 8.186 using the A3 tune [38] and the NNPDF2.3 PDF set. The MC samples were generated with variable levels of pile-up in the same and neighbouring bunch crossings, and were reweighted to match the distribution of the mean number of interactions observed in data in 2015–2018.

Overlaps between electron candidates and jets are resolved, in part to reject electrons from heavy-flavour hadron decays. If the separation between an electron candidate and any jet satisfies $0.2 < \Delta R < 0.4$, the candidate is considered to be a jet, and the electron candidate is removed.
Muons are reconstructed by matching tracks in the inner detector to tracks in the muon spectrometer and must have $|\eta| < 2.5$ and $p_{\mathrm{T}} > 20$ GeV [74]. The impact parameter along the beam direction is required to satisfy $|z_0 \sin\theta| < 0.5$ mm. Events containing muons identified as originating from cosmic rays, with $|d_0| > 0.2$ mm and $|z_0| > 1$ mm, or as being poorly reconstructed, with $\sigma(q/p)/|q/p| > 0.2$, are removed. Here, $\sigma(q/p)/|q/p|$ is a measure of the momentum uncertainty for a particle with charge $q$. Muons are discarded if they are within $\Delta R = 0.4$ of jets that survive the electron–jet overlap removal, except when the number of tracks associated with the jet is less than three, in which case the muon is kept and the jet discarded. This last exception retains muons which have undergone bremsstrahlung.
After resolving the overlap with leptons, the candidate $R = 0.4$ jets are reclustered [75] into larger-radius jets using the anti-$k_t$ algorithm with a radius parameter of 0.8. The calibration from the input $R = 0.4$ jets propagates directly to the reclustered jets. These reclustered jets are then trimmed [75][76][77][78] by removing subjets with $p_{\mathrm{T}}$ below 10% of the $p_{\mathrm{T}}$ of the original reclustered jet. The resulting larger-radius jets are required to have $p_{\mathrm{T}} > 100$ GeV and $|\eta| < 2.0$. No additional overlap removal procedure is applied to such jets after reclustering. When it is not explicitly stated otherwise, in this paper the term 'jets' refers to the smaller-radius $R = 0.4$ jets, while the reclustered larger-radius $R = 0.8$ jets are called 'large-$R$ jets'.
The requirements on electrons and muons are tightened for the selection of events in signal regions and background control regions requiring at least one electron or muon (Section 5). The electrons and muons passing the tight selection are called 'signal' electrons or muons in the following, as opposed to 'baseline' electrons and muons, which need only pass the requirements described above. Signal electrons (muons) must satisfy the 'FCLoose' [73,79] ('FixedCutTightTrackOnly' [74,80]) $p_{\mathrm{T}}$-dependent track-based and calorimeter-based isolation criteria. The calorimeter-based isolation is determined by taking the ratio of the sum of energy deposits in a cone of $\Delta R = 0.2$ around the electron or muon candidate to the sum of energy deposits associated with the electron or muon. The track-based isolation is estimated in a similar way but using a variable cone size with a maximum value of $\Delta R = 0.2$ for electrons and $\Delta R = 0.3$ for muons. Signal electrons are also required to pass a 'Tight' likelihood-based selection [73,79]. The impact parameter of the electron in the transverse plane is required to be less than five times the transverse impact parameter uncertainty ($\sigma_{d_0}$). Further selection criteria are also imposed on signal muons: muon candidates are required to pass a 'Medium' quality selection and meet a $|d_0| < 3\sigma_{d_0}$ requirement [74,80].

The missing transverse momentum $\vec{p}_{\mathrm{T}}^{\,\mathrm{miss}}$, with magnitude $E_{\mathrm{T}}^{\mathrm{miss}}$, is defined as the negative vector sum of the $p_{\mathrm{T}}$ of all selected and calibrated electrons, muons and jets in the event, with an extra term added to account for energy in the event that is not associated with any of these objects [81]. This last 'soft term' contribution is calculated from ID tracks with $p_{\mathrm{T}} > 500$ MeV which are matched to the primary vertex, thus ensuring that this contribution is robust against pile-up contamination, and which are not associated with selected objects [81,82].
This analysis does not consider reconstructed photons or hadronically decaying τ-leptons when resolving possible overlaps with other objects. They are also not included explicitly in the calculation of $\vec{p}_{\mathrm{T}}^{\,\mathrm{miss}}$, but the associated energy deposits contribute to this calculation via the overlapping reconstructed jets.
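The negative vector sum defining the missing transverse momentum can be illustrated with a minimal sketch (hypothetical inputs; the actual ATLAS computation uses fully calibrated objects and the track-based soft term described above):

```python
import numpy as np

def missing_transverse_momentum(objects):
    """Compute the missing transverse momentum vector and its magnitude.

    `objects` is a list of (pt, phi) pairs for all selected, calibrated
    electrons, muons and jets, plus the soft-term contribution.  Returns
    (met_x, met_y, met) in the same units as the input pt values.
    """
    met_x = -sum(pt * np.cos(phi) for pt, phi in objects)
    met_y = -sum(pt * np.sin(phi) for pt, phi in objects)
    return met_x, met_y, np.hypot(met_x, met_y)

# Toy example: two jets roughly back-to-back in phi plus one lepton.
objs = [(100.0, 0.0), (80.0, np.pi), (30.0, np.pi / 2)]
met_x, met_y, met = missing_transverse_momentum(objs)
```

In a real event loop the object list would also include the soft-term tracks; here they are simply additional `(pt, phi)` entries.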

Analysis strategy
Evidence for the presence of SUSY signal events in the data sample is sought by selecting events populating multiple signal regions (SRs), which are expected to be enriched in such events on the basis of simulation studies. The expected yields of major SM background processes in the SRs are determined with a profile likelihood fit (Section 8) using MC samples with normalisations constrained to data in dedicated SR-dependent control regions (CRs). The yields of subdominant backgrounds are estimated directly from MC samples, except in the case of multijet backgrounds, where a data-driven procedure is used. The accuracy of the background estimation procedure is verified by comparing data with background predictions in validation regions (VRs) with low signal contamination, which are located between the CRs and SRs in the multidimensional space of event selection variables.
Signal regions are defined using two alternative methodologies. The first methodology, the 'cut-and-count' (CC) analysis, defines SRs by applying selections independently to a series of observables sensitive to differences in kinematics between signal and background ('discriminating variables'). These are expected to provide rejection of SM background events while retaining events from a broad range of Gtt, Gbb and Gtb signal models to provide maximum discovery power. This methodology is well suited to subsequent reinterpretation of the results in the context of other theories not considered in this paper. The CC SR event selection criteria fall into two broad categories targeting final states which contain no leptons or at least one lepton (referred to as '0-lepton' and '1-lepton' SRs henceforth). The second event selection methodology, the neural network (NN) analysis, classifies events using a supervised machine-learning technique in which correlations between discriminating variables are further exploited to maximise exclusion power for specific Gtt and Gbb signal models. The event selection for each NN SR optimally selects a mixture of events containing different lepton multiplicities, depending on the values of the other discriminating variables. A set of CRs and VRs is associated with each CC or NN SR. The event selection criteria are discussed in Section 6.

Discriminating variables
The following discriminating variables are used in the CC and NN SR, CR and VR event selections, in addition to more widely used variables such as $E_{\mathrm{T}}^{\mathrm{miss}}$ and the momenta and multiplicities of jets, b-jets and leptons.
The 'inclusive effective mass' ($m_{\mathrm{eff}}$) is defined by the following scalar sum:

$m_{\mathrm{eff}} = \sum_{i \le N_{\mathrm{jet}}} p_{\mathrm{T}}^{\mathrm{jet}_i} + \sum_{j \le N_{\mathrm{lepton}}} p_{\mathrm{T}}^{\ell_j} + E_{\mathrm{T}}^{\mathrm{miss}},$

where the first and second sums run over the selected jets ($N_{\mathrm{jet}}$) and signal leptons ($N_{\mathrm{lepton}}$), respectively. This variable is correlated with the invariant mass scale of the final-state particles in the event, and typically takes a higher value for pair-produced gluino events than for SM background events with lower mass scales.
In regions with at least one selected lepton, the transverse mass $m_{\mathrm{T}}$ is calculated from the $p_{\mathrm{T}}$ of the leading selected lepton ($\ell$) and $E_{\mathrm{T}}^{\mathrm{miss}}$, and is defined as:

$m_{\mathrm{T}} = \sqrt{2\, p_{\mathrm{T}}^{\ell}\, E_{\mathrm{T}}^{\mathrm{miss}} \left[1 - \cos \Delta\phi\!\left(\vec{p}_{\mathrm{T}}^{\,\ell}, \vec{p}_{\mathrm{T}}^{\,\mathrm{miss}}\right)\right]}.$

This variable is used to reject $t\bar{t}$ and $W$+jets background events in which one $W$ boson decays leptonically. The $m_{\mathrm{T}}$ distribution for these backgrounds, assuming on-mass-shell $W$ bosons, has an upper bound corresponding to the $W$ boson mass, leading to a Jacobian edge in the observed distribution, and typically has higher values for Gtt signal model events. In addition, the minimum transverse mass formed by $E_{\mathrm{T}}^{\mathrm{miss}}$ and any of the three highest-$p_{\mathrm{T}}$ $b$-tagged jets in the event, $m_{\mathrm{T,min}}^{b\text{-jets}}$, is used in regions with any lepton multiplicity:

$m_{\mathrm{T,min}}^{b\text{-jets}} = \min_{i \le 3} \sqrt{2\, p_{\mathrm{T}}^{b\text{-jet}_i}\, E_{\mathrm{T}}^{\mathrm{miss}} \left[1 - \cos \Delta\phi\!\left(\vec{p}_{\mathrm{T}}^{\,b\text{-jet}_i}, \vec{p}_{\mathrm{T}}^{\,\mathrm{miss}}\right)\right]}.$

Another powerful variable for selecting signal events is the total jet mass variable, defined as:

$M_J^{\Sigma} = \sum_{i \le 4} m_{J,i},$

where $m_{J,i}$ is the mass of large-$R$ jet $i$ in the event and the sum runs over up to four leading large-$R$ jets. The decay products of a high-$p_{\mathrm{T}}$ (boosted) hadronically decaying top quark can be reconstructed in a single large-$R$ jet, resulting in a jet with a high mass. This variable typically takes larger values for Gtt signal model events than for background events, because the former can contain as many as four hadronically decaying top quarks while the latter typically contain a maximum of two.
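As an illustration, the discriminating variables above can be sketched in a few lines of numpy (a minimal sketch with simplified `(pt, phi)` inputs; the analysis of course computes them from fully calibrated objects):

```python
import numpy as np

def delta_phi(a, b):
    """Wrap the azimuthal difference into [-pi, pi]."""
    return (a - b + np.pi) % (2 * np.pi) - np.pi

def m_eff(jet_pts, lep_pts, met):
    """Inclusive effective mass: scalar sum of jet pT, lepton pT and ETmiss."""
    return sum(jet_pts) + sum(lep_pts) + met

def m_T(lep_pt, lep_phi, met, met_phi):
    """Transverse mass of the leading lepton and the missing momentum."""
    return np.sqrt(2.0 * lep_pt * met * (1.0 - np.cos(delta_phi(lep_phi, met_phi))))

def m_T_min_bjets(bjets, met, met_phi):
    """Minimum transverse mass between ETmiss and the three leading b-jets.

    `bjets` is a list of (pt, phi) pairs ordered by decreasing pt."""
    return min(
        np.sqrt(2.0 * pt * met * (1.0 - np.cos(delta_phi(phi, met_phi))))
        for pt, phi in bjets[:3]
    )

def M_J_sigma(large_R_jet_masses):
    """Total jet mass: sum of the masses of up to four leading large-R jets."""
    return sum(large_R_jet_masses[:4])
```

For example, `m_T` is maximal when the lepton and the missing momentum are back-to-back in azimuth, which is what produces the Jacobian edge near the $W$ mass for leptonic $W$ decays.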
The requirement of a selected lepton, together with the additional requirements on jets, $E_{\mathrm{T}}^{\mathrm{miss}}$ and event variables described above, makes the multijet background negligible for the ≥ 1-lepton signal regions. For the 0-lepton signal regions, the distribution of the minimum azimuthal angle $\Delta\phi^{4j}_{\mathrm{min}}$ between $\vec{p}_{\mathrm{T}}^{\,\mathrm{miss}}$ and the four leading jets is used to suppress the multijet background.

Kinematic reweighting of MC samples
In signal-depleted regions requiring the presence of exactly one lepton with loose event selections, discrepancies are observed in the shapes of $p_{\mathrm{T}}$-related observables, such as $m_{\mathrm{eff}}$, $M_J^{\Sigma}$ and $E_{\mathrm{T}}^{\mathrm{miss}}$, between data and the MC background expectations. Similar discrepancies are also observed in other similar analyses, e.g. Refs. [57,83]. No similar discrepancies are visible in the 0-lepton regions, or for events with ≥ 2 leptons. To reduce these discrepancies, a kinematic reweighting procedure is applied to simulated background events containing at least one signal lepton, prior to their use in the main analysis fits. Dedicated reweighting regions (RRs) with loose event selection criteria are defined for the $t\bar{t}$ process; for the single-top, $t\bar{t}+W/Z/h$ and $t\bar{t}t\bar{t}$ processes; for the $W$+jets process; and for the $Z$+jets and electroweak diboson processes, as set out in Table 2. Requirements on the number of reconstructed b-jets ($N_{b\text{-jets}}$) are applied to ensure that the RRs are orthogonal to all analysis signal regions, which include a $N_{b\text{-jets}} \ge 3$ requirement. Simulated events for these processes are first normalised to the total number of observed events in the respective RR. The ratio of data events to normalised MC events is then computed as a function of $m_{\mathrm{eff}}$, for exclusive bins of $N_{\mathrm{jet}} = 4, 5, 6$ and $\ge 7$ (for the $t\bar{t}$ and $W$+jets RRs) or for $N_{\mathrm{jet}} \ge 4$ (for the single-top, $t\bar{t}+W/Z/h$ and $t\bar{t}t\bar{t}$ RR, and the $Z$+jets and electroweak diboson RR). This ratio is found to be well fitted with a decreasing exponential function of $m_{\mathrm{eff}}$ for each $N_{\mathrm{jet}}$ bin. The resulting fitted functions are then used to reweight MC simulated events with exactly one lepton. The reweighting factors typically take values between ∼1.17 and ∼0.19 for $m_{\mathrm{eff}}$ ranging from threshold to 4200 GeV for the $t\bar{t}$ and $W$+jets processes, and between ∼1.7 and ∼0.43 for the same range of $m_{\mathrm{eff}}$ for the single-top, $t\bar{t}+W/Z/h$ and $t\bar{t}t\bar{t}$ processes, and the $Z$+jets and electroweak diboson processes.
The reweighting procedure reduces the discrepancies between data and MC background expectations in the validation regions used in the analysis (Section 8.1).
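The exponential ratio fit at the core of this procedure can be illustrated with a short numpy sketch (log-linear least squares standing in for the paper's fit machinery; all yields below are invented for illustration):

```python
import numpy as np

def fit_exponential_weight(meff_centres, data_counts, mc_counts):
    """Fit w(m_eff) = exp(a + b * m_eff) to the data/MC ratio.

    A log-linear least-squares fit, as a stand-in for the paper's
    exponential fit performed in each jet-multiplicity bin.  Returns a
    callable that evaluates the reweighting factor at a given m_eff.
    """
    ratio = np.asarray(data_counts, float) / np.asarray(mc_counts, float)
    b, a = np.polyfit(np.asarray(meff_centres, float), np.log(ratio), 1)
    return lambda meff: np.exp(a + b * np.asarray(meff, float))

# Toy inputs (m_eff bin centres in GeV): MC increasingly overshoots data
# at high m_eff, so the fitted weight falls with m_eff.
centres = np.array([800.0, 1600.0, 2400.0, 3200.0])
data = np.array([1000.0, 420.0, 160.0, 60.0])
mc = np.array([1000.0, 500.0, 250.0, 125.0])
weight = fit_exponential_weight(centres, data, mc)
```

Each simulated one-lepton event would then be multiplied by `weight(m_eff_of_event)`, using the fit for its $N_{\mathrm{jet}}$ bin.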

Background estimation
The dominant background process in most signal regions is the production of a $t\bar{t}$ pair in association with heavy- and light-flavour jets. In both the CC and NN SRs the dominant contribution to this background arises from events in which exactly one of the top quarks decays via a $W$ boson to a lepton and a neutrino. In selected background events containing no leptons, the lepton is outside the acceptance of the analysis or is a hadronically decaying τ-lepton. A normalisation factor for the $t\bar{t}$ background is extracted for each CC or NN SR from a CR which is defined for a similar but orthogonal region of kinematic phase space. The normalisation factors are derived by dividing the data event yields in the CRs by the equivalent MC predictions, and then applied as factors multiplying the event yields in the SRs predicted by MC simulation. This procedure is equivalent to propagating the CR event yields to the SRs by multiplying them by transfer factors derived from MC simulation. Systematic uncertainties associated with the MC simulations used in this procedure are taken into account in the final fit (Section 7).
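The normalisation-factor arithmetic described above amounts to the following (a trivial sketch with invented yields; in the actual analysis this happens inside a profile likelihood fit with systematic uncertainties as nuisance parameters):

```python
def ttbar_sr_prediction(n_data_cr, n_mc_cr, n_mc_sr):
    """Normalise the MC t-tbar prediction in a signal region to data in its CR.

    mu = N_data(CR) / N_MC(CR) is the normalisation factor; multiplying the
    MC SR yield by mu is algebraically equivalent to propagating the CR data
    yield with a transfer factor TF = N_MC(SR) / N_MC(CR).
    """
    mu = n_data_cr / n_mc_cr
    return mu * n_mc_sr

# Toy example: 60 data events in the CR against an MC expectation of 50,
# with an MC SR expectation of 4.0 events.
pred = ttbar_sr_prediction(60.0, 50.0, 4.0)
```

The equivalence of the two views (normalisation factor vs. transfer factor) is just `(N_data_CR / N_MC_CR) * N_MC_SR == N_data_CR * (N_MC_SR / N_MC_CR)`.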
In the CC analysis the CRs are defined with a lepton multiplicity requirement $N_{\mathrm{lepton}} = 1$, for both 0-lepton and 1-lepton SRs, and a requirement on $m_{\mathrm{T}}$ inverted with respect to that used in the 1-lepton SRs [15]. In the NN analysis they are defined with orthogonal selections on the neural network output. In each CR, $t\bar{t}$ production is the dominant process and the contribution from rare background processes, such as $t\bar{t}+W/Z/h$ or $t\bar{t}t\bar{t}$, is low. At least 20 data events are also required in each $t\bar{t}$ CR to minimise the data statistical uncertainties of the normalisation factors. The CRs are required to possess a signal-to-background ratio which is expected to be less than 5% for Gtt, Gbb and Gtb signal models near the expected 95% CL exclusion contours of the analysis. The CRs are also found to possess less than 10% potential contamination from signal models beyond the exclusion contours of previous analyses.
The normalised $t\bar{t}$ background estimates extracted from the CRs are cross-checked with VRs that share similar background compositions with the SRs but use an orthogonal event selection. In the CC analysis the VRs incorporate an inverted requirement on one of the SR observables: $M_J^{\Sigma}$, $m_{\mathrm{eff}}$ or $E_{\mathrm{T}}^{\mathrm{miss}}$. In the NN analysis the VRs apply an orthogonal requirement to the neural network output together with inverted requirements on $M_J^{\Sigma}$ and $m_{\mathrm{eff}}$. The signal-to-background ratio in the VRs is expected to be lower than 30% for benchmark Gtt, Gbb and Gtb signal points near the expected 95% CL exclusion contours of the analysis. The purity of the CRs and VRs in $t\bar{t}$ events is expected to be higher than 50% and 33%, respectively.
The leading non-$t\bar{t}$ backgrounds in this analysis consist of single-top, $W$+jets, $Z$+jets, $t\bar{t}+W/Z/h$, $t\bar{t}t\bar{t}$ and electroweak diboson events, which are estimated using simulated samples (Table 1) normalised to theoretical cross-sections. There is one exception to this procedure: the $Z$+jets process makes a significant contribution to the total background in the NN Gbb SRs and is therefore normalised, for these SRs only, with dedicated 2-lepton CRs (Section 6.3). Due to their relatively low selection efficiency, these CRs contain fewer data events than the $t\bar{t}$ CRs described above.
The remaining multijet background in the 0-lepton CC regions, and in the NN regions, is estimated following the strategy of Ref. [84]. This method estimates the multijet background from a CR with the same requirements as the SR, but with a selection requiring $\Delta\phi^{4j}_{\mathrm{min}} < 0.1$ to enhance the yield of events in which the missing transverse momentum is correlated with the $p_{\mathrm{T}}$ of a leading jet. The yield is extrapolated from the multijet CRs to their corresponding SRs with exponential functions. The decay parameters of these functions are fixed by fits to the $\Delta\phi^{4j}_{\mathrm{min}}$ distribution of events passing the $E_{\mathrm{T}}^{\mathrm{miss}}$ trigger and an $E_{\mathrm{T}}^{\mathrm{miss}} > 200$ GeV requirement. The multijet background prediction is validated by comparing the data with the total prediction in the range $0.1 < \Delta\phi^{4j}_{\mathrm{min}} < 0.2$. The contribution of multijet background events to the SRs is found to be 5%.
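The exponential extrapolation can be sketched as follows (an illustrative toy with invented numbers, assuming a pure exponential shape for the $\Delta\phi^{4j}_{\mathrm{min}}$ distribution; the actual analysis fixes the decay parameter with a dedicated fit):

```python
import numpy as np

def multijet_extrapolation(n_cr, decay, cr_range=(0.0, 0.1), sr_cut=0.4):
    """Extrapolate a multijet yield from the low-dphi CR to the SR.

    Assumes the Delta-phi_min^4j distribution falls as exp(-decay * x).
    The CR selects cr_range[0] < dphi < cr_range[1]; the SR requires
    dphi > sr_cut (up to the kinematic maximum of pi).  The transfer is
    the ratio of the exponential's integrals over the two regions.
    """
    lo, hi = cr_range
    integral = lambda a, b: (np.exp(-decay * a) - np.exp(-decay * b)) / decay
    transfer = integral(sr_cut, np.pi) / integral(lo, hi)
    return n_cr * transfer

# Toy example: 200 CR events and a decay constant of 10 per radian.
n_sr = multijet_extrapolation(n_cr=200.0, decay=10.0)
```

With a steeply falling exponential, even a large CR yield translates into only a few SR events, consistent with multijet being a small background after the $\Delta\phi^{4j}_{\mathrm{min}}$ requirement.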

Preselection
Events used in the analysis are required to meet a set of loose preselection criteria. All events (except those used in the $Z$+jets CR described below) are required to possess at least four jets, of which at least three must be $b$-tagged, and $E_{\mathrm{T}}^{\mathrm{miss}} > 200$ GeV, which ensures that the efficiency of the $E_{\mathrm{T}}^{\mathrm{miss}}$ triggers used in this analysis is close to 100% for selected events. Events selected in the CC 0-lepton regions are additionally required to contain no baseline leptons and possess $\Delta\phi^{4j}_{\mathrm{min}} > 0.4$, while those selected in the CC 1-lepton regions are required to contain at least one signal lepton. No additional requirements on lepton multiplicity are applied at this stage in the NN analysis.

Figures 3 and 4 show the multiplicities of selected jets and $b$-tagged jets, and the distributions of $E_{\mathrm{T}}^{\mathrm{miss}}$, $m_{\mathrm{eff}}$, $m_{\mathrm{T,min}}^{b\text{-jets}}$ and $M_J^{\Sigma}$ for events meeting the 0-lepton and 1-lepton preselection criteria, respectively. The reweighting described in Section 5.2 is applied in the 1-lepton preselection to events with at least one lepton. The uncertainty bands depict the statistical and experimental systematic uncertainties, as described in Section 7, but not the theoretical uncertainties in the background modelling. Distributions for example SUSY signal models are overlaid for comparison.

Figure 3: Distributions for events meeting the 0-lepton preselection criteria. The statistical and experimental systematic uncertainties (as defined in Section 7) are included in the uncertainty bands. The final bin in each case includes overflow events. The lower panel of each figure shows the ratio of data to the background prediction. All backgrounds (including $t\bar{t}$) are normalised using the theoretical calculations described in Section 3. The background category '$t\bar{t} + X$' includes $t\bar{t}W/Z$, $t\bar{t}h$ and $t\bar{t}t\bar{t}$ events. Distributions for example SUSY signal models, applying a normalisation scaling of 50, are overlaid for comparison.

Figure 4: Distributions for events meeting the 1-lepton preselection criteria, after applying the kinematic reweighting to the $m_{\mathrm{eff}}$ distribution as described in the text. The statistical and experimental systematic uncertainties (as defined in Section 7) are included in the uncertainty bands. The final bin in each case includes overflow events. The lower panel of each figure shows the ratio of data to the background prediction before (red empty circles) and after (black filled circles with error bars) the kinematic reweighting. All backgrounds (including $t\bar{t}$) are normalised using the theoretical calculations described in Section 3. The background category '$t\bar{t} + X$' includes $t\bar{t}W/Z$, $t\bar{t}h$ and $t\bar{t}t\bar{t}$ events. Distributions for example SUSY signal models, applying a normalisation scaling of 50, are overlaid for comparison.
Table 3: Event selection requirements for the CC Gtt 0-lepton SRs together with the associated $t\bar{t}$ CRs and VRs, classified according to the $\tilde{g}$–$\tilde{\chi}^0_1$ mass splitting ($\Delta m$) targeted. The thresholds in bold for each control and validation region ensure orthogonality with the corresponding signal region. $N_{\mathrm{lepton}} = 0$ requires zero baseline leptons, while $N_{\mathrm{lepton}} = 1$ requires one signal lepton.

Cut-and-count analysis
The cut-and-count analysis employs a set of overlapping, and therefore not statistically independent, single-bin SRs. The event selection criteria for the CC SRs, CRs and VRs are listed in Tables 3–6. The CC SR event selection criteria are optimised to maximise the expected significance of Gtt, Gbb and Gtb models close to the 95% CL exclusion contours in the $m(\tilde{g})$–$m(\tilde{\chi}^0_1)$ mass plane set by the previous ATLAS search in this channel using a smaller dataset [15]. Separate SRs are defined for each of these three classes of models. The SRs are further categorised according to whether they target models with large $\tilde{g}$–$\tilde{\chi}^0_1$ mass splitting ($\Delta m = m(\tilde{g}) - m(\tilde{\chi}^0_1) \gtrsim 1.5$ TeV, Regions B), moderate mass splitting (0.3 TeV $\lesssim \Delta m \lesssim$ 1.5 TeV, Regions M1 and M2) or small mass splitting ($\Delta m \lesssim 0.3$ TeV, Regions C). These regions differ mainly in the selections applied to the $m_{\mathrm{eff}}$, $E_{\mathrm{T}}^{\mathrm{miss}}$, $m_{\mathrm{T}}$ and $M_J^{\Sigma}$ variables. For each SR, a CR is defined to constrain the $t\bar{t}$ background (Section 5.3). The CRs for the 0-lepton SRs require the presence of exactly one signal lepton. The 1-lepton SRs and the associated CRs require at least one signal lepton, with the latter also implementing an inverted requirement on $m_{\mathrm{T}}$. The CRs also place looser requirements on jet multiplicity and the other discriminating variables.
The VRs for the 0-lepton SRs use inverted requirements on $M_J^{\Sigma}$, $m_{\mathrm{eff}}$ or $E_{\mathrm{T}}^{\mathrm{miss}}$ to remove overlap with the respective SRs. For the 1-lepton SRs, two VRs are defined to validate the background prediction in the high-$m_{\mathrm{T,min}}^{b\text{-jets}}$ and high-$m_{\mathrm{T}}$ regions by increasing the $N_{\mathrm{jet}}$ threshold and/or inverting the $M_J^{\Sigma}$ or $m_{\mathrm{T}}$ requirements to remove overlap with both the corresponding SR and the CR.

Table 6: Event selection requirements for the CC Gtb 0-lepton SRs together with the associated $t\bar{t}$ CRs and VRs, classified according to the $\tilde{g}$–$\tilde{\chi}^0_1$ mass splitting ($\Delta m$) targeted. The thresholds in bold for each control and validation region ensure orthogonality with the corresponding signal region. $N_{\mathrm{lepton}} = 0$ requires zero baseline leptons, while $N_{\mathrm{lepton}} = 1$ requires one signal lepton.

Neural network analysis
The NN analysis uses low-level kinematic variables [85] as inputs to a dense neural network with three hidden layers, trained to discriminate Gtt or Gbb model events from SM background events. Each SR is defined by event selection criteria applied to the outputs of the neural network, optimised to maximise the statistical significance of the SUSY model considered. The neural network input variables are:
• The four-momenta ($p_{\mathrm{T}}$, $\eta$, $\phi$, $E$) of the 10 leading jets, in decreasing order of $p_{\mathrm{T}}$, and a set of binary variables indicating which jets are $b$-tagged;
• The four-momenta of the four leading large-$R$ jets, in decreasing order of $p_{\mathrm{T}}$;
• The four-momenta of the four leading leptons ($e$ or $\mu$), in decreasing order of $p_{\mathrm{T}}$;
• The two components of the vector $\vec{p}_{\mathrm{T}}^{\,\mathrm{miss}}$.
These four-momenta describe the expected final state of a typical Gtt signal event. If a given event contains fewer jets or leptons than specified above, the remaining inputs are set to zero. This procedure enables the analysis to optimally select a mixture of events with different jet and lepton multiplicities. The neural network generates three output scores measuring the probability of a given event being a signal event ($P(\mathrm{Gtt})$ or $P(\mathrm{Gbb})$, depending on the signal model targeted), a $t\bar{t}$ background event ($P(t\bar{t})$), or a $Z$+jets background event ($P(Z)$). The output scores for all processes are normalised such that they sum to unity.
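The fixed-length input layout and zero-padding scheme described above can be sketched as follows. The slot counts match the text (10 jets with a $b$-tag flag, 4 large-$R$ jets, 4 leptons, 2 missing-momentum components); the helper names and a softmax for the final score normalisation are illustrative assumptions, not the paper's code.

```python
import numpy as np

# Sketch of the NN input vector: fixed-length slots for 10 small-R jets
# (pT, eta, phi, E, b-tag flag), 4 large-R jets and 4 leptons
# (pT, eta, phi, E each), plus the two components of the missing
# transverse momentum vector. Unused slots are zero-padded, as in the text.
def build_input(jets, large_r_jets, leptons, met_xy):
    x = []
    for objs, n_slots, width in ((jets, 10, 5),
                                 (large_r_jets, 4, 4),
                                 (leptons, 4, 4)):
        for i in range(n_slots):
            x.extend(objs[i] if i < len(objs) else [0.0] * width)
    x.extend(met_xy)                  # 10*5 + 4*4 + 4*4 + 2 = 84 inputs
    return np.asarray(x)

def softmax(scores):
    """Normalise the three raw output scores so they sum to unity."""
    e = np.exp(scores - np.max(scores))
    return e / e.sum()
```

An event with one jet and no leptons, for example, yields an 84-component vector whose unused slots are zero, so events of different multiplicities share one network architecture.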
The neural network was trained using the Keras [86] library with the TensorFlow [87] backend. Training was performed once for each representative model. The neural network hyperparameters were optimised with a random search [88] followed by a line scan of the learning rate. The number of training epochs was not fixed but instead determined through an early-stopping strategy using the cross-entropy loss as the figure of merit. The training samples consisted of a total of $9.2\times10^5$ signal events and $5.6\times10^5$ background events. Following event preselection, the training samples for all signal and background processes corresponded to the same integrated luminosity as that of the data sample used in the analysis.
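The early-stopping strategy amounts to monitoring the validation cross-entropy loss and halting once it stops improving. A minimal sketch of that logic (the `patience` value is an illustrative assumption; the paper does not quote one):

```python
# Minimal early-stopping logic: stop once the validation loss has not
# improved for `patience` consecutive epochs, and keep the best epoch.
def early_stopping_epoch(val_losses, patience=5):
    best_loss, best_epoch, wait = float("inf"), -1, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                break
    return best_epoch, best_loss

# Validation losses per epoch (toy numbers): improvement stops at epoch 2.
losses = [1.00, 0.80, 0.70, 0.72, 0.71, 0.73, 0.74, 0.75, 0.76, 0.77]
```

In Keras this would typically be delegated to the `EarlyStopping` callback rather than written by hand.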
To ensure optimum discrimination power across the Gtt and Gbb model planes, a parameterised training method [89] was employed in which the neural network was also given the $(m(\tilde{g}), m(\tilde{\chi}^0_1))$ pair of the signal point under consideration, as well as a binary variable indicating whether discrimination of background from Gtt or from Gbb is required. Background events were assigned random parameter values. After training, the neural network outputs were evaluated by unconditionally setting the parameters to those of the model point under consideration and processing all MC and data events. To reduce the large number of potential SRs that could emerge from such a strategy (i.e. one SR per model point), a set-cover algorithm [90] was used to iteratively select the SR which excludes the most as-yet non-excluded model points, until all such points are exhausted. The result is a minimal set of SRs that is expected to exclude the same number of Gtt or Gbb models as a more extensive set with one SR for each model point. The resulting SRs are optimal for excluding models that are representative of regions of the Gtt or Gbb model parameter space. The resulting representative models are specified by the following $(m(\tilde{g}), m(\tilde{\chi}^0_1))$ mass values (in GeV): There is one SR for each of these eight models. The acceptance times efficiency of the SR selections is typically 1%–10% for the representative models that they target. For example, the acceptance times efficiency of the SR-Gtt-2300-1200 selection for the Gtt (2300, 1200) representative model is 6.4%.
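The iterative SR selection described above is the standard greedy approximation to set cover. A sketch, with toy SR names and model-point sets (the real inputs would be the sets of model points each candidate SR is expected to exclude):

```python
# Greedy set-cover sketch for choosing a minimal set of SRs: repeatedly
# pick the SR that excludes the most model points not yet covered, until
# every point that any SR can exclude has been covered.
def select_signal_regions(sr_exclusions):
    """sr_exclusions: dict mapping SR name -> set of excludable model points."""
    remaining = set().union(*sr_exclusions.values())
    chosen = []
    while remaining:
        best_sr = max(sr_exclusions,
                      key=lambda sr: len(sr_exclusions[sr] & remaining))
        gained = sr_exclusions[best_sr] & remaining
        if not gained:
            break
        chosen.append(best_sr)
        remaining -= gained
    return chosen

# Toy example: three candidate SRs covering five model points.
srs = {"SR-A": {1, 2, 3}, "SR-B": {3, 4}, "SR-C": {4, 5}}
```

Greedy set cover is not guaranteed to be globally minimal, but it achieves the best polynomial-time approximation ratio known for this problem, which is why it is a natural choice here.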
The eight NN SRs are defined in Tables 7 and 8, together with the associated CRs and VRs. For each SR, a CR is defined to constrain the $t\bar{t}$ background (Section 5.3). The CRs are defined by placing requirements on the neural network outputs orthogonal to those used in the SRs, and by placing additional requirements on $m_{\mathrm{eff}}$ and $M_J^{\Sigma}$ to select events with kinematics similar to those in the corresponding SR. For the NN Gbb SRs, the $Z$+jets background process (principally $Z(\to\nu\bar{\nu})$+jets) also contributes significantly to the background yield. Dedicated control regions for this process (labelled 'CRZ' in Table 9 and thereafter) are therefore defined for the Gbb SRs only. These CRs are used to normalise the $Z$+jets background estimates with the same procedure as is used for the $t\bar{t}$ background estimates (Section 5.3). These CRs (Table 9) complement those in Tables 7 and 8, and their relationship to the SRs and $t\bar{t}$ CRs is illustrated in Figure 5. Two VRs are defined for each NN Gbb SR to validate both the $t\bar{t}$ and $Z$+jets background estimates.

Table 7: Definitions of the NN Gtt SRs together with the associated $t\bar{t}$ CRs and VRs. In the first column, the two numbers separated by a hyphen specify the values of $m(\tilde{g})$ and $m(\tilde{\chi}^0_1)$ in GeV for the targeted representative Gtt model. The third and fourth columns specify the ranges of probability for the targeted Gtt signal model and the $t\bar{t}$ background, respectively, generated by the selection applied to the NN output. The fifth and sixth columns specify the values of additional requirements applied to $m_{\mathrm{eff}}$ and $M_J^{\Sigma}$ in the CRs and VRs to select events with kinematics similar to those in the corresponding SRs.

Table 8: Definitions of the NN Gbb SRs together with the associated $t\bar{t}$ CRs and VRs. In the first column, the two numbers separated by a hyphen specify the values of $m(\tilde{g})$ and $m(\tilde{\chi}^0_1)$ in GeV for the targeted representative Gbb model.
The third, fourth and fifth columns specify the ranges of probability for the targeted Gbb signal model and the $t\bar{t}$ or $Z$+jets background, respectively, generated by the selection applied to the NN output. The sixth, seventh and eighth columns specify the values of additional requirements applied to $\Delta\phi^{4j}_{\mathrm{min}}$, $m_{\mathrm{eff}}$ and $M_J^{\Sigma}$ in the CRs and VRs to select events with kinematics similar to those in the corresponding SRs.

Table 9: Definitions of the NN Gbb $Z$+jets CRs (CRZ). The first seven rows specify the event selection criteria. The final three rows specify the ranges of probability for the targeted Gbb signal model and the $t\bar{t}$ or $Z$+jets background, respectively, generated by the selection applied to the NN output. $N_{\mathrm{lepton}} = 2$ requires two signal leptons.

Systematic uncertainties
The magnitudes of the post-fit uncertainties in the background estimates for the various signal regions, obtained following the profile likelihood fit described in Section 8.1, are summarised in Figure 6. The uncertainties considered include experimental and theoretical systematic uncertainties, as well as statistical uncertainties in the data CR yields and in the MC background samples. The dominant experimental uncertainties are associated with the jet energy scale (JES), the jet energy resolution (JER) and the jet mass scale (JMS). The impact of the JES uncertainties on the expected background yields is between 0.6% and 38% (with the largest uncertainty observed in SR-Gbb-2000-1800), while JER uncertainties affect the background yields by 1%–49% in the various regions (with the largest uncertainty observed in SR-Gtt-1900-1400). This JER variation is the principal contribution to the large total uncertainty observed in SR-Gtt-1900-1400 in Figure 6. The impact of JMS uncertainties on the expected background yields is 0.3%–41%, depending on the region (with the largest uncertainty observed in SR-Gtt-2100-1). Uncertainties in the measured $b$-tagging efficiencies and mis-tagging rates are 0.3%–17% (with the largest uncertainty observed in SR-Gbb-B).
The experimental uncertainties due to differences between data and simulation in the lepton reconstruction, identification and isolation efficiencies [80, 95] are also taken into account, as are the uncertainties in the lepton energy measurements [96]. These uncertainties contribute at most 14% to the total uncertainty (in SR-Gtt-0L-B). All lepton and jet measurement uncertainties are propagated to the calculation of $E_{\mathrm{T}}^{\mathrm{miss}}$, and additional uncertainties in the scale and resolution of the soft term [82] are included. The overall impact of the $E_{\mathrm{T}}^{\mathrm{miss}}$ soft-term uncertainties is at most 13% (in SR-Gtt-0L-B).
Considering theoretical uncertainties, the hadronisation and parton-showering model uncertainties of the $t\bar{t}$ background are evaluated by comparing two samples generated with Powheg Box and showered with either Herwig 7.04 or Pythia 8.230 [15]. In addition, systematic uncertainties in the modelling of initial- and final-state radiation are explored with Powheg Box samples showered with two alternative settings [97] of Pythia 8.230 [98]. The uncertainty due to the choice of matrix-element event generator is estimated by comparing the expected yields obtained using $t\bar{t}$ samples generated with either MadGraph5_aMC@NLO or Powheg Box. The total theoretical uncertainty in the $t\bar{t}$ background estimate is taken as the sum in quadrature of these individual components, corresponding to an impact of 6%–42% (with the largest uncertainty observed in SR-Gbb-2000-1800). Moreover, an additional uncertainty of 30% is assigned to the fraction of $t\bar{t}$ events produced in association with additional heavy-flavour jets [15,99] (i.e. $t\bar{t}+{\ge}1b$ and $t\bar{t}+{\ge}1c$), which has an impact of at most 10%.
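The quadrature combination of independent uncertainty components mentioned above can be made concrete in a few lines; the component values below are illustrative, not the paper's numbers.

```python
import math

# The total ttbar modelling uncertainty is the quadrature sum of the
# independent components (parton shower, radiation, generator choice).
# Component values below are illustrative placeholders.
def quadrature_sum(relative_uncertainties):
    """Combine independent relative uncertainties in quadrature."""
    return math.sqrt(sum(u * u for u in relative_uncertainties))

total = quadrature_sum([0.15, 0.08, 0.20])  # shower, radiation, generator
```

Adding in quadrature rather than linearly is appropriate because the components are treated as independent, so their variances, not their magnitudes, add.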
Modelling uncertainties affecting the single-top processes arise especially from the interference between the $t\bar{t}$ and $Wt$ processes. This uncertainty is estimated using inclusive $WWbb$ events, generated with MadGraph5_aMC@NLO, which are compared with the sum of $t\bar{t}$ and $Wt$ events. Furthermore, as for the $t\bar{t}$ modelling uncertainties, variations of the Pythia 8.230 settings increasing or decreasing the amount of radiation are also used. An additional uncertainty is included in the cross-section of single-top processes [48], and is at most 5%. Overall, the modelling uncertainties affecting single-top processes lead to changes of 4%–19% in the total yields in the various regions (with the largest uncertainty observed in SR-Gbb-2800-1400).
Uncertainties related to the factorisation and renormalisation scales and to the matching procedure between the matrix element and the parton shower in the $W/Z$+jets backgrounds are also taken into account [15]. The resulting uncertainties in the total yield range up to 53% in the various regions (with the largest uncertainty in SR-Gbb-2100-1600). A constant 30% uncertainty in the heavy-flavour content of the $W/Z$+jets background is assumed, motivated by Ref. [100]; it contributes at most 8% (with the largest uncertainty observed in SR-Gtt-0L-B). Furthermore, a 50% uncertainty is assigned to the $t\bar{t}W/Z/H$, $t\bar{t}t\bar{t}$ and diboson backgrounds, and is assumed to be uncorrelated across all regions. It is found to have no significant impact on the sensitivity of this analysis, and contributes at most 15% (with the largest uncertainty observed in SR-Gtt-2100-1) to the total background uncertainty. The uncertainties related to the parton distribution functions affect the background yields by less than 2% and are therefore neglected. Uncertainties due to the limited number of events in the MC background samples are included and can reach 30% in regions targeting moderate or large mass splittings.
Systematic uncertainties are also assigned to the kinematic reweighting procedure, by propagating the statistical uncertainties in the parameters of the exponential fits (Section 5.2). In addition, the changes in estimated background yield obtained by omitting the reweighting procedure are added in quadrature to conservatively assess the impact of the procedure on the final results, which was observed to contribute to total yield variations of 0%-49% across all regions (with the largest observed in SR-Gtt-1L-B). These uncertainties are applied to all simulated background events containing at least one signal lepton. The uncertainties are treated as nuisance parameters in the fits, in the same way as the other uncertainties.
The uncertainties in the cross-sections of signal processes are determined from different cross-section predictions extracted using alternative event generators and parton shower variations, as described in Section 3. These are at most 30% in all SRs.

Results
The SM background yields are estimated separately for each SR with a profile likelihood fit [101] implemented in the HistFitter framework [102], referred to as a background-only fit. The fit (one for each SR) uses the observed event yield in the associated CR(s) to adjust the $t\bar{t}$ normalisation (and the $Z$+jets normalisation for the NN Gbb regions), assuming that no signal contributes to this yield, and applies that normalisation factor to the number of background events predicted by simulation for the equivalent process in the SR. The normalisation factors are allowed to vary freely, without application of constraints derived from uncertainties in theoretical cross-sections. The mean yields predicted by simulation in the SR and CR(s) are used for all background processes. The numbers of events observed in each region are described by Poisson probability density functions. The systematic uncertainties in the expected values are included in the fit as nuisance parameters. They are constrained by Gaussian distributions with widths corresponding to the sizes of the uncertainties and are treated as correlated, when appropriate, between the various regions. The product of the various probability density functions forms the likelihood, which the fit maximises by adjusting the $t\bar{t}$ (and, in the NN Gbb regions, the $Z$+jets) normalisation and the nuisance parameters. The values of the normalisation factors range from 0.8 to 1.9 depending on the CR (Figure 7). The normalisation factors are broadly consistent with unity within uncertainties, with the largest pull across the 26 CRs being 2.3$\sigma$ (for the $t\bar{t}$ normalisation factor in CC SR Gtt-0L-M2).
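A stripped-down version of this normalisation transfer, with one CR, one free $t\bar{t}$ normalisation factor and no nuisance parameters, can be written in closed form: maximising the Poisson likelihood in the CR alone gives the normalisation as the observed CR yield minus the other backgrounds, divided by the simulated $t\bar{t}$ yield. All yields below are illustrative toy numbers, not the paper's.

```python
# Toy background-only fit with a single CR and one free ttbar
# normalisation factor mu (no nuisance parameters). Maximising the CR
# Poisson likelihood gives mu in closed form; mu is then propagated to
# the ttbar prediction in the SR. Yields are illustrative placeholders.
def fit_ttbar_norm(n_obs_cr, ttbar_mc_cr, other_mc_cr):
    """Closed-form maximum-likelihood ttbar normalisation from one CR."""
    return (n_obs_cr - other_mc_cr) / ttbar_mc_cr

def sr_prediction(mu, ttbar_mc_sr, other_mc_sr):
    """Post-fit SR background prediction using the fitted normalisation."""
    return mu * ttbar_mc_sr + other_mc_sr

mu = fit_ttbar_norm(n_obs_cr=57.0, ttbar_mc_cr=40.0, other_mc_cr=9.0)
pred = sr_prediction(mu, ttbar_mc_sr=3.0, other_mc_sr=0.8)
```

The real fit differs in that the nuisance parameters (with their Gaussian constraints) and all regions are handled simultaneously, so no closed form exists and the likelihood is maximised numerically.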
The results of the background-only fits are extrapolated to the VRs, following the definitions of Section 6. A comparison of the observed and expected yields in the VRs after the fit is shown in Figure 8 for the CC and NN analyses. The largest yield differences in the CC and NN VRs are observed for VR2-Gtt-1L-C (1.3$\sigma$, across 18 VRs) and VR2-Gbb-2000-1800 (1.8$\sigma$, across 12 VRs), respectively. The MC reweighting gives better agreement between the observed and expected yields in the VRs; in the CC analysis the change in the predicted yields due to the MC reweighting, which is applied to events containing exactly one signal lepton, is on average 5% and 10% for the 0-lepton and 1-lepton channels, respectively. The reweighting affects the CC 0-lepton channels, which require exactly zero signal leptons, via the $t\bar{t}$ normalisation factors obtained from CRs requiring exactly one signal lepton.
The observed and expected event yields in the SRs for the CC and NN analyses are shown in Figure 9 and Tables 10–12. The significances of the deviations of the observed data from the background expectations, evaluated using a model-independent fit (described in Section 8.2) with pseudo-experiments, are presented in the lower panels of Figure 9 and in Table 13. No statistically significant deviation from the background expectation is found for any of the presented SRs or analysis methodologies. The largest deviation across the 22 SRs is observed in SR-Gtb-B, with a significance of 2.3$\sigma$. In some cases the SR event selections are not orthogonal, and hence the significances can be correlated between regions.

Bins without data points correspond to zero observed events. The uncertainty bands include both the experimental and theoretical systematic uncertainties. The background category '$t\bar{t}$ + X' includes $t\bar{t}W/Z$, $t\bar{t}H$ and $t\bar{t}t\bar{t}$ events. The bottom panels show the Gaussian significance $(n_{\mathrm{obs}} - n_{\mathrm{pred}})/\sigma_{\mathrm{tot}}$ of the deviation of the observed yield ($n_{\mathrm{obs}}$) from the background expectation ($n_{\mathrm{pred}}$) in each SR, obtained using the combined statistical+systematic background uncertainty ($\sigma_{\mathrm{tot}}$).

Table 10: Post-fit results of the background-only fit extrapolated to the Gtt 0-lepton and Gtt 1-lepton SRs in the CC analysis, for both the total expected background yields and the main contributing background processes. The quoted uncertainties include both the experimental and theoretical systematic uncertainties. The data in the SRs are not included in the fit. The background category '$t\bar{t}$ + X' includes $t\bar{t}W/Z$, $t\bar{t}H$ and $t\bar{t}t\bar{t}$ events. The row 'Pre-fit background' provides the total background prediction when the $t\bar{t}$ normalisation is obtained from a theoretical calculation [46], taking into account the kinematic weights described in Section 5.2.
Table 11: Post-fit results of the background-only fit extrapolated to the Gbb and Gtb SRs in the CC analysis, for both the total expected background yields and the main contributing background processes. The quoted uncertainties include both the experimental and theoretical systematic uncertainties. The data in the SRs are not included in the fit. The background category '$t\bar{t}$ + X' includes $t\bar{t}W/Z$, $t\bar{t}H$ and $t\bar{t}t\bar{t}$ events. The row 'Pre-fit background' provides the total background prediction when the $t\bar{t}$ normalisation is obtained from a theoretical calculation [46], taking into account the kinematic weights described in Section 5.2.

Table 12: Post-fit results of the background-only fit extrapolated to the SRs in the NN analysis, for both the total expected background yields and the main contributing background processes. The quoted uncertainties include both the experimental and theoretical systematic uncertainties. The data in the SRs are not included in the fit. The background category '$t\bar{t}$ + X' includes $t\bar{t}W/Z$, $t\bar{t}H$ and $t\bar{t}t\bar{t}$ events. The row 'Pre-fit background' provides the total background prediction when the $t\bar{t}$ and $Z$+jets normalisations are obtained from theoretical calculations [46,53], taking into account the kinematic weights described in Section 5.2.

Interpretation
In the absence of any significant excess over the expected background from SM processes, the data are used to derive one-sided upper limits at 95% confidence level (CL). Two levels of interpretation are provided in this paper: model-independent exclusion limits and model-dependent exclusion limits set on the Gtt, Gbb and Gtb models. Exclusion limits are evaluated using pseudo-experiments.
To set upper limits on the number of 'beyond the Standard Model' (BSM) signal events in each SR, a model-independent fit is used [102]. This fit proceeds in the same way as the background-only fit, where yields in the CRs are used to constrain the predictions of backgrounds in each SR, while the SR yield is also used in the likelihood function with an additional parameter of interest describing potential signal contributions. The observed and expected upper limits at 95% CL on the number of events from BSM phenomena for each signal region ($S^{95}_{\mathrm{obs}}$ and $S^{95}_{\mathrm{exp}}$) are derived using the CL$_s$ prescription [103], neglecting any possible signal contamination in the CRs. These limits, when divided by the integrated luminosity of the data sample, may be interpreted as upper limits on the visible cross-section of BSM physics ($\sigma^{95}_{\mathrm{vis}}$), where the visible cross-section is defined as the product of production cross-section, acceptance and efficiency. All SRs from both the CC and NN analyses are considered, to aid in the reinterpretation of the results of this analysis. The results of the model-independent fit are presented in Table 13.
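The conversion from the event-level limit to the visible cross-section is a single division by the integrated luminosity. A sketch, with an illustrative $S^{95}$ value rather than one from Table 13:

```python
# The model-independent limit on the visible cross-section is the 95% CL
# upper limit on BSM events divided by the integrated luminosity:
# sigma_vis^95 = S95 / L. The S95 value below is illustrative, not a
# number from Table 13.
def visible_cross_section(s95_events, lumi_fb=139.0):
    """Return sigma_vis in fb, given S95 in events and L in fb^-1."""
    return s95_events / lumi_fb

sigma_vis = visible_cross_section(5.6)  # for S95 = 5.6 events
```

Because $\sigma_{\mathrm{vis}}$ already folds in acceptance and efficiency, it can be compared directly with the predicted $\sigma \times A \times \epsilon$ of any BSM model without re-running the analysis, which is why these limits are quoted for every SR.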
Model-dependent fits [102] in all the SRs are used to set limits in the mass or branching-ratio planes of the Gtt, Gbb and Gtb models. Such a fit proceeds in the same way as the model-independent fit, except that both the signal yield in the SR and the signal contamination in the CR(s) are taken into account. Correlations between signal and background systematic uncertainties are taken into account where appropriate. Systematic uncertainties in the assumed signal yields due to detector effects and the theoretical uncertainties in the signal acceptance are included in the fit. The NN analysis SRs provide the higher expected exclusion sensitivity for the Gtt and Gbb models and hence are used to obtain the exclusion limits for these models. For the Gtb models, for which the NN analysis SRs were not optimised, the CC analysis SRs are used. In all cases, at each model point the result obtained from the SR with the best expected CL$_s$ value is used. For the CC analysis applied to the Gtb models, all CC SRs are considered in this optimisation. The Gtt or Gbb SRs are found to be optimal for Gtb models in which the gluino branching ratio is dominated respectively by $\tilde{g}\to t\bar{t}\tilde{\chi}^0_1$ or $\tilde{g}\to b\bar{b}\tilde{\chi}^0_1$, while the Gtb SRs are found to be optimal otherwise. All the fits for the various model points and parameter spaces considered give fitted SUSY signal cross-sections consistent with zero within uncertainties.
The 95% CL observed and expected exclusion limits for the Gtt and Gbb models, obtained from the NN analysis, are shown in the $m(\tilde{g})$–$m(\tilde{\chi}^0_1)$ mass plane in Figure 10. The $\pm 1\sigma$ SUSY-theory lines around the observed limits are obtained by changing the SUSY production cross-section by one standard deviation ($\pm 1\sigma$), as described in Section 3. The yellow band around the expected limit shows the $\pm 1\sigma$ uncertainty, including all statistical and systematic uncertainties except the theoretical uncertainties in the SUSY cross-section. The expected limits on the gluino mass, assuming a massless neutralino LSP, obtained from the CC analysis are ~150 GeV and ~50 GeV lower for the Gtt and Gbb models, respectively. Compared to the previous results [15], the expected limits on the gluino mass from the current search (assuming a massless $\tilde{\chi}^0_1$) have improved by 280 GeV and 330 GeV for the Gtt and Gbb models, respectively. Gluinos with masses below 2.44 TeV (2.35 TeV) are excluded at 95% CL for massless neutralinos in the Gtt (Gbb) models. The most stringent exclusion limits on the neutralino mass are approximately 1.35 TeV and 1.65 TeV for the Gtt and Gbb models, obtained for a gluino mass of approximately 2.20 TeV and 2.10 TeV, respectively. The 95% CL observed and expected exclusion limits for the Gtb model, with $m(\tilde{\chi}^0_1)$ set to 1 GeV, are shown in Figure 11. The results, which are obtained from the CC analysis, are presented as limits on the mass of the gluino as a function of the branching ratios for $\tilde{g}\to t\bar{t}\tilde{\chi}^0_1$ and $\tilde{g}\to b\bar{b}\tilde{\chi}^0_1$. The sum of these branching ratios with those for the decays $\tilde{g}\to t\bar{b}\tilde{\chi}^-_1$ and $\tilde{g}\to \bar{t}b\tilde{\chi}^+_1$ (which are set to be equal in the simulation) is set to unity. The exclusion limits are strongest when $\mathcal{B}(\tilde{g}\to t\bar{t}\tilde{\chi}^0_1)$ or $\mathcal{B}(\tilde{g}\to b\bar{b}\tilde{\chi}^0_1)$ saturates the total branching ratio (top-left and bottom-right corners of the figures), and weakest when $\mathcal{B}(\tilde{g}\to t\bar{b}\tilde{\chi}^-_1/\bar{t}b\tilde{\chi}^+_1)$ saturates it. Similar results are obtained for $m(\tilde{\chi}^0_1) = 600$ GeV (Figure 12) and $m(\tilde{\chi}^0_1) = 1000$ GeV (Figure 13).

Conclusion
This paper has presented a search for supersymmetry involving the pair production of gluinos decaying via off-shell third-generation squarks into a lightest-neutralino LSP. The analysis exploits proton–proton collision data at a centre-of-mass energy $\sqrt{s} = 13$ TeV with an integrated luminosity of 139 fb$^{-1}$ collected with the ATLAS detector at the LHC from 2015 to 2018. The search uses events containing large missing transverse momentum, zero or one electrons or muons, and several energetic jets, at least three of which must be identified as containing $b$-hadrons. Two analysis methodologies are used: one applying selections independently to a series of kinematic observables in multiple signal regions sensitive to a broad range of SUSY models, and one targeting specific signal models using a deep neural network to further exploit correlations between observables constructed from the four-vectors of the reconstructed final-state particles. The latter methodology, which also combines events with differing numbers of final-state electrons or muons, provides enhanced discovery and exclusion power for the specific signals targeted. No significant excess is found with respect to the predicted background. Model-independent limits are set on the visible cross-section for new-physics processes. Exclusion limits are set on the gluino and neutralino LSP masses in two simplified models where the gluino decays exclusively as $\tilde{g}\to t\bar{t}\tilde{\chi}^0_1$ or $\tilde{g}\to b\bar{b}\tilde{\chi}^0_1$. For a massless neutralino, gluino masses of less than 2.44 TeV or 2.35 TeV, respectively, are excluded at 95% CL in these two scenarios. The results are also interpreted in a model with variable gluino branching ratios into $t\bar{t}\tilde{\chi}^0_1$, $b\bar{b}\tilde{\chi}^0_1$ and $t\bar{b}\tilde{\chi}^-_1/\bar{t}b\tilde{\chi}^+_1$. These limits represent a substantial improvement over previous ATLAS analyses exploiting smaller datasets collected at the LHC.
The crucial computing support from all WLCG partners is acknowledged gratefully, in particular from CERN, the ATLAS Tier-1 facilities at TRIUMF (Canada)

[85] P. Baldi, P. Sadowski and D.