Observation of four-top-quark production in the multilepton final state with the ATLAS detector

This paper presents the observation of four-top-quark (tttt) production in proton-proton collisions at the LHC. The analysis is performed using an integrated luminosity of 140 fb −1 collected at a centre-of-mass energy of 13 TeV with the ATLAS detector. Events containing two leptons with the same electric charge or at least three leptons (electrons or muons) are selected. Event kinematics are used to separate the signal from the background through a multivariate discriminant, and dedicated control regions are used to constrain the dominant backgrounds. The observed (expected) significance of the measured tttt signal with respect to the standard model (SM) background-only hypothesis is 6.1 (4.3) standard deviations.
The tttt production cross section is measured to be 22.5 +6.6 −5.5 fb, consistent with the SM prediction of 12.0 ± 2.4 fb within 1.8 standard deviations. The data are also used to set limits on the three-top-quark production cross section, an irreducible background not measured previously, and to constrain the top-Higgs Yukawa coupling as well as effective-field-theory operator coefficients that affect tttt production.


Introduction
The top quark is the heaviest elementary particle in the Standard Model (SM) and has a strong connection to the SM Higgs boson, as well as potentially to new particles in various theories beyond the SM (BSM). It is therefore relevant to study rare processes involving the top quark, such as the production of four top quarks (tttt), which is predicted by the SM but had not previously been observed. The tttt cross section could be enhanced in many BSM models, including gluino pair production in supersymmetric theories [1,2], scalar-gluon pair production [3,4], the associated production of a heavy scalar or pseudoscalar boson with a top-quark pair in two-Higgs-doublet models [5-7], or top-quark-compositeness models [8]. Additionally, the tttt cross section is sensitive to the strength of the top-quark Yukawa coupling and to its charge-conjugation and parity (CP) properties [9,10]. It is also sensitive to various four-fermion interactions [11-14] and to the Higgs oblique parameter [15] in the context of an effective field theory (EFT) framework.
Within the SM, the predicted cross section of tttt production in proton-proton (pp) collisions at a centre-of-mass energy √s = 13 TeV is σ(tttt) = 12.0 fb at next-to-leading order (NLO) in QCD including electroweak corrections, with a relative scale uncertainty of ±20% [16-18]. This prediction is used as a reference throughout this paper, as in the previous publications [19,20]. Including threshold resummation at next-to-leading-logarithmic accuracy increases the total production cross section by approximately 12% and significantly reduces the scale uncertainty, leading to σ(tttt) = 13.4 +1 fb [21]. Illustrative Feynman diagrams for SM tttt production are shown in Fig. 1. Other rare multi-top-quark production processes that have not yet been observed, such as three-top-quark production (ttt), 1 are expected to have cross sections of O(1 fb) in the SM, an order of magnitude smaller than that of tttt production. They can also be sensitive to BSM effects.
The tttt process results in various final states depending on the top-quark decays. These final states are classified based on the number of electrons or muons produced in the semileptonic top-quark decays, including those originating from subsequent leptonic τ decays. This paper focuses on two types of events: those with exactly two isolated leptons of the same charge (2LSS) and those with at least three isolated leptons (3L). In the SM, 7% and 5% of the produced four-top-quark events result in 2LSS and 3L final states, respectively. While these channels, referred to as 2LSS/3L, are relatively rare final states, they benefit from low levels of background. The tttt topology is also characterised by a high light-jet and b-jet multiplicity and a large rest mass of the event, amounting to about 700 GeV.
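The quoted 2LSS/3L fractions can be cross-checked with a short combinatorial calculation, assuming an effective probability of about 0.25 for each W boson to yield an electron or muon (direct decays plus leptonic τ decays). This is a back-of-the-envelope sketch, not the simulation-based classification used in the analysis:

```python
from math import comb

# Effective probability for one W boson to yield an electron or muon,
# including leptonic tau decays: BR(W -> e/mu nu) ~ 0.213 plus
# BR(W -> tau nu) * BR(tau -> e/mu) ~ 0.114 * 0.35.
p = 0.213 + 0.114 * 0.35

# A tttt event has four W bosons: two W+ and two W-.
# 2LSS: exactly two same-charge leptons, i.e. both same-charge W bosons
# decay leptonically while the other two decay hadronically (2 charge choices).
frac_2lss = 2 * p**2 * (1 - p)**2

# 3L: at least three of the four W bosons decay leptonically.
frac_3l = comb(4, 3) * p**3 * (1 - p) + p**4

print(f"2LSS: {frac_2lss:.1%}, 3L: {frac_3l:.1%}")  # close to the quoted 7% and 5%
```

Both fractions come out within a few per mille of the values quoted above.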
The ATLAS and CMS experiments have previously reported evidence for tttt production in 13 TeV pp collisions at the LHC. The latest ATLAS result combines two analyses using 139 fb −1 at √s = 13 TeV: one in the 2LSS/3L channel [19] and the other in the channel comprising events with one lepton or two leptons with opposite electric charge. This combination results in a measured cross section of 24 +7 −6 fb, corresponding to an observed (expected) signal significance of 4.7 (2.6) standard deviations over the background-only prediction [20]. The latest CMS result comes from a combination of several measurements using 138 fb −1 at √s = 13 TeV, in the channels with zero, one or two opposite-sign electrons or muons and in the 2LSS/3L channel, yielding an observed (expected) significance of 4.0 (3.2) standard deviations [22]. The tttt cross section measured by the CMS collaboration is 17 ± 5 fb.
This paper presents a re-analysis of the 140 fb −1 data set at √s = 13 TeV in the 2LSS/3L channel with the ATLAS detector and supersedes the result of Ref. [19]. Compared to the previous result, which showed evidence for tttt production [19], this new measurement brings several improvements: an optimised selection with lower requirements on the lepton and jet transverse momenta; improved b-jet identification; a new data-driven estimation of the tt W +jets background, one of the main backgrounds in this channel; a revised set of systematic uncertainties; an improved treatment of the ttt background; and a more powerful multivariate discriminant to separate the signal from the background. This paper also presents limits in several BSM scenarios, namely limits on the three-top-quark production cross section, on the top-Higgs Yukawa coupling, and on EFT operators corresponding to heavy-flavour four-fermion interactions and the Higgs oblique parameter.

The ATLAS detector
The ATLAS detector [23] at the LHC covers nearly the entire solid angle around the collision point. 2 It consists of an inner tracking detector surrounded by a thin superconducting solenoid, electromagnetic and hadron calorimeters, and a muon spectrometer incorporating three large superconducting air-core toroidal magnets.
The inner-detector system (ID) is immersed in a 2 T axial magnetic field and provides charged-particle tracking in the range |η| < 2.5. The high-granularity silicon pixel detector covers the vertex region and typically provides four measurements per track, the first hit normally being in the insertable B-layer (IBL) installed before Run 2 [24,25]. It is followed by the silicon microstrip tracker (SCT), which usually provides eight measurements per track. These silicon detectors are complemented by the transition radiation tracker (TRT), which enables radially extended track reconstruction up to |η| = 2.0. The TRT also provides electron identification information based on the fraction of hits (typically 30 in total) above a higher energy-deposit threshold corresponding to transition radiation.
The calorimeter system covers the pseudorapidity range |η| < 4.9. Within the region |η| < 3.2, electromagnetic calorimetry is provided by barrel and endcap high-granularity lead/liquid-argon (LAr) calorimeters, with an additional thin LAr presampler covering |η| < 1.8 to correct for energy loss in material upstream of the calorimeters. Hadron calorimetry is provided by the steel/scintillator-tile calorimeter, segmented into three barrel structures within |η| < 1.7, and two copper/LAr hadron endcap calorimeters. The solid-angle coverage is completed with forward copper/LAr and tungsten/LAr calorimeter modules optimised for electromagnetic and hadronic energy measurements, respectively.
The muon spectrometer (MS) comprises separate trigger and high-precision tracking chambers measuring the deflection of muons in a magnetic field generated by the superconducting air-core toroidal magnets. The field integral of the toroids ranges between 2.0 and 6.0 T m across most of the detector. Three layers of precision chambers, each consisting of layers of monitored drift tubes, cover the region |η| < 2.7, complemented by cathode-strip chambers in the forward region, where the background is highest. The muon trigger system covers the range |η| < 2.4 with resistive-plate chambers in the barrel and thin-gap chambers in the endcap regions.
Interesting events are selected by the first-level trigger system implemented in custom hardware, followed by selections made by algorithms implemented in software in the high-level trigger [26]. The first-level trigger accepts events from the 40 MHz bunch crossings at a rate below 100 kHz, which the high-level trigger further reduces in order to record events to disk at about 1 kHz.
An extensive software suite [27] is used in data simulation, in the reconstruction and analysis of real and simulated data, in detector operations, and in the trigger and data acquisition systems of the experiment.

Data and Monte Carlo simulation samples
This analysis was performed using the pp collision data collected by the ATLAS detector between 2015 and 2018 at √ s = 13 TeV. After the application of data-quality requirements [28], the data set corresponds to an integrated luminosity of 140 fb −1 .
Samples of simulated, or Monte Carlo (MC), events were produced to model the different signal and background processes. Additional samples were produced to estimate the modelling uncertainties for each process. The configurations used to produce the samples are summarised in Table 1. The effects of additional pp collisions in the same or a nearby bunch crossing (pile-up) were modelled by overlaying minimum-bias events, simulated using Pythia 8.186 [29] with the A3 set of tuned parameters (tune) [30], on events from hard-scatter processes. The MC events were weighted to reproduce the distribution of the average number of interactions per bunch crossing observed in the data. All samples generated with PowhegBox [31-34] and MadGraph5_aMC@NLO [35] were interfaced to Pythia8 to simulate the parton shower, fragmentation, and underlying event with the A14 tune [36] and the NNPDF2.3lo parton distribution function (PDF) set. Samples using Pythia8 and Herwig7 have heavy-flavour hadron decays modelled by EvtGen v1.2.0 or EvtGen v1.6.0 [37]. All samples include leading-logarithm photon emission, either modelled by the parton shower generator or by PHOTOS [38]. The masses of the top quark, m top , and of the SM Higgs boson, m H , are set to 172.5 GeV and 125 GeV, respectively. The signal and background samples were processed through the simulation of the ATLAS detector geometry and response using Geant4 [39], and then reconstructed using the same software as for collider data. Some of the alternative samples used to evaluate systematic uncertainties were instead processed through a fast detector simulation making use of parameterised showers in the calorimeters [40].
The nominal sample used to model the SM tttt signal was generated using the MadGraph5_aMC@NLO v2.6.2 [45] generator, which provides matrix elements at NLO in QCD. The renormalisation (μ R ) and factorisation (μ F ) scales are set to μ R = μ F = m T /4, where m T is the scalar sum of the transverse masses √(p T ² + m²) of the particles generated from the matrix element calculation, following Ref. [16]. An alternative tttt sample generated with the same MadGraph5_aMC@NLO set-up, but interfaced to Herwig 7.04 [46,47] with the H7UE tune [47], is used to evaluate uncertainties due to the choice of parton shower and hadronisation model. Another tttt sample was produced at NLO using the Sherpa 2.2.11 [48] generator to account for the modelling uncertainty from an alternative generator choice. The scales for this sample are set to μ R = μ F = Ĥ T /2, where Ĥ T is defined as the scalar sum of the transverse momenta of all final-state particles. To mitigate the effect of the large fraction of events with negative weights present in the nominal sample, which would be detrimental to the training of the multivariate discriminant used to separate signal from background, an additional sample with settings similar to the nominal ones was generated using leading-order (LO) matrix elements (MEs). The tttt MC samples with different BSM effects, such as a non-SM top Yukawa coupling and various EFT operators, including the four-fermion and the Higgs oblique operators, were generated at LO with MadGraph5. They are referred to as "tttt κ t " and "tttt EFT" in Table 1. Dedicated UFO models implemented in MadGraph5 with FeynRules [49,50] are used to simulate these BSM effects. The alternative Yukawa coupling scenarios were simulated using the Higgs characterisation model [51]. The effects of the four-fermion and Higgs oblique operators are simulated using the SMEFT@NLO model [52] and the dedicated UFO model from Ref. [15], respectively.

Table 1 Summary of the generation set-ups used for the signal and background simulation (columns: Process, Generator, ME order, PDF, Parton shower, Tune). The samples used to estimate the systematic uncertainties are indicated in parentheses. V refers to the production of an electroweak boson (W or Z/γ *). The matrix element (ME) order refers to the order in QCD of the perturbative calculation. The parton distribution function (PDF) used for the ME is shown in the table. Tune refers to the underlying-event tune of the parton shower (PS) generator. MEPS@NLO and FxFx refer to the methods used to match the ME to the PS in Sherpa [41-44] and in MadGraph5_aMC@NLO [35], respectively.

The tt W process was simulated using the Sherpa 2.2.10 [48,53] generator. The MEs were calculated for up to one additional parton at NLO and up to two partons at LO using Comix [42] and OpenLoops [41], and merged with the Sherpa parton shower [43] using the MEPS@NLO prescription [44] with a merging scale of 30 GeV. The choice of scales is μ R = μ F = m T /2. In addition to the nominal prediction at NLO in QCD, higher-order corrections related to electroweak (EW) contributions are also included. Event-by-event correction factors are applied that provide virtual NLO EW corrections at O(α² α s ²) and LO corrections at O(α³) [48,54,55]. An independent Sherpa 2.2.10 sample was produced at LO to account for the sub-leading EW corrections at O(α³ α s ) [16]. The NLO QCD and NLO EW contributions from Sherpa are combined following the method of Ref. [56]. The tt W samples are normalised using a cross section of 722 fb computed at NLO in QCD, including hard non-logarithmically enhanced radiation [56]. This normalisation is used to determine the contribution of the tt W +jets background before additional data-driven corrections are derived. The impact of the systematic uncertainty associated with the choice of generator is evaluated using an alternative tt W sample generated with up to one additional parton in the final state at NLO accuracy in QCD using MadGraph5_aMC@NLO. In this sample, the different jet multiplicities were merged using the FxFx NLO matrix-element and parton-shower merging prescription [35] with a merging scale of 30 GeV. A separate MadGraph5 tt W +jets sample including only the sub-leading EW corrections at O(α³ α s ) is also included.
The tt Z/γ * sample was generated at NLO, including off-shell Z contributions with m(ℓ + ℓ − ) > 1 GeV, and is normalised to the calculation of Ref. [57], extrapolated to take into account off-shell effects down to m(ℓ + ℓ − ) = 1 GeV. The tt H process was generated at NLO QCD using the five-flavour scheme with the h damp parameter 3 set to 0.75 × (2m top + m H ). It is normalised to the cross section computed at NLO QCD plus NLO EW [57,58]. The tt and single-top-quark events were modelled at NLO QCD using PowhegBox. For the tt sample, the h damp parameter is set to 1.5 m top [59]. The tt and single-top-quark simulated samples are normalised to the cross sections calculated at next-to-next-to-leading order (NNLO) in QCD including the resummation of next-to-next-to-leading-logarithmic (NNLL) soft-gluon terms [60-63]. The overlap between the tt and the single-top t W final states is removed using the diagram-removal scheme [64]. The ttt sample was generated at LO QCD using the five-flavour scheme and includes the ttt W and tttq processes. The ttt W and tttq samples are normalised to cross sections of 1.02 fb and 0.65 fb, respectively, computed at NLO using MadGraph5_aMC@NLO, with a central scale choice of Ĥ T /2 and the NNPDF23 PDF set. Rare and small background contributions including t W Z, t Zq, tt V V, tt H H, tt W H, V, V V, V V V and V H are normalised using their NLO theoretical cross sections.

Objects and event selection
Events were selected using single-lepton or dilepton triggers with variable electron and muon transverse momentum (p T ) thresholds, and various identification and isolation criteria depending on the lepton flavour and the data-taking period [65-68]. Events are required to have at least one vertex with at least two associated ID tracks with p T > 0.5 GeV. In each event, the primary vertex is defined as the reconstructed vertex with the highest sum of squared p T of associated tracks [69] that is consistent with the average beam-spot position.
Electron candidates are reconstructed from energy deposits in the EM calorimeter associated with ID tracks [70] and are required to have a calorimeter energy cluster with a pseudorapidity of |η cluster | < 2.47, excluding candidates in the transition region between the barrel and endcap calorimeters (1.37 < |η cluster | < 1.52). Electrons are required to meet the 'tight' likelihood-based identification criterion [70]. Muon candidates are reconstructed by combining tracks in the ID with tracks in the muon spectrometer [71]. They are required to have |η| < 2.5 and to meet the 'medium' cut-based identification criterion.
The electron and muon candidates are required to have p T > 15 GeV and meet the tight working point of the prompt-lepton isolation discriminant [72], trained to separate prompt and non-prompt leptons. The transverse impact parameter divided by its estimated uncertainty, |d 0 |/σ (d 0 ), is required to be lower than five (three) for electron (muon) candidates. The longitudinal impact parameter must satisfy |z 0 sin(θ )| < 0.5 mm for both lepton flavours. To suppress the background arising from incorrect charge assignment, an additional requirement is imposed on electrons based on the score of a boosted decision tree (BDT) discriminant that uses their calorimeter energy clusters and track properties [70]. Electrons and muons passing these criteria are referred to as 'tight' in the following. 'Loose' electrons are required to pass the same set of requirements as tight electrons except that they satisfy the 'medium' likelihood-based identification and loose isolation instead. Similarly, for 'loose' muons, the isolation requirement is relaxed.
Jets are reconstructed using a particle flow algorithm [73,74] by combining measurements from both the ID and the calorimeter. The anti-k t algorithm [75,76] with a radius parameter of R = 0.4 is used. Jets are required to have p T > 20 GeV and |η| < 2.5 and are calibrated as described in Ref. [74]. To reduce the effect from pile-up, the jet-vertex tagger [77], which identifies jets originating from the selected primary vertex, is applied to jets with p T < 60 GeV and |η| < 2.4.
Jets containing b-hadrons are identified (b-tagged) with the DL1r algorithm [78,79]. This algorithm uses deep-learning neural networks exploiting the distinct features of b-hadrons in terms of track impact parameters and displaced vertices reconstructed in the ID. A jet is considered b-tagged if it passes the operating point corresponding to a 77% average efficiency for b-quark jets in simulated tt events, with rejection factors against light-quark/gluon jets and c-quark jets of 192 and 6, respectively. Three additional operating points are defined with average b-tagging efficiencies of 85%, 70% and 60%. To fully exploit the b-tagging information of an event, each jet is assigned a pseudo-continuous b-tagging (PCBT) score that indicates which operating point the jet passes while failing the adjacent tighter one. A score of two, three, four or five is assigned to a jet whose tightest passed operating point is 85%, 77%, 70% or 60%, respectively. If a jet does not pass any operating point, a score of one is assigned.
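The PCBT scoring scheme can be illustrated with a small helper function; the discriminant thresholds below are placeholders, not the calibrated DL1r cut values:

```python
# Hypothetical DL1r discriminant cuts for the 85%, 77%, 70% and 60% efficiency
# operating points, ordered loosest to tightest (illustrative values only).
OP_CUTS = [(0.665, 2), (2.195, 3), (3.245, 4), (4.565, 5)]  # (cut, PCBT score)

def pcbt_score(dl1r: float) -> int:
    """Return the pseudo-continuous b-tagging score of a jet: 1 if it fails
    every operating point, otherwise the score associated with the tightest
    operating point it passes (2 for 85% ... 5 for 60%)."""
    score = 1
    for cut, op_score in OP_CUTS:
        if dl1r >= cut:
            score = op_score
    return score
```

A jet just above the 77% cut but below the 70% cut, for instance, receives a score of three.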
The missing transverse momentum in the event, whose magnitude is denoted in the following by E miss T , is defined as the negative vector sum of the p T of the reconstructed and calibrated objects in the event [80]. This sum also includes the momenta of the ID tracks that are matched to the primary vertex but are not associated with any other reconstructed objects.
A sequential overlap-removal procedure, described in Ref. [19] and based on loose leptons, is applied so that the same calorimeter energy deposit or the same reconstructed track is not used in two different objects.
The event selection closely follows that of Ref. [19]. Each event must have at least one reconstructed lepton that matches a lepton that fired the trigger. The highest-p T lepton is required to have p T > 28 GeV, so that it is always above the trigger threshold. The events are required to have one same-sign tight-lepton pair or at least three tight leptons, and at least one b-tagged jet. Events with two same-sign electrons are required to have an invariant mass satisfying m ee > 15 GeV and |m ee − 91 GeV| > 10 GeV, to reduce the charge mis-assignment background originating from low-mass resonances and Z boson decays. In events with at least three leptons, all opposite-sign same-flavour lepton pairs are required to satisfy |m ℓℓ − 91 GeV| > 10 GeV to reduce the contamination from Z boson decays.
Events in the signal region (SR) are required to contain at least six jets, at least two of which are b-tagged. The scalar sum of the transverse momenta of the selected leptons and jets in the event (H T ) is required to be above 500 GeV. This requirement is motivated by the high light-flavour-jet and b-jet multiplicity as well as by the large overall event activity characteristic of the tttt process.
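The core SR requirements above can be summarised as a small filter over a simplified event record. This is a sketch with a hypothetical event model; trigger matching, lepton identification and the Z-mass vetoes of the full analysis are omitted:

```python
def passes_signal_region(leptons, jets):
    """leptons/jets: lists of dicts with 'pt' in GeV, lepton 'charge' (+1/-1)
    and jet 'btag' (bool). Returns True if the event enters the SR sketch."""
    # Same-sign tight-lepton pair, or at least three tight leptons.
    charges = [lep["charge"] for lep in leptons]
    same_sign_pair = len(leptons) == 2 and charges[0] == charges[1]
    if not (same_sign_pair or len(leptons) >= 3):
        return False
    # Leading lepton above the trigger plateau.
    if max(lep["pt"] for lep in leptons) <= 28.0:
        return False
    # At least six jets, at least two of them b-tagged.
    if len(jets) < 6 or sum(j["btag"] for j in jets) < 2:
        return False
    # H_T: scalar sum of lepton and jet transverse momenta.
    h_t = sum(lep["pt"] for lep in leptons) + sum(j["pt"] for j in jets)
    return h_t > 500.0
```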
The main sources of physics background are the tt W +jets, tt Z+jets and tt H+jets processes. Smaller backgrounds, for which all selected leptons come from W or Z boson decays or from leptonic τ-lepton decays, include diboson and triboson production, single-top-quark production in association with a Z boson (t W Z, t Zq) and rare processes such as ttt, tt H H, tt W H and tt V V, where V = W, Z. These backgrounds, except for tt W +jets production, are evaluated using simulated events. The tt W +jets background is evaluated using information from simulation combined with a data-driven technique described in Sect. 5. This choice is motivated by the fact that the tt W +jets process is notoriously difficult to model accurately using simulation alone [16], and that existing measurements of its cross section [19,81,82] are consistently above the theoretical predictions. Background events originating from tt+jets and t W +jets production pass the selection if prompt leptons have a mis-assigned charge (QmisID) or if leptons are fake/non-prompt. The QmisID background is evaluated using a data-driven technique, while the fake/non-prompt background estimate is based on the template method, which uses inputs from both simulation and data, as described in Sect. 5 and already used in Ref. [19].
The estimated yield for each source of background is given in Sect. 8.

Estimation of the tt W background
The theoretical modelling of the tt W background at high jet multiplicity, corresponding to the phase space of this analysis, suffers from large uncertainties. To mitigate this effect, the normalisation of this background in jet-multiplicity bins is determined using a data-driven approach, while the other characteristics of the tt W events are modelled by simulated events. The uncertainties related to the modelling of the kinematic distributions taken from simulation are discussed in Sect. 7.3. The estimate of the normalisation in each jet bin is based on a functional form that describes the evolution of the number of tt W events N(j) as a function of the jet multiplicity j through the ratio R(j) = N(j + 1)/N(j). At high jet multiplicity, R(j) follows the so-called 'staircase scaling', with R(j) being a constant denoted a 0 . This behaviour implies a fixed probability of additional jet radiation. At lower jet multiplicities, R(j) follows 'Poisson scaling', R(j) = a 1 /(n + 1), where a 1 is a constant and n is the number of jets in addition to those of the hard process. This scaling was derived in the Abelian limit of QCD by considering gluon radiation off hard quark legs and was validated using experimental data [86]. The transition point between these scaling behaviours depends on the jet kinematic selection. Since tt W production in the phase-space region of this analysis is dominated by events that include at least four jets in the matrix element at tree level, the parameterisation of the tt W events starts at the fourth jet (n = j − 4). Independent normalisation factors (NF) for the expected numbers of tt W + and tt W − events are introduced to take into account the different production rates of these processes in pp collisions. The normalisation factor for tt W events with j > 4 is then given by the product of the per-jet ratios, NF ttW± (j) = NF ttW± (4jet) × R(4) × R(5) × · · · × R(j − 1). (1) Dedicated control regions (CR) labelled CR tt W + +jets, CR tt W − +jets, CR 1b(+) and CR 1b(-) are defined to determine the a 0 and a 1 scaling parameters and the overall numbers of tt W + and tt W − events with exactly 4 jets, NF ttW + (4jet) and NF ttW − (4jet), respectively.
The definition of these CRs is summarised in Table 2. The CR tt W + +jets and CR tt W − +jets regions are primarily used to adjust the tt W normalisation in a phase space with at least two b-tagged jets as in the SR, while the CR 1b(+) and CR 1b(-) regions, which include events with N j ≥ 4, are mainly used to determine the tt W jet multiplicity spectrum.
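The jet-multiplicity extrapolation can be sketched as follows. The combination of the Poisson and staircase regimes via a simple max(), and the parameter values used in the example, are illustrative assumptions rather than the fitted ATLAS parameterisation:

```python
def ttw_norm_factor(n_jets, nf_4jet, a0, a1):
    """Normalisation factor for tt W events with n_jets >= 4, built by
    multiplying the per-jet ratios R = N(j+1)/N(j) from the 4-jet bin up.
    Poisson scaling R = a1/(n+1) dominates at low multiplicity; once it
    drops below the constant a0, staircase scaling takes over."""
    nf = nf_4jet
    for j in range(4, n_jets):
        n = j - 4                      # jets beyond the tree-level hard process
        nf *= max(a1 / (n + 1), a0)    # illustrative transition between regimes
    return nf
```

With, say, a0 = 0.3 and a1 = 0.9, the 5-, 6- and 7-jet bins would be scaled by 0.9, 0.45 and 0.3 relative to the previous bin, reproducing the expected flattening of the ratio.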
The tt W CRs, along with the CRs defined in Sect. 5.2, are included in the likelihood fit to data, together with the SR (see Sect. 6). Figure 2 shows the jet-multiplicity distributions in each of the four tt W CRs. After the fit, the predictions provide a good description of the data.

Table 2 Definition of the signal and control regions. The control regions labelled CR tt W + +jets, CR tt W − +jets, CR 1b(+) and CR 1b(-) are defined to determine the background-modelling parameters of tt W + and tt W − events. The control regions labelled CR HF e, CR HF μ, CR Mat. Conv. and CR Low m γ * are defined to determine the normalisation of fake/non-prompt-lepton background events. N j (N b ) indicates the jet (b-tagged jet) multiplicity in the event. H T is defined as the scalar sum of the transverse momenta of the isolated leptons and jets. ℓ1, ℓ2 and ℓ3 refer to the highest-p T , second-highest-p T and third-highest-p T leptons, respectively. η(e) refers to the electron pseudorapidity. Total charge is the sum of the charges of all leptons. GNN denotes the Graph Neural Network discriminant trained to separate the tttt signal from the background, described in Sect. 6. Among the listed regions: CR Low m γ * selects same-sign ee or eμ events with 4 ≤ N j < 6 and N b ≥ 1 in which ℓ1 or ℓ2 originates from a virtual-photon (γ *) decay and neither comes from a photon conversion, with the event yield as the fitted variable; CR Mat. Conv. applies the same kinematic selection but requires ℓ1 or ℓ2 to originate from a photon conversion, also fitting the event yield; CR HF e selects eee or eeμ events.

As mentioned above, unlike the signal and the other background processes, tt W production has a pronounced charge asymmetry in pp collisions: tt W + events are produced approximately twice as abundantly as tt W − events. This property is used to validate the data-driven estimation of this background by examining the difference between the number of events in which all tight leptons are positively charged (positive events) and the number in which all tight leptons are negatively charged (negative events). The charge-symmetric tttt signal and most non-tt W backgrounds are expected to cancel in this difference, providing a validation of the tt W background. The distribution of the number of jets obtained from this procedure for events entering the four tt W CRs and the SR defined in Table 2 is shown in Fig. 3. The good description of the jet-multiplicity distribution in data confirms that it can be predicted by the two scaling parameters, as given in Eq. (1).
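The positive-minus-negative validation can be written as a small histogram difference. This is a sketch with a hypothetical event model in which each event is a tuple of tight-lepton charges and a jet count:

```python
from collections import Counter

def charge_diff_njets(events):
    """Jet-multiplicity distribution of events whose tight leptons are all
    positive minus that of events whose tight leptons are all negative.
    Charge-symmetric contributions (tttt, most non-ttW backgrounds) cancel
    in the difference, leaving the charge-asymmetric tt W component."""
    diff = Counter()
    for charges, n_jets in events:
        if all(c > 0 for c in charges):
            diff[n_jets] += 1
        elif all(c < 0 for c in charges):
            diff[n_jets] -= 1
        # events with mixed-charge tight leptons enter neither histogram
    return diff
```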

Fake/non-prompt lepton background
A template method is used to estimate the fake/non-prompt-lepton background. This method relies on simulated samples to model the kinematic distributions of the background processes arising from fake and non-prompt leptons, and on CRs to determine the normalisation of these components. These CRs are included in the fit, together with the tt W CRs and the SR, and the normalisation factors are determined simultaneously with the tttt signal. The uncertainties related to the modelling of the kinematic distributions predicted by the simulation are discussed in Sect. 7.4. The method is similar to that used in the previous analysis [19]. The following categories of fake/non-prompt leptons are considered:
• Events with an electron originating from a photon conversion in the detector material; this contribution is referred to as Mat. Conv.
• Events with an electron originating from the decay of a low-mass virtual photon; this contribution is referred to as Low m γ *.
• Events with a single reconstructed non-prompt electron (muon) from heavy-flavour decay; this contribution is referred to as HF e (HF μ).
Minor contributions to the fake/non-prompt background arising from events with a lepton originating from light-meson decay or with a jet mis-identified as a lepton are determined using simulated events. As summarised in Table 2, the CRs labelled CR HF e and CR HF μ are defined to determine the normalisation of the HF e and HF μ backgrounds, respectively. Similarly, the CRs labelled CR Mat. Conv. and CR Low m γ * are defined to determine the normalisation of the Mat. Conv. and Low m γ * backgrounds, respectively. Each region is designed to be dominated by a single background component. Figure 4 shows the distributions and event yields used in the template method: the third-highest lepton p T and the number of events. A good description of the data distributions by the fitted predictions is observed in all CRs.

Charge mis-assignment background
The QmisID background affects only the ee and eμ channels of the 2LSS region and is estimated using the same method as in the previous analysis [19]. The probability for an electron to have its charge incorrectly assigned is estimated using a data sample of Z → ee events, requiring the invariant mass of the electron pair to be within 10 GeV of the Z boson mass and placing no requirement on the charges of the two electrons. The charge mis-assignment rate, extracted from the fraction of events with same-charge electrons in this sample, is parameterised as a function of the electron transverse momentum and pseudorapidity; it varies from 0.004% to 4% depending on the electron p T and |η|. The expected number of events arising from the QmisID background is determined by applying the measured charge mis-assignment rate to data events satisfying the kinematic selection of the 2LSS channel, except that the two leptons are required to be of opposite charge.
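The reweighting of opposite-charge events can be sketched as follows (the rate values and the exact weight formula are illustrative assumptions; the analysis uses rates measured in bins of p T and |η|):

```python
def qmisid_weight(eps1, eps2):
    """Weight for an opposite-sign ee event: probability that exactly one
    of the two electrons had its charge mis-assigned, given per-electron
    mis-assignment rates eps1, eps2 (one common parameterisation; the
    rates used here are invented, the measured ones range from 0.004%
    to 4%)."""
    num = eps1 * (1 - eps2) + eps2 * (1 - eps1)
    den = (1 - eps1) * (1 - eps2) + eps1 * eps2
    return num / den

# Example: a 1% and a 0.2% rate give a per-event weight of about 1.2%.
w = qmisid_weight(0.01, 0.002)
```

Summing such weights over the opposite-charge data sample yields the expected same-charge QmisID yield.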

Signal extraction and cross section measurement
The background composition of the SR is largely dominated by the production of top-quark pairs in association with bosons. A multivariate discriminant built with a Graph Neural Network (GNN) [87], implemented using the graph_nets library from TensorFlow [88], is used to separate the tttt signal from the background. From each event, a fully connected graph is constructed with 'nodes' corresponding to reconstructed jets, electrons, muons, and the missing transverse momentum of the event. The features of each node include the four-momentum of the object (assumed to be massless), the jet PCBT score, the lepton charge, and an integer labelling the type of object represented by the node. The 'edges' between nodes carry three features with information about the angular separation between the objects they connect. Additionally, the jet multiplicity is treated as a 'global' feature. The nodes, edges, global feature and the GNN hyperparameters are optimised to maximise the integral under the receiver operating characteristic curve of the GNN event classifier.
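The graph construction described above can be sketched as follows (a plain-Python illustration; the feature layout, object-type codes and field names are assumptions for the example, not the exact ATLAS configuration, which uses the graph_nets library):

```python
import itertools
import math

def build_event_graph(objects):
    """Fully connected graph for one event: one node per reconstructed
    object, directed edges carrying angular separations, and the jet
    multiplicity as a global feature."""
    # Node features: kinematics, PCBT score (jets), charge (leptons),
    # and an integer object-type label (0: jet, 1: lepton, 2: MET-like).
    nodes = [[o["pt"], o["eta"], o["phi"], o.get("pcbt", -1),
              o.get("charge", 0), o["type"]] for o in objects]
    edges, senders, receivers = [], [], []
    for i, j in itertools.permutations(range(len(objects)), 2):
        deta = objects[i]["eta"] - objects[j]["eta"]
        # Wrap the azimuthal difference into [-pi, pi].
        dphi = math.remainder(objects[i]["phi"] - objects[j]["phi"],
                              2 * math.pi)
        edges.append([deta, dphi, math.hypot(deta, dphi)])
        senders.append(i)
        receivers.append(j)
    globals_ = [sum(o["type"] == 0 for o in objects)]  # jet multiplicity
    return nodes, edges, senders, receivers, globals_

event = [{"pt": 120., "eta": 0.5, "phi": 0.1, "pcbt": 4, "type": 0},
         {"pt": 45., "eta": -1.2, "phi": 2.0, "charge": 1, "type": 1},
         {"pt": 60., "eta": 0.0, "phi": -2.8, "type": 2}]
nodes, edges, senders, receivers, globals_ = build_event_graph(event)
```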
The GNN training is performed on events passing the SR requirements. The LO tttt simulated signal sample is used in the training, while the MC simulated samples corresponding to all background components represent the background. The GNN discriminant is chosen as the observable of the analysis used to extract the tttt signal. Figure 5 shows the distribution of the GNN score in the signal region.
Following the strategy of the previous publication [19], a BDT discriminant is also trained in the SR as a cross-check, combining several input observables, of which the sum of the four highest PCBT scores among all jets in the event has the highest discriminating power. The separation of the tttt signal from the background is slightly better for the GNN, leading to a 10% higher expected significance for this method. The tttt production cross section, the normalisation factors of the backgrounds, and the tt W modelling parameters defined in Sect. 5.1 are determined via a binned likelihood fit to the discriminant score distribution in the SR and to the event yields or discriminating-variable distributions in the eight CRs listed in Table 2. The maximum-likelihood fit is performed with the RooFit package [89], with a likelihood function built from the Poisson probability that the observed data are compatible with the model prediction. The value of each nuisance parameter describing the systematic uncertainties for both signal and background processes (see Sect. 7) is constrained by a Gaussian penalty term in the likelihood function, while all normalisation factors and tt W modelling parameters, described in Sect. 5, are unconstrained.
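The structure of such a binned likelihood, with a Poisson term per bin and a Gaussian penalty per nuisance parameter, can be sketched as follows (a toy illustration, not the RooFit implementation; the templates and the 5%-per-sigma background uncertainty are invented):

```python
import numpy as np
from scipy.optimize import minimize

s = np.array([2.0, 5.0, 12.0])   # illustrative signal template per bin
b = np.array([40.0, 20.0, 8.0])  # illustrative background template
n = np.array([43.0, 28.0, 25.0]) # "observed" counts

def nll(params):
    """Binned Poisson negative log-likelihood (up to constants) with one
    Gaussian-constrained nuisance parameter theta scaling the background
    by 5% per sigma; mu is the unconstrained signal strength."""
    mu, theta = params
    exp = mu * s + b * (1.0 + 0.05 * theta)
    return np.sum(exp - n * np.log(exp)) + 0.5 * theta**2

res = minimize(nll, x0=[1.0, 0.0], method="Nelder-Mead")
mu_hat, theta_hat = res.x
```

The Gaussian term pulls theta back towards zero unless the data prefer a shifted background, mirroring the penalty terms described in the text.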

Systematic uncertainties
Systematic uncertainties arise from the modelling of the signal processes, from theoretical or data-driven predictions of the background processes, as well as from experimental effects. These uncertainties affect both the shape and the normalisation of the estimations. In addition, they can lead to event migration between different regions. The different sources of systematic uncertainty are described below and their impact on the tttt cross section measurement after the fit is shown in Table 6.

Experimental uncertainties
The dominant experimental uncertainties arise from the measurement of the b-tagging efficiencies and mis-tagging rates [90][91][92]. Corrections applied to simulated samples to match the performance in data, and the associated uncertainties, are determined for each jet flavour separately in five PCBT score bins and in bins of jet p T. The resulting uncertainties are decomposed into 40 independent components (referred to as eigenvector components) for b-jet tagging, 20 for c-jet and 20 for light-jet mis-tagging. Due to a lack of data for the calibration of b-/c-/light jets with p T > 400/300/250 GeV, respectively, the corrections for these jets are extrapolated from the highest-p T bin below these thresholds; uncertainties in this extrapolation are determined from simulation, independently for b-, c- and light jets.
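The decomposition of correlated calibration uncertainties into independent eigenvector components can be illustrated as follows (the covariance matrix and scale factors are invented for the example; the real decomposition acts on the measured calibration covariance):

```python
import numpy as np

# Illustrative 3-bin scale-factor covariance with bin-to-bin correlations.
cov = np.array([[0.04, 0.02, 0.01],
                [0.02, 0.05, 0.02],
                [0.01, 0.02, 0.03]])
sf = np.array([0.95, 0.98, 1.02])  # nominal scale factors per bin

# Diagonalise the covariance: each eigenvalue/eigenvector pair defines one
# independent "eigenvector component" variation of the scale factors.
eigval, eigvec = np.linalg.eigh(cov)
variations = [sf + np.sqrt(lv) * vec for lv, vec in zip(eigval, eigvec.T)]
```

By construction, the sum of the outer products of the variations about the nominal scale factors reproduces the original covariance, so the components can be varied independently in the fit.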
Uncertainties in the calibration of the jet energy scale and resolution [93-95] play a subleading role among the experimental uncertainties. An additional minor uncertainty arises from the correction applied to simulated events associated with the jet-vertex-tagger selection [96].
Other experimental uncertainties have minor impacts on the measurements. The uncertainty in the combined 2015-2018 integrated luminosity is 0.83% [97], obtained using the LUCID-2 detector [98], complemented by measurements using the inner detector and calorimeters. An uncertainty in the corrections to the pile-up profile in the simulated samples is also considered. Uncertainties for electrons and muons arise from the calibration of the simulated samples to data. The calibrations correct the efficiencies of the trigger, reconstruction, identification and isolation requirements, as well as the energy scale and resolution [70,71]. For electrons, an additional uncertainty associated with the efficiency of the charge-identification selection is considered.

Signal modelling uncertainties
Uncertainties in the modelling of SM tttt production have the dominant impact on the measurements. They affect both the signal acceptance and the shape of the GNN discriminant. The two leading uncertainties are determined by comparing the nominal prediction with alternative samples generated with Sherpa and with MadGraph5_aMC@NLO+Herwig7. These uncertainties mainly cover two effects: the different matching schemes between the matrix element and parton shower generators, and the different parton shower, hadronisation and fragmentation models. The uncertainty due to missing higher-order QCD corrections in simulated events is estimated by varying the renormalisation and factorisation scales in the matrix element individually or simultaneously up or down by a factor of two, and taking the maximum up and down variations. The PDF uncertainty is 1%, calculated as the RMS of the predictions from the 100 replicas of the NNPDF30_nlo_as_0118 PDF set, following the PDF4LHC prescription [99]. The effect of PDF variations on the shape of the fitted distributions was found to be negligible.
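The replica prescription can be sketched as follows (the replica values below are randomly generated stand-ins for the 100 NNPDF replica predictions):

```python
import numpy as np

rng = np.random.default_rng(2)
# Illustrative stand-in: 100 "replica" cross-section predictions (fb)
# scattered around a central value with ~1% spread.
replicas = rng.normal(12.0, 0.12, size=100)

nominal = replicas.mean()
# Replica prescription: the PDF uncertainty is the RMS of the replica
# predictions about their mean.
pdf_unc = np.sqrt(np.mean((replicas - nominal) ** 2))
rel_unc = pdf_unc / nominal
```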

Uncertainties in irreducible backgrounds
The dominant uncertainty in the background predictions arises from the modelling of tt W, tt H and tt Z/γ * events. Uncertainties due to generator choices are determined by comparing the nominal predictions with those obtained with the alternative matrix element and/or parton shower generators specified in Table 1. Given that the N j distribution in tt W production is estimated using a data-driven method, the alternative MC sample simulated using MadGraph5_aMC@NLO is reweighted to have the same N j distribution as the nominal prediction; this avoids double counting of systematic effects in the N j modelling. Minor uncertainties due to missing higher-order QCD corrections and PDF variations for each of the three processes are determined using the same method as for tttt production.
The uncertainties in the predicted cross sections of the background processes have a minor impact on the measurements. The uncertainty in the ttt production cross section is set to 35%. This includes the effect of varying the renormalisation and factorisation scales up and down by a factor of two in the NLO QCD prediction, as well as the uncertainties from the PDF choice, the LO EW contribution and the acceptance difference between MC samples generated using the five-flavour and four-flavour schemes.
Uncertainties of 12% and 10% are included for the tt Z/γ * and tt H production cross sections, respectively [57]. An uncertainty of 30% is applied to the sum of the t Zq and t W Z processes [100-102]. The uncertainty in V V production is taken from the discrepancy between the measured differential cross section as a function of jet multiplicity and the prediction from Sherpa 2.2.2 [103]: uncertainties of 20%, 50% and 60% are applied to V V events with ≤3, exactly 4 and ≥5 jets, respectively, and the three components are treated as uncorrelated. An uncertainty of 50% is considered for tt W W production, based on the NLO prediction [45]. For all other minor background processes with negligible impact, including the production of V H, V V V, tt Z Z, tt W Z, tt H H and tt W H, a single uncertainty of 50% is considered. This value is based on previous analyses in a similar final state [19,104] and is large enough to cover the different predictions listed in Ref. [45].
For all background contributions estimated from simulation, additional uncertainties are considered for events produced with associated b-jets. This is motivated by the expected high b-jet multiplicities in signal-like events and the general difficulty in modelling additional heavy-flavour jets in simulation. For each background process, an uncorrelated uncertainty of 50% is assigned to events with exactly three 'true' b-jets and to events with at least four true b-jets. Since ttt events have three b-jets arising from top-quark decays, the 50% uncertainty is only assigned to ttt events with at least four true b-jets. These estimates are based on the measurement of tt+jets production with additional heavy-flavour jets [105] and on comparisons between data and prediction in ttγ events with three and four b-tagged jets.

Uncertainties in reducible backgrounds
Uncertainties in the estimates of the fake/non-prompt and QmisID backgrounds have a minor impact. These uncertainties are derived and treated in the same way as in Ref. [19]. The resulting uncertainties for the QmisID background range from 5% to 30%, depending on the p T and η of the lepton. For leptons from heavy-flavour hadron decays, the uncertainties range from 20% to 100%. For material conversions and virtual photon conversions, the uncertainties are 30% and 21%, respectively. Conservative uncertainties are applied to the other, smaller contributions: 100% for events with fake/non-prompt leptons from light jets, and 30% for all other sources of fake/non-prompt background. These values are taken from previous analyses in a similar final state [19,104].
Given that tt+jets events are the main source of the fake/non-prompt and QmisID backgrounds, additional uncertainties are considered for tt+jets events produced in association with additional b-jets. An uncorrelated uncertainty of 50% is assigned to events with exactly three b-jets and to events with at least four b-jets.

Result for the tttt cross section measurement
A maximum-likelihood fit is performed to the GNN score distribution in the SR and to the different distributions used in the CRs (see Table 2). Figure 5 shows the distribution of the GNN score in the SR before and after performing the fit. Good agreement is observed between data and the prediction after the fit. The best-fit value of the signal strength μ, defined as the ratio of the measured tttt cross section to the SM expectation of σ tttt = 12.0 ± 2.4 fb from Ref. [16], is found to be μ = 1.9. The corresponding measured tttt production cross section is σ tttt = 22.5 +6.6 −5.5 fb. The systematic uncertainty is determined by subtracting the statistical uncertainty in quadrature from the total uncertainty; the statistical uncertainty is obtained from a fit where all nuisance parameters are fixed to their post-fit values. The systematic uncertainty on the signal strength includes the theoretical uncertainty on the SM tttt cross section.
The measured tttt production cross section is consistent within 1.8 standard deviations with the SM prediction of Ref. [16], and within 1.7 standard deviations with the resummed σ tttt calculation of Ref. [21]. The probability for the background-only hypothesis to result in a signal-like excess at least as large as seen in data is derived using the profile-likelihood ratio, following the procedure described in Ref. [106]. From this, the significance of the observed signal is found to be 6.1 standard deviations, while 4.3 standard deviations are expected using the SM cross section of σ tttt = 12.0 ± 2.4 fb from Ref. [16]. Using the SM cross section of 13.4 +1.0 −1.8 fb from Ref. [21], the expected significance would be 4.7 standard deviations. The goodness-of-fit, evaluated using a saturated model [107,108], yields a probability of 76%. Compared to the previous result of Ref. [19], the gain in expected sensitivity comes from the updated lepton and jet selections and uncertainties, from the use of the GNN discriminant, and from the improved treatment of the ttt background.
This leads to a higher signal purity and a smaller background uncertainty in the signal-enriched region. The overall uncertainty on the cross section is slightly smaller than that of Ref. [19], mainly because of an updated treatment of the systematic uncertainty for signal modelling.
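The relation between the profile-likelihood test statistic and the quoted significances can be illustrated with a single-bin counting experiment (a highly simplified stand-in for the full procedure of Ref. [106]; the yields below are invented):

```python
from math import log, sqrt

from scipy.stats import norm

def asymptotic_significance(n_obs, b):
    """Asymptotic discovery significance Z = sqrt(2(n ln(n/b) - (n - b)))
    for a single-bin counting experiment with expected background b;
    a simplified stand-in for the full profile-likelihood fit."""
    return sqrt(2.0 * (n_obs * log(n_obs / b) - (n_obs - b)))

z = asymptotic_significance(140.0, 100.0)  # illustrative yields
p = norm.sf(z)  # one-sided p-value of the background-only hypothesis
```

For reference, a significance of 6.1 standard deviations corresponds to a one-sided p-value of about 5 × 10⁻¹⁰.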
The normalisation factors of the different fake/non-prompt lepton background sources and the parameters of the data-driven tt W background model determined from the fit are shown in Tables 3 and 4. The background and signal yields before and after the fit, as well as for events with a GNN score equal to or higher than 0.6, are shown in Table 5. The post-fit number of tt W events with 6 and 7 jets is smaller than at the pre-fit level, while the number of tt W events with ≥9 jets is increased compared to the pre-fit prediction. The overall number of fitted tt W events is in agreement with the tt W cross section measurement in Ref. [82]. The number of background events from material conversions is also increased. The post-fit normalisation factors from Tables 3 and 4 agree with their nominal value of 1, except for NF Mat. Conv. They provide good agreement with data, as shown in Table 5 and Fig. 5.

Table 5: Pre-fit and post-fit background and signal yields in the signal region and for events with a GNN score larger than 0.6. The total systematic uncertainty differs from the sum in quadrature of the different uncertainties due to correlations.

Figure 6 presents the distributions of the number of jets, the number of b-jets, the sum of the four highest PCBT scores of jets in the event, and H T in the signal-enriched region for events with a GNN score equal to or higher than 0.6. Good agreement between data and the post-fit predictions is observed.
As a cross-check, a maximum-likelihood fit to the BDT score distribution in data is also performed, yielding the same signal strength and corresponding to an observed significance of 6.0 standard deviations, with 3.9 expected.
The largest systematic uncertainties in the measurement of σ tttt arise from the modelling of the tttt signal, with the uncertainty from the choice of the MC generator being by far the largest, followed by the uncertainty from the parton shower and hadronisation model. The uncertainty in the data-driven tt W background estimate is also significant. Among the experimental uncertainties, the largest effects come from the b-jet tagging and the jet energy scale. The parameter associated with the tttt MC generator choice is pulled by around −0.5 standard deviations after the fit. No other nuisance parameter is found to be significantly adjusted or constrained by the fit. The uncertainties impacting the tttt cross section measurement are summarised in Table 6.

Table 6: List of the uncertainties in the cross section σ tttt, grouped in categories. The quoted values are obtained by repeating the fit, fixing a set of nuisance parameters of the sources corresponding to each category, and subtracting in quadrature the resulting uncertainty from the total uncertainty of the nominal fit presented in the last row. The total uncertainty is different from the sum in quadrature of the components due to correlations among nuisance parameters.

This analysis derives 95% confidence level (CL) intervals on the cross section of the ttt process, σ ttt. In the SM, triple top quarks are always produced in association with other particles, and can be split into the ttt W and tttq processes. These processes give rise to an experimental signature and kinematic properties similar to those of the tttt signal, resulting in a similar GNN shape for tttt and ttt production. Varying the normalisations of the tttt and ttt processes simultaneously in the likelihood fit results in an anti-correlation of −93% between the two processes. Figure 7 shows the two-dimensional negative log-likelihood contour for the likelihood fit where the ttt and tttt cross sections are treated as free parameters.
Although the best-fit value in this case is consistent with a ttt cross section of zero, it is also consistent with the SM prediction at the level of 2.1 standard deviations. The two components of the ttt process (ttt W and tttq) have different sensitivities to possible BSM effects. In particular, a number of higher-dimensional operators generating flavour-changing neutral currents involving the top quark [112,113] lead to the production of ttt events without an associated W boson in the final state. It is therefore interesting to derive 95% CL intervals on the cross sections of the ttt W and tttq processes separately. Under the assumptions of a tttt signal strength of 1 and of 1.9, the 95% CL intervals on the ttt, ttt W and tttq cross sections are shown in Table 7. The 95% CL intervals for the tttq process are wider than those for the ttt W process because the tttq process is characterised by lower lepton and jet multiplicities in the final state, which results in a lower selection efficiency.

Interpretations
Limits and 95% CL intervals are also obtained on the top-quark Yukawa coupling, on EFT operators that parameterise BSM tttt production, and on a Higgs oblique parameter.

Limits on the top-quark Yukawa coupling
The top-quark Yukawa coupling enters the electroweak tttt Feynman diagram in which a pair of top quarks is mediated by a Higgs boson (see Fig. 1). This makes tttt production sensitive to the top-quark Yukawa coupling, including the coupling strength and its CP properties. An additional dependence on the Yukawa coupling comes through the tt H background. The tttt cross section can be parameterised as a function of two parameters: the top-Yukawa coupling strength modifier κ t and the CP-mixing angle α [10,114]. The SM corresponds to a pure CP-even coupling with α = 0 and κ t = 1, while a pure CP-odd coupling is realised when α = 90°. In the case of a pure CP-even coupling, κ t can be expressed as the ratio of the top-quark Yukawa coupling y t to the SM prediction y t SM. The tt H cross section also depends on these parameters, but unlike for tttt production, the tt H kinematic distributions change only when the CP-odd term κ t sin(α) is non-zero. The tttt and tt H yields in each bin of the GNN distribution are parameterised as a function of κ t and α. The observed (expected) 95% CL limits are shown in Fig. 8 in the two-dimensional parameter space (|κ t cos(α)|, |κ t sin(α)|). Fixing the top-quark Yukawa coupling to be CP-even (i.e. α = 0), the following observed (expected) limit is extracted: |κ t | < 1.8 (1.6). This limit is less stringent than that reported in Ref. [115] using a similar technique, due to a slightly higher measured SM tttt cross section than the prediction. As a check of the effect of the tt H process, an alternative fit is performed in which the tt H background yields are not parameterised as a function of κ t and α; instead, the normalisation of the tt H background is treated as a free parameter of the fit. This leads to an observed (expected) limit of |κ t | < 2.2 (1.8).

Limits on EFT operators and the Higgs oblique parameter
Within the EFT framework, the tttt process is sensitive to four heavy-flavour fermion operators, which can probe BSM models that enhance the interactions between third-generation quarks [12]. The tttt production cross section can be approximated by:

σ tttt = σ SM + Σ_i (C_i /Λ²) σ_i^(1) + Σ_{i,j} (C_i C_j /Λ⁴) σ_{i,j}^(2),

where C_i denotes the coupling parameter of each of the four heavy-flavour fermion operators, the σ_i^(1) term is the linear contribution representing the interference of the dimension-6 operators with the SM, and the σ_{i,j}^(2) term is the quadratic contribution, which also includes the interference between different EFT operators. The 95% CL intervals on the EFT parameters are extracted by parameterising the tttt yield in each bin of the GNN score distribution as a quadratic function of the coefficient of the corresponding EFT operator (C_i /Λ²) and performing the fit to data. The fit is carried out assuming that only one operator contributes to the tttt cross section, while the coefficients of the other three operators are fixed to the SM value of zero. The expected and observed 95% CL intervals on the coefficients are summarised in Table 8.

Fig. 8: Two-dimensional negative log-likelihood contours for |κ t cos(α)| versus |κ t sin(α)| at 68% and 95% CL, where κ t is the top-Higgs Yukawa coupling strength parameter and α is the mixing angle between the CP-even and CP-odd components. The gradient-shaded area represents the observed likelihood value as a function of κ t and α. Both the tttt signal and tt H background yields in each fitted bin are parameterised as a function of κ t and α. The blue cross shows the SM expectation, while the black cross shows the best-fit value.

Table 8: Expected and observed 95% CL intervals on the EFT coupling parameters, assuming variation of one EFT parameter in the fit.

An oblique parameter is a self-energy correction term applied to electroweak propagators in the SM. BSM additions to such a correction can be expressed as an EFT expansion, with the self-energy correction to the Higgs boson parameterised by the parameter Ĥ, where Ĥ = 0 corresponds to the SM prediction [15]. The Ĥ parameter affects the off-shell Higgs interaction, and thus the tttt cross section, as well as processes involving a Higgs boson, in particular tt H production, which is a significant background to the tttt measurement. To account for this effect, the tt H contribution is parameterised as a function of Ĥ [15]: μ tt H = 1 − Ĥ, where μ tt H is the tt H normalisation factor with respect to the SM cross section. A limit on Ĥ is extracted from the likelihood scans shown in Fig. 9. The observed (expected) upper limit on Ĥ is 0.20 (0.12) at 95% CL. The observed limit coincides with the largest value of this parameter that preserves unitarity in perturbation theory. Previously, limits on the Ĥ parameter were reported in Refs. [15,115,117].
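The quadratic dependence of the cross section on a single Wilson coefficient can be illustrated as follows (all numerical coefficients are invented for the example; the real linear and quadratic terms are obtained from dedicated EFT simulation, and the real intervals come from the full likelihood fit):

```python
import numpy as np

# sigma(C) = sigma_SM + C*s1 + C^2*s2, with illustrative coefficients (fb).
sigma_sm, s1, s2 = 12.0, 1.5, 2.0

def sigma_tttt(c):
    """Quadratic EFT parameterisation of the tttt cross section in one
    Wilson coefficient c = C/Lambda^2, with the other three operator
    coefficients fixed to their SM value of zero."""
    return sigma_sm + c * s1 + c * c * s2

# Simple interval: coefficients for which the predicted cross section
# stays below a cap standing in for a 95% CL cross-section constraint.
grid = np.linspace(-5.0, 5.0, 2001)
allowed = grid[sigma_tttt(grid) < 30.0]
interval = (allowed.min(), allowed.max())
```

Because the quadratic term dominates at large |C|, the allowed interval is asymmetric whenever the linear (interference) term is non-zero.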

Conclusion
This paper presents a measurement of tttt production using 140 fb −1 of data at √s = 13 TeV collected with the ATLAS detector at the LHC. Events are selected with exactly two same-charge isolated leptons or with at least three isolated leptons. The normalisation of the tt W background in jet multiplicity bins is determined using a data-driven approach. A Graph Neural Network is used to separate the tttt signal from the background. Four-top-quark production is observed with a significance of 6.1 standard deviations with respect to the background-only hypothesis. The expected significance is 4.3 or 4.7 standard deviations, depending on the assumed SM cross section. The measured tttt production cross section is 22.5 +6.6 −5.5 fb. It is consistent with the SM predictions within 1.8 or 1.7 standard deviations, and with the previous ATLAS measurement. In addition, 95% confidence level intervals on the cross section for inclusive ttt production and its subprocesses tttq and ttt W are provided, for scenarios assuming the SM tttt production cross section and the measured tttt cross section. The results are used to set limits on several new-physics scenarios. Constraints on the CP properties of the top-quark Yukawa coupling are obtained in the form of limits in the two-dimensional parameter space (|κ t cos(α)|, |κ t sin(α)|). Assuming a pure CP-even coupling (α = 0), the observed upper limit on |κ t | = |y t /y t SM | at 95% CL is 1.8. Constraints at 95% CL are also obtained on the four dimension-6 heavy-flavour fermion operators; assuming one operator takes effect at a time, the observed constraints on the coefficients are summarised in Table 8.

Data availability statement This manuscript has no associated data or the data will not be deposited. [Authors' comment: All ATLAS scientific output is published in journals, and preliminary results are made available in Conference Notes. All are openly available, without restriction on use by external parties beyond copyright law and the standard conditions agreed by CERN.
Data associated with journal publications are also made available: tables and data from plots (e.g. cross section values, likelihood profiles, selection efficiencies, cross section limits, ...) are stored in appropriate repositories such as HEPDATA (http://hepdata.cedar.ac.uk/). ATLAS also strives to make additional material related to the paper available that allows a reinterpretation of the data in the context of new theoretical models. For example, an extended encapsulation of the analysis is often provided for measurements in the framework of RIVET (http://rivet.hepforge.org/).] Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. Funded by SCOAP3.