Search for production of four top quarks in final states with same-sign or multiple leptons in proton–proton collisions at √s = 13 TeV

The standard model (SM) production of four top quarks (tt tt) in proton–proton collisions is studied by the CMS Collaboration. The data sample, collected during the 2016–2018 data taking of the LHC, corresponds to an integrated luminosity of 137 fb −1 at a center-of-mass energy of 13 TeV. The events are required to contain two same-sign charged leptons (electrons or muons) or at least three leptons, and jets.
The observed and expected significances for the tt tt signal are respectively 2.6 and 2.7 standard deviations, and the tt tt cross section is measured to be 12.6 +5.8 −5.2 fb.
The results are used to constrain the Yukawa coupling of the top quark to the Higgs boson, y t , yielding a limit of |y t /y t SM | < 1.7 at 95% confidence level, where y t SM is the SM value of y t . They are also used to constrain the oblique parameter of the Higgs boson in an effective field theory framework, Ĥ < 0.12.
Limits are set on the production of a heavy scalar or pseudoscalar boson in Type-II two-Higgs-doublet and simplified dark matter models, with exclusion limits reaching 350–470 GeV and 350–550 GeV for scalar and pseudoscalar bosons, respectively. Upper bounds are also set on couplings of the top quark to new light particles.


Introduction
The production of four top quarks (tt tt) is a rare standard model (SM) process, with a predicted cross section of σ(pp → tt tt) = 12.0 +2.2 −2.5 fb in proton–proton (pp) collisions at a center-of-mass energy of 13 TeV, as calculated at next-to-leading-order (NLO) accuracy for both quantum chromodynamics and electroweak interactions [1]. Representative leading-order (LO) Feynman diagrams for SM production of tt tt are shown in Fig. 1.
The tt tt cross section can be used to constrain the magnitude and CP properties of the Yukawa coupling of the top quark to the Higgs boson [2,3]. Moreover, tt tt production can be significantly enhanced by beyond-the-SM (BSM) particles and interactions. New particles coupled to the top quark, such as heavy scalar and pseudoscalar bosons predicted in Type-II two-Higgs-doublet models (2HDM) [4][5][6] and by simplified models of dark matter (DM) [7,8], can contribute to σ(pp → tt tt) when their masses are larger than twice the mass of the top quark, with diagrams similar to Fig. 1 (right). Additionally, less massive particles can enhance σ(pp → tt tt) via off-shell contributions [9]. In the model-independent framework of SM effective field theory, four-fermion couplings [10], as well as a modifier to the Higgs boson propagator [11], can be constrained through a measurement of σ(pp → tt tt). Conversely, models with new particles with masses on the order of 1 TeV, such as gluino pair production in the framework of supersymmetry [12][13][14][15][16][17][18][19][20][21], are more effectively probed through studies of tt tt production in boosted events or by requiring very large imbalances in momentum.
Each top quark primarily decays to a bottom quark and a W boson, and each W boson decays to either leptons or quarks. As a result, the tt tt final state contains jets mainly from the hadronization of light (u, d, s, c) quarks (light-flavor jets) and b quarks (b jets), and can also contain isolated charged leptons and missing transverse momentum arising from emitted neutrinos. Final states with either two same-sign leptons or at least three leptons, considering W → ℓν (ℓ = e or μ) and including leptonic decays of τ leptons, correspond to a combined branching fraction of approximately 12% [22]. The relatively low levels of background make these channels the most sensitive to tt tt events produced with SM-like kinematic properties [23].
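The quoted ~12% combined branching fraction can be cross-checked with a short counting exercise over the four W boson decays. The branching fractions below are approximate world-average values, and the exact treatment in Ref. [22] may differ slightly:

```python
from math import comb

# Rough cross-check of the ~12% combined branching fraction for the
# same-sign dilepton and >=3 lepton channels in tt tt events.
BR_W_LNU = 0.1086   # W -> l nu per lepton flavor (approximate)
BR_TAU_LEP = 0.35   # tau -> e/mu + neutrinos (approximate)
p = 2 * BR_W_LNU + BR_W_LNU * BR_TAU_LEP  # effective e/mu yield per W decay

# tt tt gives four W bosons: two W+ and two W-.
# Same-sign dilepton: exactly the two same-charge W's decay leptonically
# (2 ways to pick the charge), the other two decay hadronically.
p_ss2l = 2 * p**2 * (1 - p)**2
# At least three leptons: any 3 or all 4 of the W's decay leptonically.
p_3l = comb(4, 3) * p**3 * (1 - p) + p**4

total = p_ss2l + p_3l
print(f"combined branching fraction ~ {total:.3f}")  # roughly 0.12
```

The counting reproduces the quoted value to within the precision of the input branching fractions.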
Previous searches for tt tt production in 13 TeV pp collisions were performed by the ATLAS [24,25] and CMS [23,26,27] Collaborations. The most sensitive results, based on an integrated luminosity of approximately 36 fb −1 collected by each experiment, led to cross section measurements of 28.5 +12 −11 fb by ATLAS, and of 13 +11 −9 fb with an observed (expected) significance of 1.4 (1.1) standard deviations by CMS [23], both consistent with the SM prediction.
The analysis described in this paper improves upon the CMS search presented in Ref. [27], and supersedes its results, by taking advantage of upgrades to the CMS detector and by optimizing the definitions of the signal regions for the integrated luminosity of 137 fb −1 . The reference cross section for SM tt tt, 12.0 +2.2 −2.5 fb, used to determine the expected statistical significance of the search, as well as in interpretations for which SM tt tt is a background, includes NLO electroweak effects, in contrast to the 9.2 +2.9 −2.4 fb [28] used in the previous search. In addition to the analysis strategy used in the previous search, a new multivariate classifier is defined to maximize the sensitivity to the SM tt tt signal.

Background and signal simulation
Monte Carlo (MC) simulated samples at NLO are used to evaluate the signal acceptance for the SM tt tt process and to estimate the backgrounds from diboson (WZ, ZZ, Zγ, W ± W ± ) and triboson (WWW, WWZ, WZZ, ZZZ, WWγ, WZγ) processes. Simulated samples generated at NLO are also used to estimate backgrounds from associated production of single top quarks and vector bosons (tWZ, tZq, tγ), or tt produced in association with a single boson (tt W, tt Z, tt H, ttγ). Three separate sets of simulated events for each process are used in order to match the different data-taking conditions and algorithms used in 2016, 2017, and 2018. Most samples are generated using the MadGraph5_amc@nlo 2.2.2 (2.4.2) program [28] at NLO for 2016 samples (2017 and 2018 samples) with at most two additional partons in the matrix element calculations. In particular, the tt W sample is generated with up to one additional parton, and tt Z and tt H with no additional partons. The tt Z sample, which includes tt Z/γ* → ℓℓ, is generated with a dilepton invariant mass greater than 1 GeV. For the WZ sample used with 2016 conditions, as well as all ZZ and tt H samples, the powheg box v2 [29,30] program is used. The MadGraph5_amc@nlo generator at LO with up to three additional partons, scaled to NLO cross sections, is used to produce a subset of samples for some of the data-taking periods: Wγ (2016), ttγ (2017 and 2018), tZq (2018), and tγ (2018) [28]. Other rare backgrounds, such as tt production in association with dibosons (tt WW, tt WZ, tt ZZ, tt WH, tt ZW, tt HH) and triple top quark production (tt t, tt tW), are generated using LO MadGraph5_amc@nlo without additional partons, and scaled to NLO cross sections [31]. The background from radiative top quark decays, with γ* → ℓℓ, was found to be negligible in this analysis.
The top quark associated production modes for a heavy scalar (H) or pseudoscalar (A) in the mass range of 350–650 GeV, ttH/A, tqH/A, and tWH/A, with subsequent decays of H/A into a pair of top quarks, are generated using LO MadGraph5_amc@nlo, with one additional parton for all but the tqH/A production mode. In the context of Type-II 2HDM, these samples are scaled to LO cross sections obtained with the MadGraph5_amc@nlo model "2HDMtII" [32,33]. For the choice tan β = 1 in the alignment limit [34], where tan β represents the ratio of vacuum expectation values of the two Higgs doublets, these cross sections reproduce those of Ref. [6], which were also used in the previous CMS result [27]. In the context of simplified models of dark matter, these samples are scaled to LO cross sections obtained with the model used in Ref. [35], which includes kinematically accessible decays of the mediator into a pair of top quarks. The processes are simulated in the narrow-width approximation, suitable for the parameter space studied here, in which the width of the mediator is 5% of its mass or less. Samples and cross sections used for constraining the modified Higgs boson propagator are generated using

The CMS detector and event reconstruction
The central feature of the CMS detector is a superconducting solenoid of 6 m internal diameter, providing a magnetic field of 3.8 T. Within the solenoid volume are a silicon pixel and strip tracker, a lead tungstate crystal electromagnetic calorimeter (ECAL), and a brass and scintillator hadron calorimeter (HCAL), each composed of a barrel and two endcap sections. Forward calorimeters extend the pseudorapidity (η) coverage provided by the barrel and endcap detectors. Muons are detected in gas-ionization chambers embedded in the steel flux-return yoke outside the solenoid. A more detailed description of the CMS detector, together with a definition of the coordinate system used and the relevant variables, can be found in Ref. [46].
Events of interest are selected using a two-tiered trigger system [47]. The first level, composed of custom hardware processors, uses information from the calorimeters and muon detectors to select events at a rate of around 100 kHz within a time interval of less than 4 μs. The second level, known as the high-level trigger, consists of a farm of processors running a version of the full event reconstruction software optimized for fast processing, and reduces the event rate to around 1 kHz before data storage.
The reconstructed vertex with the largest value of summed physics-object squared-transverse-momentum is taken to be the primary pp interaction vertex. The physics objects are the jets, clustered using the jet finding algorithm [48,49] with the tracks assigned to the vertex as inputs, and the associated missing transverse momentum, taken as the negative vector sum of the transverse momentum ( p T ) of those jets.
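The vertex choice described above amounts to an arg-max over the summed squared transverse momenta of the physics objects assigned to each vertex. A minimal sketch, with a hypothetical event-record layout:

```python
def primary_vertex(vertices):
    """Return the vertex with the largest summed physics-object pT^2
    (illustrative sketch; each vertex is a dict holding the pT values of
    the jets and missing transverse momentum assigned to it)."""
    return max(vertices, key=lambda v: sum(pt**2 for pt in v["obj_pt"]))

vertices = [{"obj_pt": [30.0, 20.0]},        # sum pT^2 = 1300
            {"obj_pt": [50.0, 45.0, 10.0]}]  # sum pT^2 = 4625
print(primary_vertex(vertices) is vertices[1])  # True
```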
The particle-flow (PF) algorithm [50] aims to reconstruct and identify each individual particle in an event, with an optimized combination of information from the various elements of the CMS detector. The energy of photons is directly obtained from the ECAL measurement. The energy of electrons is determined from a combination of the electron momentum at the primary interaction vertex as determined by the tracker, the energy of the corresponding ECAL cluster, and the energy sum of all bremsstrahlung photons spatially compatible with the electron track [51]. The momentum of muons is obtained from the curvature of the corresponding track, combining information from the silicon tracker and the muon system [52]. The energy of charged hadrons is determined from a combination of their momentum measured in the tracker and the matching ECAL and HCAL energy deposits, corrected for the response function of the calorimeters to hadronic showers. The energy of neutral hadrons is obtained from the corresponding corrected ECAL and HCAL energies.
Hadronic jets are clustered from neutral PF candidates and charged PF candidates associated with the primary vertex, using the anti-k T algorithm [48,49] with a distance parameter of 0.4. The jet momentum is determined as the vectorial sum of all PF candidate momenta in the jet. An offset correction is applied to jet energies to take into account the contribution from pileup [53]. Jet energy corrections are derived from simulation and are improved with in situ measurements of the energy balance in dijet, multijet, γ +jet, and leptonically decaying Z+jet events [54,55]. Additional selection criteria are applied to each jet to remove jets potentially affected by instrumental effects or reconstruction failures [56]. Jets originating from b quarks are identified as b-tagged jets using a deep neural network algorithm, DeepCSV [57], with a working point chosen such that the efficiency to identify a b jet is 55-70% for a jet p T between 20 and 400 GeV. The misidentification rate is approximately 1-2% for light-flavor and gluon jets and 10-15% for charm jets, in the same jet p T range. The vector p miss T is defined as the projection on the plane perpendicular to the beams of the negative vector sum of the momenta of all reconstructed PF candidates in an event [58]. Its magnitude, called missing transverse momentum, is referred to as p miss T .

Event selection and search strategy
The identification, isolation, and impact parameter requirement with respect to the primary vertex, imposed on electrons and muons are the same as those of Ref.
[27] when analyzing the 2016 data set, while for the 2017 and 2018 data sets the identification of electrons and the isolation of both electrons and muons are modified to take into account the increased pileup. For electrons, identification is based on a multivariate discriminant using shower shape and track quality variables, while muon identification is based on the quality of the geometrical matching between measurements in the tracker and the muon system. The isolation requirement, introduced in Ref.
[59], is designed to distinguish the charged leptons produced in W and Z decays ("prompt leptons") from the leptons produced in hadron decays or in conversions of photons in jets, as well as hadrons misidentified as leptons (collectively defined as "nonprompt leptons"). The requirements to minimize charge misassignment are the same as in Ref.
[27]: muon tracks are required to have a small uncertainty in p T and electron tracks are required to have the same charge as that obtained from comparing a linear projection of the pixel detector hits to the position of the calorimeter deposit. The combined efficiency to reconstruct and identify leptons is in the range of 45-80 (70-90)% for electrons (muons), increasing as a function of p T and reaching the maximum value for p T > 60 GeV. For the purpose of counting leptons and jets, the following requirements are applied: the number of leptons (N ℓ ) is defined to be the multiplicity of electrons and muons with p T > 20 GeV and either |η| < 2.5 (electrons) or |η| < 2.4 (muons), the number of jets (N jets ) counts all jets with p T > 40 GeV and |η| < 2.4, and the number of b-tagged jets (N b ) counts b-tagged jets with p T > 25 GeV and |η| < 2.4. In order to be included in N jets , N b , and the H T variable, which is defined as the scalar p T sum of all jets in an event, jets and b-tagged jets must have an angular separation ΔR > 0.4 with respect to all selected leptons. This angular separation is defined as ΔR = √((Δη)² + (Δφ)²), where Δη and Δφ are the differences in pseudorapidity and azimuthal angle, respectively, between the directions of the lepton and the jet.
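The ΔR separation and the associated jet–lepton cross-cleaning can be sketched as follows (the dictionary-based jet and lepton records are hypothetical, and Δφ must be wrapped into [−π, π]):

```python
import math

def delta_r(eta1, phi1, eta2, phi2):
    """Delta R = sqrt((d_eta)^2 + (d_phi)^2), with d_phi wrapped to [-pi, pi]."""
    deta = eta1 - eta2
    dphi = (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(deta, dphi)

def clean_jets(jets, leptons, r_min=0.4):
    """Keep only jets separated by Delta R >= r_min from every selected
    lepton, as required for inclusion in N_jets, N_b, and H_T
    (hypothetical dict-based records, illustrative only)."""
    return [j for j in jets
            if all(delta_r(j["eta"], j["phi"], l["eta"], l["phi"]) >= r_min
                   for l in leptons)]
```

The modulo-based wrapping handles leptons and jets on opposite sides of the φ = ±π boundary.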
Events were recorded using either a dilepton+H T trigger (2016) or a set of dilepton triggers (2017 and 2018). The dilepton+H T trigger requires two leptons with p T > 8 GeV and a minimum H T requirement that is fully efficient with respect to the offline requirement of 300 GeV. The dilepton triggers require either two muons with p T > 17 and 8 GeV, two electrons with p T > 23 and 12 GeV, or an eμ pair with p T > 23 GeV for the higher-p T (leading) lepton and p T > 12 (8) GeV for the lower-p T (trailing) electron (muon). The trigger efficiency within the detector acceptance is measured in data to be greater than 90% for ee, eμ, and μμ events, and nearly 100% for events with at least three leptons.
We define a baseline selection that requires H T > 300 GeV and p miss T > 50 GeV, two or more jets (N jets ≥ 2) and b-tagged jets (N b ≥ 2), a leading lepton with p T > 25 GeV, and a trailing lepton of the same charge with p T > 20 GeV. Events with same-sign electron pairs with an invariant mass below 12 GeV are rejected to reduce the background from production of low-mass resonances with a charge-misidentified electron. Events where a third lepton with p T > 7 (5) GeV for electrons (muons) forms an opposite-sign (OS) same-flavor pair with an invariant mass below 12 GeV or between 76 and 106 GeV are also rejected. Inverting this resonance veto, the latter events are used to populate a tt Z background control region (CRZ) if the invariant mass is between 76 and 106 GeV and the third lepton has p T > 20 GeV. After this baseline selection, the signal acceptance is approximately 1.5%, including branching fractions.
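As an illustration, the baseline kinematic requirements can be collected into a single predicate; the event-record fields below are hypothetical, and the low-mass and Z-resonance vetoes as well as lepton identification requirements are omitted for brevity:

```python
def passes_baseline(ev):
    """Baseline selection predicate (illustrative sketch only; 'ev' is a
    hypothetical event record, and the resonance vetoes are omitted)."""
    lep_pt = sorted(ev["lep_pt"], reverse=True)
    return (ev["ht"] > 300.0            # H_T > 300 GeV
            and ev["met"] > 50.0        # p_T^miss > 50 GeV
            and ev["n_jets"] >= 2       # N_jets >= 2
            and ev["n_b"] >= 2          # N_b >= 2
            and len(lep_pt) >= 2
            and lep_pt[0] > 25.0        # leading lepton pT
            and lep_pt[1] > 20.0        # same-sign trailing lepton pT
            and ev["same_sign"])

print(passes_baseline({"ht": 420.0, "met": 80.0, "n_jets": 6, "n_b": 2,
                       "lep_pt": [35.0, 22.0], "same_sign": True}))  # True
```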
Events passing the baseline selection are split into several signal and control regions, following two independent approaches. In the first analysis, similarly to Ref.
[27] and referred to as "cut-based", the variables N jets , N b , and N ℓ are used to subdivide events into 14 mutually exclusive signal regions (SRs) and a control region (CR) enriched in tt W background (CRW), to complement the CRZ defined above, as detailed in Table 1. In the boosted decision tree (BDT) analysis, the CRZ is the only control region, and the remaining events are subdivided into 17 SRs by discretizing the discriminant output of a BDT trained to separate tt tt events from the sum of the SM backgrounds.
The BDT classifier utilizes a gradient boosting algorithm to train 500 trees with a depth of 4 using simulation, and is based on the following 19 variables: N jets , N b , N ℓ , p miss T , H T , two alternative definitions of N b based on b tagging working points tighter or looser than the default one, the scalar p T sum of b-tagged jets, the p T of the three leading leptons, of the leading jet, and of the sixth, seventh, and eighth jets, the azimuthal angle between the two leading leptons, the invariant mass formed by the leading lepton and the leading jet, the charge of the leading lepton, and the highest ratio of the jet mass to the jet p T in the event (to provide sensitivity to boosted, hadronically decaying top quarks and W bosons). Three of the most performant input variables, N jets , N b , and N ℓ , correspond to the variables used for the cut-based analysis. Top quark tagging algorithms to identify hadronically decaying top quarks based on invariant masses of jet combinations, similarly to Ref.
[23], were also tested, but did not improve the expected sensitivity. Such algorithms could only contribute in the handful of events where all the top quark decay products were found, and these events already have very small background yields. In each analysis, the observed and predicted yields in the CRs and SRs are used in a maximum likelihood fit with nuisance parameters to measure σ (pp → tt tt), following the procedure described in Sect. 7.
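A sketch of the BDT configuration (500 trees of depth 4, output discretized into 17 bins) on toy data is shown below using scikit-learn's GradientBoostingClassifier; the actual analysis uses its own training framework, all 19 input variables, and simulated tt tt and SM-background events:

```python
# Toy illustration of the BDT setup; three stand-in features loosely mimic
# N_jets, N_b, and H_T.  None of the numbers below are from the analysis.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(7)
n = 300
bkg = np.column_stack([rng.poisson(4, n), rng.poisson(1, n),
                       rng.normal(400.0, 100.0, n)])
sig = np.column_stack([rng.poisson(7, n), rng.poisson(3, n),
                       rng.normal(700.0, 150.0, n)])
X = np.vstack([bkg, sig])
y = np.concatenate([np.zeros(n), np.ones(n)])

bdt = GradientBoostingClassifier(n_estimators=500, max_depth=4)
bdt.fit(X, y)

# Discretize the classifier output into bins playing the role of the 17 SRs
# (the uniform bin edges here are arbitrary placeholders):
scores = bdt.predict_proba(X)[:, 1]
sr_index = np.digitize(scores, np.linspace(0.0, 1.0, 18)[1:-1])  # 0..16
```

In the analysis the binned score, rather than a single cut on it, is what enters the maximum likelihood fit.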

Backgrounds
In addition to the tt tt signal, several other SM processes result in final states with same-sign dileptons or at least three leptons, and several jets and b jets. These backgrounds primarily consist of processes where tt is produced in association with additional bosons that decay to leptons, such as tt W, tt Z, and tt H (mainly in the H → WW channel), as well as dilepton tt events with a charge-misidentified prompt-lepton and single-lepton tt events with an additional nonprompt lepton.
The prompt-lepton backgrounds, dominated by tt W, tt Z, and tt H, are estimated using simulated events. Dedicated CRs are used to constrain the normalization for tt W (cut-based analysis) and tt Z (cut-based and BDT analyses), while for other processes, described in the next paragraph, the normalization is based on the NLO cross sections referenced in Sect. 2.
Processes with minor contributions are grouped into three categories. The "ttVV" category includes the associated production of tt with a pair of bosons (W, Z, H), dominated by ttWW. The "Xγ" category includes processes where a photon accompanies a vector boson, a top quark, or a top-antitop quark pair. The photon undergoes a conversion, resulting in the identification of an electron in the final state. The category is dominated by ttγ, with smaller contributions from Wγ, Zγ, and tγ. Finally, the "Rare" category includes all residual processes with top quarks (tZq, tWZ, tt t, and tt tW) or without them (WZ, ZZ, W ± W ± from single- and double-parton scattering, and triboson production).
Since the tt W, tt Z, and tt H processes constitute the largest backgrounds to tt tt production, their simulated samples are corrected wherever possible to account for discrepancies observed between data and MC simulation. To improve the MC modeling of the additional jet multiplicity from initial-state radiation (ISR) and final-state radiation (FSR), simulated tt W and tt Z events are reweighted based on the number of ISR or FSR jets (N ISR/FSR jets ). The reweighting is based on a comparison of the light-flavor jet multiplicity in dilepton tt events in data and simulation, where the simulation is performed with the same generator settings as those of the tt W and tt Z samples. The method requires exactly two jets identified as originating from b quarks in the event and assumes that all other jets are from ISR or FSR. The N ISR/FSR jets reweighting factors vary within the range of [0.77, 1.46] for N ISR/FSR jets between 1 and 4. This correction is not applied to tt H (H → WW) events, which already have additional jets from the decay of the additional W bosons. In addition to the ISR or FSR correction, the tt W, tt Z, and tt H simulation is corrected to improve the modeling of the flavor of additional jets, based on the measured ratio of the tt bb and tt jj cross sections, 1.7 ± 0.6, reported in Ref.
[60], where j represents a generic jet. This correction results in a 70% increase of events produced in association with a pair of additional b jets. Other topologies, such as those including c quarks, are negligible by comparison, and no dedicated correction is performed.
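Schematically, the two corrections act as per-event weights. In the sketch below the N ISR/FSR jets factors are hypothetical placeholders within the quoted [0.77, 1.46] range, not the measured values; only the 1.7 heavy-flavor scale is taken from the text:

```python
# Sketch of the two corrections applied to the ttW/ttZ (and ttH) simulation.
# The per-multiplicity weights are PLACEHOLDERS within [0.77, 1.46].
ISR_FSR_WEIGHTS = {1: 1.46, 2: 1.10, 3: 0.90, 4: 0.77}  # hypothetical values
TTBB_SCALE = 1.7  # measured ttbb/ttjj correction, Ref. [60]

def corrected_weight(n_isr_fsr_jets, has_extra_bb_pair):
    """Per-event weight: jet-multiplicity reweighting times the heavy-flavor
    scale for events with a pair of additional b jets."""
    w = ISR_FSR_WEIGHTS.get(n_isr_fsr_jets, 1.0)
    if has_extra_bb_pair:
        w *= TTBB_SCALE
    return w
```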
The nonprompt lepton backgrounds are estimated using the "tight-to-loose" ratio method [59]. The tight identification (for electrons) and isolation (for both electrons and muons) requirements of the SRs are relaxed to define a loose lepton selection, enriched in nonprompt leptons. The efficiency, ε TL , for nonprompt leptons that satisfy the loose selection to also satisfy the tight selection is measured in a control sample of single-lepton events, as a function of lepton flavor, p T , and |η|, after subtracting the prompt-lepton contamination based on simulation. The loose selection is chosen to ensure that ε TL remains stable across the main categories of nonprompt leptons specified in Sect. 4, allowing the same ε TL to be applied to samples with different nonprompt lepton composition. For leptons failing the tight selection, the p T variable is redefined as the sum of the lepton p T and the energy in the isolation cone exceeding the isolation threshold value. This parametrization accounts for the momentum spectrum of the parent parton (the parton that produced the nonprompt lepton), allowing the same ε TL to be applied to samples with different parent parton momenta with reduced bias. To estimate the number of nonprompt leptons in each SR, a dedicated set of application regions is defined, requiring at least one lepton to fail the tight selection while satisfying the loose one (loose-not-tight). Events in these regions are then weighted by a factor of ε TL /(1 − ε TL ) for each loose-not-tight lepton. To avoid double counting the contribution of events with multiple nonprompt leptons, events with two loose-not-tight leptons are subtracted, and the resulting total weight is used as a prediction of the nonprompt lepton yield.
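The per-event weighting of the tight-to-loose method can be written compactly. This is a minimal sketch of the combinatorics only, assuming the measured ε TL values are supplied:

```python
def nonprompt_weight(eps_list):
    """Per-event weight in the 'tight-to-loose' method (minimal sketch).

    eps_list: measured efficiencies eps_TL for each loose-not-tight lepton
    in the event.  Each lepton contributes a factor eps/(1-eps); events
    with two loose-not-tight leptons enter with a negative sign, which
    subtracts the double counting of events with two nonprompt leptons.
    """
    weight = 1.0
    for eps in eps_list:
        weight *= eps / (1.0 - eps)
    return weight if len(eps_list) == 1 else -weight

print(nonprompt_weight([0.2]))        # 0.25
print(nonprompt_weight([0.2, 0.2]))   # -0.0625
```

Summing these weights over the application regions yields the nonprompt-lepton prediction for each SR.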
The background resulting from charge-misidentified leptons is estimated using the charge-misidentification probability measured in simulation as a function of electron p T and |η|. This probability ranges between 10 −5 and 10 −3 for electrons and is at least an order of magnitude smaller for muons. Charge-misidentified muons are therefore considered negligible, while for electrons this probability is applied to a CR of OS dilepton events defined for each same-sign dilepton SR.
A single correction factor, inclusive in p T and |η|, is applied to the resulting estimate to account for differences between data and simulation in this probability. This factor, derived from a control sample enriched in Z → e + e − events with one electron or positron having a misidentified charge, is very close to unity for the 2016 simulation, while it is approximately 1.4 for the 2017 and 2018 simulation. Even with the larger correction factor, the charge-misidentification probability is smaller in 2017 and 2018 than in 2016, due to the upgraded pixel detector [61].
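The application of the charge-misidentification probability to an OS control event can be sketched as a per-event weight; the weight convention below is a simplified illustration, not the exact CMS implementation:

```python
def charge_flip_weight(p_flip, data_mc_correction=1.4):
    """Weight applied to an opposite-sign dilepton control event to predict
    the same-sign yield from charge misidentification (simplified sketch).

    p_flip: per-electron misidentification probabilities (~1e-5 to 1e-3),
    taken from simulation as a function of pT and |eta|.
    data_mc_correction: inclusive data/simulation factor (about 1.4 for
    2017-2018 simulation, close to unity for 2016).
    """
    # Probability that either electron's charge was flipped; p/(1-p) ~ p
    # for the small probabilities involved.
    return data_mc_correction * sum(p / (1.0 - p) for p in p_flip)

print(charge_flip_weight([1e-4, 5e-4]))  # ~8.4e-4
```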

Uncertainties
Several sources of experimental and theoretical uncertainty related to signal and background processes are considered in this analysis. They are summarized, along with their estimated correlation treatment across the 2016, 2017, and 2018 data sets, in Table 2. Most sources of uncertainty affect simulated samples, while the backgrounds obtained using control samples in data (charge-misidentified and nonprompt leptons) have individual uncertainties, described at the end of this section.
The uncertainties in the integrated luminosity are 2.5, 2.3, and 2.5% for the 2016, 2017, and 2018 data collection periods, respectively [62][63][64]. Simulated events are reweighted to match the distribution of the number of pileup collisions per event in data. This distribution is derived from the instantaneous luminosity and the inelastic cross section [65], and uncertainties in the latter are propagated to the final yields, resulting in yield variations of at most 5%.
The efficiency of the trigger requirements is measured in an independent data sample selected using single-lepton triggers, with an uncertainty of 2%. The lepton reconstruction and identification efficiency is measured using a data sample enriched in Z → ℓℓ events [51,52], with uncertainties of up to 5 (3)% per electron (muon). The tagging efficiencies for b jets and light-flavor jets are measured in dedicated data samples [57], and their uncertainties result in variations between 1 and 15% of the signal region yields. In all cases, simulated events are reweighted to match the efficiencies measured in data. The uncertainty associated with jet energy corrections results in yield variations of 1-15% across SRs. Uncertainties in the jet energy resolution result in 1-10% variations [54].
As discussed in Sect. 5, we correct the distribution of the number of additional jets in tt W and tt Z samples, with reweighting factors varying within the range of [0.77, 1.46]. We take one half of the differences from unity as the systematic uncertainties in these factors, since they are measured in a tt sample but applied to different processes. These uncertainties result in yield variations of up to 8% across SRs. Similarly, events with additional b quarks in tt W, tt Z, and tt H are scaled by a factor of 1.7 ± 0.6, based on the CMS measurement of Ref. [60].

[Table 2: Summary of the sources of uncertainty, their values, and their impact, defined as the relative change of the measurement of σ(tt tt) induced by one-standard-deviation variations corresponding to each uncertainty source considered separately. The first group lists experimental and theoretical uncertainties in simulated signal and background processes. The second group lists normalization uncertainties in the estimated backgrounds. Uncertainties marked (not marked) with a † in the first column are treated as fully correlated (fully uncorrelated) across the 3 years of data taking.]

For background processes, uncertainties in the normalization (number of events passing the baseline selection) and shape (distribution of events across SRs) are considered, while for signal processes the normalization is unconstrained; instead, we consider the uncertainty in the acceptance (fraction of events passing the baseline selection) and shape. For each of the Rare, Xγ, and ttVV categories, normalization uncertainties are taken from the largest theoretical cross section uncertainty in any constituent physics process, resulting in uncertainties of 20%, 11%, and 11%, respectively. For the tt W and tt Z processes, we set an initial normalization uncertainty of 40%, but then allow the maximum-likelihood fit to constrain these backgrounds further using control samples in data.
For tt H, we assign a 25% normalization uncertainty to reflect the signal strength (the ratio between the measured cross section of tt H and its SM expectation) of 1.26 +0.31 −0.26 measured by CMS [66].

[Fig. 2: Distributions in the baseline region, before fitting to data, where the last bins include the overflows. The hatched areas represent the total uncertainties in the SM signal and background predictions. The tt tt signal assumes the SM cross section from Ref. [1]. The lower panels show the ratios of the observed event yield to the total prediction of signal plus background.]

The shape uncertainty resulting from variations of the renormalization and factorization scales in the event generators is smaller than 15% for the backgrounds and 10% for the tt tt and 2HDM signals, while the effect of the PDFs is only 1%. For the tt tt and 2HDM signals, the uncertainty in the acceptance from variations of the scales is 2%. The uncertainty in the scales that determine ISR and FSR, derived from tt tt samples, results in up to 6 and 10% uncertainties in the signal acceptance and shape, respectively. When considering tt tt as a background in BSM interpretations, a cross section uncertainty of 20% (based on the prediction of 12.0 +2.2 −2.5 fb [1]) is additionally applied to the tt tt process.
The charge-misidentified and nonprompt-lepton backgrounds are assigned uncertainties of 20 and 30%, respectively, where the latter is increased to 60% for nonprompt electrons with p T > 50 GeV. For the charge-misidentified lepton background, the uncertainty is based on the agreement observed between the prediction and the data as a function of kinematic distributions, in a data sample enriched in Z → e+e− events with one electron or positron having a misidentified charge. For the nonprompt-lepton background, the uncertainty is based on the agreement observed in simulation closure tests of the "tight-to-loose" method using multijet, tt, and W+jets samples. The contamination from prompt leptons, which is subtracted based on simulation, is below 1% in the application region, but it can be significant in the control sample where the tight-to-loose ratio is measured, resulting in an uncertainty of up to 50% in that ratio. The statistical uncertainty in the estimates based on control samples in data is taken into account for both backgrounds. It is negligible for the charge-misidentified lepton background, while for the nonprompt-lepton background it can be comparable to or larger than the systematic uncertainty.

[Fig. 3: Distributions in the tt W (upper) and tt Z (lower) CRs, before fitting to data. The hatched areas represent the uncertainties in the SM signal and background predictions. The tt tt signal assumes the SM cross section from Ref. [1]. The lower panels show the ratios of the observed event yield to the total prediction of signal plus background.]

Experimental uncertainties in normalization and shape are treated as fully correlated among the SRs for all signal and background processes. Two choices of correlation across the years (uncorrelated or fully correlated) were tested for each experimental uncertainty, and their impact on the measurement of σ(tt tt) was found to be smaller than 1%. For simplicity, these uncertainties are treated as uncorrelated across years.
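The tight-to-loose extrapolation described above can be sketched in simplified, single-lepton form. The function, the tight-to-loose ratio value, and the event counts below are hypothetical illustrations, not the analysis implementation.

```python
# Hedged sketch of a "tight-to-loose" nonprompt-lepton estimate
# (simplified single-lepton case; all numbers are hypothetical).

def nonprompt_estimate(n_loose_not_tight, eps_tl, n_prompt_contam=0.0):
    """Extrapolate from the loose-but-not-tight sideband to the tight selection.

    Each sideband event enters with weight eps_tl / (1 - eps_tl), after
    subtracting the simulation-based prompt-lepton contamination.
    """
    weight = eps_tl / (1.0 - eps_tl)
    return max(n_loose_not_tight - n_prompt_contam, 0.0) * weight

# Hypothetical example: 200 sideband events, ratio 0.2, 10 prompt events
est = nonprompt_estimate(n_loose_not_tight=200, eps_tl=0.2, n_prompt_contam=10)
# (200 - 10) * 0.25 = 47.5
```

The sensitivity of the estimate to the prompt subtraction in the measurement region is what drives the up-to-50% uncertainty in the ratio quoted above.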
Systematic uncertainties in the background estimates based on control samples in data and theoretical uncertainties in the normalization of each background process are treated as uncorrelated between processes but fully correlated among the SRs and across the 3 years. Scale and PDF uncertainties, as well as uncertainties in the number of additional b quarks, are correlated between processes, signal regions, and years. Statistical uncertainties due to the finite number of simulated events or control region events are considered uncorrelated.

Results
Distributions of the main kinematic variables (N jets, N b, H T, and p T miss) for events in the baseline region, as defined in Sect. 4, are shown in Fig. 2 and compared to the SM background predictions. The N jets and N b distributions for the CRW and CRZ are shown in Fig. 3. The expected SM tt tt signal, normalized to its predicted cross section, is shown in both figures. The SM predictions are statistically consistent with the observations. A binned likelihood is constructed from the yields in the signal regions and the CRZ, as well as the CRW for the cut-based analysis only, incorporating the experimental and theoretical uncertainties described in Sect. 6 as "nuisance" parameters.

[Fig. 4: Observed yields in the control and signal regions for the cut-based (upper) and BDT (lower) analyses, compared to the post-fit predictions for the signal and background processes. The hatched areas represent the total post-fit uncertainties in the signal and background predictions. The lower panels show the ratios of the observed event yield to the total prediction of signal plus background.]

The measured cross section for tt tt and the significance of the observation relative to the background-only hypothesis are obtained from a profile maximum-likelihood fit, in which the parameter of interest is σ(pp → tt tt) and all nuisance parameters are profiled, following the procedures described in Refs. [22,67]. In addition, an upper limit at 95% confidence level (CL) is set on σ(pp → tt tt) using the modified frequentist CLs criterion [68,69], with the profile likelihood ratio test statistic and its asymptotic approximation [70]. We verified the consistency between the asymptotic and fully toy-based methods. Alternatively, by taking the SM, including the tt tt process with its SM cross section and uncertainty [1], as the null hypothesis, the fit provides cross section upper limits on BSM processes with new scalar and pseudoscalar particles, as discussed in Sect. 8.
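The full profile-likelihood fit spans many bins and nuisance parameters, but the asymptotic approach it relies on can be illustrated with the standard single-bin counting formula, Z = sqrt(2[(s+b) ln(1+s/b) − s]). The yields below are hypothetical, not those of the analysis.

```python
import math

def asymptotic_significance(s, b):
    """Median discovery significance of a single counting experiment,
    Z = sqrt(2*((s+b)*ln(1+s/b) - s)), the asymptotic (Asimov) formula."""
    if s == 0.0:
        return 0.0
    return math.sqrt(2.0 * ((s + b) * math.log(1.0 + s / b) - s))

# Hypothetical yields: 10 signal events over 40 expected background events
z = asymptotic_significance(10.0, 40.0)  # about 1.52 standard deviations
```

For s much smaller than b this reduces to the familiar s/sqrt(b) approximation; the analysis instead profiles all nuisance parameters across every signal and control region bin.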
The values and uncertainties of most nuisance parameters are unchanged by the fit; those significantly affected include the tt W and tt Z normalizations, which are both scaled by 1.3 ± 0.2 by the fit, in agreement with the ATLAS and CMS measurements of these processes [71–73]. The predicted yields after the maximum-likelihood fit (post-fit) are compared to the data in Fig. 4 for the cut-based (upper) and BDT (lower) analyses, where the fitted tt tt signal contribution is added to the background predictions. The corresponding yields are shown in Tables 3 and 4 for the cut-based and BDT analyses, respectively. The tt tt cross section and 68% CL interval are measured to be 9.4 +6.2 −5.6 fb in the cut-based analysis and 12.6 +5.8 −5.2 fb in the BDT analysis. Relative to the background-only hypothesis, the observed and expected significances are 1.7 and 2.5 standard deviations, respectively, for the cut-based analysis, and 2.6 and 2.7 standard deviations for the BDT analysis. The observed 95% CL upper limits on the cross section are 20.0 fb in the cut-based and 22.5 fb in the BDT analysis. The corresponding expected upper limits on the tt tt cross section, assuming no SM tt tt contribution in the data, are 9.4 +4.3 −2.9 fb (cut-based) and 8.5 +3.9 −2.6 fb (BDT), a significant improvement relative to the value of 20.8 +11.2 −6.9 fb of Ref.
[27]. The BDT and cut-based observed results were found to be statistically compatible using correlated toy pseudo-data sets. We consider the BDT analysis as the primary result of this paper, as it provides higher expected measurement precision, and use its results for the interpretations in the following section.

Interpretations
This analysis is used to constrain SM parameters, as well as the production of BSM particles and operators that can affect the tt tt production rate. The existence of tt tt Feynman diagrams with virtual Higgs bosons allows the upper limit on σ(pp → tt tt) to be interpreted as a constraint on the Yukawa coupling y_t between the top quark and the Higgs boson [2,3]. Similarly, the measurement can be interpreted as a constraint on the Higgs boson oblique parameter Ĥ, defined as the Wilson coefficient of the dimension-six BSM operator modifying the Higgs boson propagator [11]. More generically, Feynman diagrams in which the virtual Higgs boson is replaced by a virtual BSM scalar (φ) or vector (Z′) particle with mass smaller than twice the top quark mass (m < 2m t) are used to interpret the result as a constraint on the couplings of such new particles [9]. In addition, new particles with m > 2m t, such as a heavy scalar (H) or pseudoscalar (A), can be produced on shell in association with top quarks. They can subsequently decay into top quark pairs, generating final states with three or four top quarks. Constraints on the production of such heavy particles can be interpreted in terms of 2HDMs.

When using our tt tt measurement to determine a constraint on y_t, we verified using a LO simulation that the signal acceptance is not affected by the relative contribution of the virtual Higgs boson Feynman diagrams. We take into account the dependence of the backgrounds on y_t by scaling the tt H cross section by |y_t/y_t^SM|^2 prior to the fit, where y_t^SM represents the SM value of the top quark Yukawa coupling. As a result of this rescaling, the measured σ(pp → tt tt) depends on |y_t/y_t^SM|, as shown in Fig. 5. The measurement is compared to the theoretical prediction obtained from the LO calculation of Ref. [2], scaled to the 12.0 +2.2 −2.5 fb cross section obtained in Ref.
[1], and including the uncertainty associated with doubling and halving the renormalization and factorization scales. Comparing the observed limit on σ(pp → tt tt) with the central, upper, and lower values of its theoretical prediction, we obtain 95% CL limits of |y_t/y_t^SM| < 1.7, 1.4, and 2.0, respectively, an improvement over the previous CMS result [27].

[Fig. 5: The observed σ(pp → tt tt) (solid line) and 95% CL upper limit (hatched line) as a function of |y_t/y_t^SM|. The predicted value (dashed line) [2], calculated at LO and scaled to the calculation from Ref. [1], is also plotted. The shaded band around the measured value gives the total uncertainty, while the shaded band around the predicted curve shows the theoretical uncertainty associated with the renormalization and factorization scales.]

Alternatively, assuming that the on-shell Yukawa coupling is equal to that of the SM, we do not rescale the tt H background with respect to its SM prediction, and obtain corresponding limits on the off-shell Yukawa coupling of |y_t/y_t^SM| < 1.8, 1.5, and 2.1. Since y_t affects the Higgs boson production cross section in both the gluon fusion and tt H modes, constraints on y_t can also be obtained from a combination of Higgs boson measurements [74]. However, those constraints require assumptions about the total width of the Higgs boson, while the tt tt-based limit does not. For the Ĥ interpretation, the BDT analysis is repeated using dedicated simulated samples to account for small acceptance and kinematic differences, as described in Sect. 2. We rescale the tt H cross section by (1 − Ĥ)^2 to account for its Ĥ dependence [11]. This results in a 95% CL upper limit of Ĥ < 0.12.
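The tt H rescaling used in the Yukawa interpretation above is a simple quadratic scaling of the background normalization, sketched here with a hypothetical yield (not an analysis number):

```python
# Sketch of the ttH background rescaling described above: the ttH
# normalization scales as |y_t / y_t^SM|^2 before the fit.
# The yield value below is a hypothetical placeholder.

def scale_tth_yield(n_tth_sm, yt_ratio):
    """Scale the SM ttH expectation by the squared Yukawa-coupling ratio."""
    return n_tth_sm * yt_ratio**2

# e.g. |y_t/y_t^SM| = 1.4 enhances a hypothetical ttH yield of 50 events
n = scale_tth_yield(50.0, 1.4)  # -> 98.0 events
```

Repeating the fit for a grid of coupling ratios, with the background rescaled at each point, yields the coupling-dependent measurement shown in Fig. 5.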
To study the off-shell effects of new particles with m < 2m t, we first consider neutral scalar (φ) and neutral vector (Z′) particles that couple to top quarks. Such particles are at present only weakly constrained, yet they can contribute significantly to the tt tt cross section [9]. Having verified in LO simulation that these new particles affect the signal acceptance by less than 10%, we recalculate the σ(pp → tt tt) upper limit of the BDT analysis including an additional 10% uncertainty in the acceptance, and obtain a 95% CL upper limit of 23.0 fb on the total tt tt cross section, slightly weaker than the 22.5 fb limit obtained in Sect. 7. Comparing this upper limit to the predicted cross section in models where tt tt production includes a φ or a Z′ in addition to the SM contributions and the associated interference, we set limits on the masses and couplings of these new particles, shown in Fig. 6. These limits exclude couplings larger than 1.2 for m φ in the 25–340 GeV range, and larger than 0.1 (0.9) for m Z′ = 25 (300) GeV.

[Figure caption fragment: The excluded regions, shaded, are above the limit curves. The dashed lines show the limits assuming a weaker coupling between H/A and χ, g DM = 0.5.]
We consider on-shell effects from new scalar and pseudoscalar particles with m > 2m t . At such masses, the production rate of these particles in association with a single top quark (tqH/A, tWH/A) becomes significant, so we include these processes in addition to ttH/A. As pointed out in Ref.
[6], these processes do not suffer significant interference with the SM tt tt process. To obtain upper limits on the sum of these processes followed by the decay H/A → tt, we use the BDT analysis and treat the SM tt tt process as a background. Figure 7 shows the excluded cross section as a function of the mass of the scalar (left) and pseudoscalar (right). Comparing these limits with the Type-II 2HDM cross sections for tan β = 1 in the alignment limit, we exclude scalar (pseudoscalar) masses up to 470 (550) GeV, improving on the previous CMS limits [26] by more than 100 GeV. Alternatively, we consider the simplified model of dark matter defined in Ref. [35], which includes a Dirac fermion dark matter candidate, χ, in addition to H/A, and in which the couplings of H/A to the SM fermions and to χ are determined by the parameters g SM and g DM, respectively. In this model, exclusions similar to those from the 2HDM are reached by assuming g SM = 1 and g DM = 1, and taking m H/A < 2m χ. Relaxing the 2HDM assumption of tan β = 1, Fig. 8 shows the 2HDM limit as a function of the H/A mass and tan β, considering one new particle at a time, and also including a scenario with m H = m A inspired by a special case of the Type-II 2HDM, the hMSSM [75]. Values of tan β up to 0.8–1.6 are excluded, depending on the assumptions made. These exclusions are comparable to those of a recent CMS search for resonant production of H/A in the pp → H/A → tt channel [76]. Relaxing the m H/A < 2m χ assumption in the dark matter model, Fig. 9 shows the limit as a function of the masses of both H/A and χ, for g DM = 1 and for two different assumptions on g SM. Large sections of the phase space of simplified dark matter models are excluded, and the reach of this analysis is complementary to that of analyses considering decays of H/A into invisible dark matter candidates, such as those of Refs. [35,77].

Summary
The standard model (SM) production of tt tt has been studied in data from √s = 13 TeV proton–proton collisions collected with the CMS detector during the 2016–2018 LHC data-taking period, corresponding to an integrated luminosity of 137 fb−1. The final state with either two same-sign leptons or at least three leptons is analyzed using two strategies: the first relies on a cut-based categorization in lepton and jet multiplicity and jet flavor, while the second takes advantage of a multivariate approach to distinguish the tt tt signal from its many backgrounds. The more precise multivariate strategy yields an observed (expected) significance of 2.6 (2.7) standard deviations relative to the background-only hypothesis, and a measured tt tt cross section of 12.6 +5.8 −5.2 fb. The results based on the two strategies agree with each other and with the SM prediction of 12.0 +2.2 −2.5 fb [1]. The results of the boosted decision tree (BDT) analysis are also used to constrain the top quark Yukawa coupling y_t relative to its SM value, based on the |y_t| dependence of σ(pp → tt tt) calculated at leading order in Ref. [2], resulting in a 95% confidence level (CL) limit of |y_t/y_t^SM| < 1.7. The Higgs boson oblique parameter in the effective field theory framework [11] is similarly constrained to Ĥ < 0.12 at 95% CL. Upper limits ranging from 0.1 to 1.2 are also set on the coupling between the top quark and a new scalar (φ) or vector (Z′) particle with mass less than twice the top quark mass (m t) [9]. For new scalar (H) or pseudoscalar (A) particles with m > 2m t and decaying to tt, their production in association with a single top quark or a top quark pair is probed.
The resulting cross section upper limit, between 15 and 35 fb at 95% CL, is interpreted in the context of Type-II two-Higgs-doublet models [4–6,75] as a function of tan β and m H/A, and in the context of simplified dark matter models [7,8] as a function of m H/A and the mass of the dark matter candidate.

Acknowledgements In addition, we gratefully acknowledge the computing centers and personnel of the Worldwide LHC Computing Grid for delivering so effectively the computing infrastructure essential to our analyses. Finally, we acknowledge the enduring support for the construction and operation of the LHC and the CMS detector provided by the funding agencies, among them BMBWF and FWF (Austria); Grants 123842, 123959, 124845, 124850, 125105, 128713, 128786, and 129058 (Hungary); the Council of Science and Industrial Research, India; and the HOMING PLUS program of the Foundation for Polish Science, cofinanced from the European Union Regional Development Fund.

Data Availability Statement
This manuscript has no associated data or the data will not be deposited. [Authors' comment: Release and preservation of data used by the CMS Collaboration as the basis for publications is guided by the CMS policy as written in its document "CMS data preservation, re-use and open access policy" (https://cmsdocdb.cern.ch/cgi-bin/PublicDocDB/RetrieveFile?docid=6032&filename=CMSDataPolicyV1.2.pdf&version=2).] Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. Funded by SCOAP3.