Search for Higgs boson pairs decaying to WW*WW*, WW*$\tau\tau$, and $\tau\tau\tau\tau$ in proton-proton collisions at $\sqrt{s}$ = 13 TeV

The results of a search for Higgs boson pair (HH) production in the WW*WW*, WW*$\tau\tau$, and $\tau\tau\tau\tau$ decay modes are presented. The search uses 138 fb$^{-1}$ of proton-proton collision data recorded by the CMS experiment at the LHC at a center-of-mass energy of 13 TeV from 2016 to 2018. Analyzed events contain two, three, or four reconstructed leptons, including electrons, muons, and hadronically decaying tau leptons. No evidence for a signal is found in the data. Upper limits are set on the cross section for nonresonant HH production, as well as resonant production in which a new heavy particle decays to a pair of Higgs bosons. For nonresonant production, the observed (expected) upper limit on the cross section at 95% confidence level (CL) is 21.3 (19.4) times the standard model (SM) prediction. The observed (expected) ratio of the trilinear Higgs boson self-coupling to its value in the SM is constrained to be within the interval $-$6.9 to 11.1 ($-$6.9 to 11.7) at 95% CL, and limits are set on a variety of new-physics models using an effective field theory approach. The observed (expected) limits on the cross section for resonant HH production range from 0.18 to 0.90 (0.08 to 1.06) pb at 95% CL for new heavy-particle masses in the range 250-1000 GeV.


Introduction
Since the discovery of the Higgs (H) boson [1][2][3], many of its properties have already been measured with high precision [4][5][6]. One important property that remains largely unknown is the H boson self-coupling. A precise measurement of this coupling is necessary to determine the shape of the Higgs potential, and thus verify that the mechanism breaking the electroweak gauge symmetry is indeed the Higgs mechanism [7][8][9][10][11][12] of the standard model (SM) [13][14][15]. The SM predicts the existence of both trilinear and quartic H boson self-couplings. Due to the very low predicted cross section for triple H boson production, the SM quartic self-coupling will not be experimentally accessible at the CERN LHC, even with the full integrated luminosity of 3000 fb −1 scheduled to be delivered after the high-luminosity LHC upgrade [16,17]. The strength of the trilinear self-coupling, however, can be determined using measurements of H boson pair (HH) production.
In the SM, most HH pairs are produced in two types of processes. The Feynman diagrams for the dominant "gluon fusion" (ggHH) process at leading order (LO) in perturbative quantum chromodynamics (QCD) are shown in Fig. 1. The left "triangle" diagram amplitude varies proportionally to the H boson self-coupling (λ) and the Yukawa coupling of the top quark (y t ), while the right "box" diagram amplitude is insensitive to λ and varies as y 2 t . The triangle and box diagrams interfere destructively, so the ggHH cross section exhibits a strong dependence on both λ and y t . The ggHH cross section in the SM has been computed to be 31.1 +2.1 −7.2 fb at next-to-next-to-LO (NNLO) accuracy in QCD using the FT approx scheme, in which the true top quark mass is used for the real radiation matrix elements, while the virtual part is computed using an infinite top quark mass [18]. The predicted SM cross section for the subdominant "vector boson fusion" (qqHH) process is 1.73 ± 0.04 fb at next-to-NNLO accuracy in QCD [19].  Figure 1: Leading order Feynman diagrams for SM nonresonant HH production via gluon fusion, including the "triangle" diagram (left) and the "box" diagram (right).
Deviations of the coupling strength modifiers κ λ = λ/λ SM and κ t = y t /y SM t from unity would affect both the rate of HH production and kinematic distributions of the HH signal. The HH invariant mass (m HH ) is particularly sensitive to changes in κ λ and κ t , as these couplings affect the triangle and box diagram amplitudes differently. Because SM ggHH and qqHH production do not include a heavy resonant particle, and typically result in a broad m HH distribution, they are referred to as "nonresonant". Changes in κ λ and κ t also influence the rate of single Higgs boson production and the Higgs boson decay branching fractions [20,21].
The presence of undiscovered particles or interactions, predicted by a variety of theoretical models beyond the SM, may alter the HH production rate as well as observable kinematic distributions. Such particles could give rise to loop diagrams similar to the one shown on the left of Fig. 1. These diagrams may significantly enhance the HH production rate, as they occur at the same loop level as HH production in the SM. Since no particles beyond those predicted by the SM have been observed so far, their mass may be at the TeV scale or higher, well above the scale of electroweak symmetry breaking. Loop contributions of such heavy particles can be approximated as contact interactions with the H boson using an effective field theory (EFT) approach [22,23]. Following Ref. [24], the contact interactions relevant for HH production are parametrized by the couplings c g , c 2g , and c 2 , referring to the interactions between two gluons and one H boson, two gluons and two H bosons, and two top quarks and two H bosons, respectively. The corresponding Feynman diagrams for ggHH production are shown in Fig. 2. The LO diagrams for qqHH production contain no gluons or top quarks, so the impacts of c g , c 2g , and c 2 are only considered in the ggHH signal in this publication. An excess of HH signal events may also result from decays of new heavy particles, denoted as X, into pairs of H bosons. Various theoretical models of new physics postulate such decays, in particular two-Higgs-doublet models [25,26], composite-Higgs models [27,28], Higgs portal models [29,30], and models inspired by warped extra dimensions [31]. In the last class of models, the new heavy particles may have spin 0 ("radions") or spin 2 ("gravitons") [32]. In this paper, the resulting "resonant" HH production is sought for mass values of X from 250 to 1000 GeV, and the width of X is assumed to be negligible compared to the experimental resolution in m HH . This would create a peak in the reconstructed m HH distribution around the mass m X of the resonance. The Feynman diagram for this process is shown in Fig. 3. For resonance masses above 1 TeV the strongest constraints are given by searches for HH production targeting H boson decays to bottom quarks [33][34][35], as the selection and reconstruction efficiency for hadronic decays increases, in particular in the trigger, and relevant backgrounds decrease with energy. For leptonic decay modes, the selection and reconstruction efficiency in general is high and as such do not increase notably for high masses above 1 TeV. Phenomenological studies of the prospects for discovering HH signal in the WW * WW * decay mode are documented in Refs. [36][37][38][39][40], where the symbol * denotes virtual particles. The ATLAS Collaboration published results of a search for nonresonant and resonant HH pairs decaying to WW * WW * based on 36 fb −1 of proton-proton (pp) collision data recorded at √ s = silicon tracker measures charged particles within the pseudorapidity range |η| < 2.5 for data recorded in 2016, and within the range |η| < 3.0 for data recorded in 2017 and 2018. The ECAL is a fine-grained hermetic calorimeter with quasi-projective geometry, and is divided into a barrel region covering |η| < 1.5, and two endcaps that extend to |η| = 3.0. The HCAL barrel and endcaps similarly cover the region |η| < 3.0. Forward calorimeters extend beyond these endcaps to |η| = 5.0. Muons are detected within the range |η| < 2.4 by gas-ionization detectors embedded in the steel flux-return yoke outside the solenoid. Collision events of interest are selected using a two-tiered trigger system. The level-1 trigger, composed of custom hardware processors, uses information from the calorimeters and muon detectors to select less than 100 kHz of events from a 40 MHz base event rate, within a fixed latency of 4 µs [56]. The second tier, known as the high-level trigger, is a processor farm which runs a version of the full event reconstruction software optimized for fast processing, and reduces the event rate to around 1 kHz before data storage [57]. A more detailed description of the CMS detector, together with a definition of the coordinate system used and the most relevant kinematic variables, can be found in Ref. [58].

Data samples and Monte Carlo simulation
The analyzed pp collision data correspond to an integrated luminosity of 138 fb −1 , collected by the CMS detector over three years: 36 fb −1 in 2016, 42 fb −1 in 2017, and 60 fb −1 in 2018 [59][60][61]. This analysis uses triggers requiring one or more reconstructed e, µ, or τ h candidates to be associated with the same collision vertex. The exact triggers and their thresholds varied slightly from year to year because of changes in luminosity and detector conditions, as well as improvements to the trigger algorithms. The transverse momentum (p T ) thresholds imposed by the trigger on the "leading" (highest p T ), "subleading" (second-highest p T ), and third e, µ, or τ h , and the corresponding η requirements for each year are shown in Table 1. All triggers include identification and isolation requirements on the e, µ, and τ h candidates [57]. When combined, the triggers achieve an efficiency of 95-100% for simulated SM HH signal events in each of the seven search categories.
Monte Carlo (MC) simulated samples are used to model HH signal events and a wide range of SM background processes that produce final states with e, µ, or τ h . Background MC samples include processes producing a single W or Z boson, two bosons (WW, WZ, ZZ, Wγ, and Zγ), three bosons (WWW, WWZ, WZZ, ZZZ, and WZγ), a single H boson (via gluon fusion, vector boson fusion, or associated production with a W or Z boson), a single top quark, a top quarkantiquark pair (tt), and top quarks associated with one or more bosons (ttW, ttZ, ttH, tHq, and tHW). All MC samples were generated using either MADGRAPH5 aMC@NLO v2 [62,63], POWHEG v2 [64][65][66], MCFM v7 [67][68][69], or PYTHIA v8.2 [70]. All samples that include a H boson were produced for a H boson mass of 125 GeV. Specific details of the simulated processes are summarized in Table 2.
The parton distribution functions (PDFs) of the proton are modeled using the NNPDF3.0 and NNPDF3.1 PDF sets [85][86][87][88][89]. Parton shower, hadronization processes, and τ decays are modeled by PYTHIA, using the tunes CP5, CUETP8M1, CUETP8M2, or CUETP8M2T4 [90][91][92], depending on the process and the data-taking period that is being modeled. The matching of matrix elements to parton showers is performed using the MLM scheme [93] for the LO samples and the FxFx scheme [94] for the NLO samples. The interactions of particles with the CMS detector material was simulated in detail using GEANT4 [95]. Simulated events were reconstructed using the same procedure as in data. The response of the trigger is included in the simulation. Additional pp interactions (pileup) were generated with PYTHIA and overlaid on Triple e p T (e) > 16,12,8 GeV Two e + µ p T (e) > 12, 12 GeV, p T (µ) > 8 GeV Two µ + e p T (µ) > 9, 9 GeV, p T (e) > 9 GeV Triple µ p T (µ) > 12, 10, 5 GeV all MC events, with event weights used to match the collision multiplicity to the distribution inferred from data. Residual differences between data and simulation are rectified by applying corrections to simulated events.
A variety of HH signal samples were generated at LO and NLO accuracy in QCD to simulate nonresonant HH production, covering the ggHH and qqHH production processes, with the H bosons decaying to either WW * , ZZ * , or ττ. The NLO samples are used to extract the rate of the HH signal from the data, while LO samples with a larger number of simulated events are used to train machine learning algorithms. Separate ggHH samples are produced for SM HH production and for a total of twelve EFT benchmark (BM) scenarios in the Higgs Effective Field Theory (HEFT) approach [24]. These benchmarks, along with the seven benchmarks from Ref. [96], represent different combinations of κ λ , κ t , c g , c 2g , and c 2 HEFT parameter values, and are chosen to probe distinct classes of HH kinematic configurations. These benchmarks are referred to as JHEP04 BM1-12, and JHEP03 BM1-7, respectively. The benchmark JHEP04 BM8 is complemented by a modified version of this benchmark, published in Ref. [97], denoted as JHEP04 BM8a. The parameter values of these twenty BM scenarios are shown in Table 3. The values of the c g and c 2g couplings published in Ref. [96] have been scaled by factors of 1.5 and −3, respectively, to convert them to the convention introduced for these couplings in Ref. [24]. In order to increase the number of simulated events and to model kinematic configurations not explicitly generated, such as JHEP03 BM1-7, the ggHH samples are merged and the events in the merged samples are reweighted, using the procedure documented in Ref. [98], to match the distributions in m HH and |cos θ * | computed at NLO accuracy and published in Ref. [97]. This procedure is applied to the LO and NLO ggHH samples separately. The symbol cos θ * denotes the cosine of the polar angle of one H with respect to the beam axis in the HH rest frame. The qqHH samples are produced only for SM HH production.

Event reconstruction
The CMS particle-flow (PF) algorithm [99] aims to reconstruct and identify each individual particle in an event, using an optimized combination of information from the various elements of the CMS detector. The particles are subsequently classified into five mutually exclusive types: electrons, muons, photons, and charged and neutral hadrons. These particles are then combined to reconstruct hadronic τ decays, jets, and the missing transverse momentum in the event.
The candidate vertex with the largest value of summed physics-object p 2 T is taken to be the primary pp interaction vertex. The physics objects used for this determination are the jets, clustered using the infrared and collinear safe anti-k T algorithm [100,101], with the tracks assigned to candidate vertices as inputs, and the associated missing transverse momentum, taken as the negative vector sum of the p T of those jets.
Electrons are reconstructed within the geometric acceptance of the tracking detectors (|η| < 2.5) by combining information from the tracker and the ECAL [102]. They are initially identified using an MVA classifier which distinguishes real electrons from hadrons, along with requirements that the track be associated with the collision vertex, and limits on hadronic energy deposits separated by ∆R < 0.4 from the electrons (their "isolation"). The angular separation Table 3: Parameter values for κ λ , κ t , c 2 , c g , and c 2g in MC samples modeling twenty benchmark scenarios in the EFT approach, plus SM HH production.
where the symbol φ refers to the azimuthal angle of the particle. Electrons passing this initial selection are referred to as "loose". In this analysis, events with electrons originating from hadron decays ("nonprompt"), or with hadrons misidentified as electrons, constitute the largest source of background. This motivates the use of an additional MVA classifier, which is trained to select "prompt" electrons from W, Z, and τ lepton decays, and to reject nonprompt or misidentified electrons. This MVA classifier was previously used for measurements of ttH production in events with multiple leptons [103]. It combines observables comparing measurements of the electron energy and direction in the tracker and the ECAL, the compactness of the electron cluster, the bremsstrahlung emitted along the electron trajectory, and the electron isolation. Two levels of thresholds on the output of this MVA classifier are used in the analysis, referred to as the "tight" and "medium" electron selections for the more and less restrictive thresholds, respectively. The tight selection has an average efficiency of 60% for electrons from SM HH decays. Only the electrons passing the tight selections are used to reconstruct signal candidate events, while data events with electrons passing the medium selections and failing the tight selections are used to estimate the contribution of misidentified-and nonprompt-electron backgrounds in each search category. Compared to Ref.
[103], this analysis uses lower thresholds on the MVA classifier output for the medium and tight electron selections, in order to increase the efficiency in particular for low-p T electrons, which frequently appear in the HH signal events studied in this analysis. Electrons from photon conversions in the tracker are suppressed by requiring that the track is missing no hits in the innermost layers of the silicon tracker, and is not matched to a reconstructed conversion vertex. In the 2 ss category, further electron selection criteria are applied, which require agreement among three independent measurements of the electron charge, including the Gaussian sum filter and Kalman filter track curvatures, as well as the ECAL supercluster position [104]. The remaining charge misidentification rate is measured to be less than 0.1% for |η| < 1.479, and under 0.4% for |η| > 1.479. The charge quality requirement reduces the electron identification efficiency by about 4%.
Muons are reconstructed by extrapolating tracks in the silicon tracker to hits in the gas-ionization muon detectors embedded in the steel flux-return yoke outside the solenoid [105]. To pass the initial loose identification requirement for this analysis, muons must satisfy criteria related to isolation and track proximity to the primary interaction vertex, as well as track quality observables and matching between the tracker and muon chambers. Additional requirements on the prompt vs. nonprompt muon identification MVA classifier from Ref.
[103] serve to select muons passing a tight selection for signal candidate events, and a medium selection for nonprompt background estimation. Inputs to this MVA classifier include energy deposits close to the muon in the ECAL and HCAL, the hits and track segments reconstructed in the muon detectors located outside the solenoid, the quality of the spatial matching between the track segments reconstructed in the silicon tracker and in the muon detectors, and the isolation of the muon with respect to other particles. Again, lower selection thresholds on the MVA classifier output compared to Ref.
[103] bring higher efficiency for the HH signal, amounting to 80% per muon in simulated SM HH events for the tight selection. In the 2 ss channel, the uncertainty in the curvature of the muon track is required to be less than 20% to ensure a highquality charge measurement [103]. This requirement reduces the muon identification efficiency by about 2%.
Hadronic decays of tau leptons are identified using the "hadrons-plus-strips" algorithm [106]. This algorithm classifies individual hadronic decay modes of the τ by combining charged hadrons from the PF reconstruction with neutral pions. The latter are reconstructed by clustering electrons and photons into rectangular strips, which are narrow in η but wide in the φ direction. The spread in φ accounts for photons originating from neutral pion decays that convert into electron-positron (e − e + ) pairs while traversing the silicon tracker. The e − and e + are bent in opposite directions in φ by the magnetic field, and may further emit bremsstrahlung photons before reaching the ECAL. The decay modes considered in this analysis produce one charged pion or kaon plus up to two neutral pions (collectively referred to as "one-prong" τ h ), or three charged pions or kaons plus zero or one neutral pion (referred to as "three-prong" τ h ). The DEEPTAU algorithm [107] distinguishes true τ h objects from quark and gluon jets, electrons, and muons using a convolutional artificial neural network (NN) [108] with 42 high-level observables as input, together with low-level information obtained from the silicon tracker, ECAL, HCAL, and the muon detectors. The former include the p T , η, φ, and mass of the τ h candidate, the reconstructed τ h decay mode, its isolation with respect to charged and neutral particles, and the estimated distance that the τ lepton traverses between its production and decay. For three-prong τ h candidates, this distance is determined by reconstructing the decay vertex, while for one-prong τ h candidates, the transverse impact parameter of the charged pion track with respect to the primary pp interaction vertex is used as an estimate of the distance. The low-level information quantifies the particle activity within two η × φ grids, centered on the direction of the τ h candidate: an inner grid of size 0.2 × 0.2, filled with 0.02 × 0.02 cells, and an outer grid of size 0.5 × 0.5 (partially overlapping with the inner grid), with 0.05 × 0.05 cells. Selected τ h candidates in this analysis must have p T > 20 GeV and |η| < 2.3, and are subjected to two levels of thresholds on the NN output that separates τ h from quark and gluon jets, referred to as the tight and medium τ h selections, respectively.
Hadronic jets (j) are reconstructed with the anti-k T algorithm using the particles reconstructed with the PF algorithm as input, and serve to identify H → WW * → jj ν decays in this anal-ysis. Jets reconstructed with size parameters of 0.4 ("small-radius jets") and 0.8 ("large-radius jets") are both used: two small-radius jets to reconstruct the two quarks from low-p T W boson decays, or a single large-radius jet to reconstruct high-p T W boson decays, where the quarks are collimated. Overlap between small-radius jets and electrons, muons, and τ h is resolved by discarding those small-radius jets that contain one or more PF particles matched to an electron, a muon, or a constituent of a τ h passing the medium selection criteria. In case of large-radius jets, electrons and muons passing the loose selection are removed from the collection of PF particles used as input to the jet reconstruction, so that leptons produced in H → WW * → jj ν decays of Lorentz-boosted H bosons are not clustered into those jets.
The effect of pileup on the reconstruction of large-radius jets is mitigated by applying the pileup per particle identification algorithm (PUPPI) [109,110] to the collection of particles used as input to the jet reconstruction. For small-radius jets, the effect of pileup is reduced by removing charged particles identified with pileup vertices from the jet reconstruction, and applying corrections to the jet energy to account for neutral particles from pileup.
After calibration, the jet energy resolution at the central rapidities amounts to 15-20% at 30 GeV, 10% at 100 GeV, and 5% at 1 TeV [111]. This analysis only considers jets reconstructed in the region |η| < 2.4. Small-radius jets must have p T > 25 GeV, while large-radius jets must have p T > 170 GeV. Additional criteria requiring that each large-radius jet contain exactly two identifiable, energetic subjets are applied to specifically select those from boosted hadronic W boson decays [112].
Events containing small-radius jets identified with the hadronization of bottom quarks (b jets) are vetoed in this analysis. The DEEPJET algorithm [113] exploits observables related to the long lifetime of b hadrons and the higher particle multiplicity and mass of b jets compared to light quark and gluon jets. Both "loose" and "medium" b jet selections on the DEEPJET output are employed in this analysis, corresponding to b jet selection efficiencies of 84 and 70%, while the misidentification rates for light-quark or gluon jets are 11 and 1.1%, respectively.
The missing transverse momentum vector p miss T is computed as the negative vector p T sum of all the particles reconstructed by the PF algorithm in an event, and its magnitude is denoted as p miss

Event selection
Events are selected with the aim of maximizing the acceptance for HH decays to WW * WW * , WW * ττ, and ττττ, while simultaneously rejecting the large backgrounds from multijet production, single and pair production of W and Z bosons, and tt production. To achieve this, each event must contain multiple reconstructed or τ h associated with the primary interaction vertex. The and τ h may originate from the decay of a W boson or a τ lepton. Seven mutually exclusive search categories, distinguished by the number of reconstructed and τ h candidates, are included in the analysis: 2 ss, 3 , 4 , 3 + 1τ h , 2 + 2τ h , 1 + 3τ h , and 4τ h . Here "ss" indicates a same-sign pair, with two leptons of identical electric charge. The and τ h candidates selected in any of the seven search categories must pass the tight selection criteria described in Section 4. In addition, they are required to pass category-specific p T thresholds motivated by the trigger selection. Further requirements are placed on the sum of and τ h charges, and, in two categories, on the discriminant p miss,LD T and the multiplicity of jets.
The leading and subleading leptons in the 2 ss category must pass p T selection thresholds of 25 and 15 GeV, respectively. Events in this category are required to contain two or more smallradius jets, or at least one large-radius jet, targeting hadronic W boson decays. Dielectron events must have p miss,LD T > 30 GeV and m( ) < 81 GeV or m( ) > 101 GeV, in order to suppress charge-misidentified Z → ee background. If the event contains a τ h , the charge of the τ h must be opposite to the charge of the leptons. After this selection, the main backgrounds in the 2 ss category arise from WZ production, from Wγ events in which the photon converts into an e − e + pair and either the e − or the e + is not reconstructed, and from events in which one or both reconstructed leptons are due to a nonprompt or a misidentified hadron, as shown in Table 6. The "other" background given in the table is dominated by same-sign W boson pairs and WWW production. The WW * WW * decay mode accounts for roughly 70% of SM HH signal events selected in the 2 ss category, with WW * ττ events accounting for the other 30%.
In the 3 category, the leading, subleading, and third are required to have p T values greater than 25, 15, and 10 GeV, respectively, and the sum of their charges must be either +1 or −1. At least one small-or large-radius jet must be present, and the p miss,LD T quantity must be greater than 30 GeV, or 45 GeV if there is at least one same-flavor opposite-sign (SFOS) pair in the event. Again, backgrounds are dominated by WZ production and events with misidentified . Notable contributions to the "other" background arise from WWW and WWZ production. The signal composition is similar to the 2 ss category.
The 4 category has identical lepton selection criteria to the 3 category, except that the third must have p T > 15 GeV, and a fourth with p T > 10 GeV is required, and the sum of the four lepton charges is required to be equal to zero. In this category and all the remaining categories, there are no selection requirements on jets or p miss,LD T . Almost 70% of signal events come from the WW * WW * decay mode, and about 30% from WW * ττ, while ZZ production accounts for 85% of the background. Events in the 3 + 1τ h category are required to satisfy the 3 criteria on the objects, except that an additional τ h with p T > 20 GeV and charge opposite to the sum of the charges is required. Background events in which the reconstructed τ h fails a loose selection on the NN output of the DEEPTAU algorithm that separates τ h from electrons, or falls near the ECAL barrel-endcap transition region in 1.460 < |η| < 1.558 are removed. About 70% of signal events come from the WW * ττ decay mode, while ZZ production and events with at least one misidentified or τ h dominate the background.
In the 2 + 2τ h category, the leading and subleading are required to pass p T thresholds of 25 and 15 GeV, while the two τ h must have p T > 20 GeV. The sum of plus τ h charges is required to be zero. Signal contributions are mostly from the WW * ττ (60%) and ττττ (40%) decay modes, while background contributions arise from ZZ production and events with a misidentified or τ h candidate.
In the 1 + 3τ h category, the is required to satisfy the conditions |η| < 2.1 and p T > 20 (15) GeV if it is an electron (muon). The leading, subleading, and third τ h must have p T > 40, 30, and 20 GeV, respectively, and the sum of τ h and charges is required to be zero. Background events containing a Z → ee decay where one electron is misidentified as a τ h are vetoed by discarding events containing an e-τ h pair of opposite charge and mass 71 < m(eτ h ) < 101 GeV, and in which the τ h either fails a loose selection on the discriminant that separates τ h from electrons, or falls into the region 1.460 < |η| < 1.558. Around 80% of HH signal events selected in the 1 + 3τ h category are from ττττ and 20% from the WW * ττ decay mode, while the majority of background events stem from ZZ production or contain a misidentified or τ h .
The 4τ h category requires the leading and subleading τ h to pass p T thresholds of 40 and 30 GeV, respectively, and the third and fourth τ h to have p T > 20 GeV. Given the extremely low backgrounds in this category, no charge sum criterion or Z → ee veto is applied. Almost all signal events come from the ττττ decay mode, while 55% of the background events contain at least one misidentified τ h candidate, and the remainder arises from ZZ (30%) and single Higgs boson (15%) production.
In all seven search categories, the background contamination from processes with top quarks is reduced by discarding events with at least one selected small-radius jet passing the medium b jet identification, or at least two passing the loose b jet identification. Leptons originating from low-mass Drell-Yan production, decays of J/ψ and Υ mesons, cascade decays of bottom quarks, and photon conversions are removed by vetoing events containing any pair of loose with mass m( ) < 12 GeV. To eliminate overlap with events selected in the ongoing search for HH production in the bbZZ, ZZ → 4 decay mode, no event in the 2 ss, 3 , and 4 categories may contain two SFOS loose pairs with a mass of the four-system of less than 140 GeV. In addition, to reduce the Z → background, these three categories along with 2 + 2τ h and 3 + 1τ h exclude events where any SFOS loose pair has an invariant mass of 81-101 GeV (Z boson veto).
A summary of the event selection criteria applied in the different categories is given in Table 4. Criteria that are common to all seven search categories are given in Table 5.
Two control regions (CRs) are used to validate the modeling of the WZ and ZZ backgrounds. These CRs match the signal regions of the 3 and 4 categories, but with the Z boson veto inverted, and are referred to as the "3 WZ" CR and "4 ZZ" CR, respectively.
The number of events selected in the signal regions of each of the seven search categories and in the 3 WZ and 4 ZZ CRs are given in Table 6. The contribution expected from nonresonant HH production with event kinematics as predicted by the SM, but 30 times the SM cross section, is given separately for HH decays into WW * WW * , WW * ττ, and ττττ in the upper three rows of each table. The event yields given in the rows labeled WW * WW * include a small contribution from HH decays into WW * ZZ * and ZZ * ZZ * , and, similarly, the numbers quoted in the rows labeled WW * ττ include a small contribution from HH decays into ZZ * ττ.

Analysis strategy
The rate of the HH signal is extracted through a binned maximum likelihood (ML) fit to the distributions in the output of boosted decision tree (BDT) classifiers [116], which are trained to discriminate the HH signal from backgrounds, along with kinematic distributions from the two CRs above. The data from each of the three years are fit separately. Three classifiers are trained for each of the seven search categories using a mix of MC simulation from all three years, targeting nonresonant HH production and resonant HH production from the decay of heavy particles of spin 0 and of spin 2. The binning is chosen with the objective of maximizing the sensitivity for the HH signal, while maintaining sufficient background events in each bin to keep the statistical uncertainty in the background prediction under control. In the two categories with high event yields (2 ss and 3 ) the BDT output binning is chosen such that each bin contains a similar number of expected HH signal events. The four categories contain- Table 4: Event selection criteria applied in the seven search categories. The p T thresholds for and τ h with the highest, second-, third-, and fourth-highest p T are separated by slashes. The symbol "-" indicates that no requirement is applied. ing events with τ h (3 + 1τ h , 2 + 2τ h , 1 + 3τ h , and 4τ h ) have low event yields and sizable background contributions arising from the misidentification of and τ h candidates, which are determined from data and statistically limited. For these categories, we choose the binning for each BDT output distribution such that a similar number of expected background events is contained in each bin. In the 4 category, the fact that the background is dominated by ZZ production, which is modeled by the MC simulation with low statistical uncertainties, allows one to choose the binning in the same way as for the 2 ss and 3 categories. The number of bins is determined by the condition that the relative statistical uncertainty in the background Table 5: Reconstructed object and event selection requirements in all seven search categories. Electrons or muons in the pairs include any leptons passing the loose selection criteria.

Object and event properties
Selection criteria Lepton and τ h pseudorapidity |η| < 2.5 for e, |η| < 2.4 for µ, |η| < 2.3 for τ h Dilepton invariant mass m > 12 GeV (all pairs) Four-lepton invariant mass m 4 > 140 GeV (any two SFOS pairs) b jet veto 0 medium and ≤ 1 loose b-tagged small-radius jet prediction in each bin does not exceed 15%. Higher bin numbers correspond to a higher BDT output value, and feature a higher signal-to-background ratio. For the SM HH signal, the bins with the highest BDT output values feature a signal-to-background ratio up to 10 times higher than the inclusive ratio in each category.
The inputs to the BDT classifiers differ by search category and include the p T and η of reconstructed and τ h ; the angular separation ∆R and invariant mass of , τ h , and τ h τ h pairs; the ∆R and invariant mass between an or τ h candidate and the nearest jet(s); the number of jets in the event; the discriminant p miss,LD T ; the scalar p T sum of all reconstructed e, µ, τ h , and jets; the "visible" mass of the Higgs boson pair, given by the mass of the system of reconstructed e, µ, τ h , and jets; and where applicable, the "full" mass of the HH system, including neutrinos, reconstructed using the algorithm from Ref. [117] designed for reconstructing Higgs pair decays into τ leptons. This algorithm targets HH signal events decaying to ττττ and thus works best in the 4τ h and 1 + 3τ h search categories. Distributions in some of the observables used as inputs to the BDT classifiers in the 2 ss and 3 categories are shown in Fig. 4. These observables are complemented by further inputs, which parametrize the BDT as a function of the model parameters: the Higgs boson couplings λ, y t , c g , c 2g , and c 2 for nonresonant HH production, and the mass of the heavy particle X in resonant HH production. When training the BDT that targets nonresonant HH production, the values for the couplings are chosen according to the twelve EFT benchmark scenarios given in Ref. [24] and the SM, indicated by thirteen binary inputs to the BDT. The BDT classifiers used for the analysis of resonant HH production are trained separately for spin-0 and spin-2 on the full set of resonance masses listed in Section 3, and the resonance mass is used as an input to the BDT. Each simulated background event is replicated multiple times in the training sample, with different values assigned to the Higgs boson couplings and the mass of the heavy particle X.
The training is performed using simulated samples of signal and background events. The signal events used in the training consist of ggHH events in the HH decay modes WW * WW * , WW * ττ, and ττττ. Background contributions arising from the misidentification of and τ h candidates and from the mismeasurement of the electron charge are included in the simulation. The signal and background events used in the training are required to pass the event selection criteria for the respective search category, described in Section 5. The number of training events is increased by applying the medium and τ h identification criteria instead of the tight ones. Weights are applied to background events arising from different sources, such that the relative fractions of different types of backgrounds in the training match the fractions expected in the signal region of the analysis, i.e. when the tight and τ h identification criteria are applied. The MC samples used for the BDT training overlap with the samples used to model signal and background contributions in the analysis. To avoid potential biases, the training samples are split into two samples of equal size, based on even and odd event numbers. The BDTs trained on even events are evaluated on odd events, and vice versa, thereby ensuring that BDTs are not trained and evaluated on the same events. The training is performed using the Table 6: The number of expected and observed events in each of the seven search categories, and in two CRs, which validate the modeling of the WZ and ZZ backgrounds. The symbol "-" indicates that the background is not relevant for the category. The HH signal represents the sum of the ggHH and qqHH production processes and is normalized to 30 times the event yield expected in the SM, corresponding to a cross section of about 1 pb. The event yields are obtained by performing the event selection and applying appropriate corrections to the simulated events. Quoted uncertainties represent the sum of statistical and systematic components. Uncertainties that are smaller than half the value of the least significant digit have been rounded to zero.  XGBOOST algorithm [118], interfaced to the SCIKIT-LEARN machine learning library [119]. The parameters of the BDT training (so-called "hyperparameters") are optimized using the particle swarm optimization algorithm described in Ref. [120].

Background estimation
Background contributions are classified as either "reducible" or "irreducible". In this analysis, three types of reducible backgrounds are considered, arising from misidentified or τ h , electron charge misidentification, and electrons from photon conversions. Background events in which all selected and τ h come from W, Z, or H boson decays, and are reconstructed with the correct charge, are considered "irreducible". The /τ h misidentification and electron charge misidentification backgrounds are both determined from data, while electron conversions and The main contribution to the irreducible background arises from WZ production in the 2 ss and 3 categories, and ZZ production in the remaining five categories. The production of pairs of bosons (γ, W, Z, or H) other than WZ, ZZ and HH, and production of bosons with top quarks, including Wγ, Zγ, WH, ZH, tH, ttH, tW, ttW, tZ, ttZ, tγ, and tt γ, constitute subdominant additional backgrounds. The tZ and ttZ backgrounds also include contributions from off-shell tt γ * and tγ * production. Background processes which include at least one top quark are suppressed by the b jet veto described in Section 5, but are still sizable compared to the expected HH signal. All irreducible backgrounds are modeled using the MC simulation.
The modeling of the dominant irreducible WZ and ZZ backgrounds is validated using the "3 WZ" and "4 ZZ" CRs introduced in Section 5. Distributions in kinematic observables from these CRs (shown in Fig. 5) are included in the ML fit that is used to extract the HH signal, described in Section 9. This provides in-situ constraints on the WZ and ZZ backgrounds and on systematic uncertainties related to lepton identification and trigger efficiency. The transverse mass, m T = 2 p T p miss T (1 − cos ∆φ), in the 3 WZ CR is computed using the that is not identified as originating from the Z boson decay. The symbol ∆φ refers to the angle in the transverse plane between the momentum and the p miss T . The observable m 4 refers to the mass of the 4 system in the 4 ZZ CR. The modeling of the reducible /τ h misidentification background is validated in two further CRs, the "2 ss CR" and the "2 + 2τ h CR". They are based on the signal regions (SRs) of the 2 ss and 2 + 2τ h categories. In the 2 ss CR, no b jet veto is applied, and at least one smallradius jet passing the medium b jet identification is required. The 2 + 2τ h CR differs from the SR of the 2 + 2τ h category in that the sum of plus τ h charges is required to be non-zero, and no Z boson veto is applied. The 2 ss CR is dominated by events with misidentified , while the 2 + 2τ h CR is dominated by events with misidentified τ h . Distributions in the transverse mass m T in the 2 ss CR and in the mass of the HH candidate in the 2 + 2τ h CR, reconstructed by the algorithm described in Ref. [117], are shown in Fig. 6. The transverse mass in the 2 ss CR is computed using the leading . The data agree well with the background prediction in both CRs.
Simulated events are only considered as irreducible background if every selected e, µ, and τ h candidate matches a prompt MC generator-level counterpart. Events with at least one selected electron from a photon conversion, and the remaining and τ h candidates matched to prompt leptons in MC simulation, are classified as conversion background. Electrons that are misidentified as τ h , and τ h that are misidentified as e are also modeled using the MC simulation. All other simulated events are discarded, as the /τ h misidentification and charge misidentification backgrounds are estimated from data, as described below.

Lepton and τ h misidentification background
The background from events with misidentified and τ h candidates is estimated using the "fake factor" or "FF" method from Ref. [115]. An estimate of this background's contribution to the SR of each search category is obtained by selecting a sample of events that satisfy all selection criteria of the SR for the respective search category, except that the e, µ, and τ h are required to pass the medium selections instead of the tight ones. The sample of events thus obtained is referred to as the application region (AR) of the FF method. Events in which every and τ h satisfies the tight selections are excluded from the AR.
The prediction for misidentification backgrounds in the SR is obtained by applying suitably chosen weights w to the events selected in the AR, where w is given by the expression The product extends over all e, µ, and τ h that pass the medium, but fail the tight selection criteria, and n refers to the total number of such and τ h . The symbol f i (p T , η) corresponds to the probability for a single e, µ, or τ h that passes the medium selection to also pass the tight selection. These probabilities are measured separately for e, µ, and τ h candidates, parametrized as a function of p T and η, and vary between 5 and 30%. The contributions of irreducible backgrounds to the AR are subtracted based on the MC expectation of such processes. The alternating sign in Eq. (1) is necessary to avoid double-counting arising from events with more than one misidentified or τ h [115].
The probabilities f i (p T , η) for electrons and muons are measured in multijet events, as described in Ref.
[103]. The f i (p T , η) for τ h are measured using Z → µµ+jets events, where the misidentified τ h candidates arise from quark or gluon jets. These events are selected by requiring a muon pair passing the tight selection, with opposite charge and invariant mass 60 < m µµ < 120 GeV, plus at least one τ h candidate that passes the medium τ h selection. The leading and subleading muons must have p T > 25 and 15 GeV, respectively. Events must also pass the b jet veto described in Section 5 to remove tt background.

Charge misidentification background
The electron charge misidentification background in the 2 ss category is estimated using the method described in Ref.
[103]. A sample of dielectron events passing all selection criteria of the SR of the 2 ss category, except that both electrons are required to have opposite-instead of same-sign charge, is selected and assigned appropriately chosen weights. The weights are computed by summing the probabilities for the charge of either electron to be mismeasured. The probability for the mismeasurement of the electron charge is determined using Z → ee events, and ranges from under 0.1% in the barrel up to 0.4% in the endcap. The probability for mismeasuring the charge of muons is negligible [103].

Systematic uncertainties
Multiple sources of systematic uncertainty affect the predicted event yields, the distributions in the output of the BDT classifiers, or both. These uncertainties may be theoretical, affecting the predicted cross section or decay kinematics of the collision process, or experimental, accounting for differences in object reconstruction and calibration between data and the MC simulation, or for uncertainties on the estimates of the /τ h misidentification and electron charge misidentification background obtained from data. The systematic uncertainties may be correlated or uncorrelated across the three data-taking years, and among the various signal and background processes considered in the analysis.
The SM prediction for the ggHH production cross section at √ s = 13 TeV has a relative uncertainty of +6.7%/−23.2% [122], while the qqHH cross section uncertainty is ±2.1% [19]. The predicted H boson decay branching fractions to WW * , ττ, and ZZ * have relative uncertainties of 1.54%, 1.65%, and 1.54%, respectively [123]. Correlations between these uncertainties have a negligible effect. Alternate HH predictions are generated with the renormalization and factorization scales varied up and down by a factor of 2. Variations that increase the factorization scale and decrease the renormalization scale (and vice versa) are excluded, following the recommendation of Ref. [123]. All theoretical uncertainties in the HH signal model are correlated across all three data-taking years and among the seven search categories. The uncertainties in the H boson decay branching fractions and the effect of renormalization and factorization scale uncertainties in the signal acceptance impact the measurement of cross sections for both nonresonant and resonant HH production. Conversely, the uncertainties in the SM prediction for the ggHH and qqHH cross sections only affect the measurement of the HH production rate as a ratio to the SM prediction.
Theoretical uncertainties also affect the irreducible background prediction. The relative uncertainties in the cross sections of the dominant WZ and ZZ backgrounds are 2.1 and 6.3%, respectively [124][125][126]. The uncertainties in the cross sections for the subdominant single H boson backgrounds range from 2 to 9% for ggHH, qqHH, WH, and ZH. The cross sections for the production of W, Z, or H bosons with one or two top quarks are known with uncertainties of 8-15%. The event yields of extremely rare backgrounds not mentioned above (e.g., triple boson or four top quark production) are given a conservative uncertainty of 50%, since the analysis has little sensitivity to these processes. Following Ref.
[103], background contributions arising from photon conversions are assigned a 30% yield uncertainty. The theoretical uncertainties affecting background cross sections are partially correlated among different processes. Here, contributions arising from uncertainties in the proton PDFs are correlated among processes with a similar initial state. Processes involving single H boson production are an exception. These uncertainties are uncorrelated from other background processes but correlated among each other depending on the initial state. Uncertainties arising from the choice of the renormalization and factorization scales are correlated for processes with similar production modes, for example among all processes involving diboson production (WW, WZ, ZZ, Wγ, and Zγ). Uncertainties in α s are correlated among all background processes. The theoretical cross section uncertainties for signal processes are uncorrelated with those of background processes, but otherwise follow the same uncertainty scheme for proton PDF, scale, and α s contributions. All theoretical cross section uncertainties are treated as correlated across the different data-taking years and among all seven search categories.
The rate of the misidentified /τ h background is assigned a 30% uncertainty in all search categories, to account for variations in the misidentification rates between the ARs of the FF method and the multijet (Z → µµ+jets) event samples used to measure the f i (p T , η) for e and µ (τ h ). In the 3 + 1τ h and 1 + 3τ h categories, an additional uncertainty of 30% (uncorrelated with the other 30% uncertainty) is assigned to the rate of the misidentified /τ h background, to account for the extra uncertainty arising from the modified τ h selection criteria that suppress the misidentification of electrons as τ h . The effect of statistical uncertainties in the probabilities f i (p T , η) for electrons and muons is evaluated by varying these probabilities in bins of p T and η and determining the resulting change in the shape of the BDT classifier output distribution obtained for the misidentified /τ h background. For τ h , the effect of statistical uncertainties in f i (p T , η) is evaluated by fitting the probabilities in bins of η with functions that are linear in p T , varying the slope of these functions up and down within the uncertainties obtained from the fit, and determining the resulting change in the shape of the BDT classifier output distribution.
An additional uncertainty in the BDT output shape in each category is evaluated for events with a nonprompt or misidentified or τ h as follows: Simulated events passing all signal selection criteria are compared to those with at least one or τ h candidate failing the tight identification criteria, scaled according to the FF method described in Section 7, but with the probabilities f i (p T , η) taken from the MC simulation instead of from the data. The ratio of these two shapes is fitted with a linear function, which is convoluted with the misidentified /τ h background prediction from the data to serve as an uncertainty in the BDT output shape for these events in the SR. The systematic uncertainties associated with the misidentified /τ h background prediction and the uncertainty associated with the electron charge misidentification rate are treated as uncorrelated among the different data-taking years.
The rate of the electron charge misidentification background in the 2 ss category is assigned a 30% uncertainty. It covers the uncertainty on the electron charge misidentification rates measured in Z → ee events, including the effect of background contamination in these samples, and accounts for differences observed in the following "closure" test: Simulated events are required to pass all signal selection criteria of the 2 ss category, except that the two leptons are required to have opposite-sign electric charges. The selected events are scaled according to the electron charge misidentification probability in simulated events, determined by applying the procedure detailed in Section 7.2 to MC simulation. The resulting background estimate is compared to the one obtained by applying the nominal signal selection criteria of the 2 ss category to simulated events.
Uncertainties in the modeling of the trigger and object reconstruction efficiency affect all signal and background processes that are estimated using MC simulation. Trigger efficiencies for events with at least two are compared between data and MC simulation in control regions enriched in the tt, WZ, and ZZ background processes, as a function of lepton flavor, p T , and η. This results in a small p T -dependent uncertainty correlated between the 2 ss and 2 + 2τ h categories, and a 1% normalization uncertainty, which is correlated among the 3 , 3 + 1τ h , and 4 categories. The data-to-simulation agreement in trigger efficiency for the 4τ h and 1 + 3τ h categories is computed using an independent set of data, as a function of the p T and η of the and all τ h , and the reconstructed decay modes of all τ h . The trigger uncertainties for these two categories are treated as uncorrelated. All systematic uncertainties related to trigger modeling are correlated across different physics processes, but uncorrelated among the three data-taking years.
The uncertainties in the reconstruction and identification efficiencies for e, µ, and τ h candidates have been measured in Z boson enriched regions in data for each level of identification criteria (tight, medium, and loose), and are applied to each event as a function of p T and η for leptons and of p T and the reconstructed hadronic decay mode for τ h . The reconstructed τ h energy has an uncertainty of around 1%, depending on the data-taking year and reconstructed τ h decay mode. These uncertainties affect the predicted rate and BDT output shape for signal and background, and are correlated among the different physics processes, but uncorrelated across different data-taking years.
The jet energy scale and resolution are determined using dijet control regions [111,127]. The jet energy scale is evaluated using 11 separate components, accounting for partial correlations between the data recorded in different years. The jet energy resolution uncertainty is uncorrelated among the three data-taking years. Jet energy uncertainties are also propagated to the p miss T calculation. An additional uncertainty in p miss T comes from uncertainty in the energy of "unclustered" PF hadrons (PF hadrons not clustered into either small-or large-radius jets), which is uncorrelated across different years. The probability for true b jets to fail the multivariate b jet identification criteria, or for jets from gluons or light flavored quarks to be misidentified as b jets, is compared in data and MC simulation in event regions that are enriched in lightflavor quark or gluon, or heavy-flavor jets. The resulting uncertainty in the data-to-simulation agreement affects the yields and BDT output shapes of multiple physics processes. The statisti-cal component of this uncertainty is treated as uncorrelated across different data-taking years, while other experimental sources are correlated.
The integrated luminosities for data collected in 2016, 2017, and 2018 have 1.2-2.5% individual uncertainties [59][60][61], while the overall uncertainty for the 2016-2018 period is 1.6%. The uncertainty in the measured cross section for inelastic pp collisions, amounting to 5% [128], is taken into account by varying the number of pileup interactions in MC simulation, which impacts the jet reconstruction and the isolation of and τ h .
The sources of systematic uncertainty which create the largest uncertainties in the measured ratio of the HH production cross section to its SM prediction are the theoretical uncertainties in the HH production cross section and decay branching fractions (25%), the uncertainties in the rate and shape of backgrounds from misidentified or τ h (22%), and in the rates of backgrounds modeled using MC simulation (13%). These uncertainties in the signal measurement are determined by removing uncertainties that correspond to a given systematic source from an ML fit to pseudodata, as described in Section 9, and subtracting the obtained uncertainty in the signal measurement in quadrature from the total uncertainty. The impacts of systematic uncertainties are small compared to the effect of the statistical uncertainty in the data (79%), and are comparable to the statistical uncertainties in the distributions in the BDT classifier output for background processes (33%). The latter includes the effect of statistical uncertainties in the MC simulation and in the /τ h misidentification and electron charge misidentification backgrounds obtained from data. All other sources of uncertainty have an impact of 5% or less.

Results
The data selected in the seven search categories are tested against multiple HH production hypotheses: the SM prediction; variations of the SM coupling strength modifiers κ λ , κ t , κ V , and κ 2V ; the effective couplings c g , c 2g , and c 2 in the EFT approach; and resonant production of H boson pairs originating from the decay of heavy particles with spins of 0 or 2 and masses m X ranging from 250 to 1000 GeV. In each case, the data observed in the seven search categories is fit simultaneously to a model composed of the background prediction (with uncertainties) and the HH signal hypothesis under consideration. The distributions in m T in the 3 WZ CR and in m 4 in the 4 ZZ CR shown in Fig. 5 are included in these fits, in order to obtain insitu constraints on the systematic uncertainties described in Section 8. This in turn reduces the uncertainties in the signal and background predictions.
The SM "signal strength" parameter µ is defined as the ratio of the measured HH production cross section to its predicted value in the SM. This parameter modifies the expected signal yield by the same proportion in each category. By contrast, variations in the κ modifiers may affect the signal yields in each category differently, and also change the BDT classifier output shape for HH events. The twenty benchmark scenarios spanning combinations of κ λ , κ t , c g , c 2g , and c 2 values in the coupling parameter space each correspond to different kinematic distributions, so the HH production cross section for each point is measured separately. Similarly, signal efficiency and BDT classifier output shapes vary dramatically for different resonant masses, and thus a separate measurement is performed for each mass and spin hypothesis. The SM signal strength measurement is performed using the output of the BDT classifier that has been trained for SM nonresonant HH production, while the κ λ measurement uses the BDT trained for benchmark scenario JHEP04 BM7. In the scenario JHEP04 BM7, the m HH value tends to be close to the lower limit of 250 GeV, which matches the event kinematics for nonresonant HH production in the κ λ range of the expected limit. When setting limits on the twenty different benchmark scenarios, the binary BDT inputs correspond to the given scenario, or in case of the benchmarks from Ref. [96] the kinematically closest scenario. In case of resonant HH production, the BDT input for the resonance mass is set to the m X value for which the limit is computed.
The SM signal strength is measured using a profile likelihood test statistic [129], with systematic uncertainties treated as nuisance parameters θ in a frequentist approach [130]. The effect of variations in θ on the shape of the BDT classifier output distribution for the HH signal and for background processes is incorporated into the ML fit using the technique described in Ref. [131]. Statistical uncertainties in these distributions are also taken into account using the approach detailed in Ref. [131]. The likelihood ratio q µ for a fixed "test" signal strength value µ is whereμ andθ are the signal strength and nuisance parameter values that give the maximum value of the likelihood function L for the given set of data (requiringμ ≥ 0), andθ µ is the set of θ values which maximize L for the fixed µ. The 95% confidence level (CL) upper limit for µ is obtained using the CL s criterion [132,133], with q µ set to 0 when µ <μ. The probabilities to observe a given value of the likelihood ratio q µ under the signal-plusbackground and background-only hypotheses are computed using the asymptotic approximation from Ref. [129]. The limits on µ obtained using the asymptotic approximation, match the limits obtained with toy MC experiments [130] within 10%. The SM coupling strength modifiers and the cross sections for the various HH production hypotheses are measured by scanning the likelihood ratio q µ as a function of µ. Theoretical and experimental uncertainties affecting the signal and background yields or the shape of the BDT classifier output distributions may be correlated or uncorrelated across different years, search categories, and BDT output bins, as described in Section 8.
For the case of nonresonant HH production with event kinematics as predicted by the SM, the best-fit value of the HH production rate, obtained from the simultaneous fit of all seven search categories, amounts toμ = 2 ± 8 (stat.) ±6 (syst.) times the SM expectation. The measured value of the signal strength refers to the sum of ggHH and qqHH production and is compatible with both the SM and background-only hypotheses, within statistical and systematic uncertainties. Distributions in the output of the BDT classifier for SM nonresonant HH production in the seven search categories are shown in Figs. 7 and 8, and the corresponding expected event yields are given in Table 7. The data excess in the rightmost bin of the BDT classifier output distribution for the 3 category is not statistically significant: 11 events are observed in this bin, while 5.2 ± 0.7 (stat.) ±0.2 (syst.) are expected from background processes, amounting to a local significance of about 1.7 standard deviations. The observed (expected) 95% CL upper limit on the cross section for nonresonant HH production is 651 (592) fb. Taking into account the theoretical uncertainties in the SM HH production cross section, this corresponds to an observed (expected) limit on the nonresonant HH production rate of 21.3 (19.4) times the SM expectation. These limits are shown in Fig. 9 for individual categories and for the combination of all seven search categories, which is referred to as the "HH → multilepton" result. The 3 and 1 + 3τ h categories are the most sensitive to SM HH production, followed closely by the other categories.
The observed (expected) 95% CL interval for the H boson trilinear self-coupling strength modifier is measured to be −6.9 < κ λ < 11.1 (−6.9 < κ λ < 11.7). The upper limit on κ λ is one of the strongest constraints on this fundamental SM parameter to date, with only HH searches in the bbγγ [42, 43] and bbbb [45] decay modes providing tighter bounds. The observed and Table 7: The number of expected and observed events in each of the seven search categories, and in two CRs, which validate the modeling of the WZ and ZZ backgrounds. The /τ h misidentification and electron charge misidentification backgrounds are determined from data, as described in Section 7, while the HH signal and all other backgrounds are modeled using MC simulation. The symbol "-" indicates that the background is not relevant for the category. The HH signal represents the sum of the ggHH and qqHH production processes and is normalized to 30 times the event yield expected in the SM, corresponding to a cross section of about 1 pb. The expected event yields are computed for the values of nuisance parameters obtained from the ML fit described in Section 9. Quoted uncertainties represent the sum of statistical and systematic components. Uncertainties that are smaller than half the value of the least significant digit have been rounded to zero.  expected upper limits on the HH production cross section as a function of κ λ , obtained from the simultaneous fit of all seven search categories, are shown in Fig. 10, along with the limits obtained for each category individually.
The observed and expected limits on the ggHH production cross section for the twenty benchmark scenarios are shown in Fig. 11 and summarized in Table 8. Signal contributions from the qqHH process, at the rate expected in the SM, are about two orders of magnitude lower than the limits that we set on the rate of the ggHH signal in these measurements and are therefore neglected. The observed (expected) limits on nonresonant HH production in the different benchmark scenarios range from 0.21 to 1.09 (0.16 to 1.16) pb, depending on the scenario. These limits are a factor of 2-3 higher than those obtained by the CMS measurement in the bbγγ final state [42]. The variation in expected limits reflects differences in the m HH distribution among the benchmark scenarios, which in turn affect the p T and angles between the particles produced in the H boson decays. As a consequence, the signal acceptance can change, along with the separation of the HH signal from backgrounds through the BDT classifiers described in Section 6. The most and least stringent limits on the cross section are expected for the benchmark scenarios JHEP04 BM2 and BM7, respectively. The former has a pronounced tail of the m HH distribution extending to high values, while the latter is characterized by low m HH values, as seen in Fig. 5 of Ref. [24]. Figure 12 shows the observed and expected upper limits on the HH production cross section   as a function of the coupling c 2 , and the region excluded in the κ t -c 2 plane. The effects of variations in κ λ and κ t on the rate of the SM single H boson background [21] and on the H boson decay branching fractions [20] are taken into account when computing these limits and those shown in Fig. 10. The magnitude of these effects is typically 5 to 10% within the scanned range of κ λ and κ t . Assuming κ t and κ λ are both equal to 1, the coupling c 2 is observed (expected) to be constrained to the interval −1.05 < c 2 < 1.48 (−0.96 < c 2 < 1.37) at 95% CL.   Figure 11: Observed and expected 95% CL upper limits on the HH production cross section for the twelve benchmark scenarios from Ref. [24], the additional benchmark scenario 8a from Ref. [97], the seven benchmark scenarios from Ref. [96], and for the SM. The upper plot shows the result obtained by combining all seven search categories, while the lower plot shows the limits obtained for each category separately, and the combined limit.  Similar to the right part of Figure 12, Figure 13 shows the observed and expected regions excluded in the κ tκ λ and κ λ -c 2 planes.  Figure 14 shows the observed and expected limits on the resonant HH production cross section as a function of m X for a spin-0 or spin-2 particle X decaying to HH. The mass points probed are listed in the fourth paragraph of Section 3. The limits are expected to become more stringent as m X increases, as the acceptance for the HH signal increases and the signal can be more easily distinguished from backgrounds. The observed (expected) 95% CL upper limits on the resonant HH production cross section range from 0.18 to 0.90 (0.08 to 1.06) pb, depending on the mass and spin. Tabulated results are provided in the HEPData record for this analysis [134]. For m X 600 GeV, the observed limit is less stringent than the expected limit, due to a small excess of events in the data that is concentrated near m X = 750 GeV in the 2 ss and 3 categories. The distributions in the output of the BDT classifier targeting resonances with spin 2 and mass 750 GeV in the 2 ss and 3 categories are shown in Fig. 15. A small excess of events can be seen in the rightmost bin of both distributions. In the 2 ss (3 ) category, 42 (17) events are observed in this bin in the data, while 27.3 ± 2.8 (stat.) ±0.7 (syst.) (8.0 ± 0.8 (stat.) ±0.5 (syst.)) are expected from background processes, amounting to a local significance of about 2.1 (2.1) standard deviations. The excess affects the observed limits in a broad mass range from 600 to 1000 GeV. No measurement is made for masses above 1000 GeV, as limits on HH decays producing at least one bottom quark pair are much more stringent in this phase space [33,34]. The presence of multiple neutrinos in HH signal events in these categories, coming from W boson or τ lepton decays, limits the experimental resolution on m X and causes the BDT classifier output distributions to be highly correlated for resonances of similar mass. No significant excess is observed in any of the other five search categories. The significance for the combination of all seven search categories at 750 GeV amounts to 1.9 standard deviations, without accounting for the "look elsewhere effect" [135].  Figure 14: Observed and expected 95% CL upper limits on the production of new particles X of spin 0 (upper) and spin 2 (lower) and mass m X in the range 250-1000 GeV, which decay to H boson pairs. The plot on the left shows the result obtained by combining all seven search categories, while the plot on the right shows the limits obtained for each category separately, and the combined limit.

Summary
The results of a search for nonresonant and resonant Higgs boson pair (HH) production in final states with multiple reconstructed leptons, including electrons and muons ( ) and hadronically decaying tau leptons (τ h ), has been presented. The search targets the HH decay modes WW * WW * , WW * ττ, and ττττ, using proton-proton collision data recorded by the CMS experiment at a center-of-mass energy of 13 TeV and corresponding to an integrated luminosity of 138 fb −1 . Seven search categories, distinguished by and τ h multiplicity, are included in the analysis: 2 ss, 3 , 4 , 3 + 1τ h , 2 + 2τ h , 1 + 3τ h , and 4τ h , where "ss" indicates an pair with the same charge. No evidence for a signal is found in the data. Upper limits on the cross sections for both nonresonant and resonant HH production are set. The observed (expected) limits on the nonresonant HH production cross section in twenty EFT benchmark scenarios range from 0.21 to 1.09 (0.16 to 1.16) pb at 95% confidence level (CL), depending on the scenario. For nonresonant HH production with event kinematics as predicted by the standard model (SM), the observed (expected) 95% CL upper limit on the HH production rate is 21.3 (19.4) times the rate expected in the SM. The results of the search for nonresonant HH production are used to exclude regions in the plane of the H boson coupling to the top quark, y t , and of the trilinear Higgs boson self-coupling, λ. Assuming y t has the value expected in the SM, the observed (expected) 95% CL interval for λ is between −6.9 and 11.1 (−6.9 and 11.7) times the value expected in the SM. The resonant production of H boson pairs, resulting from decays of new heavy particles X with mass m X , is probed within the mass range 250-1000 GeV. The corresponding observed (expected) 95% CL upper limits on the cross section for resonant HH production range from 0.18 to 0.90 (0.08 to 1.06) pb, depending on the mass and spin of the resonance.

Acknowledgments
We congratulate our colleagues in the CERN accelerator departments for the excellent performance of the LHC and thank the technical and administrative staffs at CERN and at other CMS institutes for their contributions to the success of the CMS effort. In addition, we gratefully acknowledge the computing centers and personnel of the Worldwide LHC Computing Grid and other centers for delivering so effectively the computing infrastructure essential to our analyses. Finally, we acknowledge the enduring support for the construction and operation of the LHC, the CMS detector, and the supporting computing infrastructure provided by the following funding agencies:   [35] ATLAS Collaboration, "Search for resonant and non-resonant Higgs boson pair production in the bb τ + τ − decay channel using 13 TeV pp collision data from the ATLAS detector", 2022. arXiv:2209.10910. Submitted to JHEP.    [46] CMS Collaboration, "Search for nonresonant pair production of highly energetic Higgs bosons decaying to bottom quarks", 2022. arXiv:2205.06667. Submitted to Phys. Rev. Lett.