Search for resonant and nonresonant Higgs boson pair production in the bblnulnu final state in proton-proton collisions at sqrt(s) = 13 TeV

Searches for resonant and nonresonant pair-produced Higgs bosons (HH) decaying respectively into ll nu nu, through either W or Z bosons, and bbbar are presented. The analyses are based on a sample of proton-proton collisions at sqrt(s) = 13 TeV, collected by the CMS experiment at the LHC, corresponding to an integrated luminosity of 35.9 inverse femtobarns. Data and predictions from the standard model are in agreement within uncertainties. For the standard model HH hypothesis, the data exclude at 95% confidence level a product of the production cross section and branching fraction larger than 72 fb, corresponding to 79 times the prediction, consistent with expectations. Constraints are placed on different scenarios considering anomalous couplings, which could affect the rate and kinematics of HH production. Upper limits at 95% confidence level are set on the production cross section of narrow-width spin-0 and spin-2 particles decaying to Higgs boson pairs, the latter produced with minimal gravity-like coupling.


Introduction
The Brout-Englert-Higgs mechanism is a key element of the standard model (SM) of elementary particles and their interactions, explaining the origin of mass through spontaneous breaking of electroweak symmetry [1][2][3][4][5][6].The discovery of a Higgs boson with a mass m H around 125 GeV by the ATLAS and CMS experiments [7][8][9] fixes the value, in the SM, of the self-coupling λ in the scalar potential, whose shape is determined by the symmetries of the SM and the requirement of renormalisability.Direct information on the Higgs three-and four-point interactions will provide an indication of the scalar potential structure.
Nonresonant Higgs boson pair production (HH) can be used to directly study the Higgs boson self-coupling.At the CERN LHC, Higgs boson pairs are predominantly produced through gluon-gluon fusion via two destructively interfering diagrams, shown in Fig. 1.In the SM the destructive interference between these two diagrams makes the observation of HH production extremely challenging, even in the most optimistic scenarios of energy and integrated luminosity at the future High Luminosity LHC [10,11].The SM cross section for HH production in proton-proton collisions at √ s = 13 TeV for a Higgs boson mass of 125 GeV is σ HH = 33.5 fb at next-to-next-to-leading order (NNLO) in quantum chromodynamics (QCD) for the gluongluon fusion process [12][13][14][15][16][17][18][19][20][21].
Indirect effects at the electroweak scale arising from beyond the standard model (BSM) phenomena at a higher scale can be parameterised in an effective field theory framework [22][23][24] by introducing coupling modifiers for the SM parameters involved in HH production, namely κ λ = λ/λ SM for the Higgs boson self-coupling λ and κ t = y t /y t SM for the top quark Yukawa coupling y t .Such modifications of the Higgs boson couplings could enhance Higgs boson pair production to rates observable with the current dataset.The relevant part of the modified Lagrangian takes the form: where H is the Higgs boson field, v is the vacuum expectation value of H, m t is the top quark mass, tL and t R are the left-and right-handed top quark fields, and h.c. is the Hermitian conjugate.The appearance of new contact-like interactions, not considered in this paper, could also result in an enhancement of HH production.
Extensions of the scalar sector of the SM postulate the existence of additional Higgs bosons.An explored scenario is the two-Higgs-doublet model (2HDM) [25], where a second doublet of complex scalar fields is added to the SM scalar sector Lagrangian.The generic 2HDM potential has a large number of degrees of freedom, which can be reduced to six under specific assumptions.In case the new CP-even state is massive enough (mass larger than 2m H ) it can decay to a pair of Higgs bosons.Models inspired by warped extra dimensions [26] predict the existence of new heavy particles that can decay to pairs of Higgs bosons.Examples of such particles are the radion (spin 0) [27][28][29][30] or the first Kaluza-Klein (KK) excitation of the graviton (spin 2) [31,32].In the following, we will use X to denote a generic state decaying into pairs of Higgs bosons.
Searches for Higgs boson pair production have been performed by the ATLAS and CMS experiments using LHC proton-proton collision data.These include searches for BSM production as well as more targeted searches for production with SM-like kinematics in √ s = 8 TeV [33][34][35][36][37] and 13 TeV data [38,39].This paper reports on a search for Higgs boson pair production, HH, and resonant Higgs boson pair production, X → HH, where one of the H decays into bb, and the other into Z( )Z(νν) or W( ν)W( ν), where is either an electron, a muon, or a tau lepton that decays leptonically.The search is based on LHC proton-proton collision data at √ s = 13 TeV collected by the CMS experiment, corresponding to an integrated luminosity of 35.9 fb −1 .The analysis focuses on the invariant mass distribution of the b jet pairs, searching for a resonant-like excess compatible with the Higgs boson mass, in combination with an artificial neural network discriminator based on kinematic information.The dominant background is tt production, with smaller contributions from Drell-Yan (DY) and single top quark production.

The CMS detector
The central feature of the CMS apparatus is a superconducting solenoid of 6 m internal diameter, providing a magnetic field of 3.8 T. Within the solenoid volume are a silicon pixel and strip tracker, a lead tungstate crystal electromagnetic calorimeter (ECAL), and a brass and scintillator hadron calorimeter, each composed of a barrel and two endcap sections.Forward calorimeters extend the pseudorapidity coverage provided by the barrel and endcap detectors.Muons are detected in gas-ionisation chambers embedded in the steel flux-return yoke outside the solenoid.A more detailed description of the CMS detector, together with a definition of the coordinate system used and the relevant kinematic variables, can be found in Ref. [40].

Event simulation
The main background processes, in decreasing order of importance, are tt, DY, and single top quark production.Diboson, triboson, ttV and SM single Higgs boson production are also considered.Other contributions, such as W+jets or QCD multijet events with jets misidentified as leptons, are negligible due to the dilepton selection described in the next section.The dominant contribution, especially in the e ± µ ∓ channel, arises from tt production yielding the same final state as the signal process (two b quark jets, two leptons, and two neutrinos) when both W bosons from top quark decays further decay leptonically.
Background simulation samples have been generated at next-to-leading order (NLO) in QCD using POWHEG 2 [41][42][43][44][45], and MADGRAPH5 aMC@NLO versions 2.2.2.0 and 2.3.2.2 [46] with FxFx merging [47] and MADSPIN [48].The signal samples of gluon fusion production of two Higgs bosons, and of spin-0 or spin-2 narrow resonances decaying into two Higgs bosons, have been generated at leading order (LO) in QCD using MADGRAPH5 aMC@NLO version 2.2.2.0.The spin-2 narrow resonance is produced as a KK-graviton with minimal coupling [49], leading to spin projection ±2 on the beam axis.The mass of the Higgs boson has been fixed to 125 GeV [50], and its branching fractions to those in the SM.One of the Higgs bosons is required to decay into a pair of b quarks, while the second one is required to decay to final states containing two leptons and two neutrinos of any flavour.This implies that the signal samples contain both H → Z( )Z(νν) and H → W( ν)W( ν) decay chains, leading to a total branching fraction B(HH → bbVV → bb ν ν) of 2.7% [12].The event generators used for both signal and background samples are interfaced with PYTHIA 8.212 [51,52] for simulation of the parton showering, hadronisation, and underlying event using the CUETP8M1 tune [53].The NNPDF 3.0 [54] LO and NLO Parton Distribution Functions (PDF) are used.
For all processes, the detector response is simulated using a detailed description of the CMS apparatus, based on the GEANT4 package [55].Additional pp interactions in the same and in the neighbouring bunch crossings (pileup) are generated with PYTHIA and overlapped with the simulated events of interest to reproduce the pileup measured in data.
All background processes are normalised to their most accurate theoretical cross sections.The tt, DY, single top quark and W + W − samples are normalised to NNLO precision in QCD [56][57][58][59], while remaining diboson, triboson and ttV processes are normalised to NLO precision in QCD [46,60].The single Higgs boson production cross section is computed at the NNLO precision of QCD corrections and NLO precision of electroweak corrections [12].

Event selection and background predictions
Data are collected with a set of dilepton triggers.The p T thresholds applied by the triggers are asymmetric and channel-dependent, and vary from 17 to 23 GeV for the leading leptons and from 8 to 12 GeV for the subleading leptons.Trigger efficiencies are measured with a "tag-andprobe" technique [61] as a function of lepton p T and η in a data control region consisting of Z → events.
Events with two oppositely charged leptons (e + e − , µ + µ − , e ± µ ∓ ) are selected using asymmetric p T requirements, chosen to be above the corresponding trigger thresholds, for leading and subleading leptons of 25 GeV and 15 GeV for ee and µe events, 20 GeV and 10 GeV for µµ events, and 25 GeV and 10 GeV for eµ events.Electrons in the pseudorapidity range |η| < 2.5 and muons in the range |η| < 2.4 are considered.
Electrons, reconstructed by associating tracks with ECAL clusters, are identified by a sequential selection using information on the cluster shape in the ECAL, track quality, and the matching between the track and the ECAL cluster.Additionally, electrons from photon conversions are rejected [62].Muons are reconstructed from tracks found in the muon system, associated with tracks in the silicon tracking detectors.They are identified based on the quality of the track fit and the number of associated hits in the different tracking detectors [63].The lepton isolation, defined as the scalar p T sum of all particle candidates in a cone around the lepton, excluding the lepton itself, divided by the lepton p T , is required to be <0.04 for electrons (with a cone of radius ∆R = √ (∆φ) 2 + (∆η) 2 = 0.3) and <0.15 for muons (with a cone of radius ∆R = 0.4).Lepton identification and isolation efficiencies in the simulation are corrected for residual differences with respect to data.These corrections are measured in a data sample, enriched in Z → events, using a "tag-and-probe" method and are parameterised as a function of lepton p T and η.
Jets are reconstructed using a particle flow (PF) technique [64].PF candidates are clustered to form jets using the anti-k T clustering algorithm [65] with a distance parameter of 0.4, implemented in the FASTJET package [66].Jet energies are corrected for residual nonuniformity and nonlinearity of the detector response [67].Jets are required to have p T > 20 GeV, |η| < 2.4, and be separated from identified leptons by a distance of ∆R > 0.3.The missing transverse momentum vector, defined as the projection onto the transverse plane relative to the beam axis, of the negative vector sum of the momenta of all PF candidates, is referred to as p miss T [68,69].Its magnitude is denoted by p miss T .Corrections to the jet energies are propagated to p miss T .
The reconstructed vertex with the largest value of summed object p 2 T is taken to be the primary pp interaction vertex, considering the objects returned by a clustering algorithm applied to all charged tracks associated with the vertex, plus the corresponding associated p miss T .
The combined multivariate algorithm [70,71] is used to identify jets originating from b quarks.Jets are considered as b tagged if they pass the medium working point of the algorithm, which provides around 70% efficiency with a mistag rate less than 1%.Correction factors are applied in the simulation to the selected jets to account for the different response of the combined multivariate algorithm between data and simulation [71].Among all possible dijet combinations fulfilling the previous criteria, we select the two jets with the highest combined multivariate algorithm outputs.
After the final object selection consisting of two opposite sign leptons and two b-tagged jets, a requirement of 12 < m < m Z − 15 GeV is applied to suppress quarkonia resonances and jets misidentified as leptons, and to remove the large background at the Z boson peak as well as the high-m tail of the DY and tt processes.This requirement has a negligible impact on signal events where one H decays as H → W( ν)W( ν), and removes only the portion of H → Z( )Z(νν) decays with on-shell Z( ) legs. Figure 2 shows the dijet p T for data and simulated events after requiring the selection criteria described in this section.
All the background processes are estimated from simulation, with the exception of DY production in the e + e − and µ + µ − channels.The DY contribution in the e ± µ ∓ channels is almost negligible, and is taken from simulation.
The contribution of the DY process in the analysis selection is estimated from a data sample enriched in DY plus jets events.The estimate is performed by requiring all the selection criteria described above, except for the b tagging requirements.The resulting dataset is corrected with weights to represent the DY contribution in the full selection.The weights are a function of kinematic variables and are tuned to reproduce the effect of applying the b tagging requirements on the DY process.They account for the following features: • The b tagging efficiencies are not constant and depend on jet kinematics.Moreover, this dependency is different for b-, c-or light-flavour jets.
• The relative contributions of DY plus two jets of flavours k and l, where k, l = b, c, or light-flavour, to the DY plus two jets process are not constant throughout the phase-space.Modelling the effect of b tagging requires to parameterise the fractions F kl of jets with flavours k and l as a function of event kinematics.
We compute the weights as: where k and l are the b tagging efficiencies for kand l-flavour jets calculated from simulation as a function of p T and η of the jets and corrected for differences between data and simulation, and j 1 and j 2 denote the two p T -ordered jets selected according to the above requirements.The expected fractions of jets with flavours k and l in the dataset are denoted by F kl and are parameterised as a function of the output value of a Boosted Decision Tree (BDT) [72].The indices k and l refer to the assumed flavour of j 1 and j 2 , respectively.The flavour fractions F kl are estimated from a simulated DY sample.Their dependency on the BDT output value accounts for the different kinematical behaviours of heavy-or light-flavour associated DY processes, effectively reducing the dimensionality of the phase-space to a single variable.The BDT is trained to discriminate DY+bb, cc from other DY associated production processes using the following input variables: p T , η j 1 , η j 2 , p jj T , p T , η , ∆φ( , p miss T ) (defined as the ∆φ between the dilepton system and p miss T ), number of jets, and H T defined as the scalar sum of the transverse momentum of all selected leptons and jets.
Beside DY, the data sample without b tagging requirements contains small contributions from other backgrounds such as tt, single top quark and diboson production.Hence, the same reweighting procedure is applied to simulated samples for these processes, and the result is subtracted from the weighted data to define the estimate of the DY background in the analysis region.
The method is validated both in simulation and in two data control regions requiring either m > m Z − 15 GeV or |m − m Z | < 15 GeV.The predicted DY distributions are in agreement with data and simulation within the uncertainties of the method, described in section 6.

Parameterised multivariate discriminators for signal extraction
Deep neural network (DNN) discriminators, based on the Keras library [73], are used to improve the signal-to-background separation.As the dominant background process (tt production) is irreducible, the DNNs rely on information related to event kinematics.The variables provided as input to the DNNs exploit the presence in the signal of two Higgs bosons decaying into two b jets on the one hand, and two leptons and two neutrinos on the other hand, which results in different kinematics for the dilepton and dijet systems between signal and background processes.The variables used as input are: m , ∆R , ∆R jj , ∆φ( , jj) (defined as the ∆φ between the dijet and the dilepton systems), p T , p jj T , min ∆R j , and The DNNs utilise a parameterised machine learning technique [74] in order to ensure optimal sensitivity on the full range of signal hypotheses considered in these searches.In this approach, one or more physics parameters describing the wider scope of the problem, as for example the mass of the resonance in the resonant search case, are provided as input to the DNNs, in addition to reconstructed quantities.The parameterised network is able to perform as well as individual networks trained on specific hypotheses (parameter values) while requiring only a single training, and provides a smooth interpolation to cases not seen during the training phase, as shown by Fig. 3. Two parameterised DNNs are trained: one for the resonant and one for the nonresonant search.In the first case, the set of parameters are the masses of the resonance, providing 13 values for the network training (m X = 260, 270, 300, 350, 400, 450, 500, 550, 600, 650, 750, 800, 900 GeV), and a discrete variable indicating the dilepton flavour channel: same flavour (e + e − and µ + µ − ) or different flavour (e ± µ ∓ ).In the second case, the parameters are κ λ and κ t , providing 32 combinations of those for the network training (κ λ = −20, -5, 0, 1, 2.4, 3.8, 5, 20 and κ t = 0.5, 1, 1.75, 2.5), and the dilepton flavour channel variable as in the resonant case.
The m jj distributions, and resonant and nonresonant DNN discriminators after selection requirements, are shown in Figs. 4 and 5, respectively.Given their discrimination power between signal and background, both variables are used to enhance the sensitivity of the analysis.We define three regions in m jj : two of them enriched in background, m jj < 75 GeV and m jj ≥ 140 GeV, and the other enriched in signal, m jj ∈ [ 75, 140 ) GeV.In each region, we use the DNN output as our final discriminant, as shown in Fig. 6, where the three m jj regions are represented in a single distribution.

Systematic uncertainties
We investigate sources of systematic uncertainties and their impact on the statistical interpretation of the results by considering both uncertainties in the normalisation of the various processes in the analysis, as well as those affecting the shapes of the distributions.
Theoretical uncertainties in the cross sections of backgrounds estimated using simulation are considered as systematic uncertainties in the yield predictions.The uncertainty in the total integrated luminosity is determined to be 2.5% [75].Training with all masses, evaluated at m X = 650 GeV Training with all masses except 650 GeV, evaluated at m X = 650 GeV The following sources of systematic uncertainties that affect the normalisation and shape of the templates used in the statistical evaluation are considered: • Trigger efficiency, lepton identification and isolation: uncertainties in the measurement, using a "tag-and-probe" technique, of trigger efficiencies as well as electron and muon isolation and identification efficiencies, are considered as sources of systematic uncertainties.These are evaluated as a function of lepton p T and η, and their effect on the analysis is estimated by varying the corrections to the efficiencies by ±1 standard deviation.• Jet energy scale and resolution: uncertainties in the jet energy scale are of the order of a few percent and are computed as a function of jet p T and η [67].A difference in the jet energy resolution of about 10% between data and simulation is accounted for by worsening the jet energy resolution in simulation by η-dependent factors.The uncertainty due to these corrections is estimated by a variation of the factors applied by ±1 standard deviation.Variations of jet energies are propagated to p miss T .
• b tagging: b tagging efficiency and light-flavour mistag rate corrections and associated uncertainties are determined as a function of the jet p T [71].Their effect on the analysis is estimated by varying these corrections by ±1 standard deviation.• Pileup: the measured total inelastic cross section is varied by ±5% [76] to produce different expected pileup distributions.
• Renormalisation and factorisation scale uncertainty: this uncertainty is estimated by varying the renormalisation (µ R ) and the factorisation (µ F ) scales used during the generation of the simulated samples independently by factors of 0.5, 1, or 2. Unphysical cases, where the two scales are at opposite extremes, are not considered.An envelope is built from the 6 possible combinations by keeping maximum and minimum variations for each bin of the distributions, and is used as an estimate of the scale uncertainties for all the background and signal samples.
• PDF uncertainty: the magnitudes of the uncertainties related to the PDFs and the variation of the strong coupling constant for each simulated background and signal process are obtained using variations of the NNPDF 3.0 set [54], following the PDF4LHC prescriptions [77,78].
• Simulated sample size: the finite nature of simulated samples is considered as an additional source of systematic uncertainty.For each bin of the distributions, one additional uncertainty is added, where only the considered bin is altered by ±1 standard deviation, keeping the others at their nominal value.
• DY background estimate from data: the systematic uncertainties listed above, which affect the simulation samples, are propagated to k and F kl , both computed from simulation.These uncertainties are then propagated to the weights W sim and to the normalisation and shape of the estimated DY background contribution.The uncertainty due to the finite size of the simulation samples used for the determination of k and F kl is also taken into account.Since previous measurements [79,80] have shown that the flavour composition of DY events with associated jets in data is compatible with the simulation within scale uncertainties, no extra source of theoretical uncertainty has been considered for F kl .To account for residual differences between the e + e − and µ + µ − channels not taken into account by F kl , due to the different requirements on lepton p T , a 5% uncertainty in the normalisation of the DY background estimate is added in both channels.
The effects of these uncertainties on the total yields in the analysis selection are summarised in Table 1.

Results
A binned maximum likelihood fit is performed in order to extract best fit signal cross sections.The fit is performed using templates built from the DNN output distributions in the three m jj regions, as shown in Fig. 6, and in the three channels (e + e − , µ + µ − , and e ± µ ∓ ).The likelihood function is the product of the Poisson likelihoods over all bins of the templates and is given by where n i is the number of observed events in bin i and the Poisson mean for bin i is given by where k denotes all of the considered background processes, T k,i is the bin content of bin i of the template for process k, and S i is the bin content of bin i of the signal template.The parameter β k is the nuisance parameter for the normalisation of the process k, constrained by theoretical uncertainties with a log-normal prior, and β signal is the signal strength, unconstrained.For each systematic uncertainty affecting the shape (normalisation) of the templates, a nuisance parameter is introduced with a Gaussian (log-normal) prior.
The best-fit values for all the nuisance parameters, as well as the corresponding post-fit uncertainties, are extracted by performing a binned maximum likelihood fit, in the background-only hypothesis, of the m jj vs. DNN output distributions (such as Fig. 6 left) to the data.Only nuisance parameters affecting the backgrounds are considered.

Resonant production
The fit results in signal cross sections compatible with zero; no significant excess above background predictions is observed for X particle mass hypotheses between 260 and 900 GeV.We set upper limits at 95% confidence level (CL) on the product of the production cross section for X and branching fraction for X → HH → bbVV → bb ν ν using the asymptotic modified frequentist method (asymptotic CL s ) [81][82][83] as a function of the X mass hypothesis.The limits are shown in Fig. 7.The observed upper limits on the product of the production cross section and branching fraction for a narrow-width spin-0 resonance range from 430 to 17 fb, in agreement with expected upper limits of 340 +140 −100 to 14 +6 −4 fb.For narrow-width spin-2 particles produced in gluon fusion with minimal gravity-like coupling, the observed upper limits range from 450 to 14 fb, in agreement with expected upper limits of 360 +140 −100 to 13 +6 −4 fb.The left plot of Fig. 7 shows possible cross sections for the production of a radion, for the parameters Λ R = 1 TeV (mass scale) and kL = 35 (size of the extra dimension).The right plot of Fig. 7 shows possible cross sections for the production of a Kaluza-Klein graviton, for the parameters k/M Pl = 0.1 (curvature) and kL = 35.These cross sections are taken from [49], assuming absence of mixing with the Higgs boson.

Nonresonant production
Likewise for the nonresonant case, the fit results in signal cross sections compatible with zero; no significant excess above background predictions is seen.We set upper limits at 95% CL on the product of the Higgs boson pair production cross section and branching fraction for HH → bbVV → bb ν ν using the asymptotic CL s , combining the e + e − , µ + µ − and e ± µ ∓ channels.The observed upper limit on the SM HH → bbVV → bb ν ν cross section is found to be 72 fb, in agreement with an expected upper limit of 81 +42 −25 fb.Including theoretical uncertainties in the SM signal cross section, this observed upper limit amounts to 79 times the SM prediction, in agreement with an expected upper limit of 89 +47 −28 times the SM prediction.In the BSM hypothesis, upper limits are set as a function of κ λ /κ t , as shown in Fig. 8 (left panel), since the signal kinematics depend only on this ratio of couplings.Red lines show the theoretical cross sections, along with their uncertainties, for κ t = 1 (SM) and κ t = 2.The theoretical signal cross section is minimal for κ λ /κ t = 2.45 [84], corresponding to a maximal interference between the diagrams shown on Fig. 1.
Excluded regions in the κ t vs. κ λ plane are shown in Fig. 8 (right panel).The signal cross sections and kinematics are invariant under a (κ λ , κ t ) ↔ (−κ λ , −κ t ) transformation, hence the expected and observed limits on the production cross section, as well as the constraints on the κ λ and κ t parameters respect the same symmetry.The red region in the panel corresponds to parameters excluded at 95% CL with the observed data, whereas the dashed black line and the blue areas correspond to the expected exclusions and the 68 and 95% bands.Isolines of the product of the theoretical cross section and branching fraction for HH → bbVV → bb ν ν are shown as dashed-dotted lines.

CMS
35.9 fb 1 (13 TeV) Observed 95% upper limit Expected 95% upper limit 68% expected 95% expected RS1 KK graviton, kL = 35, k/M Pl = 0.1 Figure 7: Expected (dashed) and observed (continuous) 95% CL upper limits on the product of the production cross section for X and branching fraction for X → HH → bbVV → bb ν ν, as a function of m X .The inner (green) band and the outer (yellow) band indicate the regions containing 68 and 95%, respectively, of the distribution of limits expected under the backgroundonly hypothesis.These limits are computed using the asymptotic CL s method, combining the e + e − , µ + µ − and e ± µ ∓ channels, for spin-0 (left) and spin-2 (right) hypotheses.The solid circles represent fully-simulated mass points.The dashed red lines represent possible cross sections for the production of a radion (left) or a Kaluza-Klein graviton (right), assuming absence of mixing with the Higgs boson [49].Parameters used to compute these cross sections can be found in the legend.

Summary
A search for resonant and nonresonant Higgs boson pair production (HH) is presented, where one of the Higgs bosons decays to bb, and the other to VV → ν ν, where V is either a W or a Z boson.The LHC proton-proton collision data at √ s = 13 TeV collected by the CMS experiment corresponding to an integrated luminosity of 35.9 fb −1 are used.Masses are considered in the range between 260 and 900 GeV for the resonant search, while anomalous Higgs boson selfcoupling and coupling to the top quark are considered in addition to the standard model case for the nonresonant search.
The results obtained are in agreement, within uncertainties, with the predictions of the standard model.For the resonant search, the data exclude a product of the production cross section and branching fraction of narrow-width spin-0 particles from 430 to 17 fb, in agreement with the expectations of 340 +140 −100 to 14 +6 −4 fb, and narrow-width spin-2 particles produced with minimal gravity-like coupling from 450 to 14 fb, in agreement with the expectations of 360 +140 −100 to 13 +6 −4 fb.For the standard model HH hypothesis, the data exclude a product of the production cross section and branching fraction of 72 fb, corresponding to 79 times the SM cross section.The expected exclusion is 81 +42 −25 fb, corresponding to 89 +47 −28 times the SM cross section.

Figure 1 :
Figure 1: Feynman diagrams for Higgs boson pair production via gluon fusion in the SM.The coupling modifiers for the Higgs boson self-coupling and the top quark Yukawa coupling are denoted by κ λ and κ t , respectively.

Figure 2 :
Figure 2: The dijet p T distributions in data and simulated events after requiring two leptons, two b-tagged jets, and 12 < m < m Z − 15 GeV, for e + e − (top left), e ± µ ∓ (top right), and µ + µ − (bottom) events.The various signal hypotheses displayed have been scaled to a cross section of 5 pb for display purposes.Error bars indicate statistical uncertainties, while shaded bands show post-fit systematic uncertainties.

Figure 3 :
Figure3: Performance of the parameterised DNN for the resonant search, shown as the selection efficiency for the m X = 650 GeV signal as a function of the selection efficiency for the background (ROC curve), for the combined e + e − , µ + µ − and e ± µ ∓ channels.The dashed line corresponds to the DNN used in the analysis, trained on all available signal samples, and evaluated at m X = 650 GeV.The dotted line shows an alternative DNN trained using all signal samples except for m X = 650 GeV, and evaluated at m X = 650 GeV.Both curves overlap, indicating that the parameterised DNN is able to generalise to cases not seen during the training phase by interpolating the signal behaviour from nearby m X points.

Figure 4 :
Figure 4: The m jj distribution in data and simulated events after requiring all selection criteria in the e + e − (top left), e ± µ ∓ (top right), and µ + µ − (bottom) channels.The various signal hypotheses displayed have been scaled to a cross section of 5 pb for display purposes.Error bars indicate statistical uncertainties, while shaded bands show post-fit systematic uncertainties.

Figure 5 :
Figure 5: The DNN output distributions in data and simulated events after requiring all selection criteria, in the e + e − (top), e ± µ ∓ (middle), and µ + µ − (bottom) channels.Output values towards 0 are background-like, while output values towards 1 are signal-like.The parameterised resonant DNN output (left) is evaluated at m X = 400 GeV and the parameterised nonresonant DNN output (right) is evaluated at κ λ = 1, κ t = 1.The two signal hypotheses displayed have been scaled to a cross section of 5 pb for display purposes.Error bars indicate statistical uncertainties, while shaded bands show post-fit systematic uncertainties.

Figure 6 :
Figure 6: The DNN output distributions in data and simulated events, for the e + e − (top), e ± µ ∓ (middle), and µ + µ − (bottom) channels, in three different m jj regions: m jj < 75 GeV, m jj ∈ [ 75, 140 ) GeV, and m jj ≥ 140 GeV.The parameterised resonant DNN output (left) is evaluated at m X = 400 GeV and the parameterised nonresonant DNN output (right) is evaluated at κ λ = 1, κ t = 1.The two signal hypotheses displayed have been scaled to a cross section of 5 pb for display purposes.Error bars indicate statistical uncertainties, while shaded bands show post-fit systematic uncertainties.

Figure 8 :
Figure8: Left: expected (dashed) and observed (continuous) 95% CL upper limits on the product of the Higgs boson pair production cross section and branching fraction for HH → bbVV → bb ν ν as a function of κ λ /κ t .The inner (green) band and the outer (yellow) band indicate the regions containing 68 and 95%, respectively, of the distribution of limits expected under the background-only hypothesis.Red lines show the theoretical cross sections, along with their uncertainties, for κ t = 1 (SM) and κ t = 2. Right: exclusions in the (κ λ , κ t ) plane.The red region corresponds to parameters excluded at 95% CL with the observed data, whereas the dashed black line and the blue areas correspond to the expected exclusions and the 68 and 95% bands (light and dark respectively).Isolines of the product of the theoretical cross section and branching fraction for HH → bbVV → bb ν ν are shown as dashed-dotted lines.The diamond marker indicates the prediction of the SM.All theoretical predictions are extracted from Refs.[12][13][14][15][16][17]84].

Table 1 :
Summary of the systematic uncertainties and their impact on total background yields and on the SM and m X = 400 GeV signal hypotheses in the signal region.