Search for non-resonant Higgs boson pair production in the $2b + 2\ell + E_\mathrm{T}^\mathrm{miss}$ final state in $pp$ collisions at $\sqrt{s} = 13\,\mathrm{TeV}$ with the ATLAS detector

A search for non-resonant Higgs boson pair ($HH$) production is presented, in which one of the Higgs bosons decays into a $b$-quark pair ($b\bar b$) and the other decays into $WW^*$, $ZZ^*$, or $\tau^+\tau^-$, in each case yielding a final state with $\ell^+\ell^- +$ neutrinos ($\ell = e, \mu$). The analysis targets the gluon-gluon fusion and vector-boson fusion production modes separately. Data recorded by the ATLAS detector in proton-proton collisions at a centre-of-mass energy of 13 TeV at the Large Hadron Collider, corresponding to an integrated luminosity of $140\,\mathrm{fb}^{-1}$, are used in this analysis. Selected events contain exactly two $b$-tagged jets, two leptons with opposite electric charge, and missing transverse momentum in the final state. These events are classified using multivariate analysis algorithms to separate the $HH$ events from other Standard Model processes. No evidence of the signal is found. The observed (expected) upper limit on the cross-section for non-resonant Higgs boson pair production is determined to be 9.7 (16.2) times the Standard Model prediction at 95% confidence level. The Higgs boson self-interaction coupling parameter $\kappa_\lambda$ and the quadrilinear coupling parameter $\kappa_{2V}$ are each separately constrained by this analysis to be within the ranges $[-6.2, 13.3]$ and $[-0.17, 2.4]$, respectively, at 95% confidence level, when all other parameters are fixed.


Introduction
The Standard Model (SM) of particle physics employs the Higgs mechanism [1][2][3][4][5][6] to formulate a theoretical model in which the weak gauge bosons ($W^\pm$, $Z$) and the fermions acquire mass. Following the discovery of the Higgs boson, referred to as $H$, by the ATLAS and CMS collaborations [7,8] at the Large Hadron Collider (LHC) in 2012, one of the remaining open questions concerns the structure of the Higgs potential, which is still largely unconstrained. The rate of the double Higgs production ($HH$) process is sensitive to the trilinear Higgs self-interaction coupling $\lambda_{HHH}$, i.e. the coupling describing a vertex with three Higgs bosons interacting with one another. It is common practice to define a Higgs boson trilinear coupling modifier as the actual trilinear coupling value divided by its SM expectation value: $\kappa_\lambda \equiv \lambda_{HHH} / \lambda_{HHH}^{\mathrm{SM}}$. The dominant $HH$ production process at the LHC is gluon-gluon fusion (ggF), which at lowest order involves either a top-quark box loop (also referred to as the box diagram) or the decay of an off-shell Higgs boson (also referred to as the triangle diagram); the Feynman diagrams are shown in Figures 1(a) and 1(b), respectively. The box and triangle diagrams interfere destructively, leading to a small cross-section for the $gg \to HH$ process, $\sigma_{\mathrm{ggF}} = 31.1^{+6.7\%}_{-23.2\%}$ fb, calculated at next-to-next-to-leading order (NNLO) and including finite top-quark mass effects for a Higgs boson mass of $m_H = 125$ GeV [9][10][11][12][13][14][15][16].
The vector-boson-fusion (VBF) process is the sub-leading Higgs boson pair production mode, with a cross-section of $\sigma_{\mathrm{VBF}} = 1.726 \pm 2.1\%$ fb calculated at next-to-next-to-next-to-leading order (N$^3$LO) QCD for a Higgs boson mass of $m_H = 125$ GeV [17]. The production of $HH$ via the fusion of vector bosons ($VV \to HH$) is not only sensitive to $\kappa_\lambda$ but also depends on the coupling modifier of the Higgs boson to vector bosons ($\kappa_V$) and on another coupling modifier related to the quartic vertex involving two vector bosons and two Higgs bosons ($\kappa_{2V}$). The corresponding diagrams are shown in Figure 2.
In the presence of physics beyond the Standard Model (BSM), the $HH$ production cross-section can be altered by modifying the value of the self-coupling $\lambda_{HHH}$, leading to values of $\kappa_\lambda$ different from the SM prediction, as discussed in Refs. [18,19]. Similarly, modifications of the coupling to vector bosons can alter $\kappa_{2V}$. Probing the Higgs boson self-coupling can therefore provide additional information about the validity of the SM.
Searches for SM $HH$ production through the decay channels $HH \to b\bar b\gamma\gamma$ [20], $HH \to 4b$ [21] and $HH \to b\bar b\tau^+\tau^-$ [22] were performed using the full Run 2 dataset collected by the ATLAS experiment at $\sqrt{s} = 13$ TeV. The results from these three channels have been combined [23] to improve the search sensitivity. The observed (expected in the absence of $HH$ production) combined upper limit on the production rate of SM Higgs boson pairs, at the 95% confidence level (CL), was found to be 2.4 (2.9) times the SM prediction.

Figure 2: The LO Feynman diagrams for the vector-boson-fusion process of Higgs boson pair production, where $\kappa_V$, $\kappa_\lambda$, and $\kappa_{2V}$ are the coupling modifiers related to the $VVH$, $HHH$, and $VVHH$ vertices.
This paper focuses on an analysis of the $2b + 2\ell + E_\mathrm{T}^\mathrm{miss}$ final state. The experimental signature is characterised by two $b$-tagged jets with an invariant mass $m_{bb}$ close to $m_H$, two leptons with opposite charges, and large missing transverse energy $E_\mathrm{T}^\mathrm{miss}$ due to neutrinos escaping detection. The decay channel with a Higgs boson decaying into two $Z$ bosons and the subsequent final state with two leptons and two quark-initiated jets from the $ZZ^*$ system, $ZZ^* \to 2\ell 2q$, is also considered. Although this final state does not have $E_\mathrm{T}^\mathrm{miss}$ from neutrinos, it can have $E_\mathrm{T}^\mathrm{miss}$ due to detector and reconstruction inefficiencies. The dominant backgrounds arise from top-quark processes ($t\bar t$, $Wt$ and $t\bar t V$ with $V = W, Z$) and the production of a $Z$ boson in association with heavy-flavour jets ($Z$+HF). The search uses the full Run 2 dataset collected with the ATLAS experiment in proton-proton ($pp$) collisions at 13 TeV, corresponding to a total integrated luminosity of $140\,\mathrm{fb}^{-1}$.
A search for $HH$ production in the $2b + 2\ell + E_\mathrm{T}^\mathrm{miss}$ final state considering only the $H \to WW^*$ decay and using the full Run 2 dataset has already been performed [31]; the work presented in this paper significantly improves the analysis techniques by optimizing a deep neural network (DNN) to better classify the $2b + 2\ell + E_\mathrm{T}^\mathrm{miss}$ events as $HH$ signal or background, and considers additional decay channels, leading to a factor of two better sensitivity. Furthermore, studies of $HH$ production through the VBF process are included for the first time in the $2b + 2\ell + E_\mathrm{T}^\mathrm{miss}$ final state. In addition, the compatibility of the data with non-SM values of the $\kappa_\lambda$ and $\kappa_{2V}$ parameters is constrained in this paper.

ATLAS detector
The ATLAS detector [32] at the LHC covers nearly the entire solid angle around the collision point.1 It consists of an inner tracking detector surrounded by a thin superconducting solenoid, electromagnetic and hadron calorimeters, and a muon spectrometer incorporating three large superconducting air-core toroidal magnets.
The inner-detector system (ID) is immersed in a 2 T axial magnetic field and provides charged-particle tracking in the range $|\eta| < 2.5$. The high-granularity silicon pixel detector covers the vertex region and typically provides four measurements per track, the first hit normally being in the insertable B-layer (IBL) installed before Run 2 [33,34]. It is followed by the silicon microstrip tracker (SCT), which usually provides eight measurements per track. These silicon detectors are complemented by the transition radiation tracker (TRT), which enables radially extended track reconstruction up to $|\eta| = 2.0$. The TRT also provides electron identification information based on the fraction of hits (typically 30 in total) above a higher energy-deposit threshold corresponding to transition radiation.
The calorimeter system covers the pseudorapidity range $|\eta| < 4.9$. Within the region $|\eta| < 3.2$, electromagnetic calorimetry is provided by barrel and endcap high-granularity lead/liquid-argon (LAr) calorimeters, with an additional thin LAr presampler covering $|\eta| < 1.8$ to correct for energy loss in material upstream of the calorimeters. Hadron calorimetry is provided by the steel/scintillator-tile calorimeter, segmented into three barrel structures within $|\eta| < 1.7$, and two copper/LAr hadron endcap calorimeters. The solid-angle coverage is completed with forward copper/LAr and tungsten/LAr calorimeter modules optimised for electromagnetic and hadronic energy measurements, respectively.
The muon spectrometer (MS) comprises separate trigger and high-precision tracking chambers measuring the deflection of muons in a magnetic field generated by the superconducting air-core toroidal magnets. The field integral of the toroids ranges between 2.0 and 6.0 T m across most of the detector. Three layers of precision chambers, each consisting of layers of monitored drift tubes, cover the region $|\eta| < 2.7$, complemented by cathode-strip chambers in the forward region, where the background is highest. The muon trigger system covers the range $|\eta| < 2.4$ with resistive-plate chambers in the barrel and thin-gap chambers in the endcap regions.
Interesting events are selected by the first-level trigger system implemented in custom hardware, followed by selections made by algorithms implemented in software in the high-level trigger [35]. The first-level trigger accepts events from the 40 MHz bunch crossings at a rate below 100 kHz, which the high-level trigger further reduces in order to record events to disk at about 1 kHz.
An extensive software suite [36] is used in data simulation, in the reconstruction and analysis of real and simulated data, in detector operations, and in the trigger and data acquisition systems of the experiment.

1 ATLAS uses a right-handed coordinate system with its origin at the nominal interaction point (IP) in the centre of the detector and the $z$-axis along the beam pipe. The $x$-axis points from the IP to the centre of the LHC ring, and the $y$-axis points upwards. Cylindrical coordinates $(r, \phi)$ are used in the transverse plane, $\phi$ being the azimuthal angle around the $z$-axis. The pseudorapidity is defined in terms of the polar angle $\theta$ as $\eta = -\ln\tan(\theta/2)$. Angular distance is measured in units of $\Delta R \equiv \sqrt{(\Delta\eta)^2 + (\Delta\phi)^2}$.

Data and Monte Carlo samples
The analysis is based on the full Run 2 dataset collected in $pp$ collisions at the LHC with a centre-of-mass energy of $\sqrt{s} = 13$ TeV and a 25 ns bunch-crossing interval. The data used in this analysis are required to have been recorded while all relevant components of the ATLAS detector were in their nominal operating conditions [37]. The integrated luminosity of the dataset collected over the full Run 2 data-taking period and suitable for physics analysis corresponds to $140\,\mathrm{fb}^{-1}$ [38]. Candidate events with oppositely charged leptons are selected using a combination of single-lepton and di-lepton triggers.2 The use of a given trigger depends on the flavour and the transverse momenta ($p_\mathrm{T}$) of the two leptons in the event, and on the data-taking period. Single-lepton triggers have $p_\mathrm{T}$ thresholds between 21 GeV and 28 GeV. Di-lepton triggers are only considered if no single-lepton trigger criteria are met, and have $p_\mathrm{T}$ thresholds as low as 13 (9) GeV for the leading (sub-leading) lepton.
Monte Carlo (MC) simulation [39] is generally used to model the $HH$ signal and SM background processes. The generation of the physics processes of interest, including the underlying event and immediate decays, is carried out by dedicated event generators. The generated events are then passed through a simulation of the ATLAS detector based on Geant4 [40]. Some background MC event samples, including $Z$+jets with 10 GeV $< m_{\ell\ell} <$ 40 GeV and the alternative $t\bar t$ and $Wt$ samples used for estimating systematic uncertainties, are instead simulated with the ATLAS fast simulation framework, Atlfast-II (AF2) [39]. Additionally, the effect of multiple interactions in the same and neighbouring bunch crossings (pile-up) is modelled by overlaying the simulated hard-scattering event with inelastic $pp$ events generated with Pythia 8.186 [41] using the NNPDF2.3lo set of parton distribution functions (PDF) [42] and the A3 set of tuned parameters [43]. A summary of the event samples used for the simulation of the signal and background processes is shown in Table 1.
SM $HH$ signal events produced via the ggF production mechanism are generated with Powheg Box v2 at next-to-leading order (NLO) interfaced with Pythia 8.244 [44] using the PDF4LHC15nlo [45] PDF set and the A14 tune [46]. For parton-shower variations, Herwig 7.1.5 [47] is used instead. Samples for non-SM ggF production, i.e. with $\kappa_\lambda \neq 1$, are obtained from simulated samples at different values of this coupling modifier and combined using morphing techniques [48], with detailed validation studies of this procedure available in Ref. [49]. The SM VBF $HH$ signal samples are generated with MadGraph5_aMC@NLO 2.7.3 [50] using the NNPDF3.0nlo [51] PDF set and the A14 tune [46] interfaced to Pythia 8.244 [44], with Herwig 7.2.1 [47] used for parton-shower variations. As in Ref. [23], signal templates with coupling modifiers ($\kappa_\lambda \neq 1$, $\kappa_{2V} \neq 1$) are obtained by a linear combination of six samples with different values of the $\kappa_\lambda$ and $\kappa_{2V}$ parameters [52]. Therefore five more independent samples with $(\kappa_\lambda, \kappa_{2V}) =$ (0, 0), (1, 0.5), (1, 3), (2, 1) and (10, 1) are generated as well with MadGraph interfaced to Pythia 8. To vary only $\kappa_{2V}$, three independent samples are enough: two more independent samples with $\kappa_{2V} = 1.5$ and $\kappa_{2V} = 2$ are generated in the same manner. The decays of bottom and charm hadrons in all SM $HH$ signal events were performed by EvtGen 1.6.0 [53].
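The linear-combination procedure can be illustrated numerically: the $HH$ cross-section is a quadratic polynomial in the coupling modifiers, so the per-bin yield at any $(\kappa_\lambda, \kappa_{2V})$ point is a fixed linear combination of the six basis samples. The sketch below is an illustration of the technique in Refs. [48,52], not the analysis implementation: the quadratic monomial basis assumes $\kappa_V = 1$, and the per-bin yields are invented for demonstration.

```python
import numpy as np

def monomials(kl, k2v):
    # Quadratic terms of the squared VBF HH amplitude for kappa_V = 1
    # (an assumption of this sketch): any (kl, k2v) point is a linear
    # combination of these six monomials.
    return np.array([kl**2, k2v**2, kl * k2v, kl, k2v, 1.0])

# Six basis coupling points (kappa_lambda, kappa_2V): the SM point plus the
# five additional points listed in the text.
basis = [(1, 1), (0, 0), (1, 0.5), (1, 3), (2, 1), (10, 1)]

# Hypothetical per-bin signal templates at each basis point, built here from
# an assumed "true" quadratic coupling dependence (two analysis bins).
true_coeffs = np.array([[0.3, 1.1, -0.4, 0.2, 0.5, 2.0],
                        [0.1, 0.8, -0.2, 0.4, 0.3, 1.5]])
templates = np.array([true_coeffs @ monomials(kl, k2v) for kl, k2v in basis])

def morph(kl, k2v):
    """Predict the template at (kl, k2v) as a linear combination of the basis."""
    M = np.array([monomials(*p) for p in basis])        # 6x6 basis matrix
    weights = np.linalg.solve(M.T, monomials(kl, k2v))  # sum_i w_i m_i = m_target
    return weights @ templates                          # morphed per-bin yields
```

By construction, the morphed prediction at a basis point reproduces the corresponding input sample, and at any other point it reproduces whatever quadratic coupling dependence underlies the inputs.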
SM background processes are generally estimated using simulation, although the cross-sections of the dominant background components, $t\bar t$, $Wt$ and $Z$+jets, are constrained using control regions (CRs) to better predict their contribution in the signal regions (SRs). The estimate is therefore said to be semi-data-driven, with the normalization being constrained from data and the shape being taken from simulation. For processes not constrained from dedicated CRs, both the shape and the overall normalization are taken from simulation. In addition, the fake-lepton background is estimated in a data-driven way.
SM top-quark pair production ($t\bar t$) and the production of top-quarks in association with $W$ bosons ($Wt$) contribute a significant background contamination in the $2b + 2\ell + E_\mathrm{T}^\mathrm{miss}$ final state. At NLO, non-trivial interference arises between these two processes, which may be enhanced in phase-space regions with high fractions of $Wt$ events [54]. The two most commonly used schemes to remove the overlap between these two processes are the diagram removal (DR) and diagram subtraction (DS) schemes [55]. The former is used in the present analysis to remove the overlapping events, while the latter is used to evaluate the systematic uncertainty in the corresponding background event yields.
The decays of bottom and charm hadrons in all SM background events not simulated with Sherpa were performed by EvtGen 1.6.0 [53].
Production of $t\bar t$ events is modelled using the Powheg Box v2 [56][57][58][59] generator at NLO with the NNPDF3.0nlo [51] PDF set and the $h_\mathrm{damp}$ parameter3 set to 1.5 times the top-quark mass [60]. The events are interfaced to Pythia 8.230 [44] to model the parton shower, hadronisation, and underlying event, with parameters set according to the A14 tune [46] and using the NNPDF2.3lo set of PDFs [42]. In order to reduce mis-modelling, events are reweighted [61] to match the predictions at NNLO QCD and NLO EW, based on the true $p_\mathrm{T}$ of the top-quark (i.e. not the anti-top-quark) provided by the MC event samples.
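Such a reweighting amounts to a per-event weight looked up in bins of the true top-quark $p_\mathrm{T}$. A minimal sketch, where the bin edges and correction factors are purely illustrative placeholders (in the analysis they come from the higher-order calculation of Ref. [61]):

```python
import bisect

# Hypothetical ratio of the NNLO QCD + NLO EW prediction to the NLO sample,
# binned in true top-quark pT (GeV); the numbers are invented for illustration.
PT_EDGES = [0.0, 100.0, 200.0, 350.0, 1000.0]
RATIO = [1.02, 0.99, 0.95, 0.90]  # one correction factor per pT bin

def top_pt_weight(top_pt):
    """Per-event weight matching the ttbar sample to the higher-order prediction."""
    i = bisect.bisect_right(PT_EDGES, top_pt) - 1
    i = min(max(i, 0), len(RATIO) - 1)  # clamp under/overflow to the edge bins
    return RATIO[i]
```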
The $V$+jets samples ($V = W$ or $Z$) are simulated with Sherpa 2.2.1 [63] interfaced with NNPDF3.0nlo [42] for both the matrix element (ME) calculation and the parton shower (PS) tuning. The merging of different parton multiplicities is achieved through a matching scheme based on the CKKW-L [64,65] merging technique using a scale parameter of $Q_{\mathrm{cut}} = 20$ GeV. The modelling of higher jet multiplicities relies on the parton-shower algorithm. The parton-shower and underlying-event models used are the ones provided internally by Sherpa. The Sherpa 2.2.1 generator adopts a full 5-flavour scheme, with massless $b$- and $c$-quarks in the matrix elements, while massive quarks can be produced in the parton shower. The $V$+jets samples are split according to the $p_\mathrm{T}$ of the vector boson and the scalar sum of jet transverse momenta, $H_\mathrm{T}$, introducing a cut at generation level and producing samples for different slices in $\max(p_\mathrm{T}^V, H_\mathrm{T})$, where $p_\mathrm{T}^V$ is defined as the transverse momentum of the true lepton pair from the decay of the $V$ boson. The samples are also generated by applying different filters to select the flavour composition of the jets produced in association with the $V$ boson.
Di-boson processes with four charged leptons, three charged leptons and one neutrino, or two charged leptons and two neutrinos are simulated with the Sherpa 2.2.2 event generator [63], and di-boson processes with leptons and jets are simulated with Sherpa 2.2.1 [63]. The matrix elements contain all diagrams with four electroweak vertices. They are calculated for up to one parton at NLO and up to three partons at LO using Comix [66] and OpenLoops [67], and merged with the Sherpa parton shower according to the ME+PS@NLO prescription [68]. The NNPDF3.0nnlo PDF set is used in conjunction with dedicated parton-shower tuning developed by the Sherpa authors. The event-generator cross-sections, which are already at NLO, are used in this case.

Object definition and event selection
Proton-proton interaction vertices are reconstructed in events with at least two tracks, each with $p_\mathrm{T} > 0.5$ GeV. The primary hard-scatter vertex in each event is defined as the one with the highest value of the sum of squared track transverse momenta [105].
Electron candidates are reconstructed from energy deposits measured in the electromagnetic calorimeter which are matched to ID tracks [106]. They are required to satisfy $|\eta| < 2.47$, excluding the calorimeter transition region $1.37 < |\eta| < 1.52$, and to have a transverse momentum $p_\mathrm{T} > 10$ GeV. Electron candidates are required to satisfy a 'medium' identification criterion based on the use of shower-shape, track-cluster-matching and TRT parameters in a likelihood-based algorithm [106]. Additionally, a 'loose' isolation requirement [106] is applied to electron candidates to ensure that they are well separated from other objects in the event. This requirement is based on the momentum of nearby tracks and calorimeter energy deposits within a cone around the electron candidate.
Muon candidates are reconstructed from high-quality tracks found in the MS [107]. A matching of these tracks to ID tracks is required in the region $|\eta| < 2.5$. Muon candidates are required to have $|\eta| < 2.7$ and $p_\mathrm{T} > 9$ GeV, and to satisfy a 'medium' identification criterion [108]. Additionally, a 'loose' isolation requirement [108] is imposed on muon candidates to ensure that they are well separated from other objects in the event. This requirement is based on the nearby activity within a cone around the muon candidate.
Jets are reconstructed using the anti-$k_t$ algorithm [109,110] with a radius parameter of $R = 0.4$. It is applied to noise-suppressed positive-energy topological energy clusters [111,112] within $|\eta| < 4.5$ and to charged-particle tracks processed using a particle-flow algorithm [113]. Pile-up is taken into account in the formation of topological energy clusters [114]. Jet candidates are required to have $p_\mathrm{T} > 20$ GeV; those with $|\eta| < 2.5$ are used to select the candidate jets for the reconstruction of the $H \to b\bar b$ decay, while jets with $|\eta| > 2.5$ are considered 'forward' jets. To reject jet candidates originating from pile-up interactions, they must satisfy a tight pile-up suppression requirement based on a multivariate method [115]. The method removes jets that appear to be inconsistent with the primary vertex and have $p_\mathrm{T} < 60$ GeV [105]. Finally, a neural-network-based $b$-tagging algorithm [116] is used to identify jets containing $b$-hadrons ($b$-tagged jets) at a 77% efficiency working point in simulated $t\bar t$ events. The rejection rate for this working point is $\sim 6$ for jets originating from $c$-quarks and $\sim 200$ for jets originating from light quarks [116].
The missing transverse momentum, $\vec{E}_\mathrm{T}^\mathrm{miss}$, the magnitude of which is denoted by $E_\mathrm{T}^\mathrm{miss}$, quantifies the amount of energy and momentum carried by invisible particles that do not interact with the detector. It is determined as the magnitude of the negative vectorial sum of the transverse momenta of the selected and calibrated physics objects, including jets, electrons, and muons, and of the inner-detector tracks from the hard-scatter collision vertex not associated with any physics object [117].
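The computation is a negative two-dimensional vector sum; a minimal sketch, with each calibrated object or track reduced to a $(p_\mathrm{T}, \phi)$ pair:

```python
import math

def missing_et(objects):
    """objects: list of (pt, phi) for all calibrated hard-scatter objects and
    unassociated tracks. Returns (met, met_phi): magnitude and direction of the
    negative vector sum of their transverse momenta."""
    px = -sum(pt * math.cos(phi) for pt, phi in objects)
    py = -sum(pt * math.sin(phi) for pt, phi in objects)
    return math.hypot(px, py), math.atan2(py, px)
```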
Different reconstructed objects can share the same detector signature, leading to ambiguities in the identification of these objects. To resolve these ambiguities, as in Ref. [22], an overlap removal procedure is performed sequentially, such that only objects that survive a given step are considered in the following steps. The steps are as follows: if any electrons share a track, only the highest-$p_\mathrm{T}$ electron is kept. If a hadronically decaying $\tau$-lepton candidate is within4 $\Delta R = 0.2$ of any electron or muon, it is removed. If an electron and a muon share a track, the muon is kept only if it is associated with a signature in the muon spectrometer. Any jet within $\Delta R = 0.2$ of an electron is removed, and subsequently any electron within $\Delta R = 0.4$ of any jet is removed. Any jet within $\Delta R = 0.2$ of a muon, or having an inner-detector track ghost-matched [118] to a muon within $\Delta R = 0.2$ of the jet, is removed if it has fewer than three associated tracks. Any muon within $\Delta R = 0.4$ of a jet is then removed, as is any jet within $\Delta R = 0.2$ of a hadronically decaying $\tau$-lepton candidate. The analysis selects candidate events that contain exactly two opposite-charge light leptons, either electrons or muons, and exactly two $b$-tagged jets. While the signal processes contain sources of $E_\mathrm{T}^\mathrm{miss}$, an explicit selection on $E_\mathrm{T}^\mathrm{miss}$ is not performed, to ensure high training statistics for the MVA discriminants. To suppress the contribution of misidentified jets, the analysis excludes events that contain jets satisfying one of the bad-jet criteria [115]. The yields of the ggF and VBF signal samples at the different preselection steps are summarized in Table 2; about an order of magnitude of reduction in the event yields is observed from this initial selection.
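The jet-lepton steps of such a sequential procedure can be sketched as follows. This is a simplified illustration, not the analysis code: track sharing, ghost matching and the $\tau$-lepton steps are omitted, and objects are reduced to $\eta$-$\phi$ coordinates.

```python
import math

def delta_r(a, b):
    # Angular distance with the azimuthal difference wrapped into (-pi, pi].
    dphi = math.atan2(math.sin(a["phi"] - b["phi"]), math.cos(a["phi"] - b["phi"]))
    return math.hypot(a["eta"] - b["eta"], dphi)

def overlap_removal(electrons, muons, jets):
    """Sequential overlap removal: each step only sees survivors of the previous one."""
    # 1) remove jets within dR = 0.2 of an electron
    jets = [j for j in jets if all(delta_r(j, e) >= 0.2 for e in electrons)]
    # 2) then remove electrons within dR = 0.4 of any surviving jet
    electrons = [e for e in electrons if all(delta_r(e, j) >= 0.4 for j in jets)]
    # 3) remove jets close to a muon if they have fewer than three tracks
    jets = [j for j in jets
            if not (any(delta_r(j, m) < 0.2 for m in muons) and j["ntrk"] < 3)]
    # 4) finally remove muons within dR = 0.4 of a surviving jet
    muons = [m for m in muons if all(delta_r(m, j) >= 0.4 for j in jets)]
    return electrons, muons, jets
```

The ordering matters: a jet rejected in step 1 can no longer cause an electron to be removed in step 2, and a low-track-multiplicity jet dropped in step 3 no longer vetoes its nearby muon in step 4.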
The events that satisfy the preselection criteria are further divided into two categories: the signal region (SR) and dedicated background control regions (CRs) for $Z$ plus heavy-flavour jets ($Z$+HF), $t\bar t$ and $Wt$, as depicted in Figure 3. The greyed-out region is not considered since it does not add significantly to the sensitivity. The CRs are used to constrain the background normalization in the SRs. The SR and CRs are separated based on the invariant mass of the two leptons, with a lower threshold set at 15 GeV and an upper threshold set at 75 GeV for same-flavour (SF) leptons and 110 GeV for different-flavour (DF) leptons, as illustrated in Figures 3(a) and 3(b). The mass of the di-lepton system is required to be $m_{\ell\ell} > 110$ GeV for the CRs enriched in events coming from top-quark processes. This region is further split into dedicated $t\bar t$- and $Wt$-enriched control regions by requiring5 $m_{b\ell}$ [54] to be below or above 250 GeV, respectively. The $Z$+HF CR is required to have two SF leptons in the event and to satisfy the di-lepton invariant-mass requirement 75 GeV $< m_{\ell\ell} <$ 110 GeV, excluding events where the invariant mass of the two $b$-tagged jets satisfies 40 GeV $< m_{bb} <$ 210 GeV.
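The $m_{\ell\ell}$-based categorization can be sketched as a simple decision function. All masses are in GeV; the function and region labels are illustrative, not the analysis code, and events falling into the greyed-out region return `None`.

```python
def classify_region(same_flavour, m_ll, m_bb, m_bl):
    """Assign an event to the SR or one of the CRs following the scheme in
    the text. same_flavour: both leptons e or both mu; m_ll, m_bb, m_bl in GeV."""
    if m_ll <= 15.0:
        return None  # below the lower di-lepton mass threshold
    if m_ll > 110.0:
        # top-enriched CRs, split by the m_bl discriminant at 250 GeV
        return "ttbar CR" if m_bl < 250.0 else "Wt CR"
    sr_upper = 75.0 if same_flavour else 110.0
    if m_ll < sr_upper:
        return "SR"
    # SF events in the Z window enter the Z+HF CR unless m_bb is Higgs-like
    if same_flavour and 75.0 <= m_ll <= 110.0 and not (40.0 < m_bb < 210.0):
        return "Z+HF CR"
    return None  # greyed-out region, not used
```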
A VBF selection is applied to further categorize the events into two orthogonal ggF/VBF-enriched regions by reverting (for ggF-like regions) or applying (for VBF-like regions) the VBF selection. The VBF selection requires the presence of at least two forward jets: these jets must have $p_\mathrm{T} > 30$ GeV, at least one pair of these extra jets must have a pseudorapidity separation larger than 4.0, and at least one pair of these jets must have an invariant mass larger than 600 GeV. The jet pairs considered are not required to be the same for the pseudorapidity-separation and mass requirements; in less than $\sim 1\%$ of the MC events, different pairs of jets are chosen. The VBF selection has a relative selection efficiency of $\sim 60\%$ on VBF signal events. Table 3 shows the yields for SM background processes and non-resonant SM ggF and VBF signals in the SRs and CRs, with their statistical uncertainty taken either directly from MC or, in the case of the fake-lepton background, from the statistics of the template derived as described in Section 6. For two minor backgrounds, negative yields are observed due to NLO MC statistical fluctuations; the total is always positive. The SRs are then used to extract the final results, with the backgrounds constrained by the CRs, as described later. To enhance the sensitivity to the signal process and maximize the rejection of the expected SM backgrounds, a multivariate approach is used to select signal events, as described in the next section.

Figure 3: Definition of the SR and CRs, shown separately for events where either the electron ($e\mu$) or the muon ($\mu e$) is leading in $p_\mathrm{T}$. The greyed-out region is excluded as it makes a negligible contribution to the final results. The $m_{b\ell}$ discriminant variable [54] is used to further separate the top CR into separate $t\bar t$ and $Wt$ control regions.
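The VBF categorization can be sketched as follows; the subtlety is that the pseudorapidity-separation and invariant-mass requirements may be satisfied by different jet pairs. The sketch assumes massless jets, for which $m_{jj}^2 = 2\,p_{\mathrm{T},1}\,p_{\mathrm{T},2}\,(\cosh\Delta\eta - \cos\Delta\phi)$; it is an illustration, not the analysis code.

```python
import itertools
import math

def passes_vbf_selection(jets):
    """jets: list of (pt, eta, phi) for forward-jet candidates (massless
    approximation, pt in GeV). Returns True if some pair has dEta > 4.0 AND
    some (possibly different) pair has m_jj > 600 GeV."""
    jets = [j for j in jets if j[0] > 30.0]  # pT > 30 GeV requirement
    if len(jets) < 2:
        return False
    has_deta, has_mass = False, False
    for (pt1, eta1, phi1), (pt2, eta2, phi2) in itertools.combinations(jets, 2):
        if abs(eta1 - eta2) > 4.0:
            has_deta = True
        # invariant mass of a massless jet pair
        m2 = 2.0 * pt1 * pt2 * (math.cosh(eta1 - eta2) - math.cos(phi1 - phi2))
        if math.sqrt(max(m2, 0.0)) > 600.0:
            has_mass = True
    return has_deta and has_mass
```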

Multivariate analyses
Events that pass the selections described in the previous section are passed through an MVA to separate the rare signal events from the large number of background events. A DNN and a boosted decision tree (BDT) are used to classify events in the ggF and VBF categories, respectively. The outputs of the MVA models are used as discriminants in the statistical analysis discussed in Section 8.
In the ggF category, the Keras library [119] with TensorFlow as the backend [120] is used to design a DNN classifier with a multi-output architecture to simultaneously optimise the separation between the ggF $HH$ signal, the background from $t\bar t$ and $Wt$, and all other background processes. In a first stage, 50% of the simulated events are used to optimize the set of input variables and the hyperparameters of the DNN in a five-fold cross-validation with a train/test split fraction of 80%/20%. The final set of input variables is selected based on the permutation feature importance [121] to keep only the most important ones; this strategy does not compromise the performance but reduces the complexity of the model, and therefore improves its robustness given the finite training statistics. These input variables are listed in Table 4, with the $E_\mathrm{T}^\mathrm{miss}$ significance ranked as the ninth and $E_\mathrm{T}^\mathrm{miss}$ as the twelfth most important input variable with respect to the overall sensitivity. The data are well described by the MC simulation for these variables. The hyperparameters of the DNN are optimized through an automatic process using the Optuna [122] package, with the figure of merit being the 95% CL upper limit on the signal strength, i.e. the ratio of the measured $HH$ production cross-section to its SM prediction, without considering systematic uncertainties, using the CL$_s$ prescription [123]. The final DNN model includes nine fully connected (FC) layers, each with 512 nodes, and is trained at a learning rate of 0.00011, with ReLU activations [124] used for the FC layers and softmax activations used for the output layer. It is trained with a two-fold cross-validation strategy with a train/test split fraction of 50%/50%. A dropout rate of 0.3 is applied to the FC layers to prevent over-fitting; as a result the loss curve reaches a plateau for both the training and test datasets at the end of the training. The score of the signal output node is binned so that the statistical uncertainty of the backgrounds is less than 30% and $\sim 1$ signal event from the combined ggF and VBF Higgs boson pair production modes is expected in each bin. The seven bins with the highest DNN output score are retained for the statistical analysis, leading to a maximum of $O(10^2)$ background events in a bin considered as part of the final signal region.

Table 3: Pre-fit yields for SM background processes and non-resonant SM ggF and VBF signals in the SRs and CRs. The event yields in the row 'Fakes' come from the data-driven fake-lepton background estimate described in Section 6. The event yields for $Z$+jets processes are split into those with heavy-flavour (HF) and light-flavour (LF) jets as defined in Section 6. A dash represents no contribution of the respective MC event sample in the respective region. Uncertainties are from MC statistics and template statistics only.
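The score binning described above can be sketched as a right-to-left merging of fine histogram bins until each merged bin satisfies the background-uncertainty and expected-signal requirements. The thresholds follow the text; the greedy merging strategy itself is an illustrative assumption.

```python
def build_bins(sig, bkg, bkg_err2, max_rel_unc=0.30, min_sig=1.0):
    """Merge fine histogram bins (lists ordered left-to-right in DNN score),
    starting from the highest score, until each merged bin has a relative
    background statistical uncertainty below max_rel_unc and at least min_sig
    expected signal events. Returns merged-bin edges as fine-bin indices."""
    edges = [len(sig)]
    s = b = e2 = 0.0
    for i in range(len(sig) - 1, -1, -1):
        s += sig[i]
        b += bkg[i]
        e2 += bkg_err2[i]
        if b > 0 and e2 ** 0.5 / b < max_rel_unc and s >= min_sig:
            edges.append(i)          # close the merged bin at fine-bin i
            s = b = e2 = 0.0
    if edges[-1] != 0:
        if len(edges) == 1:
            edges.append(0)          # nothing satisfied the criteria: one big bin
        else:
            edges[-1] = 0            # absorb the leftover into the lowest-score bin
    return edges[::-1]
```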
In contrast to the ggF event category, the limited statistics and tighter phase space of the VBF category mean that a DNN approach cannot be fully exploited for such events. Therefore, a BDT classifier is trained using the adaptive boosting (AdaBoost) method and the TMVA framework [128] with a cross-validation setup. The input variables are listed in Table 5, with the $E_\mathrm{T}^\mathrm{miss}$ significance ranked as the fourteenth most important input variable with respect to the overall sensitivity. The data are well described by the MC simulation for these variables.
The training parameters are optimized to have 350 trees with a maximum depth of four and a minimum terminal-node size of 2.5%. The signal tree provided to the algorithm consists exclusively of VBF $HH$ events, while the background consists of ggF $HH$ signal events and other SM background events. The ggF $HH$ events are classified as background since maximising the sensitivity to the VBF $HH$ production mode is the goal of this MVA discriminant. A two-fold cross-validation was utilized with a train/test split fraction of 50%/50%. Each BDT was applied to the half of the dataset it had not been trained on, ensuring a reliable evaluation of the performance across the entire dataset. After careful examination, it was decided to train the BDT on a VBF sample with $\kappa_{2V} = 0$, as it showed the best overall performance across a set of SM and BSM scenarios. The BDT output score is binned into ten bins taking into account their signal-over-background ratio, with the most sensitive bin required to have a minimum of $\sim 2$ background events. The five bins with the highest BDT output score are retained for the statistical analysis, leading to a maximum of $O(10^3)$ background events in a bin considered as part of the final signal region.6
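The two-fold evaluation scheme, in which each event is scored by the model that was not trained on it, can be sketched generically. The split by parity of the event number is an assumption for illustration; the training and scoring functions are placeholders.

```python
def twofold_scores(events, train_fn, score_fn):
    """Two-fold cross-validation: train one model per half (split here by
    event-number parity) and score each event with the model NOT trained on it.
    train_fn(events) -> model; score_fn(model, event) -> float."""
    fold0 = [e for e in events if e["number"] % 2 == 0]
    fold1 = [e for e in events if e["number"] % 2 == 1]
    model_on_0 = train_fn(fold0)  # applied to odd-numbered events
    model_on_1 = train_fn(fold1)  # applied to even-numbered events
    return {e["number"]: score_fn(model_on_1 if e["number"] % 2 == 0 else model_on_0, e)
            for e in events}
```

Because no event is ever scored by a model that saw it during training, the resulting score distribution is an unbiased estimate of the classifier's performance on the full dataset.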

Background estimation
The dominant background processes expected to contribute to the signal region are top-quark pair production ($t\bar t$), single top-quark production in association with a $W$ boson ($Wt$), and $Z/\gamma^*$ production in association with heavy-flavour jets. The contributions of these background processes are constrained in the dedicated CRs defined in Section 4.
The simulated $Z$+jets background events are divided into heavy-flavour (HF) and light-flavour (LF) classes, based on the generator-level information about the true origin of the jets. If at least one of the two $p_\mathrm{T}$-leading jets in a $Z$+jets event is matched at generator level to either a $b$- or a $c$-quark, the event is classified as 'heavy flavour'. All other $Z$+jets events, i.e. events where both of the two $p_\mathrm{T}$-leading jets are light-flavour jets, are classified as 'light flavour'. As the cross-section of $Z$ boson production in association with heavy-flavour jets is known to be mis-modelled [131,132], the MC prediction is constrained with data in control regions enriched in the $Z$+HF processes.
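The flavour classification reduces to a check on the generator-level hadron flavours of the two $p_\mathrm{T}$-leading jets; a minimal sketch using the common labelling convention 5 = $b$, 4 = $c$, 0 = light (the convention and function name are illustrative assumptions):

```python
def classify_z_jets(jet_flavours):
    """jet_flavours: generator-level hadron flavours of the jets, ordered by
    decreasing pT (5 = b, 4 = c, 0 = light). The event is 'HF' if at least one
    of the two leading jets is matched to a b- or c-quark, else 'LF'."""
    leading_two = jet_flavours[:2]
    return "HF" if any(f in (4, 5) for f in leading_two) else "LF"
```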
The contribution from events containing photons or jets that are mis-identified as leptons, as well as leptons from the hadronic decays of heavy-flavour quarks, is collectively referred to as the 'fake-lepton' background. It is estimated using a data-driven method.
Firstly, the corresponding CR is defined to have the same selections as the SR, except that the opposite-sign (OS) requirement on the di-lepton system in the SR definition is reverted to select only same-sign (SS) lepton pairs. The contribution of fake-lepton events in the SS region is then estimated by subtracting the predicted contribution from prompt leptons of SM backgrounds from the observed number of SS di-lepton events in the data. Next, transfer factors $f$, binned as a function of the sub-leading lepton $p_\mathrm{T}$, are calculated as the ratio of the number of OS fake-lepton events to the number of SS fake-lepton events, as estimated from the background MC events; they range from 1.2 to 1.9. Finally, the number of fake-lepton events in the SRs is extrapolated by applying the transfer factors to the number of fake-lepton events estimated in the SS region: $N_\mathrm{fake}^\mathrm{OS} = f \times N_\mathrm{fake}^\mathrm{SS}$.
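The extrapolation can be sketched per $p_\mathrm{T}$ bin of the sub-leading lepton. All yields below are invented for illustration; the function name and signature are assumptions, not the analysis code.

```python
def fake_estimate(ss_data, ss_prompt_mc, os_fake_mc, ss_fake_mc):
    """Per-bin fake-lepton estimate in the OS signal region.
    ss_data / ss_prompt_mc: observed and prompt-MC yields per SS bin;
    os_fake_mc / ss_fake_mc: MC fake yields used to build the transfer factors.
    All inputs are binned in the sub-leading lepton pT."""
    n_os = []
    for i in range(len(ss_data)):
        tf = os_fake_mc[i] / ss_fake_mc[i]                   # OS/SS transfer factor
        n_fake_ss = max(ss_data[i] - ss_prompt_mc[i], 0.0)   # data minus prompt MC
        n_os.append(tf * n_fake_ss)
    return n_os
```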

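The steps of the fake-lepton estimate amount to a short per-bin calculation. The sketch below is not the ATLAS implementation: the function name and all yields are hypothetical, and the real analysis performs this in bins of sub-leading lepton $p_\mathrm{T}$ with full uncertainty propagation.

```python
# Illustrative sketch of the data-driven fake-lepton estimate. In each bin of
# sub-leading lepton pT:
#   1. estimate the SS fake yield as data minus prompt-MC in the SS region,
#   2. multiply by the OS/SS transfer factor f taken from background MC.
# All numbers below are made up for the example.

def fake_estimate(n_ss_data, n_ss_prompt_mc, n_os_fake_mc, n_ss_fake_mc):
    """Return the estimated OS fake-lepton yield per pT bin."""
    estimates = []
    for data, prompt, os_mc, ss_mc in zip(n_ss_data, n_ss_prompt_mc,
                                          n_os_fake_mc, n_ss_fake_mc):
        f = os_mc / ss_mc                    # transfer factor (1.2-1.9 in the paper)
        n_ss_fake = max(data - prompt, 0.0)  # SS fakes: data minus prompt contribution
        estimates.append(f * n_ss_fake)
    return estimates

# Three hypothetical sub-leading-lepton pT bins:
print(fake_estimate(
    n_ss_data=[120.0, 60.0, 25.0],
    n_ss_prompt_mc=[40.0, 25.0, 12.0],
    n_os_fake_mc=[36.0, 30.0, 19.0],
    n_ss_fake_mc=[30.0, 20.0, 10.0],
))
```

The `max(..., 0.0)` guard reflects the physical requirement that an estimated fake yield cannot be negative after the prompt-MC subtraction.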
Systematic uncertainties
The results account for several sources of systematic uncertainty in the signal and background processes, which are classified as either experimental (detector- or luminosity-related) or theoretical modelling uncertainties. Statistical uncertainties of the simulated event samples are also taken into account. The total pre-fit event yields, with their uncertainties, and the background composition in the different signal and control regions are shown in Figure 4.
The uncertainties due to experimental sources arise primarily from the mismeasurement of reconstructed object momenta and from the finite precision of the reconstruction-efficiency determinations. They include uncertainties in the jet energy scale [114] and jet energy resolution [133]. Additional uncertainties for $b$-tagged jets arise from the precision of the $b$-tagging efficiency and from the rates at which charm- and light-flavoured jets are selected as $b$-tagged jets [116]. Lepton-related uncertainties arise from the electron [134] and muon [111] reconstructed energy (momentum) measurements, as well as from the precision of their reconstruction and identification efficiencies. The $E_\mathrm{T}^\mathrm{miss}$ scale and resolution [117] uncertainties, as well as uncertainties from the mismodelling of pile-up, the trigger efficiency and the luminosity, are also taken into account. The uncertainty in the combined 2015–2018 integrated luminosity is 0.83% [38], obtained using the LUCID-2 detector [135] for the primary luminosity measurements, complemented by measurements using the inner detector and calorimeters.
The normalization corrections for the dominant background processes, namely $t\bar t$, $Wt$ and $Z$+jets, are determined using data in the control regions during the statistical analysis. The experimental uncertainties and the systematic uncertainties from the modelling of these processes are taken into account and constrained during the fitting process. For the $t\bar t$ and $Wt$ processes, the QCD scale uncertainty is estimated by comparing events with different renormalization scale ($\mu_\mathrm{R}$) and factorization scale ($\mu_\mathrm{F}$) settings, where the largest deviation is chosen as the systematic uncertainty. The uncertainties arising from the modelling of initial- and final-state radiation in the generators used to simulate the $t\bar t$ ($Wt$) background processes are evaluated using the method described in Ref. [136]. The effects of uncertainties due to the choice of PDF and the value of $\alpha_\mathrm{S}$ are evaluated by varying the PDF set as well as the value of $\alpha_\mathrm{S}$, taking the maximum variation as the uncertainty. The ME and PS uncertainties are estimated by comparing events from the nominal simulation samples with events from samples using alternative ME and PS generators, taking the difference as the respective uncertainty and then symmetrising it. The uncertainty arising from the interference between the NLO predictions for the $t\bar t$ and $Wt$ processes is estimated by taking the difference between the predicted yields obtained with the DR and DS schemes [136]. The $Z$+jets modelling uncertainties are estimated using the nominal Sherpa 2.2.1 samples by considering different merging and resummation scales [137,138]. The uncertainties due to PDF variations and changes in $\mu_\mathrm{R}$ and $\mu_\mathrm{F}$ are calculated using the same procedures as for the $t\bar t$ and $Wt$ backgrounds.

7 The $E_\mathrm{T}^\mathrm{miss}$ $\ell\ell$ centrality indicates the location of the missing transverse energy with respect to the two final-state leptons in the $\phi$-plane. It is one when the direction of $E_\mathrm{T}^\mathrm{miss}$ lies directly between the two leptons and $1/\sqrt{2}$ if it is aligned with one of the leptons. It is defined as
$$E_\mathrm{T}^\mathrm{miss}\ \ell\ell\ \text{centrality} = \frac{A_{\ell_1} + A_{\ell_2}}{\sqrt{2}\,\sqrt{A_{\ell_1}^2 + A_{\ell_2}^2}}, \qquad A_{\ell_1} = \frac{\sin(\phi_\mathrm{miss} - \phi_{\ell_2})}{\sin(\phi_{\ell_1} - \phi_{\ell_2})}, \quad A_{\ell_2} = \frac{\sin(\phi_{\ell_1} - \phi_\mathrm{miss})}{\sin(\phi_{\ell_1} - \phi_{\ell_2})}.$$
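The scale-envelope and symmetrised two-point prescriptions used above amount to simple arithmetic on event yields. The following sketch (hypothetical numbers, not ATLAS code) makes that explicit.

```python
# Illustrative sketch of two common uncertainty prescriptions described above:
# the scale-variation envelope (largest deviation from nominal among the
# muR/muF settings) and a symmetrised two-point uncertainty (nominal vs
# alternative generator). All yields below are hypothetical.

def scale_envelope(nominal, variations):
    """Largest absolute deviation of the scale-varied yields from nominal."""
    return max(abs(v - nominal) for v in variations)

def symmetrised_two_point(nominal, alternative):
    """Symmetric (up, down) uncertainty from a nominal/alternative comparison."""
    delta = abs(alternative - nominal)
    return (+delta, -delta)

nominal_yield = 1000.0
scale_varied = [1040.0, 985.0, 1012.0, 960.0]        # e.g. (muR, muF) grid points
print(scale_envelope(nominal_yield, scale_varied))    # -> 40.0
print(symmetrised_two_point(nominal_yield, 1030.0))   # -> (30.0, -30.0)
```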
Systematic uncertainties in the signal acceptance obtained by varying $\mu_\mathrm{R}$ and $\mu_\mathrm{F}$, as well as PDF-induced uncertainties, are taken into account as recommended in Ref. [15] and evaluated using the same procedure as for the top-quark background processes. The uncertainty due to the parton-shower modelling is computed by comparing Herwig 7 with Pythia 8. The uncertainty in the $HH$ production cross-section, evaluated to be $\pm 3\%$ for PDF+$\alpha_\mathrm{S}$ and $^{+6\%}_{-23\%}$ for the combined scale and top-quark mass scheme for ggF, and $\pm 2.1\%$ for PDF+$\alpha_\mathrm{S}$ and $^{+0.03\%}_{-0.04\%}$ for the scale for VBF, is included as an uncertainty in the predicted ggF and VBF $HH$ cross-sections when computing the upper limits on the signal strength.

Statistical procedure
The statistical procedure used to interpret the data is described in Ref. [139]. The likelihood function is constructed from the product of Poisson probabilities:
$$\mathcal{L}(\mu, k, \boldsymbol{\theta}) = \prod_i \mathrm{Pois}\!\left( n_i \,\middle|\, \mu\, s_i + k\, b_i \right) \cdot G(\tilde{\boldsymbol{\theta}} \mid \boldsymbol{\theta}),$$
where $s_i$ and $b_i$ are the signal and background contributions in the $i$-th bin of the fitted variable distribution; $\mu$ is the signal strength, $k$ is the normalization factor for the respective background, and $\boldsymbol{\theta}$ denotes the nuisance parameters, which account for the uncertainties of the measurements; $G(\tilde{\boldsymbol{\theta}} \mid \boldsymbol{\theta})$ is the Gaussian constraint function of the nuisance parameters, constructed as deviations from the nominal model of the systematic uncertainties, where $\tilde{\boldsymbol{\theta}}$ provides a maximum-likelihood estimate for $\boldsymbol{\theta}$. The parameter of interest in the statistical analysis is the global signal-strength factor $\mu = \sigma/\sigma_\mathrm{SM}$, which scales the total number of events predicted by the signal model. It is defined such that $\mu = 0$ corresponds to the background-only hypothesis and $\mu > 0$ corresponds to an $HH$ signal in addition to the background. Hypothesised values of $\mu$ are tested based on the profile likelihood ratio [140], which compares data with the background-only ($b$) and signal-plus-background ($\mu s + b$) models using the test statistic
$$q_\mu = -2 \ln \frac{\mathcal{L}\big(\mu, \hat{\hat{\boldsymbol{\theta}}}(\mu)\big)}{\mathcal{L}\big(\hat{\mu}, \hat{\boldsymbol{\theta}}\big)},$$
where $\hat{\mu}$ and $\hat{\boldsymbol{\theta}}$ are the values of $\mu$ and $\boldsymbol{\theta}$ that maximize $\mathcal{L}$ with all parameters floating (the unconditional maximum-likelihood (ML) estimators), and $\hat{\hat{\boldsymbol{\theta}}}(\mu)$ is the conditional ML estimator of $\boldsymbol{\theta}$ for a fixed value of $\mu$. This test statistic extracts the information on the signal strength from a full likelihood fit to the data. The likelihood function includes all the parameters that describe the systematic uncertainties and their correlations. The constraints on the coupling modifiers are obtained by using the respective modifier ($\kappa_\lambda$ or $\kappa_{2V}$) in place of the signal strength as the parameter of interest in the profile likelihood ratio.
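The profile-likelihood construction can be illustrated on a toy single-bin counting experiment. The sketch below is not the ATLAS fit: the model $n \sim \mathrm{Pois}(\mu s + b(1 + \sigma_b\theta))$ with one unit-Gaussian-constrained nuisance parameter, the crude grid-search minimiser and all numbers are assumptions made for the example.

```python
# Minimal sketch of the profile-likelihood-ratio test statistic q_mu for a
# single-bin counting experiment with one Gaussian-constrained nuisance
# parameter theta. Illustration only, not the ATLAS fit.
import math

def nll(mu, theta, n, s, b, sigma_b):
    """Negative log-likelihood, dropping constant (n!, Gaussian-norm) terms."""
    lam = mu * s + b * (1.0 + sigma_b * theta)
    if lam <= 0.0:
        return float("inf")
    return lam - n * math.log(lam) + 0.5 * theta ** 2

def _minimise(f, lo, hi, steps=201, refinements=3):
    """Nested 1-d grid search; adequate for this smooth toy problem."""
    x_best = lo
    for _ in range(refinements):
        xs = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
        x_best = min(xs, key=f)
        span = (hi - lo) / (steps - 1)
        lo, hi = x_best - 2.0 * span, x_best + 2.0 * span
    return f(x_best)

def q_mu(mu_test, n, s, b, sigma_b):
    """q_mu = -2 ln[ L(mu, theta-hat-hat) / L(mu-hat, theta-hat) ]."""
    # conditional fit: profile theta at the fixed, tested value of mu
    nll_cond = _minimise(lambda t: nll(mu_test, t, n, s, b, sigma_b), -5.0, 5.0)
    # unconditional fit: minimise over mu, profiling theta at each mu
    profiled = lambda mu: _minimise(lambda t: nll(mu, t, n, s, b, sigma_b), -5.0, 5.0)
    nll_free = _minimise(profiled, -2.0, 10.0)
    return 2.0 * (nll_cond - nll_free)

# 25 events observed with b = 20 expected background (10% uncertainty) and
# s = 5 expected signal; test the background-only hypothesis mu = 0.
print(q_mu(0.0, n=25, s=5.0, b=20.0, sigma_b=0.1))  # roughly 1: no significant excess
```

Note how profiling the nuisance parameter in both the conditional and unconditional fits is what lets the data constrain the systematic uncertainty, mirroring the behaviour of the full fit described above.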
Exclusion limits at 95% CL are based on the CL$_\mathrm{s}$ prescription [123]; they are set on the $HH$ production cross-section times branching fraction, divided by the corresponding SM prediction. The statistical analysis uses the distributions of the MVA output score in both the ggF and VBF SRs as the final discriminants. The background CRs are used to constrain the overall normalizations of the $t\bar t$, $Wt$ and $Z$+HF backgrounds in the SRs. Other background normalizations and shapes are fixed, with prior uncertainties included in the fit.
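The CL$_\mathrm{s}$ prescription can be illustrated with pseudo-experiments for a single counting experiment, using the observed count itself as the test statistic: $\mathrm{CL}_\mathrm{s} = P(n \le n_\mathrm{obs} \mid s+b)\,/\,P(n \le n_\mathrm{obs} \mid b)$, with exclusion at 95% CL when $\mathrm{CL}_\mathrm{s} < 0.05$. This is a toy sketch, not the ATLAS procedure (which uses the full binned profile likelihood); all numbers are hypothetical.

```python
# Toy-based illustration of the CLs prescription for a counting experiment.
import math
import random

def _poisson(lam, rng):
    # Knuth's multiplication method; adequate for small lam
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def cls(n_obs, s, b, n_toys=20000, seed=7):
    """CLs = P(n <= n_obs | s+b) / P(n <= n_obs | b), from pseudo-experiments."""
    rng = random.Random(seed)
    p_sb = sum(_poisson(s + b, rng) <= n_obs for _ in range(n_toys)) / n_toys
    p_b = sum(_poisson(b, rng) <= n_obs for _ in range(n_toys)) / n_toys
    return p_sb / p_b

# With 20 observed events on an expected background of 20, a hypothesis of
# s = 12 expected signal events is excluded (CLs < 0.05), while s = 1 is not:
print(cls(20, 12.0, 20.0))  # excluded
print(cls(20, 1.0, 20.0))   # not excluded
```

Dividing by $P(n \le n_\mathrm{obs} \mid b)$ is what protects against excluding signals to which the experiment has no sensitivity when the data fluctuate below the background expectation.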

Limits on $HH$ production
All regions used for this search are displayed in Figure 5 after a fit with the signal strength fixed to the upper limit, while pre-fit distributions are shown in Figure 4. A downward fluctuation of the data is observed in the last bin of both the NN and BDT distributions. A negative signal strength is extracted from the fit: $\mu_{HH} = -8.5^{+7.7}_{-8.4}$. Using the approach described in Section 8.1, upper limits on the signal-strength parameter for Higgs boson pair production are determined, considering both the ggF and VBF $HH$ signals. Figure 6 summarizes the ggF, VBF and combined results, with the impact of all systematic uncertainties shown. The observed combined upper limit at 95% CL on the signal strength is $\mu_{HH} = 9.7$. In the CRs, the normalization and modelling of the backgrounds play a prominent role and dominate the sources of uncertainty. In the SRs, the systematic uncertainties mostly arise from the background modelling, experimental sources and the signal normalization. In the most sensitive bins, i.e. ggF-SR 1 to ggF-SR 3 and VBF-SR 1 and VBF-SR 2, the statistical uncertainty becomes dominant due to the limited number of expected events. Including both the ggF and VBF SRs in the $\kappa_{2V}$ scan gives a slight enhancement of sensitivity from the presence of VBF $HH$ events within the ggF SR. A likelihood scan is performed to set constraints on the $\kappa_{2V}$ parameter; the result is shown in Figure 7(b). Values of $\kappa_{2V}$ are constrained to be within the range $[-0.17, 2.4]$ at 95% CL, which is slightly better than the expected range of $[-0.51, 2.7]$ at 95% CL due to the observed downward fluctuation of the data.

Conclusion
A search for non-resonant Higgs boson pair production via the ggF and VBF production modes is performed. It probes decay channels with one of the Higgs bosons decaying to $b\bar b$ and the other to either $WW^*$, $ZZ^*$ or $\tau^+\tau^-$. Selected events contain exactly two $b$-tagged jets, two light leptons with opposite electric charge, and missing transverse energy. The analysis employs $140\,\mathrm{fb}^{-1}$ of $pp$ collision data at $\sqrt{s} = 13$ TeV, recorded by the ATLAS detector at the LHC. The results are consistent with the predictions for the SM background processes. An observed (expected) 95% CL upper limit on the cross-section for the production of Higgs boson pairs is set at 9.7 (16.2) times the SM prediction, a significant improvement over the previous ATLAS search in this channel. The analysis also establishes separate 95% CL limits on the Higgs coupling parameters $\kappa_\lambda$ and $\kappa_{2V}$, excluding values outside the ranges $[-6.2, 13.3]$ and $[-0.17, 2.4]$, respectively. These ranges are obtained under the assumption that all couplings other than the individual coupling being tested are set to their SM values.

Figure 1 :
Figure 1: The leading-order (LO) Feynman diagrams for the gluon-gluon fusion process of Higgs boson pair production at the LHC, where $\kappa_\lambda$ denotes the Higgs boson trilinear coupling modifier $\kappa_\lambda \equiv \lambda_{HHH}/\lambda_{HHH}^\mathrm{SM}$.

Figure 3 :
Figure 3: Definition of signal and control regions for same-lepton-flavour (a) and different-lepton-flavour (b) events, where either the electron ($e$) or the muon ($\mu$) is leading in $p_\mathrm{T}$. The greyed-out region is excluded as it makes a negligible contribution to the final results. The $m_{b\ell}$ discriminant variable [54] is used to further separate the top CR into separate $t\bar t$ and $Wt$ control regions.

Figure 4 :
Figure 4: Pre-fit yields of the $t\bar t$, $Z$+HF and $Wt$ CRs, for both the ggF and VBF event selections, as well as the highest-score bins, numbered from high (VBF-SR 1 and ggF-SR 1) to low score (VBF-SR 5 and ggF-SR 7), of the BDT and DNN output distributions in the VBF and ggF event categories, respectively, as used in the final result. The shaded bands include both statistical and systematic uncertainties.

Figure 5 :
Figure 5: Post-fit yields from the signal-plus-background fit of the $t\bar t$, $Z$+HF and $Wt$ CRs, for both the ggF and VBF event selections, as well as the highest-score bins, numbered from high (VBF-SR 1 and ggF-SR 1) to low score (VBF-SR 5 and ggF-SR 7), of the BDT and DNN output distributions in the VBF and ggF event categories, respectively, as used in the final result. The fit is a conditional fit with the signal strength fixed to the observed upper limit of $\mu_{HH} = 9.7$. The shaded bands include both statistical and systematic uncertainties.

Figure 6 :
Figure 6: Observed and expected upper limits on the ratio of the Higgs boson pair production cross-section to the corresponding Standard Model prediction, $\sigma_{HH}/\sigma_{HH}^\mathrm{SM}$, for the ggF $HH$ signal only (top row), the VBF $HH$ signal only with ggF $HH$ treated as background (second row), and the combined ggF+VBF $HH$ signal considering only the ggF SRs (third row) and all SRs (bottom row), at 95% confidence level. The relative ratio between the ggF and VBF production modes is fixed to the SM value.

Table 1 :
Summary of nominal SM background processes considered in the analysis, along with a description of the event generators used for matrix-element (ME) generation, the set of parton distribution functions (PDF), the hadronisation, parton-shower (PS) and underlying-event (UE) model, and the UE tune.

Table 2 :
Cutflow for the event selection using SM ggF/VBF $HH$ signal samples in the various decay channels. For both the ggF and VBF signal samples, the SM $HH$ cross-section, $\sigma$, and branching ratio, $\mathcal{B}$, are assumed when computing event yields for a luminosity of $\mathcal{L} = 140\,\mathrm{fb}^{-1}$. Efficiencies are different for $b\bar bZZ^*(\to 2\ell 2\nu)$ compared to $b\bar bWW^*(\to 2\ell 2\nu)$ since the initial number of events considers $WW^* \to \ell\nu\ell\nu$ while the former does not.

Table 4 :
Input features used for the DNN in the ggF category. Indices 0 and 1 refer to $p_\mathrm{T}$-leading and $p_\mathrm{T}$-sub-leading objects, respectively.
- $\Delta\phi_{\ell\ell}$, $\Delta\phi_{bb}$: $\Delta\phi$ between the two leptons and between the two $b$-tagged jets
- $m_{b\ell}$: $\min\{\max(m_{b_0\ell_0}, m_{b_1\ell_1}), \max(m_{b_0\ell_1}, m_{b_1\ell_0})\}$ [54]
- $\min \Delta R_{b\ell}$: minimum $\Delta R$ of all $b$-tagged jet and lepton combinations
- $m_{\ell\ell}$: invariant mass of the $\ell\ell$ system
- $E_\mathrm{T}^\mathrm{miss}$, $E_\mathrm{T}^\mathrm{miss}$-sig: missing transverse energy and its significance [127]
- $m_\mathrm{T}(\ell_0, E_\mathrm{T}^\mathrm{miss})$: transverse mass of the $p_\mathrm{T}$-leading lepton with respect to $E_\mathrm{T}^\mathrm{miss}$

Table 5 :
Input features for the BDT algorithm in the VBF category. A $j$ in a variable name indicates that only non-$b$-tagged jets are considered. Indices 0 and 1 refer to $p_\mathrm{T}$-leading and $p_\mathrm{T}$-sub-leading objects, respectively.
- $\eta^{\ell_0}$, $\eta^{\ell_1}$, $\phi^{\ell_0}$, $\phi^{\ell_1}$, $p_\mathrm{T}^{\ell_0}$, $p_\mathrm{T}^{\ell_1}$: $\eta$, $\phi$ and $p_\mathrm{T}$ of the $p_\mathrm{T}$-(sub)leading lepton
- $\eta^{b_0}$, $\eta^{b_1}$, $\phi^{b_0}$, $\phi^{b_1}$, $p_\mathrm{T}^{b_0}$, $p_\mathrm{T}^{b_1}$: $\eta$, $\phi$ and $p_\mathrm{T}$ of the $p_\mathrm{T}$-(sub)leading $b$-tagged jet
- $p_\mathrm{T}^{bb}$, $\Delta\phi_{bb}$, $\Delta\eta_{bb}$, $m_{bb}$: $p_\mathrm{T}$, $\Delta\phi$, $\Delta\eta$ and invariant mass of the di-$b$-jet system
- $p_\mathrm{T}^{\ell\ell}$, $\Delta\phi_{\ell\ell}$, $\Delta\eta_{\ell\ell}$, $m_{\ell\ell}$, $E_\mathrm{T}^\mathrm{miss}$ $\ell\ell$ centrality: $p_\mathrm{T}$, $\Delta\phi$, $\Delta\eta$, invariant mass and $E_\mathrm{T}^\mathrm{miss}$ centrality$^7$ of the di-lepton system
- $p_\mathrm{T}$ of the $\ell\ell + E_\mathrm{T}^\mathrm{miss} +{}$ $p_\mathrm{T}$-leading and -sub-leading jet system
- $m_\mathrm{tot}$: invariant mass of the $\ell\ell + E_\mathrm{T}^\mathrm{miss} +{}$ $p_\mathrm{T}$-leading and -sub-leading jet system
- $m_\mathrm{KLFitter}$: Kalman-fitter top-quark mass [129]
- $\min \Delta R_{\ell_0 j}$, $\min \Delta R_{\ell_1 j}$: minimum $\Delta R$ between the $p_\mathrm{T}$-(sub)leading lepton and jets
- $\sum m_{\ell j}$: sum of the invariant masses of all lepton+jet combinations
- $\max p_\mathrm{T}^{jj}$, $\max m_{jj}$: maximum $p_\mathrm{T}$ and invariant mass of any two non-$b$-tagged jets
- $\max \Delta\eta_{jj}$, $\max \Delta\phi_{jj}$: maximum $\Delta\eta$ and $\Delta\phi$ between any two non-$b$-tagged jets
- $\min \Delta R_{b\ell}$: minimum $\Delta R$ of all $b$-tagged jet and lepton combinations
- $N_\mathrm{forward\ jets}$, $N_j$: number of forward jets and number of non-$b$-tagged jets
- $m_\mathrm{MMC}$: value of the MMC algorithm (reconstruction of $m_{\tau\tau}$) [130]

This analysis is extended by performing likelihood scans of the $\kappa_\lambda$ and $\kappa_{2V}$ parameters. The single-Higgs-boson background has a small dependence on $\kappa_\lambda$ through loop effects, which is neglected. Coupling modifiers other than the one tested in the respective scan are set to their SM values; hence, for the $\kappa_\lambda$-scan, $\kappa_t = \kappa_V = \kappa_{2V} = 1$. Both the ggF and VBF $HH$ signal regions are used in the analysis, and the result is shown in Figure 7(a). The observed result constrains $\kappa_\lambda$ to be within the range $[-6.2, 13.3]$ at 95% CL, which is slightly better than the expected range of $[-8.1, 15.5]$ at 95% CL due to the observed downward fluctuation of the data. A scan over the $\kappa_{2V}$ parameter is also conducted, again with all other couplings set to their respective SM values; although only the VBF production mode of Higgs boson pairs is sensitive to the $\kappa_{2V}$ coupling modifier, both the ggF and VBF SRs are included in the scan.