Search for Higgs boson pair production in association with a vector boson in pp collisions at s=13TeV\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\sqrt{s}=13\,\text {TeV}$$\end{document} with the ATLAS detector

This paper reports a search for Higgs boson pair (hh) production in association with a vector boson (WorZ\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$W\; {\text {o}r}\; Z$$\end{document}) using 139 fb-1\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$^{-1}$$\end{document} of proton–proton collision data at s=13TeV\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\sqrt{s}=13\,\text {TeV}$$\end{document} recorded with the ATLAS detector at the Large Hadron Collider. The search is performed in final states in which the vector boson decays leptonically (W→ℓν,Z→ℓℓ,νν\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$W\rightarrow \ell \nu ,\, Z\rightarrow \ell \ell ,\nu \nu $$\end{document} with ℓ=e,μ\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\ell =e, \mu $$\end{document}) and the Higgs bosons each decay into a pair of b-quarks. It targets Vhh signals from both non-resonant hh production, present in the Standard Model (SM), and resonant hh production, as predicted in some SM extensions. A 95% confidence-level upper limit of 183 (87) times the SM cross-section is observed (expected) for non-resonant Vhh production when assuming the kinematics are as expected in the SM. Constraints are also placed on Higgs boson coupling modifiers. For the resonant search, upper limits on the production cross-sections are derived for two specific models: one is the production of a vector boson along with a neutral heavy scalar resonance H, in the mass range 260–1000 GeV, that decays into hh, and the other is the production of a heavier neutral pseudoscalar resonance A that decays into a Z boson and H boson, where the A boson mass is 360–800 GeV and the H boson mass is 260–400 GeV. Constraints are also derived in the parameter space of two-Higgs-doublet models.


Introduction
Since the discovery of the Higgs boson (ℎ) with a mass ( ℎ ) of approximately 125 GeV in 2012 [1, 2], its couplings to vector bosons and fermions have been found to be consistent with Standard Model (SM) predictions within the current measurement precision [3][4][5], providing strong evidence that the Higgs boson has SM properties. The SM also predicts the Higgs boson self-coupling (or trilinear coupling) as well as quartic couplings with itself and with massive vector bosons. These couplings are direct consequences of electroweak symmetry breaking (EWSB) [6][7][8] and are yet to be confirmed experimentally.
The Higgs boson self-coupling and quartic coupling to vector bosons can be probed through studies of Higgs boson pair (ℎℎ) production. In proton-proton ( ) collisions, SM production of ℎℎ is dominated by the gluon-gluon fusion (ggF) process [9,10]. Extensive searches for this process have led to significant bounds on the Higgs boson self-coupling [11][12][13][14]. Production through the vector-boson fusion (VBF) process has the second largest cross-section [9,10]. Searches for VBF ℎℎ production have resulted in additional constraints on the Higgs boson self-coupling and are also sensitive to the SM prediction of the Higgs boson quartic coupling to vector bosons [15][16][17]. While produced through non-resonant processes in the SM, ℎℎ production can also occur in resonant processes in scenarios beyond the Standard Model (BSM) through the decays of heavy resonances, such as the heavy Higgs boson predicted in two-Higgs-doublet models (2HDM) [18][19][20] or the spin-2 Kaluza-Klein gravitons in Randall-Sundrum models [21][22][23]. Searches for resonant ℎℎ production in ggF and VBF processes have also led to constraints in the parameter spaces of these models [24][25][26].
This paper reports a search for Higgs boson pairs produced in association with a vector boson, ℎℎ ( = , ), a process previously unexplored. The search targets both non-resonant ℎℎ production, which occurs in the SM, and BSM-inspired resonant ℎℎ production. It is performed on a dataset of collisions at a centre-of-mass energy of √ = 13 TeV collected between 2015 and 2018 with the ATLAS detector at the Large Hadron Collider (LHC), corresponding to an integrated luminosity of 139 ± 2.4 fb −1 [27]. The search considers vector bosons decaying into leptons ( → ℓ , → ℓℓ, ) and Higgs bosons decaying into a pair of -quarks (ℎ → ), leading to three distinct leptonic channels: ℎℎ → (denoted by 0L), ℎℎ → ℓ (denoted by 1L), and ℎℎ → ℓℓ (denoted by 2L). Here ℓ denotes either an electron ( ) or a muon ( ). 1 Non-resonant ℎℎ production in association with a boson arises in the SM from three distinct Higgs boson couplings: coupling to vector bosons, self-coupling, and quartic coupling to vector bosons. The leadingorder processes are depicted in Figures 1(a), 1(b), and 1(c), respectively. In the SM, the cross-sections of these processes at the LHC are small compared with those of the ggF and VBF processes, 0.50 ± 0.01 fb for ℎℎ ( + ℎℎ: 0.329 ± 0.007 fb, and − ℎℎ: 0.173 ± 0.005 fb) and 0.36 ± 0.01 fb for ℎℎ at √ = 13 TeV with ℎ = 125 GeV [9,10], computed at next-to-next-to-leading-order (NNLO) accuracy in QCD.
Two BSM scenarios are considered for resonant ℎℎ production as illustrated in Figures 2(a) and 2(b). The first scenario, labelled as , is the 'Higgstrahlung' production of a generic neutral CP-even scalar boson which couples directly to vector bosons and decays into ℎℎ, i.e.
→ ℎℎ. Examples of such a scalar resonance are the CP-even heavy Higgs boson predicted in the electroweak singlet model [28] or in the type-II 2HDM [29]. In this search, the scalar is assumed to be a narrow resonance; i.e. its natural width is much smaller than the expected experimental relative mass resolution of approximately 3%. This scenario was explored previously by ATLAS in a VBF ℎℎ search [15]. The search is complementary because it is sensitive to the and couplings separately, while the VBF search is sensitive only 1 Although → and → are not considered explicitly, final states with electrons or muons from decays are included. to their combination. The second scenario, labelled as → , is a specific process in the 2HDM which predicts three neutral Higgs bosons: two CP-even scalars ℎ and (with mass hierarchy > ℎ ), and one CP-odd scalar . In parts of the 2HDM parameter space where the light Higgs boson ℎ is similar to the SM Higgs boson and has a mass ℎ ∼ 125 GeV favourable for electroweak baryogenesis, the boson has a mass below about 800 GeV but is heavier than the boson [30]. If is in the range 2 ℎ < < 2 , the → ℎℎ decay branching ratio could be substantial, leading to a sizeable rate for → → → ℎℎ. Here is the mass of the top quark. For this search, natural widths up to 20% of its mass are considered for , and a narrow width is assumed for . Searches for → were performed previously by ATLAS and CMS in the → , , and decay channels [31-34].

ATLAS detector
The ATLAS detector [35] at the LHC covers nearly the entire solid angle around the collision point. 2 It consists of an inner tracking detector surrounded by a thin superconducting solenoid, electromagnetic and hadron calorimeters, and a muon spectrometer incorporating three large superconducting air-core toroidal magnets.
The inner-detector system (ID) is immersed in a 2 T axial magnetic field and provides charged-particle tracking in the range | | < 2.5. The high-granularity silicon pixel detector covers the vertex region and typically provides four measurements per track, the first hit normally being in the insertable B-layer (IBL) installed before Run 2 [36,37]. It is followed by the silicon microstrip tracker (SCT), which usually provides eight measurements per track. These silicon detectors are complemented by the transition radiation tracker (TRT), which enables radially extended track reconstruction up to | | = 2.0. The TRT also provides electron identification information based on the fraction of hits (typically 30 in total) above a higher energy-deposit threshold corresponding to transition radiation.
The calorimeter system covers the pseudorapidity range | | < 4.9. Within the region | | < 3.2, the electromagnetic calorimeter (ECAL) consists of barrel and endcap high-granularity lead/liquid-argon (LAr) calorimeters, with an additional thin LAr presampler covering | | < 1.8 to correct for energy loss in material upstream of the calorimeters. Hadron calorimetry is provided by the steel/scintillator-tile calorimeter, segmented into three barrel structures within | | < 1.7, and two copper/LAr hadron endcap calorimeters. The solid angle coverage is completed with forward copper/LAr and tungsten/LAr calorimeter modules optimised for electromagnetic and hadronic energy measurements respectively.
The muon spectrometer (MS) comprises separate trigger and high-precision tracking chambers measuring the deflection of muons in a magnetic field generated by the superconducting air-core toroidal magnets. The field integral of the toroids ranges between 2.0 and 6.0 T m across most of the detector. Three layers of precision chambers, each consisting of layers of monitored drift tubes, covers the region | | < 2.7, complemented by cathode-strip chambers in the forward region, where the background is highest. The muon trigger system covers the range | | < 2.4 with resistive-plate chambers in the barrel, and thin-gap chambers in the endcap regions.
Interesting events are selected by the first-level trigger system implemented in custom hardware, followed by selections made by algorithms implemented in software in the high-level trigger [38]. The first-level trigger accepts events from the 40 MHz bunch crossings at a rate below 100 kHz, which the high-level trigger further reduces in order to record events to disk at about 1 kHz.
An extensive software suite [39] is used in data simulation, in the reconstruction and analysis of real and simulated data, in detector operations, and in the trigger and data acquisition systems of the experiment.
2 ATLAS uses a right-handed coordinate system with its origin at the nominal interaction point (IP) in the centre of the detector and the -axis along the beam pipe. The -axis points from the IP to the centre of the LHC ring, and the -axis points upwards. Cylindrical coordinates ( , ) are used in the transverse plane, being the azimuthal angle around the -axis. The pseudorapidity is defined in terms of the polar angle as = − ln tan( /2). Angular distance is measured in units of Δ ≡ √︁ (Δ ) 2 + (Δ ) 2 .

Data and Monte Carlo samples
The data used in this analysis were collected using unprescaled single-lepton, missing transverse momentum ( miss T ), or single-photon triggers. The single-lepton trigger requirements are applied as a logical OR of single-electron or single-muon triggers [40,41], with transverse momentum ( T ) thresholds that started at 20 GeV or 24 GeV in 2015 for muons or electrons, respectively, and increased to 26 GeV in 2016-2018. The miss T trigger [42] threshold was raised from 70 GeV to 110 GeV between the 2015 and 2018 data-taking periods. The single-photon trigger [40] was only used in this analysis for background estimation and had a threshold of 140 GeV for the full data-taking period. Events are selected for analysis only if they are of good quality and if all the relevant detector components are known to have been in good operating condition [43], which corresponds to a total integrated luminosity of 139.0 fb −1 . The uncertainty in the combined 2015-2018 integrated luminosity is 1.7% [44], obtained using the LUCID-2 detector [27] for the primary luminosity measurements. The recorded events contain an average of 34 inelastic collisions per bunch-crossing.
Monte Carlo (MC) simulations are used to optimise the search sensitivities and to estimate background contributions. MC samples were produced with various event generators, interfaced to different programs for parton showers, hadronisation, and underlying-event simulations. No potential interference between signal and background processes is considered. Table 1 summarises the simulation of the signal and background processes relevant for the searches described in this paper. All MC samples are passed through the ATLAS detector simulation program [45] based on G 4 [46]. Simulated processes are normalised to the most accurate theoretical cross-section predictions currently available. The effects of multiple interactions in the same or nearby bunch crossings (pile-up) were modelled by overlaying minimum-bias events, simulated using the soft QCD processes of P

Non-resonant signal samples
As shown in Figure 1, three types of the Higgs boson couplings are responsible for non-resonant ℎℎ production in the SM. Although these couplings are predicted in the SM once the Higgs boson mass ℎ is known, their values should be determined experimentally. Deviations from their SM values are traditionally parameterised using the coupling modifiers [71] denoted, in this paper, by , , and 2 for the ℎ , ℎℎℎ, and ℎℎ vertices, respectively, assuming the same coupling modifiers for the and bosons. 3 While it is impractical to produce signal samples for arbitrary values of the coupling modifiers probed by this analysis, such samples can be constructed from six independent samples composed of different combinations of the leading-order (LO) processes shown in Figure 1. These component samples were produced with = = 2 = 1 for the following couplings and their combinations: (1) : diagram without the ℎℎℎ and ℎℎ vertices, Figure 1(a); (2) : diagram with the ℎℎℎ vertex, Figure 1   (6) + 2 : all diagrams with either an ℎℎℎ or an ℎℎ vertex, Figures 1(b) and 1(c).
Samples 1-3 determine the contributions from their respective diagrams, while samples 4-6 allow the determination of the contributions from interference between the three diagrams. Non-resonant signal samples for any coupling modifier deviations from the SM values can be built from the combinations of these component samples, weighted by the coupling modifiers. A SM ℎℎ sample with = = 2 = 1 was also produced to validate this procedure. Kinematic distributions of the reweighted sample are found to agree with those of the directly produced sample. Since the ℎ coupling has been constrained through the single Higgs boson measurements [4], = 1 is assumed in this paper.
In addition to the quark-initiated diagrams shown in Figure 1, ℎℎ can also be produced via the gluoninitiated → ℎℎ diagram. Although → ℎℎ is technically a higher-order process for ℎℎ production, its cross-section is predicted to be approximately 24% of that of the → ℎℎ process in the SM [10]. A correction is applied to the non-resonant → ℎℎ cross-section, as a function of and 2 , to account for the → ℎℎ contribution. The generator-level difference, in both normalisation and shape, between the → ℎℎ sample and the sum of the → ℎℎ and → ℎℎ samples is taken as an uncertainty. For a signal as expected in the SM, the normalisation component of this uncertainty has a magnitude of 24%. It tends to be larger than 24% for large 2 and smaller than 24% for large .

Resonant signal samples
The production of a generic CP-even scalar resonance in association with a boson, where the decays into a pair of the ℎ Higgs bosons, is modelled using the → → ℎℎ process in the 2HDM as shown in Figure 2(a). The heavy Higgs boson is assumed to have a narrow width, i.e. its total decay width is far smaller than the experimental ℎℎ resolution of approximately 3%, but to decay promptly nevertheless. Ten signal samples were produced for each leptonic channel, corresponding to values of 260, 280, 300, 400, 500, 600, 700, 800, 900, and 1000 GeV. The lower bound is dictated by 2 ℎ , while the upper bound is limited by the ability to reconstruct two separate jets from the highly boosted ℎ → decay.
The → → → ℎℎ signal samples were produced for mass combinations of = 360, 400, 500, 600, 700, and 800 GeV and = 260, 300, and 400 GeV subject to the kinematic bound of > + , leading to a total of 15 ( , ) grid points for each of the 0L and 2L channels. These mass combinations cover the 2HDM parameter space relevant for this search, but unexplored by previous → → searches [31][32][33][34]. The CP-odd boson could have a substantial total decay width, depending on the 2HDM parameter values. Two sets of samples were produced, one for a narrow-width (NW) boson and the other for a large-width (LW) boson. In both scenarios, the total decay width of the heavy CP-even boson is assumed to be narrow. For the LW samples, the boson width is assumed to be 20% of its mass. To perform searches for bosons with different widths, MC distributions for an boson with width equal to 5%, 10%, or 15% of its mass are derived through reweighting. The generator-level boson mass distribution for each of these intermediate widths is compared with that of the LW boson to derive reweighting factors which are then applied to the distributions of the simulated LW MC samples to obtain the corresponding distributions for the intermediate widths.

Object reconstruction and identification
As discussed in Section 1, the signal events are characterised by the products of the targeted decays of the vector and Higgs bosons: electrons or muons, jets ( -jets in particular), and missing transverse momentum. Additionally, events with identified hadronically decaying -leptons are vetoed to reduce backgrounds. Events with photons are used for background estimation. The reconstruction and identification of these physics objects are described in this section.
Electrons are reconstructed by matching topological energy clusters in the ECAL with tracks in the ID [72] and are required to have T > 7 GeV within the pseudorapidity range | | < 2.47. They are identified using likelihood-based identification criteria that combined the requirements of calorimeter shower shape, track-to-cluster matching, and associated track qualities. Electron candidates are required to satisfy the tight criterion in the 1L channel and the loose criterion for the rest. All candidates must satisfy a loose track-and calorimeter-based isolation criterion to minimise the number of jets misidentified as electrons.
Similarly to electrons, photon reconstruction starts with topological energy clusters in the ECAL [72], but the clusters are required to have either no matching tracks (unconverted photons) or one or two matching tracks consistent with a conversion vertex (converted photons). Photon candidates are required to have T > 10 GeV within the pseudorapidity range | | < 1.37 or 1.52 < | | < 2.37 and satisfy the tight identification criterion as well as a calorimeter-based isolation requirement called 'TightCaloOnly' [72] to suppress backgrounds.
Muons are reconstructed by matching tracks in the ID to either full tracks or track segments in the MS or, for | | < 0.1 only, to energy deposits in the calorimeter [73]. 4 They must have T > 7 GeV and be within | | < 2.5, the combined acceptance of the ID and MS. Muons must satisfy the medium quality criterion in the 1L channel and the loose criterion in others. The same isolation requirement used for the electron selection is applied to all muon candidates to reduce the rate of muons from heavy-flavour decays.
Electrons (muons) are required to have associated tracks satisfying | 0 / 0 | < 5 (3) and | 0 sin | < 0.5 mm, where 0 is the transverse impact parameter relative to the beam line, 0 is its uncertainty, and 0 is the distance between the longitudinal position of the track along the beam line at the point where 0 is measured and the longitudinal position of the primary collision vertex.
Jets are reconstructed from particle-flow objects using the anti-algorithm [74,75] with a radius parameter of = 0.4 and are calibrated as described in Ref. [76]. They are required to have T > 20 GeV and | | < 4.5. Jets with T < 60 GeV and | | < 2.4 originating from pile-up are suppressed with the jet-vertex-tagger [77], a likelihood discriminant based on matching enough of the jet's tracks to the primary vertex.
Jets containing -hadrons ( -jets) are identified with the DL1r algorithm [78]. The algorithm is based on information such as properties of displaced tracks and reconstructed secondary vertices in the jet. A jet is -tagged if the DL1r value is above a preset threshold. Four thresholds, referred to as working points (WP), are defined with average efficiencies of 60%, 70%, 77%, and 85% for tagging -jets from simulatedē vents. A pseudo-continuous -tagging score is defined for each jet as the number of WPs it satisfies, with zero being failing, and four passing, all WPs. In some cases, instead of directly applying -tagging in MC simulation, events with -and light-flavor jets are weighted by the probability that these jets pass the -tagging requirement [79] and a -tagging score is chosen for each jet based on its -tagging probability. This procedure, referred to as 'truth'-tagging, increases the effective MC sample size for subdominant backgrounds. A momentum correction is applied to -tagged jets to account for the energy lost to soft radiation and to muons and neutrinos in semileptonic -hadron decays, following the procedure used in Ref. [79]. Furthermore, correction factors are applied to the simulated samples to compensate for the differences between the -tagging efficiencies in data and simulation [78].
Hadronically decaying -leptons, ℎ , are identified from the reconstructed jets [80][81][82]. A Recurrent Neural Network multivariate discriminant is used to select jets with energy-deposit profiles consistent with those expected from ℎ decay products and to match tracks in the ID to the ℎ candidates, using the 'medium' ℎ criteria. The ℎ candidates must have one or three associated tracks and satisfy the requirements of T > 20 GeV and | | < 2.5, excluding the region 1.37 < | | < 1.52.
The missing transverse momentum miss T , with magnitude miss T , is calculated as the negative vectorial sum of the transverse momenta of reconstructed physics objects, namely electrons, muons, photons, hadronically decaying -leptons, and jets [83,84]. A component, called the 'soft-term', from energy deposits due to the underlying event and other soft radiation not included in the physics objects is added in the miss T calculation. A miss T significance, S( miss T ), is defined in order to test whether the measured miss T is incompatible with zero real miss T . It is calculated from the measured miss T , its resolution, and the correlation between measurements parallel and perpendicular to the miss T direction [85].
A sequential overlap-removal procedure is applied to ensure that energy deposits in the calorimeter and tracks in the ID are not included in two or more different reconstructed objects. If two electrons share a track, the electron with lower T is removed. If an electron and muon share an inner-detector track, the muon is removed if it is calorimeter-tagged, and the electron is removed otherwise. The closest jet within Δ = 0.2 of a selected electron is removed. If the nearest jet surviving that selection is within Δ = 0.4 of the electron, the electron is discarded. Muons are usually removed if they are separated from the nearest jet by Δ < 0.4 + 10 GeV/ T , since this reduces the background from heavy-flavour decays inside jets. However, if this jet has fewer than three associated tracks, the muon is kept and the jet is removed instead; this avoids an inefficiency for high-energy muons undergoing significant energy loss in the calorimeter. A ℎ candidate is rejected if it is separated by Δ < 0.2 from any selected electron, muon, or jet.

Analysis
The non-resonant and resonant signal models targeted in this search share the same ℎℎ final state but have different event kinematics. The search starts with the selection of the ℎℎ final states which are characterised by a leptonically decaying boson and four -jets from the decays of the two Higgs bosons. Three leptonic channels (0L, 1L, 2L), one for each leptonic decay mode of the and bosons ( → , → ℓ , → ℓℓ), define the signal regions (SRs) of the search. Multivariate techniques based on boosted decision trees (BDT), trained to distinguish between signal and background events in each SR and for each signal model, provide the final discriminants to extract potential signal contributions. This is achieved through simultaneous fits to the BDT distributions observed in data with the hypotheses of signal plus background contributions. For the resonant models, mass requirements are applied to the resonance candidates before the fits to the BDT distributions. Background contributions are estimated using both data control regions (CRs) and MC simulations. The SR event selections, background estimations, the designs and trainings of the BDTs, and the mass requirements for resonant models are described in this section.

Signal region event selection
The 0L channel is intended for the ℎℎ → final state. Candidate events are required to have miss T > 150 GeV and S( miss T ) > 12. The high miss T criterion is necessitated by the trigger requirement. Events with identified loose leptons or ℎ are vetoed. To suppress multi-jet backgrounds with mismeasured miss T , the minimum azimuthal opening angle between miss T and the Higgs boson candidates (see below) must satisfy min Δ ( miss T , ) > 1. The ratio of the miss T trigger efficiencies in data and simulation, measured in single-muon events as a function of miss T + T , is applied as an event-weight scale factor to simulated events [42], with corresponding uncertainties described in Section 6.1.
The 1L channel is designed for the ℎℎ → ℓ final state. In this channel, candidate events are selected by requiring exactly one tight electron with T > 27 GeV or one medium muon with T > 25 GeV. In addition, the events must have miss T > 30 GeV. Candidates with additional loose leptons or identified ℎ are removed. The 1L channel is split into two SRs based on the charge of the lepton, 1L+ and 1L−, motivated by the expected large difference between the signal + ℎℎ and − ℎℎ cross-sections 5 in contrast to the mostly charge-symmetric backgrounds.
The 2L channel targets the ℎℎ → ℓℓ final state. The → ℓℓ candidates are selected by requiring exactly two oppositely charged loose leptons with the same flavour, + − or + − , at least one of which has T > 27 GeV. The invariant mass of the lepton pair must satisfy 81 < ℓℓ < 101 GeV for compatibility with the boson mass.
Candidate ℎℎ → decays are selected by requiring at least four jets passing the 85% -tagging WP. Within the four-jet combination with the highest pseudo-continuous -tagging scores (considering all combinations of four jets when more than one combination satisfies this requirement), the jets are paired to form the two ℎ → candidates, ℎ 1 and ℎ 2 , by minimising the value of | ℎ 1 − 120 GeV| + | ℎ 2 − 120 GeV|. Here ℎ 1 and ℎ 2 are the invariant masses of the two candidates and 120 GeV is their most probable value in simulation. The pair with the higher T is labelled as ℎ 1 and the other as ℎ 2 . The efficiency for correctly identifying and pairing the four -jets into two ℎ → decays amongst the selected -jets depends on the signal model and, if applicable, the resonance mass values. For the non-resonant SM ℎℎ signal, the efficiency is 73%. For the resonant signal, the efficiency varies from 63% at = 260 GeV to 85% at = 1000 GeV. The efficiencies for the → signals are similar to those for the signals. Table 2 summarises the selections that define the SRs for the three leptonic channels. Also included in the table are selections for control regions discussed below. The products of the acceptances and efficiencies of the SR selections, A × , are shown in Figure 3 as functions of the model parameters for a few selected signal models.
≥ 4 jets with T > 20 GeV and passing the 85% -tagging WP 5 The cross-section ratio of + ℎℎ and − ℎℎ is 1.9 in the SM [10]. For the resonant signal models, the ratio varies between 1.8 and 2.3, depending on the resonance masses.

Background estimations
Major background sources in the ℎℎ search are the production of top-quark pairs (¯), single top quarks, vector bosons in association with jets ( + jets), diboson and multi-boson events, and multi-jet processes.
Their relative contributions depend on the channel. In the 0L and 1L channels, the¯background dominates, with the subleading contribution being from + jets ( + jets for 0L and + jets for 1L). In the 2L channel, + jets and¯are the two leading sources. A mixture of data-driven and simulation-based methods are used to estimate these backgrounds. MC simulations are used to model the kinematics of background processes except for the multi-jet backgrounds. Contributions from¯and + jets processes are normalised to the data through the use of CRs, while the rest of the non-multi-jet sources are normalised to their theoretical cross-sections within the estimated uncertainties. For the multi-jet backgrounds, their normalisations and kinematic models are derived from auxiliary data samples. These data-driven procedures are described below.
A¯control region (CR¯) is used to constrain the normalisations of the leptonic¯events 6 in the SRs. The control region is defined by using a selection similar to that for the 2L SR, but requiring a different-flavour opposite-sign lepton pair ( ± ∓ ) instead of a same-flavour opposite-sign lepton pair ( + − or + − ) and removing the dilepton mass requirement to increase the sample size. The jet requirements are the same as for the SR. These selections are summarised in Table 2.
Top-quark-pair events have two -jets from top-quark decays and can mimic signal events if there are two additional genuine -jets from radiation or if there are misidentified -jets or light-flavour jets ( ) from radiation or hadronic boson decays. The flavour content of this radiation is difficult to model, particularly for heavy flavours ( or ). To allow for flavour-dependent variations of the¯background contribution in the signal extraction, simulated¯events are labelled according to the 'truth' flavours of jets from the radiation: events with one or more -hadrons (¯+ ≥ 1 ), events with one or more -hadrons but no -hadrons (¯+ ≥ 1 ), and events with no -hadrons or -hadrons (¯+ ). The 'truth' flavour of a jet is determined from hadrons with T > 5 GeV found within a cone of size Δ = 0.3 around the jet axis [78].
The + jets process is a major background source in all three channels ( + jets for 0L and 2L and + jets for 1L). A + jets CR (CR +jets ), defined using + jets events, is used to assess their contributions. As summarised in Table 2, events in CR +jets are selected by requiring exactly one photon with T > 150 GeV and no identified , , or ℎ from the data collected using a single-photon trigger. They must pass the same jet requirements as events in the SRs. At T , + jets events and + jets events are expected to have similar kinematics because they originate from comparable diagrams and the impact of the finite mass of the bosons becomes small at high T . Residual differences between + jets and + jets, driven mainly by lower-T events, are taken into account by extrapolation uncertainties (see Section 6).
The + jets events can be selected as signal events if they have four jets passing the -tagging requirement, either from genuine heavy-flavour jets or from misidentified light-flavour jets. Like the¯events, modelling the jet flavour composition is challenging. The simulated + jets events are categorised according to 'truth' flavour and jet matching: events with three or more -hadrons ( + ≥ 3 ), events with ≥ 1 -hadron but with ≤ 2 -hadrons ( + ≥ 1 ), and events with zero -hadrons and ≤ 2 -hadrons ( + ).
The contribution from multi-jet events, including hadronic¯events, 7 is negligible in the 0L and 2L SRs as well as in CR¯, minor in the 1L SRs, but significant in CR +jets . Both its rate and kinematics are difficult to simulate. Auxiliary data samples rich in multi-jet events are selected to model the corresponding contributions in the 1L SRs and CR +jets . For CR +jets , the auxiliary sample is defined by inverting the photon isolation requirement while keeping the rest of the selection the same. The kinematic distributions of multi-jet events in CR +jets are modelled by the selected events in this auxiliary sample after subtracting 6 Referring to¯events with at least one boson decaying leptonically. 7 Referring to¯events with both bosons decaying hadronically. approximately 10% non-multi-jet contributions taken from simulation, and an uncertainty in the shapes of these distributions is defined by varying the non-multi-jet contributions in the auxiliary sample by ±100%. These distributions are assigned a pre-fit normalisation equal to the difference between the number of data events and the non-multi-jet contributions in CR +jets , with an uncertainty of 100%. The multi-jet modelling in the 1L SRs follows the same approach, except that the auxiliary sample is selected by inverting the lepton isolation requirement and the multi-jet contribution in the 1L SRs is normalised through a fit to the distribution of the transverse mass of the lepton and miss T system, 8 in events selected with the 1L SR requirements but with the miss T criteria reversed to miss T < 30 GeV, which leads to negligible contamination of potential signals produced with cross-sections comparable to the search sensitivity.
Jet -tagging scores offer the most sensitive information for separating the different flavour contributions of the¯and + jets backgrounds. Thus the distributions of the sum of the pseudo-continuous -tagging scores of the four jets with the highest scores, pc -tag , which ranges from 4 to 16, are used to disentangle contributions from the three flavour components. The pc -tag distributions observed in CR¯and CR +jets , divided into three bins (4-9, 10-12, and 13-16), are included in the fits to determine the potential signal contributions as discussed in Section 7. In the fits, the flavour components of the¯and + jets contributions are normalised with their separate normalisation factors. The pc -tag template, i.e. the shape of the pc -tag distribution, of each component is taken from simulation except for the multi-jet contribution in CR +jets discussed above. Figure 4 compares the pc -tag distributions of the CR¯and CR +jets observed in the data with their post-fit background expectations. The¯and + jets contributions comprise 92.4% and 79.7% of the total event yields in their respective CRs.
Validation regions (VRs) are defined for each of the SRs and CRs (CR¯and CR +jets ) to study the background modelling uncertainties (see Section 6). Events in VRs are selected in the same way as those for SRs and CRs but with different jet requirements. Instead of four -tagged jets, events are required to have at least four jets, exactly three of which pass the 85% -tagging WP. The highest-T non--tagged jet in | | < 2.5 is taken as the fourth jet to build ℎℎ candidates and is assigned a pseudo-continuous -tagging score of 4, which is found to best reproduce the multivariate discriminants in the signal regions.

Multivariate discriminant
BDTs are used to exploit the kinematic differences between signal and background events passing the SR event selection. One BDT is constructed for each channel (0L, 1L, 2L) and each signal model ( ℎℎ, , → ), resulting in eight BDTs in total (three each for 0L and 2L, two for 1L). To minimise the complexity of the analysis, these BDTs are built in the same way where possible. They differ in having channel-dependent and signal-model-dependent variables. BDT input variables are chosen through extensive comparisons between their distributions in signal and background events while minimising correlations among the variables. Moreover, for the resonant and → searches, only variables with weak correlations with the resonance mass are considered so as to lessen the dependence of the BDTs on any particular hypothesised signal mass value. Requirements on the reconstructed resonance masses are applied separately (Section 5.4). Table 3 summarises the input variables used for the eight BDTs.
A common set of seven input variables are used for all eight BDTs. These variables are 8 The transverse mass of a system is defined as √︃ 2 + 2 T , where and T are the mass and transverse momentum of the system, respectively.
Other Uncertainty  • the sum ( ℎ 1 + ℎ 2 ) and the difference ( ℎ 1 − ℎ 2 ) of the masses of the Higgs boson candidates ℎ 1 and ℎ 2 , • the sum of the -tagging scores pc -tag , • the number of jets ( jets ), • the sum of the transverse energy of jets excluding the four selected -jets ( ex T ), • FSR ℎ 1 and FSR ℎ 2 , the masses of the two Higgs boson candidates calculated by including additional jets in a cone of size Δ = 0.8 around each -jet's axis. This calculation is intended to recover jet energy lost due to final-state radiation (FSR).
Input variables specific to channels or signal models are the invariant mass ℎℎ and transverse momentum ℎℎ T of the reconstructed Higgs boson pair, miss T , the transverse momentum of the vector boson ( T ), the transverse mass of the lepton-miss T system ( T ), and four variables exploiting differences between the angular distributions of signal and background events. Because of its strong correlation with the resonance mass, ℎℎ is not used in the BDTs for the resonant searches. Instead, requirements on ℎℎ are applied afterwards (see Section 5.4). The variable ℎℎ T is used in all BDTs except those for the → search, due to its correlation with . For the same reason, miss T ( T ) is not used in the 0L (2L) channel of the → search. The variables cosh(Δ ) 1 − cos(Δ ) 1 and cosh(Δ ) 2 − cos(Δ ) 2 9 of the ℎ 1 and ℎ 2 candidates exploit the angular differences between the signal ℎ → decay and the background gluon-splitting * → process and are motivated by 2 ℎ1 ∝ cosh(Δ ) 1 − cos(Δ ) 1 and 2 ℎ2 ∝ cosh(Δ ) 2 − cos(Δ ) 2 in the approximation that jets are massless. Similarly, the Higgs bosons and vector bosons are expected to be produced more centrally for signal events and more forwardly for background events. The two rapidity difference variables, between ℎ 1 and ℎ 2 (| ℎ 1 − ℎ 2 |) and between and ℎℎ (| − ℎℎ |), are designed to take advantage of these subtle differences. Figure 5 compares the ℎℎ , ℎ 1 − ℎ 2 , and ex T distributions in data with those expected from background processes for events passing the SR selections in the three leptonic channels. The background distributions are obtained from the fit to the background-only hypothesis discussed in Section 7. Also shown are the expected distributions from example signal models. Channel and signal model 0L The BDTs are trained in the TMVA [86] framework. All background processes modelled using simulation are included in the training. In order to make use of the complete set of simulated MC events for the BDT training and evaluation in an unbiased way, the MC events are split into two samples of equal size. A BDT is trained on each of the two samples and applied to the other, such that the same events are never used for both the BDT training and evaluation. Similarly, half of the events in data are evaluated with one of the two BDTs, and half are evaluated with the other. As a result, while significant overtraining is not observed, the effect of any potential overtraining is minimised by following this method. For each BDT, event weights are included in the training so that the relative importance of each background process is taken into account.       1b 1b  For the search for non-resonant ℎℎ production, the SM, -only, and 2 -only signal samples are added together, with equal weights, to form a combined signal sample for the BDT training. The inclusion of the and 2 samples in the training improves the sensitivity to their respective couplings. Three different BDTs are trained, one for each leptonic channel. Little degradation in sensitivity is observed with this strategy relative to training BDTs for each signal sample separately.
For the search for resonant ℎℎ signals, MC samples produced with different values of the resonance mass are combined with equal weights to form the signal sample for the BDT training in each channel. It results in a BDT that is not strongly correlated with the resonance mass and therefore has good sensitivity to signals with different mass values. This strategy is applied to both the and → searches; the combination for the latter includes samples with different and values as well as bosons with narrow and large widths.

Mass requirements for resonance searches
For the resonant and → searches, the Higgs boson pair ℎℎ is produced from the decay of a new heavy scalar . Consequently, the signal ℎℎ distributions are expected to peak around with a width equal to the natural width of the new scalar convolved with the detector resolution. Therefore, ℎℎ is a powerful discriminant against the expected continuum SM backgrounds.
Since the lighter Higgs boson ℎ is a narrow resonance with known mass, the ℎℎ resolution can be improved by constraining the measured masses of the two Higgs boson candidates, ℎ 1 and ℎ 2 , to their expected value of 125 GeV. This is achieved by scaling the momenta of the -jets from each Higgs boson candidate by the ratio of 125 GeV to the measured di--jet mass. Figure 6(a) compares the ℎℎ distributions before and after the rescaling for a few selected values in the 2L channel of the search. The rescaling improves the relative ℎℎ resolution from 12.1% (6.1%) to 3.5% (2.6%) at = 300 (1000) GeV. Similar improvements are obtained in the other leptonic channels of both the and → searches. Moreover, in the 2L channel of the → search the rescaling leads to improvement in the mass resolution of ℎℎ from the decay of a narrow resonance, as shown in Figure 6(b). At = 300 GeV, the relative ℎℎ mass resolution improves from 9.4% (5.4%) to 2.8% (2.5%) at = 400 (800) GeV. For LW bosons, the rescaling has negligible impact on the width of the ℎℎ mass distribution. In the following analysis, the rescaled -jet momenta are used solely to calculate the masses of resonance candidates.
As discussed in Section 5.3, the ℎℎ variable is not used to construct BDTs for the and → searches. Instead, ℎℎ is required to be in a window around the target value afterwards. The window sizes are optimised to improve the search sensitivities. They vary from 30 GeV at = 260 GeV to 220 GeV at = 1000 GeV, corresponding to approximately 3-8 times of the expected ℎℎ resolution after the rescaling. At high resonance masses the backgrounds are small, so the mass windows are widened (relative to the resolution) to increase signal efficiencies. The same -dependent ℎℎ windows are used for all channels and for both the and → searches.
The → model has an additional resonance, the boson. In the 2L channel, the → → ℓℓℎℎ decay can be fully reconstructed. For bosons with a width significantly smaller than the detector mass resolution, i.e. the NW case, the width of the ℎℎ mass distribution is dominated by the ℎℎ resolution as a result of the good lepton momentum resolution and the relatively narrow width of the boson. Therefore, the invariant mass of the ℎℎ candidate, ℎℎ , is required to fall in a window of the same size as the ℎℎ window for a given , but shifted from the value to the targeted value. For bosons with a width equal to 20% of the boson mass, i.e. the LW case, such requirements reduce the signal efficiencies substantially as the ℎℎ distributions are broadened by the boson width. In this case, ℎℎ > 475 GeV is required only for ≥ 500 GeV to reduce backgrounds which are present mostly at low ℎℎ values. In the 0L channel, the → → ℎℎ decay cannot be reconstructed fully, due to the escaping neutrinos. Instead, the transverse mass of the ℎℎ and miss

Systematic uncertainties
Systematic uncertainties are divided into four categories: experimental uncertainties, theoretical uncertainties of the overall background normalisations, theoretical uncertainties of acceptances and BDT shapes, and data-driven background modelling uncertainties.    scale of jets containing -quarks. The impact of the JES uncertainties is estimated by scaling the jet energies within their uncertainties. JER uncertainties are also determined from in situ measurements of -boson-jet , photon-jet, and dĳet T balance [87]. The effect of the JER uncertainties is calculated by increasing the resolution within its uncertainties, smearing the jet energy by the resulting change in resolution, and comparing the result with the nominal shape and normalisation in simulation.

Experimental systematic uncertainties
Subdominant experimental uncertainties originate from the -tagging correction factors. The -tagging correction factors, determined from the difference between the efficiencies measured in data and simulation, are evaluated in five DL1r discriminant bins and are derived separately for -jets, -jets, and light-flavour jets [78, 88, 89]. All of the correction factors for the three jet flavours have uncertainties estimated from multiple measurements, which are decomposed into uncorrelated components that are then treated independently. Extra uncertainties are applied when 'truth'-tagging is used (in the highest BDT-score bins for subleading backgrounds), and these are decorrelated between channels and defined as an overall 30% uncertainty in the yield for these backgrounds. The 'truth'-tagging uncertainties are designed to cover any differences in BDT shapes between background samples when 'truth'-tagging is or is not applied.
Uncertainties in the reconstruction, identification, isolation, and trigger efficiencies of electrons [90] and muons [91] are considered, along with the uncertainty in their energy scale and resolution. These are found to have only a small impact on the results. The uncertainties in the energy scale and resolution of the jets and leptons are propagated to the calculation of miss T , which also has additional uncertainties from the modelling of the underlying event and the momentum scale, momentum resolution, and reconstruction efficiency of the tracks used to compute the soft-term (see Section 4) [83,84]. An uncertainty is assigned to the miss T trigger correction factors in the 0L channel, defined as the entire difference between the trigger efficiencies for data and simulated events. The uncertainty in the combined 2015-2018 integrated luminosity is 1.7%, as described in Section 3. The average number of interactions per bunch crossing in the simulation is rescaled by 1.03 to improve agreement between the data and the simulation, based on a measurement of the visible cross-section in minimum-bias events [92], and an uncertainty, as large as the correction, is included.

Background normalisation systematic uncertainties
For all background processes, uncertainties are included in the overall normalisations following a similar approach taken for the ATLAS¯ℎ(→ ) measurement [93]. These uncertainties are correlated across the three leptonic channels. The¯+ ≥ 1 and + ≥ 3 flavour components are free to float in the fit. The¯+ ≥ 1 and + ≥ 1 flavour components are assigned a large uncertainty, 100%, intended to be conservative in case there is significant mismodelling of these processes. These uncertainties are always constrained by the fits. The light-flavour¯+ and + components are assigned a smaller uncertainty, 10%, as these are better-measured processes. In practice, these overall¯+ and + uncertainties have little impact on the analysis, since the yields from these backgrounds, particularly at high BDT score, are small. These uncertainties are all constrained by the pc -tag distributions in CR¯and CR +jets , as well as by the low BDT-score regions of the SRs.
The above + jets uncertainties are correlated between + jets, + jets, and + jets backgrounds. Extra uncertainties are defined for extrapolations between these backgrounds to account for their kinematic differences. Each flavour component of + jets is assigned an extra 20% uncertainty, and each component of + jets is assigned an extra 30% uncertainty. The sizes of the uncertainties are defined so as to cover flavour-composition differences between the + jets, + jets, and + jets backgrounds.
The remaining normalisation uncertainties are subdominant in the analysis. A 100% uncertainty is applied to the multi-jet backgrounds in both the 1L SRs and CR +jets . The 1L SR multi-jet uncertainty has a small impact due to the size of the background, and the CR +jets multi-jet background is constrained by the fits to have an uncertainty smaller than 100%. Uncertainties of 25% are applied to each of the remaining small SM backgrounds. Given the small relative yields from these backgrounds, their uncertainties have little impact on the analysis and are thus chosen to be conservative.

Data-driven background modelling uncertainties
A set of extra data-driven background uncertainties is included. The relative differences of the BDT distributions between data and simulation in the validation region define a set of VR non-closure uncertainties, which are determined prior to any fit. In each of the 0L, 1L (1L+ and 1L−), and 2L regions, two uncertainties are defined: one for normalisation and one for shape (based on the bin-by-bin differences), each of which is correlated between all backgrounds in the channel. In practice, each signal region tends to be dominated by a single background (either¯or + jets), such that correlating these uncertainties between backgrounds makes little difference compared to decorrelating the uncertainties between backgrounds. In CR¯, only the normalisation component of the non-closure is considered, since the individual normalisation factors for each¯flavour component can cover any shape difference between data and simulation. In CR +jets , no non-closure uncertainty is included, as the normalisation component is redundant because of the multi-jet normalisation uncertainty, which by design covers any non-closure between data and simulation. In total, this defines nine components of the validation region non-closure uncertainty, each of which is taken as 100% of the relative difference in normalisation or shape between data and simulation in the validation regions. These uncertainties are designed to be conservative, and in practice they tend to be constrained by the fits.
For the multi-jet backgrounds in CR +jets and the 1L SRs, shape uncertainties are included in addition to the normalisation uncertainties defined in Section 6.2. The shape uncertainty included in the CR +jets multi-jet template is obtained by increasing or decreasing the + jets contamination by 100% in this region. Two shape uncertainties are included in the multi-jet template in the 1L SRs: one by increasing or decreasing the prompt-lepton contamination by 30% when evaluating the template, to account for potential mismodelling of the lepton selection inefficiency, and the other by raising or lowering the T threshold used to determine this template, to account for the potential impact of the selection on the extracted BDT shapes.

Theoretical uncertainties in BDT shapes and acceptances
In addition to the normalisation uncertainties defined in Section 6.2, a number of theoretical uncertainties in background and signal modelling are included. These uncertainties tend to be subdominant relative to the background normalisation and data-driven uncertainties, which are in turn subdominant relative to the statistical uncertainty of the data. Each uncertainty is correlated across all analysis regions, and the effects of each uncertainty on the acceptance and BDT shape are correlated.  [95,96], and an extra matching uncertainty is defined by comparison with an alternative sample produced with M G 5_ MC@NLO. For each¯component, initial-state radiation (ISR) and final-state radiation (FSR) variations are included, following the procedure described in Ref. [97]. For the component of the single-top-quark background, a comparison between the nominal sample, which uses the diagram removal scheme [98], and an alternative sample, which uses the diagram subtraction scheme [99], defines an extra uncertainty. For + jets, variations of the CKKW parameter for merging/matching the matrix element with the parton shower are included, as are variations of the resummation scale.
Modelling of the¯background is improved by correcting the top-quark T distribution to that predicted by calculations of top-quark-pair differential distributions at NNLO QCD and NLO EW accuracy [100]. Previous studies have seen improved agreement between data and prediction in¯events, particularly for the top-quark T distribution, when comparing the data with the NNLO calculations [101]. For eachc omponent, the change in the BDT shape from the NNLO correction is taken as an uncertainty.
For the signal samples, acceptance uncertainties evaluated at the generator level are included to account for scale variations, PDF variations, and variations of the ISR, FSR, multi-parton interaction (MPI), and colour reconnection parameters [54]. For the non-resonant ℎℎ signal only, an extra uncertainty is included to cover potential mismodelling of → ℎℎ. An uncertainty is included in the ℎℎ normalisation and BDT shape, covering the difference between the → ℎℎ process alone and the sum of the → ℎℎ and → ℎℎ processes (the effects on the normalisation and shape are correlated). No other uncertainties are included in the overall cross-section for any signal sample.

Impact of systematic uncertainties
The effects of the statistical and systematic uncertainties on the search sensitivities are studied for hypothesised signals following the procedure discussed in Section 7. Table 4 lists the leading sources of uncertainty and shows, for a few selected signal models, the expected relative uncertainties in the fitted signal-strength parameter , a factor multiplying the predicted cross-section for the hypothesised signal. In the resonant case, the mass values are chosen to be those with the largest excesses observed in the data as discussed in Section 7. The cross-sections used for signal normalisation correspond approximately to their expected upper limits. In each case, the leading source of uncertainty is the statistical uncertainty of the data.

Results and interpretations
Potential signal contributions in the data are determined through maximum-likelihood fits to the BDT distributions in the SRs and the pc -tag distributions in CR¯and CR +jets . The procedure is based on the framework described in Refs. [102][103][104]. A profile-likelihood-ratio test statistic is used to test the signal-plus-background hypothesis, with the signal production rate as the parameter-of-interest. All SRs (0L, 1L, and 2L) are included in the fits for the non-resonant ℎℎ search and the resonant search, while only the 0L and 2L SRs are included for the resonant → search. The BDT distributions are divided into four bins, determined through optimisations of signal sensitivities while maintaining a reasonable number of background MC events in each bin. The binning boundaries are optimised for each channel separately, but are kept the same for different signal models for simplicity.
The¯and + jets backgrounds in the SRs from MC simulation are decomposed into three jet-flavour categories in the same way as those in the¯and + jets CRs discussed in Section 5.2. These flavourdependent contributions share the same normalisation factors (NFs) as their corresponding components in the CRs. In the fits, the NFs are unconstrained for the¯+ ≥ 1 and + ≥ 3 components and are constrained to unity within their estimated uncertainties for other components. Systematic uncertainties, described in Section 6, are incorporated as additional multiplicative terms, parameterised with nuisance parameters, in the likelihood calculations, where each nuisance parameter is given a prior distribution based on individual studies.
To test the overall compatibility of the data with the background expectations, fits are first performed for the background-only hypothesis. Since BDTs are model dependent, a fit to the BDT and pc -tag distributions for the non-resonant search is chosen to illustrate the background modelling. Figure 5 compares the data with the background expectations from the fit for a few selected kinematic variables used in training the BDTs. Overall, the post-fit backgrounds are found to reproduce the data well. The fitted NF values of the heavy-flavour components of the¯and + jets backgrounds are listed in Table 5, along with those from fits to the different signal-plus-background hypotheses discussed below.
The fits are repeated for the signal-plus-background hypotheses for each signal model and resonance mass assumption. The NF values shown in Table 5 from different fits are stable and consistent. Upper limits on signal production cross-sections are calculated with the CL s method [105], using the˜test statistic in the asymptotic approximation [106].
Global significances are calculated following the procedure given in Ref. [107]. Pseudo-experiments are generated for the background-only hypothesis. For each pseudo-experiment, a scan over all considered mass points is performed to determine the largest local significance for that pseudo-experiment. The fraction of pseudo-experiments with a local significance greater than the maximum local significance found in the scan over the real data defines the global -value, which in turn defines the global significance.

Search for non-resonant production
The BDTs for non-resonant production are used to search for ℎℎ signals from three production scenarios: 'SM', , and 2 . The 'SM' scenario assumes SM kinematics but with its cross-section scaled by a signal-strength parameter . The scenario tests for an anomalous tri-linear ℎℎℎ coupling, assuming SM couplings for the rest. Similarly, the 2 scenario tests for an anomalous quartic ℎℎ coupling. For the and 2 scenarios, both the event kinematics and production cross-section depend on their respective coupling modifier.
Constraints on non-resonant ℎℎ production are obtained through fits to the signal-plus-background hypotheses described above, assuming the SM value for the ℎ → decay branching ratio [10]. The three non-resonant scenarios have the same data BDT distribution, but differ in the signal BDT distributions. The background BDT distributions are largely the same, barring small variations in the post-fit background contributions. For 'SM' ℎℎ production, a 95% confidence-level (CL) upper limit of 183 on is observed compared with 87 +41 −24 expected. The corresponding post-fit BDT distribution is shown in Figure 8(a). For the and 2 scenarios, ℎℎ cross-section upper limits are derived from the fits for different values of the coupling modifiers. These limits lead to the observed (expected) 95% CL intervals of −34.4 < < 33.3 (−24.1 < < 22.9) and −8.6 < 2 < 10.0 (−5.7 < 2 < 7.1) for the two coupling modifiers. The observed bounds are weaker than the expectations largely because of small excesses of data in the highest BDT bins. These are the first limits derived from the ℎℎ process, and are considerably weaker than those obtained from the ℎℎ searches focused on the ggF and VBF processes [11][12][13][14][15][16]. In addition, this analysis can search for deviations of the ℎℎ and ℎℎ couplings from their SM values, parameterised by the respective coupling modifiers 2 and 2 (in the SM, 2 = 1, and 2 = 1). Cross-section upper limits are derived from the fits for different values of these coupling modifiers, leading to limits for the observed (expected) 95% CL intervals of −12.3 < 2 < 13.5 (−8.6 < 2 < 9.8) and −9.9 < 2 < 11.3 (−7.1 < 2 < 8.5) for the two coupling modifiers. Higgs boson couplings other than the one being tested are set to their SM values.

Searches for → production
Constraints on the production of a heavy narrow scalar resonance in association with a boson are determined through the fits of the BDT distributions for resonant production to the signal-plusbackground hypothesis in 5 GeV steps. The step size is chosen to be comparable to, or smaller than, the experimental ℎℎ mass resolution. For each tested value, the BDT distributions are obtained after imposing the ℎℎ mass window requirement discussed in Section 5.4. For values with no corresponding MC signal sample, the BDT distributions are linearly interpolated from those of the two closest neighbouring mass points with MC signal samples. To validate this interpolation, the results of fits performed at points with a MC signal sample are compared with the results obtained when the BDT distribution is an interpolation between those of neighbouring points with MC signal samples.
Fits are performed separately for the → ℓ ℎℎ and → (ℓℓ/ )ℎℎ searches. All three channels are included in the fits for both searches. The signal contributes mostly to the 1L channel, but with a sizeable contribution in the 0L channel due to an inefficiency in lepton identification. Its contribution to the 2L channel is negligible. Inclusion of the 2L channel in the fit effectively makes the channel an additional CR for the → ℓ ℎℎ search, further constraining the + jets background. Similarly, the signal contributes mostly to the 0L and 2L channels. Inclusion of the 1L channel helps to constrain the + jets background.  Figure 8(b). The heavy-flavour background NFs from the fits are compared with those from other fits in Table 5.
The observed and expected 95% CL upper limits on the cross-section ( ) × ( → ℎℎ → ) as a function of are shown in Figure 9. The resonant search is sensitive to the and couplings separately. Compared with the search in the VBF channel [15], which is sensitive to the combination of the two couplings, the search has better sensitivity for up to ∼ 450 GeV, assuming that SU(2) custodial symmetry (like that in the SM) applies to the and couplings [109].

Search for → production
The → search follows a strategy similar to that used in the search. The → BDT distributions in the 0L and 2L channels, after applying the mass requirements discussed in Section 5.4, are used to constrain → → → ℎℎ → production for each ( , ) hypothesis with 10 GeV steps in both masses, separately for NW and LW bosons. The BDT distributions for ( , ) hypotheses without MC samples are linearly interpolated from those of the four closest neighbouring mass points with MC samples. For a NW boson, the expected and observed upper limits in the ( , ) plane are shown in Figure 10(a) and Figure 10(b) respectively. The upper limits for a LW boson, with a width of 20%, are shown in Figure 10(c) (expected) and Figure 10(d) (observed). The ratios of the observed and the expected upper bounds in the ( , ) plane are shown in Figure 11 for the two width scenarios.
The most significant excesses are observed at ( , ) = (790, 300) GeV with a local (global) significance of 3.9 (2.1 ) in the NW scenario and at ( , ) = (420, 320) GeV with a local (global) significance of 3.8 (2.8 ) in the LW scenario. Figure 8(c) shows the post-fit BDT distributions in the LW scenario from the search at ( , ) = (420, 320) GeV. Due to the selection for the LW scenario, there is little sensitivity to either the boson mass or width, and a real LW signal would likely produce a higher-than-otherwise-expected cross-section upper limit over a wide range of probed boson mass values. Thus the broad excess in the boson mass distribution in the LW scenario is consistent with the expectation for a signal.
The 2HDM benchmark used for interpretations has four free parameters: , , tan , and cos( − ), where tan is the ratio of the vacuum expectation values of the two doublets and is the mixing angle of the CP-even Higgs bosons. The limiting case of cos( − ) → 0 corresponds to the 2HDM weak decoupling limit [110] in which the lightest CP-even Higgs boson has the same couplings as the SM Higgs boson at the lowest order. The → search results are interpreted as constraints in the plane defined by cos( − ) and for given and tan values. For the remaining 2HDM parameters, the mass of the charged Higgs boson is set to be equal to and the potential parameter 2 12 is set to 2 tan /(1 + tan 2 ). For regions relevant to the sensitivity of this search, the natural width of the boson remains narrow, especially where cos( − ) is close to zero. It is estimated that the cross-section upper limits should be valid as long as the natural width of the boson is less than 1% and, therefore, the search constrains only parts of the 2HDM parameter space that conform with this requirement. On the other hand, the natural width of the boson varies from narrow to about 20%. This means that cross-section limits have to be calculated for a range of boson natural widths in the ( , ) plane, which are subsequently interpreted in the 2HDM planes discussed previously. The signal hypothesis is tested for several values of the boson natural width, and cross-section upper limits for those widths are derived as a function of . Linear interpolation is used to derive the limit for any natural width between a pair of tested width values.
To interpret these results in the 2HDM, the upper limits on the cross-section are compared with the theoretical predictions of the model. In the type-I and lepton-specific 2HDMs, only gluon-gluon fusion production of the boson, → , is relevant, and it is calculated with corrections at up to NNLO in QCD as implemented in SusHi [111][112][113][114]. The widths and branching ratios of the Higgs bosons ( , , and ℎ) are calculated using the 2HDMC code [115]. The procedure used to calculate the cross-sections and branching ratios, as well as to choose the 2HDM parameter values, follows Ref. [10]. The upper limits are shown for some representative values in Figure 12 for the type-I 2HDM and in Figure 13 for the lepton-specific 2HDM. The ℎℎ coupling vanishes at cos( − ) = 0, a feature which is reflected by the inability of this analysis to exclude this region of the (cos( − ), ) plane. For tan = 1 the sensitivity is the same for both 2HDM scenarios and therefore the corresponding results are omitted from Figure 13. The sensitivity of this search is complementary to that of the → → ℓℓ /ℓℓ search [34] and to the constraints from the Higgs boson coupling measurements [4].      [108]. The LW boson has a total decay width equal to 20% of its mass.   (d). The boson has a total decay width that is negligible compared to the experimental mass resolution in the NW case and is 20% of its mass in the LW case.

Summary
Searches for Higgs boson pair production in association with a vector boson in collisions at √ = 13 TeV are performed using a data sample corresponding to an integrated luminosity of 139 fb −1 , recorded by the ATLAS experiment between 2015 and 2018 at the LHC. The Higgs bosons are identified via their decays into a pair of -quarks and the vector bosons are required to decay into leptons, leading to final states with zero, one or two charged leptons along with four -jets. The searches target both SM-inspired non-resonant ℎℎ production and BSM-motivated resonant ℎℎ production. The non-resonant ℎℎ search is carried out for scenarios with either SM kinematics but an enhanced production cross-section or modified Higgs boson couplings to vector bosons or itself. The resonant ℎℎ searches are designed for the production of a vector boson along with a heavy neutral scalar Higgs boson decaying into ℎℎ, either directly or indirectly from the decay of another heavier neutral pseudoscalar Higgs boson .
In general, the data are found to be in good agreement with the estimated background contributions, except for a few notable excesses. The most significant global excess is observed in the → → → ℎℎ search for a large-width boson at ( , ) = (420, 320) GeV, where the local (global) significance is 3.8 (2.8) standard deviations. More data are needed to ascertain the nature of this excess. Upper bounds on the ℎℎ production cross-sections are derived. For non-resonant production with SM kinematics, a 95% CL upper limit of 183 (87) is observed (expected) for the ℎℎ cross-section relative to its SM prediction. For resonant production, the observed (expected) upper limits are presented as a function of in the range 260-1000 GeV for and separately, and in the ( , ) plane for → , covering the range 360-800 GeV and range 260-400 GeV. The constraints on → production are also interpreted in the (cos( − ), ) parameter space of type-I and lepton-specific two-Higgs-doublet models.
Catalunya and PROMETEO and GenT Programmes Generalitat Valenciana, Spain; Göran Gustafssons Stiftelse, Sweden; The Royal Society and Leverhulme Trust, United Kingdom.