Search for resonant and non-resonant Higgs boson pair production in the $b\bar b\tau^+\tau^-$ decay channel using 13 TeV $pp$ collision data from the ATLAS detector

A search for Higgs boson pair production in events with two $b$-jets and two $\tau$-leptons is presented, using a proton-proton collision dataset with an integrated luminosity of 139 fb$^{-1}$ collected at $\sqrt{s}=13$ TeV by the ATLAS experiment at the LHC. Higgs boson pairs produced non-resonantly or in the decay of a narrow scalar resonance in the mass range from 251 to 1600 GeV are targeted. Events in which at least one $\tau$-lepton decays hadronically are considered, and multivariate discriminants are used to reject the backgrounds. No significant excess of events above the expected background is observed in the non-resonant search. The largest excess in the resonant search is observed at a resonance mass of 1 TeV, with a local (global) significance of $3.1\sigma$ ($2.0\sigma$). Observed (expected) 95% confidence-level upper limits are set on the non-resonant Higgs boson pair-production cross-section at 4.7 (3.9) times the Standard Model prediction, assuming Standard Model kinematics, and on the resonant Higgs boson pair-production cross-section at between 21 and 900 fb (12 and 840 fb), depending on the mass of the narrow scalar resonance.


Introduction
The discovery of a Higgs boson ( ) with a mass of about 125 GeV [1, 2] has led to a comprehensive programme of measurements and searches by the ATLAS [3] and CMS [4] collaborations at the Large Hadron Collider (LHC) [5] using proton-proton ( ) collision data. To date, all of the measured properties of the Higgs boson are found to be consistent with their Standard Model (SM) predictions [6][7][8][9][10][11][12][13], and no unexpected particles or Higgs boson decay modes have been observed. The SM predicts non-resonant production, with approximately 90% of the total cross-section being due to the gluon-gluon fusion (ggF) process. The leading-order (LO) contributions to ggF production are the 'triangle diagram', which includes a Higgs boson self-coupling vertex, and the heavy-quark 'box diagram', which has two fermion-fermion-Higgs vertices, as shown in Figure 1(a) and Figure 1(b), respectively. These diagrams interfere destructively, leading to a small SM ggF non-resonant cross-section, which is predicted to be 31.1 +2.1 −7.2 fb at next-to-next-to-leading (NNLO) order in s , including an approximation of finite top-quark-mass effects, for = 125 GeV and √ = 13 TeV [14][15][16][17][18][19][20][21]. The vector-boson fusion (VBF) process provides a sub-leading source of production in the SM, and has a cross-section of 1.73 ± 0.04 fb at N 3 LO accuracy in QCD, for = 125 GeV and √ = 13 TeV [22][23][24][25][26]. Due to these small cross-sections, an observation of SM non-resonant production is not expected with the currently available LHC dataset, although significant non-resonant and resonant enhancements to the cross-section are predicted in many beyond-the-SM (BSM) theories. Due to the diagram shown in Figure 1(a) and its interference with the diagram shown in Figure 1(b), non-resonant production is a sensitive probe of the Higgs boson trilinear self-coupling and the shape of the Higgs field potential, which have important implications for the stability of the electroweak vacuum [27,28] and for baryogenesis [29] and inflation [30,31]. Modifications to the non-resonant cross-section occur in BSM scenarios with new, light, coloured scalars [32], composite Higgs models [33], theoretical scenarios with couplings between pairs of top quarks and pairs of Higgs bosons [34], as well as models with a modified coupling of the Higgs boson to the top quark.
This paper describes a search for non-resonant and resonant production in the final state with two -leptons and two jets containing -hadrons ( -jets). The sizeable fraction of all possible SM decays that result in this final state, B ( →¯+ − ) = 7.3% [71,72], and relatively low backgrounds make this one of the most sensitive search signatures. In the search for non-resonant production, the signal kinematics are assumed to follow the SM prediction, and the search is optimised for maximum sensitivity to the cross-section rather than the Higgs boson self-coupling. Additionally, only the ggF and VBF non-resonant production modes are considered, because other production modes are not expected to contribute significant additional sensitivity in this search. A narrow CP-even scalar particle ( ) with a mass between 251 and 1600 GeV is used as the benchmark model for the resonant signal. Decay modes in which both -leptons decay hadronically ( had ), or in which one decays hadronically and the other leptonically ( lep ), are considered; these are referred to as had had and lep had , respectively. The presence of had are determined by detector signatures compatible with the expected visible decay products ( had-vis ). Events are categorised according to the type of trigger that accepted the event, and are required to contain two oppositely charged had-vis and two -tagged jets in the had had final state, or an electron or muon and an oppositely charged had-vis and two -tagged jets in the lep had final state. Signal events also contain neutrinos from the decay of -leptons and -hadrons, which manifest themselves as missing momentum transverse to the beamline (p miss T ). Backgrounds in this search include the production of top-quark pairs (¯), single top quarks, and bosons in association with jets, dibosons ( , , ), single Higgs bosons, and multi-jet events. In some background events, quark-or gluon-initiated jets are misidentified as had-vis . The selected events are tested for the presence of the signal by performing profile-likelihood fits to multivariate discriminant distributions. The search is performed using a dataset obtained from collisions delivered by the LHC during Run 2, between 2015 and 2018, at a centre-of-mass energy of √ = 13 TeV. The data were collected by the ATLAS detector [3] and correspond to an integrated luminosity of 139 fb −1 . Compared to the previous ATLAS search [35], in addition to the greater integrated luminosity, this search profits from improved had-vis and -jet reconstruction and identification algorithms, more sophisticated multivariate techniques used to target the resonant signal hypotheses, and new background estimation techniques. In particular, the combined reconstruction and identification efficiencies for had-vis and -jets increased by around 25%-38% and 10%, respectively. The rest of this paper is organised as follows. A brief description of the ATLAS detector is given in Section 2, and the data and simulation samples used are outlined in Section 3. Overviews of the reconstruction of physics objects and the event selection are presented in Sections 4 and 5, respectively. The background modelling strategy is described in Section 6. The systematic uncertainties relevant to this search and the statistical interpretation are described in Sections 7 and 8, respectively. The results of the search are given in Section 9. Finally, Section 10 presents the conclusions.

ATLAS detector
ATLAS [3] is a general-purpose particle detector covering nearly the entire solid angle 1 around the collision point. It is composed of an inner tracking detector system surrounded by a thin superconducting solenoid, electromagnetic and hadronic calorimeters, and a muon spectrometer incorporating three large superconducting toroidal magnets.
The inner detector, located within a 2 T axial magnetic field generated by the superconducting solenoid, is used to measure the trajectories and momenta of charged particles. The inner layers, consisting of high-granularity silicon pixel detectors, instrument a pseudorapidity range | | < 2.5. An innermost silicon pixel layer, the insertable B-layer [73,74], was added to the detector between Run 1 and Run 2. The insertable B-layer improves the ability to identify displaced vertices, which significantly improves the -jet tagging performance [75]. Silicon strip detectors, which cover the range | | < 2.5, surround the pixel detectors. Outside the strip detectors and covering | | < 2.0, there are straw-tube tracking detectors, which also provide measurements of transition radiation that are used in electron identification.
The outermost part of the detector is the muon spectrometer, which measures the curved trajectories of muons in the field of three large air-core toroidal magnets. High-precision tracking is performed within the range | | < 2.7, and there are chambers for fast triggering within the range | | < 2.4.
The ATLAS detector has a two-level trigger system [76] to select events of interest. The first-level (L1) trigger is implemented in custom electronics and, using a subset of the information from the detector, it accepts events from the 40 MHz LHC proton bunch crossings at a rate of about 100 kHz. This is followed by a software-based high-level trigger (HLT) that reduces the accepted event rate to approximately 1 kHz.
An extensive software suite [77] is used in the reconstruction and analysis of real and simulated data, in detector operations, and in the trigger and data acquisition systems of the experiment. 1 ATLAS uses a right-handed coordinate system with its origin at the nominal interaction point (IP) in the centre of the detector and the -axis along the beam pipe. The -axis points from the IP to the centre of the LHC ring, and the -axis points upwards. Cylindrical coordinates ( , ) are used in the transverse plane, being the azimuthal angle around the -axis. The pseudorapidity is defined in terms of the polar angle as = − ln tan( /2). The distance in ( , ) coordinates, Δ = √︁ (Δ ) 2 + (Δ ) 2 , is also used to define cone sizes. Transverse momentum and energy are defined as T = sin and T = sin , respectively.

Data samples
The data used in this search were collected at a centre-of-mass energy of 13 TeV between 2015 and 2018, using triggers to select events with at least one lepton (where a lepton is defined as an electron or a muon), at least one had-vis , at least one lepton and one had-vis , or at least two had-vis . Details about these triggers are discussed in Section 5.1. Events are selected for analysis only if they are of good quality and if all the relevant detector components are known to be in good operating condition [78]. The total integrated luminosity of the data, after meeting the good-quality criteria, is 139.0 ± 2.4 fb −1 [79,80]. The recorded events contain an average of 34 simultaneous inelastic collisions per bunch-crossing (pile-up).

Simulated event samples
Monte Carlo (MC) simulated events are used to model SM background production, SM non-resonant signal production, and BSM resonant signal production. The events were passed through the full ATLAS detector simulation [81] based on G 4 [82], with the exception of the BSM resonant signal events, which were passed through a fast simulation in which the response of the calorimeters is parameterised [83, 84] rather than fully simulated. The effects of pile-up in the same and neighbouring bunch crossings were modelled by overlaying each hard-scatter event with minimum-bias events, simulated using the soft quantum chromodynamics (QCD) processes of P 8.186 [85] with a set of tuned parameters called the A3 tune [86] and the NNPDF2.3 [87] parton distribution functions (PDFs). The E G program [88] was used to model the decays of bottom and charm hadrons in all samples of simulated events, except those generated using S [89]. The samples generated with S used the bottom-and charm-hadron decay model implemented within the generator. The simulated events were processed through the same reconstruction algorithms as the data. For all samples containing a SM Higgs boson, its mass was fixed to 125 GeV. The same mass value is used in the calculation of the Higgs boson decay branching fractions and in the calculation of the single-Higgs-boson and SM non-resonant production cross-sections. Unless otherwise specified, the order of the cross-section calculation refers to the expansion in the strong coupling constant ( s ). A summary of the event samples used for the simulation of the signal and background processes is shown in Table 1.
Simulated SM non-resonant signal production includes the contributions from the ggF and VBF processes. The simulated ggF events were generated with the P B v2 generator [90] at next-toleading order (NLO) with finite top-quark mass, and using the PDF4LHC15_ _30_ (code 90400 in the LHAPDF database [91]) PDF set [92]. Parton showers and hadronisation were simulated using P 8.244 [85] with the A14 tune [93,94] and the NNPDF2.3 PDF set. The cross-section for ggF nonresonant production is calculated at next-to-next-to-leading order (NNLO) using FTApprox [20]. The VBF non-resonant signal events were generated at LO using the M G 5_ MC@NLO 2.7.3 [95] generator with the NNPDF3.0 [96] PDF set. Parton showering and hadronisation were performed using P 8.244 with the A14 tune and the NNPDF2.3 PDF set. The cross-section for VBF non-resonant production is calculated at next-to-next-to-next-to-leading order (N 3 LO) in QCD in the limit of no partonic exchange between the two protons [26]. The calculated cross-section values for ggF and VBF non-resonant production at √ = 13 TeV and = 125 GeV are given in Section 1. Other non-resonant production modes are not considered because their contributions to the analysis sensitivity are expected to be negligible. The BSM resonant signal from the ggF production of a heavy spin-0 resonance and its decay into a pair of SM Higgs bosons, → , was simulated with the M G 5_ MC@NLO 2.6.1 generator using the NNPDF2.3 PDF set at LO accuracy in QCD. The simulated events were interfaced to H 7.1.3 [97,98] to model the parton shower, hadronisation and underlying event, using the H7.1-Default tune [99] and the NNPDF2.3 PDF set. The resonant signal was simulated for 19 values of the resonance mass, , between 251 GeV and 1.6 TeV. The width of the heavy scalar was fixed to 10 MeV.
The production of¯events, and of single-top-quark events in the -, -and -channels, was simulated using the P B v2 generator together with the NNPDF3.0 PDF set. The simulated events were interfaced to P 8.230 for parton showering and hadronisation using the A14 tune together with the NNPDF2.3 PDF set. The top-quark spin correlations were preserved for all these simulated top-quark processes. The top-quark mass was set to 172.5 GeV. The¯production cross-section is calculated at next-to-next-to-leading-order and next-to-next-to-leading-logarithm (NNLO+NNLL) accuracy [100]. The cross-sections for the three single-top-quark production channels are calculated at NLO [101][102][103]. Thē -interference was handled using the diagram removal scheme.
Events containing a or boson produced in association with jets, diboson ( , and ) production processes, and the¯production process were simulated with the S 2.2.1 generator [89], whereas the¯production process was simulated with the S 2.2.8 generator. These samples used the NNPDF3.0 [96] PDF set with dedicated parton shower tuning developed by the S authors. For the simulation of / +jets events, the matrix elements were calculated for up to two partons at NLO and up to four partons at LO using the O L [104] and C [105] matrix-element generators. The expected number of / +jets events is normalised to the NNLO cross-sections [106]. Diboson production was simulated for up to one additional parton at NLO and up to three additional partons at LO using the O L and C programs. The NLO cross-sections from S are used to normalise the diboson and the¯/ events. SM single Higgs boson production is considered as part of the background in this search, and its production modes were simulated using the P B v2 generator and the NNPDF3.0 PDF set. Single Higgs boson production via ggF was simulated at NNLO accuracy in QCD using the P NNLOPS program [107,108], whereas VBF single Higgs boson production was simulated at NLO accuracy in QCD [109]. Events from both of these production modes were interfaced to P 8.212 for parton showering and hadronisation using the AZNLO tune [110] together with the CTEQ6L1 PDF set [111]. The cross-section for ggF production of single Higgs bosons is based on a computation with N 3 LO accuracy in QCD, and NLO accuracy in the electroweak (EW) expansion [71,[112][113][114][115], whereas the cross-section for VBF production of single Higgs bosons is taken from the NNLO(QCD)+NLO(EW) calculation [71,[116][117][118]. The → , → and → simulated events were interfaced to P 8.212 for parton showering and hadronisation using the AZNLO tune together with the CTEQ6L1 PDF set. The cross-sections are taken from the NNLO(QCD)+NLO(EW) calculations for → and → [119][120][121][122][123][124][125], and from calculations at next-to-leading-order and next-to-leading-logarithm (NLO+NLL) accuracy in QCD for → [126][127][128][129][130]. For Higgs boson production in association with a pair of top quarks (¯), the simulated events were interfaced to P 8.230 for parton showering and hadronisation using the A14 tune and the NNPDF2.3 PDF set. The cross-section for¯production is taken from NLO calculations [71]. The contributions from other single Higgs production modes are expected to be negligible and thus not considered. SM single Higgs boson production plays a more important role as a background in the non-resonant search than in the resonant search, due to more similar production kinematics.

Object reconstruction
Electrons, muons, had-vis , jets from the hadronisation of quarks and gluons, including -tagged jets, and p miss T are used in this search. An antihad-vis object, defined as a had-vis with modified identification requirements, is also used to estimate the backgrounds from hadronic jets misidentified as had-vis . Tracks are used in the reconstruction, identification, isolation and vertex compatibility requirements and calibration of many of the physics objects described below, and in vertex reconstruction; they are reconstructed from hits in the inner tracking detectors, and are required to have T > 500 MeV [131,132]. Events are required to have at least one collision vertex reconstructed from two or more associated tracks. If multiple vertices are found, the one with the largest 2 T of the associated tracks is selected as the primary vertex. Finally, an overlap removal procedure is applied to ensure that no detector signature is identified as multiple reconstructed objects. Electron candidates are reconstructed by matching tracks reconstructed in the inner detector to topological energy clusters in the electromagnetic calorimeter, with a reconstruction efficiency of around 98% [133]. Electron candidates are required to have T > 7 GeV and | | < 2.47, and to be outside the transition region between the calorimeter's barrel and endcaps, 1.37 < | | < 1.52. They must pass track-quality requirements, followed by a loose likelihood-based selection that requires the shower profile to be compatible with that of an electromagnetic shower. These requirements have an efficiency of around 93%. Isolation requirements are applied; these are based on the presence of tracks in a cone of T -dependent size Δ around the electron and of calorimetric energy deposits in a fixed-size cone. Lastly, the electron energy scale is calibrated in data, and the energy resolution is calibrated in simulation, using → events [133].
Muon candidates are reconstructed from tracks in the muon spectrometer, matched to tracks in the inner detector where available [134]. In the absence of full tracks in the muon spectrometer, muons with | | < 0.1 can be reconstructed from track segments in the muon spectrometer, or energy deposits compatible with that of a minimum-ionising particle in the calorimeters. If an inner-detector track is present, it must match the direction and momentum of the muon spectrometer track for it to be included. The muon momentum is defined by using information from both the muon spectrometer and the inner detector where available.
Only muon candidates with T > 7 GeV and | | < 2.7, and passing loose quality requirements based on the number of hits used to reconstruct the tracks, with an efficiency of around 99%, are considered for further analysis. Lastly, isolation requirements with an efficiency of around 95% that are based on the presence of particle-flow objects [135] in a cone of T -dependent size Δ around the muon are applied, except for muons used in the -tagged-jet energy correction described below.
Jets are reconstructed using the anti-jet clustering algorithm [136, 137] with a radius parameter of 0.4, applied to noise-suppressed positive-energy topological energy clusters [138, 139] and charged-particle tracks, processed using a particle-flow algorithm [135]. Jet energies are corrected for contributions from pile-up, calibrated using energy-and -dependent correction factors determined from comparisons between particle-level objects and reconstructed physics objects in simulated events, and then corrections are applied to account for effects due to the initiating-parton type and hadron composition [140]. In data, a residual in situ correction is applied in order to correct for differences relative to simulation. Jets are required to have T > 20 GeV and | | < 2.5, or T > 30 GeV and 2.5 < | | < 4.5. To reject jets from pile-up, jets with T < 60 GeV and | | < 2.5 (20 < T < 120 GeV and 2.5 < | | < 4.5) are required to pass a 'jet vertex tagger' [141] ('forward jet vertex tagger' [142]) requirement to determine if they originate from the primary vertex [143]. Lastly, quality criteria [144] are applied to jets reconstructed from topological clusters [138,139] in the calorimeter in order to identify and remove events containing jets from non-collision backgrounds and calorimeter noise.
A multivariate classification algorithm based on a deep neural network, the DL1r tagger [145,146], is used to distinguish -jets from the background of light-flavour-and charm-quark-initiated jets using information about the jet kinematics, the impact parameters of tracks associated with the jet, and the presence of displaced vertices. The inputs to the DL1r network include variables based on a recurrent neural network (RNNIP) [147], which can exploit the spatial and kinematic correlations between tracks that are initiated from the same -hadrons. This analysis uses a -jet working point that targets an efficiency of 77%. A simulation-based T -dependent correction is applied to the -tagged jet momentum [148]. Finally, for -tagged jets containing a muon, the difference between the reconstructed muon momentum and the momentum corresponding to the energy of its associated cluster in the calorimeter is added to the -tagged jet momentum [148]. This corrects for the fact that the muon typically deposits only a small fraction of its energy in the calorimeters. These two corrections improve the resolution of the invariant mass of the pair of -tagged jets by approximately 16% for the non-resonant signal.
The reconstructed had-vis candidates [149] are seeded by jets. The had-vis energy is calibrated using multivariate methods with information from tracks and calorimeter energy clusters [150]. The had-vis are required to have T > 20 GeV and | | < 2.5, excluding 1.37 < | | < 1.52. Boosted decision trees are used to determine if tracks near a cluster originate from a had , and one or three tracks with a total charge of ±1 are required. The truehad-vis are discriminated from quark-and gluon-initiated-jet backgrounds by using recurrent neural networks trained to target signatures with either one or three associated tracks [151], and a loose requirement with an efficiency of around 85% (75%) for one-track (three-track) had-vis candidates is applied. A separate boosted decision tree is then used to reject had-vis candidates with one associated track that originate from electrons, with an efficiency of about 95% for had [149].
Antihad-vis objects are defined in order to estimate the background from jets misidentified as had-vis , as described in Section 6. They are only used for the estimation of these backgrounds, and only one antihad-vis per event is considered. Antihad-vis objects are reconstructed, and their energy is calibrated, in the same way as for had-vis candidates, and they must satisfy the nominal had-vis kinematic and track selection criteria. They are required to pass a looser recurrent neural-network requirement, corresponding to an efficiency of approximately 99% for truehad-vis , but to fail the nominal recurrent neural-network requirement applied to the had-vis candidates.
Neutrinos are produced in the decay of -leptons and in the semileptonic decay of -hadrons, and contribute to the total p miss T in the event. The p miss T is defined as the negative vector sum of the transverse momenta of all reconstructed and calibrated leptons, jets, and had-vis , and of all tracks matched to the primary vertex but not to other reconstructed objects in the event [152]. The magnitude of p miss T is denoted by miss T . A sequential overlap removal procedure is applied to resolve ambiguities in which multiple electron, muon or jet candidates would otherwise be reconstructed from the same detector signature. This procedure uses a definition of angular distance, Δ = √︁ (Δ ) 2 + (Δ ) 2 , that is based on the rapidity of the objects. The following steps are applied in order: 1. if any electrons share a track, all but the highest-T electron are removed; 2. any had-vis within Δ = 0.2 of an electron or a muon is removed, except if the had-vis has T > 50 GeV, in which case the muon must also be associated with a track in the muon spectrometer for the had-vis to be removed; 3. if an electron and a muon share a track, the electron is removed if the muon is associated with a signature in the muon spectrometer, otherwise the muon is removed; 4. any jet within Δ = 0.2 of an electron is removed; 5. any electron within Δ = 0.4 of a jet is removed; 6. any jet within Δ = 0.2 of a muon, or which would have the inner-detector track of a muon matched to it using ghost association [153], is removed if it has fewer than three associated tracks; 7. any muon within Δ = 0.4 of a jet is removed; 8. any jet within Δ = 0.2 of a had-vis is removed; 9. if an antihad-vis is within Δ = 0.2 of a jet, it is removed if the jet is -tagged, otherwise the jet is removed.

Event selections
Events are selected in three separate signal categories. The had had category targets events with two oppositely charged had-vis and two -jets, whereas two lep had categories target events with an electron or muon, an oppositely charged had-vis and two -jets. A control region ( + HF CR) is also defined in order to determine the normalisation of the background in which a boson is produced in association with one or more jets initiated by heavy-flavour quarks ( + HF background). Events in the signal regions (SRs) are analysed for the presence of signal using multivariate discriminants, and discrimination between the + HF and¯backgrounds is achieved using the dilepton invariant mass ( ℓℓ ) in the + HF CR, as described in Section 5.3.

Signal regions
Events in the had had category were recorded using a combination of singlehad-vis triggers (STTs) and dihad-vis triggers (DTTs). The STTs accept events with at least a single had-vis at the HLT with a minimum T between 80 GeV and 160 GeV, depending on the data-taking period. The DTTs select events with at least a pair of had-vis reconstructed at the HLT, with minimum T of 35 GeV (25 GeV) for the (sub-)leading had-vis , where the (sub-)leading had-vis is defined as the had-vis with the (second-)highest T in the event. From the 2016 data-taking period onward, additional requirements were applied in the L1 trigger to reduce the DTT trigger rates. During 2016 data-taking, an additional jet with T > 25 GeV was required. For 2017 (2018) data-taking, if two offline 2 jets with T > 45 GeV are found, then a trigger is used that requires two additional jets with T > 12 GeV (and | | < 2.3) at L1, otherwise another trigger is used that requires one additional jet with T > 25 GeV and the had-vis to be reconstructed within Δ = 2.8 of each other 3 . In order to select events near the trigger efficiency plateaus where the efficiencies are well modelled, the offline had-vis are required to be within Δ = 0.2 of the corresponding HLT had-vis objects and a minimum offline T requirement is applied to had-vis and to the jets. The offline T thresholds for the had-vis range between 100 GeV and 180 GeV for the STTs, and are 40 GeV (30 GeV) for the (sub-)leading had-vis for the DTTs. Additional offline requirements for the DTTs are either that two additional jets with T > 45 GeV are present in the event, or that a jet with T > 80 GeV is present in the event and the had-vis are reconstructed within Δ = 2.5 of each other. For events that pass both the STTs and DTTs, the offline requirements used for the STTs are applied. Events passing the had had event selection are analysed together.
Events in the lep had categories were recorded using a combination of single-lepton triggers (SLTs) and lepton-plushad-vis triggers (LTTs). The SLTs require an electron or muon to be reconstructed at the HLT with an T threshold that ranges from 24 GeV to 26 GeV for electrons and a T threshold that ranges from 20 GeV to 26 GeV for muons, depending on the data-taking period. The LTTs require that an electron with T > 17 GeV or a muon with T > 14 GeV, and a had-vis with T > 25 GeV, are reconstructed at the HLT. The LTTs used to collect lepton-plushad-vis events with had-vis T < 35 GeV had additional requirements at the L1 trigger, requiring the presence of either an additional jet with T > 25 GeV or two additional jets with T > 12 GeV. The electron-plushad-vis triggers that require an additional jet with T > 25 GeV was only used to select an event if the electron-plushad-vis triggers that require two additional jets with T > 12 GeV did not select the event. For any trigger to select an event, based on the presence of a muon, the muon must have | | < 2.5. In order to select events near the trigger efficiency plateaus where the trigger efficiencies are well modelled, the offline electrons, muons and had-vis objects are required to be within Δ = 0.07, Δ = 0.1 and Δ = 0.2 of the corresponding objects at the HLT, respectively. Minimum T requirements are applied to the offline objects, and these are 1 GeV above the thresholds for electrons and muons at the HLT, 5 GeV above the thresholds for had-vis at the HLT, and 80 GeV (45 GeV) for jets with an L1-trigger T threshold of 25 GeV (12 GeV). Events which pass the offline SLT lepton T requirements are not considered for the LTT. This ensures that there is no overlap between the SLT and LTT categories. These two categories are analysed separately.
In the had had category, in addition to the trigger selection described above, events are required to contain exactly two had-vis with opposite charges and exactly two -tagged jets. Events with any additional leptons are vetoed, and the (sub-)leading -tagged jet is required to have T > 45 (20) GeV. The invariant mass of the -lepton pair ( MMC ) is estimated from the four-momenta of the had-vis and the p miss T using the Missing Mass Calculator (MMC) [154], which assumes that the p miss T is exclusively from the neutrinos produced in the -lepton decays. To reject background from low-mass Drell-Yan events, MMC is required to be above 60 GeV.
In the lep had categories, exactly one 'loose' electron or 'loose' muon, an oppositely charged had-vis , and exactly two -tagged jets are required. The selected electron (muon) must also pass a tight (medium) identification requirement with an efficiency of around 80% (97%). Events are required to have MMC > 60 GeV, where MMC is calculated using the four-momenta of the electron or muon, the had-vis and the p miss T . The -tagged jet pair invariant mass ( ) is required to be less than 150 GeV to rejectb ackground events, and to allow for the definition of a¯-enriched region which is used in the estimation of backgrounds, as described in Section 6. A had-vis with T > 20 GeV and | | < 2.3 is required in the SLT category, and a had-vis with T > 30 GeV, or higher if required by the trigger, and | | < 2.3 is required in the LTT category. In both categories, the (sub-)leading -tagged jet must have T > 45 (20) GeV, in addition to any trigger-dependent requirements.
The full event selections are summarised in Table 2. The acceptance times efficiency for the non-resonant ggF+VBF →¯+ − signal, evaluated with respect to the targeted -lepton pair decay mode, is 4.0%, 4.0% and 1.0%, in the had had category, and the lep had SLT and lep had LTT categories, respectively. For the resonant signal, the acceptance times efficiency is shown in Figure 2 as a function of the resonance mass. The decrease in acceptance times efficiency for greater than about 1000 GeV is due to the Lorentz boost of the Higgs bosons causing their decay products to become highly collimated more often.  Figure 2: Acceptance times efficiency for the full analysis selections as a function of the resonance mass in the had had , lep had SLT and lep had LTT trigger categories, shown in solid square markers, empty circle markers and empty triangle markers, respectively. The solid circle markers are the acceptance times efficiency values for the combined lep had category. The acceptance times efficiency for → →¯+ − decays is evaluated with respect to the targeted -lepton pair decay mode ( lep had or had had ). Table 2: Summary of the event selections, shown separately for events that are selected by different triggers. In cases where pairs of reconstructed objects of the same type are required, thresholds for the (sub-)leading T object are given outside (within) parentheses where different event selection thresholds are applied. When the selection depends on the year of data-taking, the possible values of the requirements are separated by commas, except for the jet selection in the LTT and DTT triggers, which use multiple selection criteria as described in Section 5.

Multivariate signal extraction
Multivariate discriminants (MVAs) evaluated for events passing the above selections are used to extract possible signals. Parameterised neural networks (PNNs) [155] are used in the search for resonant production, and a boosted decision tree (BDT) and neural networks (NNs) are used in the had had category and lep had categories of the non-resonant search, respectively. The (P)NNs are trained using Keras [156] with the Tensorflow [157] backend, and the BDT is trained using TMVA [158]. During training, the sum of all backgrounds normalised to their respective cross-sections is used. The backgrounds containing one or more had-vis from a misidentified quark-or gluon-initiated jet are modelled using simulation, except for the multi-jet background in the had had category, where the data-driven estimate is used. The PNNs are parameterised in the mass of the heavy resonance, providing near-optimal sensitivity and continuity over the range of signal masses considered.
The same choice of MVA input variables is used for the resonant and non-resonant production modes, although different input variables are used in the different analysis categories. These variables are listed in Table 3, and are defined as follows: • is the invariant mass of the system as reconstructed from the -lepton pair (calculated using the MMC) and the -tagged jet pair. In the lep had categories the four-momenta of the¯and + − systems are multiplied by correction factors (125 GeV/ and 125 GeV/ MMC , respectively) in order to improve the mass resolution; • Δ ( , ) is evaluated between the two had-vis (the electron or muon and the had-vis ) in the had had category ( lep had categories); • Δ ( , ) is evaluated between the -tagged jets; • Δ T (ℓ, ) is the difference between the transverse momenta of the lepton and the had-vis ; is the transverse mass of the lepton and the p miss T ; • the p miss T centrality specifies the angular position of the p miss T relative to the had-vis in the transverse plane [159] and is defined as ( + )/ , and 1 and 2 represent the two had-vis (electron or muon and had-vis ) in the case of the had had category ( lep had categories); • Δ (ℓ , ) is the azimuthal angle between the ℓ + had-vis system and the -tagged jet pair; • Δ (ℓ, p miss T ) is the azimuthal angle between the lepton and the p miss T ; • Δ ( , p miss T ) is the azimuthal angle between the -lepton pair system (estimated using the MMC) and the p miss T ; • T is the total transverse energy in the event, summed over all jets, had-vis and leptons in the event          (bottom) for events in the had had (left), lep had SLT (middle column) and lep had LTT (right) categories. The normalisation and shape of the backgrounds and the uncertainty in the total background shown are determined from the likelihood fit (described in Section 8) to data in the non-resonant search. The expected non-resonant signal is overlaid with its normalisation scaled by a factor of 100, and the = 500 GeV and = 1000 GeV resonant signals are overlaid in the distributions with their cross-section set to 1 pb. The dashed histogram shows the total pre-fit background. The size of the combined statistical and systematic uncertainty of the background is indicated by the hatched band. The ratio of the data to the sum of the backgrounds is shown in the lower panels.

+ HF control region
The normalisation of the + HF background is determined from data by fitting the ℓℓ distribution in the + HF CR in the likelihood fit (described in Section 8). This is to account for a known discrepancy between the + HF production cross-section provided at NLO by S and the cross-section observed in data. The + HF CR targets events containing boson decays into electron or muon pairs by using triggers that require either a lepton or a pair of same-flavour leptons. Exactly two oppositely charged same-flavour leptons and exactly two -tagged jets must be reconstructed offline. The leptons are required to have T > 9 GeV, pass offline T thresholds based on the trigger thresholds, be compatible with originating from the primary vertex, and pass medium identification and loose isolation requirements. Lastly, the invariant mass of the lepton pair is required to be between 75 GeV and 110 GeV to select events which include a boson decay, and is required to be less than 40 GeV or greater than 210 GeV to ensure orthogonality to the event selection of another ATLAS search. This region also provides constraints on the normalisation of the¯background. Figure 4 shows the post-fit background and data ℓℓ distributions in the + HF control region.

Background modelling
Backgrounds in this search are estimated using a combination of simulation-based and data-driven techniques. The main sources of background are top-quark, +jets, +jets, diboson, single Higgs boson and multi-jet production. A reconstructed had-vis , in these background events, can originate either from a had decay (truehad-vis ), or from a misidentified quark-or gluon-initiated jet (fakehad-vis ). Events in which an electron or a muon is misidentified as a had-vis represent a small additional background.
Most of the background events with fakehad-vis are from¯or multi-jet production. In¯events, fakehad-vis typically originate from quark-initiated jets from the top-quark decay. In multi-jet events, both quark-and  gluon-initiated jets may be misidentified as had-vis .
The simulated event samples, summarised in Section 3.2, are used to model background events containing truehad-vis and events with an electron or a muon misidentified as a had-vis . Events with fakehad-vis in¯or multi-jet production are estimated using techniques relying on both simulated events and data, as detailed in the following subsections. Smaller backgrounds with fakehad-vis from other production processes are estimated from simulation.
The normalisations of simulated¯, for both the true-and fakehad-vis components, and + HF backgrounds are determined from data in the likelihood fits of signal and control regions, as outlined in Section 8.

Fakehad-vis background in the lep had channel
In the lep had channel, a combined fake-factor method similar to that described in Ref. [35] is used to estimate multi-jet and¯backgrounds with fakehad-vis . A schematic depiction of this method is shown in Figure 5. This method employs events in two groups of regions. The events in the identification (ID) regions require one identified had-vis , whereas events in the anti-identification (anti-ID) regions contain one antihad-vis candidate.
Prior to the last step of the overlap removal procedure outlined in Section 4, if an event does not contain an identified had-vis , it is checked for reconstructed had-vis candidates satisfying the antihad-vis requirements, as defined in Section 4. If an event contains multiple antihad-vis candidates, one is chosen randomly. In the LTT category, however, only the antihad-vis candidate that is within Δ = 0.2 of the HLT had-vis object is considered. In order to define a fakehad-vis background template, an anti-ID region is defined using event selection criteria equivalent to the SR selection, with one antihad-vis candidate instead of one identified had-vis . This region, which is enriched with fakehad-vis , is defined as the SR Template region. The template for estimating fakehad-vis in the SR is obtained by subtracting from the data distribution in the SR Template region the distribution of simulated background events in which the had-vis candidate is not a fakehad-vis originating from jets. This subtraction is referred to as the truehad-vis subtraction given that the number of events in which an electron or muon is misidentified as a had-vis , which are also subtracted, is very small. The data and simulated events that are used to build the template are scaled with event weights, referred to as fake-factors (FFs), to estimate the fakehad-vis background in the SR.
The FFs are derived separately for multi-jet (FF MJ ) and¯(FF¯) events in dedicated control regions. The multi-jet control regions (MJ CRs) and¯control regions (¯CRs) are defined separately for the ID and the anti-ID regions, depending on whether they contain one identified had-vis or one antihad-vis candidate, respectively. Besides the had-vis selection, the MJ CRs are defined using the SR selection with an inverted electron or muon isolation requirement (anti-Iso) and without the < 150 GeV requirement. The MJ CR's purity in multi-jet production events varies between 65% and 90% depending on the trigger category type (SLT or LTT) and whether the MJ CR is in the ID or anti-ID region. Similarly, the¯CRs are defined using the SR selection with an inverted requirement ( > 150 GeV). The¯CR is about 95% pure in the events from¯production. Contamination from either non-resonant or resonant signal in these CRs is estimated to be negligible. Backgrounds which are not from events with fakehad-vis originating from jets, are estimated from simulation and are subtracted from the distribution of the data in all the control regions used for the FF measurement. After the subtraction the FFs are derived as the ratio of the number of events in the ID region to the number of events in the anti-ID region. They are parameterised in terms of the had-vis T , independently for 1-and 3-prong had-vis ('1-and 3-prong' refers to the number of tracks associated with a reconstructed had-vis ), and separately for the SLT and LTT categories. The FFs corresponding to the individual background processes are combined as where MJ is the expected fraction of multi-jet events in the SR Template. The number of multi-jet events in the SR Template is estimated by taking the number of data events in the SR Template and subtracting the expected number of non-multi-jet background events with both true-and fakehad-vis as estimated from simulation. The non-multi-jet background events are dominated by¯production. The MJ is parameterised as a function of the had-vis T , and it is measured separately for the had and had events, for 1-and 3-prong had-vis categories, and for the SLT and LTT categories. The FF comb is used to scale the events used for the SR Template in order to obtain the fakehad-vis background prediction in the SR.
The determination of the fakehad-vis background using the combined FF method is sensitive to the modelling of simulated¯events with truehad-vis given that this is the dominant background that is subtracted from data in the derivation of the FF and MJ , and when obtaining the SR Template. Additionally, the derivation of MJ is sensitive to the modelling of simulated¯events with fakehad-vis . To improve predictions, the simulated events from¯production are differentially reweighted to data distributions depending on the jet multiplicity and the scalar sum of the transverse momentum of all visible final-state objects in the event. These reweighting factors are determined from another¯control region (¯CR2), which is about 93% pure in events from¯production, and is defined using a selection identical to the SR selection, but with an inverted requirement ( > 150 GeV) and with an additional T > 40 GeV requirement. Furthermore, events in this control region are required to have a reconstructed had-vis candidate, but this candidate is not required to meet the recurrent neural-network identification criteria defined in Section 4. The T requirement is introduced to reduce any potential contamination from multi-jet events.
Statistical uncertainties in FF¯, FF MJ and MJ are evaluated and propagated to the final result. The difference between the fakehad-vis background estimates obtained with and without the aforementionedm odelling correction is taken as an uncertainty in the background estimate. A conservative 30% modelling uncertainty is assigned to simulated non-¯backgrounds which are subtracted from data. Due to its large dependence on the modelling of simulated¯events with fakehad-vis , the obtained values of MJ are varied by ±0.5, with the constraint 0 ≤ MJ ≤ 1. The impact of such a conservative uncertainty is small since the FFs in multi-jet and¯events are found to be similar. The total uncertainty in the FF comb value for the SLT category is at most 10%, and at most 25% for the LTT category. The combined FF method is checked for closure in the¯CR and it is validated in the 0--tagged and 1--tagged regions, which are the same as the lep had SR except for the requirement on the number of -tagged jets. The signal contamination in the 0--tagged and 1--tagged regions is negligible. The estimated background distributions agree well with the observed distributions in all validation regions.

Fakehad-vis background in the had had channel
In the had had channel, two separate methods are used to estimate the backgrounds with fakehad-vis from and multi-jet production. Multi-jet events can only enter the signal selection when both had-vis are fake, whereas for¯production, usually no more than one reconstructed had-vis is fake.

Fakehad-vis background from multi-jet production
In the had had channel, the fakehad-vis background from multi-jet production is estimated using a fakefactor method. A schematic depiction of this method is shown in Figure 6. The ID region selection refers to the selection of events with two identified had-vis . In order to define an anti-ID region selection, prior to the last step of the overlap removal procedure outlined in Section 4, events that have only one identified had-vis are checked for a reconstructed had-vis candidate satisfying the antihad-vis requirements. The selected antihad-vis candidate is required to be within Δ = 0.2 of an HLT had-vis object, except in the STT category for events in which the identified had-vis is already trigger-matched. If multiple antihad-vis candidates fulfil the defined criteria, one is selected randomly.
In order to define a multi-jet background template, an anti-ID region is defined by using a selection equivalent to the SR selection, with one identified had-vis and one antihad-vis candidate, instead of two identified had-vis . The template for estimating the multi-jet background in the SR (SR Template) is obtained by subtracting simulated non-multi-jet events from data in the template region. A large fraction of the subtracted non-multi-jet events are from¯production, and these simulated¯events with fakehad-vis are corrected with scale factors of the fakehad-vis misidentification efficiencies in the anti-ID region, which are described in Section 6.2.2. Similarly to the procedure used in the lep had channel, the events that are used to build the template are further scaled with FFs to estimate the multi-jet background in the SR.
A multi-jet-enriched control region is defined by using the had had SR selection, but requiring the two had-vis to have same-sign (SS) charges, as opposed to the SR selection that requires two had-vis with opposite-sign (OS) charges. Additionally, events in the control region are required to have exactly one -tagged jet per event (SS CR with 1 -tagged jet). This control region, with its corresponding anti-ID counterpart, is used for FF measurements. This control region's purity in multi-jet production events varies from 80% to 90% depending on the trigger type (STT or DTT). The FFs are measured as the ratio of the number of events in the ID region to the number of events in the anti-ID region after subtracting all simulated non-multi-jet backgrounds from data. The FFs are determined separately for the STT and DTT categories, and for the different years of data-taking to account for the changes to the had-vis identification algorithms and event selection topologies used in the trigger. The FFs are derived in the SS CR with 1 -tagged jet, due to the limited number of selected events and large¯background contamination in the SS region with 2 -tagged jets. For that reason, transfer factors (TFs) are defined to account for the extrapolation from 1--tagged-jet events to 2--tagged-jet events. In the DTT category, the FFs are binned in T and of the antihad-vis . In the STT category, due to the smaller number of available events, the FFs are measured inclusively in T and , but separately according to whether the selected antihad-vis is the leading or sub-leading had-vis candidate in T . In both categories, the FFs are measured separately for events with 1-or 3-prong antihad-vis candidates.
The TFs are defined as ratios of the FFs measured in the SS CR with 2 -tagged jets to the FFs measured in the SS CR with 1 -tagged jet, inclusively for the STT and DTT categories. The large contamination from background in the SS CR with 2 -tagged jets is removed in the subtraction of all simulated non-multi-jet backgrounds from data. The TFs are also measured inclusively in T and of the had-vis , but separately for events with 1-and 3-prong antihad-vis candidates, separately according to whether the selected antihad-vis is the leading or sub-leading had-vis candidate in T , and separately for the different years of data-taking. Their values are compatible with unity within the statistical uncertainty.
The sources of uncertainty in the estimate of the fakehad-vis background from multi-jet production include the statistical uncertainties in the measured FFs and TFs, and uncertainties in the normalisation and shape of the non-multi-jet backgrounds that are subtracted from data when deriving the SR Template. An uncertainty is also introduced to account for the extrapolation from the SS events to the OS events. The associated systematic uncertainty is estimated by comparing the FFs derived from SS 1--tagged-jet events with those derived from OS 1--tagged-jet events. To ensure that the sample is dominated by multi-jet events, the OS events are additionally required to satisfy MMC > 110 GeV and miss T / ( miss T ) < 3, where ( miss T ) is the event-based approximation to the resolution of the miss T [161]. The modelling of the multi-jet background is checked for closure in the SS CRs with 1 and 2 -tagged jets and good agreement between data and prediction is observed. The FFs are measured in the SS CR with 1 -tagged OS, 2 b-tagged jets SS, 1 b-tagged jet SS, 2 b-tagged jets SR Template Non-multi-jet subtracted SR ID Anti-ID τ had τ had channel Figure 6: Schematic depiction of the fake-factor method to estimate the multi-jet background with fakehad-vis in the had had channel. Backgrounds that are not from multi-jet events are simulated and subtracted from data in all the control regions. This is indicated by 'Non-multi-jet subtracted' in the legend. jet, and the TFs between the SS CR with 1 -tagged jet and SS CR with 2 -tagged jets only correct for the normalisation of the fakehad-vis prediction. Thus the validation performed on the SS CR with 2 -tagged jets provides a check of the shape extrapolation from 1 -tagged jet to 2 -tagged jets.

Fakehad-vis background from¯production
Background events with fakehad-vis from¯production in the had had channel are estimated using simulation. However, the fakehad-vis misidentification efficiencies are corrected by scale factors (SFs) derived from data. A schematic depiction of this method is shown in Figure 7.
The SFs are determined in the¯CR defined within the lep had SLT category, as described in Section 6.1. However, in order to harmonise it with the had had SR selection, the¯CR is redefined to select events with had-vis | | < 2.5. The SFs are extracted as a function of the fakehad-vis T , separately for 1-and 3-prong fakehad-vis objects, by fitting the T distribution of simulated events to data using a profile-likelihood fit. The fit of the T distribution allows the contributions of the¯events with true-and fakehad-vis to be determined while correcting for the modelling of the¯simulation that is common to both contributions. Separate fits are performed for different trigger categories. For 1-prong fakehad-vis the SFs are close to unity at fakehad-vis T below 40 GeV and decrease to SF ∼ 0.6 for fakehad-vis T above 70 GeV. The SFs for the 3-prong fakehad-vis are generally about ∼ 20% larger than for the 1-prong fakehad-vis objects.
The¯background with fakehad-vis in the had had SR is estimated from the simulated events that pass the SR selection, weighted by the corresponding SFs for each fakehad-vis in the event.
Uncertainties in the detector response and in the modelling of the¯events, as well as of the other minor contributing processes, are taken into account in the likelihood fit when extracting the SFs. The covariance matrix of the measured SFs, which contains all statistical and systematic uncertainties of the measurement, is diagonalised and the resulting eigenvectors are used to define independent nuisance parameters which are propagated to the final signal extraction fit. Theoretical modelling uncertainties in simulated¯events to which the SFs are applied are estimated as described in Section 7 and they are also propagated to the final signal extraction fit.
When estimating the fakehad-vis background from multi-jet production using the fake-factor method (cf. Section 6.2.1), a large fraction of¯events containing at least one fakehad-vis needs to be subtracted from data in the OS 2--tagged-jet anti-ID region (SR Template) to estimate the multi-jet contribution in the had had SR. The modelling of the simulated¯events with fakehad-vis in the anti-ID region is corrected with SFs that are measured using the same method as described above. These SFs are measured in a control region similar to the¯CR except that the had-vis candidate has to satisfy the antihad-vis requirements. The measured SFs for 1-prong fakehad-vis in the anti-ID region are close to unity at fakehad-vis T below 40 GeV, and follow the same trend of decreasing in value with increasing fakehad-vis T , as observed in the ID region. The SFs for the 3-prong fakehad-vis are generally about 10%-20% larger than for the 1-prong fakehad-vis objects.

Systematic uncertainties
The most significant uncertainty in this search is the statistical uncertainty of the data in the signal region. Nonetheless, experimental, theoretical, and modelling uncertainties in the normalisation and shape of the signal and background estimates give a non-negligible contribution to the total uncertainty and are thus evaluated for use in the statistical model (described in Section 8); these are described below. Statistical uncertainties in the predicted background processes are modelled using a simplified version of the Beeston-Barlow technique [162], in which only the uncertainty in the total background content in each bin is considered.
Systematic uncertainties in the detector response are considered in this search. Where relevant, uncertainties in the trigger, reconstruction, identification, and isolation efficiencies, and in the momentum of electrons [133], muons [134] and had-vis [150] are estimated. Additional uncertainties are estimated for the efficiencies of the electron veto for had-vis and the track-to-vertex-matching requirements for muons. Jet energy scale and resolution uncertainties [140] and the uncertainty in the efficiency of matching jets to the primary vertex [141] are estimated. These energy scale and resolution uncertainties, in addition to an uncertainty in the tracks matched to the primary vertex but not associated with other reconstructed objects in the event, are propagated to the miss T calculation [152,163]. Uncertainties in the -jet tagging efficiency are considered, and these are evaluated as a function of T for -tagged jets and -tagged jets by using¯events [145,164], and as a function of T for light-flavour jets by using + jets events [165]. An uncertainty of 1.7% in the total integrated luminosity [79], obtained using the LUCID-2 detector [80], is assigned to physics processes whose normalisations are taken from simulation. An uncertainty arising from the correction of the pile-up distribution in simulation to that in data is also estimated.
The effects of uncertainties in the parton shower on the acceptances of the resonant (non-resonant) signals are assessed by comparing the nominal MC samples with alternative samples which use P 8 (H 7) to model the parton shower. Alternative samples were not generated for some of the resonant signals, and uncertainties for these signals are derived by interpolating or extrapolating the uncertainties from signals nearby in for which alternative samples were produced. Independent parton-shower acceptance uncertainties are accounted for in the normalisation of the non-resonant ggF, non-resonant VBF and resonant signals, and these range between 0.1% and 19% of the nominal acceptance values. The effects of the renormalisation and factorisation scale, PDF and s uncertainties on the ggF and VBF signal acceptances are included when they are found to be non-negligible. Separate normalisation uncertainties are applied in the had had and lep had LTT categories to account for observed differences in the had-vis trigger acceptances between fast-simulation and full-simulation resonant signal MC samples. The effects of renormalisation and factorisation scale, PDF and s uncertainties on the SM ggF and VBF cross-sections are considered, and an uncertainty arising from the scheme and value used for the top-quark mass in calculations of the virtual top-quark loop contributions in the SM ggF cross-section [21] is included. Lastly, uncertainties in the →¯and → + − branching fractions [71] are considered in the signal and single-Higgs-boson background samples.
The¯normalisation is determined in the likelihood fits, so the analysis is not sensitive to uncertainties in its expected cross-section. However, the relative acceptances of the analysis categories and the distribution shapes of the discriminating variables are still sensitive to modelling uncertainties, and these are evaluated using a method that closely follows Ref.
[166]. The¯hard-scatter and partonshower uncertainties are evaluated by comparing the nominal sample with samples generated using M G 5_ MC@NLO + P 8 and P B v2 + H 7, respectively. The hard-scatter uncertainty also accounts for the matrix-element-parton-shower matching and merging uncertainty (henceforth referred to as the matching uncertainty). The impact of the uncertainty in the amount of initial-state QCD radiation is evaluated by co-varying the ℎ damp parameter, the renormalisation and factorisation scales, and the showering tune. The effects of uncertainties in the amount of final-state QCD radiation, the PDF, the s value, and the renormalisation and factorisation scales are all evaluated by varying generator parameter values or the PDF. All of these uncertainties affect the relative acceptances of the analysis categories. Additionally, the effects of the hard-scatter and parton-shower uncertainties on the distribution shapes of the MVA output variables are evaluated directly in all categories, but are found to be negligible in the lep had LTT category. The impact of the uncertainty in the amount of initial-state QCD radiation on the MVA output variables is considered in the had had category.
The + HF normalisation is determined in the likelihood fits. The effects of uncertainties in the hard scatter, matching, parton shower, the renormalisation, factorisation and resummation scales, PDF and s on the relative acceptances of the analysis categories are all considered. For the renormalisation, factorisation and resummation scales, these uncertainties are evaluated by varying the relevant scales upwards and downwards from their nominal values by a factor of two; both simultaneous and independent variations are considered for the renormalisation and factorisation scales, while the resummation scale is varied independently. In the had had category, the impact of the hard-scatter and parton-shower uncertainty is evaluated by comparing the nominal sample with a M G 5_ MC@NLO + P 8 sample, and the impact on distribution shapes is parameterised in and propagated to the MVA output variables. This uncertainty is not applied in the lep had category and the + HF CR. In the had had category, the impact of the renormalisation and factorisation scale uncertainties on the shape of the MVA output variables is evaluated directly, and in the lep had categories, their impact is parameterised in the T of the -tagged jet pair and propagated to the MVA output variables.
The effects of uncertainties in the PDF and the amount of initial-and final-state QCD radiation on the acceptances of the single-top-quark backgrounds are estimated. Cross-section uncertainties are also considered. Despite these backgrounds being small inclusively, their total yields in the most sensitive MVA bins are significant, and the acceptance uncertainties in the different analysis categories range between 14% and 34% of the nominal values. The impact of the uncertainty in the amount of initial-state QCD radiation is evaluated by co-varying the renormalisation and factorisation scales, and the showering tunes. The impact of the uncertainty in the amount of final-state QCD radiation is evaluated by varying the V 2 parameter [93]. The impact of uncertainties in the combined hard scatter and matching (parton shower) of the background is estimated by comparing the nominal sample with a sample generated using M G 5_ MC@NLO + P 8 (P B v2 + H 7). The impact of the uncertainty in the interference between the and¯backgrounds on the acceptance is evaluated by comparing simulated samples produced with the diagram removal and diagram subtraction schemes [167], parameterising this difference in the T of the -tagged jet pair, and propagating it to the normalisation and shape of the MVA output distributions.
The effects of uncertainties in the renormalisation and factorisation scales, PDF and s on the normalisation of the single-Higgs-boson backgrounds are evaluated. Uncertainties of 100% are applied separately to the normalisations of the ggF, VBF and single-Higgs-boson backgrounds where the Higgs boson decays into -leptons, to account for difficulties in the modelling of these processes in association with heavy-flavour jets [168,169]. The impact of uncertainties in the parton shower on the acceptances are evaluated for the and¯backgrounds by comparing the nominal samples with alternative samples generated using H 7 and E G . Uncertainties in the matching and the amount of initial-and final-state QCD radiation are also considered for the¯backgrounds, and the impact of QCD radiation uncertainties is evaluated by varying the V 3 parameter [93] and the renormalisation scale. Uncertainties in the shape of the MVA output distributions of all single-Higgs-boson processes are found to be negligible. Cross-section uncertainties are applied to the normalisation of the simulated single-Higgs-boson [71], single-top-quark, +light-flavour jets, +jets, and diboson backgrounds, and acceptance uncertainties are applied to the normalisation of various small backgrounds.
Systematic uncertainties affecting the data-driven background estimates are described in Section 6. Table 4 shows the breakdown of the relative contributions to the uncertainty in the non-resonant and resonant extracted signal cross-sections, which is obtained from the likelihood fit described in Section 8. Systematic uncertainties in the single-Higgs-boson backgrounds affect the non-resonant search more than these resonant searches, due to the more significant contribution of ggF single-Higgs-boson background in the most signal-like regions of the MVA distributions in the non-resonant search.

Statistical interpretation
The signal yields are each estimated from data using a simultaneous binned maximum-likelihood fit to the MVA output distributions in the had had , lep had SLT, and lep had LTT event categories, and to the ℓℓ distribution in the + HF CR. The¯and + HF background normalisations are free parameters in the fit, and are primarily constrained by the background-like MVA output bins and the + HF CR. Upper limits are set on the signal normalisations at the 95% confidence level (CL) using the profile-likelihood-ratio test statistic [170] and the modified frequentist CL s technique [171] in the asymptotic approximation [170].
The systematic uncertainties described in Sections 6 and 7 are represented in the fit as Gaussian-or Poisson-constrained nuisance parameters, which modify the normalisation, relative normalisation between event categories, and/or distribution shape of the discriminating variable for the signal and background processes. Systematic uncertainties are symmetrised and shape uncertainties are smoothed where physically motivated, and then those with a negligible impact are removed from the likelihood fit. Experimental, Table 4: Breakdown of the relative contributions to the uncertainties in the extracted signal cross-sections, as determined in the likelihood fit (described in Section 8) to data. They are obtained by fixing the relevant nuisance parameters in the likelihood fit, subtracting the square of the obtained uncertainty in the fitted signal cross-section from the square of the total uncertainty, taking the square root, and then dividing by the total uncertainty. The sum in quadrature of the individual components differs from the total uncertainty due to correlations between uncertainties in the different groups. cross-section, and acceptance uncertainties are correlated across the event categories, except for the parton-shower uncertainty of the¯background, since the kinematic and topological properties of this background differ between event categories. Modelling uncertainties in data-driven background estimates are not correlated across different estimation strategies, because different sources of fakehad-vis are estimated using different procedures.
The binning schemes for the MVA output distributions used in the likelihood fit were chosen to minimise the number of bins, while also maximising the retained expected sensitivity, and ensuring the stability of the fit and the validity of the asymptotic approximation. The binning schemes start from finely binned histograms, and bins are iteratively merged beginning from the most signal-like MVA bins until the following channel-dependent criteria are fulfilled. In the had had channel, the bins are required to satisfy is the relative MC statistical uncertainty of the background estimate and s is the fraction of the total signal in the bin. In the lep had channel, the bins are required to satisfy 10 s + 5 b > 1, where s and b are the fractions of the total signal and background in the bin, respectively. Bins in all channels must be expected to contain at least five background events to ensure that the asymptotic approximation is valid.

Results
The post-fit normalisation factors of the unconstrained¯and + HF backgrounds for the combined had had and lep had channels, as obtained in the search for non-resonant production, are 0.97 ± 0.04 and 1.40 ± 0.11, respectively. These post-fit background normalisation factors are compatible within uncertainties with those obtained for the resonant production searches.
The MVA output distributions in the search for non-resonant production and for two resonance mass hypotheses ( = 500 GeV and 1 TeV) are shown in Figure 8, after performing the fits to the data and assuming a background-only hypothesis. The data, expected signal and post-fit background yields for events entering the most signal-like bin in the non-resonant MVA output histograms in the three event categories are shown in Table 5. Table 5: Data, expected signal and background yields for events passing the event selection and entering the most signal-like bin in the non-resonant MVA output histograms in the three event categories. The background yields are calculated following a likelihood fit of the background model to data in the non-resonant search. The non-resonant signal yields are normalised using the SM cross-section. The 'multi-jet fakes' and '¯fakes' estimates are only used in the had had category, and the 'combined fakes' estimate is only used in the lep had categories. All systematic uncertainties are included in the indicated uncertainties. 6.1 ± 0.8 6 ± 1 6 ± 1 Table 6 shows the 95% CL upper limits on the cross-sections for non-resonant production, and on the ratios of these cross-sections to their SM predictions. The combined observed (expected) limit on the SM non-resonant production cross-section via the ggF and VBF processes is 140 fb (110 +40 −30 fb), while the limit on the ratio to the SM prediction is 4.7 (3.9 +1.5 −1.1 ). The expected limit is calculated assuming no production. Figure 9 shows the data, background and non-resonant signal yields, where final-discriminant bins from the had had , lep had SLT and lep had LTT categories are combined into bins of log 10 ( / ). The fitted background yield assumes the background-only hypothesis. The signal that is shown in the overlaid histograms is scaled either to the SM expected cross-section or to the combined expected limit cross-section.         = 500 GeV (middle row) and = 1000 GeV (bottom), in the had had (left), lep had SLT (middle column) and lep had LTT (right) categories. The distributions are shown after performing the fits to data and assuming the background-only hypothesis. The signal is overlaid and scaled to the combined expected limit. The dashed histogram shows the total pre-fit background. The lower panels show the ratio of data to the total post-fit background, where the hatched band shows the statistical and systematic uncertainties of that background. For visualisation purposes, these histograms are displayed using uniform bin widths instead of the bin edges used in the fit, although the bin contents correspond to those used in the fit. Indices are used to label the bins.
Pull (stat.) Figure 9: Event yields as a function of log 10 ( / ) for data, background and non-resonant signal. Finaldiscriminant bins from the had had , lep had SLT and lep had LTT categories are combined into bins of log 10 ( / ). The fitted background yield assumes the background-only hypothesis. In the lower panel, the pull of the data relative to the background (the statistical significance of the difference between data and fitted background) is shown with statistical uncertainties only. The full lines indicate the pull expected from the sum of the fitted background and the signal, which is scaled to either the SM expected cross-section (red) or the combined expected limit cross-section (blue), relative to the fitted background.
For resonant production, the exclusion limits on the cross-section are shown as a function of in Figure 10. The had had channel has better sensitivity than the lep had channel below a resonance mass of 1 TeV, while their sensitivities become comparable at higher masses. A broad excess is observed in both channels in the mass range 700 GeV < < 1.2 TeV. The most significant excess for the had had ( lep had ) channel is found for the signal mass hypothesis of 1 TeV (1.1 TeV) with a local significance of 2.8 (1.6 ). The most significant combined excess is at a signal mass hypothesis of 1 TeV with a local significance of 3.1 and a global significance of 2.0 .
The -value defining the global significance is given by the probability of finding a maximum local excess larger than the observed value under the background-only hypothesis, regardless of the resonance mass value where the largest excess is found. It is estimated using fits of 'toy' experiments drawn from a model providing a joint description of all observables in the statistical interpretation. The set of observables comprises the number of events observed in a bin entering either a signal or control region, separately for all bins, as well as the central values of auxiliary measurements used to propagate systematic uncertainties to the results. This model differs from the probabilistic model used to define the likelihood functions since these describe only one PNN discriminant at a time, targeting a single signal mass hypothesis. However, non-trivial correlations between observables are expected since events are subject to the same selection criteria regardless of the resonance mass value to be probed, but different multivariate discriminants are used for each of these signal hypotheses. Observables describing the number of events in a given bin are generated from a multivariate Poisson distribution with the dependence structure between observables given by the copula [172] of a centred multivariate normal distribution with covariance matrix given by the expected linear correlations estimated from a simulation-and data-driven background model. Observables describing the statistical precision of the background estimate are generated using a resampling approach [173]. Observables associated with all other experimental and theoretical uncertainties are varied coherently for all hypothesis tests.

Conclusion
A search for non-resonant and resonant Higgs boson pair production in¯+ − events is conducted, where the non-resonant signal is assumed to be produced with SM kinematics, and the resonant signal corresponds to a narrow scalar resonance with a mass in the range 251 to 1600 GeV. The 13 TeV collision dataset used was collected at the LHC by the ATLAS experiment between 2015 and 2018, and corresponds to an integrated luminosity of 139 fb −1 . The sensitivity of this search to the non-resonant signal hypothesis improves on the previous ATLAS search in this channel by around a factor of four. Roughly half of this improvement is due to the larger dataset, while most of the remaining sensitivity gain is due to significant improvements in the had-vis and -jet reconstruction and identification. Analysis-level improvements include the use of more sophisticated multivariate techniques to target the resonant signal hypotheses, and new fakehad-vis estimation methods. The data are found to be compatible with the background-only hypothesis, with the largest deviation being found in the search for resonant production at a mass of 1 TeV and corresponding to a local (global) significance of 3.1 (2.0 ). The observed (expected) upper limit on the non-resonant Higgs boson pair-production cross-section, set at the 95% confidence level, is 4.7 (3.9) times the SM expectation. Observed (expected) upper limits are placed at the 95% confidence level on resonant Higgs boson production and exclude cross-sections above 21-900 fb (12-840 fb), depending on the mass of the resonance. This search provides the highest expected sensitivity to non-resonant production of any individual search to date, and provides limits on resonant production that are more