Boosting Higgs Pair Production in the bb̄bb̄ Final State with Multivariate Techniques

The measurement of Higgs pair production will be a cornerstone of the LHC program in the coming years. Double Higgs production provides a crucial window upon the mechanism of electroweak symmetry breaking and has a unique sensitivity to the Higgs trilinear coupling. We study the feasibility of a measurement of Higgs pair production in the bb̄bb̄ final state at the LHC. Our analysis is based on a combination of traditional cut-based methods with state-of-the-art multivariate techniques. We account for all relevant backgrounds, including the contributions from light and charm jet mis-identification, which are ultimately comparable in size to the irreducible 4b QCD background. We demonstrate the robustness of our analysis strategy in a high pileup environment. For an integrated luminosity of L = 3 ab−1, a signal significance of S/ √ B ' 3 is obtained, indicating that the bb̄bb̄ final state alone could allow for the observation of double Higgs production at the High Luminosity LHC. 1 ar X iv :1 51 2. 08 92 8v 3 [ he pph ] 2 0 Ju n 20 16


Introduction
The measurement of double Higgs production will be one of the central physics goals of the LHC program in its recently started high-energy phase, as well as for its future high-luminosity upgrade (HL-LHC) which aims to accumulate a total integrated luminosity of 3 ab −1 [1,2]. Higgs pair production [3] is directly sensitive to the Higgs trilinear coupling λ and provides crucial information on the electroweak symmetry breaking mechanism. It also probes the underlying strength of the Higgs interactions at high energies, and can be used to test the composite nature of the Higgs boson [4,5]. While Standard Model (SM) cross-sections are small, many Beyond the SM (BSM) scenarios predict enhanced rates for double Higgs production, therefore searches have already been performed by ATLAS and CMS with Run I data [6][7][8][9][10] and will continue at Run II. The study of Higgs pair production will also be relevant to any future high-energy collider, either at a 100 TeV circular machine [11][12][13][14] or at a linear or circular electron-positron collider [15].
Analogously to single Higgs production [16], in the SM the dominant mechanism for the production of a pair of Higgs bosons at the LHC is gluon fusion (see [3,17] and references therein). For a center-of-mass energy of √ s = 14 TeV, the next-to-next-to-leading order (NNLO) total cross section is approximately 40 fb [18], which is increased by a further few percent once nextto-next-to-leading logarithmic (NNLL) corrections are accounted for [19]. Feasibility studies in the case of a SM-like Higgs boson in the gluon-fusion channel at the LHC have been performed for different final states, including bbγγ [20][21][22], bbτ + τ − [23][24][25][26], bbW + W − [25,27] and bbbb [21,23,25,28,29]. While these studies differ in their quantitative conclusions, a consistent picture emerges that the ultimate precision in the determination of the Higgs trilinear coupling λ requires the full integrated luminosity of the HL-LHC, L = 3 ab −1 , and should rely on the combination of different final states. The interplay between kinematic distributions for the extraction of λ from the measured cross-sections, and the role of the associated theoretical uncertainties, have been intensely scrutinized recently [17,[36][37][38][39][40][41][42][43].
In addition to the gluon-fusion channel, Higgs pairs can also be produced in the vector-boson fusion channel hhjj [5,26,30,31], the associated production modes hhW and hhZ [3,32,33] (also known as Higgs-Strahlung), and also in association with top quark pairs hhtt [34]. All these channels are challenging due to the small production rates: at 14 TeV, the inclusive total crosssections are 2.0 fb for VBF hhjj [35], 0.5 fb for W (Z)hh [3] and 1.0 for hhtt [34].
While the SM production rates for Higgs pairs are small, they are substantially enhanced in a variety of BSM scenarios. Feasibility studies of Higgs pair production in New Physics models have been performed in a number of different frameworks, including Effective Field Theories (EFTs) with higher-dimensional operators and anomalous Higgs couplings [14,[44][45][46][47][48][49][50], resonant production in models such as extra dimensions [51][52][53][54], and Supersymmetry and Two Higgs Doublet models (2HDMs) [55][56][57][58][59][60][61]. Since BSM dynamics modify the kinematic distributions of the Higgs decay products, for instance boosting the di-Higgs system, different analysis strategies might be required for BSM Higgs pair searches as compared to SM measurements.
Searches for the production of Higgs pairs have already been performed with 8 TeV Run I data by ATLAS in the bbbb [7] and bbγγ [8] final states, and by CMS in the same bbbb [9] and bbγγ [10] final states. In addition, ATLAS has presented [6] a combination of its di-Higgs searches in the bbτ τ, γγW W * , γγbb, and bbbb final states. Many other exotic searches involve Higgs pairs in the final state, such as the recent search for heavy Higgs bosons H [62].
In the context of SM production, the main advantage of the bbbb final state is the enhancement of the signal yield from the large branching fraction of Higgs bosons into bb pairs, BR H → bb 0.57 [16]. However a measurement in this channel needs to deal with an over-whelming QCD multi-jet background. Recent studies of Higgs pair production in this final state [28,29] estimate that, for an integrated luminosity of L = 3 ab −1 , a signal significance of around S/ √ B 2.0 can be obtained. In these analysis, irreducible backgrounds such as 4b and tt are included, however the reducible components, in particular bbjj and jjjj, are neglected. These can contribute to the signal yield when light and charm jets are mis-identified as b-jets. Indeed, due to both selection effects and b-quark radiation in the parton shower, the contribution of the 2b2j process is as significant as the irreducible 4b component.
In this work, we revisit the feasibility of SM Higgs pair production by gluon-fusion in the bbbb final state at the LHC. Our strategy is based upon a combination of traditional cut-based methods and multivariate analysis (MVA). We account for all relevant backgrounds, including the contribution from mis-identified light and charm jets. We also assess the robustness of our analysis strategy in an environment with high pileup (PU). Our results indicate that the bbbb final state alone should allow for the observation of double Higgs production at the HL-LHC.
The structure of this paper proceeds as follows. In Sect. 2 we present the modeling of the signal and background processes with Monte Carlo event generators. In Sect. 3 we introduce our analysis strategy, in particular the classification of individual events into different categories according to their topology. Results of the cut-based analysis are then presented in Sect. 4. In Sect. 5 we illustrate the enhancement of signal significance using multivariate techniques, and assess the robustness of our results against the effects of PU. In Sect. 6 we conclude and outline future studies to estimate the accuracy in the determination of the trilinear coupling λ and to provide constraints in BSM scenarios.

Modeling of signal and background processes
In this section we discuss the Monte Carlo generation of the signal and background process samples used in this analysis. We shall also discuss the modelling of detector resolution effects.

Higgs pair production in gluon-fusion
Higgs pair production is simulated at leading order (LO) using MadGraph5 aMC@NLO [63]. We use a tailored model [40] for gluon-fusion Higgs boson pair production which includes mass effects from the exact form factors for the top-quark triangle and box loops [64]. Equivalent results can be obtained using the recently available functionalities for the calculation of loopinduced processes [65] in MadGraph5 aMC@NLO. The calculation is performed in the n f =4 scheme, accounting for b-quark mass effects. The renormalization and factorization scales are taken to be µ F = µ R = H T /2, with the scalar sum of the transverse masses of all final state particles. For the input parton distribution functions (PDFs) we adopt the NNPDF 3.0 n f = 4 LO set [66] with α s (m 2 Z ) = 0.118, interfaced via LHAPDF6 [67]. The Higgs boson couplings and branching ratios are set to their SM values, and its mass is taken to be m h = 125 GeV [68][69][70]. In the SM, the Higgs trilinear coupling is given by λ = m 2 h /2v 2 , with v 246 GeV the Higgs vacuum expectation value. In Fig. 1 we show representative Feynman diagrams for LO Higgs pair production in gluon fusion. The non-trivial interplay between the heavy quark box and the triangle loop diagrams can lead to either constructive or destructive interference and complicates the extraction of the sults. Concluding remarks are given in Section 4. Higgs boson pair production at the LHC at leading order (LO) is loop-initiated an ominated by gluon fusion initial states. The contributing gluon fusion diagrams are show Fig. 1. We call the diagram on the left the 'box' diagram and the diagram on the righ e 'triangle' diagram. The two diagrams have spin-0 configurations of the initial sta luons that interfere destructively. The box diagram also has a spin-2 configuration -3 - Only the fermion triangle loop diagram (right) is directly sensitive to the Higgs trilinear coupling λ. In the SM, the fermion loops are dominated by the contribution from the top quark.
trilinear coupling λ from the measurement of the Higgs pair production cross-section. Higherorder corrections [17,18] are dominated by gluon radiation from either the initial state gluons or from the heavy quark loops. The total inclusive cross-section for this processes is known up to NNLO [18]. Resummed NNLO+NNLL calculations for Higgs pair production are also available [19], leading to a moderate enhancement of the order of few percent as compared to the fixed-order NNLO calculation. To achieve the correct higher-order value of the integrated cross-section, we rescale our LO signal sample to match the NNLO+NNLL inclusive calculation. This corresponds to a K-factor σ NNLO+NNLL /σ LO = 2.4, as indicated in Table 1.
Parton level signal events are then showered with the Pythia8 Monte Carlo [71,72], version v8.201. We use the default settings for the modeling of the underlying event (UE), multiple parton interactions (MPI), and PU, by means of the Monash 2013 tune [73], based on the NNPDF2.3LO PDF set [74,75].

Backgrounds
Background samples are generated at leading order with SHERPA [76] v2.1.1. As in the case of the signal generation, the NNPDF 3.0 n f = 4 LO set with strong coupling α s (m 2 Z ) = 0.118 is used for all samples, and we use as factorisation and renormalisation scales µ F = µ R = H T /2. We account for all relevant background processes that can mimic the hh → 4b signal process. This includes QCD 4b multi-jet production, as well as QCD 2b2j and 4j production, and top quark pair production. The latter is restricted to the fully hadronic final state, since leptonic decays of top quarks can be removed by requiring a lepton veto. Single Higgs production processes such as Z(→ bb)h(→ bb) and tth(→ bb) (see Appendix A) along with electroweak backgrounds e.g Z(→ bb)bb, are much smaller than the QCD backgrounds [28,29] and are therefore not included in the present analysis.
The LO cross-sections for the background samples have been rescaled so that the integrated distributions reproduce known higher-order QCD results. For the 4j sample, we rescale the LO cross-section using the BLACKHAT [77] calculation, resulting in an NLO/LO K-factor of 0.6. For the 4b and 2b2j samples NLO/LO K-factors of 1.6 and 1.3 respectively have been determined using MadGraph5 aMC@NLO [63]. Finally, the LO cross-section for tt production has been rescaled to match the NNLO+NNLL calculation of Ref. [78], leading to a K-factor of 1.4. The K-factors that we use to rescale the signal and background samples are summarised in Table 1.
At the generation level, the following loose selection cuts are applied to background events.  Each final-state particle in the hard process must have p T ≥ 20 GeV, and be located in the central rapidity region with |η| ≤ 3.0. At the matrix-element level all final-state particles must also be separated by a minimum ∆R min = 0.1. We have checked that these generator-level cuts are loose enough to have no influence over the analysis cuts. From Table 1 we see that the tt and QCD 4b cross-sections are of the same order of magnitude. However the former can be efficiently reduced by using top quark reconstruction criteria. The bbjj cross-section is more than two orders of magnitude larger than the 4b result, but will be suppressed by the light and charm jet mis-identification rates, required to contribute to the 4b final state. As a cross-check of the SHERPA background cross-sections reported in Table 1, we have produced leading-order multi-jet samples using MadGraph5 aMC@NLO, benchmarked with the results for the same processes reported in Ref. [63]. Using common settings, we find agreement, within scale uncertainties, between the MadGraph5 aMC@NLO and SHERPA calculations of the multi-jet backgrounds.

Modelling of detector resolution
While it is beyond the scope of this work to perform a full detector simulation, it is important to include an estimate of detector effects in the analysis, particularly for the finite energy and angular resolutions which directly degrade the reconstruction of important kinematic variables, such as the invariant mass of the Higgs candidates. Here we simulate the finite energy resolution of the ATLAS and CMS hadronic calorimeters by applying a Gaussian smearing of the transverse momentum p T with mean zero and standard deviation σ E for all final-state particles before jet clustering, that is, with r i a univariate Gaussian random number, different for each of the N part particles in the event. We take as a baseline value for the transverse momentum smearing a factor of σ E = 5%.
To account for the finite angular resolution of the calorimeter, the (η, φ) plane is divided into regions of ∆η × ∆φ = 0.1 × 0.1, and each final state particle which falls in each of these cells is set to the same η and φ values of the center of the corresponding cell. Finally, the energy of each final-state particle is recalculated from the smeared p T , η and φ values to ensure that the resulting four-momentum is that of a light-like particle, since we neglect all jet constituent masses in this analysis.
Our modelling of detector simulation has been tuned to lead to a mass resolution of the reconstructed Higgs candidates consistent with the hadronic mass resolutions of the ATLAS and CMS detectors [79][80][81], as discussed in Sect. 3.5.

Analysis strategy
In this section we describe our analysis strategy. First of all we discuss the settings for jet clustering and the strategy for jet b-tagging. Following this we discuss the categorisation of events into different topologies, and how the different topologies may be prioritised. We motivate our choice of analysis cuts by comparing signal and background distributions for representative kinematic variables. Finally, we describe the simulation of PU and validate the PU subtraction strategy.

Jet reconstruction
After the parton shower, final state particles are clustered using the jet reconstruction algorithms of FastJet [82,83], v3.1.0. Here we use the following jet definitions: • Small-R jets.
These are jets reconstructed with the anti-k T clustering algorithm [84] with R = 0.4 radius. These small-R jets are required to have transverse momentum p T ≥ 40 GeV and pseudorapidity |η| < 2.5, within the central acceptance of ATLAS and CMS, and therefore within the region where b-tagging is possible.
These jets are also constructed with the anti-k T clustering algorithm, now using a R = 1.0 radius. Large-R jets are required to have p T ≥ 200 GeV and lie in a pseudo-rapidity region of |η| < 2.0. The more restrictive range in pseudo-rapidity as compared to the small-R jets is motivated by mimicking the experimental requirements in ATLAS and CMS related to the track-jet based calibration [85,86].
In addition to the basic p T and η acceptance requirements, large-R jets should also satisfy the BDRS mass-drop tagger (MDT) [87] conditions, where the FastJet default parameters of µ mdt = 0.67 and y mdt = 0.09 are used. Before applying the BDRS tagger, the large-R jet constituents are reclustered with the Cambridge/Aachen (C/A) algorithm [88,89] with R = 1.0.
In the case of the analysis including PU, a trimming algorithm [106] is applied to all large-R jets to mitigate the effects of PU, especially on the jet mass. For further details see Sect. 3.5.
All final-state particles are clustered using the anti-k T algorithm, but this time with a smaller radius parameter, namely R = 0.3. The resulting anti-k T R = 0.3 (AKT03) jets are then ghost-associated to each large-R jets in order to define its subjets [7].
These AKT03 subjets are required to satisfy p T > 50 GeV and |η| < 2.5, and will be the main input for b-tagging in the boosted category.
For the boosted and intermediate categories, which involve the use of large-R jets, we use jet substructure variables [90,91] to improve the significance of the discrimination between signal and background events in the MVA. In particular we consider the following substructure variables: • The k T -splitting scale [87,92]. This variable is obtained by reclustering the constituents of a jet with the k t algorithm [93], which usually clusters last the harder constituents, and then taking the k t distance measure between the two subjets at the final stage of the recombination procedure, with p T,1 and p T,2 the transverse momenta of the two subjets merged in the final step of the clustering, and ∆R 12 the corresponding angular separation.
The N -subjettiness variables τ N are defined by clustering the constituents of a jet with the exclusive k t algorithm [96] and requiring that N subjets are found, where p T,k is the p T of the constituent particle k and δR ik the distance from subjet i to constituent k. In this work we use as input to the MVA the ratio of 2-subjettiness to 1-subjettiness, namely which provides good discrimination between QCD jets and jets arising from the decay of a heavy resonance.
• The ratios of energy correlation functions (ECFs) C is defined as while D (β) 2 is instead defined as a double ratio of ECFs, that is, The energy correlation functions ECF(N, β) are defined in [97] with the motivation that (N + 1)-point correlators are sensitive to N -prong substructure. The free parameter β is set to a value of β = 2, as recommended by Refs. [97,98].

Tagging of b-jets
In this analysis we adopt a b-tagging strategy along the lines of current ATLAS performance [91,99], though differences with respect to the corresponding CMS settings [100,101] do not modify qualitatively our results. For each jet definition described above, a different b-tagging strategy is adopted: • Small-R jets.
If a small-R jet has at least one b-quark among their constituents, it will be tagged as a b-jet with probability f b . In order to be considered in the b-tagging algorithm, b-quarks inside the small-R jet should satisfy p T ≥ 15 GeV [99]. The probability of tagging a jet is not modified if more than one b-quark is found among the jet constituents.
If no b-quarks are found among the constituents of this jet, it can be still be tagged as a b-jet with a mistag rate of f l , unless a charm quark is present instead, and in this case the mistag rate is f c . Only jets that contain at least one (light or charm) constituent with p T ≥ 15 GeV can induce a fake b-tag.
We attempt to b-tag only the four (two) hardest small-R jets in the resolved (intermediate) category. Attempting to b-tag all of the small-R jets that satisfy the acceptance cuts worsens the overall performance as the probability of fake b-tags increases substantially.
Large-R jets are b-tagged by ghost-associating anti-k T R = 0.3 (AKT03) subjets to the original large-R jets [7,91,102,103]. A large-R jet is considered b-tagged if both the leading and subleading AKT03 subjets, where the ordering is done in the subjet p T , are both individually b-tagged, with the same criteria as the small-R jets. Therefore, a large-R jet where the two leading subjets have at least one b-quark will be tagged with probability As in the case of small-R jets, we only attempt to b-tag the two leading subjets, else one finds a degradation of the signal significance. The treatment of the b-jet mis-identification from light and charm jets is the same as for the small-R jets.
For the b-tagging probability f b , along with the b-mistag probability of light (f l ) and charm (f c ) jets, we use the values f b = 0.8, f l = 0.01 and f c = 0.1.

Event categorisation
The present analysis follows a strategy similar to the scale-invariant resonance tagging of Ref. [51]. Rather than restricting ourselves to a specific event topology, we aim to consistently combine the information from the three possible topologies: boosted, intermediate and resolved, with the optimal cuts for each category being determined separately. This approach is robust under variations of the underlying production model of Higgs pairs, for instance in the case of BSM dynamics, which can substantially increase the degree of boost in the final state.
The three categories are defined as follows: • Boosted category.
An event which contains at least two large-R jets, with the two leading jets being btagged. Each of these two b-tagged, large-R jets are therefore candidates to contain the decay products of a Higgs boson.
An event with exactly one b-tagged, large-R jet, which is assigned to be the leading Higgs candidate. In addition, we require at least two b-tagged, small-R jets, which must be separated with respect to the large-R jet by an angular distance of ∆R ≥ 1.2.
The subleading Higgs boson candidate is reconstructed by selecting the two b-tagged small-R jets that minimize the difference between the invariant mass of the large-R jet with that of the dijet obtained from the sum of the two small-R jets.
An event with at least four b-tagged small-R jets. The two Higgs candidates are reconstructed out of the leading four small-R jets in the event by considering all possible combinations of forming two pairs of jets and then choosing the configuration that minimizes the relative difference of dijet masses.
Once a Higgs boson candidate has been identified, its invariant mass is required to lie within a fixed window of width 80 GeV around the nominal Higgs boson mass of m h = 125 GeV. Specifically we require the condition where m h,j is the invariant mass of each of the two reconstructed Higgs candidates. This cut is substantially looser than the corresponding cut used in the typical ATLAS and CMS h → bb analyses [79,80]. The motivation for such a loose cut is that further improvements of the signal significance will be obtained using an MVA. Only events where the two Higgs candidates satisfy Eq. (8) are classified as signal events. These three categories are not exclusive: a given event can be assigned to more than one category, for example, satisfying the requirements of both the intermediate and resolved categories at the same time. The exception is the boosted and intermediate categories, which have conflicting jet selection requirements. This is achieved as follows. First of all we perform an inclusive analysis, and optimise the signal significance S/ √ B in each of the three categories separately, including the MVA. We find that the category with highest significance is the boosted one, followed by the intermediate and the resolved topologies, the latter two with similar significance. Therefore, when ascertaining in which category an event is to be exclusively placed: if the event satisfies the boosted requirements, it is assigned to this category, else we check if it suits the intermediate requirements.
If the event also fails the intermediate category requirements, we then check if it passes the resolved selection criteria. The resulting exclusive event samples are then separately processed through the MVA, allowing for a consistent combination of the significance of the three event categories.

Motivation for basic kinematic cuts
We now motivate the kinematic cuts applied to the different categories, comparing representative kinematic distributions between signal and background events. Firstly we present results without PU, and then discuss the impact of PU on the description of the kinematic distributions. In the following, all distributions are normalized to their total integral.
In Fig. 2 we show the p T distributions of the leading and subleading large-R jets in the boosted category. We observe that the background distribution falls off more rapidly as a function of p T than the di-Higgs signal. On the other hand, the cut in p T cannot be too strong to avoid a substantial degradation of signal selection efficiency, specially taking into account the subleading large-R jet. This comparison justifies the cut of p T ≥ 200 GeV for the large-R jets that we impose in the boosted category.
Another selection requirement for the boosted category is that the two leading AKT03 subjets of the large-R jet should satisfy p T ≥ 50 GeV. To motivate this cut, in Fig Table 1.  the distribution in p T of the leading and subleading AKT03 subjets in the subleading large-R jet in events corresponding to the boosted category. It is clear from the comparison that the subjet p T spectrum is relatively harder in the signal with respect to the background. On the other hand, considering the subleading AKT03 subjet, this cut in p T cannot be too harsh, to maintain a high signal selection efficiency. Therefore, as for the previous distribution, the chosen cut value is a compromise between suppressing backgrounds but keeping a large fraction of signal events is crucial.
Turning to the resolved category, an important aspect to account for in the selection cuts is the fact that the p T distribution of the four leading small-R jets of the event can be relatively soft, specially for the subleading jets. As noted in [29], this is due to the fact that the boost from the Higgs decay is moderate, therefore the p T selection cuts for the small-R jets cannot be too large. In Fig. 4 we show the distribution in p T of the four leading small-R jets in signal and background events: we observe that both distributions peak at p T ≤ 50 GeV, with the signal distribution falling off less steeply at large p T . The feasibility of triggering on four small-R jets with a relatively soft p T distribution is one of the experimental challenges for exploiting the resolved category in this final state, and hence the requirement that p T ≥ 40 GeV for the small-R jets. In Fig. 4 we also show the rapidity distribution of the the small-R jets in the resolved category. As expected, the production is mostly central, more so in the case of signal events, since backgrounds are dominated by QCD t-channel exchange, therefore the selection criteria on the jet rapidity are very efficient.
One of the most discriminating selection cuts is the requirement that the invariant mass of the Higgs candidate (di)jets must lie within a window around the nominal Higgs value, Eq. (8).
In Fig. 5 we show the invariant mass of the leading reconstructed Higgs candidates, before the Higgs mass window selection is applied, for the resolved and boosted categories. While the signal distribution naturally peaks at the nominal Higgs mass, the background distributions show no particular structure. The width of the Higgs mass peak is driven both from QCD effects, such as initial-state radiation (ISR) and out-of-cone radiation, as well as from the four-momentum smearing applied to final state particles as part of our minimal detector simulation.
The invariant mass of the di-Higgs system is another important kinematic distribution for this process. The di-Higgs invariant mass is a direct measure of the boost of the system, which in BSM scenarios can be substantially enhanced, for instance due to specific d = 6 EFT operators [14]. One important advantage of the bbbb final state for di-Higgs production is that it significantly increases the reach in m hh as compared to other channels with smaller branching ratios, such as 2b2γ or 2b2τ . In Fig. 6 we show the invariant mass distribution of the reconstructed Higgs pairs, comparing the resolved and the boosted categories.
In the resolved case, we see that the distribution in m hh is rather harder for the signal as compared to the background, and therefore one expects that cutting in m hh would help signal discrimination. For the boosted category the overall trend of the m hh distribution is different because of the selection criteria, and the distribution now peaks at higher values of the invariant mass. In this case, signal and background distributions are not significantly differentiated. Note that at parton-level the m hh distribution for signal events has a kinematic cut-off at m min hh = 250 GeV, which is smeared due to parton shower and detector resolution effects.
In Fig. 7 we show the transverse momentum of the di-Higgs system, p hh T , for the resolved and boosted categories. Once more we see that the background has a steeper fall-off in p hh T than  the signal, in both categories, therefore this variable should provide additional discrimination power, motivating its inclusion as one of the inputs for the MVA. In our LO simulation the p hh T distribution is generated by the parton shower, an improved theoretical description would require merging higher-multiplicity matrix elements [41] or matching to the NLO calculation [17], We shall now investigate the discrimination power provided by jet substructure quantities. In Fig. 8 we show the distributions of representative substructure variables for the boosted category: the k t splitting scale √ d 12 , Eq. (3), the ECF ratio C (β) 2 , Eq. (6), and the 2-to-1 subjettiness ratio τ 21 , Eq. (5), all for the leading Higgs candidates, and also τ 21 for the subleading Higgs candidates.
From Fig. 8 we observe how for these substructure variables the shapes of the signal and background distributions reflect the inherent differences in the internal structure of QCD jets and jets originating from Higgs decays. Signal and background distributions peak in rather different regions. For example, the k t splitting scale √ d 12 peaks around 80 GeV (40 GeV) for signal (background) events, while the distribution of the ECF ratio C small values for signal and is much broader for background events. From Fig. 8 we also see the distributions of the subjettiness ratio τ 21 are reasonably similar for both the leading and the subleading jets.

Impact of pileup
Now we turn to discuss how the description of kinematic distributions for signal and background processes are modified in the presence of pileup. To study the impact of PU, Minimum Bias events have been generated with Pythia8, and then superimposed to the signal and background samples described in Sect. 2. We have explored two scenarios, one with a number of PU vertices per bunch crossing of n PU = 80, and another with n PU = 150. In the following we adopt n PU = 80 as our baseline, and denote this scenario by PU80. We have verified that the combined signal significance is similar if n PU = 150 is adopted instead. In order to subtract PU in hadronic collisions, a number of techniques are available [87,102,[104][105][106][107][108][109][110][111][112][113][114]. 1 In this work, PU is subtracted with the SoftKiller (SK) method [111], as implemented in FastJet, whose performance has been shown to improve the commonly used area-based subtraction [104]. The idea underlying SoftKiller consists of eliminating particles below a given cut-off in their transverse momentum, p (cut) T , whose value is dynamically determined so that the event-wide transverse-momentum flow density ρ vanishes, where ρ is defined as and where the median is computed over all the regions i with area A i and transverse momentum p T i in which the (η, φ) plane is partitioned. From its definition in terms of the median, it follows that the value of p (cut) T will be dynamically raised until half of the regions have p T i = 0. The size and number of these regions is a free parameter of the algorithm -here we will use square regions with length a = 0.4. We restrict ourselves to the central rapidity region, |η| ≤ 2.5, for the estimation of the p T flow density ρ.  The SoftKiller subtraction is then applied to particles at the end of the parton shower, before jet clustering.
In addition, jet trimming [106], as implemented in FastJet, is applied to large-R jets. The trimming parameters are chosen such that the constituents of a given jet are reclustered into k T subjets with R sub = 0.2. Subjets with transverse momentum less than 5% of the total transverse momentum of the large-R jet are then removed. The use of trimming in addition to PU removal with SoftKiller is necessary to correct the jet mass in the boosted category, which is particularly susceptible to soft, wide-angle contaminations. No trimming is applied to the small-R jets and to the case without PU.
In Fig. 9 we show the invariant mass distributions of the Higgs candidates for signal events in the resolved and boosted categories. In the resolved category, we compare the results without PU with those with PU80, with and without SK subtraction. If PU is not subtracted, there is a large shift in the Higgs mass peak, by more than 30 GeV. Once SK subtraction is performed, we recover a distribution much closer to the no PU case, with only a small shift of a few GeV and a broadening of the mass distribution. In the boosted case, the comparison is performed between no PU, PU with only SK subtraction, and PU with both SK and trimming. We find that the mass distribution for jets to which no trimming is applied peaks at around 160 GeV, even after 90  Signal events, Boosted category boosted (right) categories. In the resolved category, we compare the results without PU with those with PU80 with and without SK subtraction. In the boosted case, the comparison is performed between no PU, PU with only SK subtraction, and PU with both SK and trimming.
PU subtraction with SoftKiller. When trimming is applied in addition to SoftKiller, the distribution peaks close to the nominal Higgs mass, as in the case of the resolved category.
In Fig. 10 we compare the transverse momentum of the leading Higgs candidate, p h t and the invariant mass of the di-Higgs system m hh , in both the boosted and resolved categories, between the no PU and the PU+SK+Trim cases. In the case of the p h T distribution, the differences between the selection criteria for the resolved and boosted categories is reflected in the rightward shift of the latter. After subtraction, the effects of PU are small in the two categories. A similar behaviour is observed in the di-Higgs invariant mass distribution.
We can also assess the impact of PU on the substructure variables that will be used as input to the MVA in the boosted and intermediate categories. In Fig. 11 we show the 2-to-1 subjettiness ratio τ 21 , Eq. (5), and the ratio of energy correlation functions C (β) 2 , Eq. (6), for the leading Higgs candidate. We observe that the shapes of both substructure variables are reasonably robust in a environment including significant PU. Therefore we can consider the PU subtraction strategy as validated for the purposes of this study, although further optimisation should still be possible, both in terms of the SoftKiller and of the trimming input settings.
It is also interesting to quantify how the relative differences between signal over background distributions are modified by the inclusion of PU. Considering the boosted category initially, in Fig. 12 we compare various kinematic distributions for signal and background events, with and without PU for the leading Higgs candidate: the transverse momentum distribution p T , the p T of the leading AKT03 subjet, the 2-to-1 subjettiness ratio τ 21 , and the k T splitting scale √ d 12 . We verify that the relevant qualitative differences between signal and background distributions are maintained in the presence of PU. This is especially noticeable for the substructure variables, which exhibit a similar discriminatory power both with and without PU.
We can also perform a similar comparison for the resolved category. In Fig. 13 we compare the kinematic distributions for signal and background events, with and without PU, for the invariant mass and the transverse momentum of the leading Higgs candidate. Again, the PU-subtracted background distributions appear reasonably close to their counterparts without PU, and thus the distinctive features between signal and background are maintained after PU subtraction.
It is illustrative to determine the mass resolution obtained for the reconstructed Higgs can-

no PU PU80+SK+Trim
Signal events, Boosted category  Table 2 we indicate the shift of the fitted invariant mass peak as compared to the nominal Higgs mass, m reco h − m h , and the corresponding width of the distribution, σ m h , obtained from fitting a Gaussian to the mass distributions of leading and subleading Higgs candidates in the resolved and boosted categories. We show results for three cases: without PU, with PU80 but without subtraction (only for the resolved category), and the same with SK+Trim subtraction.
In both categories, we find a mass resolution of around 9 GeV in the case without PU. In the case of PU with SK+Trim subtraction, in the resolved category the mass resolution worsens only slightly to around 11 GeV, while in the boosted category we find the same resolution as in the no PU case. We also note that after SK+Trim subtraction, the peak of the invariant mass distributions of Higgs candidates coincides with the nominal values of m h within a few GeV for the two categories.

no PU PU80+SK+Trim
Signal events, Boosted category

Pre-MVA loose cut-based analysis
In this section we present the results of the pre-MVA loose cut-based analysis described in the previous section, and provide cut-flows for the different analysis steps. We study how the signal significance is affected if only the 4b component of the QCD multi-jet background is taken into account. This section presents the results in an environment without pileup; the following one contains those obtained including significant PU.

Cut-flow and signal significance
Here we compare the cross-sections for signal and background events at various stages of the analysis. We consider all relevant backgrounds (see Sect. 2), and discuss how results are modified in the case where only the 4b background is considered. In Table 3 the different steps of the cut-flow in the present analysis are summarised, separated into the boosted, intermediate, and resolved topologies. The different analysis steps proceed as follows: • C1a: check that we have at least two large-R jets (in the boosted case), one large-R jet and at least 2 small-R jets (in the intermediate case) and at least four small-R jets (in the resolved case).
In addition, require that these jets satisfy the corresponding p T thresholds; p T ≥ 200 GeV for large-R jets and p T ≥ 40 GeV for small-R jets, as well as the associated rapidity acceptance constraints.
• C1b: the two leading large-R jets must be mass-drop tagged in the boosted category. In the intermediate category, the large-R jet must also be mass-drop tagged.
• C1c: after the two Higgs candidates have been reconstructed, their invariant masses are required to lie within a window around m H , in particular between 85 and 165 GeV, Eq. (8).
• C2: the b-tagging conditions are imposed (see Sect. 3.2), and the event is categorised exclusively into one of the three topologies, according to the hierarchy determined in Sect. 3.3.  Signal and background events satisfying all the analysis cuts up to the C2 level are then used as input for the MVA training, to be described next in Sect. 5.
In Table 4 we collect the values for the signal and background cross-sections at the different analysis steps. Results are divided into the resolved, intermediate and boosted categories, and are inclusive up to the C2 level, where exclusivity is imposed. In Table 4 we also provide the signal over background ratio, S/B, and the signal significance, S/ √ B, corresponding to an integrated luminosity of L = 3 ab −1 . These are computed either taking into account all the background components or the 4b QCD background only. We find that after b-tagging, the 2b2j component is of the same order of magnitude as the 4b component in all categories. This implies that the signal significance at the end of the cut-based analysis is degraded due to the contribution of light and charm jets being mis-identified as b-jets.
In the boosted category, at the end of the loose cut-based analysis, we find that around 500 events are expected at the HL-LHC, with a large number, 10 6 , of background events. This leads to a pre-MVA signal significance of S/ √ B = 0.5 and a signal over background ratio of S/B = 0.06%. From Table 4 it is also possible to compute the corresponding pre-MVA expectations for the LHC Run II with L = 300 fb −1 : one expects in the boosted category around 50 signal events, with signal significance dropping down to S/ √ B 0.16. Such signal significances could have been enhanced by applying tighter selection requirements, but our analysis cuts have been left deliberately loose so that such optimisation may be performed by the MVA.
The resolved category benefits from higher signal yields, but this enhancement is compensated for by the corresponding increase in the QCD multi-jet background. In both resolved and intermediate categories the signal significance is S/ √ B 0.4, similar to that of the boosted category. A further drawback of the resolved case is that S/B is substantially reduced as compared to the boosted and intermediate cases.
Combining the results from the boosted, intermediate and resolved categories, we obtain an overall pre-MVA significance for the observation of the Higgs pair production in the bbbb final state at the HL-LHC of (S/ √ B) tot 0.8.

The role of light and charm jet mis-identification
One of the main differences in the present study as compared to previous works is the inclusion of both irreducible and reducible background components, which allows us to quantify the impact of light and charm jet mis-identification. Two recent studies that have also studied the feasibility of SM Higgs pair production in the bbbb final state are from the UCL group [28] and from the Durham group [29]. The UCL study is based on requiring at least four b-tagged R = 0.4 anti-k T jets in central acceptance with p T ≥ 40 GeV, which are then used to construct dijets (Higgs candidates) with p T ≥ 150 GeV, 85 ≤ m dijet ≤ 140 GeV and ∆R ≤ 1.5 between the two components of the dijet. In addition to the basic selection cuts, the constraints from additional kinematic variables are included by means of a Boosted Decision Tree (BDT) discriminant. The backgrounds included are the 4b and 2b2c QCD multijets, as well as tt, Zh, tth and hbb. For the HL-LHC, a signal significance of S/ √ B 2.1 is obtained. The Durham group study [29] requires events to have two R = 1.2 C/A jets with p T ≥ 200 GeV, and in addition two b-tagged subjets inside each large-R jet with p T ≥ 40 GeV each. To improve the separation between signal and background, both the BDRS method and the Shower Deconstruction (SD) [116,117] technique are used. The backgrounds considered are QCD 4b as well as Zbb, hZ and hW . At the HL-LHC, their best result is obtained by requiring two SD-tagged large-R jets, which leads to S/ √ B 2.1. Using the BDRS tagger results in slightly poorer performance.   From our results in Table 4, we observe that the signal significance for the boosted, intermediate, and resolved categories is increased to 1.1, 0.6 and 0.6, respectively, when only the QCD 4b background is included. Combining the signal significance in the three categories, we obtain (S/ √ B 4b ) tot 1.4, twice as large as the result found when all background components are included. Note the importance of the combination of the three exclusive event topologies, as opposed the exploitation of a single specific category. Taking into account the loose selection cuts, we see that our pre-MVA results including only the 4b background are consistent with those reported in previous studies.
From Table 4 we can compare the interplay between the reducible and irreducible components of the QCD backgrounds. In all cases, the 4b and 2b2j components have comparable magnitudes within the uncertainties from missing higher-order corrections. On the other hand, the 4j component is always substantially smaller. So while the 4j component can be safely neglected, the inclusion of the 2b2j component is essential to assess the feasibility of measuring Higgs pairs in this final state robustly, especially in the boosted category. This has the important consequence that a promising avenue to improve the prospects of this measurement would be to  reduce, as much as possible, the light and charm jet mis-identification rate.
In Fig. 14 we show a comparison of the shapes of the 4b and 2b2j components of the QCD background for the transverse momentum p h T of the leading Higgs candidate and for invariant mass m hh of the reconstructed di-Higgs system in the resolved and boosted categories. The two components possess a rather similar shape for the two distributions, albeit with some differences. In the boosted category, the 4b component exhibits a less steep fall-off of the p h T distribution at large p T , while in the resolved case the 2b2j component has a slightly harder distribution of the invariant mass m hh . We also observe that the 2b2j distributions are affected by somewhat larger Monte Carlo fluctuations as compared to 4b, despite the large size of the initial sample.
In the resolved category, the cross-section before b-tagging is two orders of magnitude larger in the 2b2j sample as compared to the 4b sample. After b-tagging, a naive assessment would suggest a suppression of the 2b2j cross-section by a factor (f l /f b ) 2 1.5 · 10 −4 , as compared to the 4b component, since a total of four b-tags are required to classify the event as a Higgs candidate. In this case the ratio of 2b2j over 4b would be around 3%, and therefore negligible. While we have checked that this expectation is borne out at the parton level, we find that when parton shower effects are accounted for the situation is different, due both to radiation of bb pairs and from selection effects. Due to these, the number of b quarks in the final state is increased substantially in the 2b2j component as compared to the parton level, while at the same time the number of events in the 4b sample with 4 b-jets passing selection cuts is reduced.
We can make these statements more quantitative in the following way. To first approximation, neglecting the contribution from charm mis-identification, the overall efficiency of the

QCD 4b QCD 2b2j
Boosted category, no PU b-tagging requirements in the resolved category will be given by the following expression: 1 and zero otherwise. This leads to a ratio of overall b-tagging selection efficiencies However, after the parton shower, the above estimate is no longer accurate. First of all, we will have a non-negligible fraction n (b−jet) j with j = 3, 4 also in the 2b2j sample, due to b-quark pair  of events for the resolved selection for which out of the four leading small-R jets of the event, j jets contain at least one b-quark with p b T ≥ 15 GeV. This information is provided for the di-Higgs signal events and for the three QCD background samples. The last column indicates the overall selection efficiency as defined in Eq. (10) radiation during the shower. Secondly, not all events in the 4b sample will lead to four small-R b-jets, due to a combination of selection cuts and parton shower effects.
In Table 5 we collect the values of n (b−jet) j for the signal and the three QCD background samples. We find that rather than the estimate Eq. (11), the correct ratio of b-tagging selection efficiencies is instead This suppression factor is of the same order as the ratio of 4b to 2b2j cross-sections in the resolved category before b-tagging. This explains why the 2b2j contribution cannot be neglected as compared to the irreducible 4b component of the QCD background. A similar calculation from the numbers in Table 5 shows that, on the other hand, the 4j component of the background can be neglected.

Multivariate analysis
At the end of the loose cut-based analysis, by combining the three event topologies, we obtain a signal significance of S/ √ B 0.8 (1.4) with all backgrounds (only QCD 4b) considered. This section describes how this signal significance can be enhanced when the cut-based analysis is complemented by multivariate techniques. These are by now a mature tool in high-energy physics data analysis, opening new avenues to improve the performance of many measurements and searches at high-energy colliders. In particular, the classification of events into signal and background processes by means of MVAs is commonly used in LHC applications [28,46,80,[118][119][120].
In this section, first we present the specific MVA that we use, based on feed-forward multilayer neural networks. Then we introduce the input variables that are used in the MVA, including the jet substructure variables, and then present the signal significance obtained by applying the MVA. Then we assess the robustness of the MVA strategy in the case of significant contamination from pileup.

Deep artificial neural networks
The specific type of MVA that we use to disentangle signal and background events is a multi-layer feed-forward artificial neural network (ANN), known as a perceptron. 2 This family of ANNs are also known as deep neural networks, due to their multi-layered architecture. The MVA inputs are a set of kinematic variables describing the signal and background events which satisfy the requirements of the cut-based analysis. The output of the trained ANNs also allows for the identification, in a fully automated way, of the most relevant variables in the discrimination between signal and background.
In this work, the ANN that we use has the following architecture.
where N var represents the number of input variables for the MVA, which is different in the resolved, intermediate, and boosted categories. All neural-network layers use a sigmoid activation function, allowing for a probabilistic interpretation of the ANN output. In Fig. 15 we show an illustrative example of an ANN used in this work, corresponding to the case of the boosted category (thus N var = 21, as we explain below). The training of the ANN for the signal/background classification task proceeds as follows. Given a set of N var kinematic variables {k} i associated with the event i, and a set of neural network weight parameters {ω}, we interpret the neural network output y i (the activation state of the neuron in the last layer) as the probability that the event i originates from the signal process, where y i represents the true classification of the event i, i.e, y i = 1 for signal and y i = 0 for background events. With this interpretation, our general classification probability including background events is given by consequently we can define an error function E({ω}) to be minimized during the ANN training.
In this case, the error function is the cross-entropy function, defined as where N ev is the number of Monte Carlo events that are used for the ANN training. The ANN is trained both on the signal and background MC events, so it is important to ensure that the input MC sample is large enough to avoid contamination from MC statistical fluctuations. The training of the neural networks therefore consists of the minimization of the crossentropy error, Eq. (16), which in this work is achieved using a Genetic Algorithm (GA). Genetic Algorithms [125][126][127][128] are non-deterministic minimization strategies suitable for the solution of complex optimization problems, for instance when a very large number of quasi-equivalent minima are present. GAs are inspired on natural selection processes that emulate biological evolution. In our case, the GA training is performed for a very large number of generations, N gen = 5 · 10 4 , to avoid the risk of under-training. We have verified that if a much larger number of generations are used, the results are unchanged.
In addition, in order to avoid the possibility of over-fitting, we have used a cross-validation stopping criterion, in particular the same one as that used in the NNPDF3.0 analysis [66]. This cross-validation proceeds by dividing the input MC dataset into two disjoint sets, using one for training the ANN and the other for validation: the optimal stopping point is then given by the minimum of the error function Eq. (16) to the validation sub-sample. This indicates the point where the ANN begins to train upon statistical fluctuations in the input MC samples, rather than learning the underlying (smooth) physical distributions.

Input kinematic variables
In this work we use different sets of input variables for the three categories. In the case of large-R jets, we exploit the available information on jet substructure. For the three categories, boosted, intermediate and resolved, the following common variables are used as input to the MVA: • The transverse momenta of the leading and subleading Higgs, p T,h 1 and p T,h 2 .
• The transverse momentum of the reconstructed Higgs pair, p T,hh .
• The invariant masses of the leading and sub-leading Higgs candidates, m h,1 and m h,2 .
• The invariant mass of the reconstructed Higgs pair, m hh .
• The separation in the φ-η plane between the two Higgs candidates, ∆R hh .
• The separation in η between the two Higgs candidates, ∆η hh .
• The separation in φ between the two Higgs candidates, ∆φ hh .
In addition, in the boosted category we use the transverse momenta of the leading, p T,h 1,1 and p T,h 1,2 and sub-leading, p T,h 2,1 and p T,h 2,2 , Higgs candidate AKT03 subjets. In the resolved category instead, the corresponding variables are the transverse momenta p T,i of the four leading b-tagged small-R jets in the event. In the intermediate category, we use the transverse momenta of the subjets from the large-R jet p T,h 1,1 and p T,h 1,2 and the transverse momenta p T,i of the two leading b-tagged small-R jets. Therefore, we have 13 variables which are common to the three categories.
In the boosted and intermediate categories, we also include the jet substructure variables introduced in Sect. 3 for the large-R jets: the k t splitting scales √ d 12 , the ratio of 2-to-1 subjettiness τ 12 , and the ratios of energy correlation functions C Given that the MVA is able to identify the most discriminatory variables in an automated way, and to suppress those which have little effect, it is advantageous to include a wide array of input variables. This is one of the main advantages of ANNs in this context: their inherent redundancy means that adding additional information, even if carries very little weight, should not degrade the classification power of the MVA.

MVA results
We now present the results of the MVA, first without PU, and then later including the effects of PU. First of all, in Fig. 16 we show the distribution of the ANN output at the end of the GA minimization, separately for the boosted, intermediate and resolved categories. All distributions are normalized so that their integral adds up to one. The separation between signal and background is achieved by introducing a cut, y cut , on the ANN output, so that MC events with y i ≥ y cut are classified as signal events, and those with y i < y cut as background events. Therefore, the more differentiated the distribution of the ANN output is for signal and background events, the more efficient the MVA discrimination will be.
From Fig. 16 we see that in the boosted category the MVA can produce a clear discrimination between signal and background, with the two distributions forming peaks at their respective optimal limits. This indicates that introducing a suitable cut y cut in the ANN output will substantially reduce the background, while keeping a reasonable signal efficiency. The performance of the MVA discrimination is similar, although slightly worse, in the intermediate and resolved categories.
The results for the signal selection efficiency and the background rejection rate as a function of the cut in the ANN output y cut define the so-called Receiver-Operating Characteristic (ROC) curve, shown in Fig. 17. It is clear that we can achieve high signal efficiency by using a small value of y cut , but such a choice would be affected by poor background rejection. Conversely, using a higher value of the cut will increase background rejection at the cost of dropping signal efficiency. As could already be inferred from the distribution of neural networks output in Fig. 16, we find that our MVA is reasonably efficient in discriminating signal over background. The performance is best in the case of the boosted category, and then slightly worse in the resolved and intermediate categories, consistent with the distributions of the ANN outputs in Fig. 16.
It is useful to estimate, for each value of the cut in the ANN output y cut , how many signal and background events are expected at the HL-LHC with L = 3 ab −1 . This comparison is shown in Fig. 17. We observe that in the boosted category, for a value y cut 0.9 we end up with around 300 signal events and 10 4 background events. Similar results are obtained in the intermediate and resolved categories: in the former we find 130 (3 · 10 3 ) signal (background) events for y cut 0.85 (0.60), and in the latter 630 (10 5 ) signal (background) events for y cut 0.6. Therefore, the MVA achieves a substantial background suppression with only a moderate reduction of signal

efficiency.
A useful property of MVAs such as the one used in our analysis is that they can provide direct physical insight about which of the input variables contribute to the separation between signal and background. In the case of ANNs, this can be quantified by computing the sum of the absolute values of all the weights connected to a given input neuron i, that is with ω (2) ki the value of the weight connecting the k-th neutron of the second layer with the i-th neuron of the first (input) layer, and n (2) = 5 the number of neurons in the second layer. Those input variables with a larger value of ω (tot) i will be those that play a more significant role in enhancing the signal discrimination using the MVA. We note however that the estimate provided by Eq. (17) is necessarily qualitative. In Fig. 18 we show the distribution of the total associated weight, Eq. (17) for each of the N var input variables of the three categories, using the notation for the kinematic variables as in Sect. 5.2. In the resolved category, the variables that carry a higher discrimination power are the transverse momentum of the two reconstructed Higgs candidates and their invariant masses. In the case of the boosted category, the invariant mass distribution of the Higgs candidates is also the most discriminatory variable, followed by the subjet p T distributions and substructure variables such as C The results for the signal significance S/ √ B and the signal over background ratio S/B as a function of y cut for the three categories are given in Fig. 19. The values for y cut = 0 correspond to those at the end of the loose cut-based analysis. We observe how in the three categories there is a marked improvement in signal significance as compared to the pre-MVA results. We also observe a substantial enhancement in S/B, arising from the background suppression achieved by the MVA, reaching values of 1%, 6% and 3.5% in the resolved, intermediate and boosted categories. This improvement in S/B is crucial to ensure the feasibility of this measurement, since it allows systematic uncertainties in the background determination to be at most of a similar size.
The optimal value of the cut in the ANN output, y cut , can be determined from the maximisation of S/ √ B, ensuring that the number of signal events N ev expected at the HL-LHC does not become too low. In addition, we require that the number of MC events used to define the signal category (events with y i ≥ y cut ) is sufficiently large in order to avoid the biases and statistical fluctuations associated to a small training sample. In Table 6 we quote, for the optimal value of y cut in each category, the number of signal and background events N ev expected at the HL-LHC, as well as S/ √ B and S/B. For completeness, we also include the corresponding pre-MVA results.
From Table 6 we see that following the application of the MVA, the signal significance in the boosted category increases from 0. 5 The signal significance for L = 3 ab −1 is thus well above the threshold for the observation of Higgs pair production. However, given that the HL-LHC will be a high-PU environment, which will affect the description of the various kinematic distributions used as input to the MVA, it is essential to quantify the robustness of these results in a realistic environment including the effects of significant PU. It should be emphasized that MVAs such as the ANNs used in this work can always be understood as a combined set of correlated cuts. Once the ANNs have been trained, it is possible to compare kinematical distributions after and before the ANN cut to verify its impact. This information would allow in principle to perform a cut-based analysis, without the need of using ANNs, and finding similar results.
To illustrate this point, in Fig. 20 Table 6: Post-MVA results, for the optimal value of the ANN discriminant y cut in the three categories, compared with the corresponding pre-MVA results (y cut = 0). We quote the number of signal and background events expected for L = 3 ab −1 , the signal significance S/ √ B and the signal over background ratio S/B. The pre-MVA results correspond to row C2 in Table 4. background events. The distributions are not normalized, to better visualize the effect of the MVA cut. Unsurprisingly, the ANN cut effectively selects events which lead to similar kinematical distributions between signal and background events. In the case of the small-R jets p T distribution, the ANN cuts favors the high-p T region, while for the invariant mass distribution only the region around the Higgs mass peak is selected for background events.
A particularly challenging aspect of our analysis is the modeling of the 2b2j and 4j background, specially of the latter, which require extremely large MC samples. In the analysis reported here, out of the original 3M 4j generated events, only around 100 survive the analysis cuts, and thus these low statistics have associated a potentially large uncertainty in the calculation of the post-MVA 4j cross-section. On the other hand, since the 4j cross-sections are always quite smaller than the sum of the 4b of the 2b2j components, this low statistics should not modify qualitatively our conclusions above. To verify explicitly this expectation, and obtain a more robust estimate of the background cross-section from mis-identified jets, we have increased by a factor 10 the size of the 2b2j and 4j background samples, up to a total of 30M each. Processing these events though our analysis, including retraining the MVA, we find (S/ √ B) tot = 3.9, consistent with Eq. (18), indicating that the low statistics of the 4j background is not a limiting factor.

Impact of PU in the MVA
In this section we study how the MVA results are modified when the analysis is performed including significant PU. The loose cut-based analysis and the subsequent MVA optimization have been performed using the same settings as in the case without PU. In Table 7 we provide the pre-MVA cut flow in the case of PU80, the corresponding version without PU being Table 4. The interplay between the signal cross-sections and the various background components is qualitatively unchanged as compared to the no PU case.
In Table 8 we compare the results for the PU80+SK+Trim case between the pre-MVA loose cut-based analysis and the post-MVA results for the optimal values of the ANN output cut y cut . As in Table 6, we also quote the number of signal and total background events expected for L = 3 ab −1 and the values of S/ √ B and S/B. We observe that the pre-MVA signal significance is close to the results of the simulations without PU for the three categories.  of selected signal events in each category at the end of the cut-based analysis is only mildly affected by PU. The slight pre-MVA improvement in S/ √ B for the boosted case arises from a reduction in the number of background events that are classified in this category as compared to the case without PU.
Once the MVA is applied, the signal significance in the resolved, intermediate and boosted categories increases to 2.0, 1.9 and 1.5 respectively, to be compared with the corresponding values without PU, namely 1.9, 2.3 and 2.7. Therefore, the post-MVA effect of PU on S/ √ B is a moderate degradation of the boosted and intermediate categories, specially for the former, while the resolved category is largely unchanged. 3 We also observe that, due to the MVA, the signal over background ratio is increased from 0.007%, 0.03% and 0.1% up to 1%, 3% and 1% in the resolved, intermediate and boosted categories respectively. This indicates that while this measurement is still highly challenging, requiring a careful extraction of the QCD background from the data, it should be within reach.
In Fig. 21 we show the number of signal and background events that are expected for L = 3 ab −1 as a function of y cut , together with the corresponding ROC curve. The slight degradation of the boosted category in the case of PU can be seen by comparing with the corresponding results without PU in Fig. 17. In Fig. 22 we show the signal significance, S/ √ B, and the signal over background ratio, S/B, accounting now for the effects of PU. The corresponding results in the case without PU were shown in Fig. 19. As can be seen, the MVA-driven enhancement remains robust in the presence of PU, with S/ √ B only moderately degraded. Therefore, the qualitative conclusions drawn in the case without PU also hold when the analysis is performed in a high-PU environment. Since no specific effort has been performed to optimize PU subtraction, for instance by tuning the values of the patch length a in SoftKiller or the p T threshold during jet trimming, we believe that there should be still room for further improvement.    It is useful to quantify which of the MVA input variables carry the highest discrimination power in the case of PU, by means of Eq. (17), and compare this with the corresponding results without PU shown in Fig. 18. We have verified that the relative weight of the different input variables to the MVA is mostly unchanged in the case of PU. In the resolved category, the highest total associated weight is carried by the Higgs candidates p T and invariant mass, as well as by the p T of the individual small-R jets. For the boosted category, the highest weight is carried by the Higgs invariant mass, followed by the Higgs p T , m hh , the p T of the AKT03 subjets and the substructure variables, with a similar weighting among them.
In Table 9 we provide the post-MVA number of signal and background events expected for L = 3 ab −1 . For the backgrounds, we quote both the total number, N tot ev , and the QCD 4b component only, N 4b ev . We quote results for the no PU and PU80+SK+Trim cases. We also quote in each case the corresponding values for the signal significance and the signal over background ratio. Note that the MVA is always trained to the inclusive background sample, though differences in the kinematic distributions of the 4b and 2b2j processes are moderate, see Fig. 14. From Table 9 one observes that all categories exhibit a marked improvement from eliminating the contamination from light and charm jet mis-identification. For instance, in the intermediate category, S/ √ B increases from 2.3 to 3.3 (1.9 to 2.9) in the no PU (PU80) case, with similar improvements in the resolved and boosted categories.
In Table 9 we also provide the results for S/ √ B obtained by combining the three categories. Taking into account all background components, we obtain for the case of n PU = 80 an overall indicating that a measurement of Higgs pair production in the bbbb final state at the HL-LHC should be above the threshold for observation, even when realistic PU conditions are accounted for. A similar signal significance is obtained in the case of n PU = 150. Under the assumption that the only relevant background would be the irreducible QCD 4b component, one obtains instead S √ B 4b tot 4.7 (1.5) , L = 3000 (300) fb −1 .
Therefore, a measurement of Higgs pair production in the bbbb final state at the HL-LHC might be even above the threshold for discovery, provided the effects due to mis-identification of light and charm jets as b-jets can be reduced.

Conclusions and outlook
In this work we have presented a feasibility study for the measurement of Higgs pair production in the bbbb final state at the LHC. Our strategy is based on the combination of traditional cut-based analysis with state-of-the-art multivariate techniques. We take into account all relevant backgrounds, in particular the irreducible 4b and the reducible 2b2j and 4j QCD multijets. We have illustrated how the 2b2j component leads to a contribution comparable to that of QCD 4b production, due to a combination of parton shower effects, b-quark pair radiation, and selection requirements. We have also demonstrated the robustness of our analysis strategy under the addition of significant PU. In particular, we have explored two scenarios, n PU = 80 and n PU = 150, and found a comparable overall signal significance in the two cases. Combining the contributions from the resolved, intermediate and boosted categories, we find that, for L = 3 ab −1 , the signal significance for the production of Higgs pairs turns out to be S/ √ B 3. This indicates that, already from the bbbb final state alone, it should be possible to claim observation of Higgs pair production at the HL-LHC. Our study also suggests possible avenues that the LHC experiments could explore to further improve this signal significance. One handle would be to reduce the contribution from light and charm jet mis-identification, ensuring that the irreducible 4b background dominates over the 2b2j component. This would allow to enhance S/ √ B almost to the discovery level, see Table 9. It would also be advantageous to improve the b-tagging efficiency, allowing to achieve higher signal yields. Another possibility would be to improve the mass resolution of the Higgs reconstruction in high-PU environments, and, more general, to optimize the PU subtraction strategy in order to reduce the impact of PU in the modelling of kinematic variables and the associated degradation in the MVA discrimination.
Another challenging aspect of the measurement of Higgs pairs in the bbbb final state is achieving an efficient triggering strategy. In order to reduce the rate from background QCD processes sufficiently, while being able to access the relevant p T regimes, (multi-)jet triggers using b-quark tagging information online for one or more jets are likely to be necessary. The additional rejection provided by these triggers could enable events to be selected efficiently, with four jets down to p T = 40 GeV in the resolved category, and boosted Higgs decays in large-R jets down to jet transverse momenta of p T = 200 GeV. In addition, good control of the multijet backgrounds and the experimental systematics of the MVA inputs will be important to achieve these sensitivities.
Our strategy relies on the modeling of the kinematic distributions of signal and background events, since these provide the inputs to the MVA discriminant. In this respect, it would be important, having established the key relevance of the bbbb channel for the study of Higgs pair production, to revisit and improve the theoretical modeling of our signal and background simulation, in particular using NLO calculations matched to parton showers both for signal [17,41] and for backgrounds [63,76].
One important implication of this work is that it should be possible to significantly improve the accuracy on the extraction of the Higgs trilinear coupling λ from a measurement of the σ hh → bbbb cross-section, as compared to existing estimates. A determination of λ in our approach is however rather non-trivial, involving not only regenerating signal samples for a wide range of values of λ, but also repeating the analysis optimisation, including the MVA training, for each of these values. This study is left to a future publication, where we will also compare the precision from the bbbb final state with the corresponding precision that has been reported from other final states such as bbγγ and bbτ τ . It will also be interesting to perform this exercise for a 100 TeV hadron collider [11][12][13][14]. While at 100 TeV the signal yields would be increased, also the (gluon-driven) QCD multijet background would grow strongly. Revisiting the present analysis, including the MVA optimization, at 100 TeV would also allow us to assess the accuracy of an extraction of the trilinear coupling λ from the bbbb final state at 100 TeV.
In this work we have considered only the SM production mechanism, but many BSM scenarios predict deviations in Higgs pair production, both at the level of total rates and of differential distributions. In the absence of new explicit degrees of freedom, deviations from the SM can be parametrized in the EFT framework using higher-order operators [14,48]. Therefore, we plan to study the constraints on the coefficients of these effective operators that can be obtained from measurements of various kinematic distributions in the hh → bbbb process. Note that the higher rates of the bbbb final state as compared to other final states, such as bbγγ, allow for better constraints upon operators that modify the high-energy behavior of the theory, for instance, it would become possible to access the tail of the m hh distribution.
As in the case of the extraction of the Higgs trilinear coupling λ, such a study would be a computationally intensive task, since BSM dynamics will modify the shapes of the kinematic distributions and thus in principle each point in the EFT parameter space would require a reoptimization with a newly trained MVA. In order to explore efficiently the BSM parameters without having to repeat the full analysis for each point, modern statistical techniques such as the Cluster Analysis method proposed in Ref. [46] might be helpful.

LO
NLO K-factor Zh (13 TeV) 6.5 · 10 −1 pb 7.7 · 10 −1 pb 1.19 tth (13 TeV) 3.8 · 10 −1 pb 4.6 · 10 −1 pb 1.29 bbh (13 TeV) 4.9 · 10 −1 pb 6.1 · 10 −1 pb 1.22   In Table 12 we show the signal and background cross-sections at the end of the cut-based analysis, before the MVA is applied, in the case without PU. We separate the results into the three exclusive categories used in our analysis. From this comparison, we see that as expected, at the end of the cut-based analysis, the single-Higgs backgrounds are smaller than the QCD multijet background by several orders of magnitude. In addition, we find that already at the end of the cut-based analysis the di-Higgs signal is also larger than all the single-Higgs backgrounds in all the selection categories. Since this discrimination can only be improved by the MVA, we conclude that neglecting single-Higgs backgrounds is a reasonable approximation. From Table 12 we also observe that in the resolved and intermediate categories Zh → bbbb is the dominant single-Higgs background, while tth(→ bb) is instead the most important one in the boosted category.