Search for pair production of Higgs bosons in the $b\bar{b}b\bar{b}$ final state using proton-proton collisions at $\sqrt{s} = 13$ TeV with the ATLAS detector

A search for Higgs boson pair production in the $b\bar{b}b\bar{b}$ final state is carried out with up to 36.1 $\mathrm{fb}^{-1}$ of LHC proton-proton collision data collected at $\sqrt{s}$ = 13 TeV with the ATLAS detector in 2015 and 2016. Three benchmark signals are studied: a spin-2 graviton decaying into a Higgs boson pair, a scalar resonance decaying into a Higgs boson pair, and Standard Model non-resonant Higgs boson pair production. Two analyses are carried out, each implementing a particular technique for the event reconstruction that targets Higgs bosons reconstructed as pairs of jets or single boosted jets. The resonance mass range covered is $260-3000$ GeV. The analyses are statistically combined and upper limits on the production cross section of Higgs boson pairs times branching ratio to $b\bar{b}b\bar{b}$ are set in each model. No significant excess is observed; the largest deviation of data over prediction is found at a mass of 280 GeV, corresponding to 2.3 standard deviations globally. The observed 95% confidence level upper limit on the non-resonant production is 13 times the Standard Model prediction.


ATLAS detector
The ATLAS experiment [20] at the LHC is a multipurpose particle detector with a forward-backward symmetric cylindrical geometry and a near 4π coverage in solid angle.1 It consists of an inner tracking detector (ID) surrounded by a thin superconducting solenoid providing a 2 T axial magnetic field, electromagnetic (EM) and hadronic calorimeters, and a muon spectrometer (MS). The ID covers the pseudorapidity range |η| < 2.5. It consists of silicon pixel, silicon microstrip, and straw-tube transition-radiation tracking detectors. An additional pixel detector layer [21], inserted at a mean radius of 3.3 cm, is used in the Run-2 data-taking and improves the identification of b-jets [22]. Lead/liquid-argon (LAr) sampling calorimeters provide EM energy measurements. A steel/scintillator-tile hadronic calorimeter covers the central pseudorapidity range (|η| < 1.7). The endcap and forward regions are instrumented with LAr calorimeters for both the EM and hadronic energy measurements up to |η| = 4.9. The muon spectrometer surrounds the calorimeters and includes three large superconducting air-core toroids. The field integral of the toroids ranges between 2.0 and 6.0 T m for most of the detector. The MS includes a system of precision tracking chambers and triggering chambers. A dedicated trigger system is used to select events [23]. The first-level trigger is implemented in hardware and uses the calorimeter and muon detectors to reduce the accepted event rate to 100 kHz. This is followed by a software-based high-level trigger that reduces the accepted event rate to 1 kHz on average.

Data
This analysis is performed on two LHC pp collision datasets at √ s = 13 TeV. Data were collected during stable beam conditions and when all relevant detector systems were functional. The integrated luminosity of the dataset collected during 2015 was measured to be 3.2 fb −1 . The second dataset was collected during 2016 and corresponds to an integrated luminosity of 24.3 fb −1 for the resolved analysis and 32.9 fb −1 for the boosted analysis.
The difference in integrated luminosity between the two analyses results from the choices of triggers. In the resolved analysis, a combination of b-jet triggers is used. Events were required to feature either one b-tagged jet [24,25] with transverse momentum p T > 225 GeV, or two b-tagged jets, either both satisfying p T > 35 GeV or both satisfying p T > 55 GeV, with different requirements on the b-tagging. Some triggers required additional non-b-tagged jets. Due to a change in the online b-tagging algorithm between 2015 and 2016, the two datasets are treated independently until they are combined in the final statistical analysis. After the selection described in Section 5, this combination of triggers is estimated to be 65% efficient for simulated signals with a Higgs boson pair invariant mass, m H H , of 280 GeV, rising to 100% efficiency for resonance masses greater than 600 GeV. During 2016 data-taking, a fraction of the data was affected by an inefficiency in the online vertex reconstruction, which reduced the efficiency of the algorithms used to identify b-jets; those events were not retained for further analysis. This reduces the 1 ATLAS uses a right-handed coordinate system with its origin at the nominal interaction point (IP) in the centre of the detector and the z-axis along the beam pipe. The x-axis points from the IP to the centre of the LHC ring, and the y-axis points upwards. Cylindrical coordinates (r, φ) are used in the transverse plane, φ being the azimuthal angle around the z-axis.
integrated luminosity of the 2016 dataset for the resolved analysis to 24.3 fb −1 . In the boosted analysis, events were selected from the 2015 dataset using a trigger that required a single anti-k t jet [26] with radius parameter R = 1.0 and with p T > 360 GeV. In 2016, a similar trigger was used but with a higher threshold of p T > 420 GeV. The efficiency of these triggers is 100% for simulated signals passing the jet requirements described in Section 6, so the 2015 and 2016 datasets were combined into one dataset.

Signal models and simulation
Simulated Monte Carlo (MC) event samples are used in this analysis to model signal production and the background from tt. The dominant multi-jet background is modelled using data-driven techniques, as described in Sections 5.2 and 6.2.
Signal G KK → HH → bbbb events were generated at leading order (LO) with MG5_aMC@NLO  For the evaluation of theoretical uncertainties in the signal modeling, samples were produced with variations of the factorization and renormalization scales, PDF sets (following the prescription from Ref. [35]) and shower generator. For the latter, scalar (spin-2) samples were produced that are interfaced to P 8.186 rather than H ++ (and vice versa).
The decay widths of these three resonance models differ. The scalar signals were generated with a width of 1 GeV, allowing a study of generic narrow-width scalar signals. The widths of the graviton signals depend on the resonance mass and the value of k/M Pl . Relative to the resonance mass, they range from 3% (6%) at low mass to 13% (25%) at the highest mass for k/M Pl = 1 (k/M Pl = 2). The graviton samples were normalized using cross sections from Ref. [36].
Resonant signal samples for the scalar and k/M Pl = 1 models were produced in 10 GeV steps between 260 and 300 GeV, in 100 GeV steps up to 1600 GeV, in 200 GeV steps up to 2000 GeV, and in 250 GeV steps up to 3000 GeV. Signal samples for the k/M Pl = 2 model were produced with the same spacings but omitting the masses of 270 GeV, 290 GeV and 2750 GeV due to the larger generated width.  41], which calculated the process at NLO in QCD while fully accounting for the top-quark mass. The cross section times branching ratio to the bbbb final state, evaluated at nextto-next-to-leading order (NNLO) with the summation of logarithms at next-to-next-leading-logarithm (NNLL) accuracy and including top-quark mass effects at NLO is 11.3 +0.9 −1.0 fb [40]. The uncertainty includes the effects due to renormalization and factorization scales, PDF set, α S , and the H → bb branching ratio. In all signal samples, the mass of the Higgs boson (m H ) was set to 125 GeV.
Interference effects between HH resonant production and SM non-resonant HH production are not included in the simulated samples.
The generation of tt events was performed with P -v1 [42] using the CT10 PDF set. The parton shower, hadronization, and the underlying event were simulated using P 6.428 [43] with the CTEQ6L1 PDF set and the corresponding Perugia 2012 set of tuned underlying-event parameters [44]. The top-quark mass was set to 172.5 GeV. Higher-order corrections to the tt cross section were computed with Top++ 2.0 [45]. These incorporate NNLO corrections in QCD, including resummation of NNLL soft gluon terms.  [48,49] and the events were processed with the same reconstruction software as that used for the data.

Object reconstruction
Jets are built from topological clusters of energy deposits in calorimeter cells [50], using a four-momentum reconstruction scheme with massless clusters as input. The directions of jets are corrected to point back to the identified hard-scatter, proton-proton collision vertex, which is the vertex with the highest Σp 2 T of constituent tracks.
Jets are reconstructed using the anti-k t algorithm with different values of the radius parameter R. The jets with R = 0.4 ("small-R jets"), used in the resolved analysis, are reconstructed from clusters calibrated at the electromagnetic (EM) scale. The jets are corrected for additional energy deposited from pile-up interactions using an area-based correction [51]. They are then calibrated using p T -and η-dependent calibration factors derived from simulation, before global sequential calibration [52] is applied, which reduces differences in calorimeter responses to gluon-or quark-initiated jets. The final calibration is based on in situ measurements in collision data [53]. Jets with p T < 60 GeV |η| < 2.4, and with a large fraction of their energy arising from pile-up interactions are suppressed using tracking information, which was combined in a multivariate classification algorithm (jet vertex tagger) [54]. Events that pass a "medium" jet vertex tagger working point, corresponding to a 92% efficiency for jets at the EM scale with 20 < p T < 60 GeV, are retained in the analysis. Quality criteria are applied to the jets, and events with jets consistent with noise in the calorimeter or non-collision backgrounds are vetoed [55].
The jets with R = 1.0 ("large-R jets") used in the boosted analysis are built from locally calibrated [52] topological clusters. They are trimmed [56] to minimize the impact of energy deposits from pile-up interactions. Trimming proceeds by reclustering the jet with the k t algorithm [57] into R = 0.2 subjets and then removing those subjets with p subjet T /p jet T < 0.05, where p subjet T is the transverse momentum of the subjet and p jet T that of the original jet. The energy and mass scales of the trimmed jets are then calibrated using p T -and η-dependent calibration factors derived from simulation [58]. The mass of the large-R jets is computed using tracking and calorimeter information, also called the combined mass technique [59], which leads to a smaller mass resolution and better estimate of the median mass value than obtained using only calorimeter energy clusters.
Jets containing b-hadrons are identified using a score value computed from a multivariate b-tagging algorithm MV2c10 [24, 25], which makes use of observables provided by an impact parameter algorithm, an inclusive secondary vertex finding algorithm and a multi-vertex finding algorithm. The MV2c10 algorithm is applied to a set of charged-particle tracks that satisfy quality and impact parameter criteria and are matched to each jet. For large-R jets, b-tagging is performed on anti-k t R = 0.2 track-jets [60] matched to the large-R jets using ghost association [61]. These track-jets are required to be consistent with the primary vertex of the event as well as to satisfy p T > 10 GeV and |η| < 2.5. The small radius parameter of the track-jets enables two nearby b-hadrons to be identified when their ∆R separation is less than 0.4, which is beneficial when reconstructing high-p T Higgs boson candidates. The b-tagging requirements of both the resolved and the boosted analyses use working points that lead to an efficiency of 70% for b-jets with p T > 20 GeV when evaluated in a sample of simulated tt events. This working point corresponds to a rejection rate of jets originating from u-, dor s-quarks or gluons of 380 for the jets with R = 0.4 and 120 for the track-jets. The rejection of jets from c-quarks is 12 (7.1) for the R = 0.4 jets (track-jets).
Muons are reconstructed by combining tracks in the ID with tracks in the MS, and are required to have p T > 4 GeV, |η| < 2.5 and to satisfy "medium" muon identification criteria [62]. If a muon is within ∆R = 0.4 (0.2) of a jet used for b-tagging in the resolved (boosted) analysis, their four-momentum is added to the calorimeter-based jet's four-momentum to partially account for the energy lost in semileptonic b-hadron decays.

Resolved analysis
The resolved analysis is optimized to discover signals that result in either non-resonant or low-mass resonant Higgs boson pair production. The strategy is to select two Higgs boson candidates, each composed of two b-tagged anti-k t small-R jets, with invariant masses near m h .
The invariant mass of the two-Higgs-boson-candidate system (m 4j ) is used as the final discriminant between Higgs boson pair production and the backgrounds (which are principally multijet, with some tt). Resonant signals would lead to a localized excess, while non-resonant production would result in an excess in the tail of the m 4j spectrum.

Selection
Events are required to have a primary vertex with at least two tracks matched to it. The selection proceeds with the requirement that the event contains at least four b-tagged anti-k t small-R jets with p T > 40 GeV and |η| < 2.5 ("four-tag" sample). The four jets with the highest b-tagging score are paired to construct two Higgs boson candidates. Initially, all possible pairings are considered. The angle between the decay products of the Higgs boson in the laboratory frame depends on the value of m 4j and thus on the Lorentz boost of the Higgs boson. Accordingly, pairings of jets into Higgs boson candidates are only accepted if they satisfy the following requirements, where m 4j is expressed in GeV: In these expressions, ∆R j j,lead is the angular distance between the jets in the leading Higgs boson candidate and ∆R j j,subl for the sub-leading candidate. The leading Higgs boson candidate is defined as the candidate with the highest scalar sum of jet p T . These requirements on ∆R j j efficiently reject jet-pairings in which one of the b-tagged jets is not consistent with one originating from a Higgs boson decay. The specific numbers in this and the following selection requirements were chosen to maximize the sensitivity to the signal.
Events that have more than two Higgs boson candidates satisfying these requirements necessitate an algorithm in order to choose the correct pairs. In the absence of energy loss through semileptonic Bdecays, the optimal choice would be the combination most consistent with the decays of two particles of equal mass. In signal simulation the pairing of jets (when four b-jets have been identified) is correct in at least 90% of the events, depending on m 4j .
The resolved analysis searches for resonances with a wide range of masses, 260 GeV< m H H < 1400 GeV, as well as non-resonant signals. Event selection criteria that vary as a function of m 4j are used to reject background and hence enhance the analysis sensitivity across this range. Mass-dependent requirements are imposed on the p T of the leading Higgs boson candidate, p lead T , and the sub-leading Higgs boson candidate, p subl T : where m 4j is again expressed in GeV.
A further (m 4j -independent) requirement is placed on the pseudorapidity difference between the two Higgs boson candidates, |∆η H H | < 1.5, which rejects multijet events.
A requirement on the Higgs boson candidates' masses is used to define the signal region: where the 0.1m 2j terms represent the widths of the leading and sub-leading Higgs boson candidates' mass distributions, derived from simulation. The signal region is shown as the inner region of Figure 1.
To reduce the tt background, hadronically decaying top-quark candidates are built from any three jets in the event, of which one must be a constituent of a Higgs boson candidate. These three jets are ordered by their b-tagging score. The highest one is considered as the b-jet originating from the top-quark candidate decay; the other two jets are considered as forming a hadronically decaying W boson candidate. A measure of the consistency of this combination with the top-quark hypothesis is then evaluated using the X W t variable: where m W is the invariant mass of the two-jet W boson candidate and m t that of the three-jet top candidate.
All possible combinations of three jets are considered and the top-quark candidate with the smallest X W t is chosen for each event. Events with the smallest X W t < 1.5 are vetoed in the final selection. This requirement reduces the tt contamination where both top quarks decay without leptons (hadronic) by 60%, and the tt events that contain leptons (semileptonic) by 45%.
The final analysis discriminant is m 4j . A correction is made using the known Higgs boson mass, where each Higgs boson candidate's four-momentum is multiplied by a correction factor m H /m 2j . This leads to an improvement of approximately 30% in signal m 4j resolution with a significant reduction of low-mass tails caused by energy loss and with little impact on the background m 4j shape.
The fraction of signal events accepted by the detector multiplied with the efficiency of each selection step is shown in Figure 2 for the narrow-width scalar, graviton, and SM non-resonant signal models. The acceptance times efficiency is higher for the graviton samples because spin-2 resonances decay more centrally, resulting in higher-p T jets. The acceptance times efficiency is limited at low mass by the p T requirement on the jets, and at high mass the chance to resolve four distinct jets becomes lower, and the b-tagging efficiency decrases.

Background estimation
After the full event selection is applied, about 95% of the background consists of multijet events, which are modelled using data. The remaining 5% are tt events. The tt background normalization is determined from data, while the m 4j spectrum is taken from simulation. A data-driven estimate of Z + jets events yields a contribution of 0.2% to the total background, which is neglected in the following. Background from other sources, including processes involving single Higgs boson production, is also found to be negligible.

Multijet background
The multijet background is modelled with an independent data sample selected using the same trigger and selection requirements as used in the signal region, except for the b-tagging requirement: at least four jets with p T > 40 GeV are required, and exactly two of them have to be b-tagged. This "two-tag" selection yields a data sample that consists of 98% multijet events and 2% tt events. The predicted signal contamination is negligible.
To model the multijet background in the four-tag sample with events selected in the two-tag sample, a product of two weights is applied to the two-tag events, where each weight corrects for different effects: additional jet activity and kinematic differences from applying b-tagging requirements.
The weights are derived in a signal-depleted sideband region of the m lead 2j -m subl 2j plane and validated using an orthogonal control region. The control region is defined as the region with The event weight that corrects for additional jet activity is obtained as follows. Higgs boson candidates in the two-tag sample are formed from the two b-tagged jets and another two jets that fail to meet the b-tagging criteria but pass the kinematic requirements. A constant, per-jet transfer factor, f , is applied to each of the anti-b-tagged jets to scale the event yield in the two-tag sample to that expected in the four-tag sample. This transfer factor, given in Table 1, is determined by comparing the jet multiplicity distributions of the two-tag and four-tag selections in the sideband region. In events with more than two anti-b-tagged jets, one of the possible combinations of jets is randomly selected. Higgs boson candidates are then formed from the two selected b-tagged jets and the selected anti-b-tagged jets using the same procedure as in the four-tag sample.
The events in the two-tag data sample are then weighted further to correct for the kinematic differences caused by the additional b-tagging requirement of the four-tag sample. These differences can arise for the following reasons: the b-tagging efficiency varies as a function of jet p T and η; the various multijet processes contribute with different fractions in each sample; and the fractions of events accepted by each trigger path changes.
The weights are determined by fitting cubic splines to the ratio of the total background model to data after subtracting the tt contribution in both samples (12% in the two-tag sideband and 7% in the four-tag sideband). This is done for the five distributions that show the largest differences between the four-tag and two-tag samples. These are: the average |η| of the four jets constituting the Higgs boson candidate; the p T of the second and fourth leading (in p T ) constituent jets; the smallest ∆R between any two constituent jets; and the ∆R between the other two constituent jets. The reweighting is performed using one-dimensional distributions, and is iterated until the weights converged to stable values.
Agreement of the background model and data after these reweightings is checked in the control and sideband regions, and is found to be notably improved. This improvement is verified in the variables used to derive the weights and additionally in many other kinematic distributions.

Background normalization and the tt background
The m 4j distribution of the tt background is modelled using simulation. To improve the statistical precision, the two-tag simulated distribution is used for the hadronic tt sample, after correction by the same kinematic weights used for the multijet model. This procedure is validated in an inclusive region that contains events from signal region, sideband and control region; and good agreement is observed between the corrected two-tag sample and the four-tag sample in several distributions. The semileptonic tt background is modelled directly, using the four-tag MC sample of the tt background.
The normalizations of the tt and multijet backgrounds are determined simultaneously by fitting the yields of semileptonic tt, hadronic tt, and multijet events in three background-enriched samples. These background-enriched samples are all defined as having Higgs boson candidate's masses in the sideband region, but consistent results are found using the control region data. Specific tt-enriched samples are defined with requirements additional to the sideband selection. The semileptonic enriched sample must contain an isolated muon [62] with p T > 25 GeV and from Eq. (2), X W t < 1.5. The sample enriched in hadronic tt requires X W t < 0.75, while the sample defined by X W t > 0.75 is enriched in multijet events.
There are three parameters in the normalization fit: µ multijet , which scales the multijet yield from the two-tag to the four-tag sideband region after the per-jet transfer factor f and the kinematic weights have been applied; and two parameters α hadronic tt and α semileptonic tt that correct the normalizations of the yields for the hadronic and semileptonic tt MC samples in the four-tag sideband region.
The result of the normalization fit is presented in Table 1. The uncertainties are those from limited data and MC sample sizes, and they are propagated to the final fit, as described in Section 7. After the reweighting corrections and the application of the normalization, there is good agreement between the background model and the data distributions in the sideband as well as the control regions. The distributions of m 4j in the control region for both datasets are displayed in Figure 3.

Systematic uncertainties
Background uncertainties are propagated from the fit which determines the multijet and tt yields. The statistical uncertainties in the scale factors (Table 1) are propagated including the correlations, by calculating three orthogonal eigenvariations from the covariance matrix of the normalization fit, resulting  in three nuisance parameters, such that each parameter acts on the three background normalizations simultaneously.
Shape uncertainties in the multijet background are assessed by deriving an alternative background model using the same procedure as in the nominal case, but using data from the control region rather than from the sideband. This alternative model and the baseline are consistent with the observed data in their regions and with each other. The differences between the baseline and the alternative are used as a backgroundmodel shape uncertainty, with a two-sided uncertainty defined by symmetrizing the difference about the baseline. The uncertainty is split into two components to allow two independent variations: a low-H T and a high-H T component, where H T is the scalar sum of the p T of the four jets constituting the Higgs boson candidates. The boundary value is 300 GeV. The low-H T shape uncertainty primarily affects the m 4j spectrum below 400 GeV (close to the kinematic threshold) by up to 5%, and the high-H T uncertainty mainly m 4j above this by up to 30% relative to nominal.
Shape uncertainties affecting the tt background component are dominated by those associated with the use of two-tag simulation to model the m 4j distribution of hadronic tt. These uncertainties are assessed using the same procedure as for the multijet background and are again split into low-H T and high-H T components. The impact of detector and theoretical modelling uncertainties on the tt background shape were assessed but are found to be negligible, because the data-driven reweighting procedure used for the multijet modelling compensates for biases in the tt sample by adjusting the multijet model.
Theoretical uncertainties in the signal acceptance result from variations of renormalization and factorization scales, PDF set uncertainties, and uncertainties in modelling of the underlying event and hadronic showers (therefore varying initial-and final-state radiation). The scales are varied by factors of 2 and 0.5. The PDF uncertainties are evaluated using PDF4LHC15 sets [35]. The parton shower and underlying event are varied by replacing H ++ with P for the scalar samples, and vice versa for the spin-2 samples. The total theoretical uncertainty is dominated by the shower variations. The size of the variation of the expected signal yield is typically below 10% but can increase to 23%, depending on the signal hypothesis.
The following detector modelling uncertainties are evaluated: uncertainties in the jet energy scale (JES) and resolution (JER), uncertainties in the b-tagging efficiency, and uncertainties in the trigger efficiency. The jet energy uncertainties are derived using in situ measurement techniques described in Refs. Trigger efficiency uncertainties are evaluated for the signal, based on the systematic uncertainties arising from the per-jet online b-tagging measurements. There is an additional, small, non-closure uncertainty associated with the calculation of per-event trigger efficiencies using the measured per-jet efficiencies. The total trigger efficiency uncertainty is ±2% for the non-resonant signal and for resonant signals with masses below 1100 GeV, growing to ±5% for a resonance of mass 1400 GeV.
Uncertainties in the signal are fully correlated between the 2015 and 2016 datasets, except those for the luminosity and trigger efficiency. Systematic uncertainties in the normalization and shape of the multijet background model are treated as uncorrelated between the two datasets. Table 2 summarizes the relative impact of the uncertainties on the event yields.

Signal region event yields
The number of events observed in the data, the predicted number of background events in the signal region, and the predicted yield for three potential signals are presented in Table 3 for both the 2015 and 2016 datasets. The numbers of observed and predicted events in the control region are also given, and they are in agreement. A discrepancy between data and the total prediction is seen in the 2016 dataset; about half of this excess can be attributed to one bin at m 4j = 280 GeV. Figure 4 shows comparisons of the predicted m 4j background distributions to those observed in the 2015 and 2016 datasets. A few signal models are also displayed. The scalar sample shown is normalized to a cross section times H → bb branching ratio of 2.7 pb, which is the best-fit value (the fit is described in Section 7). The predicted background and observed distributions are mostly in agreement, with an excess above the predicted background in one bin around 280 GeV, which is discussed in Section 7.

Boosted analysis
The boosted analysis is optimized to discover signals arising from production of high-mass resonances decaying into Higgs boson pairs. The strategy is to select two Higgs boson candidates with mass near m H , each composed of a single large-R jet with at least one b-tagged track-jet matched to it. Three samples are defined according to the total number of b-tagged track-jets associated with the Higgs boson candidates. Since the triggers are fully efficient for the signal processes in both the 2015 and the 2016 datasets, the two datasets are combined into one.
The invariant mass of the two-Higgs-boson-candidate system, m 2J , is used as the final discriminant between Higgs boson pair production and the SM backgrounds. Events that pass the resolved signal region selection are vetoed in the boosted analysis, thus priority is given to the resolved analysis if an event passes both selections, which increases the sensitivity.

Selection
Events are required to have a primary vertex with at least two tracks matched to it. Events are selected that have at least two anti-k t large-R jets with p T > 250 GeV, |η| < 2.0, and mass m J > 50 GeV. Only the two jets with highest p T are retained for further selection. The leading jet is required to have p T > 450 GeV, which ensures 100% trigger efficiency. Since high-mass resonances tend to produce jets that are more central than those from multijet background processes, the two large-R jets are required to have a separation |∆η| < 1.7. To be considered as a Higgs boson candidate, each large-R jet must contain at least one b-tagged R = 0.2 track-jet matched to it by ghost association.
Three separate samples of events are selected. The "two-tag" sample requires each Higgs boson candidate to have exactly one associated b-tagged track-jet. The "three-tag" and "four-tag" samples require that there are exactly three or exactly four b-tagged track-jets associated with Higgs boson candidates in the event, with two b-tagged track-jets associated with one candidate and one or two associated with the other candidate. Increasing the required number of associated b-tagged track-jets in the event increases signal purity at the expense of lower signal efficiency. This loss of efficiency is particularly pronounced for the highest resonance masses. It is rare to identify four distinct track-jets containing b-hadrons in these high-mass events due to the inefficiency in b-tagging high-p T track-jets, and also because the extremely high Lorentz boosts make it difficult to resolve each b-quark as a separate track-jet.
Signal event candidates are selected if each of the large-R jets has a mass consistent with that of the Higgs boson. This is defined analogously to the resolved analysis in Eq. The fraction of signal events accepted by the detector multiplied with the efficiency of each selection step for the G KK model with k/M Pl = 1 is shown in Figure 5(a), and the efficiency for signal events to populate each of the three samples is displayed in Figure 5 A correction is made, multiplying the four-momentum of each large-R jet with a factor m H /m J . This slightly improves the resolution of m 2J for signal events by reducing the low-mass tails caused by energy loss. There is little impact on the background distribution.

Background estimation
The main backgrounds after selection are multijet events, which comprise 80-95% of the total background, depending on the number of b-tagged track-jets required, with the remainder being tt events. The contribution of Z + jets events to the signal region in each sample is estimated using simulation to be less than 0.1%, and is therefore neglected. Other sources of background, including processes involving single Higgs boson production, are also negligible. The multijet events are modelled using data. The tt yield is estimated using a data-driven technique, while the tt m 2J shape is taken from simulation.
The shape of the multijet background is modelled using independent data samples selected with the same trigger and selection requirements as described above, but with fewer b-tagged track-jets. To model the two-tag sample, a "1b-1" sample is selected comprising events where one of the large-R jets contains a single b-tagged track-jet, while the other large-R jet contains no b-tagged track-jets, but at least one track-jet which fails b-tagging. Analogous "2b-1" and "2b-2" samples are selected to model the threeand four-tag samples, respectively. These comprise events where one large-R jet contains two b-tagged track-jets and the other large-R jet contains no b-tagged track-jets but exactly one (2b-1) or at least two (2b-2) track-jets that fail b-tagging. These selections are referred to below as lower-tagged samples.
The normalizations of both the multijet and the tt backgrounds are derived using a signal-free sideband region. The definition of the sideband region is optimized such that it contains the bulk of the tt events, yet is close enough to the signal region to accurately model the background kinematics there. A control region between the signal and sideband regions is used to validate the background models and to assign systematic uncertainties. The regions are defined using X H H and two other variables: The signal region contains events with X H H < 1.6, events in the control region fulfil R H H < 33 GeV and X H H > 1.6, and events in the sideband satisfy R H H > 33 GeVand R Similarly to the resolved analysis, corrections are made for differences between the lower-tagged and n-tag samples by reweighting events in the lower-tagged sample. Differences between those samples are expected since requiring b-tags generally affects the jet kinematics. In the 1b-1 sample the anti-b-tagged large-R jet's kinematics are reweighted to mimic the kinematics of a tagged large-R jet (i.e. a Higgs boson candidate). Similarly, in the 2b-1 (2b-2) sample the anti-b-tagged large-R jet's kinematics are reweighted to the kinematics of a Higgs boson candidate with one (two) b-tags. The weights are derived from data from lower-tagged samples which are inclusive in the regions, ie. events from signal, control or sideband region. Each lower-tagged sample is split into two subsamples, depending on whether the leading or sub-leading large-R jet is b-tagged. The weights are obtained from spline interpolations to the ratios of the two subsamples for the three distributions that are most affected by b-tagging: the p T of the leading large-R jet, and the p T of the leading track-jets associated with the leading and sub-leading large-R jets. The reweighting is iterated until the weights converge to stable values.
The background yield in each of the three signal samples, N n-tag background (where n-tag represents two-, threeand four-tag), is evaluated using the following expression:  Table 4. The impact of limited statistical precision for m 2J > 1.2 TeV in the multijet and tt models is ameliorated by fitting the background distributions with the following function: where p i are free parameters and √ s is the centre-of-mass energy. This functional form is chosen after fitting various functions to the lower-tagged data. The m 2J distributions of the multijet and tt backgrounds are each fitted in the range 1.2 TeV < m 2J < 3.0 TeV. From these fits, smooth background histograms are generated and passed to the statistical analysis. Since very few simulated tt events pass the full three-tag or four-tag selections, the shape of the tt distribution in the three-tag or four-tag sample is taken from the two-tag distribution. The shape differences of these templates are negligible compared to the systematic uncertainties considered and the statistical uncertainties of the available samples.
The modelling of the background yields and kinematics is validated in the control regions of the n-tag samples. Good agreement is observed between the yield in data and the yield of predicted backgrounds in the sideband and control regions of each of the samples, as shown in Table 5. Figure 7 compares the predicted background m 2J distributions to data in the control regions of the three samples. The systematic uncertainties derived in Section 6.3 are shown.  Figure 7: The m 2J distributions in the control region of the boosted analysis for the data and the predicted background (top panels) for (a) the two-tag, (b) the three-tag, and (c) the four-tag samples. The data-to-background ratio (bottom panels) shows also the combination of statistical and systematic uncertainties as the grey hatched band. The expected signal for a 2 TeV G KK resonance with k/M Pl = 1 and a scalar with the same mass is also shown. The scalar has an arbitrary cross section times branching ratio of 12 fb. Table 5: The number of events in data and predicted background yields in the sideband and control regions of the two-tag, three-tag and four-tag samples for the boosted analysis. The numbers of multijet and tt background events in the sideband regions are constrained by the number of observed events, as explained in the text. The uncertainties are statistical, with fit uncertainties included for backgrounds. The anti-correlation between the multijet and tt yields is accounted for in the uncertainty in the total background yield.

Systematic uncertainties
Uncertainties that are in common with those of the resolved analysis are the theoretical uncertainties in the signal acceptance and the b-tagging uncertainties. Systematic uncertainties that differ from those of the resolved analysis are described here.
The uncertainty in the integrated luminosity of the combined 2015 and 2016 datatsets is ±2.1% [67].
The large-R jet energy and mass uncertainties (i.e. jet mass scale, JMS, and jet mass resolution, JMR) are derived in situ from 13 TeV pp collisions, using techniques described in Ref.
[69]. The uncertainty in the b-tagging efficiency for track-jets is evaluated with the same method as used for R = 0.4 calorimeter-based jets.
Uncertainties in the signal are treated as fully correlated across the three samples.
Detector modelling uncertainties in the tt sample (b-tagging efficiencies; jet energy, resolution and mass) impact the result of the normalization fit. These variations of µ multijet and α tt are propagated to the predictions of the multijet and tt estimates in the signal regions, and they are treated as fully correlated across the three signal regions and are fully correlated with the same uncertainties in the signal.
An uncertainty in both the shape and normalization of the multijet and tt backgrounds is assigned by considering the statistical uncertainties in the nominal fitted values of α tt and µ multijet , as given in Table 4. Two orthogonal eigenvariations are calculated from the covariance matrix of the normalization fit, which are then applied to the background predictions. Correlations between α tt and µ multijet are fully retained this way.
the leading large-R jet's mass distribution is carried out in the varied sideband region, and the validation is done in the varied control region. For each variation, the estimated total number of background events is compared to the data, and the largest difference seen is taken as the uncertainty. The assigned uncertainties are ±12.2%, ±4.2% and ±2.8% in four-tag, three-tag and two-tag samples, respectively.
Each uncertainty in the data-driven background estimate is evaluated for each sample, and these are treated as uncorrelated across the samples in the statistical analysis.
A summary of the systematic uncertainties and their impacts on the event yields is presented in Table 6. The impact of b-tagging efficiency uncertainties is smaller in the three-tag sample, since the variations applied to b-tagged track-jets and anti-b-tagged track-jets have effects that partially cancel out.

Signal region event yields
The number of events observed in the data, the predicted number of background events in the signal region, and the predicted yield for two potential signals are presented in Table 7 for the two-tag, three-tag, and four-tag samples. The scalar sample shown is normalized to 12 fb. The numbers of predicted background events and observed events are in agreement within the statistical uncertainties. Figure 8 shows comparisons of the predicted m 2J background distributions to those observed in the data. The predicted background and observed distributions are in agreement, with no significant excess. Table 7: The number of predicted background events in the signal region for the boosted analysis compared to the data, for the two-tag, three-tag, and four-tag samples. The yields for a 2 TeV scalar and a 2 TeV G KK with k/M Pl = 1 are also shown. The scalar is normalized to a cross section times branching ratio of 12 fb. The quoted uncertainties include both the statistical and systematic uncertainties. The anti-correlation between the multijet and tt yields is accounted for in the uncertainty in the total background yield.

Statistical analysis
Following the statistical procedures outlined in Ref. [1], a test statistic based on the profile likelihood ratio [70] is used to test hypothesized values of µ = σ/σ model , the global signal strength factor, separately for each signal model. The exclusion limits are computed using asymptotic formulae [70] and are based on the CL s method [71], where a value of µ is regarded as excluded at the 95% confidence level (CL) when CL s is less than 5%.

Resonant H H production
The resolved analysis is performed for resonance masses in the range 260-1400 GeV, the boosted analysis is carried out for signal masses in the range 800-3000 GeV, and the two analyses are combined in the mass range where they overlap. The statistical analysis is performed using the data observed in the signal regions. For the resolved analysis, the m 4j distribution is used as the final discriminant and the 2015 and 2016 datasets are fitted simultaneously. In the boosted analysis the m 2J distribution is used as the final discriminant, and the data from the two-tag, three-tag and four-tag signal regions are fitted simultaneously; however, the two-tag sample is not considered for masses below 1500 GeV since its contribution to the sensitivity is negligible at low mass.
Systematic uncertainties are treated within each signal region using Gaussian or log-normal constraint terms in the definition of the likelihood function. The luminosity uncertainty is treated as uncorrelated for the 2015 and 2016 datasets of the resolved analysis and the combined boosted dataset, since a subset of the 2016 dataset in the resolved analysis could not be used. All other systematic uncertainties affecting the signal are fully correlated between the resolved and boosted samples. Uncertainties in the background models are treated as uncorrelated between both analyses.
Before the fit is performed on the collision data, it is first validated on artificial datasets without statistical fluctuations. These datasets are created from the background-only model in the signal regions. The pulls, constraints, and correlations of the nuisance parameters are then checked at each mass point in the fits to the data. The impact of each uncertainty on µ is computed; the leading uncertainty at 280 GeV is the background modelling uncertainty that arises from comparing the background-model derived from control region data rather than sideband data. At 2000 GeV it is the background modelling uncertainty calculated by comparing smoothed data to the smoothed prediction in the control region.
Asymptotic approximations are used to calculate the local significance of a deviation from the backgroundonly hypothesis. The largest local deviation is found at 280 GeV, where the p 0 value is 1.7 · 10 −4 (3.6σ) for the narrow-width scalar, and 5.8 · 10 −3 (2.5σ) for the k/M Pl = 1 graviton model. The k/M Pl = 2 graviton signal is too wide to explain this deviation. The signal shape of the scalar at 280 GeV has an approximately Gaussian form, resulting from the m 4j resolution of about 9 GeV. The graviton signals have finite widths Γ (Γ = 8 GeV for k/M Pl = 1 and Γ = 33 GeV for k/M Pl = 2) and furthermore, their shapes are distorted due to their close proximity to the kinematic threshold.
The global significance is evaluated using pseudo-experiments, generated from the background-only model that has been fitted to the data. For each pseudo-experiment, the largest local excess is obtained from unconditional fits to all signal mass points and all three signal models. A distribution of those local significances is sampled, and the global p 0 value is computed by counting how often the largest excess in the pseudo-experiments is more significant than that observed in the data. The global significance obtained is 2.3σ.
Upper limits on the cross section are set in each of the benchmark models. Figure 9 shows the combined 95% CL upper limits for a narrow-width scalar and a spin-2 G KK in the bulk Randall-Sundrum (RS) model with k/M Pl = 1 or k/M Pl = 2. The predicted cross sections are shown for the graviton models, taken from Ref. [36]. The bulk RS model with k/M Pl = 1 is excluded for masses between 313 and 1362 GeV, and the bulk RS model with k/M Pl = 2 is excluded for masses below 1744 GeV. The limits at low mass are stronger for the wider Graviton than the narrow Graviton model, because in the former case the signal has a larger acceptance and the distorted shape peaks at higher values of m 4j above a steeply falling background.

SM non-resonant H H production
The non-resonant search is performed using the resolved analysis only, since it has much better sensitivity to non-resonant signals than the boosted analysis. Using the SM HH non-resonant production via gluongluon fusion as the signal model, computed at NLO and fully accounting for the top-quark mass, the observed 95% CL upper limit is σ(pp → HH → bbbb) < 147 fb. This value is to be compared with the SM prediction for gluon-gluon fusion produced HH of σ(pp → HH → bbbb) = 11.3 +0.9 −1.0 fb. The expected and observed limit values and the uncertainties in the expectation are given in Table 8, in units of the SM prediction. For ease of comparison with previous results, the limits are also calculated using the approximate top-quark mass correction: the observed limit is σ(pp → HH → bbbb) < 134 fb, about 9% smaller than that obtained with the exact top-quark mass correction.
Despite the excess at 280 GeV, the non-resonant limits are stronger than expected. The non-resonant signal shape has its maximum around 400 GeV, with a slowly falling spectrum towards high mass (Figure 4), and in several bins the number of observed events is below the prediction.
The result is limited by systematic uncertainties in the background normalization and shape. Since these are data-driven, an increase of the integrated luminosity will improve the sensitivity.  Results are reported for the resolved analysis with each H → bb decay reconstructed as two separate b-tagged small-R jets and for the boosted analysis with each H → bb decay reconstructed as a single large-R jet associated with at least one b-tagged track-jet.
No significant data excess is observed above the estimated background consisting mainly of multijet and tt events. The largest deviation from the background-only hypothesis is observed for narrow signal models at a mass of 280 GeV in the resolved analysis, with a global significance of 2.3σ.
Upper limits on the production cross section times branching ratio to the bbbb final state are set for a narrow-width scalar and for spin-2 resonances with k/M Pl = 1 or k/M Pl = 2. The bulk RS model with k/M Pl = 1 is excluded for masses between 313 and 1362 GeV, and the bulk RS model with k/M Pl = 2 is excluded for masses below 1744 GeV. The 95% CL upper limit on the non-resonant production is 147 fb, which corresponds to 13.0 times the SM expectation.

Acknowledgements
We thank CERN for the very successful operation of the LHC, as well as the support staff from our institutions without whom ATLAS could not be operated efficiently. [