Search for nonresonant Higgs boson pair production in the $\mathrm{b\overline{b}b\overline{b}}$ final state at $\sqrt{s} =$ 13 TeV

Results of a search for nonresonant production of Higgs boson pairs, with each Higgs boson decaying to a $\mathrm{b\overline{b}b\overline{b}}$ pair, are presented. This search uses data from proton-proton collisions at a centre-of-mass energy of 13 TeV, corresponding to an integrated luminosity of 35.9 fb$^{-1}$, collected by the CMS detector at the LHC. No signal is observed, and a 95% confidence level upper limit of 847 fb is set on the cross section for standard model nonresonant Higgs boson pair production times the squared branching fraction of the Higgs boson decay to a $\mathrm{b\overline{b}b\overline{b}}$ pair. The same signature is studied, and upper limits are set, in the context of models of physics beyond the standard model that predict modified couplings of the Higgs boson.


Introduction
The detailed understanding of the properties of the Higgs boson (H) discovered in 2012 by the CERN LHC experiments [1][2][3] remains an important subject in fundamental physics. Current determinations of the properties of the new particle by the ATLAS and CMS Collaborations are found to be in agreement with standard model (SM) predictions [4,5]. However, there are still many measurements that could reveal unexpected deviations from the SM. A number of models of physics beyond the SM (BSM) can be tested using their predictions of the properties of the observed state, including the Higgs boson self-coupling and couplings to bosons and fermions [6][7][8][9][10].
The production of Higgs boson pairs (HH) is the most direct way to access the Higgs boson selfcoupling [11] and to study in detail the SM Higgs potential. The HH production cross section predicted by the SM for 13 TeV proton-proton (pp) collisions and m H = 125.09 GeV [5,12] is 33.49 +4.3% −6.0% (scale) ± 2.3%(α S ) ± 2.1%(PDF) fb [13][14][15][16][17], where the uncertainty is due to the variation of the renormalization (µ R ) and factorization (µ F ) scales (scale), the strong coupling constant (α S ) uncertainties, and the uncertainty in parton distribution functions (PDF). The predicted cross section results in a low expected event rate, and the acceptance for HH events in the detector is small. This means that the SM HH production process cannot be observed with the data collected so far at the LHC: the expectation is that it will only be possible to set an upper limit on the HH production cross section, as discussed, e.g. in Refs. [18,19]. However, the cross section can be enhanced by anomalous couplings in BSM models [20] and in some cases the enhancement is large enough that HH production could be observed with the current data.
The first searches for nonresonant HH production were performed by LHC experiments using pp collisions data at √ s = 8 TeV [21,22]. The data collected in 2015 and 2016 at √ s = 13 TeV were used for improved analyses in the decay channels: bbbb [19], bb ν ν [23], bbττ [18], and bbγγ [24]. An additional search in the bbbb decay channel focused on the region of phase space where one bb pair is highly Lorentz-boosted and is reconstructed as a single large-area jet [25]. In the cases mentioned above, at least one of the two Higgs bosons is required to decay to bb to exploit the large branching fraction of this decay. Results were found to be compatible within uncertainties to the expected SM background contribution. The measurement of nonresonant HH production at the LHC with the tightest expected upper limit (19 times the SM rate) was made in the bbγγ channel [24], yielding an observed upper limit equivalent to 22 times the SM rate.
This article reports the results of a search for HH production with both Higgs bosons decaying into bottom quark pairs, resulting in distinct hadronic jets. The search is performed using 13 TeV pp collisions data corresponding to an integrated luminosity of 35.9 fb −1 , collected by the CMS detector in 2016. The final state containing four b quarks has the highest branching fraction of all possible HH final states, corresponding to ≈0.339 for an SM Higgs boson with a mass of 125 GeV. It is one of the most sensitive signatures for the investigation of HH production, as confirmed by the results of a similar search recently performed by the ATLAS Collaboration [19]. The main challenge for this analysis is the large background from multijet final states produced by quantum chromodynamics (QCD) processes, which collectively yield rates exceeding that of the signal by several orders of magnitude. We address this by fully exploiting the distinctive features of the signal: the presence of four b quarks and the kinematical properties of the decay process. In a sample selected by requiring four b quark jets, a multivariate event classifier is trained to discriminate signal from background. This sample is studied by comparing it to a model of all contributing background processes, which is completely based on data. Because of the use of different data sets, triggers, and offline selection requirements, this analysis is fully independent from the CMS searches mentioned above [18, 23-25].

Beyond-the-standard-model extensions
In the SM, HH production occurs predominantly by gluon-gluon fusion (ggF) via an internal fermion loop, where the top quark (t) contribution is dominant. In the absence of new light states, the ggF HH production at the LHC can be generally described (considering operators up to dimension 6) by five parameters controlling the tree-level interactions of the Higgs boson. Without considering CP violating effects, the relevant part of the Lagrangian then takes the form: This Lagrangian follows from extending the SM with operators of mass dimension between four and six in the framework of an effective field theory [26], encoding the effects of new heavy states currently beyond experimental reach. The five parameters of the Lagrangian, named κ λ , κ t , c g , c 2g , and c 2 , are related to the Higgs boson couplings. In particular, the multiplicative factors κ λ = λ HHH /λ SM and κ t = y t /y SM parametrize deviations from the SM values of, respectively, the Higgs boson trilinear coupling and the top quark Yukawa coupling. The former is given by λ SM = m 2 H /2v 2 , with v being the vacuum expectation value of the Higgs field. The absolute couplings c g , c 2g , and c 2 parametrize contact interactions not predicted by the SM, i.e. the coupling of the Higgs boson to gluons and those of the two Higgs bosons to two gluons or to a top quark-antiquark pair, which could arise through the mediation of very heavy new states. In Eq. (1), m t is the mass of the top quark, and G µν the gluon field. We neglect possible modifications of the bottom quark Yukawa coupling κ b , which is already constrained by LHC data [27]. The Feynman diagrams contributing to HH production in pp collisions at leading order (LO) are shown in Fig. 1. The translation of the above parametrization to the flavour-diagonal Higgs basis (as discussed in Ref. [26]) is trivial; we use the notation of Eq. (1) for simplicity.
The parameter space for the Higgs boson couplings in a BSM scenario has five dimensions. Constraints on the ranges of the five parameters come from measurements of single Higgs boson production already performed at the LHC, as well as other theoretical considerations [28]. However, the allowed phase space is still large and a precise scan is not feasible. The kinematical properties of the pair-produced Higgs bosons depend strongly on the values of those five couplings. A statistical approach has been developed in order to identify regions of the parameter space that present similar final state kinematical properties. In each region, a benchmark model is selected at the most representative point. The procedure, described in Ref. [28], leads to twelve benchmarks that best represent, within a limited uncertainty, the phenomenology of the whole five-dimensional space. An additional benchmark (box), representative of the null Higgs boson self-coupling hypothesis, is also considered. The parameter values of the benchmarks are listed in Table 1. The search for BSM signals presented here is focused on these benchmarks. Table 1: The values of the anomalous coupling parameters for the 13 benchmark models studied [28]. For reference, the values of the parameters in the SM are also included.

The CMS detector
The CMS detector is a multipurpose apparatus designed to reconstruct the high-energy interactions produced by the LHC. Its central feature is a superconducting solenoid with an internal diameter of 6 m. The solenoid generates a magnetic field of 3.8 T inside a volume occupied by four main sub-detectors, each composed of a barrel and two endcap sections: silicon pixel and strip tracker detectors, a lead tungstate crystal electromagnetic calorimeter (ECAL), and a brass and scintillator hadron calorimeter (HCAL). The pixel tracker provides an impact parameter resolution for charged tracks of about 15 µm, which allows for a precise reconstruction of secondary vertices, crucially used to identify jets originating from the hadronization of b quarks (b jets). Muons are measured in gas-ionization detectors embedded in a steel flux return yoke outside the solenoid. Information from the calorimeters and muon detectors is used by the first level of the CMS trigger [29], a system based on custom hardware processors that provides the first online event selection. The second level of the CMS trigger, also called high-level trigger and consisting of a farm of processors running a version of the full event reconstruction software optimized for fast processing, further selects events using information from the whole detector before sending them downstream for detailed processing and storage. Particles produced in the pp collisions are detected in the pseudorapidity range |η| < 5. A more detailed description of the CMS detector can be found in [30].

Data sets
The online event selection for the data used in this analysis is designed to select a sample of multijet events enriched with b quark decays, reducing the rate from the QCD multijet background with light quarks and gluons. The combined secondary vertex (CSVv2) algorithm [31] is used to identify b jets. This algorithm exploits the relatively long lifetime of hadrons containing b quarks (cτ ∼ 450 µm), which results in a displaced decay point of the produced b hadrons. The reconstructed trajectories of charged decay products from b hadrons thus exhibit significant impact parameters with respect to the b quark production point. The CSVv2 algorithm uses the impact parameter information together with information on other characteristics of the jets to discriminate jets originating from b quarks from those produced by the hadronization of light quarks or gluons. Two trigger paths contribute to the online selection.
In the first trigger path jets are considered if their p T is above 30 GeV and |η| < 2.6. Selected events must contain at least four such jets of which at least three are tagged as b jets by the CSVv2 algorithm and at least two have p T > 90 GeV. The second trigger path requires at least four jets with p T > 45 GeV with at least three tagged as bjets by the CSVv2 algorithm. The logical OR between these two selections provides the data used in this analysis.
The production of nonresonant HH in the SM is simulated following the prescriptions of Ref. [32] at LO with MADGRAPH5 aMC@NLO 2.2.2 [33] used as the generator. Loop factors are calculated on an event-by-event basis and applied to an effective model, from Ref. [32]; the NNPDF30 lo as 0130 nf 4 PDF set [34] is used. In addition, for the study of BSM models involving anomalous Higgs boson couplings, we generate for each of the parameter space points listed in Table 1 a set of 300 000 simulated events.
The 14 simulated signal samples are added together to obtain a larger signal sample. We will refer to this ensemble of events as the Pangea sample. This sample is then reweighted to reproduce the physics of any particular point in the BSM phase space. The weights are obtained by looking at the matrix element information for m gen HH and cos θ * gen from dedicated simulations, as described in Ref. [35]. The numbers of events used to determine the weights at generator level are 3 000 000 for the SM sample and 50 000 for each BSM benchmark. In the following, we always use the Pangea sample instead of the 14 original samples to study signal properties in each model considered.
Although our search employs an approach fully based on data to model backgrounds, we make use of a simulation of QCD processes for several cross-checks. This simulation consists of a collection of seven simulated data sets of contiguous ranges in the H gen T variable, which is defined as the scalar sum of the p T of all partons that originate from the hard-scattering process in a simulated event. The samples are generated by MADGRAPH5 aMC@NLO 2.2.2 at LO, using the NNPDF30 lo as 0130 set, and are then interfaced with PYTHIA 8.212 [36] for fragmentation and parton showering, using the MLM matching [37]; their equivalent integrated luminosity depends on the H  [38][39][40] sample of inclusive tt [41] events is used. The behaviour of minor backgrounds is verified and a study of their contamination of our selected sample is carried out using POWHEG 2.0 NLO samples of single top quark t channel [42], ttH [43], single Higgs boson production [44], and associated ZH production [45]. In addition, we use single top quark s channel, tttt, ttbb, and bbH samples generated with MADGRAPH5 aMC@NLO 2.2.2 at NLO. All of those samples are interfaced with PYTHIA 8.212 for parton showering and fragmentation. The tt sample utilises the generator tune CUETP8M2T4 [46] for the underlying event activity, other samples interfaced with PYTHIA use the tune CUETP8M1 [47]. The tt, ttH, single Higgs boson, and ZH samples are generated using the NNPDF30 nlo as 0118 PDF set. The single top quark, ttbb, and bbH samples are generated with the NNPDF30 nlo nf 4 pdfas set. The NNPDF30 nlo nf 5 pdfas set is used to generate the tttt sample. All of the PDF sets are taken from the LHAPDF6 set [48]. The response of the CMS detector is modelled using GEANT4 [49].
Finally, in order to study possible discrepancies between the efficiency of the triggers used in our data selection and their modelling by the simulation, we compare the effect of bjet selection requirements on data collected by a trigger requiring a single isolated muon of p T > 18 GeV with its simulation, using a mixture of events from tt/single-top described above and a MAD-GRAPH5 aMC@NLO 2.2.2 W+jets LO sample using the MLM matching, weighted appropriately, and a W+jets sample generated using the NNPDF30 lo as 0130 PDF set.

Event reconstruction
Global event reconstruction is performed by the particle-flow (PF) algorithm [50], which aims to reconstruct and identify each individual particle in an event, with an optimized combination of information from the various elements of the CMS detector. In this process, the identification of the particle type (photon, electron, muon, charged hadron, neutral hadron) plays an important role in the determination of the particle direction and energy. Electrons (e.g. coming from photon conversions in the tracker material or from b hadron semileptonic decays) are identified as a primary charged particle track and one or more ECAL energy clusters corresponding to this track extrapolation to the ECAL and to possible bremsstrahlung photons emitted along the way through the tracker material. Muons (e.g. from b hadron semileptonic decays) are identified as a track in the central tracker consistent with either a track or several hits in the muon system, associated with an energy deficit in the calorimeters. The objects primarily considered in this analysis are hadronic jets, composed of particles produced by quark fragmentation and hadronization. The energy of charged hadrons is determined from a combination of their momentum measured in the tracker and the matching ECAL and HCAL energy deposits, corrected for zero-suppression effects and for the response function of the calorimeters to hadronic showers. The energy of neutral hadrons is obtained from the corresponding corrected ECAL and HCAL energy. Jets are reconstructed from PF candidates using the anti-k T clustering algorithm [51] with a distance parameter of 0.4, as implemented in the FASTJET package [52]. Jet energy corrections are applied to both data and simulation to scale the energy and correct for differences in the detector response in real and simulated collisions [53]. Jet identification criteria are also applied in order to reject fake jets from detector noise and jets originating from primary vertices not associated with the hard interaction [54]. The combined multivariate algorithm (CMVA) [31] is used in the offline analysis to identify jets that originate from the hadronization of b quarks. The CMVA builds on the CSV algorithm by adding soft-lepton information to the combined discriminant. The value of the multivariate discriminant used depends on the required suppression of jets from light quarks and gluons. The medium working point of the CMVA, defined such that the misidentification rate of light quarks and gluons as b jets is 1%, is used in this analysis. For jets produced by the hadronization of b quarks emitted in HH production events, the medium working point corresponds to a b-tagging efficiency of about 65% for the jets of interest of this analysis.
A weight is applied to each Monte Carlo (MC) event in order to match the distribution of the number of primary interactions per event in data (pileup correction), thus reproducing the effect on the selection efficiency of the varying instantaneous luminosity conditions incurred during data taking. The simulated events are also weighted to account for measured differences in the b tagging efficiency between data and simulation [31]. The trigger efficiency for signal events is evaluated using a full simulation of the trigger [29]. The correction factor for the efficiency is found to be 0.96 ± 0.02 based on measurements performed in b-tag multiplicity categories, using a top-pair enriched sample collected with an inclusive muon trigger.

Analysis strategy
The focus of this search is the study of nonresonant production of HH in the bbbb final state, as predicted by the SM and by several BSM extensions. The analysis is optimized for sensitivity to the SM signal. We use the same selection to extract limits on the HH production cross section for the BSM models.
The offline selection, performed on all data events passing one of the two trigger paths described in Section 4, aims at increasing the fraction of data events containing two Higgs boson candidates decaying into b quark jet pairs. This includes a preliminary selection of events where four or more jets have been b-tagged by the CMVA. Although this selection significantly reduces the QCD multijet background rate, this background still dominates the selected data, with contributions from events where light quark or gluon jets are mistagged by the CMVA and events containing heavy quarks.
After the selection of events with four or more b tags, each reconstructed Higgs boson candidate is composed of a pair of b jets, referred to in the following as "a dijet system" or simply "dijet". A boosted decision tree (BDT) classifier [55] is then trained to exploit the observable differences between the SM signal and the background. Finally, a search for a signal contribution to the selected data and an extraction of an upper limit in the number of selected signal events is performed by means of a binned fit to the distribution of the BDT classifier output. The limit on the number of events is converted to a limit on the HH production cross section times the square of the branching fraction of the Higgs boson into a bb pair, using the corresponding integrated luminosity and the computed signal efficiency.
Both the optimization of the BDT classifier and the extraction of upper limits on HH production require an accurate modelling of the multijet background. Unfortunately, the precise simulation of QCD processes yielding a large number of final-state partons is notoriously hard, as MC simulations are not complete to beyond LO; in addition, the very large production cross section for those processes makes it wholly impractical to produce simulated data sets corresponding to an integrated luminosity comparable to that of collision data. To address these issues, a dedicated method, fully based on data, was developed to produce a precise model of the kinematical behaviour of background events. This is described in detail in Section 8.

Event selection
The events of interest are identified by a jet-based selection applied to data collected by the triggers described in Section 4, as well as to all simulated samples. Jets are required to have p T j > 30 GeV and |η j | < 2.4. We require at least four such jets (N j ≥ 4) and these need to be defined as b-tagged jets by the medium working point of the CMVA (N b ≥ 4). These criteria strongly reduce the QCD multijet background and select HH production events where the final state can be fully reconstructed. The number of selected events in the data set studied is 184 879.
The efficiencies for the SM signal are listed in Table 2. The efficiency for the 13 BSM points varies from -40% to +10% compared to the SM values. The average number of jets per selected event is ≈5. The four jets with the highest CMVA discriminant values are considered as the decay products of two Higgs boson candidates. The pairing of the four jets into Higgs boson candidates is performed by considering the invariant mass of the two dijet candidates calculable for the three possible pairings, and computing the absolute mass differences ∆M klmn = |M kl − M mn |, where the klmn indices run on the three permutations of the four jets. The combination resulting in the smallest mass difference between the two dijet systems is chosen as the one best describing the decay topology. This procedure results in a correct pairing of the b quarks to Higgs bosons in 54% of the cases, as tested on the Pangea MC signal sample. The two selected dijets are then labelled as "leading" and "trailing" according to their invariant mass value. This procedure, which does not explicitly use the known mass of the Higgs boson, allows the dijet masses for the selected combinations to retain the power to discriminate the HH production signal from the background. Table 2: Cut-flow efficiency for the SM signal pp → HH → bbbb; the efficiency and the relative reduction of each successive selection step is shown. The number of expected SM signal events for an integrated luminosity of 1 fb −1 is also reported. A multivariate technique is used in order to improve the sensitivity of the analysis. A BDT discriminator is trained to distinguish the SM signal from backgrounds (as described in Section 8), using the XGBOOST library [55]. All 13 BSM models use the same BDT as the SM to distinguish signal from background. We supply the BDT algorithm with a set of variables describing the kinematical properties of the event. The list of variables is pruned to discard those not contributing to the overall discrimination power of the algorithm. In Table 3 we list the variables chosen to build the classifier. In order to characterise the HH system we use as mass variables the invariant mass of the HH system (M HH ), an estimator of the combined mass of the HH system, M X (defined by Eq. ). Additionally, we use the θ * angles between the HH system and the leading Higgs boson candidate, cos θ * H 1 H 2 −H 1 , and between the leading Higgs boson candidate and the leading jet, cos θ * H 1 -j 1 . We further use the following jet-related variables: the p T i j and η i j (i = 1-4) of the four jets with the highest values of the CMVA discriminant, the scalar p T sum of the jets in the event (H T ), and of the jets that are not part of the reconstructed HH system (i.e. the rest of the jets, H rest T ). The CMVA values of the third and fourth jets sorted by CMVA value (CMVA 3 and CMVA 4 ) are also used. The estimator, M X , of the mass of the system of two Higgs bosons is constructed as: where m H = 125 GeV. Even though M X is strongly correlated with other variables used in the BDT, its use improves the discrimination power.
The invariant masses of the reconstructed Higgs boson candidates are the variables with the largest discrimination power, but all the variables used make significant contributions to the classifier.
We use 60% of the Pangea sample for training the classifier. The remaining 40% of the Pangea sample is employed for the validation (20%) and application (20%) steps; this splitting has been found to produce maximal sensitivity to a possible HH signal. As a background sample, an artificial data set constructed with a custom mixing procedure, as described in Section 8, is employed.

The background model
A method exploiting collision data only, based on hemisphere mixing, has been developed [56] to perform two separate tasks: first, to provide input to the training of the BDT classifier; and second, to reproduce the expected shape of the BDT output in background-only events. The method does not require the presence of signal-depleted sidebands in order to extract a background estimation; in fact, it aims at creating an artificial background data set using the whole original data set as the input. Thus, rather than a model of a single distribution, a full model of the original data is produced.

The hemisphere mixing technique
The basic concept at the heart of the method is to divide each data event in two hemispheres. The collection of hemispheres can then be used to create new events by recombining them in pairs. To create a good background model, the kinematical properties of the new events must be as similar as possible to the ones of the original data but also insensitive to the possible presence of signal. In order to define the hemispheres, we use the transverse thrust axis. This is defined as the axis on which the sum of the absolute values of the projections of the p T of the jets is maximal, and correspondingly, transverse thrust (T) is the value of this sum. Once the transverse thrust axis is identified, the event is divided into two halves by cutting perpendicular to the transverse thrust axis. One such half is called a hemisphere (h). In a preliminary step, each event in the original N-event data set is split into two hemispheres that are collected in a library of 2N elements. Once the library is created, each event is used as a basis for creating artificial events. These are constructed by picking two hemispheres from the library that are similar, according to a measure defined below, to the two hemispheres that make up the original event. An illustration of the procedure can be found in Fig. 2. Once the thrust axis is identified, the event is divided into two halves by cutting along the axis perpendicular to the transverse thrust axis. One such half is called a hemisphere (h). In a preliminary step, each event in the original N-event data set is split into two hemispheres that are collected in a library of 2N hemispheres. Once the library is created, each event is used as a basis for creating artificial events. These are constructed by picking two hemispheres from the library that are similar to the two hemispheres that make up the original event.
the one in the library that is compared to o, the number of jets in o and q is required to be equal, N o j = N q j , and also the number of b-tagged jets are required to be equal, N o b = N q b . These two requirements are used to maintain the topology of the original events and to avoid introducing events that would not pass the selection described in Section 7 (e.g. by combining a hemisphere with 2 jets with a hemisphere with 1 jet, resulting in an event with 3 jets). The requirement for equal numbers of jets is waived for the infrequently occurring pairs of hemispheres that both have at least four jets and at least four b-tagged jets. For each hemisphere q in the library fulfilling the above criteria, a multidimensional distance from hemisphere o is computed using the four jet-related variables, as follows: To keep track of the distance criterion used to choose each hemisphere, the artificial event may be labelled as (k 1 , k 2 ) using the neighbour indices, indicating that one hemisphere of the original event was replaced by its k 1 th neighbour and the other hemisphere of the original event by its k 2 th neighbour and these were used to form the artificial event. By applying this procedure to the whole set of events of the data to be modelled, and by choosing a limiting value K for k, we obtain a total of K 2 data sets, each equal in size to the original one, and each featuring very similar characteristics to the original one, despite being made up entirely of artificial events.
The procedure described above is successful at modelling multijet events because it exploits the fact that their production can be idealized at LO as a 2 → 2 process, which is made complex by a number of sub-leading effects (QCD radiation, pileup, multiple interactions). The reconstruction of the transverse thrust axis, and the decomposition of events into hemispheres using that axis as a seed, uses the independent fragmentation of the two final state partons as a working hypothesis to create artificial replicas of the original events. The method destroys any correlation in the jet distribution between the two hemispheres, so that any physical effect, such as the decay of a heavy object into jet pairs, is washed out in the artificial samples. Because of this, the resulting artificial data sets are unaffected by the presence of a small signal contamination in the original data. This has been verified by signal injection tests. We started with an original data set composed of simulated QCD multijet events to which is added an additional component of signal corresponding to a cross section 100 times larger than the one expected by the SM. After hemisphere mixing, the kinematical properties of the resulting artificial samples are found to resemble closely those from the QCD multijet part of the original data set, which is its dominant component, and unaffected by the minority component (the signal contamination). Naively this can be understood if we note that, if the signal fraction in the original sample amounts to e.g. 0.1%, the probability that a signal event is modelled using two different hemispheres both originally belonging to signal events is of the order of 0.0001%. Event-based variables such as the two Higgs boson candidate masses, which are obtained by the minimum ∆M criterion described in Section 7 and are thus sensitive to the characteristics of both hemispheres together, do not retain their distribution in events where only one hemisphere is taken from an HH decay event.
We apply the hemisphere mixing technique to data events selected with the N j ≥ 4, N b ≥ 4 criteria, using K = 10 neighbour hemispheres to each hemisphere of the original event, which were found to still provide good modelling. The resulting artificial samples are used to provide a background model in the training of the BDT classifier (training sample), as well as an independent set for the BDT validation and optimization (validation sample), and a third data set used to extract the predicted shape of the optimized BDT (application sample). Not all of the data sets are fully independent so only a subset can be safely employed for further studies. We use the following collections of artificial events in the measurement: for the training sample, we use all artificial events of types (1, 1), (1, 2), (2, 1), and (2, 2); for the validation sample, all artificial events of types (3, 4), (5, 6), (7,8), and (9, 10); and for the application sample, all artificial events of types (4, 3), (6, 5), (8,7), and (10,9). This split guarantees that the three samples have equal number of events, and that the validation and application samples are independent of each other, being constituted of artificial events made up of different hemispheres. For the training sample the partial use of the same hemispheres in modelling different artificial events might at most slightly degrade the discrimination power but does not have a detrimental effect on the subsequent steps of the analysis. A study is performed by switching the validation and application samples and we find that this does not change the results. The fraction of data events that are totally replicated in the background template is completely negligible. A comparison between the distributions obtained through the procedure described above and by using MC simulation for QCD multijet processes can be seen in Fig. 3 for a number of vari-ables. The compatibility is good, although the statistical uncertainties in the model from MC simulation are large.

The background template validation
We perform a number of stringent checks to verify that the background is well modelled by the hemisphere mixing procedure. For this purpose, we define two control regions (CRs): the first one, called the m H CR, is obtained by removing from the data events where the leading and trailing dijet masses are in the region 90 < M H 1 < 150 GeV, 80 < M H 2 < 140 GeV. This avoids using events belonging to the signal-enriched region. In the second region, the b tag CR, fully orthogonal to the default selection, we select events with at least four b-tagged jets as defined by the loose working point of the CMVA, while vetoing events with any jets that are defined as b-tagged jets according to the medium working point of the CMVA. The loose working point of the CMVA has a misidentification rate of ≈10% and a b-tagging efficiency of ≈85% for jets produced by the hadronization of b quarks emitted in HH production events. The distributions of all individual event variables for the artificial data sets are compared to those from the original data set in these two CRs and are found to be in agreement. This is illustrated for a number of variables in Figs. 4 and 5. However, the power of the technique rests in its ability to provide fully multidimensional modelling. To verify this, a first cross-check consists of comparing the full BDT shape for data and the artificial model in the m H CR. We observe an agreement in the shape of the BDT discriminator with a slight excess of background events in the lower range of the BDT output (as can be seen in Fig. 6, left). A similar trend is seen in the b tag CR.
A high-precision study is required to investigate the need for a correction to the background shape of the BDT discriminator and a corresponding systematic uncertainty. For this purpose, all the possible combinations of neighbouring hemispheres in the range 1 to 10, except the ones used for training ((1, 1), (1, 2), (2, 1), (2, 2)), are merged into a unique sample M. We re-sample 200 new replicas with the same number of events as the original data set without replacement from M, each time starting from the full sample M. Each of the replicas is then used as a new original data set, and artificial samples are created from it using the hemisphere mixing procedure. The output distribution of a previously trained BDT for the large sample M is then compared to that for its artificial counterpart, obtaining a distribution of differences between actual and predicted data in each of the 80 BDT bins. A schematic of the procedure and the results are available in Appendix A. A systematic bias is detected and the background template is corrected for the value obtained from this comparison. The variance related to the background bias extraction, together with expected statistical uncertainty, are estimated and accounted for as a systematic uncertainty in the final fit described in Section 10. The validity of this background bias extraction procedure has been checked by applying it to the data in the two CRs previously mentioned. The means of the per bin expectation values minus the observed values are compatible with zero after the bias correction in both control regions, the root-mean-square of the pulls is compatible with one after the bias correction in the b tag CR, but not in the m H CR, as shown on Fig. 6 (upper right). To account for this, we increase the uncertainty in the background such that the value of standard deviation (s.d.) becomes 1.0 in the m H CR after the bias correction is applied (Fig. 6, lower right).

Systematic uncertainties
The sources of systematic uncertainties found to be relevant to this analysis are listed in Table 4. The systematic uncertainty in the shape of the background model is accounted for by  Bias correction for the background model, described in Section 8.2, is applied by rescaling the weight of each event using the event yield ratio between corrected and uncorrected BDT distributions. Only statistical uncertainties are shown as the uncertainties related to the bias correction can not be propagated from the BDT classifier to a different variable. (lower left), and CMVA 4 (lower right). Bias correction for the background model, described in Section 8.2, is applied by rescaling the weight of each event using the event yield ratio between corrected and uncorrected BDT distributions in this CR. Only statistical uncertainties are shown as the uncertainties related to the bias correction can not be propagated from the BDT classifier to a different variable.
[GeV]  Figure 6: Left: comparison of the distribution of BDT output for data (left) selected in a region of the leading versus trailing Higgs boson candidate mass plane that excludes a 60-GeV-wide box around the most probable values of the dijet masses of signal events, with the corresponding output on an artificial sample obtained from the same data set by hemisphere mixing. Right: bin-by-bin differences between data and model, in s.d. units before (upper right) and after (lower right) bias correction; pull distribution for the differences, fit to a Gaussian distribution. The bias correction uncertainty is increased to take the s.d. of the residuals to 1.0.
assigning an uncertainty to each BDT output bin that includes the statistical uncertainty and the systematic uncertainty related to the bias extraction discussed in the previous section. The background normalization is left freely floating in the BDT distribution fit. The uncertainty due to the b tagging efficiencies is estimated by varying them within their uncertainties. The uncertainty due to the pileup modelling is computed by considering a ±4.6% variation in the total inelastic cross section value at 13 TeV [57]. The effect of jet energy resolution is evaluated by smearing jet energies according to their estimated uncertainty. The jet energy scale is varied within one s.d. as a function of jet p T and |η|, and the efficiency of the selection criteria is recomputed. The trigger efficiency correction factor discussed in Section 5 is affected by a 2% uncertainty that is taken as a systematic uncertainty in the related source. In the mentioned sources of systematic uncertainty, both shape and normalization shifts are considered in the model. The signal yield for a given production cross section is affected by a systematic uncertainty in the measured integrated luminosity of 2.5% [58]. The effect of variation of the µ R and µ F scales on the signal acceptance is estimated by taking the maximum and the minimum difference with respect to the nominal acceptance when varying µ F and µ R each individually as well as both together up and down by a factor of two. Lastly, to estimate the signal acceptance uncertainty due to PDF uncertainties, the PDF4LHC [59] recommendation is followed, using as the uncertainty the s.d. in the acceptance for a set of 100 MC replicas of the NNPDF 3.0 set [34].

Results
We search for the presence of HH events in CMS data collected in the 2016 run of the LHC using the BDT discriminant trained on the SM signal simulation and artificial background data. Two-component likelihood fits to the binned BDT output distributions are performed, using Table 4: Systematic uncertainties considered in the analysis and relative impact on the expected limit for the SM HH production. The relative impact is obtained by fixing the nuisance parameters corresponding to each source and recalculating the expected limit.

Source
Affects the BDT distribution for the background resulting from the artificial data set described in Section 8 and the signal simulations corresponding to the SM and each of the BSM benchmark points. The validation samples were used to study the dependence of both the expected limit and the compatibility of the data and background distributions on the value of the BDT discriminator used for the selection. Selecting BDT discriminator values >0.2 results in a small loss of sensitivity (≈1.5%) with improved data-background compatibility. As a result, the 64 bins with BDT >0.2 are used to extract the limits. The fit to the SM signal is shown in Fig. 7 and the postfit distributions of reconstructed Higgs boson masses are shown in Fig. 8. Minor background contamination arising from ttH, ZH, bbH, and single Higgs boson production processes do not show a signal-like BDT distribution and their effect is found to be negligible in the selected data at our level of sensitivity.
The observed and expected 95% confidence level (CL) upper limits on the cross section for pp → HH → bbbb nonresonant production, are computed using the asymptotic approximation [60] of the CL s criterion [61][62][63], using a test statistic based on the profile likelihood ratio (the LHC test statistic) [60]. The systematic uncertainties are treated as nuisance parameters and are profiled in the minimization. The limits are shown in Table 5 together with the 1 s.d. and 2 s.d. CL intervals around the expected limits. For the SM process, the expected limit is 419 fb, which corresponds to ≈37 times the SM HH production cross section times the square of the branching fraction for the H → bb decay. The observed upper limit obtained is 847 fb, which is ≈2 s.d. above the expected upper limit. This corresponds to an observed limit of 2496 fb for σ(pp → HH) SM . We perform the procedure described above in turn on the 13 BSM benchmark models considered. The results are shown in Fig. 9 and reported in Table 6. The difference between observed and expected limits is similar for SM and all the benchmark models. This is explained by the fact that the benchmark points use the same BDT as SM, resulting in the same background shape as an input to the fit. The background shape has a deficit of events compared to data in the last bins of the BDT distribution, as seen in Fig. 7. We also search for HH production with values of κ λ in the range [-20, 20], assuming κ t = 1, and the results are shown in Fig. 10. The  Figure 7: Results of the fit to the BDT distribution for the SM HH production signal. In the bottom panel a comparison is shown between the best fit signal and best fit background subtracted from measured data. The band, centred at zero, shows the total uncertainty. kinematic properties vary significantly across the points in this range. We do not exclude any values of κ λ , assuming κ t = 1.

Summary
This paper presents a search for nonresonant Higgs boson pair (HH) production with both Higgs bosons decaying into bb pairs. The standard model (SM) production has been studied along with 13 beyond the SM (BSM) benchmark models, using a data set of √ s = 13 TeV proton-proton collision events, corresponding to an integrated luminosity of 35.9 fb −1 collected by the CMS detector during the 2016 LHC run. The analysis of events acquired by a hadronic multijet trigger includes the selection of events with 4 b-tagged jets and a classification using boosted decision trees, optimized for discovery of the SM HH signal. Limits at 95% confidence level on the HH production cross section times the square of the branching fraction for the Higgs boson decay to b quark pairs are extracted for the SM and each BSM model considered, using binned likelihood fits of the shape of the boosted decision tree classifier output. The background model is derived from a novel technique based on data that provides a multidimensional representation of the dominant quantum chromodynamics multijet background and also models well the overall background distribution. The expected upper limit on σ(pp → HH → bbbb) is 419 fb, corresponding to 37 times the expected value for the SM process. The observed upper limit is 847 fb. Anomalous couplings of the Higgs boson are also investigated. The upper limits extracted for the HH production cross section in the 13 BSM benchmark models range from 508 to 3513 fb.  [4] ATLAS and CMS Collaborations, "Measurements of the Higgs boson production and decay rates and constraints on its couplings from a combined ATLAS and CMS analysis of the LHC pp collision data at √ s = 7 and 8 TeV", JHEP 08 (2016) 045, doi:10.1007/JHEP08(2016)045, arXiv:1606.02266.  [18] CMS Collaboration, "Search for Higgs boson pair production in events with two bottom quarks and two tau leptons in proton-proton collisions at √ s = 13 TeV", Phys. Lett. B 778 (2018) 101, doi:10.1016/j.physletb.2018.01.001, arXiv:1707.02909.
[19] ATLAS Collaboration, "Search for pair production of Higgs bosons in the bbbb final state using proton-proton collisions at √ s = 13 TeV with the ATLAS detector", (2018). arXiv:1804.06174. Submitted to JHEP.
[25] CMS Collaboration, "Search for production of Higgs boson pairs in the four b quark final state using large-area jets in proton-proton collisions at √ s = 13 TeV", (2018). arXiv:1808.01473. Submitted to JHEP.
[26] A. Falkowski, "Effective field theory approach to LHC Higgs data", Pramana 87 (2016) 39 Figure 11: Diagram describing the procedure used to estimate the background bias correction. All possible combinations of mixed hemispheres except those used for training are added together to create a large sample M of 96N events from which we repeatedly subsample without replacement 200 replicas M i of N events. The hemisphere mixing procedure is then carried out again for each of this replicas to produce a set of re-mixed data replicas R i . The trained multivariate classifier trained is then evaluated over all the events of M and each R i . and the histograms of the classifier output are compared to obtain a the differences for each of the replicas. The median difference is taken as bias correction. The variability due to the limited number of subsamples is estimated by bootstrap and it is shown for each estimation using a coloured shadow around the quantile estimation. The light yellow shadow represents the uncertainty due to the limited statistics of the reference observed sample. The separation between the one s.d. quantiles is compatible with the expected variance if the estimation was Poisson or Gaussian distributed.