Search for resonances decaying to a pair of Higgs bosons in the $\mathrm{b\overline{b}q\overline{q}'}\ell\nu$ final state in proton-proton collisions at $\sqrt{s}=$ 13 TeV

A search for new massive particles decaying into a pair of Higgs bosons in proton-proton collisions at a center-of-mass energy of 13 TeV is presented. Data were collected with the CMS detector at the LHC, corresponding to an integrated luminosity of 35.9 fb$^{-1}$. The search is performed for resonances with a mass between 0.8 and 3.5 TeV using events in which one Higgs boson decays into a bottom quark pair and the other decays into two W bosons that subsequently decay into a lepton, a neutrino, and a quark pair. The Higgs boson decays are reconstructed with techniques that identify final state quarks as substructure within boosted jets. The data are consistent with standard model expectations. Exclusion limits are placed on the product of the cross section and branching fraction for generic spin-0 and spin-2 massive resonances. The results are interpreted in the context of radion and bulk graviton production in models with a warped extra spatial dimension. These are the best results to date from searches for an HH resonance decaying to this final state, and they are comparable to the results from searches in other channels for resonances with masses below 1.5 TeV.

In this paper, we describe a search for narrow resonances (X) decaying to HH, where one H decays to a bottom quark pair (bb) and the other decays to a W boson pair, with at least one boson off-shell (WW * ). These are the most likely and second-most likely Higgs boson decay channels, respectively. The otherwise large SM background of jets produced via quantum chromodynamics processes, referred to as "multijet" background, is greatly reduced by considering the WW * final state in which one W boson decays to quarks (qq ) and the other to either an electron-neutrino pair (eν) or a muon-neutrino pair (µν). This search is optimized for particle mass m X > 0.8 TeV and employs new techniques for this channel. The search is performed on a data set collected in 2016 at the CERN LHC, corresponding to an integrated luminosity of 35.9 fb −1 of proton-proton (pp) collisions at √ s = 13 TeV.
The Higgs bosons have a high Lorentz boost because of the large values of m X considered, and the decay products of each one are produced in a collimated cone. The H → bb decay is reconstructed as a single jet, referred to as the bb jet, with high transverse momentum p T . The H → WW * decay is also reconstructed as a single jet, referred to as the qq jet, but with a nearby lepton (e or µ). In both cases, the jets are required to have a reconstructed topology consistent with a substructure arising from a boson decaying to two quarks. The semileptonic Higgs boson decay chain is reconstructed from both the visible decay products and the missing transverse momentum. A distinguishing characteristic of the signal is a peak in the two-dimensional plane of the bb jet mass m bb and the reconstructed HH invariant mass m HH .
The main SM background to this search arises from top quark pair tt production in which one top quark decays via a charged lepton (t → Wb → νb) and the other decays exclusively to quarks (t → Wb → qq b). The top quarks affecting this analysis have decay products that are collimated because of large boosts. In particular, the all-hadronic top quark decays can be misreconstructed as single bb jets. Peaks in the m bb distribution from this background correspond to fully contained top quark and W boson decays. The second-largest background is primarily composed of production of W bosons in association with jets (W+jets) and multijet events. Both W+jets and multijet background events are experimentally distinct from tt production, in part because their m bb distributions are smoothly falling.
In this analysis, the events are divided into 12 exclusive categories by lepton flavor, qq jet substructure, and bb jet flavor identification. The SM background and signal yields are then simultaneously estimated using a maximum likelihood fit to the two-dimensional distribution in the m bb and m HH mass plane.

The CMS detector
The central feature of the CMS apparatus is a superconducting solenoid of 6 m internal diameter, providing a magnetic field of 3.8 T. Within the solenoid volume are a silicon pixel and strip tracker, a lead tungstate crystal electromagnetic calorimeter (ECAL), and a brass and scintillator hadron calorimeter (HCAL), each composed of a barrel and two endcap sections. Forward calorimeters extend the coverage in pseudorapidity η provided by the barrel and endcap detectors. Muons are detected in gas-ionization chambers embedded in the steel flux-return yoke outside the solenoid. Events of interest are selected using a two-tiered trigger system [56]. The first level, composed of custom hardware processors, uses information from the calorimeters and muon detectors to select events at a rate of around 100 kHz within a time interval of less than 4 µs. The second level, known as the high-level trigger, consists of a farm of processors running a version of the full event reconstruction software optimized for fast processing, and reduces the event rate to around 1 kHz before data storage. A more detailed description of the CMS detector, together with a definition of the coordinate system used and the relevant kinematic variables, can be found in Ref. [57].

Simulated samples
Signal and background yields are extracted from data with a fit using templates of the twodimensional m bb and m HH mass distribution. The signal and background templates are obtained from samples generated using Monte Carlo simulation.
The signal process pp → X → HH → bbWW * is simulated for both the spin-0 and spin-2 resonance scenarios. The X bosons are produced via gluon fusion and have a 1 MeV resonance width, which is small compared to the experimental resolution. The samples are generated at leading order using the MADGRAPH5 aMC@NLO 2.2.2 generator [58] with MLM merging [59] for m X between 0.8 and 3.5 TeV.
The background processes are produced with a variety of generators. The same generator as for signal is used to produce tt, W+jets, multijet, Higgs boson production in association with a t quark, and Drell-Yan samples. Samples of WZ diboson production and the associated production of a top quark pair with either a W or Z boson are also generated with MAD-GRAPH5 aMC@NLO, but at next-to-leading-order with the FxFx jet merging scheme [60]. The WW diboson process, single top production, and ttH are generated with POWHEG v2 at next-toleading order [61][62][63][64][65][66][67][68]. Single top in the associated production (tW) and t-channel (tq) processes are included, but not s-channel (tb), which is negligible. All processes are initially normalized to the highest order total cross section calculation available.
For all samples, the parton showering and hadronization are simulated with PYTHIA 8.205 [69] using the CUETP8M1 [70] tune, with NNPDF 3.0 [71] parton distribution functions (PDFs). The simulation of the CMS detector is performed within the GEANT4 [72] framework. Additional pp collisions in the same or nearby bunch crossings (pileup) are simulated and the samples are weighted to have the same pileup multiplicity as measured in data.

Event reconstruction
Signal events and those from the primary SM background source, tt production with a singlelepton final state, have similar signatures. Both processes feature high-p T jets with substructure consistent with two or more quarks, jets containing b hadron decays, and leptons that originate from a W boson decay. Additional discrimination of signal events from background events is achieved by associating the lepton and each jet with a particle in the HH → bbWW * → bb νqq decay chain and applying mass constraints.
The particle-flow (PF) algorithm [73] aims to reconstruct and identify each individual particle in an event, with an optimized combination of information from the various elements of the CMS detector. The reconstructed vertex with the largest value of summed tracking-object p 2 T is taken to be the primary pp interaction vertex. These tracking objects are track jets and the negative vector sum of the track jet p T . Track jets are clustered using the anti-k T jet finding algorithm [74,75] with the tracks assigned to the vertex as inputs.

Electron and muon identification
Events are required to have exactly one isolated lepton. This lepton is associated with the leptonic W boson decay. Reconstructed electrons are required to have p T > 20 GeV and |η| < 2.5, and are identified with a high-purity selection to suppress the potentially large multijet background [76]. Muons are required to have p T > 20 GeV and |η| < 2.4, and to pass identification criteria optimized to select muons with >95% efficiency [77]. The impact parameter of lepton tracks with respect to the primary vertex is required to be consistent with originating from that vertex: longitudinal distance <0.1 cm, transverse distance <0.05 cm, and significance <4 standard deviations of the three-dimensional displacement. These criteria remove background events where the lepton is produced by a semileptonic heavy-flavor decay rather than a W boson decay. In addition, these criteria prevent incorrectly selecting a lepton from a heavy-flavor decay in signal events. Requiring leptons to be isolated from nearby hadronic activity is important to suppress background, but can also cause significant signal inefficiency because of the collinear decay of the Lorentz-boosted Higgs boson. This inefficiency is mitigated by using an isolation definition specifically designed for leptons from boosted decays [78]. The isolation metric I rel is the p T sum of the PF particles with ∆R < ∆R iso with respect to the lepton, divided by the lepton p T . The angular distance is ∆R = √ (∆η) 2 + (∆φ) 2 . The value ∆R iso is defined to be which preserves signal efficiency even in the case of high m X . The neutral particle contribution to I rel from pileup interactions is estimated and removed using the method described in Ref. [76]. Electrons are selected with I rel < 0.1, whereas muons, because of lower background rates, are selected with I rel < 0.2.
Muons in signal events have an approximate efficiency of 85% for m X = 0.8 TeV, decreasing to 70% for m X = 3.5 TeV, with isolation being the leading source of inefficiency compared to all other requirements. The efficiency to select electrons is lower, approximately 40% for m X = 0.8 TeV, decreasing to 6% for m X = 3.5 TeV. The leading source of electron inefficiency is a selection imposed at the reconstruction level on the ratio of the energy deposited in the HCAL to that deposited in the ECAL. Signal electrons typically fail this selection because of the nearby energy deposits from the hadronic W boson decay. Lepton reconstruction, identification, and isolation efficiencies are measured in a Z → data sample with a "tag-and-probe" method [79] and the simulation is corrected for any discrepancies with the data. There is generally much less hadronic activity in Z → events than in signal events, so these corrections are parameterized by nearby hadronic activity to ensure their applicability. The lepton selection efficiencies in simulation are found to be within 10% of those in data. The uncertainty in the correction is at its largest for high hadronic activity, with a maximum value of 10% for electrons and 5% for muons.

Jet clustering and momentum corrections
Two types of jets are used. Because the X bosons being considered here are much more massive compared to the mass of the Higgs bosons they decay into, the subsequent H → bb and W → qq decays are each reconstructed as single, merged jets. These jets are formed by clustering PF particles according to the anti-k T algorithm [74,75] with a distance parameter of 0.8, and are referred to as AK8 jets. The PF particle or particles associated with the lepton are not included in the clustering of this jet type in order to prevent the qq jet from containing the lepton's momentum. Jets of the second type, referred to as AK4 jets, are used to suppress background events from tt production by identifying additional jets originating from b quarks. These jets are also clustered according to the anti-k T algorithm, but with a distance parameter of 0.4. Jets of both types are required to have |η| < 2.4 to be within acceptance of the tracker. The AK8 jets are required to have p T > 50 GeV, whereas the threshold is 20 GeV for AK4 jets.
Jet momentum for both jet types is determined as the vectorial sum of all particle momenta in the jet, and is found from simulation to be, on average, within 5 to 10% of the true momentum over the whole p T spectrum and detector acceptance. Additional pp interactions within the same or nearby bunch crossings can contribute additional tracks and calorimetric energy depositions, increasing the apparent jet momentum. The pileup per particle identification (PUPPI) algorithm [80] is used to mitigate the effect of pileup at the reconstructed particle-level, making use of local shape information, event pileup properties, and tracking information. Charged particles identified to be originating from pileup vertices are discarded. For each neutral particle, a local shape variable is computed using the surrounding charged particles compatible with the primary vertex within the tracker acceptance (|η| < 2.5), and using both charged and neutral particles in the region outside of the tracker coverage. The momenta of the neutral particles are then rescaled according to their probability to originate from the primary interaction vertex deduced from the local shape variable [81]. Jet energy corrections are derived from simulation studies so that the average measured response of jets becomes identical to that of particle level jets. In situ measurements of the momentum balance in dijet, photon+jet, Z+jet, and multijet events are used to determine any residual differences between the jet energy scale in data and in simulation, and appropriate corrections are made [82]. Additional selection criteria are applied to each jet to remove jets potentially dominated by instrumental effects or reconstruction failures [83].

Hadronic boson decay reconstruction
In high-m X signal events, the H → WW * decay is reconstructed as an AK8 jet and a nearby lepton, with the jet itself containing two localized energy deposits, "subjets," one from each quark. Only the AK8 jet closest in ∆R to the lepton is considered for qq jet reconstruction. This jet satisfies qq jet reconstruction criteria if it is close to the lepton (∆R < 1.2) and if two subjets with p T > 20 GeV and |η| < 2.4 can be identified. The constituents of the jet are first reclustered using the Cambridge-Aachen algorithm [84,85]. The "modified mass drop tagger" algorithm [86,87], also known as the "soft drop" (SD) algorithm, with angular exponent β = 0, soft cutoff threshold z cut < 0.1, and characteristic radius R 0 = 0.8 [88], is applied to remove soft, wide-angle radiation from the jet. The subjets used in the analysis are those remaining after the algorithm has removed all recognized soft radiation. The purity of the qq jet reconstruction is quantified using the "N-subjettiness" variables τ N , which measure compatibility with the hypothesis that a jet originates from N subjets [89]. The τ N are obtained by first reclustering the jet into N subjets using the k T algorithm [90]. The variables are then calculated with these subjets as described in Ref. [89] with a characteristic radius R 0 = 0.8. The ratio of N-subjettiness variables, τ 2 /τ 1 , is used to discriminate qq jets originating from two-pronged W boson decays against those from single quarks or gluons.
Generally, the Higgs bosons in signal events have large Lorentz boosts and are produced with ∆φ ≈ π between them. Therefore, bb jet candidates are required to be AK8 jets with ∆φ > 2 from the lepton and ∆R > 1.6 from the qq jet. If there are two or more bb jet candidates, the one leading in p T is used. This jet is reconstructed as a bb jet if it is the leading or secondleading AK8 jet in p T , has p T > 200 GeV, and if two constituent subjets with p T > 20 GeV and |η| < 2.4 can be identified. The invariant mass of the two subjets is used to obtain m bb . The performance of the SD algorithm varies with bb jet p T , so simulation-derived m bb correction factors are applied as a function of p T to make the average m bb value be the Higgs boson mass m H = 125 GeV [91].

Jet flavor identification
Jets and subjets are identified as likely to have originated from b hadron decays using the combined secondary vertex b tagging algorithm [92]. Two operating points of the algorithm are used, which have similar performance on subjets and AK4 jets. A high-efficiency working point, referred to as "loose," has an efficiency of ≈80% and a light-quark or gluon misidentification rate of ≈10%. The "medium" operating point has an efficiency and misidentification rate of ≈60% and ≈1%, respectively. A "tight" operating point is not used. Jets or subjets with p T > 30 GeV and |η| < 2.4 are considered for b tagging. The b tagging efficiency and misidentification rate are measured in data, and the simulation is corrected for any discrepancy [92].

Semileptonic Higgs and X boson reconstruction
The missing transverse momentum vector p miss T is computed as the negative vector p T sum of all the PF candidates in an event [93]. The p miss T is modified to account for corrections to the energy scale of the reconstructed jets in the event. The p miss T is an estimate of the transverse momentum of the neutrino in the semileptonic Higgs boson decay chain. The longitudinal momentum p z of this neutrino is estimated by setting the invariant mass of the neutrino, the lepton, and the qq jet to m H and solving the corresponding second-order equation. If two real solutions exist, the one with the smaller magnitude is chosen. If the p z solution is complex, the real component of the solution is used. Choosing the other p z solution or incorporating the imaginary components does not improve the m HH resolution. The reconstructed momentum of the W boson that decays to leptons, referred to as the ν candidate, is obtained from the lepton and the estimated neutrino momenta. The WW * candidate momentum is then obtained from the combined ν candidate and the qq jet momenta. The invariant mass of this object and the bb jet is m HH .

Event selection and categorization
Events are included in this search if they pass the following criteria that indicate they originate from a X boson decay and are then divided into 12 independent categories. A separate set of criteria is used to define control regions, which are used to validate the modeling of background processes.

Event selection
Events are selected by the trigger system if they contain one of the following: an isolated electron with p T > 27 GeV, an isolated muon with p T > 24 GeV, or H T > 800 GeV (900 GeV for the last quarter of data taking), where H T is the scalar sum of jet p T for all AK4 online jets with p T > 30 GeV. A combination (inclusive OR) of lepton and H T triggers is used because the online lepton isolation selection is inefficient for high-m X signal, which provides two highp T , collimated Higgs boson decays. These events have large H T and are instead selected with higher efficiency by the H T trigger. Additional multi-object triggers supplement these two single-object triggers with looser isolation and H T requirements, thereby maintaining high signal trigger efficiency for the entire m X analysis range. The pileup correction for H T is the same offline as in the trigger. The trigger efficiency is measured for tt events in data and is >94% for events passing H T and lepton p T offline selection criteria. The simulation is corrected so that its trigger efficiency matches the efficiency measured with data. The trigger efficiency for signal events is 98% for m X = 0.8 TeV and >99% for m X > 1 TeV.
Offline, events are required to have H T > 400 GeV and a lepton with p T > 30 GeV for electrons and p T > 26 GeV for muons. Background events from Drell-Yan production are suppressed by rejecting events that contain additional leptons with p T > 20 GeV. Events are further required to have a qq jet and a bb jet. Background from tt production is reduced by vetoing events with AK4 jets that are ∆R > 1.2 from the bb jet and pass the medium b tagging operating point.
Jets in multijet and W+jets events tend to be produced at higher |η| than those produced in signal events, which contain jets from the decay of a heavy resonance. The ratio p T /m, which is the WW * candidate p T divided by m HH , exploits this property and is especially effective at high m HH . Events are required to have p T /m > 0.3. A m H constraint on the WW * candidate is not useful because it is already imposed in the neutrino momentum calculation. However, there is discrimination because the decay chain involves a two-body decay as an intermediate step. We define a variable m D ≡ p T ∆R/2, where ∆R is the separation of the two reconstructed W bosons and the p T is that of the WW * candidate. This variable is based on an approximate expression for the opening angle of a highly boosted, massive particle decay. The selection m D < 125 GeV is applied and has a high efficiency for signal events. The m D and p T /m distributions are shown in Fig. 1. This figure is shown only to illustrate how these variables are used to discriminate signal events from background events; the simulated distributions are pre-modeling and pre-fit. The initial discrepancy in m bb near 50 GeV is later accommodated by the maximum likelihood fit.

Event categorization
Events are categorized by event properties that reflect the signal purity. The categorization allows for a single set of selections that targets the full m X range, which is preferable to search categories that are optimized for different mass ranges. Electron and muon events are separated because their efficiencies for background and signal are different, resulting in different signal purities. The electron and muon categories are labeled "e" and "µ," respectively, in the figures. There are three categories of b tagging, evaluated by counting the number of subjets in the bb jet that pass b tagging operating points. The first, labeled "bL," is composed of events in which one subjet passes the medium operating point and the other does not pass the loose operating point. Events with one subjet passing the medium operating point and one passing the loose but not the medium operating point are denoted "bM," and those with two subjets passing the medium operating point are labeled "bT." The final categorization is based on the τ 2 /τ 1 N-subjettiness ratio of the qq jet, referred to as qq τ 2 /τ 1 . Events with 0.55 < qq τ 2 /τ 1 < 0.75  Pre-modeling and pre-fit distributions of the discriminating variables, which are described in the text, are shown for data (points) and SM processes (filled histograms) as predicted directly from simulation. The statistical uncertainty of the simulated sample is shown as the hatched band. The solid lines correspond to spin-0 signals for m X of 1 and 2.5 TeV. The product of the cross section and branching fraction to two Higgs bosons is set to 2 pb for both signal models. The lower panels show the ratio of the data to the sum of all background processes. fall into the low-purity category, "LP," while those with qq τ 2 /τ 1 < 0.55 are included in a high-purity category, "HP." The qq τ 2 /τ 1 distribution is shown in Fig. 1. Events are divided into all combinations of categories for a total of 12 exclusive selections. When describing a single selection, the category label is a combination of those listed above. For example, the tightest b tagging category with a low-purity qq τ 2 /τ 1 selection in the electron channel is: "e, bT, LP." The categories and their corresponding labels are summarized in Table 1.
The search is performed in these categories for 30 < m bb < 210 GeV. Events below 30 GeV would provide little sensitivity and would be relatively difficult to model since these are events for which the SD algorithm results in nearly all of the jet energy being removed. The m bb distribution is displayed in Fig. 1. Events with 700 < m HH < 4000 GeV are analyzed. The lower bound is chosen such that the m HH distribution is monotonically decreasing for background events. The upper bound is far above the highest mass event observed in data. For spin-0 sce- Table 1: Event categorization and corresponding category labels. The 12 independent event categories are formed by applying a single selection from each of the three categorization types. For the bb jet subjet b tagging type, "medium" refers to the subjets that pass the medium b tagging operating point and "loose" refers to those that pass the loose, but not the medium, operating point.

Categorization type
Selection Category label Lepton flavor Electron e Muon µ bb jet subjet b tagging One medium bL One medium and one loose bM Two medium bT qq jet substructure 0.55 < qq τ 2 /τ 1 < 0.75 LP qq τ 2 /τ 1 < 0.55 HP narios, the selection efficiency for X → bbqq ν events to pass the criteria of any event category is 9% at m X = 0.8 TeV. The efficiency increases with m X to 18% at m X = 1.2 TeV because the Higgs boson decays become more collimated. Above 1.2 TeV the selection efficiency decreases to a minimum of 9% at m X = 3.5 TeV because of the combination of lower b tagging efficiency for high-p T jets and the worsening of the lepton isolation for extremely collimated Higgs boson decays. The Higgs bosons in spin-2 signal events are more central in polar angle than those from spin-0 signal, resulting in a larger selection efficiency, ≈15%, relative.

Control region event selection and categorization
Two control regions are used to validate the SM background estimation and to obtain systematic uncertainties. The first, labeled "tt CR," targets backgrounds with top quarks, specifically tt production. Such events are selected by inverting the AK4 jet b-tagging veto. The m D and p T /m selections are removed to increase the statistical power of the sample. This control region is then divided into the 12 categories previously described. Overall, the m bb and m HH shapes in this control region are very similar to the shapes in the signal region for the backgrounds that contain top quarks. A correction to the simulated top quark p T spectrum in tt events is measured in this region and applied to simulation as a normalization correction; the final value of the normalization and its uncertainty ultimately come from the two-dimensional fit to signal and background. While the tt CR is an adequate probe of processes that involve top quarks, it is not sensitive to the multijet or W+jets backgrounds. Instead, a second control region, labeled "q/g CR," is used to study the modeling of these processes. The selection of events in this control region is the same as for the signal region, except that the bb jet is required to have no subjets passing the loose b tagging operating point. As a result, the events in this control region are not categorized by bb jet b tagging, but is still categorized by lepton flavor and qq τ 2 /τ 1 .

Background and signal modeling
The search is performed by simultaneously estimating the signal and background yields using a maximum likelihood fit to the data in the 12 event categories. The data are binned in two dimensions, m HH and m bb , with the ranges specified in Section 5 and with bin widths of 25 and 2 GeV, respectively. The bin widths are smaller than the mass resolutions, but large enough to keep the number of bins computationally tractable. Each processes is modeled with two-dimensional templates, one for each event category. The templates are created using simulation. Because of the limited size of the simulated samples, we employ methods to smooth the background distributions. Shape uncertainties that account for possible differences between data and simulation are included while executing the fit. This fitting method was previously presented in Ref. [49].

Background categorization
Background events are separated into four generator-level categories, each with distinct m bb shapes. The categories are defined by counting the number of generator-level quarks from the immediate decay of a top quark, W boson, or (rarely) Z boson within ∆R < 0.8 of the bb jet axis. The first, labeled "m t background," is the component in which all three quarks from a single top quark decay fulfill this criterion. The second is labeled "m W background" and consists of those events that are not labeled m t background but in which both quarks from a W or Z boson fall within the jet cone. Both of these backgrounds contain resonant peaks in the m bb shape corresponding to either the top quark or W boson mass. The "lost t/W background" contains events with partial decays within the bb jet, identified as events in which at least one quark is contained within the jet cone, but does not satisfy one of the previous two requirements. The last category, "q/g background," designates all other events. The first three categories are primarily composed of tt events, while the last is a composite of W+jets, multijet, and tt events. The background categorization is summarized in Table 2.

Template creation strategy
A template is produced for each of the 12 event categories, for each of the four backgrounds.
To reduce statistical fluctuations in the templates, each is generated from an initial smooth template created by relaxing requirements or by combining categories. In all cases, the regions with relaxed criteria are chosen such that the shapes for these regions are similar to those for the full event selection. The final template for each event category and background is produced by fitting the high-statistics template to the simulated samples for that category's event selection. The fit is performed in a similar manner to the fit to data and with a similar parameterization of the template shape. The templates are compared to simulation after applying the full event selection and any deviations in shape are found to be much smaller than the statistical uncertainty of the data sample.
While this procedure increases the statistical power of the simulation samples, the multijet background simulation sample cannot be produced with a large enough effective integrated luminosity to be directly used in the template creation. Instead, the similarity of m bb reconstruction for W+jets and multijet events is exploited. Both these processes have bb jets that are composed of at least one quark or gluon that is misidentified as a bb jet, resulting in nearly identical monotonically falling m bb shapes. Both processes also have similar relative fractions in the bL, bM, and bT categories. Because the W+jets and multijet backgrounds are so similar, the multijet component of the background is included in the template creation by rescaling the W+jets background to account for the number of multijet events predicted by the simulation. Although the multijet background simulation sample cannot be used in the two-dimensional template creation, it is used to calculate this one-dimensional, m HH -parameterized correction that depends on both qq τ 2 /τ 1 and lepton flavor.

Background process modeling
The background templates are modeled as conditional probabilities of m bb as a function of m HH so that the templates include the correlation of these two variables. The two-dimensional probability distribution is P bkg (m bb , m HH ) = P bb (m bb |m HH , θ 1 )P HH (m HH |θ 2 ), where P HH and P bb are one-dimensional probability distributions and the θ 1 and θ 2 are nuisance parameters used to account for shape uncertainties. A parametric function that models the full m HH range for background events is difficult to obtain from first principles. Instead, a non-parametric approach is taken. The P HH are produced from the one-dimensional m HH histograms with kernel density estimation (KDE) [94][95][96]. The smoothing of the P HH distributions is controlled by parameters within the KDE framework called bandwidths. Gaussian kernels with adaptive bandwidths are used because the event density for this distribution varies strongly with m HH and a single, global bandwidth is not suitable for the full distribution. These adaptive bandwidths depend on a first iteration estimate of P HH , which itself is produced with KDE. However, for this first iteration a global bandwidth h is used that scales as The sums are over all events in the simulation sample and the w i are the individual event weights. This formulation is chosen to minimize the mean integrated squared error of the estimate. For the adaptive estimates, the bandwidths h i associated with each event are where the f (x i ) are the estimated event densities at the location x i of the event and g is a normalization factor such that the global bandwidth scale is controlled by h. As discussed in Ref. [97], adaptive KDE can result in overestimation of the distribution tails in the case of large bandwidths being applied. This is ameliorated by imposing a maximum bandwidth value, which is usually chosen to be 1-5 times larger than the median bandwidth. The m HH tail is further smoothed by fitting with an exponential function for m HH 2 TeV.
The P bb distributions are obtained for the m t and m W backgrounds by fitting m bb histograms with a double Crystal Ball function [98,99]. This function has a Gaussian core, which is used to model the bulk of the m bb distribution, and power-law tails, which describe the effects of more severe jet misreconstruction. The fits are performed for events binned in m HH to capture the evolution of the m bb shape with m HH . The double Crystal Ball function parameters are then interpolated between m HH bins. The P bb distributions for the lost t/W and q/g backgrounds are estimated from the two-dimensional histograms with two-dimensional KDE. Independent adaptive bandwidths and bandwidth upper limits are used for each dimension when forming the P bb . Similar to the derivation of the P HH , the m HH tails are smoothed with exponential function fits. Simulation yields are used as the initial values of the background yields in the fit to data.

Signal process modeling
The signal templates are also modeled as conditional probabilities P signal (m bb , m HH |m X ) = P HH (m HH |m bb , m X , θ 1 )P bb (m bb |m X , θ 2 ).
The P signal distributions are first obtained for discrete m X values by fitting histograms of the signal mass distributions. Models continuous in m X are then produced by interpolating the fit parameters. The P bb distributions are created by fitting m bb histograms with a double Crystal Ball function, and the resonance resolution is ≈10%. The shape for the bL categories also includes an exponential function to model the small fraction of signal events with no resonant peak in the distribution.
The P HH distributions are also modeled with a double Crystal Ball function, but with a linear dependence on m bb , parameterized by ∆ bb = (m bb − µ bb )/σ bb . The µ bb and σ bb are the mean and standard deviation parameters from the fit to m bb , respectively. The mean of the Crystal Ball function µ HH is then where µ 0 and µ 1 are fit parameters. This parameterization models the characteristic that a mismeasurement of the bb jet results in a mismeasurement of m HH . The standard deviation of m HH , denoted as σ HH , also depends on m bb , where σ 0 and σ 1 are fit parameters. An undermeasurement of m bb can be caused by the SD algorithm removing energy from the Higgs boson decay. In such a scenario, the correlation between the two variables worsens and the m HH resolution becomes wider. For |∆ bb | > 2.5, only the values at the boundary are used since the correlation does not hold for severe mismeasurements. The m HH resolution is ≈6% for m X = 1 TeV, decreasing to 4% for m X = 3 TeV.
The product of the acceptance and efficiency for X → HH events to be included in the individual event categories is taken from simulation. As for the shape parameters, the efficiency is interpolated in m X . Uncertainties in the relative acceptances and in the integrated luminosity of the sample are included in the maximum likelihood fit that is used to obtain confidence intervals on the X → HH process. The modeling is tested by fitting the templates to pseudoexperiments with injected signal and no significant bias in the fitted signal yield is found.

Validation of background models with control region data
The background models are validated by analyzing the tt CR and q/g CR data samples. For both control regions, background templates are constructed in the same way as for the standard event selection, except that they are made to model the control region selection. The background templates are then fit to the control region data with the same systematic uncertainties that are used in the standard maximum likelihood fit. The result of the simultaneous fit is shown in Fig. 2 for both control regions. To improve visualization, the displayed binning in Figs. 2-6 is coarser than that used in the maximum likelihood fit. The projections in both mass dimensions are shown for the combination of all event categories. The fit result models the data well, indicating that the shape uncertainties can account sufficiently for potential differences between data and simulation.

Systematic uncertainties
Systematic uncertainties are included in the maximum likelihood fit as nuisance parameters. The initial background uncertainties are large, but are constrained by the fit to the data. They are chosen to cover differences between simulation and data in the tt CR and the q/g CR. The fit result does not depend strongly on the initial uncertainty values, as they function only as loose constraints for the fit. Nuisance parameters for shape uncertainties are modeled as Gaussian functions, whereas log-normal functions are used for normalization uncertainties. The m bb scale and resolution uncertainties for the signal, the m t background, and the m W background are evaluated as uncertainties in the mean and standard deviation of the double Crystal Ball function parameters, respectively. The signal m HH scale and resolution uncertainties are handled in the same manner. The other background shape uncertainties are implemented as alternative background templates. Each alternative template is produced by shifting the nominal background template, bin-by-bin, by a factor that depends on either m HH or m bb . The magni-tude of these factors are subsequently constrained as nuisance parameters. Each uncertainty is listed in Table 3 with its initial size and the number of independent nuisance parameters that are included in the fit to model it. Table 3: The systematic uncertainties included in the maximum likelihood fit. The "type" indicates if the uncertainty affects process yield Y or the shape of the m bb or m HH distributions. The processes that they are assigned to, their initial sizes, and the number of independent nuisance parameters, N p , for each uncertainty are listed. Uncertainty sizes that vary by event category are listed with category labels. The labels Y, S, and R denote how a single uncertainty affects yield, scale, and resolution, respectively.

Background normalization uncertainties
Since the main source of the m t , m W , and lost t/W backgrounds is tt production, some uncertainties are applied by treating the three categories as a single component, referred to collectively as the "non-q/g background." The fraction of each of the three categories within the combination is determined from the overall b tagging efficiency and the bb jet p T distributions. Additional uncertainties are then assigned to the modeling of their relative composition.
For each event category, the q/g background and the non-q/g background each have a large initial normalization uncertainty that is uncorrelated among categories. The relative composition of the three tt backgrounds is controlled in two ways. First, the m W and lost t/W back-grounds have independent normalization uncertainties per b tagging category. In both cases, the m t background normalization is varied in an anticorrelated manner such that the non-q/g background normalization does not change. Second, the composition is allowed to vary linearly with m HH to account for bb jet reconstruction effects that depend on bb jet p T . This is implemented with a m HH shape uncertainty that only shifts the m t background spectrum. There is one such independent nuisance parameter per b tagging category. Three other nuisance parameters shift the m W and lost t/W backgrounds together.

Background shape uncertainties
The jet mass scale and resolution after applying the SD algorithm are measured for W boson decays merged into single jets in data with tt events, using the known W boson mass. The mass scale and resolution in the simulation are found to agree with the data within uncertainties. These measurements determine the uncertainties in the m bb scale and resolution of the m t and m W backgrounds. For the lost t/W and q/g backgrounds, nuisance parameters are used to account for mismodeling of the simulated energy scale or the low-mass region by morphing the template shapes using a factor that is either proportional to, or inversely proportional to m bb , respectively. Other, and more complex, parameterizations were studied but were found unnecessary. The m bb shapes do not vary strongly with lepton flavor or qq τ 2 /τ 1 , so a single pair of uncorrelated nuisance parameters is applied per background and b tagging category.
Mismodeling of the background p T spectrum could manifest as an incorrect m HH scale. This is accounted for by morphing the background templates by multiplicative factors proportional to m HH . Possible mismodeling of the m HH resolution is considered in a similar manner, but with multiplicative factors proportional to m −1 HH . A pair of scale and resolution uncertainties is assigned to the non-q/g background spectrum for each event category. An independent set of m HH uncertainties for the q/g background is also included.

Signal uncertainties
A 2.5% uncertainty in the integrated luminosity [100] is included as a signal normalization uncertainty. Signal acceptance uncertainties from the choices of PDF, factorization scale, and renormalization scale are also applied. The scale uncertainties are obtained following the prescription found in Refs. [101,102], and the PDF uncertainty is evaluated using the NNPDF 3.0 PDF set [71]. Both the simulated trigger selection efficiency and the lepton selection efficiencies are corrected to match the data efficiencies. The uncertainties in these measurements are included as independent uncertainties in the electron and muon channel signal yields. Uncertainties in the jet energy scale, resolution, and unclustered energy resolution affect signal acceptance, m HH scale, and m HH resolution. The same m bb scale and resolution uncertainties that are applied to the m t and m W backgrounds are applied to the signal. In this case, the background and signal uncertainties are 100% correlated.
The bb jet b tagging efficiency uncertainty is included as a single nuisance parameter that varies the signal normalization in each b tagging category. The uncertainty depends on m X , with a maximum size of 10, 4, and 4% for the bT, bM, and bL categories, respectively. The bL category normalization uncertainty is anticorrelated with the other two uncertainties. A normalization uncertainty is assigned to the efficiency for passing the AK4 jet b tagging veto. The qq τ 2 /τ 1 selection efficiency is measured in a tt data sample for W bosons decaying to quarks. The uncertainty in this measurement is included as an uncertainty in the HP and LP category relative yields. An additional extrapolation uncertainty is applied because the jets in this sample have lower p T than those in signal events. The uncertainty depends on m X , with a maximum value of 7% for m X = 3.5 TeV. The LP and HP selection efficiency uncertainties are anticorrelated.

Results
The data are interpreted by performing a maximum likelihood fit for a model containing only background processes and one containing both background and signal processes. The background-only fit is found to model the data. We interpret the results as upper limits on the product of the X production cross section and the X → HH branching fraction (σB).
The quality of the fit is quantified with the generalized χ 2 goodness-of-fit test using saturated models [103]. The probability distribution function of the test statistic is obtained with pseudoexperiments and the observed value is within the central 68% quantile of expected results. The best fit values of the nuisance parameters are consistent with the initial uncertainty ranges.
The fit result and the data are projected in m bb for each event category in Figs. 3 and 4. The shape is modeled well, with each background category contributing to a specific subset of the mass range. In particular, the resonant peaks associated with W boson and top quark decays are correctly modeled by the fit. Similarly, the projection in m HH for each event category is shown in Figs. 5 and 6. Good agreement is found for the entire m HH mass range.
Model-independent upper limits are shown in Fig. 7. The 95% confidence level (CL) limits are shown for varying m X and both the spin-0 and spin-2 boson scenarios. The limits are evaluated using the asymptotic approximation [104] of the CL s method [105,106]. The observed exclusion limit is consistent with the expected limit; the most significant deviation between the two is about 1.5 standard deviations at m X ≈ 2.3 TeV. The sources of the discrepancy are small excesses in data at high m HH for the µ, bM, LP and µ, bL, HP event categories. The m X = 0.8 TeV spin-0 signal is excluded for σB > 123 fb, with the exclusion limit strengthening to σB > 8.3 fb for m X = 3.5 TeV signal. The higher signal acceptance for spin-2 signal results in stronger constraints on σB: >103 fb for m X = 0.8 TeV signal and >7.8 fb for m X = 3.5 TeV signal. This search yields the best limits in this decay channel for X → HH production. It has similar sensitivity to resonances with m X ≈ 1 TeV to searches performed in other channels [39,48]. This search is less sensitive to m X 1.5 TeV resonances because of the degradation of the lepton selection efficiency for events with very large boost.   Figure 7: Observed and expected 95% CL upper limits on the product of the cross section and branching fraction to HH for a generic spin-0 (left) and spin-2 (right) boson X, as a function of mass. Example radion and bulk graviton predictions are also shown. The HH branching fraction is assumed to be 25 and 10%, respectively.
Predicted radion and bulk graviton cross sections [107] are also shown in Fig. 7 in the context of Randall-Sundrum models that allow the SM fields to propagate though the extra dimension. Typical model parameters are chosen as proposed in Ref. [108]. For radions, a branching fraction of 25% to HH and an ultraviolet cutoff Λ R = 3 TeV are assumed. A 10% branching fraction is assumed for bulk gravitons, which occurs in scenarios that include significant coupling between the bulk graviton and top quarks. Bulk graviton production cross sections depend on the dimensionless quantity k = √ 8πk/M Pl , where k is the curvature of the extra dimension and M Pl is the Planck mass. For this interpretation, we choose k = 0.1 and 0.3. For these particular signal parameters the radion and bulk graviton decay widths are larger than the 1 MeV width chosen for signal sample generation, but smaller than the detector resolution.

Summary
A search has been presented for new particles decaying to a pair of Higgs bosons (H) where one decays into a bottom quark pair (bb) and the other into two W bosons that subsequently decay into a lepton, a neutrino, and a quark pair. The large Lorentz boost of the Higgs bosons leads to the distinct experimental signature of one large-radius jet with substructure consistent with the decay H → bb and a second large-radius jet with a nearby isolated lepton consistent with the decay H → WW * . This search uses a sample of proton-proton collisions collected at √ s = 13 TeV by the CMS detector, corresponding to an integrated luminosity of 35.9 fb −1 . The primary standard model background, top quark pair production, is suppressed by reconstructing the HH decay chain and applying mass constraints. The signal and background yields are estimated by a two-dimensional template fit in the plane of the bb jet mass and the HH resonance mass. The templates are validated in a variety of data control regions and are shown to model the data well. The data are consistent with the expected standard model background.
The results represent upper limits on the product of cross section and branching fraction for new bosons decaying to HH. The observed limit at 95% confidence level for a spin-0 resonance ranges from 123 fb at 0.8 TeV to 8.3 fb at 3.5 TeV, while the limit for a spin-2 resonance is 103 fb at 0.8 TeV and 7.8 fb at 3.5 TeV. These are the best results to date from searches for an HH resonance decaying to this final state. The results are comparable to the results from searches in other channels for resonances with masses below 1.5 TeV.   [36] ATLAS Collaboration, "Search for Higgs boson pair production in the bbWW * decay mode at √ s = 13 TeV with the ATLAS detector", (2018). arXiv:1811.04671. Submitted to JHEP.  [41] CMS Collaboration, "Search for production of Higgs boson pairs in the four b quark final state using large-area jets in proton-proton collisions at √ s = 13 TeV", JHEP 01 (2019) 040, doi:10.1007/JHEP01(2019)040, arXiv:1808.01473.
[42] CMS Collaboration, "Searches for a heavy scalar boson H decaying to a pair of 125 GeV Higgs bosons hh or for a heavy pseudoscalar boson A decaying to Zh, in the final states with h → ττ", Phys. Lett