Boosted Higgs $\rightarrow b\bar{b}$ in vector-boson associated production at 14 TeV

The production of the Standard Model Higgs boson in association with a vector boson, followed by the dominant decay $H \rightarrow b\bar{b}$, is a strong prospect for confirming and measuring the coupling to $b$-quarks in $pp$ collisions at $\sqrt{s}=14$ TeV. We present an updated study of the prospects for this analysis, focussing on the most sensitive highly Lorentz-boosted region. The evolution of the efficiency and composition of the signal and main background processes as a function of the transverse momentum of the vector boson is studied over the region $200-1000$ GeV, comparing a conventional dijet selection with a jet substructure selection. The lower transverse momentum region ($200-400$ GeV) is identified as the most sensitive region for the Standard Model search, with higher transverse momentum regions not improving the statistical sensitivity. For much of the studied region ($200-600$ GeV), a conventional dijet selection performs as well as the substructure approach, while for the highest transverse momentum regions ($>600$ GeV), which are particularly interesting for Beyond the Standard Model and high luminosity measurements, the jet substructure techniques are essential.


Introduction
Following the discovery of a Higgs boson [1,2] with a mass of around 125 GeV, principally via its decay to gauge bosons ($\gamma$, $Z$, $W$), the task of confirming and then measuring the presumed dominant decay to $b\bar{b}$ remains a priority and a challenge. The most sensitive searches for this decay mode to date are in the "boosted" region of the $VH$ production channel, that is, when the Higgs ($H$) and the vector boson ($V$) both have transverse momentum $p_T > 200$ GeV or so. Two approaches can be used to reconstruct the Higgs boson in this region: two nearby, separate "resolved" $b$-jets can be identified, or a single "fat" jet can be found and decomposed using jet substructure techniques.
The use of jet substructure techniques to identify hadronically-decaying boosted, massive particles was suggested some time before the start-up of the Large Hadron Collider [3,4], and has seen much phenomenological and experimental activity and progress over recent years (see [5] for a recent overview). Jet substructure and/or "grooming" techniques have claimed many successes in recent measurements and searches. In particular, they have been shown not only to be robust against soft QCD effects such as underlying event and multiple proton-proton interactions (pile-up), but in some cases to be an essential tool for reducing their impact [6,7].
An early expectation was that boost, and hence jet substructure, would be important for identifying the $b\bar{b}$ decay mode of a low-mass Higgs boson [8]. The searches to date for this decay mode using LHC data [9,10] indeed gain most of their sensitivity from the boosted region, in which the Higgs and the vector boson both have transverse momentum $p_T > 200$ GeV, but do not exploit jet substructure. One reason for this is the excellent performance of the anti-$k_T$ jet algorithm [11] used by both ATLAS and CMS. When run with a radius parameter of $R = 0.4$ (ATLAS) or $0.5$ (CMS), a good mass resolution is obtained along with well-defined jet separation, even for jet pairs which are quite boosted. Another is the fact that the mass of the Higgs boson, at 125 GeV, turned out to be towards the high end of the applicability of the jet substructure methods, which would have been most effective for a 115 GeV Higgs boson. Finally, a major reason is assumed to be the fact that the LHC has not yet reached its design energy of 14 TeV, but ran in 2010 and 2011 at centre-of-mass energies of 7 TeV, and in 2012 at 8 TeV. The lower centre-of-mass energy shifts the balance in favour of the un-boosted region of phase-space with respect to the expectations at 14 TeV, reducing the high-$p_T$ fraction of the cross section substantially.
We examine these assumptions, and re-evaluate the potential impact of using jet substructure techniques to decompose a large-radius "fat" jet on the search for the $H \rightarrow b\bar{b}$ decay in the $VH$ channel in the 14 TeV era, by conducting a particle-level study of boosted $WH$, $H \rightarrow b\bar{b}$ production. Although we only consider the $WH$, $H \rightarrow b\bar{b}$ channel, we expect the conclusions on the resolved and jet substructure approaches to be largely applicable to the $ZH$, $H \rightarrow b\bar{b}$ channels.

Event Generation and Selection
Candidate $W$ bosons are identified by requiring a muon with $p_T > 20$ GeV and absolute pseudorapidity $|\eta| < 3.0$, as well as a neutrino with $p_T > 20$ GeV. Only events in which the $p_T$ of the $W$ is greater than 200 GeV are considered. It is assumed, based on previous measurements, that the presence of a high-$p_T$ lepton, as well as two highly boosted $b$-jets, allows for very efficient triggering, and that there is negligible efficiency loss due to the trigger within the acceptance.
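The $W$ candidate requirements above can be sketched at particle level as follows. This is a minimal illustration, not the Rivet routine used in the study: the dictionary-based particle representation and the function names are assumptions made for the example, and the $W$ transverse momentum is taken to be the magnitude of the vector sum of the muon and neutrino transverse momenta.

```python
import math

def w_transverse_momentum(muon, neutrino):
    """pT of the W candidate: vector sum of the muon and neutrino pT."""
    px = muon["pt"] * math.cos(muon["phi"]) + neutrino["pt"] * math.cos(neutrino["phi"])
    py = muon["pt"] * math.sin(muon["phi"]) + neutrino["pt"] * math.sin(neutrino["phi"])
    return math.hypot(px, py)

def passes_w_selection(muon, neutrino):
    """Apply the cuts quoted in the text (all momenta in GeV)."""
    if muon["pt"] <= 20.0 or abs(muon["eta"]) >= 3.0:
        return False
    if neutrino["pt"] <= 20.0:
        return False
    return w_transverse_momentum(muon, neutrino) > 200.0

# Hypothetical example particles (illustrative values only)
collimated = passes_w_selection(
    {"pt": 150.0, "eta": 0.5, "phi": 0.0},
    {"pt": 100.0, "phi": 0.2},
)
back_to_back = passes_w_selection(
    {"pt": 150.0, "eta": 0.5, "phi": 0.0},
    {"pt": 100.0, "phi": math.pi},
)
```

In the first hypothetical event the lepton and neutrino are collimated, so the $W$ candidate is boosted and passes; in the second they are back-to-back and the candidate fails the 200 GeV requirement.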
Two jet algorithms are used in this study: anti-$k_T$ jets with $R = 0.4$, and Cambridge/Aachen [12] jets with $R = 1.2$, split and filtered [8]. The analysis was performed using a Rivet [13] routine, making extensive use of FastJet [14]$^1$.
The geometrical matching of jets or subjets to $B$-hadrons is performed by requiring a $\Delta R$ condition$^2$ on their overlap, chosen to be less than 0.4 or 0.3, respectively. A variable-$R$ matching was also tried for the subjets, where $R$ was defined as the subjet radius, but was found to bring no significant improvement to the analysis sensitivity. This statement is in part dependent on the background composition, and in particular if charm rejection were to be significantly improved, variable-$R$ matching could bring benefits since it rejects more genuine $b\bar{b}$ events. If more than one $B$-hadron overlaps, the closest is chosen, and the matching continues with the remaining hadrons. Only $B$-hadrons with $p_T > 5$ GeV are considered. If a jet or subjet is not matched to a $B$-hadron, an additional check is performed with charm hadrons, to allow the experimental charm-quark mis-tag rate to be estimated. If both matching conditions fail, the jet is labelled as 'light'.
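The matching procedure described above can be illustrated with a short sketch. It is not the authors' Rivet code: the data layout and function names are hypothetical, and the treatment of charm hadrons (no $p_T$ threshold, each hadron used at most once) is an assumption, since the text specifies these details only for $B$-hadrons.

```python
import math

def delta_r(a, b):
    """Angular distance sqrt(dphi^2 + deta^2), with dphi wrapped into [-pi, pi)."""
    dphi = (a["phi"] - b["phi"] + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(a["eta"] - b["eta"], dphi)

def label_jets(jets, b_hadrons, c_hadrons, max_dr=0.4, min_b_pt=5.0):
    """Label each jet 'b', 'c' or 'light' by greedy closest-hadron matching.

    max_dr is 0.4 for jets and 0.3 for subjets in the text.
    """
    free_b = [h for h in b_hadrons if h["pt"] > min_b_pt]  # only B-hadrons above 5 GeV
    free_c = list(c_hadrons)  # assumption: no pT cut on charm hadrons
    labels = []
    for jet in jets:
        label = "light"
        # Try B-hadrons first, then fall back to charm hadrons
        for flavour, pool in (("b", free_b), ("c", free_c)):
            near = [h for h in pool if delta_r(jet, h) < max_dr]
            if near:
                # Take the closest hadron and remove it from the pool,
                # so the matching continues with the remaining hadrons
                pool.remove(min(near, key=lambda h: delta_r(jet, h)))
                label = flavour
                break
        labels.append(label)
    return labels

# Hypothetical toy event
jets = [{"eta": 0.0, "phi": 0.0}, {"eta": 1.0, "phi": 1.0}, {"eta": -2.0, "phi": 3.0}]
b_hadrons = [{"pt": 30.0, "eta": 0.1, "phi": 0.0}, {"pt": 3.0, "eta": 1.0, "phi": 1.0}]
c_hadrons = [{"pt": 10.0, "eta": 1.05, "phi": 1.0}]
labels = label_jets(jets, b_hadrons, c_hadrons)
```

In this toy event the second jet overlaps a $B$-hadron that fails the 5 GeV threshold, so it falls through to the charm check and is labelled as charm.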
Higgs boson candidates are selected in two different ways. In the resolved approach, the following requirements are applied:
• At least two anti-$k_T$ $R = 0.4$ jets with $p_T > 20$ GeV, $|\eta| < 3$;
• $\Delta R < 1.4$ between the two leading anti-$k_T$ jets;
• Each of the two leading jets is matched to a $B$-hadron.
In the substructure approach the following requirements are applied:
• At least one Cambridge/Aachen split and filtered jet with $p_T > 180$ GeV, $|\eta| < 3$;
• The two subjets with highest $p_T$ in the leading Cambridge/Aachen split and filtered jet are each matched to a $B$-hadron.
$^1$ The Rivet analysis code is available from the authors on request.
$^2$ Defined as $\Delta R = \sqrt{(\Delta\phi)^2 + (\Delta\eta)^2}$, where $\phi$ is the azimuthal angle.
After this event selection the dominant backgrounds are top-pair production ($t\bar{t}$) and $W+b\bar{b}$, with additional contributions from $Wt$ and $WZ$ processes. In addition to the vector boson candidate selection, a veto on the number of jets in the event is applied to suppress these backgrounds, such that events with more than three anti-$k_T$ jets with $p_T > 20$ GeV and $|\eta| < 5$ are rejected, and the sub-subleading anti-$k_T$ jet, if present, is required to be in the forward region ($|\eta| > 3.0$) or to have low transverse momentum (less than 10% of $p_T(W)$). These cuts are used to make a more realistic estimate of the signal-to-background ratio and significance. They carry significant theoretical uncertainties and experimental challenges, but do not strongly affect the comparison between the resolved and substructure approaches since they are the same for both$^3$.
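One possible reading of the jet veto is sketched below. The function name and jet representation are hypothetical, and the sketch assumes that the sub-subleading jet is the third jet, ordered in $p_T$, among those passing the counting requirements ($p_T > 20$ GeV, $|\eta| < 5$); the text does not spell this out.

```python
def passes_jet_veto(jets, w_pt):
    """Return True if the event survives the jet veto described in the text."""
    counted = sorted(
        (j for j in jets if j["pt"] > 20.0 and abs(j["eta"]) < 5.0),
        key=lambda j: j["pt"],
        reverse=True,
    )
    if len(counted) > 3:
        return False  # more than three jets: event rejected
    if len(counted) == 3:
        third = counted[2]
        # The third jet must be forward or soft relative to the W candidate
        return abs(third["eta"]) > 3.0 or third["pt"] < 0.1 * w_pt
    return True

# Hypothetical jets (pT in GeV)
two_b_jets = [{"pt": 150.0, "eta": 0.3}, {"pt": 90.0, "eta": -0.4}]
soft_central = two_b_jets + [{"pt": 25.0, "eta": 1.0}]
hard_forward = two_b_jets + [{"pt": 50.0, "eta": 3.5}]
```

With these hypothetical inputs, the soft central third jet is tolerated for $p_T(W) = 300$ GeV (threshold 30 GeV) but not for $p_T(W) = 220$ GeV (threshold 22 GeV), while the forward jet is always tolerated.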
In the simulation of signal and backgrounds, the calculation of the matrix elements is performed with aMC@NLO [15], including NLO corrections in QCD. The description of the processes is improved by matching the NLO calculation with a parton-shower program, in this case Herwig++ [16-18], which also includes models of the underlying event and hadronisation. The renormalisation and factorisation scales are dynamically defined as the sum of the transverse masses of all final state particles and partons$^4$. For all processes except $Wt$, the decays of the $t$, $W$ and $H$ are simulated using MadSpin [19], considering the $t \rightarrow Wb$, $W \rightarrow \mu\nu$, $W \rightarrow q\bar{q}'$ and $H \rightarrow b\bar{b}$ decay modes, with the branching ratios set to 1.0, 0.11, 0.68 and 0.58, respectively. For the $Wt$-channel event generation, the interference with $t\bar{t}$ is dealt with by the Diagram Removal scheme [20], and the $W$ and $t$ decays are performed by Herwig++. Multi-jet processes can also be a background to $WH$, $H \rightarrow b\bar{b}$ searches. However, their contribution is negligible in the boosted region and is disregarded here.
Pile-up is not simulated. Studies using full detector simulation indicate that jet grooming techniques can remove effects of pile-up to a large extent [5], even under extreme conditions [21], as can pile-up subtraction techniques in the case of anti-$k_T$ jets [22]. The presence of pile-up jets could also lead to a degradation in the efficiency of the jet veto cut. In this study we assume that sufficiently robust and efficient algorithms are available to reduce any efficiency loss due to pile-up jets to a negligible level. In any case, such an efficiency loss would affect signal and background equally, lowering the overall sensitivity without altering the main conclusions of the study on the relative performance of the resolved and substructure approaches.
The total rate of $t\bar{t}$ events is scaled by a factor of 1.25 based on an estimate of the impact of NNLO QCD contributions [23]. This assumes a uniform enhancement of the cross-section as a function of the top-quark $p_T$, and is therefore a conservative estimate of the expected behaviour$^5$. We note that the transverse momentum spectrum of $VH$ production is known to be subject to significant higher order corrections [26,27].
Events are weighted to take into account a $b$-tagging efficiency assumed to be 75%, and mis-tag rates of 15% for charm ($c$) and 1% for all other quarks and gluons ($l$). Although the requirement of two $b$-tagged jets reduces most of the $W$+jets background to $W+b\bar{b}$ events, the contribution from $W+c\bar{c}$ events is not negligible. Based on the yields obtained in the ATLAS result of [28], the $W+b\bar{b}$ process is scaled by a factor of 1.2 to account, approximately, for the $W+c\bar{c}$ contamination. Given that in the boosted region $W+ll$ was found to only make up ∼1% of the total background, it was deemed negligible and not included in this study.
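The effect of these tagging rates on an event with two candidate jets can be illustrated with a short sketch. It assumes that the event weight is simply the product of the per-jet tagging probabilities, which is the usual way to apply such rates at particle level but is not stated explicitly in the text; the names below are hypothetical.

```python
# Per-jet tagging probabilities quoted in the text, keyed by true flavour label
TAG_RATE = {"b": 0.75, "c": 0.15, "l": 0.01}

def double_tag_weight(flavour_1, flavour_2):
    """Weight for an event in which both candidate jets are required to be b-tagged."""
    return TAG_RATE[flavour_1] * TAG_RATE[flavour_2]

weights = {
    "bb": double_tag_weight("b", "b"),
    "bc": double_tag_weight("b", "c"),
    "bl": double_tag_weight("b", "l"),
}
```

Under this assumption a genuine $bb$ pair is kept with a weight of about 0.56, a $bc$ pair with about 0.11 and a $bl$ pair with about 0.0075, which shows why mis-tagged charm is a far larger concern than mis-tagged light jets.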

Signal Acceptance
The evolution of the signal efficiency for the resolved and substructure methods as a function of $p_T(W)$ is shown in Fig. 1a. These efficiencies are evaluated after applying the vector boson selection cuts described above, and requiring a Higgs boson candidate in the invariant-mass window $110 < m_H < 130$ GeV, but before applying the jet veto. Efficiencies for events which are uniquely reconstructed by each approach are shown as dashed lines.
The resolved method identifies significantly more events than does the substructure approach at lower $p_T(W)$ ($200-300$ GeV), and approximately 20% of the events reconstructed in the resolved case are missed by the substructure approach over the full $p_T(W)$ range, mostly due to a combination of the momentum balance condition of the splitting algorithm, the $B$-hadron-subjet matching requirements, and the mass window condition. The two algorithms have very similar performance in the ∼$300-550$ GeV region. A marked drop in the efficiency of the resolved method is observed when $p_T(W)$ exceeds 600 GeV, reflecting the increasing probability that the $b\bar{b}$ pair is emitted with an angular separation of less than 0.4, and thus fails to be reconstructed as two anti-$k_T$ $R = 0.4$ jets.
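The location of this drop can be cross-checked with the standard small-angle estimate for a two-body decay, $\Delta R \approx m / (p_T \sqrt{z(1-z)})$, where $z$ is the momentum fraction carried by one decay product. This rule of thumb is not taken from the study itself, and the sketch further assumes that the Higgs boson $p_T$ is comparable to $p_T(W)$; the function name is hypothetical.

```python
import math

def approx_opening_angle(mass, pt, z=0.5):
    """Small-angle estimate of the Delta R between the two decay products.

    For a symmetric decay (z = 0.5) this reduces to 2 * mass / pt.
    """
    return mass / (pt * math.sqrt(z * (1.0 - z)))

M_HIGGS = 125.0  # GeV

# Higgs pT at which a symmetric decay reaches Delta R = 0.4
pt_merge = 2.0 * M_HIGGS / 0.4
dr_at_200 = approx_opening_angle(M_HIGGS, 200.0)
```

For a 125 GeV Higgs boson the symmetric-decay separation reaches 0.4 at a transverse momentum of about 625 GeV, in line with the drop observed above 600 GeV.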
In the $p_T(W) > 200$ GeV region, events uniquely reconstructed by the substructure approach contribute ∼20% of the total acceptance, a contribution that increases to ∼70% when considering only the $p_T(W) > 600$ GeV region. Considering a luminosity of 150 fb$^{-1}$, this implies an additional ∼30 and ∼3 events (in the muon channel alone), respectively. This can be compared to the ∼120 and ∼1 signal events expected in the resolved case.
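The quoted fractions can be checked against the quoted event counts. The sketch assumes that the total acceptance is the sum of the resolved events and the events uniquely reconstructed by the substructure approach, and it uses the rounded counts given in the text, so only approximate agreement should be expected.

```python
def unique_fraction(n_resolved, n_unique_substructure):
    """Share of the combined yield contributed by substructure-only events."""
    return n_unique_substructure / (n_resolved + n_unique_substructure)

# Rounded yields quoted in the text for 150 fb^-1 (muon channel only)
frac_above_200 = unique_fraction(120.0, 30.0)
frac_above_600 = unique_fraction(1.0, 3.0)
```

The first ratio reproduces the quoted ∼20%. The second gives 75%, compatible with the quoted ∼70% given that the inputs are single-digit rounded counts.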
The impact of these efficiencies on the accessibility of the signal is demonstrated in Fig. 1b, which shows the $WH$ differential cross-section with respect to the $W$ transverse momentum, $p_T(W)$, multiplied by branching ratio and selection efficiency.

Background Estimation
In addition to the signal efficiency, the evolution of the signal-to-background ratio and significance with $p_T(W)$ are important figures of merit for assessing the feasibility of the $VH$, $H \rightarrow b\bar{b}$ channel and the usefulness of substructure techniques. Bearing in mind the limitations of a particle-level study, estimates of identification and reconstruction efficiencies for the main background processes have been made.
The background efficiencies for the resolved and substructure methods are shown in Fig. 2, as a function of $p_T(W)$. As with the signal efficiencies in Fig. 1a, they are evaluated after applying the boson selection cuts and mass window but before the jet veto. In general the background efficiencies show similar features to the signal, with a drop in efficiency (i.e. increased rejection) around $500-600$ GeV for the resolved method, which is not seen for the substructure method. The exception to this is the $W+b\bar{b}$ background, where the resolved efficiency does not drop as rapidly. This seems to be due to the fact that wide-angle $b\bar{b}$ pairs produced in the hard matrix element continue to feed into the boosted kinematic region as $p_T$ increases. We also note that below 400 GeV, the $WZ$ background is significantly higher in the substructure case, due to $Z \rightarrow b\bar{b}$ decays reconstructed with a mass above 110 GeV.
After the initial event selection, the jet veto rejects roughly 30% and 40% of signal events in the Higgs boson mass window with the resolved and substructure selection, respectively. It is however extremely effective in reducing the $t\bar{t}$ contamination in the mass window, rejecting over 90% of the events in both cases. The efficiency for $W+b\bar{b}$ events differs more between the methods, ranging from approximately 30% to 50%, with the best rejection achieved by the substructure approach.

Mass Distributions and Sensitivity
The invariant mass distributions are shown in Fig. 3 for both the resolved and substructure approaches for an integrated luminosity of 3000 fb$^{-1}$, with Table 1 showing the expected number of events in the $m_H$ window for each process. The top background has a peak in the same region as the signal, especially in the resolved case. The region of low invariant masses obtained with the substructure reconstruction has a very high purity of $W+b\bar{b}$ events and could in principle be useful as a control region for this background. Table 2 displays the categorisation of events in terms of the flavour composition of the leading and subleading jets: $bb$, $bc$ and $bl$. As expected, the signal is dominated by genuine $bb$ events. The $WZ$ and $W+b\bar{b}$ backgrounds are also dominated by $bb$, with a few percent contribution from mis-tags. However, most of the $t\bar{t}$ contamination comes from mis-tagged $bc$ events, a component which is even more significant in $Wt$ events$^6$.
The contribution of $bc$ to the $t\bar{t}$ background also increases as a function of $p_T(W)$, making up ∼85% of the $t\bar{t}$ background in both the resolved and substructure cases for $p_T(W) > 400$ GeV.
In the resolved case, the $bb$ component becomes negligible in this region, whilst it continues to contribute ∼5% in the substructure case, with the remaining component due to $bl$. The $bb$ contribution in the substructure case is composed of a significant fraction of $t\bar{t}$ pairs produced in association with additional heavy flavour jets. This becomes the dominant contribution for $p_T(W) > 400$ GeV, where it forms ∼70% of this background component. Given the large theoretical uncertainties on such production, this could add an additional level of difficulty in probing this region of phase space. In the resolved case there is a negligible fraction of $t\bar{t}$ pairs produced in association with additional heavy flavour jets in all $p_T(W)$ regions.
Improvements in $b$-tagging techniques, in both improving their level of charm-quark rejection and increasing the acceptance to identify additional $b$-jets in the events, are vital to reduce the $t\bar{t}$ contribution in the mass window of the Higgs boson.
All cross-sections fall rapidly with increasing $p_T(W)$, and the evolution of rates and shapes can be seen in Fig. 4 for the resolved and substructure cases. Despite the limited statistics, it is observed that in the resolved analysis, the shapes of the $W+b\bar{b}$ and $t\bar{t}(bb)$ background processes are kinematic in origin, and heavily dependent on the boost of the system.
An estimation of the signal sensitivity for both the resolved and substructure approaches is made, assuming integrated luminosities of 150 and 3000 fb$^{-1}$, corresponding to expectations for Run 2 of the LHC and for the eventual goal of a high luminosity upgrade. As well as the muon channel studied above, signal and background events originating from the electron decay channel are also taken into account, assuming the same acceptance.
The signal-to-background ratios are shown in bins of $p_T(W)$ in Table 3, calculated in the Higgs boson mass window. The substructure method achieves a higher $S/B$ in the $200 < p_T(W) < 400$ GeV range, and the values for higher boosts are compatible between the two methods, within the statistical uncertainties. Given the significant drop in signal efficiency obtained with the resolved approach for values of $p_T(W)$ greater than 600 GeV, a decrease in $S/B$ might have been expected. However, this drop is accompanied by a similar decrease in the background efficiency.
Table 3: Signal-to-background ratio and signal significances in the full boosted range and in each $p_T(W)$ bin. The figures of merit are calculated considering all events selected by the resolved and substructure selections, and also events that were uniquely selected by the latter, after the jet veto is applied. The acceptance from the electron channel is taken into account.
The $S/\sqrt{B}$ is calculated in bins of $p_T(W)$, as shown in Table 3. It is observed that the most significant event region corresponds to the range $200 < p_T(W) < 400$ GeV, where the resolved approach continues to perform well, and that higher boosts do not help in achieving a higher signal significance. This observation suggests that the great advantage in boosting the $VH$ system consists in reducing the combinatorial background and the large $t\bar{t}$ contribution, achieved with transverse momenta on the order of the Higgs boson mass. Higher $p_T$ values are not beneficial to the signal significance due to the extremely small signal cross-section.
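The two figures of merit, and their behaviour when moving from 150 to 3000 fb$^{-1}$, can be sketched as follows. The yields used here are hypothetical placeholders and are not the values of Table 3. Combining bins in quadrature is a common approximation for independent bins and is included as an assumption; the text does not state how the full-range significance is obtained.

```python
import math

def figures_of_merit(n_signal, n_background):
    """Return (S/B, S/sqrt(B)) for one bin."""
    return n_signal / n_background, n_signal / math.sqrt(n_background)

def scale_yields(n_signal, n_background, lumi_from, lumi_to):
    """Scale expected yields linearly with integrated luminosity."""
    factor = lumi_to / lumi_from
    return n_signal * factor, n_background * factor

def combine_in_quadrature(significances):
    """Approximate combination of statistically independent bins (assumption)."""
    return math.sqrt(sum(z * z for z in significances))

# Hypothetical yields for a single bin at 150 fb^-1 (NOT taken from Table 3)
s_150, b_150 = 10.0, 100.0
s_over_b_150, z_150 = figures_of_merit(s_150, b_150)

s_3000, b_3000 = scale_yields(s_150, b_150, 150.0, 3000.0)
s_over_b_3000, z_3000 = figures_of_merit(s_3000, b_3000)
```

Since both yields scale linearly with luminosity, $S/B$ is unchanged while $S/\sqrt{B}$ grows as the square root of the luminosity ratio, a factor $\sqrt{20} \approx 4.5$ between the two scenarios considered.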
The two analyses achieve similar significances in the range $p_T(W) < 600$ GeV, while the substructure approach outperforms the resolved approach in the highest bin, increasing the significance by approximately 50%. A combination of the events reconstructed by the resolved approach with those uniquely reconstructed by the substructure approach has the potential to increase the significance of the highest $p_T(W)$ region by approximately 60%. A Run 2 measurement targeting the full boosted regime can already achieve a statistical significance of $5\sigma$, a result that could be improved by a few percent by combining both the resolved and substructure methods. Figure 5 shows the expected background-subtracted signal mass-peak for a luminosity of 3000 fb$^{-1}$, with error bars illustrating the anticipated statistical uncertainty. The information from both approaches could also be combined in more sophisticated ways, such as a multivariate technique, to take advantage of the complementary information such techniques can provide to better reject and control the main background processes.
This study considers only the $WH$ channel, without systematic uncertainties. The addition of the $ZH$, $H \rightarrow b\bar{b}$ channels, for the cases of $Z$ decaying to either leptons or neutrinos, will significantly increase the statistical sensitivity. Further optimisations of the event selection can also be expected to improve the sensitivity. The inclusion of systematic uncertainties will degrade the sensitivity, although given the large datasets available, it should be possible to control such uncertainties to a higher degree than was the case in Run 1 of the LHC. The conclusions reached on the relative applicability of the resolved and jet substructure approaches should not, however, depend strongly on either of these considerations. There is nevertheless an indication from these studies that, as the substructure approach gives a higher $S/B$ in the most sensitive region, as well as a rather pure $W+b\bar{b}$ control region which could be used to constrain that background, it could have improved sensitivity relative to the resolved case once systematic uncertainties are included (assuming the two approaches have similar sensitivity to the main nuisance parameters in a profile likelihood fit and the experimental uncertainties related to the jets are comparable).
A comparison between this study and previous work [8] indicates that the substructure results for $WH$ here are consistent, apart from the fact that the $Wt$ background and $bc$ contamination are better estimated here (as was also done by ATLAS using a full detector simulation [29]). The principal new factors which make the benefits of using jet substructure less dramatic are the 125 GeV mass of the Higgs boson and the excellent performance of the anti-$k_T$ algorithm over the $200-400$ GeV range.

Conclusions
An updated feasibility study of a $WH$, $H \rightarrow b\bar{b}$ search at a $pp$ collider has been performed exploring the centre-of-mass energy of $\sqrt{s} = 14$ TeV in the boosted regime, using both a resolved dijet and jet substructure selection to reconstruct the Higgs boson candidate. The most sensitive region is found to be $p_T(W) = 200-400$ GeV, with higher $p_T(W)$ regions not improving the statistical sensitivity. In this region, both jet selections perform well. However, for $p_T(W) > 600$ GeV, the substructure analysis is essential to retain signal efficiency and sensitivity; this region is of interest for Standard Model measurements at high luminosities and for searches Beyond the Standard Model. Combining both approaches over the full range could also be expected to bring additional benefits. As expected, $b$-tagging is a central issue, especially given that the $t\bar{t}$ contamination comes mainly from mis-tagged charm jets.
In summary, the measurement of $H \rightarrow b\bar{b}$ decays in the $VH$ production channel remains challenging, but possible, in 14 TeV running of the LHC. Either a resolved dijet or a jet substructure selection works equally well for the most sensitive regions, but to obtain maximum sensitivity and to probe the $p_T$ dependence, both approaches are important.
[...] and the Research Executive Agency (REA) of the European Union under the Grant Agreement PITN-GA2012-316704 ("HiggsTools").