Search for a standard model Higgs boson produced in association with a top-quark pair and decaying to bottom quarks using a matrix element method

A search for a standard model Higgs boson produced in association with a top-quark pair and decaying to bottom quarks is presented. Events with hadronic jets and one or two oppositely charged leptons are selected from a data sample corresponding to an integrated luminosity of 19.5 fb$^{-1}$ collected by the CMS experiment at the LHC in pp collisions at a centre-of-mass energy of 8 TeV. In order to separate the signal from the larger $\mathrm{t}\overline{\mathrm{t}}$ + jets background, this analysis uses a matrix element method that assigns a probability density value to each reconstructed event under signal or background hypotheses. The ratio between the two values is used in a maximum likelihood fit to extract the signal yield. The results are presented in terms of the measured signal strength modifier, $\mu$, relative to the standard model prediction for a Higgs boson mass of 125 GeV. The observed (expected) exclusion limit at a 95 % confidence level is $\mu < 4.2$ (3.3), corresponding to a best fit value $\hat{\mu} = 1.2^{+1.6}_{-1.5}$.


Introduction
Following the discovery of a new boson with mass around 125 GeV by the ATLAS and CMS Collaborations [1][2][3] at the CERN LHC, the measurement of its properties has become an important task in particle physics. The precise determination of its quantum numbers and couplings to gauge bosons and fermions will answer the question of whether the newly discovered particle is the Higgs boson (H) predicted by the standard model (SM) of particle physics, i.e. the quantum of the field responsible for the spontaneous breaking of the electroweak symmetry [4][5][6][7][8][9]. Conversely, any deviation from SM predictions will represent evidence of physics beyond our present knowledge, thus opening new horizons in high-energy physics. While the measurements performed with the data collected so far indicate overall consistency with the SM expectations [3][10][11][12][13], it is necessary to continue improving on the measurement of all possible observables.
In the SM, the Higgs boson couples to fermions via Yukawa interactions with strength proportional to the fermion mass. Direct measurements of decays into bottom quarks and τ leptons have provided the first evidence that the 125 GeV Higgs boson couples to down-type fermions with SM-like strength [14]. Evidence of a direct coupling to up-type fermions, in particular to top quarks, is still lacking. Indirect constraints on the top-quark Yukawa coupling can be inferred from measuring either the production or the decay of Higgs bosons through effective couplings generated by top-quark loops. Current measurements of the Higgs boson cross section via gluon fusion and of its branching fraction to photons are consistent with the SM expectation for the top-quark Yukawa coupling [3][10][11][12]. Since these effective couplings occur at the loop level, they can be affected by beyond-standard model (BSM) particles. In order to disentangle the top-quark Yukawa coupling from a possible BSM contribution, a direct measurement of the former is required. This can be achieved by measuring observables that probe the top-quark Yukawa interaction with the Higgs boson already at tree level. The production cross section of the Higgs boson in association with a top-quark pair (ttH) provides an example of such an observable. A sample of tree-level Feynman diagrams contributing to the partonic processes qq, gg → ttH is shown in Fig. 1 (left and centre). The inclusive next-to-leading-order (NLO) ttH cross section is about 130 fb in pp collisions at a centre-of-mass energy √ s = 8 TeV for a Higgs boson mass (m H ) of 125 GeV [15][16][17][18][19][20][21][22][23][24], which is approximately two orders of magnitude smaller than the cross section for Higgs boson production via gluon fusion [23,24].
The first search for ttH events used pp collision data at √ s = 1.96 TeV collected by the CDF experiment at the Tevatron collider [25]. Searches for ttH production at the LHC have previously been published for individual decay modes of the Higgs boson [26,27]. The first combination of ttH searches in different final states has been published by the CMS Collaboration based on the full data set collected at √ s = 7 and 8 TeV [28]. Assuming SM branching fractions, the results of that analysis set a 95 % confidence level (CL) upper limit on the ttH signal strength at 4.5 times the SM value, while an upper limit of 1.7 times the SM is expected from the background-only hypothesis. The median expected exclusion limit for ttH production in the H → bb channel alone is 3.5 in the absence of a signal.
The results of a search for ttH production in the decay channel H → bb are presented in this paper based on pp collision data at √ s = 8 TeV collected with the CMS detector [29] and corresponding to an integrated luminosity of 19.5 fb$^{-1}$. The analysis described here differs from that of Ref. [28] in the way events are categorized and in its use of an analytical matrix element method (MEM) [30,31] for improving the separation of signal from background. Within the MEM technique, each reconstructed event is assigned a probability density value based on the theoretical differential cross section $\sigma^{-1}\,\mathrm{d}\sigma/\mathrm{d}y$, where y denotes the four-momenta of the reconstructed particles. Particle-level quantities that are either unknown (e.g. neutrino momenta, jet-parton associations) or poorly measured (e.g. quark energies) are marginalised by integration. The ratio between the probability density values for signal and background provides a discriminating variable suitable for testing the compatibility of an event with either of the two hypotheses [32].
The MEM has already been successfully used at the Tevatron collider in the context of Higgs boson searches [33,34], although for simpler final states. A phenomenological feasibility study for a ttH measurement in the H → bb decay channel at the LHC using the MEM has been pioneered in Ref. [35] based on the MadWeight package [36] for automatised matrix-element calculations. The present paper makes use of an independent implementation of the MEM, specifically optimized for the final state of interest. This is the first time that the MEM is applied to a search for ttH events. The final states typical of ttH events with H → bb, which are characterised by a huge combinatorial background, the presence of non-reconstructed particles, and small signal-to-background ratios, provide an ideal case for the deployment of the MEM. The analysis strategy is designed to maximise the separation between ttH and tt + bb background events, in order to reduce the systematic uncertainty on the signal extraction related to the modelling of this challenging background.
This paper is organised as follows. Section 2 describes the main features of the CMS detector. Section 3 presents the data and simulation samples, while Sects. 4 and 5 discuss the reconstruction of physics objects and the event selection, respectively. Section 6 describes the signal extraction. The treatment of systematic uncertainties and the statistical interpretation of the results are discussed in Sects. 7 and 8, respectively. Section 9 summarises the results.

CMS detector
The central feature of the CMS apparatus is a superconducting solenoid of 6 m internal diameter, providing a magnetic field of 3.8 T. Within the field volume are a silicon pixel and strip tracker, a lead tungstate crystal electromagnetic calorimeter (ECAL), and a brass and scintillator hadron calorimeter (HCAL), each composed of a barrel and two endcap sections. Muons are measured in gas-ionization detectors embedded in the steel flux-return yoke outside the solenoid. Extensive forward calorimetry complements the coverage provided by the barrel and endcap detectors. The first level of the CMS trigger system, composed of custom hardware processors, uses information from the calorimeters and muon detectors to select the most interesting events in a time interval of less than 4 µs. The high-level trigger processor farm further decreases the event rate from around 100 kHz to around 1 kHz, before data storage. A more detailed description of the CMS detector, together with a definition of the coordinate system used and the relevant kinematic variables can be found in Ref. [29].

Data and simulated samples
The data sample used in this search was collected with the CMS detector in 2012 from pp collisions at a centre-of-mass energy of 8 TeV, using single-electron, single-muon, or dielectron triggers. The single-electron trigger requires the presence of an isolated electron with transverse momentum ($p_\mathrm{T}$) in excess of 27 GeV. The single-muon trigger requires an isolated muon candidate with $p_\mathrm{T}$ above 24 GeV. The dielectron trigger requires two isolated electrons with $p_\mathrm{T}$ thresholds of 17 and 8 GeV.
Signal and background processes are modelled with Monte Carlo (MC) simulation programs. The CMS detector response is simulated by using the Geant4 software package [37]. Simulated events are required to pass the same trigger selection and offline reconstruction algorithms used on collision data. Correction factors are applied to the simulated samples to account for residual differences in the selection and reconstruction efficiencies with respect to those measured.
The ttH, H → bb signal is modelled by using the pythia 6.426 [38] leading order (LO) event generator normalised to the NLO theoretical cross section [15][16][17][18][19][20][21][22][23][24], and assuming the SM Higgs boson with a mass of 125 GeV. The main background in the analysis stems from tt + jets production. This process has been simulated with the MadGraph 5.1.3 [39] tree-level matrix element generator matched to pythia for the parton shower description, and normalised to the inclusive next-to-next-to-leading-order (NNLO) cross section with soft-gluon resummation at next-to-next-to-leading logarithmic accuracy [40]. The tt + jets sample has been generated in a five-flavour scheme with tree-level diagrams for two top quarks plus up to three extra partons, including both charm and bottom quarks. An additional correction factor is applied to the tt + jets samples to account for the differences observed in the top-quark $p_\mathrm{T}$ spectrum when comparing the MadGraph simulation with data [41]. The interference between the ttH, H → bb diagrams and the tt + bb background diagrams is negligible and is not considered in the MC simulation. Minor backgrounds come from the Drell-Yan production of an electroweak boson with additional jets (W + jets, Z + jets), and from the production of a top-quark pair in association with a W± or Z boson (ttW, ttZ). These processes have been generated by MadGraph matched to the pythia parton shower description. The Drell-Yan processes have been normalised to the NNLO inclusive cross section from fewz 3.1 [42], while the NLO calculations from Refs. [43,44] are used to normalise the ttW and ttZ samples, respectively. Single top quark production is modelled with the NLO generator powheg 1.0 [45][46][47][48][49][50] combined with pythia. Electroweak diboson processes (WW, WZ, and ZZ) are simulated by using the pythia generator normalised to the NLO cross section calculated with mcfm 6.6 [51].
Processes that involve top quarks have been generated with a top-quark mass of 172.5 GeV. Samples generated at LO use the CTEQ6L1 parton distribution function (PDF) set [52], while samples generated with NLO programs use the CTEQ6.6M PDF set [53].
Effects from additional pp interactions in the same bunch crossing (pileup) are modelled by adding simulated minimum bias events (generated with pythia) to the generated hard interactions. The pileup multiplicity in the MC simulation is reweighted to reflect the luminosity profile observed in pp collision data.
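The pileup reweighting mentioned above can be illustrated with a minimal Python sketch. It assumes that the weight for a given pileup multiplicity is the ratio of the normalised distribution in data to the normalised distribution in simulation; the function name `pileup_weights` and the histogram contents are hypothetical and are not taken from the analysis.

```python
import numpy as np

def pileup_weights(data_profile, mc_profile):
    """Per-bin weights: normalised data profile divided by normalised MC profile."""
    data = np.asarray(data_profile, dtype=float)
    mc = np.asarray(mc_profile, dtype=float)
    data = data / data.sum()
    mc = mc / mc.sum()
    # Bins without simulated events get weight 0 to avoid a division by zero.
    return np.divide(data, mc, out=np.zeros_like(data), where=mc > 0)

# Hypothetical pileup-multiplicity histograms (illustrative numbers only).
data_hist = [10, 30, 40, 15, 5]
mc_hist = [20, 20, 20, 20, 20]
weights = pileup_weights(data_hist, mc_hist)
```

Applying such weights event by event leaves the overall normalisation of the simulated sample unchanged while matching its pileup profile to the one assumed for data.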

Event reconstruction
The global event reconstruction provided by the particle-flow (PF) algorithm [54,55] seeds the reconstruction of the physics objects used in the analysis. To minimise the impact of pileup, charged particles are required to originate from the primary vertex, which is identified as the reconstructed vertex with the largest value of $\sum_i p_{\mathrm{T},i}^2$, where $p_{\mathrm{T},i}$ is the transverse momentum of the ith charged particle associated with the vertex. The missing transverse momentum vector $\vec{p}_\mathrm{T}^{\,\mathrm{miss}}$ is defined as the negative vector sum of the transverse momenta of all neutral particles and of the charged particles coming from the primary vertex. Its magnitude is referred to as $E_\mathrm{T}^\mathrm{miss}$. Muons are reconstructed from a combination of measurements in the silicon tracker and in the muon system [56]. Electron reconstruction requires the matching of an energy cluster in the ECAL with a track in the silicon tracker [57]. Additional identification criteria are applied to muon and electron candidates to reduce instrumental backgrounds. An isolation variable is defined starting from the scalar $p_\mathrm{T}$ sum of all particles contained inside a cone around the track direction, excluding the contribution from the lepton itself. The amount of neutral pileup energy is estimated as the average $p_\mathrm{T}$ density calculated from all neutral particles in the event multiplied by an effective area of the isolation cone, and is subtracted from the total sum. Jets are reconstructed by using the anti-$k_\mathrm{T}$ clustering algorithm [58], as implemented in the FastJet package [59,60], with a distance parameter of 0.5. Each jet is required to have pseudorapidity (η) in the range [−2.5, 2.5], to have at least two tracks associated with it, and to have electromagnetic and hadronic energy fractions of at least 1 % of the total jet energy. Jet momentum is determined as the vector sum of the momenta of all particles in the jet.
An offset correction is applied to take into account the extra energy clustered in jets because of pileup. Jet energy corrections are derived from the simulation, and are confirmed with in situ measurements of the energy balance of dijet and Z/γ + jet events [61]. Additional selection criteria are applied to each event to remove spurious jet-like features originating from isolated noise patterns in a few HCAL regions.
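The pileup-corrected lepton isolation described above can be sketched as follows. This is only an illustration under stated assumptions: dividing by the lepton transverse momentum (a relative isolation) and clamping the corrected neutral sum at zero are choices made here, not details given in the text, and the function name and all numerical inputs are hypothetical.

```python
def relative_isolation(lepton_pt, charged_sum_pt, neutral_sum_pt,
                       rho_neutral, effective_area):
    """Isolation sum in the cone, corrected for neutral pileup, over lepton pT.

    charged_sum_pt: scalar pT sum of charged particles from the primary vertex
    neutral_sum_pt: scalar pT sum of neutral particles in the cone
    rho_neutral:    average neutral pT density in the event
    effective_area: effective area of the isolation cone
    """
    pileup_estimate = rho_neutral * effective_area
    # Assumption: the corrected neutral component is not allowed to go negative.
    neutral_corrected = max(0.0, neutral_sum_pt - pileup_estimate)
    return (charged_sum_pt + neutral_corrected) / lepton_pt

# Hypothetical lepton: 40 GeV, with 2 GeV charged and 3 GeV neutral activity in the cone.
iso = relative_isolation(40.0, 2.0, 3.0, rho_neutral=5.0, effective_area=0.2)
```

With these inputs the pileup estimate is 1 GeV, so the corrected isolation sum is 4 GeV, i.e. 10 % of the lepton transverse momentum.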
The combined secondary vertex (CSV) b-tagging algorithm is used to identify jets originating from the hadronisation of bottom quarks [62]. This algorithm combines the information about track impact parameters and secondary vertices within jets into a likelihood discriminant to provide separation of b-quark jets from jets that originate from lighter quarks or gluons. The CSV algorithm assigns to each jet a continuous value that can be used as a jet flavour discriminator. Large values of the discriminator correspond preferentially to b-quark jets, so that working points of increasing purity can be defined by requiring higher values of the CSV discriminator. For example, the CSV medium working point (CSVM) is defined in such a way as to provide an efficiency of about 70 % (20 %) to tag jets originating from a bottom (charm) quark, and of approximately 2 % for jets originating from light quarks or gluons. Scale factors are applied to the simulation to match the distribution of the CSV discriminator measured with a tag-and-probe technique [63] in data control regions. The scale factors have been derived as a function of the jet flavour, p T , and |η|, as described in Ref. [28].

Event selection
The experimental signature of ttH events with H → bb is affected by a large multijet background which can be reduced to a negligible level by only considering the semileptonic decays of the top quark. The selection criteria are therefore optimised to accept events compatible with a ttH signal where H → bb and at least one of the top quarks decays to a bottom quark, a charged lepton, and a neutrino. Events are divided into two exclusive channels depending on the number of charged leptons (electrons or muons), which can be either one or two. Top quark decays in final states with tau leptons are not directly searched for, although they can still satisfy the event selection criteria when the tau lepton decays to an electron or muon, plus neutrinos. Channels of different lepton multiplicities are analysed separately. The single-lepton (SL) channel requires one isolated muon with p T > 30 GeV and |η| < 2.1, or one isolated electron with p T > 30 GeV and |η| < 2.5, excluding the 1.44 < |η| < 1.57 transition region between the ECAL barrel and endcap. Events are vetoed if additional electrons or muons with p T in excess of 20 GeV, the same |η| requirement, and passing some looser identification and isolation criteria are found. The dilepton (DL) channel collects events with a pair of oppositely charged leptons satisfying the selection criteria used to veto additional leptons in the SL channel. To reduce the contribution from Drell-Yan events in the same-flavour DL channel, the invariant mass of the lepton pair is required to be larger than 15 GeV and at least 8 GeV away from the Z boson mass. The optimisation of the selection criteria in terms of signal-to-background ratio requires a stringent demand on the number of jets. At least five (four) jets with p T > 30 GeV and |η| < 2.5 are requested in the SL (DL) channel. A further event selection is required to reduce the tt + jets background, which at this stage exceeds the signal rate by more than three orders of magnitude. 
For this purpose, the CSV discriminator values are calculated for all jets in the event and collectively denoted by ξ. For SL (DL) events with seven or more (five or more) jets, only the six (four) jets with the largest CSV discriminator value are considered. The likelihood to observe ξ is then evaluated under the alternative hypotheses of tt plus two heavy-flavour jets (tt+hf) or tt plus two light-flavour jets (tt+lf). For example, for SL events with six jets, and neglecting correlations among different jets in the same event, the likelihood under the tt + hf hypothesis is estimated as:

$$f(\xi\,|\,\mathrm{tt+hf}) = \sum_{i_1}\sum_{i_2\neq i_1}\cdots\sum_{i_6\neq i_1,\ldots,i_5}\left\{\prod_{k\in\{i_1,i_2,i_3,i_4\}} f_\mathrm{hf}(\xi_k)\prod_{m\in\{i_5,i_6\}} f_\mathrm{lf}(\xi_m)\right\}, \qquad (1)$$

where $\xi_i$ is the CSV discriminator for the ith jet, and $f_\mathrm{hf(lf)}$ is the probability density function (pdf) of $\xi_i$ when the ith jet originates from heavy-(light-)flavour partons. The latter include u, d, s quarks and gluons, but not c quarks. The likelihood in Eq. (1) is rigorous for W → ud(s) decays, whereas it is only approximate for W → cs(d) decays, since the CSV discriminator pdf for charm quarks differs from $f_\mathrm{lf}$ [62]; this approximation is adopted for the sake of simplicity. Equation (1) can be extended to the case of SL events with five jets, or DL events with at least four jets, by considering that in both cases four of the jets are associated with heavy-flavour partons, and the remaining jets with light-flavour partons. The likelihood under the alternative hypothesis, f(ξ|tt + lf), is given by Eq. (1) after swapping $f_\mathrm{hf}$ for $f_\mathrm{lf}$. The variable used to select events is then defined as the likelihood ratio

$$F(\xi) = \frac{f(\xi\,|\,\mathrm{tt+hf})}{f(\xi\,|\,\mathrm{tt+hf}) + f(\xi\,|\,\mathrm{tt+lf})}. \qquad (2)$$

The distribution of F for SL events with six jets is shown in Fig. 2 (bottom right). In the following, events are retained if F is larger than a threshold value $F_L$ ranging between 0.85 and 0.97, depending on the channel and jet multiplicity. The selected events are further classified as high-purity (low-purity) if F is larger (smaller) than a value $F_H$, with $F_L < F_H < 1.0$.
The low-purity categories serve as control regions for tt + lf jets, providing constraints on several sources of systematic uncertainty. The high-purity categories are enriched in tt + hf events, and drive the sensitivity of the analysis. The thresholds $F_L$ and $F_H$ are optimised separately for each of the analysis categories defined in Sect. 6. The exact values are reported in Table 1. After requiring a lower threshold on the selection variable F, the background is dominated by tt + jets, with minor contributions from the production of a single top quark plus jets, tt plus vector bosons, and W/Z + jets; the expected purity for a SM Higgs boson signal is only at the percent level. By construction, the selection criteria based on Eq. (2) enhance the tt+bb subprocess compared to the otherwise dominant tt+lf production. The tt + bb background has the same final state as the signal whenever the two b quarks are resolved as individual jets. Therefore, this background cannot be effectively reduced by means of the F discriminant. The cross section for tt + bb production with two resolved b-quark jets is larger than that of the signal by about one order of magnitude and is affected by sizable theoretical uncertainties [64], which hampers the possibility of extracting the signal via a counting experiment. A more refined approach, which thoroughly uses the kinematic properties of the reconstructed event, is therefore required to improve the separation between the signal and the background.

Table (event yields; fragment). Total background: 311 ± 22, 598 ± 38, 1291 ± 60, 142 ± 10. Caption: The expected event yields with their uncertainties are obtained from a signal-plus-background fit as described in Sect. 8. In the last row of each table, the symbol S (B) denotes the signal (total background) yield.
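The b-tagging likelihood ratio F used for the event selection can be illustrated with a minimal Python sketch for six-jet SL events. Several ingredients are assumptions made only for illustration: the toy pdfs `f_hf` and `f_lf` are hypothetical linear shapes, not the measured CSV pdfs; the tt + lf hypothesis is taken to contain two heavy-flavour jets (the b quarks from the top decays) and four light-flavour jets; F is taken in the ratio form f(ξ|tt+hf) / [f(ξ|tt+hf) + f(ξ|tt+lf)]; and the sum runs over distinct jet-to-flavour assignments, which differs from a sum over all index permutations only by a constant factor that cancels in the ratio for six jets.

```python
from itertools import combinations
from math import prod

def f_hf(xi):
    """Toy pdf of the discriminator for heavy-flavour jets (hypothetical shape)."""
    return 2.0 * xi

def f_lf(xi):
    """Toy pdf of the discriminator for light-flavour jets (hypothetical shape)."""
    return 2.0 * (1.0 - xi)

def flavour_likelihood(xis, n_heavy):
    """Sum over all ways of labelling n_heavy of the jets as heavy flavour."""
    indices = range(len(xis))
    total = 0.0
    for heavy in combinations(indices, n_heavy):
        heavy = set(heavy)
        total += prod(f_hf(xis[i]) if i in heavy else f_lf(xis[i]) for i in indices)
    return total

def likelihood_ratio(xis):
    """F = L(tt+hf) / (L(tt+hf) + L(tt+lf)) under the assumptions stated above."""
    l_hf = flavour_likelihood(xis, n_heavy=4)
    l_lf = flavour_likelihood(xis, n_heavy=2)
    return l_hf / (l_hf + l_lf)

# Hypothetical discriminator values for two six-jet events.
F_four_b_like = likelihood_ratio([0.95, 0.90, 0.92, 0.97, 0.10, 0.20])
F_two_b_like = likelihood_ratio([0.95, 0.90, 0.10, 0.20, 0.05, 0.15])
```

In this toy setting the event with four b-like jets obtains a value of F close to one, while the event with only two b-like jets falls well below the range of thresholds quoted in the text.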

Signal extraction
As in other resonance searches, the invariant mass reconstructed from the H → bb decay provides a natural discriminating variable to separate the narrow Higgs boson dijet resonance from the continuum mass spectrum expected from the tt + jets background. However, in the presence of additional b quarks from the decay of the top quarks, an ambiguity in the Higgs boson reconstruction is introduced, leading to a combinatorial background. The distribution of the experimental mass estimator built from a randomly selected jet pair is much broader compared to the detector resolution, since wrongly chosen jet pairs are only mildly or not at all correlated with m H . Unless a selection rule is introduced to filter out the wrong combinations, the existence of such a combinatorial background results in a suppression of the statistical power of the mass estimator, which grows as the factorial of the jet multiplicity. Multivariate techniques that exploit the correlation between several observables in the same event are naturally suited to deal with signal extraction in such complex final states. In this paper, a likelihood technique based on the theoretical matrix elements for the ttH process and the tt + bb background is applied for signal extraction. This method utilises the kinematics and dynamics of the event, providing a powerful discriminant between the signal and background. The tt + bb matrix elements are considered as the prototype to model all background processes. This choice guarantees optimal separation between the signal and the tt + bb background, which is a desirable property given the large rate and theoretical uncertainty in the latter. The performance on the other tt + jets subprocesses might not be necessarily optimal, even though some separation power is still preserved; indeed, the tt + bb matrix elements describe these processes better than the signal matrix elements do, as it has been verified a posteriori with the simulation. 
More specifically, the shapes of the matrix element discriminant predicted by the simulation for the different tt + jets subprocesses are found to be similar to each other, with a slightly better separation power for the tt + bb background. The approximate degeneracy in shape between several processes can be ascribed to a smearing effect of the combinatorial background, as well as to the impact of the Higgs boson mass constraint on the calculation of the event likelihood under the signal hypothesis. The latter provides a similar discrimination against all tt + jets subprocesses. A slightly worse separation power is instead observed for minor backgrounds, such as single top quark or ttZ events, for which neither of the two matrix elements tested really applies. However, all of the background processes analysed are found to yield discriminant shapes that can be well distinguished from that for the signal. Also, it is found that most of the statistical power attained by this method in separating ttH, H → bb from tt + bb events relies on the different correlation and kinematic distributions of the two b-quark jets not associated with the top quark decays.

Construction of the MEM probability density functions
The MEM probability density functions under the signal and background hypotheses are constructed at LO assuming for simplicity that in both cases the reactions proceed via gluon fusion. At √ s = 8 TeV, the fraction of the gluon-gluon initiated subprocesses is about 55 % (65 %) of the inclusive LO (NLO) cross section, and it grows with the centre-of-mass energy [21]. Examples of diagrams entering the calculation are shown in the middle and right panels of Fig. 1. All possible jet-quark associations in the reconstruction of the final state are considered. For each event, the MEM probability density function w(y|H) under the hypothesis H = ttH or tt + bb is calculated as:

$$w(y\,|\,\mathcal{H}) = \sum_{i=1}^{N_a}\int\frac{\mathrm{d}x_a\,\mathrm{d}x_b}{2x_a x_b s}\int\prod_{k=1}^{8}\left(\frac{\mathrm{d}^3 p_k}{(2\pi)^3\,2E_k}\right)(2\pi)^4\,\delta^{(E,z)}\!\left(p_a+p_b-\sum_{k=1}^{8}p_k\right)\mathcal{R}^{(x,y)}\!\left(\rho_\mathrm{T},\sum_{k=1}^{8}p_k\right) g(x_a,\mu_F)\,g(x_b,\mu_F)\,\left|\mathcal{M}_{\mathcal{H}}(p_a,p_b,p_1,\ldots,p_8)\right|^2\,W(y,p), \qquad (3)$$

where y denotes the set of observables for which the matrix element pdf is constructed, i.e. the momenta of jets and leptons. The sum extends over the $N_a$ possibilities of associating the jets with the final-state quarks. The integration on the right-hand side of Eq. (3) [...] (W). For H = ttH, the factorisation scale $\mu_F$ entering the PDF is taken as half of the sum of twice the top-quark mass and the Higgs boson mass [20], while for H = tt + bb a dynamic scale is used equal to the quadratic sum of the transverse masses for all coloured partons [66]. The scattering amplitude for the hard process is evaluated numerically at LO accuracy by the program OpenLoops [67]; all resonances are treated in the narrow-width approximation [68], and spin correlations are neglected. The transfer function W(y, p) provides a mapping between the measured set of observables y and the final-state particle momenta $p = (p_1, \ldots, p_8)$. Given the good angular resolution of jets, the direction of quarks is assumed to be perfectly measured by the direction of the associated jets. Also, since energies of leptons are measured more precisely than for jets, their momenta are considered perfectly measured.
Under these assumptions, the total transfer function reduces to the product of the quark energy transfer function times the probability for the quarks that are not reconstructed as jets to fail the acceptance criteria. The quark energy transfer function is modelled by a single Gaussian function for jets associated with light-flavour partons, and by a double Gaussian function for jets associated with bottom quarks; the latter is constructed by superimposing two Gaussian functions with different mean and standard deviation. Such an asymmetric parametrisation provides a good description of both the core of the detector energy response and the low-energy tail arising from semileptonic B hadron decays. The parametrisation of the transfer functions has been derived from MC simulated samples.
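The single- and double-Gaussian energy transfer functions described above can be sketched in Python as follows. All parameter values (resolutions, the fraction of the core component, and the shift of the tail component) are hypothetical placeholders chosen for illustration; the parametrisation actually used in the analysis is derived from simulation and is not reproduced here.

```python
import math

def gaussian(x, mean, sigma):
    """Normalised Gaussian density."""
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def light_jet_tf(e_jet, e_quark, resolution=0.15):
    """Single Gaussian: density of the jet energy given the quark energy."""
    return gaussian(e_jet, e_quark, resolution * e_quark)

def b_jet_tf(e_jet, e_quark, core_fraction=0.7, core_scale=1.0, core_res=0.15,
             tail_scale=0.85, tail_res=0.25):
    """Double Gaussian: a core plus a wider component shifted to lower energy,
    mimicking the tail from semileptonic B hadron decays (toy parameters)."""
    core = gaussian(e_jet, core_scale * e_quark, core_res * e_quark)
    tail = gaussian(e_jet, tail_scale * e_quark, tail_res * e_quark)
    return core_fraction * core + (1.0 - core_fraction) * tail

# For a 100 GeV b quark, a jet measured 20 GeV low is more likely than one 20 GeV high.
low_side = b_jet_tf(80.0, 100.0)
high_side = b_jet_tf(120.0, 100.0)
```

Because the two components have different means, the resulting density is asymmetric around the quark energy, which is the property the text attributes to the bottom-quark parametrisation.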

Event categorisation
To aid the evaluation of the MEM probability density functions at LO, events are classified into mutually exclusive categories based on different parton-level interpretations. Firstly, the set of jets yielding the largest contribution to the sum defined by Eq. (1) determines the four (tagged) jets associated with bottom quarks; the remaining $N_\mathrm{untag}$ (untagged) jets are assumed to originate either from W → qq decays (SL channel) or from initial- or final-state gluon radiation (SL and DL channels). There still remains a twelve-fold ambiguity in the determination of the parton matched to each jet, which is reflected by the sum in Eq. (3). Indeed, without distinguishing between $\mathrm{b}$ and $\overline{\mathrm{b}}$ quarks, there exist 4!/(2!2!) = 6 combinations for assigning two jets out of four to the Higgs boson decay (H = ttH), or to the bottom quark-pair radiation (H = tt + bb); for each of these possibilities, there are two more ways of assigning the remaining tagged jets to either the $\mathrm{t}$ or $\overline{\mathrm{t}}$ quark, thus giving a total of twelve associations.

In the SL channel, an event can be classified in one of three possible categories. The first category (Cat-1) is defined by requiring at least six jets; if there are exactly six jets, the mass of the two untagged jets is required to be in the range [60, 100] GeV, i.e. compatible with the mass of the W boson. If the number of jets is larger than six, the mass range is tightened to compensate for the increased ambiguity in selecting the correct W boson decay products. In the event interpretation, the W → qq decay is assumed to be fully reconstructed, with the two quarks identified with the jet pair satisfying the mass constraint. The definition of the second category (Cat-2) differs from that of Cat-1 by the inversion of the dijet mass constraint. This time, the event interpretation assumes that one of the quarks from the W boson decay has failed the reconstruction. The integration on the right-hand side of Eq. (3) is extended to include the phase space of the non-reconstructed quark. The other untagged jet(s) is (are) interpreted as gluon radiation, and do not enter the calculation of w(y|H). The total number of associations considered is twelve times the multiplicity of untagged jets eligible to originate from the W boson decay: $N_a = 12\,N_\mathrm{untag}$. In the third category (Cat-3), exactly five jets are required, and an incomplete W boson reconstruction is again assumed.

In the DL channel, only one event interpretation is considered, namely that each of the four bottom quarks in the decay is associated with one of the four tagged jets.
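The counting of jet-quark associations described above can be checked with a short Python enumeration. The function name and the integer jet labels are hypothetical; the logic only reproduces the counting argument in the text (six ways of choosing the jet pair assigned to the Higgs boson decay or to the bottom quark-pair radiation, times two ways of assigning the remaining tagged jets to the two top quarks, times the number of untagged candidates when one W-decay jet is considered).

```python
from itertools import combinations

def jet_quark_associations(tagged_jets, untagged_jets=None):
    """Enumerate associations of four tagged jets, optionally with one untagged jet
    taken as the reconstructed quark from the W boson decay."""
    associations = []
    for bb_pair in combinations(tagged_jets, 2):
        remaining = [jet for jet in tagged_jets if jet not in bb_pair]
        # The two remaining tagged jets can be assigned to the top and antitop quarks
        # in two different ways.
        for b_from_top, b_from_antitop in (remaining, remaining[::-1]):
            if untagged_jets:
                for w_jet in untagged_jets:
                    associations.append((bb_pair, b_from_top, b_from_antitop, w_jet))
            else:
                associations.append((bb_pair, b_from_top, b_from_antitop, None))
    return associations

tagged_only = jet_quark_associations([0, 1, 2, 3])
with_two_untagged = jet_quark_associations([0, 1, 2, 3], untagged_jets=[4, 5])
```

The first call yields the twelve associations quoted in the text, and the second yields 12 × 2 = 24, in line with the relation between the number of associations and the untagged-jet multiplicity given for Cat-2.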
Finally, two event discriminants, denoted by $P_{s/b}$ and $P_{h/l}$, are defined. The former encodes only information from the event kinematics and dynamics via Eq. (3), and is therefore suited to separate the signal from the background; the latter contains only information related to b tagging, thus providing a handle to distinguish between the heavy- and the light-flavour components of the tt + jets background. They are defined as follows:

$$P_{s/b} = \frac{w(y\,|\,\mathrm{ttH})}{w(y\,|\,\mathrm{ttH}) + k_{s/b}\,w(y\,|\,\mathrm{tt+bb})} \qquad (4)$$

and

$$P_{h/l} = \frac{f(\xi\,|\,\mathrm{tt+hf})}{f(\xi\,|\,\mathrm{tt+hf}) + k_{h/l}\,f(\xi\,|\,\mathrm{tt+lf})}, \qquad (5)$$

where the functions f(ξ|tt+hf) and f(ξ|tt+lf) are defined as in Eq. (1) but restricting the sum only to the jet-quark associations considered in the calculation of w(y); the coefficients $k_{s/b}$ and $k_{h/l}$ in the denominators are positive constants that can differ among the categories and will be treated as optimisation parameters, as described below. The joint distribution of the ($P_{s/b}$, $P_{h/l}$) discriminants is used in a two-dimensional maximum likelihood fit to search for events resulting from Higgs boson production. By construction, the two discriminants satisfy the constraint $0 \le P_{s/b}, P_{h/l} \le 1$. Because of the limited size of the simulated samples, the distributions of $P_{s/b}$ and $P_{h/l}$ are binned. A finer binning is used for the former, which carries the largest sensitivity to the signal, while the latter is divided into two equal-sized bins. The coefficient $k_{s/b}$ appearing in the definition of $P_{s/b}$ is introduced to adjust the relative normalisation between w(y|ttH) and w(y|tt + bb); likewise for $k_{h/l}$. A redefinition of either of the two coefficients would change the corresponding discriminant monotonically, and thus would have no impact on its separation power. However, since both variables are analysed in bins with fixed size, an optimisation procedure, based on minimising the expected exclusion limit on the signal strength as described in Sect. 8, is carried out to choose the values that maximise the sensitivity of the analysis.
More specifically, the coefficients k s/b are first set to the values that remove any local maximum for the tt + bb distribution around P s/b ∼ 1, a condition that is found to already provide close-to-optimal coefficients. Then, starting from this initial point, several values of k s/b are scanned and the P s/b distributions are recomputed accordingly. An expected upper limit on the signal strength is then evaluated for each choice of k s/b using the simulated samples. This procedure is repeated until a minimum in the expected limit is obtained. A similar procedure is applied for choosing the optimal k h/l coefficients.
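The scan over k s/b can be sketched on toy data. This is not the procedure used in the analysis: the toy weights, yields, and number of bins are invented, and the expected limit is replaced by a crude proxy, 1/sqrt(Σ s²/b), which only scales like the expected limit in the Gaussian regime. The sketch shows the mechanics: recompute the binned P s/b distributions for each k and keep the value with the smallest proxy.

```python
import math
import random


def p_sb(w_sig, w_bkg, k):
    """Assumed ratio form of the discriminant."""
    return w_sig / (w_sig + k * w_bkg)


def histogram(values, weight, n_bins):
    """Fill equal-width bins on [0, 1]; every entry carries the same weight."""
    counts = [0.0] * n_bins
    for v in values:
        counts[min(int(v * n_bins), n_bins - 1)] += weight
    return counts


def limit_proxy(sig_hist, bkg_hist):
    """Stand-in for the expected limit (smaller is better).

    Bins with no background prediction are skipped, which a real analysis
    would have to handle more carefully.
    """
    q = sum(s * s / b for s, b in zip(sig_hist, bkg_hist) if b > 0)
    return 1.0 / math.sqrt(q) if q > 0 else float("inf")


def make_toy(n, mean_log_ratio, rng):
    """Toy (w_sig, w_bkg) pairs with a log-normal weight ratio (hypothetical)."""
    return [(math.exp(rng.gauss(mean_log_ratio, 1.0)), 1.0) for _ in range(n)]


rng = random.Random(1)
signal = make_toy(5000, +0.5, rng)
background = make_toy(5000, -0.5, rng)
SIG_YIELD, BKG_YIELD, N_BINS = 10.0, 500.0, 6  # invented yields and binning


def scan(k_values):
    """Return {k: proxy} after rebuilding both histograms for each k."""
    results = {}
    for k in k_values:
        s = histogram([p_sb(ws, wb, k) for ws, wb in signal],
                      SIG_YIELD / len(signal), N_BINS)
        b = histogram([p_sb(ws, wb, k) for ws, wb in background],
                      BKG_YIELD / len(background), N_BINS)
        results[k] = limit_proxy(s, b)
    return results


results = scan([0.05, 0.2, 1.0, 5.0, 20.0])
best_k = min(results, key=results.get)
print(results, best_k)
```

In the analysis itself the figure of merit is the full expected upper limit on the signal strength, evaluated with the simulated samples and all systematic uncertainties.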

Background modelling
The background normalisation and the distributions of the event discriminants are derived by using the MC simulated samples described in Sect. 3. In light of the large theoretical uncertainty that affects the prediction of tt plus heavy-flavour [64,69], the MadGraph sample is further divided into subsamples based on the quark flavour associated with the jets generated in the acceptance region p T > 20 GeV, [...] These cases typically arise when the second extra b quark in the event is either too far forward or too soft to be reconstructed as a jet, or because the two extra b quarks are emitted almost collinearly and end up in a single jet. Similarly, if at least one reconstructed jet is matched to a c quark, the event is labelled as tt + cc. In the latter case, single- and double-matched events are treated as one background. If none of the above conditions is satisfied, the event is classified as tt plus light-flavour. Table 1 reports the number of events observed in the various categories, together with the expected signal and background yields. The latter are obtained from the signal-plus-background fit described in Sect. 8.

Table 2 caption (fragment): The second column reports the range of rate variation for the processes affected by a given source of systematic uncertainty (as specified in the last three columns) when the nuisance parameter associated with it is varied up or down by its uncertainty. The third column indicates whether a source of systematic uncertainty is assumed to affect the process normalisation only, or both the normalisation and the shape of the event discriminants.
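The flavour-based splitting of the tt + jets sample can be sketched as a simple priority rule. Only the tt + cc and light-flavour rules are stated explicitly in the text as given here; the tt + bb and tt + b branches below are assumptions inferred from the surrounding description (two, or exactly one, jet matched to an extra b quark). The input format and function name are hypothetical, and the jet-quark matching itself is not modelled.

```python
def classify_ttjets(extra_jet_flavours):
    """Label a simulated tt+jets event from the flavours of its extra matched jets.

    extra_jet_flavours: list of 'b', 'c' or 'l' labels, one per reconstructed
    jet matched to a quark not coming from a top-quark decay (hypothetical input).
    """
    n_b = extra_jet_flavours.count("b")
    n_c = extra_jet_flavours.count("c")
    if n_b >= 2:      # assumption: two or more jets matched to extra b quarks
        return "tt+bb"
    if n_b == 1:      # assumption: second b quark lost or merged into one jet
        return "tt+b"
    if n_c >= 1:      # stated in the text: single and double matches together
        return "tt+cc"
    return "tt+lf"    # none of the above conditions satisfied


for flavours in (["b", "b"], ["b", "l"], ["c"], ["c", "c"], ["l", "l"], []):
    print(flavours, classify_ttjets(flavours))
```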

Systematic uncertainties
A number of systematic uncertainties of experimental and theoretical origin affect the signal and the background expectations. Each source of systematic uncertainty is associated with a nuisance parameter that modifies the likelihood function used to extract the signal yield, as described in Sect. 8. The prior knowledge of the nuisance parameter is incorporated into the likelihood in a frequentist manner by interpreting it as a posterior arising from a pseudo-measurement [70]. Nuisance parameters can affect either the yield of a process (normalisation uncertainty), or the shape of the P s/b and P h/l discriminants (shape uncertainty), or both. Multiple processes across several categories can be affected by the same source of uncertainty. In that case the related nuisance parameters are treated as fully correlated. The uncertainty in the integrated luminosity is estimated to be 2.6 % [71]. The lepton trigger, reconstruction, and identification efficiencies are determined from control regions by using a tag-and-probe procedure. The total uncertainty is evaluated from the statistical uncertainty of the tag-and-probe measurement, plus a systematic uncertainty in the method, and is estimated to be 1.6 % per muon and 1.5 % per electron. It is conservatively approximated by a constant 2 % per charged lepton. The uncertainty in the jet energy scale (JES) ranges from 1 % up to about 8 % of the expected energy scale depending on the jet p T and |η| [61]. For each simulated sample, two alternative distributions of the P s/b and P h/l discriminants are obtained by varying the energy scale of all simulated jets up or down by their uncertainty, and the fit is allowed to interpolate between the nominal and the alternative distributions with a Gaussian prior [70].
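The interpolation between nominal and alternative distributions can be sketched as follows. This is a minimal illustration using piecewise-linear, bin-by-bin interpolation, which is a common simplification and not necessarily the scheme used in the analysis; the histograms and function names are hypothetical. The nuisance parameter θ is expressed in units of its uncertainty, so that θ = +1 (−1) reproduces the up (down) template.

```python
def morph(nominal, up, down, theta):
    """Interpolate each bin linearly between the nominal and a shifted template.

    theta = 0 gives the nominal histogram, +1 the 'up' one, -1 the 'down' one.
    Bin contents are clipped at zero to keep the prediction physical.
    """
    morphed = []
    for n, u, d in zip(nominal, up, down):
        target = u if theta >= 0 else d
        value = n + abs(theta) * (target - n)
        morphed.append(max(value, 0.0))
    return morphed


def gaussian_penalty(theta):
    """-2 ln of a unit-width Gaussian prior on theta, up to a constant."""
    return theta * theta


# Hypothetical discriminant histograms for one process (not analysis numbers).
nominal = [10.0, 20.0, 5.0]
jes_up = [12.0, 19.0, 6.0]
jes_down = [9.0, 21.0, 4.0]

for theta in (-1.0, 0.0, 0.5, 1.0):
    print(theta, morph(nominal, jes_up, jes_down, theta), gaussian_penalty(theta))
```

Because the up and down templates generally differ from the nominal one in total yield as well as in shape, a single nuisance parameter treated this way carries both a normalisation and a shape effect.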
A similar procedure is applied to account for the uncertainty related to the jet energy resolution (JER), which ranges between about 5 and 10 % of the expected energy resolution depending on the jet direction. Since the analysis categories are defined in terms of the multiplicity and kinematic properties of the jets, a variation of either the scale or the resolution of the simulated jets can induce a migration of events in or out of the analysis categories, as well as migrations among different categories. The fractional change in the event yield induced by a shift of the JES (JER) ranges from 4 to 13 % (0.5 to 2 %), [...] p_T^t is the transverse momentum of the generated top quark, between one (no correction at all) and 2r_t − 1 (the relative correction is doubled). This results in both a shape and a normalisation uncertainty. The latter can be as large as 20 % for a top quark p T around 300 GeV, and corresponds to an overall normalisation uncertainty of about 3-8 % depending on the category. To account for uncertainties in the tt + jets acceptance, the factorisation and renormalisation scales used in the simulation are varied in a correlated way by factors of 1/2 and 2 around their central value. The scale variation is assumed to be uncorrelated among tt + bb, tt + b, and tt + cc. In a similar way, independent scale variations are introduced for events with exactly one, two, or three extra partons in the matrix element. To account for possibly large K-factors due to the use of a LO MC generator, the tt+bb, tt+b, and tt+cc normalisations predicted by the MadGraph simulation are assigned a 50 % uncertainty each. This value can be seen as a conservative upper limit to the theoretical uncertainty in the tt + hf cross section achieved to date [64]. Essentially, the approach followed here is to assign large a priori normalisation uncertainties to the different tt + jets subprocesses, thus allowing the fit to simultaneously adjust their rates.
Scale uncertainties in the inclusive theoretical cross sections used to normalise the simulated samples range from a few percent up to 20 %, depending on the process. The PDF uncertainty is treated as fully correlated for all processes that share the same dominant initial state (i.e. gg, gq, or qq); it ranges between 3 and 9 %, depending on the process. Finally, the effect of the limited size of the simulated samples is accounted for by introducing one nuisance parameter for each bin of the discriminant histograms and for each sample, as described in Ref. [72]. Table 2 summarises the various sources of systematic uncertainty with their impact on the analysis.

Results
The statistical interpretation of the results is performed by using the same methodology employed for other CMS Higgs boson analyses and extensively documented in Ref. [2]. The measured signal rate is characterised by a strength modifier μ = σ/σ SM that scales the Higgs boson production cross section times branching fraction with respect to its SM expectation for m H = 125 GeV. The nuisance parameters, θ, are incorporated into the likelihood as described in Sect. 7. The total likelihood function L(μ, θ) is the product of a Poissonian likelihood spanning all bins of the (P s/b , P h/l ) distributions for all the eight categories, times a likelihood function for the nuisance parameters. Based on the asymptotic properties of the profile likelihood ratio test statistic q(μ) = −2 ln[L(μ, θ̂ μ )/L(μ̂, θ̂)], confidence intervals on μ are set, where θ̂ and θ̂ μ indicate the best-fit value for θ obtained when μ is floating in the fit or fixed at a hypothesised value, respectively, and μ̂ is the best-fit value of μ. Figures 3 and 4 show the binned distributions of (P s/b , P h/l ) in the various categories and for the two channels. For visualisation purposes, the two-dimensional histograms are projected onto one dimension by showing first the distribution of P s/b for events with P h/l < 0.5 and then for P h/l ≥ 0.5. The observed distributions are compared to the signal-plus-background expectation obtained from a combined fit to all categories with the constraint μ = 1. No evidence of a ttH signal over the background is observed. The statistical interpretation is performed both in terms of exclusion upper limits (UL) at a 95 % CL, where the modified CL s prescription [73,74] is adopted to quote confidence intervals, and in terms of the maximum likelihood estimator of the strength modifier (μ̂).
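The profile likelihood ratio can be illustrated with a toy model. The sketch below assumes a handful of Poisson bins, a single nuisance parameter θ that scales the background linearly with a 50 % prior uncertainty and a unit-Gaussian constraint, and a signal strength restricted to μ ≥ 0 for simplicity. The yields, the linear scaling, and the golden-section minimiser are illustrative choices, not the analysis implementation.

```python
import math


def nll(mu, theta, data, sig, bkg, delta=0.5):
    """Negative log-likelihood: Poisson bins times a unit-Gaussian constraint."""
    total = 0.5 * theta * theta
    for n, s, b in zip(data, sig, bkg):
        lam = mu * s + b * (1.0 + delta * theta)  # expected yield in this bin
        total += lam - n * math.log(lam)          # ln(n!) constant dropped
    return total


def minimise(f, lo, hi, tol=1e-6):
    """Golden-section search for the minimum of a unimodal function on [lo, hi]."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = f(c)
        else:
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = f(d)
    x = 0.5 * (a + b)
    return x, f(x)


def profiled_nll(mu, data, sig, bkg):
    """Minimise over theta at fixed mu; the range keeps all yields positive."""
    return minimise(lambda t: nll(mu, t, data, sig, bkg), -1.9, 5.0)[1]


def q_mu(mu, data, sig, bkg, mu_max=20.0):
    """Return (q(mu), mu_hat) with q = -2 ln[L(mu, theta_hat_mu) / L(mu_hat, theta_hat)]."""
    mu_hat, nll_hat = minimise(lambda m: profiled_nll(m, data, sig, bkg), 0.0, mu_max)
    return 2.0 * (profiled_nll(mu, data, sig, bkg) - nll_hat), mu_hat


# Hypothetical yields; the 'data' equal the mu = 1 expectation.
sig = [1.0, 3.0, 6.0]
bkg = [50.0, 20.0, 5.0]
data = [51.0, 23.0, 11.0]

for mu in (0.0, 1.0, 3.0, 5.0):
    print(mu, q_mu(mu, data, sig, bkg))
```

Since the toy data sit exactly on the μ = 1 prediction, the best-fit signal strength comes out close to one and q(μ) grows as the hypothesised μ moves away from it.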
Figure 5 (top) shows the observed 95 % CL UL on μ, compared to the signal-plus-background and to the background-only expectation. Results are shown for the SL and DL channels alone, and for their combination. The observed (background-only expected) exclusion limit is μ < 4.2 (3.3).

Table 3 caption (fragment): The observed 95 % CL UL on μ are given in the third column, and are compared to the median expected limits for both the signal-plus-background and for the background-only hypotheses. For the latter, the ±1σ and ±2σ CL intervals are also given.
The best-fit value of μ obtained from the individual channels and from their combination is shown in Fig. 5 (middle). A best-fit value μ̂ = 1.2 +1.6/−1.5 is measured from the combined fit. Table 3 summarises the results.
Overall, a consistent distribution of the nuisance parameter pulls is obtained from the combined fit. In the signal-plus-background (background-only) fit, the nuisance parameters that account for the 50 % normalisation uncertainty in the tt + bb, tt + b, and tt + cc backgrounds are pulled by +0.2 (+0.5), −0.4 (−0.3), and +0.8 (+0.8), respectively, where the pull is defined as the shift of the best-fit estimator from its nominal value in units of its a priori uncertainty. The correlation between the tt + bb normalisation nuisance and the μ̂ estimator is found to be ρ ≈ −0.4, and is the largest entry in the correlation matrix. From an a priori study (i.e. before fitting the nuisance parameters with the likelihood function of the data), the nuisance parameter corresponding to the 50 % normalisation uncertainty in the tt+bb background has the largest impact on the median expected limit, which would be around 4 % smaller if that uncertainty were not taken into account. Such a small impact on the expected limit implies that the sensitivity of the analysis is only mildly affected by the lack of a stringent a priori constraint on the tt + bb background normalisation; this is also consistent with the observation that the fit effectively constrains the tt + bb rate, narrowing its normalisation uncertainty down to about 25 %.
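The pull definition and the post-fit constraint can be written out explicitly. This is a minimal sketch with hypothetical function names; the fitted normalisation of 1.10 is an invented value chosen to give a pull of +0.2 under a linear convention, while the 50 % prior and 25 % post-fit uncertainties are the figures quoted in the text.

```python
def pull(fitted, nominal, prior_sigma):
    """Shift of the best-fit value from nominal, in units of the prior uncertainty."""
    return (fitted - nominal) / prior_sigma


def constraint(postfit_sigma, prior_sigma):
    """Ratio of post-fit to prior uncertainty; values below 1 mean the fit constrains."""
    return postfit_sigma / prior_sigma


# Hypothetical fitted normalisation factor for a background with a 50 % prior.
print(pull(fitted=1.10, nominal=1.0, prior_sigma=0.5))
# Quoted in the text: 50 % a priori uncertainty narrowed to about 25 %.
print(constraint(postfit_sigma=0.25, prior_sigma=0.5))
```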
For illustration, Fig. 5 (bottom) shows the distribution of the decimal logarithm log(S/B), where S/B is the ratio between the signal and background yields in each bin of the two-dimensional histograms, as obtained from a combined fit with the constraint μ = 1. Agreement between the data and the SM expectation is observed over the whole range of this variable.
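The log(S/B) distribution can be built by regrouping the bins of the two-dimensional histograms according to their expected signal-to-background ratio. The sketch below assumes each bin is described by its expected signal yield, expected background yield, and observed count; the numbers, bin edges, and function name are hypothetical.

```python
import bisect
import math


def rebin_by_log_sb(bins, edges):
    """Group (s, b, n_obs) bins by log10(s/b) into the intervals given by edges.

    Returns one [sum_s, sum_b, sum_n_obs] entry per interval. Bins whose
    log10(s/b) falls outside the edges, or with s <= 0 or b <= 0, are skipped.
    """
    grouped = [[0.0, 0.0, 0.0] for _ in range(len(edges) - 1)]
    for s, b, n_obs in bins:
        if s <= 0 or b <= 0:
            continue
        x = math.log10(s / b)
        i = bisect.bisect_right(edges, x) - 1
        if 0 <= i < len(grouped):
            grouped[i][0] += s
            grouped[i][1] += b
            grouped[i][2] += n_obs
    return grouped


# Hypothetical (signal, background, observed) yields for four analysis bins.
bins = [(1.0, 100.0, 98), (1.0, 10.0, 12), (2.0, 20.0, 25), (5.0, 5.0, 9)]
edges = [-3.0, -1.5, -0.5, 0.5]

for (lo, hi), (s, b, n) in zip(zip(edges, edges[1:]), rebin_by_log_sb(bins, edges)):
    print(f"{lo:+.1f} <= log10(S/B) < {hi:+.1f}: S = {s}, B = {b}, observed = {n}")
```

Ordering bins in this way collects the most signal-like bins from all categories at the high end of a single axis, which makes the comparison between data and expectation easier to read.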

Summary
A search for Higgs boson production in association with a top-quark pair with H → bb has been presented. A total of 19.5 fb⁻¹ of pp collision data collected by the CMS experiment at √s = 8 TeV has been analysed. Events with one lepton and at least five jets or two opposite-sign leptons and at least four jets have been considered. Jet b-tagging information is exploited to suppress the tt plus light-flavour background. A probability density value under either the ttH or the tt + bb background hypothesis is calculated for each event using an analytical matrix element method. The ratio of probability densities under these two competing hypotheses allows a one-dimensional discriminant to be defined, which is then used together with b-tagging information in a likelihood analysis to set constraints on the signal strength modifier μ = σ/σ SM .
No evidence of a signal is found. The expected upper limit at a 95 % CL is μ < 3.3 under the background-only hypothesis. The observed limit is μ < 4.2, corresponding to a best-fit value μ̂ = 1.2 +1.6/−1.5. Within the present statistics, the analysis documented in this paper yields competitive results compared to those obtained on the same data set and for the same final state by using non-analytical multivariate techniques [28]. However, the matrix element method applied for a maximal separation between the signal and the dominant tt + bb background allows for a better control of the systematic uncertainty due to this challenging background. This method represents a promising strategy towards a precise determination of the top quark Yukawa coupling. Once the statistical uncertainty is reduced by the inclusion of the upcoming 13 TeV collision data, systematic uncertainties will start to play a more important role. By incorporating experimental and theoretical model parameters into an event likelihood, the matrix element method offers a natural handle to minimise the impact of systematic uncertainties on the extraction of the signal.

Acknowledgments
We congratulate our colleagues in the CERN accelerator departments for the excellent performance of the LHC and thank the technical and administrative staffs at CERN and at other CMS institutes for their contributions to the success of the CMS effort. In addition, we gratefully acknowledge the computing centres and personnel of the Worldwide LHC Computing Grid for delivering so effectively the computing infrastructure essential to our analyses. Finally, we acknowledge the enduring support for the construction and operation of the LHC and the CMS detector provided by the following funding agencies: the Austrian Federal Ministry of Science, Research and Economy and the Austrian Science Fund; the Belgian