Search for the standard model Higgs boson produced in association with a top-quark pair in pp collisions at the LHC

A search for the standard model Higgs boson produced in association with a top-quark pair is presented using data samples corresponding to an integrated luminosity of 5.0 inverse femtobarns (5.1 inverse femtobarns) collected in pp collisions at the center-of-mass energy of 7 TeV (8 TeV). Events are considered where the top-quark pair decays to either one lepton+jets (t tbar to ell nu q q' b bbar) or dileptons (t tbar to ell(+) nu ell(-) nu b bbar), ell being an electron or a muon. The search is optimized for the decay mode H to b bbar. The largest background to the t tbar H signal is top-quark pair production with additional jets. Artificial neural networks are used to discriminate between signal and background events. Combining the results from the 7 TeV and 8 TeV samples, the observed (expected) limit on the cross section for Higgs boson production in association with top-quark pairs for a Higgs boson mass of 125 GeV is 5.8 (5.2) times the standard model expectation.


Introduction
With the recent observation [1,2] at the Large Hadron Collider (LHC) of a new, Higgs-like particle with a mass of approximately 125 GeV, the focus of searches for the standard model (SM) Higgs boson has shifted to evaluating the consistency of this new particle with SM expectations. A key component in this effort will be to determine whether the new particle's observed couplings to other fundamental particles match the predictions for a SM Higgs boson. A deviation from expectations could provide hints of physics beyond the standard model.
In the SM, the dominant production mechanism for the Higgs boson at the LHC arises from gluon fusion, via the Higgs boson coupling to gluons through a heavy quark loop. However, with sufficient data, other production mechanisms, such as Higgs boson production via vector boson fusion or in association with a W boson, Z boson, or tt pair, should also be observable. Furthermore, there are a number of decay channels available to a SM Higgs boson with a mass of approximately 125 GeV. Although the dominant decay mode at this mass is to a pair of bottom quarks, decays to WW, ZZ, ττ, and γγ are also experimentally accessible. The SM provides precise predictions for these production and decay rates that depend on the coupling strength of the Higgs boson to the other fundamental particles of the SM.
To date, the only combinations of production mechanism and decay mode that have been established at greater than three standard deviation (σ) significance for this newly observed particle are direct production, with the new particle decaying either to a pair of photons or a pair of W or Z bosons. In all three of these cases, the observed rates are in agreement with SM expectations for Higgs boson production within the experimental uncertainties. However, establishing the complete consistency of the couplings of this newly observed particle with SM expectations for the Higgs boson involves measuring the rate of production across all the various possible production and decay channels discussed above.
The analysis described herein focuses on the search for a Higgs boson produced in association with a pair of top quarks (ttH production) conducted at the Compact Muon Solenoid (CMS) experiment. The analysis considers Higgs boson masses between 110 and 140 GeV. The search is optimized for Higgs boson decays to a bottom-quark pair, but we do not exclude events from other Higgs boson decay modes. The rate at which this process occurs depends on the largest of the fermionic couplings to the Higgs boson, namely the couplings to the top and bottom quarks. These two key couplings will be particularly important in probing the new particle's consistency with SM expectations.
The ttH vertex is the most challenging one to probe directly. Measuring the rate of Higgs boson production through the gluon fusion process provides an indirect measurement of the coupling between the top quark and the Higgs boson because this production mechanism is dominated by a top-quark loop that couples the gluons to the Higgs boson [3]. Likewise, the decay of the Higgs boson to two photons receives a significant contribution from a top-quark loop, although the loop involving W bosons dominates in this process [4]. However, extraction of the coupling between the top quark and the Higgs boson in this way relies on the assumption that there are no new massive fundamental particles beyond those of the SM that contribute in the loop. Unless the Higgs boson is very heavy, it will not decay to top quarks. Therefore, for the mass range most favored for the SM Higgs [5], and for 125 GeV in particular, ttH production is the only way to probe the ttH vertex in a model-independent manner [6,7].
In contrast, there are several processes that can be used to probe the coupling of this new particle to bottom quarks. Because of the large bb background from multijet production, it is not experimentally feasible to probe H → bb in Higgs boson production via gluon fusion. Instead, the search is typically made using associated production involving either a W or a Z boson (VH production). Although ttH production has a smaller expected cross section, this signature provides a probe that is complementary to the VH channel: they both provide information about the coupling between the bottom quark and the Higgs boson, but the dominant backgrounds are very different, tt + jets production instead of W + jets production.
An observation of ttH production, depending on the measured properties, might be consistent with the SM Higgs boson or could indicate something more exotic [8,9]. Since the expected SM rates in this channel are very small, a sizeable excess would be clear evidence for new physics. A previous search at the Tevatron [10], the first such search conducted at a hadron collider, showed no significant excess over SM expectation.
This paper is organized as follows. Section 2 describes the CMS apparatus. Section 3 describes the data and simulation samples utilized in the analysis, while Section 4 discusses the object identification, event reconstruction and selection. The extraction of the ttH signal is discussed in Section 5, followed by a description of the impact of systematic uncertainties encountered in the analysis in Section 6. The results of this search are reported in Section 7 and followed by a summary in Section 8.

The CMS detector
The CMS detector consists of the following main components. A superconducting solenoid occupies the central region of the CMS detector, providing an axial magnetic field of 3.8 T parallel to the beam direction. The silicon pixel and strip tracker, the crystal electromagnetic calorimeter and the brass/scintillator hadron calorimeter are located in concentric layers within the solenoid. These layers provide coverage out to |η| = 2.5, where pseudorapidity is defined as η = − ln [tan (θ/2)]. A quartz-fiber Cherenkov hadron forward calorimeter extends further to |η| < 5.2. The CMS experiment uses a right-handed coordinate system, with the origin at the nominal interaction point, the x axis pointing to the center of the LHC ring, the y axis pointing up (perpendicular to the LHC plane), and the z axis along the counterclockwise beam direction. The polar angle θ is measured from the positive z axis and the azimuthal angle φ is measured in the x-y plane in radians. Muons are detected by gas-ionization detectors embedded in the steel flux return yoke outside the solenoid. The first level of the CMS trigger system, composed of custom hardware processors, is designed to select the most interesting events in less than 3 µs using information from the calorimeters and muon detectors. The high-level trigger processor farm further decreases the event rate to a few hundred Hz for data storage. More details about the CMS detector can be found in Ref. [11].
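As an illustration only, the pseudorapidity definition quoted above can be evaluated numerically. The following minimal Python sketch is not part of the CMS software; the function names are hypothetical.

```python
import math


def pseudorapidity(theta):
    """Return eta = -ln[tan(theta/2)] for a polar angle theta in radians."""
    if not 0.0 < theta < math.pi:
        raise ValueError("theta must lie strictly between 0 and pi")
    return -math.log(math.tan(theta / 2.0))


def polar_angle(eta):
    """Invert the definition: theta = 2 * atan(exp(-eta))."""
    return 2.0 * math.atan(math.exp(-eta))


# Polar angle (in degrees) at the edge of the central coverage, |eta| = 2.5.
edge_angle_deg = math.degrees(polar_angle(2.5))
```

Under this definition, |η| = 2.5 corresponds to a polar angle of roughly 9.4° from the beam axis, and η = 0 to the direction perpendicular to the beam.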

Data and simulation samples
This search is performed with samples of proton-proton collisions at √s = 7 TeV and 8 TeV, collected with the CMS detector in 2011 and 2012, respectively. These data correspond to a total integrated luminosity of 5.0 fb⁻¹ at 7 TeV and 5.1 fb⁻¹ at 8 TeV.
All background and signal processes are modeled using Monte Carlo (MC) simulations from MADGRAPH 5.1.1 [12], PYTHIA 6.4.24 [13], and POWHEG 1.0 [14] event generators, depending on the physics process. The MC samples use CTEQ6L1 [15] parton distribution functions (PDFs) of the proton, except for the POWHEG samples, which use CTEQ6M. The ttH signal events are generated using PYTHIA. The main background tt sample is generated with MADGRAPH, with matrix elements corresponding to up to three additional partons, which are then matched to parton showers produced by PYTHIA. The additional partons generated with the tt sample include b and c quarks in addition to light-flavored quarks and gluons. Decays of τ leptons are handled with TAUOLA 2.75 [16]. MADGRAPH is also used to simulate ttW, ttZ, W + jets, and Drell-Yan (DY) processes, with up to four partons in the final state. The DY contribution includes all Z/γ* → ℓℓ processes with the dilepton invariant mass m_ℓℓ > 10 GeV. Single-top production is modeled with the next-to-leading order (NLO) generator POWHEG combined with PYTHIA. Electroweak diboson processes (WW, WZ, and ZZ) are simulated using PYTHIA.
Effects from additional pp interactions in the same bunch crossing (pileup) are modeled by adding simulated minimum-bias events (generated with PYTHIA) to the simulated processes. The CMS detector response is simulated using the GEANT4 software package [40]. The pileup multiplicity distribution in MC is reweighted to reflect the luminosity profile of the observed pp collisions. We apply an additional correction factor to account for residual differences in the jet transverse momentum (p_T) spectrum due to pileup; the event-by-event correction factor is based on the difference between simulation and data in the distribution of the scalar sum of the transverse momenta of the jets in the event. We include a systematic shape uncertainty in association with this correction factor. In addition to correcting the MC for pileup effects, we also apply jet energy resolution corrections [41] and lepton and trigger efficiency scale factors to the MC events.
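The pileup reweighting described above can be sketched as a per-bin ratio of normalized multiplicity distributions. The Python example below is a minimal illustration under that assumption; the function name and the toy profiles are hypothetical and do not correspond to the actual CMS distributions.

```python
def pileup_weights(data_profile, mc_profile):
    """Per-bin weights w[n] = P_data(n) / P_mc(n), with both profiles normalized to unit area."""
    if len(data_profile) != len(mc_profile):
        raise ValueError("profiles must have the same binning")
    data_total = sum(data_profile)
    mc_total = sum(mc_profile)
    weights = []
    for d, m in zip(data_profile, mc_profile):
        # Bins never populated in simulation cannot be reweighted; give them zero weight.
        weights.append((d / data_total) / (m / mc_total) if m > 0 else 0.0)
    return weights


# Hypothetical toy profiles: entries per number of pileup interactions.
data_profile = [1.0, 3.0, 4.0, 2.0]
mc_profile = [2.0, 2.0, 4.0, 2.0]
weights = pileup_weights(data_profile, mc_profile)
```

Each simulated event would then be weighted by the entry of `weights` matching its number of pileup interactions, so that the reweighted MC distribution reproduces the shape observed in data.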

Event reconstruction and selection
This analysis selects events consistent with the production of a Higgs boson in association with a top-quark pair (see Fig. 1). In the SM, the top quark is expected to decay to a W boson and a bottom quark nearly 100% of the time. Hence different tt decay modes can be identified according to the subsequent decays of the W bosons. Here we consider two tt decay modes: the lepton+jets mode (tt → ℓνqq′bb), where one W boson decays leptonically, and the dilepton mode (tt → ℓ⁺νℓ⁻νbb), where both W bosons do so. For the lepton+jets case, we select events containing an energetic, isolated electron or muon, and at least four energetic jets, two or more of which should be identified as originating from a b quark (b-tagged) [42]. For the dilepton case, we require a pair of oppositely charged energetic leptons (two electrons, two muons, or one electron and one muon) and two or more jets, with at least two of the jets being b-tagged.
Object reconstruction is based on the particle flow (PF) algorithm [43], which combines the information from all CMS subdetectors to identify and reconstruct individual objects including muons, electrons, photons, and charged and neutral hadrons produced in an event. To minimize the impact of pileup, charged particles are required to originate from the primary vertex, which is identified as the reconstructed vertex with the largest value of Σp_T^2, where the summation includes all tracks associated with that vertex. In both channels, a significant amount of missing transverse energy (E_T^miss) should be present due to the presence of neutrinos; however, no explicit requirement on E_T^miss is used in the event selection. The E_T^miss vector is calculated as the negative of the vectorial sum of the transverse momenta of all particles. For both channels, we use a common set of criteria for selecting individual objects (electrons, muons, and jets), which is described below.
Figure 1: A leading-order Feynman diagram for ttH production, illustrating the two top-quark pair system decay channels considered here, and the H → bb decay mode for which the analysis is optimized.
In the lepton+jets channel, the data were recorded with triggers requiring the presence of either a single muon or electron. The trigger muon candidate was required to be isolated from other activity in the event and to have p_T > 24 GeV for both the 2011 and 2012 data-taking periods. In 2011, the trigger electron candidate was required to have transverse energy E_T > 25 GeV and to be produced in association with at least three jets with p_T > 30 GeV, whereas in 2012, a single-electron trigger with minimum E_T threshold of 27 GeV was used. In the dilepton channel, the data were recorded with triggers requiring any combination of electrons and muons, one lepton with p_T > 17 GeV and another with p_T > 8 GeV. The offline object selection detailed below is designed to select events in the plateau of the trigger efficiency turn-on curve.
Muons are reconstructed using information from the tracking detectors and the muon chambers [44]. Tight muons must satisfy additional quality criteria based on the number of hits associated with the muon candidate in the pixel, strip, and muon detectors. For lepton+jets events, tight muons are required to have p_T > 30 GeV and |η| < 2.1 to ensure the full trigger efficiency. For dilepton events, tight muons are required to have p_T > 20 GeV and |η| < 2.1. Loose muons in both channels are required to have p_T > 10 GeV and |η| < 2.4. The muon isolation is assessed by calculating the scalar sum of the p_T of charged particles from the same primary vertex and neutral particles in a cone of ∆R = √[(∆η)² + (∆φ)²] = 0.4 around the muon direction, excluding the muon itself; the resulting sum is corrected for the effects of neutral hadrons from pileup interactions. The ratio of this corrected isolation sum to the muon p_T is the relative isolation of the muon. For tight muons, the relative isolation is required to be less than 0.12. For loose muons, this ratio must be less than 0.2.
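The cone-based isolation described above can be illustrated with a short Python sketch. It assumes that the list of particles already excludes the lepton itself and, for simplicity, omits the pileup correction of the neutral component; all names and the toy values are hypothetical.

```python
import math


def delta_phi(phi1, phi2):
    """Azimuthal difference wrapped into the interval [-pi, pi]."""
    dphi = (phi1 - phi2) % (2.0 * math.pi)
    if dphi > math.pi:
        dphi -= 2.0 * math.pi
    return dphi


def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance Delta R = sqrt(Delta eta^2 + Delta phi^2)."""
    return math.hypot(eta1 - eta2, delta_phi(phi1, phi2))


def relative_isolation(lepton, particles, cone=0.4):
    """Scalar pT sum of particles inside the cone, divided by the lepton pT.

    Assumption: `particles` does not contain the lepton, and no pileup
    correction is applied to the sum.
    """
    iso_sum = sum(
        p["pt"]
        for p in particles
        if delta_r(lepton["eta"], lepton["phi"], p["eta"], p["phi"]) < cone
    )
    return iso_sum / lepton["pt"]


# Toy example: one nearby particle (across the phi wrap-around) and one far away.
muon = {"pt": 40.0, "eta": 0.0, "phi": 3.1}
nearby = [{"pt": 2.0, "eta": 0.1, "phi": -3.1}, {"pt": 10.0, "eta": 1.0, "phi": 3.1}]
is_tight_isolated = relative_isolation(muon, nearby) < 0.12
```

The wrap-around in `delta_phi` matters for particles on either side of φ = ±π, which would otherwise appear to be far apart.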
Electrons are reconstructed using both calorimeter and tracking information [45]. Any electron that can be paired with an oppositely charged particle consistent with the conversion of an energetic photon is rejected. Tight electrons in lepton+jets events are required to have E_T > 30 GeV, while in dilepton events they must have E_T > 20 GeV. Loose electrons must have E_T > 10 GeV. All electrons are required to have |η| < 2.5. Electrons that fall into the transition region between the barrel and endcap of the electromagnetic calorimeter (1.442 < |η| < 1.566) are rejected because the reconstruction of an electron object in this region is not optimal. The isolation for electrons is calculated in a similar manner to muon isolation; however, for electrons the isolation sum is calculated in a cone of ∆R = 0.3. In the same way as for muons, the relative isolation is the ratio of this corrected isolation sum to the electron E_T. Tight electrons must have a relative isolation less than 0.1, while loose electrons must have a relative isolation less than 0.2.
In both channels of this search, all events are required to contain at least one tight lepton, either a muon or an electron. The second lepton in the dilepton channel may be loose or tight, while in the lepton+jets channel events with a second loose lepton are rejected to ensure the same events do not enter both channels.
Jets are reconstructed by clustering the charged and neutral PF particles using the anti-k_T algorithm with a distance parameter of 0.5 [46,47]. Particles identified as isolated muons and electrons are expected to come from W decays and are excluded from the clustering. Non-isolated muons and electrons are expected to come from b-decays and are included in the clustering. The momentum of a jet is determined from the vector sum of all particle momenta in the jet candidate and is scaled according to jet energy corrections, based on simulation, jet plus photon data events and dijet data events [41]. Charged PF particles not associated with the primary event vertex are ignored when reconstructing jets. The neutral component coming from pileup events is removed by applying a residual energy correction following the area-based procedure described in Refs. [48,49]. In the lepton+jets channel, we require at least three jets with p_T > 40 GeV and a fourth jet with p_T > 30 GeV. In the dilepton analysis, we require at least two jets with p_T > 30 GeV. All jets must have a pseudorapidity in the range |η| < 2.4.
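The jet requirements quoted above can be written as simple counting conditions. The following Python sketch is an illustration only, assuming jets are given as dictionaries with corrected transverse momentum (in GeV) and pseudorapidity; the function names are hypothetical.

```python
def count_central_jets(jets, pt_min):
    """Number of jets with pT above pt_min (GeV) inside |eta| < 2.4."""
    return sum(1 for j in jets if j["pt"] > pt_min and abs(j["eta"]) < 2.4)


def passes_lepton_jets_jet_selection(jets):
    """At least three jets with pT > 40 GeV and a fourth jet with pT > 30 GeV."""
    return count_central_jets(jets, 40.0) >= 3 and count_central_jets(jets, 30.0) >= 4


def passes_dilepton_jet_selection(jets):
    """At least two jets with pT > 30 GeV."""
    return count_central_jets(jets, 30.0) >= 2


# Toy event with four central jets.
toy_jets = [
    {"pt": 55.0, "eta": 0.3},
    {"pt": 48.0, "eta": -1.2},
    {"pt": 41.0, "eta": 2.0},
    {"pt": 33.0, "eta": 0.9},
]
```

Because every jet above 40 GeV is also above 30 GeV, requiring four jets above 30 GeV together with three above 40 GeV reproduces the "three plus a fourth" condition.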
Jets are identified as originating from a b quark using the combined secondary vertex (CSV) algorithm [42]. This algorithm combines information about the impact parameter of tracks and reconstructed secondary vertices within the jets in a multivariate algorithm designed to separate jets containing the decay products of bottom-flavored hadrons from jets originating from charm quarks, light quarks, or gluons. The CSV algorithm provides a continuous output discriminant; high values of the CSV discriminant indicate that the jet is more consistent with being a b jet, while low values indicate the jet is more likely a light-quark jet. To select b-tagged jets, a selection is placed on the CSV discriminant distribution such that the efficiency is 70% (20%) for jets originating from a b (c) quark and the probability of tagging jets originating from light quarks or gluons is 2%. In addition, the CSV discriminant values for the selected jets are used in the signal extraction as described in Section 5. For MC events, the CSV discriminant values of each jet are adjusted so that the proportion of b jets, c jets, and light-quark jets of different η and p_T values passing each of three CSV working points (tight, medium, and loose) is the same in data and MC. The adjustment factor is computed using a linear interpolation between CSV working points.
Figure 2 shows the jet and b-tagged jet multiplicities for events selected in the lepton+jets channel. For both lepton+jets and dilepton channels, signal ttH events are generally characterized by having more jets and more tags than the background processes. To increase the sensitivity of this analysis, we separate the selected events into different categories based on the number of jets and tags. For lepton+jets events, we use the following seven categories: ≥6 jets + 2 b-tags, 4 jets + 3 b-tags, 5 jets + 3 b-tags, ≥6 jets + 3 b-tags, 4 jets + 4 b-tags, 5 jets + ≥4 b-tags, and ≥6 jets + ≥4 b-tags.
For dilepton events, only two categories are used: 2 jets + 2 b-tags and ≥3 jets + ≥3 b-tags. Tables 1-3 show the predicted signal, background, and observed yields in each category for the lepton+jets and dilepton channels. Background estimates are obtained from MC after the appropriate corrections and scale factors have been applied, as described above. Given the event selection criteria and the large jet and b-tag multiplicity requirements in the lepton+jets channel, the background from QCD multijet production is negligible. Uncertainties in signal and background yields include both statistical and systematic sources. Sources of systematic uncertainty are described in Section 6. In Tables 1-3, the tt + jets background is separated into the tt + bb, tt + cc, and tt+light flavor (lf) components. The categories with higher jet and tag multiplicities are the most sensitive to signal. We include less sensitive categories in order to better constrain the background.
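The jet-tag categorization listed above (seven lepton+jets categories and two dilepton categories) amounts to a simple lookup on the jet and b-tag multiplicities. The following Python sketch is illustrative; the function names and label strings are hypothetical, and events outside the listed categories return `None`.

```python
def lepton_jets_category(n_jets, n_tags):
    """Map (number of jets, number of b-tags) to one of the seven lepton+jets categories."""
    if n_tags > n_jets:
        raise ValueError("cannot have more b-tagged jets than jets")
    if n_jets < 4 or n_tags < 2:
        return None
    jets = ">=6 jets" if n_jets >= 6 else f"{n_jets} jets"
    if n_tags == 2:
        # Only the highest jet multiplicity is kept for exactly two tags.
        return ">=6 jets + 2 b-tags" if n_jets >= 6 else None
    if n_tags == 3:
        return f"{jets} + 3 b-tags"
    # Four or more tags; with exactly four jets this can only mean four tags.
    tags = "4 b-tags" if n_jets == 4 else ">=4 b-tags"
    return f"{jets} + {tags}"


def dilepton_category(n_jets, n_tags):
    """Map (number of jets, number of b-tags) to one of the two dilepton categories."""
    if n_tags > n_jets:
        raise ValueError("cannot have more b-tagged jets than jets")
    if n_jets == 2 and n_tags == 2:
        return "2 jets + 2 b-tags"
    if n_jets >= 3 and n_tags >= 3:
        return ">=3 jets + >=3 b-tags"
    return None
```

For example, a lepton+jets event with seven jets and five b-tags would fall in the ≥6 jets + ≥4 b-tags category, while one with five jets and two b-tags would not enter any category.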
The choice of event selection categories outlined above is optimized for the H → bb decay mode. However, in the higher end of our search range, including m_H = 125 GeV, other decay modes, especially WW and ττ, can have significant standard model branching fractions. For the purposes of this search, we define any ttH event as signal, regardless of the Higgs boson decay. For most of the event selection categories defined above, the contribution from the decay modes other than H → bb is less than 10%. The largest contribution from the non-bb decay modes arises in the ≥6 jets + 2 b-tags lepton+jets category, where almost 50% of the events come from decay modes other than H → bb. In that category H → WW dominates the non-bb contribution. With the current optimization, the impact of the non-bb decay modes on the analysis sensitivity is negligible, as the contribution from H → bb in the most sensitive categories is greater than 95%.

Signal extraction
Artificial neural networks (ANNs) [50] are used in all categories of the analysis to further discriminate signal from background and improve signal sensitivity. Separate ANNs are trained for each jet-tag category, and the choice of input variables is optimized for each as well. The ANN input variables considered are related to object kinematics, event shape, and the discriminant output from the b-tagging algorithm. A total of 24 input variables are considered; they are listed in column 1 of Table 4. The inputs are selected from a ranked list based on initial separation between signal and background. The separation of the individual variables is evaluated using a separation benchmark ⟨S²⟩ [51] defined as follows:
$$\langle S^2 \rangle = \frac{1}{2} \int \frac{\left(\hat{y}_S(y) - \hat{y}_B(y)\right)^2}{\hat{y}_S(y) + \hat{y}_B(y)} \, dy$$
where y is the input variable, and ŷ_S and ŷ_B are the signal and background probability density functions for that input variable in the signal and background samples, respectively. The maximum number of input variables considered is determined by the statistics in the simulated samples used for ANN training. The number of variables per category is determined by reducing the number of variables until the minimum number of variables needed to maintain roughly the same ANN performance is reached. In the lepton+jets categories, the use of approximately 10 input variables yields stable performance; using fewer inputs exhibits degraded discrimination power, and using more inputs exhibits little improvement in performance in most categories. A similar exercise was done for the dilepton categories. The choice of input variables for each jet-tag category used in the 8 TeV analysis is summarized in Table 4; the input variables for each category in the 7 TeV analysis are very similar. The input variables used in the ANN can be broken down into several classes, as detailed below.
The first class of variables consists of basic kinematic properties of single objects in the event or combinations of objects.
These variables include the p_T of the leading four jets, and the p_T and mass of the system defined by the vector sum of the lepton(s) momenta, the E_T^miss vector, and the momenta of the jets in the event (p_T(ℓ, E_T^miss, jets) and M(ℓ, E_T^miss, jets), respectively), all of which favor larger values for ttH signal than for the backgrounds. The number of jets is used in the ≥3 jets + ≥3 b-tags category in the dilepton analysis since ttH signal favors larger jet multiplicity than background.
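The separation benchmark used earlier in this section to rank the input variables can be estimated from binned distributions. The Python sketch below assumes the definition given in the TMVA documentation, ⟨S²⟩ = ½ ∫ (ŷ_S − ŷ_B)² / (ŷ_S + ŷ_B) dy, evaluated on signal and background histograms that share the same binning; the function name is hypothetical.

```python
def separation(signal_hist, background_hist):
    """Binned estimate of <S^2> for two histograms with identical binning.

    Each histogram is normalized to unit area, so the bin width cancels and
    the integral reduces to 1/2 * sum_i (s_i - b_i)^2 / (s_i + b_i).
    """
    if len(signal_hist) != len(background_hist):
        raise ValueError("histograms must have the same binning")
    s_total = sum(signal_hist)
    b_total = sum(background_hist)
    total = 0.0
    for s, b in zip(signal_hist, background_hist):
        s_frac = s / s_total
        b_frac = b / b_total
        if s_frac + b_frac > 0.0:  # empty bins contribute nothing
            total += (s_frac - b_frac) ** 2 / (s_frac + b_frac)
    return 0.5 * total
```

With this normalization the benchmark is 0 for identical signal and background shapes and 1 for shapes with no overlap, so candidate variables can be ranked by decreasing value.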
A related class of variables involves looking at the kinematic properties of pairs of jets. The H → bb decay produces jets that have a large invariant mass even if the jets fail the b-tag selection. Other untagged jets in the event tend to come from hadronic W decay and initial- or final-state radiation, and tend to have a small invariant mass compared to the jets from the Higgs boson decay. For this reason, some signal discrimination is provided by examining the invariant mass of pairs of untagged jets in lepton+jets categories with six or more jets but fewer than four b-tagged jets.
Likewise, the 6-jet category with four or more tags uses two variables that rely specifically on the H → bb hypothesis: the invariant mass of the tagged-jet pair with the smallest opening angle (M((j_m^tag, j_n^tag)_closest)), and the "best Higgs mass" (M((j_m^tag, j_n^tag)_best)), the invariant mass constructed from the two tagged jets least likely to be a part of the tt system as determined by a minimum χ² search among all the jet, lepton, and E_T^miss combinations in the event, using the W and top masses as kinematic constraints. The M((j_m^tag, j_n^tag)_closest) distribution for both signal and background has a peak near the same value; however, the distribution is wider in the case of signal, offering some discriminating power. In signal events, the "best Higgs mass" is highly correlated with the Higgs boson mass. Although the peak is broadened by events where the wrong jets are associated with the Higgs boson decay, this variable still provides some power in discriminating signal from background. The ≥6 jets + ≥4 b-tags category uses 11 variables instead of the typical 10 because it was shown that the addition of the "best Higgs mass" variable, uniquely designed for this jet-tag category, offers a non-negligible increase in expected ANN performance.
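The invariant mass of the closest tagged-jet pair can be illustrated with a short Python sketch. It assumes that the opening angle is measured by ∆R and that jets are treated as massless; neither choice is specified in the text, and the function names are hypothetical.

```python
import math
from itertools import combinations


def four_vector(jet):
    """(E, px, py, pz) for a jet given as a dict with pt, eta, phi (massless assumption)."""
    px = jet["pt"] * math.cos(jet["phi"])
    py = jet["pt"] * math.sin(jet["phi"])
    pz = jet["pt"] * math.sinh(jet["eta"])
    return (math.sqrt(px * px + py * py + pz * pz), px, py, pz)


def invariant_mass(jet_a, jet_b):
    """Invariant mass of the two-jet system."""
    e, px, py, pz = (a + b for a, b in zip(four_vector(jet_a), four_vector(jet_b)))
    return math.sqrt(max(e * e - px * px - py * py - pz * pz, 0.0))


def delta_r(jet_a, jet_b):
    """Angular distance between two jets, with phi wrapped into [-pi, pi)."""
    dphi = (jet_a["phi"] - jet_b["phi"] + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(jet_a["eta"] - jet_b["eta"], dphi)


def closest_pair_mass(tagged_jets):
    """Invariant mass of the tagged-jet pair with the smallest Delta R."""
    pair = min(combinations(tagged_jets, 2), key=lambda p: delta_r(p[0], p[1]))
    return invariant_mass(pair[0], pair[1])


# Toy example with three tagged jets; the first two form the closest pair.
toy_tags = [
    {"pt": 50.0, "eta": 0.0, "phi": 0.0},
    {"pt": 50.0, "eta": 0.0, "phi": 0.5},
    {"pt": 50.0, "eta": 0.0, "phi": 3.0},
]
```

The "best Higgs mass" would require the full χ² assignment of jets to the tt system and is not reproduced in this sketch.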
Another class of variables exploits differences in the "shape" of events between signal and background. In general, production of an extra massive object in addition to top quarks tends to make ttH events more spherical in shape, while the background events are more collimated or have more jet activity. Variables in this class include angular correlations, like the opening angle between the tagged jets (∆R(j_m^tag, j_n^tag)) or between the lepton and closest jet (∆R(ℓ, j_closest)), where in the dilepton analysis the angle is calculated with respect to the lepton leading in p_T. More complex event shape variables like sphericity and aplanarity [52], as well as the Fox-Wolfram moments H_0, H_1, H_2, H_3 [53], also exhibit differences between signal and background.
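As an illustration of the event-shape variables mentioned above, the Fox-Wolfram moments can be computed from a list of three-momenta using the standard definition H_l = Σ_ij |p_i||p_j| P_l(cos θ_ij) / E_vis², where P_l are the Legendre polynomials. The Python sketch below assumes massless objects, so that E_vis is the sum of momentum magnitudes; which objects enter the sum in the analysis is not specified here, and the function names are hypothetical.

```python
import math


def legendre(order, x):
    """Legendre polynomial P_l(x) for l = 0..3."""
    if order == 0:
        return 1.0
    if order == 1:
        return x
    if order == 2:
        return 0.5 * (3.0 * x * x - 1.0)
    if order == 3:
        return 0.5 * (5.0 * x ** 3 - 3.0 * x)
    raise ValueError("only orders 0 to 3 are implemented")


def fox_wolfram_moments(momenta):
    """Return [H_0, H_1, H_2, H_3] for a list of (px, py, pz) tuples."""
    norms = [math.sqrt(px * px + py * py + pz * pz) for px, py, pz in momenta]
    e_vis = sum(norms)  # massless assumption: visible energy = sum of |p|
    moments = [0.0, 0.0, 0.0, 0.0]
    for p_i, n_i in zip(momenta, norms):
        for p_j, n_j in zip(momenta, norms):
            if n_i == 0.0 or n_j == 0.0:
                continue  # a zero-momentum object has no defined direction
            dot = sum(a * b for a, b in zip(p_i, p_j))
            # Clamp to guard against rounding slightly outside [-1, 1].
            cos_theta = max(-1.0, min(1.0, dot / (n_i * n_j)))
            for order in range(4):
                moments[order] += n_i * n_j * legendre(order, cos_theta) / e_vis ** 2
    return moments
```

For a perfectly back-to-back two-object configuration this gives H_0 = H_2 = 1 and H_1 = H_3 = 0, while more isotropic configurations yield smaller values of H_2.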
The last class of variables used in the ANN involves the CSV discriminant values of the tagged jets. The signal events tend to have more b jets than the dominant tt + jets background. Beyond the simple multiplicity of tagged jets we can, however, exploit the overall b-jet content of the signal in several ways. For instance, the average and squared deviation from this average of the CSV discriminant values for the tagged jets (µ_CSV and (σ_n^CSV)² for the n-th tagged jet) are powerful variables. Events with genuine b jets will have higher average CSV discriminant values and the b jets themselves will have CSV values more tightly clustered around high values than those from light-flavor or charm jets which are tagged.
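The two CSV-based quantities described above are straightforward to compute. The following Python sketch is illustrative only; the function name and the toy discriminant values are hypothetical.

```python
def csv_summary(csv_values):
    """Return the mean CSV value and the squared deviation of each tagged jet from that mean."""
    if not csv_values:
        raise ValueError("at least one tagged jet is required")
    mean = sum(csv_values) / len(csv_values)
    squared_deviations = [(value - mean) ** 2 for value in csv_values]
    return mean, squared_deviations


# Toy CSV discriminant values for three tagged jets in one event.
mu_csv, sigma2_csv = csv_summary([0.95, 0.90, 0.70])
```

In this toy event the third jet sits furthest from the mean and therefore has the largest squared deviation, the pattern expected when a non-b jet passes the tagging requirement.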
Using the procedure discussed above, different variables are chosen for use in each of the different event selection categories. This is motivated by the fact that although the tt+jets background is dominant throughout, the kinematics of the events can be very distinct in different jet multiplicity bins. Similarly, the tagging discriminant of the b jets clearly is different in events with 2, 3 or ≥4 b-tags. Finally, the overall breakdown of the tt+jets background into tt + bb, tt + cc and tt+light-flavor is different across the jet-tag categories, implying different variables will be more effective in some categories than others.
In nearly all event selection categories, the variables that discriminate best between signal and background directly involve b-tagging information, such as the average CSV output value for b-tagged jets. This is natural, since the largest fraction of the backgrounds in all categories involve events with fewer b jets than the ttH signal generally has. However, when considering specifically the tt + bb background, which is very similar to the signal, the b-tagging information alone is not as powerful, and additional information from kinematic variables and angular correlations, such as the minimum ∆R between all pairs of b-tagged jets, becomes important. Even so, the tt + bb background remains difficult to separate from the ttH signal. Figures 3 through 5 show the variables used in the ANN for the 5 jets + 3 b-tags category (lepton+jets channel) and the 2 jets + 2 b-tags category (dilepton channel). The 5 jets + 3 b-tags category is chosen for lepton+jets as a compromise between signal sensitivity and adequate statistics for display purposes. Also shown, in Figure 6, are data-to-simulation comparisons of the best input variables for each jet-tag category considered in the 8 TeV analysis. The data-to-simulation ratio plots in Figures 3 through 6 show that, within uncertainties, the simulation reproduces well the shape and normalization of the distributions of the variables used in the ANN before the final maximum likelihood fit is performed (as discussed in Section 7). Correlations between input variables are also well reproduced by simulation.
For ANN training, we use ttH (m_H = 120 GeV) as the signal and tt+jets as the background, such that there is an equal amount of both for each category. The mass m_H = 120 GeV sample was chosen in the analysis of the 7 TeV data before the observation of a Higgs-like particle at m_H = 125 GeV was announced. This mass point was preserved in the 8 TeV ANN training for consistency. The signal and background events used to train an ANN are split in half: one half is used to do the training itself, while the other is used as an independent test sample to monitor performance during training. The ANN method used is the "multilayer perceptron", available as part of the TMVA [51] package in ROOT [54]. A multilayer perceptron is a specific kind of neural network in which the neurons in each layer only have connections to neurons in the following layer. The network architecture used here consists of two hidden layers, with N neurons in the first layer and N − 1 neurons in the second layer, where N is the number of input variables. Standard tests were completed during ANN training to look for evidence of overtraining; no such evidence was found in any jet-tag category, providing confidence that our training statistics were satisfactory given the number of input variables used in each.
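The network layout and the half-and-half split described above can be sketched in a few lines of Python. This is not the TMVA configuration used in the analysis; the single output node, the bias terms, and the random shuffling before the split are assumptions made for illustration, and all names are hypothetical.

```python
import random


def mlp_layer_sizes(n_inputs):
    """Layer sizes: N inputs, hidden layers of N and N - 1 neurons, one output (assumed)."""
    if n_inputs < 2:
        raise ValueError("need at least two input variables")
    return [n_inputs, n_inputs, n_inputs - 1, 1]


def count_weights(layer_sizes, with_bias=True):
    """Number of trainable weights when each layer connects only to the next one."""
    extra = 1 if with_bias else 0
    return sum((n_in + extra) * n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


def split_in_half(events, seed=0):
    """Shuffle and split events into a training half and an independent test half."""
    shuffled = list(events)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


sizes = mlp_layer_sizes(10)       # ten input variables, as typical for lepton+jets
n_weights = count_weights(sizes)  # includes one bias per neuron (assumption)
```

Under these assumptions a ten-input network has 219 trainable weights, which gives a sense of why the number of inputs is limited by the size of the simulated training samples.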
The ANN output provides better discrimination between signal and background than any one of the input variables individually. Figures 7 and 8 show the ANN output for all the categories of the lepton+jets channel in 7 TeV and 8 TeV data, respectively, and Figs. 9 and 10 show output distributions for dilepton events. We use these ANN output distributions for the signal extraction as described in Section 7.
Figure 9: The distributions of the ANN output for dilepton events at 7 TeV in the various analysis categories. The left plot shows events with 2 jets + 2 b-tags and the right plot shows events with ≥3 jets + ≥3 b-tags. The background is normalized to the SM expectation; the uncertainty band (shown as a hatched band in the stack plot and a green band in the ratio plot) includes statistical and systematic uncertainties that affect both the rate and shape of the background distributions. The ttH signal (m_H = 125 GeV) is normalized to 300 or 30 × SM expectation for the 2 jets + 2 b-tags and the ≥3 jets + ≥3 b-tags categories, respectively.
Figure 10: The distributions of the ANN output for dilepton events at 8 TeV in the various analysis categories. The left plot shows events with 2 jets + 2 b-tags and the right plot shows events with ≥3 jets + ≥3 b-tags. The background is normalized to the SM expectation; the uncertainty band (shown as a hatched band in the stack plot and a green band in the ratio plot) includes statistical and systematic uncertainties that affect both the rate and shape of the background distributions. The ttH signal (m_H = 125 GeV) is normalized to 300 or 30 × SM expectation for the 2 jets + 2 b-tags and the ≥3 jets + ≥3 b-tags categories, respectively.

Systematic uncertainties
Table 5 lists the systematic uncertainties that affect signal and background yields, the shape of the ANN output, or both. The effects of these uncertainties are evaluated specifically for each event selection category, and the effects from the same source are treated as completely correlated across the categories. The impact on the rate is the relative change in expected yield due to each uncertainty. Some sources of uncertainty affect predicted yields for all processes in each category uniformly, while in some cases the uncertainty affects the predicted yield of some processes in certain categories more than others; in the latter cases the range of the effect on the predicted yield is given across all processes in all categories. Hence large relative rate changes listed in Table 5 can typically be attributed to processes with small expected yields in a single category that change significantly when considering a source of uncertainty.
Lepton identification and trigger efficiency uncertainties were found to have a small impact on the analysis. The uncertainties were estimated by comparing variations in the difference in performance between data and MC simulation using a high-purity sample of Z-boson decays. The largest variations were at most 4% for a small fraction of events, such as electrons at low p_T. The analysis conservatively uses a 4% uncertainty on the lepton scale overall.
To ascertain the effects of the uncertainty on the pileup distribution, the cross section used to predict the distribution of pileup interactions in MC is varied by 8% from its nominal value, and the resulting change in the number of pileup interactions is propagated through the analysis. The systematic uncertainty due to the additional pileup correction, based on the scalar sum of the p_T of the jets, is evaluated by doubling or removing the correction applied.
The uncertainty on the luminosity estimate corresponding to the 7 TeV dataset is 2.2% [55] and, for the 8 TeV dataset, 4.4% [56].
The uncertainty from the jet energy scale [41] is evaluated by varying the energy scale for all jets in the signal and background predictions up and down by one standard deviation as a function of jet p T and η and re-evaluating the yields and ANN shapes of all processes. Similarly, the uncertainty on the jet energy resolution is obtained by varying the jet energy resolution correction up and down by one standard deviation, although in this case the effect on shape is negligible and therefore not included.
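The up/down variation procedure can be illustrated with a toy rate calculation. The sketch below is a simplified illustration, not the analysis code: the jet representation as `(pt, eta)` tuples, the `rel_unc(pt, eta)` callable returning the one-standard-deviation relative uncertainty, and the default selection thresholds (`min_pt`, `min_jets`) are all hypothetical. It covers only the effect on the yield; the re-evaluation of the ANN shapes is not shown.

```python
def count_selected(events, min_pt=30.0, min_jets=4):
    """Count events with at least min_jets jets above min_pt.

    Each event is a list of (pt, eta) tuples. The thresholds are
    placeholders, not the selection used in the analysis.
    """
    return sum(
        1
        for jets in events
        if sum(1 for pt, _eta in jets if pt >= min_pt) >= min_jets
    )


def shift_jets(events, rel_unc, direction):
    """Scale every jet pt by 1 + direction * rel_unc(pt, eta)."""
    return [
        [(pt * (1.0 + direction * rel_unc(pt, eta)), eta) for pt, eta in jets]
        for jets in events
    ]


def jes_rate_impact(events, rel_unc, **selection):
    """Relative yield change for the +1 sigma and -1 sigma JES shifts."""
    nominal = count_selected(events, **selection)
    if nominal == 0:
        raise ValueError("nominal yield is zero; relative change undefined")
    up = count_selected(shift_jets(events, rel_unc, +1.0), **selection)
    down = count_selected(shift_jets(events, rel_unc, -1.0), **selection)
    return (up - nominal) / nominal, (down - nominal) / nominal
```

The two returned numbers correspond to the relative change in expected yield that defines the rate impact of a systematic source.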
The b-tagging scale factor corrects the b-tagging efficiency in simulation to match that measured in data [42]. The uncertainty on this scale factor is evaluated by varying it up and down by one standard deviation and recalculating the CSV output value corresponding to each variation. This new CSV value is used to determine both the number of tags associated with that systematic variation and the new shape of variables that use the CSV output, such as the average CSV value for b-tagged jets. This uncertainty affects both rate and shape estimates. Since the b-tagging scale factor uncertainty affects the ANN shape differently for events with different numbers of jets or b-tagged jets, we conservatively assume no correlations among the categories.
We account for the effect of background MC statistics in our analysis using the approach described in [57,58]. To make the limit computation more efficient and stable, we do not evaluate this uncertainty for any bin in the ANN shapes for which the MC statistical uncertainty is negligible compared to the data statistics or where there is no appreciable contribution from signal. In total, there are 64 nuisance parameters used to describe the MC statistics for the 8 TeV results, but only five are needed for 7 TeV, due to the larger MC statistics available for those samples.
Tests show that the effect of neglecting bins as described above is smaller than 5%.
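The bin selection described above can be pictured as a per-bin filter. The sketch below is a loose illustration under stated assumptions: the data statistical uncertainty is approximated by the square root of the expected background yield, and the two thresholds (`unc_ratio_min`, `signal_fraction_min`) are hypothetical values chosen for the example, since the text does not quote the criteria actually used.

```python
import math


def needs_mc_stat_nuisance(background, mc_stat_unc, signal,
                           unc_ratio_min=0.3, signal_fraction_min=0.005):
    """Flag ANN-output bins that get an MC-statistics nuisance parameter.

    A bin is kept only if its MC statistical uncertainty is not negligible
    compared to the expected data statistical uncertainty (taken here as
    sqrt(background)) and the signal contribution is appreciable. Both
    thresholds are illustrative assumptions.
    """
    flags = []
    for b, sigma_mc, s in zip(background, mc_stat_unc, signal):
        if b <= 0.0:
            raise ValueError("expected background must be positive")
        mc_matters = sigma_mc / math.sqrt(b) > unc_ratio_min
        has_signal = s / b > signal_fraction_min
        flags.append(mc_matters and has_signal)
    return flags
```

With such a filter, the number of flagged bins sets the number of nuisance parameters, which is how better-populated MC samples lead to fewer parameters.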
Theoretical uncertainties on the cross sections used to predict the rates of various processes are propagated to the yield estimates. All rates are estimated using cross sections of at least NLO accuracy, which have uncertainties arising primarily from PDFs and the choice of factorization and renormalization scales. The cross section uncertainties are each separated into their PDF and scale components and correlated where appropriate between processes. For example, the PDF uncertainties for processes originating primarily from gluon-gluon initial states, e.g., tt and ttH production, are treated as 100% correlated.
In addition, for the tt + jets (including tt + bb and tt + cc) and the V+jets processes, the inclusive cross section predictions, of NLO accuracy or better, are extrapolated to exclusive rates for particular jet or tag categories using the MADGRAPH tree-level matrix element generator matched to the PYTHIA parton shower MC program. Although MADGRAPH incorporates contributions from higher-order diagrams, it does so only at tree level and is therefore subject to fairly large uncertainties arising from the choice of scale. These uncertainties are evaluated using samples for which the factorization and renormalization scales have been varied up and down by a factor of two. The rate uncertainty arising from this source varies with the number of additional jets in the production diagram, and is larger for events with more jets. The effect of scale variations on the ANN output shape is also included for the tt + jets sample. Scale variations are treated as uncorrelated for the tt+light flavor, tt + bb, and tt + cc components to cover the uncertainty in the relative yields of those processes; the impact on the ANN output shape from scale variation in the V+jets processes is neglected, since this contribution is small in most categories. The scale variations for W + jets and Z + jets are treated as correlated with each other, but uncorrelated with tt + jets.
As the background due to the tt + bb contribution is very similar to the signal, the uncertainty on its rate and shape has a substantial impact on our search. Due to the lack of more accurate higher-order theoretical predictions for this process, we estimate this background and assess its uncertainty based on the inclusive tt sample. The most important contribution to the uncertainty comes from the factorization and renormalization scale systematics.
Neither control region studies nor higher-order theoretical calculations [59] can currently constrain the normalization of the tt + bb contribution to better than 50% accuracy. Therefore, to be conservative, an extra 50% rate uncertainty is assigned to tt + bb for both 7 TeV and 8 TeV.

Results
A maximum likelihood fit is performed on the ANN output distributions from the nine jet-tag categories considered in the analysis. We consider the model including the SM backgrounds and a Higgs boson signal, as well as a model with only SM backgrounds but no Higgs boson signal. As we currently lack sensitivity to detect a SM Higgs boson signal, and observe no significant excess in the data, we focus here on setting 95% confidence level (CL) upper limits on the possible presence of a SM-like signal.
The statistical methodology employed by this analysis is identical to that used for other CMS searches [2,60,61]. In brief, we use a modified frequentist CL s [62,63] approach in which the test statistic involves the ratio of the likelihood functions constructed from the background expectations plus the SM Higgs boson signal scaled by an arbitrary parameter µ, where µ ≥ 0. The parameter µ = σ/σ SM is the ratio of the cross section of our signal process (σ) to the expected SM Higgs boson cross section (σ SM ). The likelihood function describes the expected yield of signal and background in bins of the ANN output for each event selection category. The systematic uncertainties described in Section 6 are incorporated into the likelihood by means of nuisance parameters that affect each background's rate, shape or both. Shape variations are handled by means of template morphing. A vertical template morphing approach is used where the shapes are smoothly interpolated between the ±1σ varied shapes and linearly extrapolated outside that region. This is the standard template morphing approach used by all CMS Higgs analyses. As appropriate for the frequentist approach taken here, the nuisance parameters are profiled during the limit extraction. The nuisance parameter correlations are implemented in a way that accounts for event migrations between the selection categories. Furthermore, in cases involving shape systematics, where high-statistics, background-rich categories might overconstrain certain systematic effects in the lower-statistics, higher-sensitivity categories, we take the approach of decorrelating the nuisance parameters to avoid overly aggressive constraints.
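The vertical morphing idea can be sketched per bin as follows. This is a minimal illustration and not the implementation used by the CMS statistical tools: the quadratic interpolation inside |θ| ≤ 1 is one common smooth choice and is an assumption here, as are the function names `morph_bin` and `morph_template`. The nuisance parameter θ is expressed in units of standard deviations, so that θ = ±1 reproduces the ±1σ templates.

```python
def morph_bin(nominal, up, down, theta):
    """Morph one bin content for nuisance parameter value theta.

    Inside |theta| <= 1 a quadratic passes through the down, nominal and up
    values at theta = -1, 0, +1 (an assumed interpolation choice). Outside
    that range the content is extrapolated linearly from the nominal value
    through the corresponding varied value.
    """
    if abs(theta) <= 1.0:
        shift = (0.5 * theta * (up - down)
                 + 0.5 * theta * theta * (up + down - 2.0 * nominal))
    elif theta > 1.0:
        shift = theta * (up - nominal)
    else:
        shift = -theta * (down - nominal)
    return nominal + shift


def morph_template(nominal, up, down, theta):
    """Apply morph_bin to every bin of a binned template."""
    return [morph_bin(n, u, d, theta) for n, u, d in zip(nominal, up, down)]
```

For example, with a nominal template `[10, 20]`, an up variation `[12, 18]` and a down variation `[9, 21]`, θ = 1 returns the up template, θ = −1 the down template, and θ = 2 extends the up shift linearly to `[14, 16]`.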
When combining the results from the 7 TeV and 8 TeV datasets, the proper correlation in systematic effects must be represented in the nuisance parameter choices. Given that for all theoretical predictions and many experimental uncertainties, exactly the same calculation or calibration is applied to the 7 TeV and 8 TeV datasets, the associated systematic uncertainties are treated as completely correlated and a single nuisance parameter is used to implement the effect. There are two exceptions to this approach. The luminosity is evaluated separately for the two analyses and the dominant uncertainties are largely independent, so the luminosity uncertainty is treated as uncorrelated between 7 TeV and 8 TeV. Furthermore, as separate MC samples are used for the two datasets, the MC statistical uncertainties are treated as uncorrelated between the two datasets.
Background-dominated categories are used to constrain the fitted background contributions in the signal-enhanced categories. The prediction from the fit for the composition of the selected sample in each category more accurately describes the data than the prediction directly from simulation, and the uncertainties on the final composition are reduced. The resulting distributions are driven by the shape from tt+light flavor, the dominant background in each category. No significant excesses of data above the background-only predictions are observed, and we use our statistical treatment to extract upper limits on the amount of ttH production consistent with our data. Figure 11 shows the 95% CL upper limit on the ratio µ of the ttH cross section with respect to that predicted by the SM as a function of m H for the 7 TeV and 8 TeV samples, separately, combining both lepton+jets and dilepton channels in each dataset. Figure 12 shows the upper limit obtained by combining both data samples. Table 6 shows the expected and observed limits for 7 TeV, 8 TeV, and combined analysis, using both the lepton+jets and the dilepton channels. The expected limit is extracted from the background-only hypothesis with no Higgs signal present. In addition to the median expected limit, the bands that contain 68% (1 standard deviation) and 95% (2 standard deviations) around the median are also quoted. The median expected limit for a Higgs boson mass of 125 GeV is 5.2 × σ SM while the observed limit is 5.8 × σ SM .
As a cross check, we extracted the limit using the best single variable according to Table 4 (plotted in Fig. 6) in place of the ANN output. Otherwise, the analysis was performed in exactly the same way as the version based on the ANN, including the event selection categories, systematic uncertainties, and treatment of the nuisance parameters. The resulting median expected limit for a Higgs boson mass of 125 GeV is 6.6 × σ SM , approximately 27% higher than the median expected limit of 5.2 × σ SM obtained with the ANN. The primary reasons for this decrease in sensitivity are the loss of separating power and the increased susceptibility to individual systematic effects that come from using fewer variables. The observed limit obtained using the best single variable analysis is 10.4 × σ SM , which is beyond the 68% CL range (on µ) of the expected limit ([5.0, 9.2]) but within the 95% CL range ([4.0, 12.7]).
Figure 11: The observed and expected 95% CL upper limits on the signal strength parameter µ = σ/σ SM for lepton+jets and dilepton channels combined using the 2011 dataset at 7 TeV (above) and the 2012 dataset at 8 TeV (below).

Summary
A search for the standard model Higgs boson produced in association with a top-quark pair has been performed at the CMS experiment using data samples corresponding to an integrated luminosity of 5.0 fb −1 (5.1 fb −1 ) collected in pp collisions at the center-of-mass energy of 7 TeV (8 TeV). Events are considered where the top-quark pair decays to either one lepton+jets (tt → ℓνqq′bb) or dileptons (tt → ℓ+νℓ−νbb), ℓ being an electron or a muon. The search has been optimized for the decay mode H → bb; however, sensitivity to other decay modes has been preserved. Artificial neural networks are used to discriminate between signal and background events. Combining the results from the 7 TeV and 8 TeV samples, the observed (expected) limit on the cross section for Higgs boson production in association with top-quark pairs for a Higgs boson mass of 125 GeV is 5.8 (5.2) times the standard model expectation. This is the first such search at the LHC.