Search for the associated production of the Higgs boson with a top-quark pair

A search for the standard model Higgs boson produced in association with a top-quark pair (ttH) is presented, using data samples corresponding to integrated luminosities of up to 5.1 fb−1 and 19.7 fb−1 collected in pp collisions at center-of-mass energies of 7 TeV and 8 TeV, respectively. The search is based on the following signatures of the Higgs boson decay: H → hadrons, H → photons, and H → leptons. The results are characterized by an observed ttH signal strength relative to the standard model cross section, µ = σ/σ SM , under the assumption that the Higgs boson decays as expected in the standard model. The best fit value is µ = 2.8 ± 1.0 for a Higgs boson mass of 125.6 GeV.

The results of a search for ttH production using the CMS detector [18] at the LHC are described in this paper. The small ttH production cross section, roughly 130 fb at √ s = 8 TeV [19-28], makes measuring its rate experimentally challenging. Therefore, it is essential to exploit every accessible experimental signature. As the top quark decays with nearly 100% probability to a W boson and a b quark, the experimental signatures for top-quark pair production are determined by the decays of the W bosons. When both W bosons decay hadronically, the resulting final state with six jets (two of which are b-quark jets) is referred to as the all-hadronic final state. If one of the W bosons decays leptonically, the final state with a charged lepton, a neutrino, and four jets (two of which are b-quark jets) is called lepton + jets. Finally, when both W bosons decay leptonically, the resulting dilepton final state has two charged leptons, two neutrinos, and two b-quark jets. All three of these top-quark pair signatures are used in the search for ttH production in this paper. Although in principle electrons, muons, and taus should all be included as "charged leptons", the experimental signatures of a tau lepton are less distinctive than those of the electron or muon. For the rest of this paper, the term "charged lepton" will refer only to electrons or muons, including those coming from tau lepton decays.
Within the SM, the observed mass of the Higgs boson near 125 GeV [9,29,30] implies that a variety of Higgs boson decay modes are experimentally accessible. At this mass, the dominant decay mode, H → bb, contributes almost 60% of the total Higgs boson decay width. The next largest contribution comes from H → WW with a branching fraction around 20%. Several Higgs boson decay channels with significantly smaller branching fractions still produce experimentally accessible signatures, especially H → γγ, H → ττ, and H → ZZ.
The experimental searches for ttH production presented here can be divided into three broad categories based on the Higgs boson signatures: H → hadrons, H → photons, and H → leptons. There are two main Higgs boson decay modes that contribute to the H → hadrons searches: H → bb and H → ττ, where both τ leptons decay hadronically. Note that events with τ pairs include both direct H → ττ decays and those where the τ leptons are produced by the decays of W or Z bosons from H → WW and H → ZZ decays. Events used in the H → hadrons searches have one or more isolated charged leptons from the W boson decays from the top quarks, which means these searches focus on the lepton + jets and dilepton tt final states, using single-lepton or dilepton triggers, respectively. Multivariate analysis (MVA) techniques are employed to tag the jets coming from b-quark or τ-lepton decays and to separate ttH events from the large tt+jets backgrounds.
In contrast, the H → photons search focuses exclusively on the H → γγ decay mode. In this case, the photons provide the trigger, and all three tt decay topologies are included in the analysis. The CMS detector's excellent γγ invariant mass resolution [31] is used to separate the ttH signal from the background, and the background model is entirely based on data.
Finally, in the H → leptons search, the leptons arise as secondary decay products from H → WW, H → ZZ, and H → ττ decays, as well as from the W bosons produced in the top quark decays. To optimize the signal-to-background ratio, events are required to have either a pair of same-sign charged leptons, or three or more charged leptons. The events are required to pass the dilepton or trilepton triggers. Multivariate analysis techniques are used to separate leptons arising from W-boson, Z-boson and τ-lepton decays, referred to as signal leptons, from background leptons, which come from b-quark or c-quark decays, or misidentified jets. MVA techniques are also used to distinguish ttH signal events from background events that are modeled using a mixture of control samples in data and Monte Carlo (MC) simulation. Table 1 summarizes the main features of each search channel described above.
To characterize the strength of the ttH signal relative to the SM cross section (µ = σ/σ SM ), a fit is performed simultaneously in all channels. The fit uses specific discriminating distributions in each channel: either a kinematic variable, such as the diphoton invariant mass in the H → photons channel, or an MVA discriminant, as in the H → hadrons and H → leptons cases. The uncertainties involved in the background modeling are introduced in the fit as nuisance parameters, so that the best-fit parameters provide an improved description of the background.

Table 1: Summary of the search channels used in the ttH analysis. In the description of the signatures, ℓ refers to any electron or muon in the final state (including those coming from leptonic τ decays). A hadronic τ decay is indicated by τ h . Finally, j represents a jet coming from any quark or gluon, or an unidentified hadronic τ decay, while b represents a b-quark jet. Any element in the signature enclosed in square brackets may not be present, depending on the specific decay mode of the top quark or Higgs boson. The minimum transverse momentum p T of various objects is given to convey some sense of the acceptance of each search channel; however, additional requirements are also applied. Jets labeled as b-tagged jets have been selected using the algorithm described in section 4. More details on the triggers used to collect data for each search channel are given in section 3. Selection of final-state objects (leptons, photons, jets, etc.) is described in general in section 4, with further channel-specific details included in sections 5-7. In this table and the rest of the paper, the number of b-tagged jets is always included in the jet count. For example, the notation 4 jets + 2 b-tags means four jets of which two jets are b-tagged. [Table body not reproduced, except the trielectron channel: 1 e/µ, p T > 10 GeV; 2 e (µ), p T > 7 (5) GeV; ≥2 jets + ≥1 b-tags, p T > 25 GeV.]

Data and simulation samples
This search is performed with samples of proton-proton collisions at √ s = 7 TeV, collected with the CMS detector in 2011 (referred to as the 7 TeV dataset), and at √ s = 8 TeV, collected in 2012 (referred to as the 8 TeV dataset). All of the search channels make use of the full CMS 8 TeV dataset, corresponding to an integrated luminosity that ranges from 19.3 fb −1 to 19.7 fb −1 , with a 2.6% uncertainty [32]. The luminosity used varies slightly because the different search channels have slightly different data quality requirements, depending on the reconstructed objects and triggers used. In addition, the H → photons analysis makes use of data collected at √ s = 7 TeV, corresponding to an integrated luminosity of 5.1 fb −1 . Finally, the ttH search in the H → bb final state based on the 7 TeV dataset with an integrated luminosity of 5.0 fb −1 , described in Ref. [33], is combined with the 8 TeV analysis to obtain the final ttH result. The uncertainty on the 7 TeV luminosity is 2.2% [34]. In the H → hadrons and H → leptons analyses, events are selected by triggering on the presence of one or more leptons. For the H → photons analysis, diphoton triggers are used.
Single-lepton triggers are used for channels with one lepton in the final state. The single-electron trigger requires the presence of an isolated, good-quality electron with transverse momentum p T > 27 GeV. The single-muon trigger requires a muon candidate isolated from other activity in the event with p T > 24 GeV. Dilepton triggers are used for channels with two or more leptons in the final state. The dilepton triggers accept any combination of electrons and muons, with one lepton having p T > 17 GeV and another p T > 8 GeV. In the H → leptons analysis, a trielectron trigger is also used, with minimum p T thresholds of 15 GeV, 8 GeV, and 5 GeV.
The H → photons analysis uses diphoton triggers with two different photon identification schemes. One requires calorimetric identification based on the electromagnetic shower shape and isolation of the photon candidate. The other requires only that the photon has a high value of the R 9 shower shape variable, where R 9 is calculated as the ratio of the energy contained in a 3×3 array of ECAL crystals centered on the most energetic deposit in the supercluster to the energy of the whole supercluster. The superclustering algorithm for photon reconstruction is explained in more detail in section 4. The E T thresholds at trigger level are 26 (18) GeV and 36 (22) GeV on the leading (trailing) photon depending on the running period. To maintain high trigger efficiency, all four combinations of thresholds and selection criteria are used.
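The R 9 definition above is a simple energy ratio. A minimal sketch, with purely illustrative crystal energies (the seed array and supercluster energy are hypothetical inputs, not CMS data):

```python
# Sketch of the R9 shower-shape variable described above: the ratio of the
# energy contained in the 3x3 crystal array centered on the seed crystal to
# the energy of the whole supercluster. Crystal energies are illustrative.

def r9(seed_3x3_energies, supercluster_energy):
    """R9 = E(3x3 around seed) / E(supercluster)."""
    return sum(seed_3x3_energies) / supercluster_energy

# An unconverted photon deposits nearly all its energy in the 3x3 core,
# giving R9 close to 1; a converted photon spreads out in phi, lowering R9.
unconverted = r9([40.0, 2.0, 1.5, 1.0, 0.5, 0.4, 0.3, 0.2, 0.1], 47.0)
converted = r9([25.0, 2.0, 1.5, 1.0, 0.5, 0.4, 0.3, 0.2, 0.1], 47.0)
```

High-R 9 photons are predominantly unconverted, which is why the trigger can use R 9 alone as a loose identification criterion.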
Expected signal events and, depending on the analysis channel, some background processes are modeled with MC simulation. The ttH signal is modeled using the PYTHIA generator. Most background samples are generated with the MADGRAPH [36] tree-level matrix element generator, combined with PYTHIA for the parton shower and hadronization. For the H → leptons analysis, the rare WWZ, WWW, tt + γ+jets, and ttWW processes are generated similarly. Single top quark production (t+q, t+b, and t+W) is modeled with the next-to-leading-order (NLO) generator POWHEG 1.0 [37-42] combined with PYTHIA. Samples that include top quarks in the final state are generated with a top quark mass of 172.5 GeV. For the H → photons analysis, the gluon fusion (gg → H) and vector boson fusion (qq → qqH) production modes are generated with POWHEG at NLO, combined with PYTHIA for the parton shower and hadronization. Higgs boson production in association with weak bosons (qq → WH/ZH) is simulated with PYTHIA. Samples generated with a leading-order generator use the CTEQ6L1 parton distribution function (PDF) set [43], while samples generated with NLO generators use the CTEQ6.6M PDF set [44].
The CMS detector response is simulated using the GEANT4 software package [45]. All events from data and simulated samples are required to pass the same trigger conditions and are reconstructed with identical algorithms to those used for collision data. Effects from additional pp interactions in the same bunch crossing (pileup) are modeled by adding simulated minimum bias events (generated with PYTHIA) to the generated hard interactions. The pileup interaction multiplicity distribution in simulation reflects the luminosity profile observed in pp collision data. Additional correction factors are applied to individual object efficiencies and energy scales to bring the MC simulation into better agreement with data, as described in section 4.

Object reconstruction and identification
A global event description is obtained with the CMS particle-flow (PF) algorithm [46,47], which optimally combines the information from all CMS sub-detectors to reconstruct and identify each individual particle in the pp collision event. The particles are classified into mutually exclusive categories: charged hadrons, neutral hadrons, photons, muons, and electrons. The primary collision vertex is identified as the reconstructed vertex with the highest value of ∑ p 2 T , where the summation includes all particles used to reconstruct the vertex. Although the separate ttH search channels share the same overall object reconstruction and identification approach, there are differences in some of the selection requirements. Generally speaking, the requirements in the H → hadrons channel are more stringent than in the H → photons or leptons because of the larger backgrounds in the first channel and the smaller amount of signal in the other ones.
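The primary-vertex choice above is a simple maximization. A minimal sketch, assuming each vertex is represented by the list of transverse momenta of its associated particles (a hypothetical input format, not the PF data model):

```python
# Sketch of the primary-vertex selection described above: among the
# reconstructed vertices, choose the one with the largest sum of squared
# transverse momenta of its associated particles.

def primary_vertex(vertices):
    """Return the index of the vertex maximizing sum(pT^2)."""
    sums = [sum(pt * pt for pt in tracks) for tracks in vertices]
    return max(range(len(vertices)), key=lambda i: sums[i])

vertices = [
    [2.0, 3.0, 1.5],          # soft pileup vertex
    [45.0, 30.0, 12.0, 5.0],  # hard-scatter candidate
    [4.0, 4.0],               # another soft vertex
]
```

Squaring the p T values makes the choice robust against pileup vertices with many soft tracks, since a single hard track outweighs a large number of low-p T ones.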
Photon candidates are reconstructed from the energy deposits in the ECAL, grouping the individual clusters into a supercluster. The superclustering algorithms achieve an almost complete reconstruction of the energy of photons (and electrons) that convert into electron-positron pairs (emit bremsstrahlung) in the material in front of the ECAL. In the barrel region, superclusters are formed from five-crystal-wide strips in η, centered on the locally most energetic crystal (seed), and have a variable extension in φ. In the endcaps, where the crystals are arranged according to an x-y rather than an η-φ geometry, matrices of 5×5 crystals (which may partially overlap) around the most energetic crystals are merged if they lie within a narrow φ road. The photon candidates are collected within the ECAL fiducial region |η| < 2.5, excluding the barrel-endcap transition region 1.44 < |η| < 1.57 where photon reconstruction is sub-optimal. Isolation requirements are applied to photon candidates by looking at neighboring particle candidates reconstructed with the PF event reconstruction technique [46]. Additional details on photon reconstruction and identification can be found in Ref. [30].
Electrons with p T > 7 GeV are reconstructed within the geometrical acceptance of the tracker, |η| < 2.5. The reconstruction combines information from clusters of energy deposits in the ECAL and the electron trajectory reconstructed in the inner tracker [48-51]. The track-cluster matching is initiated either "outside-in" from ECAL clusters, or "inside-out" from track candidates. Trajectories in the tracker volume are reconstructed using a dedicated modeling of the electron energy loss and fitted with a Gaussian sum filter [48]. The electron momentum is determined from the combination of ECAL and tracker measurements. Electron identification relies on a multivariate technique that combines observables sensitive to the amount of bremsstrahlung along the electron trajectory, the spatial and momentum matching between the electron trajectory and associated clusters, and shower shape observables. In order to increase the lepton efficiency, the H → leptons analysis uses a looser cut on the multivariate discriminant than do the other analysis channels. Although the minimum p T requirement on electrons is p T > 7 GeV, the different ttH search channels, particularly the H → hadrons channel, use a higher threshold on some of the selected electrons depending on the trigger requirements and to help control backgrounds (see sections 5-7 for more details).
Muons are reconstructed within |η| < 2.4 and for p T > 5 GeV [52]. The reconstruction combines information from both the silicon tracker and the muon spectrometer. The matching between the inner and outer tracks is initiated either "outside-in", starting from a track in the muon system, or "inside-out", starting from a track in the silicon tracker. The PF muons are selected among the reconstructed muon track candidates by applying minimal requirements on the track components in the muon and tracker systems and taking into account matching with energy deposits in the calorimeters [53]. Depending on the level of backgrounds in a given analysis channel, different requirements can be placed on the distance of closest approach of the muon to the collision vertex, referred to as the impact parameter (IP), in both the z direction (d z ) and the x − y plane (d xy ), to reject background muons. As in the electron case, the p T threshold for some or all of the muons is set higher than the 5 GeV default, depending on the trigger requirements used by a particular search channel and to control backgrounds.
An important quantity for distinguishing signal and background leptons is isolation. Although conceptually similar, isolation is defined slightly differently for muons and electrons depending on the analysis channel. Muon isolation is assessed by calculating the sum of the transverse energy of the other particles in a cone of ∆R = √((∆η)² + (∆φ)²) = 0.4 around the muon direction, excluding the muon itself, where ∆η and ∆φ are the angular differences between the muon and the other particles in the η and φ directions. To correct for the effects of pileup, charged contributions not originating from the primary collision vertex are explicitly removed from the isolation sum, and the neutral contribution is corrected assuming a ratio of 0.5 for the contribution of neutral to charged objects to the pileup activity. The ratio of the corrected isolation sum to the muon p T is the relative isolation of the muon. For the H → leptons search, electron isolation is calculated identically to muon isolation. For the H → hadrons and H → photons searches, there are two differences. The first is that the electron isolation sum only takes into account charged and neutral particles in a cone of ∆R = 0.3. Second, the correction for pileup effects to the neutral contribution in the isolation sum is made using the average p T density calculated from neutral particles multiplied by the effective area of the isolation cone. The relative isolation is the ratio of this corrected isolation sum to the electron p T .
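The muon isolation and pileup correction described above can be sketched as follows. The particle record format (p T , η, φ, type, primary-vertex flag) is a hypothetical simplification of the PF candidate list:

```python
import math

# Sketch of the muon relative isolation described above: sum the pT of
# particles in a Delta-R cone of 0.4, drop charged pileup, and subtract
# half the charged-pileup sum from the neutral component.

def delta_r(eta1, phi1, eta2, phi2):
    dphi = abs(phi1 - phi2)
    if dphi > math.pi:
        dphi = 2.0 * math.pi - dphi
    return math.hypot(eta1 - eta2, dphi)

def rel_isolation(mu_pt, mu_eta, mu_phi, particles, cone=0.4):
    charged_pv = charged_pu = neutral = 0.0
    for pt, eta, phi, kind, from_pv in particles:
        if delta_r(mu_eta, mu_phi, eta, phi) > cone:
            continue
        if kind == "charged":
            if from_pv:
                charged_pv += pt
            else:
                charged_pu += pt
        else:
            neutral += pt
    # Neutral pileup is estimated as 0.5 x the charged pileup in the cone.
    iso = charged_pv + max(0.0, neutral - 0.5 * charged_pu)
    return iso / mu_pt

# Hypothetical particles near a muon with pT = 40 GeV at (eta, phi) = (0, 0):
particles = [
    (2.0, 0.1, 0.1, "charged", True),   # charged hadron from the primary vertex
    (3.0, 0.2, -0.1, "neutral", True),  # neutral hadron
    (4.0, 0.0, 0.2, "charged", False),  # charged pileup, removed from the sum
    (10.0, 1.5, 0.0, "neutral", True),  # outside the 0.4 cone, ignored
]
rel_iso = rel_isolation(40.0, 0.0, 0.0, particles)
```

With these inputs the corrected sum is 2.0 + (3.0 − 0.5 × 4.0) = 3.0 GeV, i.e. a relative isolation of 0.075, which would pass the tight-muon requirement of 0.12 quoted in section 5.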
Jets are reconstructed by clustering the charged and neutral PF particles using the anti-k T algorithm with a distance parameter of 0.5 [54,55]. For the H → hadrons search, particles identified as isolated muons and electrons are expected to come from W-boson decays and are excluded from the clustering. Non-isolated muons and electrons are expected to come from b-quark decays and are included in the clustering. The H → leptons and H → photons searches do not exclude the isolated leptons from the jet clustering, but require selected jets to be separated by ∆R > 0.5 from the selected leptons. The choice not to exclude leptons from the clustered jets in the H → leptons search is an integral part of the non-prompt lepton rejection strategy. When a lepton is clustered into a jet, that information is used to help determine whether the lepton originated from a semileptonic decay of a heavy (bottom or charm) quark (see section 7 for more details).
Jets are required to have at least two PF constituents and more than 1% of their energy in both the electromagnetic and hadronic components to reject jets arising from instrumental effects. For the H → leptons and H → photons searches, additional requirements are applied to remove jets coming from pileup vertices [56]. For the H → hadrons and H → leptons analyses, charged PF particles not associated with the primary event vertex are ignored when clustering the jets to reduce the contribution from pileup. The momentum of the clustered jet is corrected for a variety of effects [57]. The component coming from pileup activity (in the case of H → hadrons or H → leptons, just the neutral part) is removed by applying a residual energy correction following the area-based procedure described in Refs. [58,59]. Further corrections based on simulation, γ/Z+jets data, and dijet data are then applied, as well as a correction to account for residual differences between data and simulation [57]. Selected jets are required to have |η| < 2.4, and p T > 25 GeV (H → leptons and H → photons) or p T > 30 GeV (H → hadrons). The higher p T requirement in the latter case arises from the larger amount of background in that sample.
Jets are identified as originating from a b-quark using the combined secondary vertex (CSV) algorithm [60,61] that utilizes information about the impact parameter of tracks and reconstructed secondary vertices within the jets in a multivariate algorithm. The CSV algorithm provides a continuous output ranging from 0 to 1; high values of the CSV discriminant indicate that the jet likely originates from a b quark, while low values indicate the jet is more consistent with light-flavor quarks or gluons. The efficiency to tag b-quark jets and the rate of misidentification of non-b-quark jets depend on the working point chosen. For the medium working point of the CSV algorithm, the b-tagging efficiency is around 70% (20%) for jets originating from a b (c) quark and the probability of mistagging for jets originating from light quarks or gluons is approximately 2%. For the loose working point, the efficiency to tag jets from b (c) quarks is approximately 85% (40%) and the probability to tag jets from light quarks or gluons is about 10%. These efficiencies and mistag probabilities vary with the p T and η of the jets, and the values quoted are indicative of the predominant jets in this analysis.
The hadronic decay of a τ lepton (τ h ) produces a narrow jet of charged and neutral hadrons, almost all of them pions. Each neutral pion subsequently decays into a pair of photons. The identification of τ h jets begins with the formation of PF jets by clustering charged hadron and photon objects via the anti-k T algorithm. Then, the hadron-plus-strips (HPS) [62,63] algorithm tests each of the most common τ h decay mode hypotheses using the electromagnetic objects found within rectangular bands along the azimuthal direction. In the general algorithm, combinations of charged hadrons and photons (one charged hadron, one charged hadron + photons, and three charged hadrons) must lead to invariant masses consistent with the appropriate intermediate resonances [63]. For this analysis, only the decays involving exactly one charged hadron are used.
The missing transverse energy vector is calculated as the negative vector p T sum of all PF candidates identified in the event. The magnitude of this vector is denoted as E miss T . Since pileup interactions degrade the performance of the E miss T variable, the H → leptons search also uses the H miss T variable. This variable is computed in the same way as the E miss T , but uses only the selected jets and leptons. The H miss T variable has worse resolution than E miss T but it is more robust as it does not rely on soft objects in the event. A linear discriminator is computed based on the two variables, exploiting the fact that E miss T and H miss T are less correlated in events with missing transverse energy from instrumental mismeasurement than in events with genuine missing transverse energy. The linear discriminant is constructed to optimize separation between ttH and Z+jets in simulation.
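The linear discriminant above combines the two missing-energy variables. The sketch below uses placeholder weights (the actual CMS coefficients, optimized on simulation, are not given in the text):

```python
# Sketch of a linear discriminant built from E_T^miss and H_T^miss, as
# described above. The 0.6/0.4 weights are purely illustrative placeholders,
# not the coefficients used in the analysis.

def met_ld(met, ht_miss, w_met=0.6, w_htmiss=0.4):
    """Linear combination of E_T^miss and H_T^miss (GeV)."""
    return w_met * met + w_htmiss * ht_miss

# An event with genuine missing energy has both variables large and
# consistent; a mismeasured Z+jets event tends to have them inconsistent,
# pushing the discriminant to lower values.
signal_like = met_ld(120.0, 110.0)
zjets_like = met_ld(60.0, 15.0)
```

A cut on this combined variable exploits the weaker correlation of the two inputs in mismeasured events, rejecting Z+jets background more efficiently than a cut on either variable alone.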
To match the performance of reconstructed objects between data and simulation, the simulation is corrected with data-to-MC scale factors as follows. Leptons are corrected for the difference in trigger efficiency, as well as in lepton identification and isolation efficiency. For the H → leptons channel, corrections accounting for residual differences between data and simulation are applied to the muon momentum, as well as to the ECAL energy before combining with the tracking momentum for electrons. All lepton corrections are derived using tag-and-probe techniques [64] based on samples with Z boson and J/ψ decays into two leptons. Jet energy corrections as described above are applied as a function of the jet p T and η [57]. Standard efficiency scale factors for the medium and loose b-tagging working points [60,61] are applied for light- and heavy-flavor jets in the H → leptons and H → photons searches, while the H → hadrons search uses a more sophisticated correction to the CSV shape (see section 5 for more details).

Event selection
Events in the H → hadrons analysis are split into three different channels based on the decay modes of the top-quark pair and the Higgs boson: the lepton+jets channel (tt → ℓνqqbb, H → bb), the dilepton channel (tt → ℓ⁺νℓ⁻νbb, H → bb), and the τ h channel (tt → ℓνqqbb, H → τ h τ h ), where the lepton ℓ is an electron or a muon. For the lepton+jets channel, events containing an energetic, isolated lepton and at least four energetic jets, two or more of which must be b-tagged, are selected. For the dilepton channel, a pair of oppositely charged leptons and three or more jets, with at least two of the jets being b-tagged, are required. For the τ h channel, beyond the two identified hadronically decaying τ leptons, at least two jets, one or two of which must be b-tagged, are required. The event selections are designed to be mutually exclusive. For all figures (figures 2-7) and tables (tables 2-4) of the H → hadrons analysis, the b-tagged jets are included in the jet count.
In addition to the baseline selection detailed in section 4, two additional sets of selection criteria are applied to leptons in the H → hadrons analysis: tight and loose, described below. All events are required to contain at least one tight electron or muon. Loose requirements are only applied to the second lepton in the dilepton channel.
Tight and loose muons differ in both identification and kinematic requirements. For events in the lepton+jets and τ h channels, tight muons are required to have p T > 30 GeV and |η| < 2.1 to ensure that the trigger is fully efficient with respect to the offline selection. Tight muons in the dilepton channel have a lower p T threshold of 20 GeV. Loose muons must have p T > 10 GeV and |η| < 2.4. For tight (loose) muons, the relative isolation is required to be less than 0.12 (0.2). Tight muons must also satisfy additional quality criteria based on the number of hits associated with the muon candidate in the pixel, strip, and muon detectors. To ensure the muon is from a W decay, it is required to be consistent with originating from the primary vertex, with an impact parameter in the x − y plane d xy < 0.2 cm and distance from the primary vertex in the z-direction d z < 0.5 cm. For loose muons, no additional requirements beyond the baseline selection are applied.
Tight electrons in the lepton+jets and τ h channels are required to have p T > 30 GeV, while the dilepton channel requires p T > 20 GeV. Loose electrons are required to have p T > 10 GeV. All electrons must have |η| < 2.5, and those that fall into the transition region between the barrel and endcap of the ECAL (1.44 < |η| < 1.57) are rejected. Tight electrons must have a relative isolation less than 0.1, while loose electrons must have a relative isolation less than 0.2. In a manner similar to tight muons, tight electrons are required to have d xy < 0.02 cm and d z < 1 cm, while loose electrons must have d xy < 0.04 cm.
For τ leptons decaying hadronically, only candidates with well-reconstructed decay modes [63] that contain exactly one charged pion are accepted. Candidates must have p T > 20 GeV and |η| < 2.1, and the p T of the charged pion must be greater than 5 GeV. Candidates are additionally required to fulfill criteria that reject electrons and muons mimicking hadronic τ-lepton decays. These include requirements on the consistency of information from the tracker, calorimeters, and muon detectors, including the absence of large energy deposits in the calorimeters for muons and bremsstrahlung pattern recognition for electrons. A multivariate discriminant, which takes into account the effects of pileup, is used to select loosely isolated τ h candidates [65]. Finally, the τ h candidates must be separated from the single tight muon or electron in the event by a distance ∆R > 0.25. Events are required to contain at least one pair of oppositely charged τ h candidates. In the case that multiple valid pairs exist, the pair with the most isolated τ h signatures, based on the aforementioned MVA discriminant, is chosen.
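The τ h pair choice above (opposite charge, best combined isolation) can be sketched as follows. Candidates are hypothetical (charge, MVA isolation score) tuples, where a higher score means a more isolated candidate:

```python
from itertools import combinations

# Sketch of the tau_h pair selection described above: among oppositely
# charged candidate pairs, choose the pair with the most isolated signatures
# according to the MVA isolation discriminant.

def best_tau_pair(candidates):
    pairs = [
        (a, b)
        for a, b in combinations(candidates, 2)
        if a[0] + b[0] == 0  # opposite charge
    ]
    if not pairs:
        return None
    # Rank pairs by the summed isolation scores of the two candidates.
    return max(pairs, key=lambda p: p[0][1] + p[1][1])

taus = [(+1, 0.92), (-1, 0.85), (+1, 0.40), (-1, 0.97)]
chosen = best_tau_pair(taus)
```

With these inputs the same-sign combinations are discarded, and the (+1, 0.92)/(−1, 0.97) pair wins because its summed isolation score is the largest.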
While the basic jet p T threshold is 30 GeV, in the lepton+jets channel, the leading three jets must have p T > 40 GeV. Jets originating from b quarks are identified using the CSV medium working point.

Background modeling
All the backgrounds in the H → hadrons analysis are normalized using NLO or better inclusive cross section calculations [66-71]. To determine the contribution of individual physics processes to exclusive final states as well as to model the kinematics, the MC simulations described in section 3 are used. The main background, tt+jets, is generated using MADGRAPH inclusively, with tree-level diagrams for up to tt + 3 extra partons. These extra partons include both b and c quarks. However, as there are significantly different uncertainties in the production of additional light-flavor (lf) jets compared to heavy-flavor (hf) jets, the tt+jets sample is separated into subsamples based on the quark flavor associated with the reconstructed jets in the event. Events where at least two reconstructed jets are matched at the generator level to extra b quarks (that is, b quarks not originating from a top-quark decay) are labeled as tt+bb events. If only a single jet is matched to a b quark, the event is classified as tt+b. These cases typically arise because the second extra b quark in the event is either too far forward or too soft to be reconstructed as a jet, or the two extra b quarks have merged into a single jet. Finally, if at least one reconstructed jet is matched to a c quark at the generator level, the event is labeled as tt+cc. Different systematic uncertainties affecting both rates and shapes are applied to each of the separate subsets of the tt+jets sample, as described in section 8.
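The categorization logic above reduces to counting generator-matched extra heavy-flavor jets. A minimal sketch (the generator-level ΔR matching itself is not shown; each jet is assumed to carry the flavor of its matched extra parton, or None if unmatched):

```python
# Sketch of the tt+jets flavor categorization described above, applied to the
# list of matched extra-parton flavors for the reconstructed jets in an event.
# "Extra" means not originating from the top-quark decays.

def ttjets_category(jet_matched_flavors):
    n_b = sum(1 for f in jet_matched_flavors if f == "b")
    n_c = sum(1 for f in jet_matched_flavors if f == "c")
    if n_b >= 2:
        return "tt+bb"   # at least two jets matched to extra b quarks
    if n_b == 1:
        return "tt+b"    # second extra b quark lost or merged
    if n_c >= 1:
        return "tt+cc"   # at least one jet matched to a c quark
    return "tt+lf"       # only light-flavor extra jets
```

Because each subsample carries its own rate and shape uncertainties (section 8), this per-event label determines which nuisance parameters apply in the fit.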
Besides the common corrections to MC samples described in section 4, additional correction factors are applied for samples modeling the backgrounds for this analysis channel. A correction factor to tt+jets MC samples is applied so that the top-quark p T spectrum from MADGRAPH agrees with the distribution observed in data and predicted by higher-order calculations. These scale factors, which range from roughly 0.75 to 1.2, were derived from a fully corrected measurement of the tt differential cross section as a function of the top-quark p T using the √ s = 8 TeV dataset, obtained using the same techniques as described in Ref. [72].
Furthermore, a dedicated correction to the CSV b-tagging rates is applied to all the MC samples. The CSV discriminant is used to identify b-quark jets, and the CSV discriminant shape is used in the signal extraction technique to distinguish between events with additional genuine b-quark jets and those with mistags. Therefore, a correction for the efficiency difference between data and simulation over the whole range of discriminator values is applied. The scale factors, which are between 0.7 and 1.3 for the bulk of the jets, are derived separately for light-flavor (including gluons) and b-quark jets using two independent samples of 8 TeV data in the dilepton channel. Both control samples are also orthogonal to the events used in the signal extraction. The light-flavor scale factor derivation uses a control sample enriched in events with a Z boson, selected by requiring a pair of opposite-charge, same-flavor leptons and exactly two jets. The b-quark scale factor is derived in a sample dominated by dileptonic tt, a signature that includes exactly two b-quark jets, by selecting events with two leptons that are not consistent with a Z boson decay and exactly two jets. Using these control samples, a tag-and-probe approach is employed where one jet ("tag") passes the appropriate b-tagging requirement for a light-flavor or b-quark jet. The CSV discriminant of the other jet ("probe") is compared between the data and simulation, and the ratio gives a scale factor for each jet as a function of CSV discriminant value, p T , and η. Each light-flavor or b-quark jet is then assigned an appropriate individual scale factor. The CSV output shape for c-quark jets is dissimilar to that of both light-flavor and b-quark jets; hence, in the absence of a control sample of c-quark jets in data, a scale factor of 1 is applied, with twice the relative uncertainty ascertained from b-quark jets (see section 8).
These CSV scale factors are applied to simulation on an event-by-event basis, where the overall scale factor is the product of the individual scale factors for each jet in the event. This procedure was validated using control samples. Tables 2, 3, and 4 show the predicted event yields compared to data after the selection in the lepton+jets, dilepton, and τh channels, respectively. The tables are subdivided into the different jet and b-tag categories used in each channel. The signal yield is the SM prediction (µ fixed to 1). In these tables, background yields and uncertainties use the best-fit values of all nuisance parameters, with µ fixed at 1. For more details about the statistical treatment and the definition of µ, see section 9. The expected and observed yields agree well in all final states across the different jet and b-tag categories.
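The event-by-event product of per-jet scale factors can be sketched as follows; `sf_lookup` is a hypothetical stand-in for the measured tag-and-probe scale-factor tables, which are not reproduced here:

```python
def csv_event_weight(jets, sf_lookup):
    """Event weight = product of the individual CSV scale factors of all
    jets, each looked up as a function of flavour, CSV value, pT and eta.
    c-quark jets get a unit scale factor (their doubled b-jet uncertainty
    is handled elsewhere as a systematic)."""
    weight = 1.0
    for jet in jets:
        if jet["flavour"] == "c":
            sf = 1.0
        else:
            sf = sf_lookup(jet["flavour"], jet["csv"], jet["pt"], jet["eta"])
        weight *= sf
    return weight
```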

Figures 2, 3, and 4 show the data-to-simulation comparisons of the variables that give the best signal-background separation in each of the lepton+jets, dilepton, and τh channels, respectively. In these plots, the background is normalized to the SM expectation; the uncertainty band (shown as a hatched band in the stack plot and a green band in the ratio plot) includes statistical and systematic uncertainties that affect both the rate and shape of the background distributions. For the ratio plots shown below each distribution, only the background expectation (and not the signal) is included in the denominator of the ratio. The contribution labeled "EWK" is the sum of the diboson and W/Z+jets backgrounds. The ttH signal (mH = 125.6 GeV) is not included in the stacked histogram, but is shown as a separate open histogram normalized to 30 times the SM expectation (µ = 30). To calculate the variable "second m(jj,H)", the invariant masses of all jet pairs with at least one b-tagged jet are computed, and the jet pair whose mass is the second closest to the Higgs boson mass is chosen. Within the uncertainties, the simulation reproduces well both the shape and the normalization of the distributions.
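The construction of the "second m(jj,H)" variable can be written compactly. Jets are represented here as (E, px, py, pz) four-vectors, a simplification of the full reconstruction:

```python
from itertools import combinations
import math

HIGGS_MASS = 125.0  # GeV, nominal mass used for the ranking

def inv_mass(j1, j2):
    """Invariant mass of the sum of two four-vectors (E, px, py, pz)."""
    e = j1[0] + j2[0]
    px, py, pz = (j1[i] + j2[i] for i in (1, 2, 3))
    return math.sqrt(max(e * e - px * px - py * py - pz * pz, 0.0))

def second_m_jj_h(jets, btagged):
    """Among all jet pairs with at least one b-tagged jet, return the
    pair mass that is SECOND closest to the Higgs boson mass."""
    masses = [inv_mass(a, b)
              for (ia, a), (ib, b) in combinations(enumerate(jets), 2)
              if btagged[ia] or btagged[ib]]
    masses.sort(key=lambda m: abs(m - HIGGS_MASS))
    return masses[1]
```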

Signal extraction
Boosted decision trees (BDTs) [73] are used to further improve the signal sensitivity. In the lepton+jets and dilepton channels, BDTs are trained separately for each category, using the ttH sample with mH = 125 GeV. The three dilepton categories use a single BDT. Of the seven lepton+jets categories, four categories use a single BDT, while three categories each use two BDTs in a tiered configuration. The tiered configuration includes one BDT that is trained specifically to discriminate between ttH and ttbb events, the output of which is then used as an input variable in the second, more general, ttH versus tt+jets BDT. This tiered approach allows better discrimination between the ttH process and the difficult ttbb component of tt+jets production, resulting in better control of the tt+hf systematic uncertainties and a lower expected limit on µ. In the τh channel, due to the low event counts, a single BDT is used for all categories, with an event selection equivalent to the union of all categories with more than one untagged jet.

Figure 2: Input variables that give the best signal-background separation for each of the lepton+jets categories used in the analysis at √s = 8 TeV. The top, middle, and bottom rows show the events with 4, 5, and ≥6 jets, respectively, while the left, middle, and right columns are events with 2, 3, and ≥4 b-tags, respectively. More details regarding these plots are found in the text.
All BDTs use variables involving the kinematics of the reconstructed objects, the event shape, and the CSV b-tag discriminant. Ten variables are used as inputs to the final BDTs in all lepton+jets categories, while 10 or 15 variables are used in the first BDT of the categories employing the tiered-BDT system (the ≥6 jets + ≥4 b-tags and ≥6 jets + 3 b-tags categories use 15 variables, and the 5 jets + ≥4 b-tags category uses 10 variables because of the lower available training statistics in that category). The dilepton channel uses four variables for the 3 jets + 2 b-tags category and six in each of the other categories. In the τh channel, almost all variables used to train the BDT are related to the τh system, such as the visible mass of the τ-decay products, the pT, the isolation, and the decay mode of both τh candidates, and the |η| of the more energetic τh and its distance to the lepton. In addition, the pT of the most energetic jet, regardless of its b-tagging status, is used in the BDT.
To train the BDTs, the τh channel uses simulated ttH, H → ττ (mH = 125 GeV) events with generator-level matched τh pairs as the signal, whereas both the lepton+jets and dilepton channels use ttH (mH = 125 GeV) events with inclusive Higgs boson decays. All three channels use tt+jets events as background in the training. An equal number of signal and background events is used for a given category and channel. The signal and background events are evenly divided into two subsamples: one set of events is used for the actual training, and the other is used as a test sample to monitor against overtraining. The specific BDT method used is a "gradient boost", available as part of the TMVA package [74] in ROOT [75]. Each tree consists of five nodes, a few hundred trees form the forest, and the learning rate is set to 0.1. Figures 5, 6, and 7 show the final BDT output distributions for the lepton+jets, dilepton, and τh channels, respectively. Background-like events have a low BDT output value, while signal-like events have a high BDT output value. The background distributions use the best-fit values of all nuisance parameters, with µ fixed at 1, and the uncertainty bands are constructed using the post-fit nuisance parameter uncertainties. The fit is described in section 9. The ttH signal (mH = 125.6 GeV) is not included in the stacked histogram, but is shown as a separate open histogram normalized to 30 times the SM expectation (µ = 30). For the ratio plots shown below each BDT distribution, only the background expectation (and not the signal) is included in the denominator of the ratio. The final BDT outputs provide better discrimination between signal and background than any of the input variables individually. The BDT output distributions are used to set limits on the Higgs boson production cross section, as described in section 9.
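For readers unfamiliar with the technique, the gradient-boost idea can be illustrated with a toy implementation in plain Python: decision stumps stand in for the five-node trees used in TMVA, pseudo-residuals of the logistic loss drive each iteration, and the learning rate is 0.1 as in the analysis. This is a pedagogical sketch, not the analysis code:

```python
import numpy as np

def fit_stump(x, residuals):
    """Best single-feature threshold split minimising the squared error
    of the residuals; returns (feature, threshold, left_value, right_value)."""
    best = None
    for f in range(x.shape[1]):
        order = np.argsort(x[:, f])
        xs, rs = x[order, f], residuals[order]
        for i in range(1, len(xs)):
            if xs[i] == xs[i - 1]:
                continue
            left, right = rs[:i].mean(), rs[i:].mean()
            sse = ((rs[:i] - left) ** 2).sum() + ((rs[i:] - right) ** 2).sum()
            if best is None or sse < best[0]:
                best = (sse, f, 0.5 * (xs[i] + xs[i - 1]), left, right)
    return best[1:]

class GradientBoost:
    """Toy gradient-boosted classifier with log-loss and stump trees."""
    def __init__(self, n_trees=100, learning_rate=0.1):
        self.n_trees, self.lr, self.trees = n_trees, learning_rate, []

    def fit(self, x, y):  # y in {0, 1}
        score = np.zeros(len(y))
        for _ in range(self.n_trees):
            prob = 1.0 / (1.0 + np.exp(-score))
            f, thr, lv, rv = fit_stump(x, y - prob)  # pseudo-residuals
            self.trees.append((f, thr, lv, rv))
            score += self.lr * np.where(x[:, f] <= thr, lv, rv)
        return self

    def decision(self, x):
        """Summed forest output; large values are signal-like."""
        score = np.zeros(len(x))
        for f, thr, lv, rv in self.trees:
            score += self.lr * np.where(x[:, f] <= thr, lv, rv)
        return score
```

Each tree corrects the mistakes of the forest built so far, which is why the combined output discriminates better than any single input variable.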

H → photons
The diphoton analysis selects events using the diphoton system to identify the presence of a Higgs boson, with a loose selection on the remaining objects to accept all possible tt decays while rejecting other Higgs boson production modes that are not directly sensitive to the top-quark Yukawa coupling. The background is extracted directly from the diphoton invariant mass distribution mγγ, exploiting the fact that a signal around 125 GeV would be characterized by a narrow peak.
The event selection starts from the requirement of two photons, where the leading photon is required to have pT > mγγ/2 and the second photon pT > 25 GeV. The variable threshold on the leading-photon pT increases the efficiency while minimizing trigger turn-on effects. The photon identification and energy measurement are the same as those used in Ref. [30], with the only exception that the primary vertex selection is done as described in section 4 of this paper. The presence of at least one b-tagged jet according to the medium working point of the CSV algorithm is required, consistent with the presence of b jets from top-quark decays in the final state. Muons must lie in the pseudorapidity range |η| < 2.4, and electrons within |η| < 2.5. Both muons and electrons are required to have pT greater than 20 GeV.
Events are categorized into two subsamples: the leptonic and hadronic channels. The hadronic channel requires, in addition to the two photons in the event, at least four jets of which at least one is b-tagged, and no identified high-pT charged leptons, whereas the leptonic channel requires at least two jets of which at least one is b-tagged, and at least one charged lepton ℓ, where ℓ = e, µ, with pT > 20 GeV. The 7 TeV dataset is too small to perform an optimization for each signal decay mode; thus events passing the hadronic and leptonic selections are combined into a single category.
Unlike the H → hadrons and H → leptons channels, the contribution from Higgs boson production modes other than ttH must be treated with care in this channel. This is because the analysis is designed to have very loose requirements on the jet and lepton activity, and the other Higgs boson production modes peak at the same location in the diphoton invariant mass distribution as the ttH signal. This is in contrast with the situation for the H → hadrons and H → leptons analyses, where the non-ttH production modes tend to populate the most background-rich region of the phase space investigated, so a very small contamination from non-ttH Higgs boson production has almost no impact on those analyses. The event selection for the ttH, H → photons channel is thus designed to minimize the contribution from other Higgs boson production modes. The expected signal yields of the various production processes for a SM Higgs boson of mass 125.6 GeV in this channel are shown in table 5, after selection in the 100 ≤ mγγ ≤ 180 GeV range. As can be seen, the contribution of production modes other than ttH is minor. The contribution of single-top-quark-plus-Higgs-boson production has not been explicitly estimated, but its cross section is expected to be only about 1/10 of the ttH cross section and the events have different kinematics [76], so its contribution to the sample is expected to be small.
The main backgrounds are the production of top quarks together with either genuine or misidentified photons in the final state, and the production of high-pT photons in association with many jets, including heavy-flavor jets. Because the background is estimated by fitting the data, which are a mixture of these processes, it is useful to test the background modeling in an independent control sample defined using collision data. The control sample is constructed from events recorded with the single-photon trigger paths, inverting the photon identification requirements on one of the two photons used to reconstruct the Higgs boson candidate. To take into account the fact that the efficiency of the photon isolation requirement is not constant as a function of the photon pT and η, a two-dimensional reweighting procedure is applied to the leading and subleading photon candidates in such events. The reweighting is performed so as to match the photon pT and η spectra to those of the photons populating the signal region. A control sample with kinematic properties similar to the data, yet statistically independent, is thus obtained.
The modeling of the control sample is tested using events passing the photon selections and the requirement of at least two high-pT jets. The sample is further split into events with and without charged leptons to test the kinematic properties of the model against data. A few key kinematic distributions are shown in figure 8, where the black markers show the data, the green histogram shows the control sample, and the red line displays the signal kinematics. All distributions are normalized to the number of events observed in data.
Even after the dedicated event selection, the dataset is still largely dominated by backgrounds. The strategy adopted in this analysis is to fit for the amount of signal in the diphoton mass spectrum in the region surrounding the Higgs boson mass, as this spectrum provides a powerful discriminating variable thanks to the excellent photon energy resolution. The background is obtained by fitting this distribution in each channel (hadronic or leptonic) over the range 100 GeV < mγγ < 180 GeV. The functional form used to fit the background in any particular channel is included as a discrete nuisance parameter in the likelihood functions used to extract the results; exponentials, power-law functions, polynomials (in the Bernstein basis), and Laurent series are considered for this analysis. When fitting the background by minimizing twice the negative logarithm of the likelihood (2NLL), all functions in these families are tried, with a penalty term added to 2NLL to account for the number of free parameters in the fitted function. Pseudo-experiments have shown that this "envelope" method provides good coverage of the uncertainty associated with the choice of function, for all the background functions considered, and provides an estimate of the signal strength with negligible bias [30].
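The envelope idea, namely fitting every candidate family, penalizing each fit by its number of free parameters, and keeping the minimum, can be sketched with a crude grid search. The one-parameter shapes with fixed normalization and the "one unit of 2NLL per parameter" penalty below are deliberate simplifications of the actual procedure:

```python
import numpy as np

def binned_2nll(counts, expected):
    """Twice the negative log Poisson likelihood (constant terms dropped)."""
    expected = np.clip(expected, 1e-9, None)
    return 2.0 * np.sum(expected - counts * np.log(expected))

def envelope_fit(edges, counts, families):
    """Fit each functional family by a grid search over its shape
    parameter (normalisation fixed to the observed total), then pick the
    family and parameter minimising 2NLL + penalty, with penalty equal
    to the number of free parameters."""
    centres = 0.5 * (edges[:-1] + edges[1:])
    total = counts.sum()
    best = None
    for name, shape_fn, grid, n_pars in families:
        for p in grid:
            density = shape_fn(centres, p)
            expected = total * density / density.sum()
            score = binned_2nll(counts, expected) + n_pars
            if best is None or score < best[0]:
                best = (score, name, p)
    return best[1], best[2]
```

With data drawn from a falling exponential, the exponential family is selected over a power law, illustrating how the penalized comparison discriminates between shapes of equal parameter count.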
The diphoton invariant mass spectra for data, the expected signal contribution, and the background estimate from data are shown in figure 9 for the combination of hadronic and leptonic selections in the √s = 7 TeV data (left), and for the hadronic (middle) and leptonic (right) channels separately in the √s = 8 TeV data. The expected signal contribution of the dominant SM Higgs boson production modes is shown as a blue histogram. The result of the fit is shown in the plots as a red line, together with the uncertainty bands corresponding to 1σ (green) and 2σ (yellow) coverage. The observed diphoton mass spectra agree well with the background estimates.

Figure 8: Distributions of the b-tagged jet multiplicity (top row) and jet multiplicity (bottom row) for events passing a relaxed selection in the hadronic (left) and leptonic (right) channels, removing events where the diphoton invariant mass is consistent with the Higgs boson mass within a 10 GeV window. The relaxed selection applies the standard photon and lepton requirements but allows events with any number of jets. The plots compare the data events with two photons and at least two jets (black markers) and the data from the control sample (green filled histogram) to simulated ttH events (red open histogram). Both signal and background histograms are normalized to the total number of data events observed in this region to allow for a shape comparison.

H → leptons

Object identification
In this channel the signal has multiple prompt leptons from W, Z, or τ decays. The largest backgrounds have at least one non-prompt lepton, usually from the decay of a b hadron (in tt+jets, Z+jets, and W+jets events). The analysis begins with a preselection of electron and muon candidates using loose criteria with very high efficiency for prompt leptons and moderate non-prompt-lepton rejection. In addition to the basic requirements from section 4, the lepton is required to be associated with the event vertex: the distances between the lepton track and the event vertex along the z-axis and perpendicular to it (dz and dxy) must be less than 1 cm and 0.5 cm, respectively. The SIP (defined as the ratio of the IP to its uncertainty) is required to be less than 10, a fairly loose requirement intended to retain efficiency for leptons coming from τ decays. Next, a multivariate discriminator based on BDT techniques is used to distinguish prompt from non-prompt leptons. This discriminator, referred to as the lepton MVA, is trained with simulated prompt leptons from the ttH MC sample and non-prompt leptons from the tt+jets MC sample, separately for electrons and muons and in several bins of pT and η.
The lepton MVA input variables relate to the lepton IP, the isolation, and the properties of the nearest jet within ∆R < 0.5. A tight working point on the lepton MVA output is used for the search in the dilepton and trilepton final states, and a loose working point is used for the four-lepton final state. For the tight working point, the efficiency to select prompt electrons is of order 35% at pT ≈ 10 GeV and reaches a plateau of 85% at pT ≈ 45 GeV; for prompt muons it is of order 55% at pT ≈ 10 GeV and reaches a plateau of about 97% at pT ≈ 45 GeV. The efficiency to select electrons (muons) from the decay of b hadrons is between 5% and 10% (around 5%).
To suppress electrons from photon conversions, tight electrons with missing tracker hits before the first reconstructed hit, or associated with a successfully reconstructed conversion vertex, are rejected [77].
Additional cuts are used to suppress incorrect charge reconstruction in the dilepton final states. For electrons, the tracker and ECAL charge measurements must agree, where the ECAL charge is measured by comparing the position of the energy deposits in the ECAL to a straight-line trajectory formed from the electron hits in the pixel detector [50,78]. For muons, the relative uncertainty in the track pT must be less than 20%.
The agreement between data and simulation for the input variables and the final lepton MVA is validated in dedicated control regions. For prompt leptons, high-purity control samples are selected with same-flavor, opposite-sign pairs of leptons with an invariant mass close to that of the Z boson and little E_T^miss. In these events, tight isolation and pT selections are applied to the leading lepton, and the trailing lepton is used to check the agreement between simulation and data. High-purity τ leptons are selected by requiring opposite-flavor, opposite-sign pairs of electrons and muons with an invariant mass between 20 GeV and 80 GeV. In these events, tight isolation, pT, and SIP requirements are applied to one of the two leptons, and the other lepton is used to compare simulation and data. For non-prompt leptons, samples enriched in leptons from the decay of b hadrons are selected with three-lepton Z → ℓℓ + ℓ and tt → ℓℓ + ℓ control regions. The agreement is good; small corrections to better match the data distributions of the input variables are applied to the simulation before training the MVA discriminant. Efficiency scale factors for the tight and loose lepton MVA working points are computed for prompt leptons with a tag-and-probe technique in the Z → ℓℓ control region. Backgrounds with non-prompt leptons are estimated directly from data, as described in section 7.3.

Event selection
The multilepton selection is optimized to accept ttH events where the Higgs boson decays into WW, ZZ, or ττ, and at least one W boson, Z boson, or τ decays leptonically. With at least one additional lepton from the top-quark decays, the events have one of the following three signatures:
• two same-sign leptons (electrons or muons) plus two b-quark jets;
• three leptons plus two b-quark jets;
• four leptons plus two b-quark jets.
The first three rows of table 6 show the expected distribution of the ttH signal among these different signatures. The other rows in the table are discussed below.

Table 6: Expected and observed yields after the selection in all five final states. For the expected yields, the total systematic uncertainty is also indicated. The rare SM backgrounds include triboson production, tbZ, W±W±qq, and WW produced in double parton interactions. A '-' indicates a negligible yield. Non-prompt and charge-misidentification backgrounds are described in section 7.3.

Candidate events that match one of these signal signatures are selected by requiring combinations of reconstructed objects. Three features are common to all three decay signatures:
• Each event is required to have one lepton with pT > 20 GeV and another with pT > 10 GeV to satisfy the dilepton trigger requirements.
• If an event has any pair of leptons, regardless of charge or flavor, with an invariant mass below 12 GeV, the event is rejected. This requirement reduces contamination from Υ and J/ψ decays, as well as very low-mass Drell-Yan events that are not included in the simulation.
• Since signal events have two top quarks, each event is required to have at least two jets, where at least two jets satisfy the loose CSV working point or one jet satisfies the medium CSV working point.
In addition, pairs of same-flavor leptons whose invariant mass is within 10 GeV of the Z boson mass are rejected to suppress background events with a Z boson decay. Same-sign dielectron events are rejected if they contain any such pair. Events in the 3ℓ and 4ℓ categories are rejected only if the two leptons in the pair have opposite charges.
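The pair-based vetoes above (the 12 GeV low-mass requirement and the Z-window rejection) can be expressed as a small filter. The event representation below, flavour/charge dictionaries plus a pair-mass callback, is a sketch and not the CMS event model:

```python
from itertools import combinations

Z_MASS = 91.2  # GeV

def passes_common_vetoes(leptons, pair_mass):
    """Apply the low-mass and Z-window vetoes.  Each lepton is a dict
    with 'flavour' and 'charge'; pair_mass(i, j) returns the invariant
    mass of leptons i and j (a stand-in for four-vector arithmetic)."""
    n = len(leptons)
    for i, j in combinations(range(n), 2):
        m = pair_mass(i, j)
        # Low-mass veto: any pair below 12 GeV, regardless of charge/flavour.
        if m < 12.0:
            return False
        same_flavour = leptons[i]["flavour"] == leptons[j]["flavour"]
        opposite_charge = leptons[i]["charge"] != leptons[j]["charge"]
        # Z veto: same-flavour pairs within 10 GeV of the Z mass.  In the
        # 3l/4l categories only opposite-charge pairs count; same-sign
        # dielectron events are rejected for any such pair.
        if same_flavour and abs(m - Z_MASS) < 10.0:
            is_ss_dielectron = (n == 2 and not opposite_charge
                                and leptons[i]["flavour"] == "e")
            if opposite_charge or is_ss_dielectron:
                return False
    return True
```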
Same-sign dilepton events are required to have exactly two leptons with identical charges and at least four hadronic jets. Each lepton must pass the lepton preselection, the tight working point of the lepton MVA discriminant, and the charge quality requirements. To reject events from backgrounds with a Z boson, LD > 30 GeV is required for dielectron events, where LD is defined in equation 1 of section 4. To further suppress reducible backgrounds, especially non-tt backgrounds, the threshold on the pT of the second lepton is raised to 20 GeV, and the scalar sum of the pT of the two leptons and the E_T^miss is required to be above 100 GeV.
The three-lepton candidate selection requires exactly three leptons that pass the lepton preselection and the tight working point of the lepton MVA discriminant. To further reject events from backgrounds with a Z boson, an LD requirement is applied, with a tighter threshold if the event has a pair of leptons with the same flavor and opposite charge. For events with large jet multiplicity (≥4 jets), where the contamination from the Z boson background is smaller, the LD requirement is not applied.
The four-lepton candidate selection requires exactly four leptons that each pass the lepton preselection and the loose working point of the lepton MVA discriminant.
The observed event yields in data for each final state and the expectations from the different physical processes after event selection are summarized in table 6. The details of the calculations of the signal and background yields are discussed in the next section.

Signal and background modeling
Three categories of backgrounds are identified in this search: ttV backgrounds from the associated production of a tt pair and one or more W or Z bosons; diboson or multiboson production associated with multiple hadronic jets; and reducible backgrounds from events with non-prompt leptons, or opposite-sign dilepton events in which the charge of one of the leptons is misidentified. These three background classes are estimated separately with different methods, described below. The systematic uncertainties associated with each background estimate are discussed in section 8.
The ttH signal and the backgrounds from ttW and ttZ, as well as minor backgrounds such as ttWW and triboson processes, are estimated from simulation, normalized to the NLO inclusive cross sections for each process [15, 19-28, 67, 68, 79, 80]. The combined cross section of ttW and ttZ has been measured by the CMS Collaboration in 7 TeV data [81]; the results are consistent with theory but have larger uncertainties. The prediction for the ttZ process is also tested directly in a trilepton control region requiring two of the leptons to have the same flavor, opposite charge, and an invariant mass within 10 GeV of the nominal Z boson mass [82]. Agreement is observed in this control region, though the precision of the test is dominated by the statistical uncertainty of about 35%. Agreement was also observed in a tt → e±µ∓bbνν sample, indicating good simulation of prompt leptons and genuine b-quark jets.
The WZ and ZZ production processes with the gauge bosons decaying to electrons, muons, or taus can yield the same leptonic final states as the signal. These processes are predicted theoretically at NLO accuracy, but the uncertainty in the production cross section of dibosons with additional partons can be large. To reduce this uncertainty, a low-signal control sample of WZ or ZZ plus at least two jets is selected by vetoing any event with a loose b tag and by inverting the Z → ℓℓ veto. The diboson background in the signal region is then normalized to the event yield observed in this control region times an extrapolation factor, taken from MC simulation, that accounts for going from the control region to the signal region.
The expected flavor composition in simulation for WZ events after the full selection in the trilepton final state is approximately 50% from WZ production in association with mistagged jets from light quarks or gluons, 35% from events with one jet originating from a c quark, and 15% from events with b quarks. For ZZ in the four-lepton final state, the expectation is about 40% from events with jets from gluons or light quarks, 35% from events with b quarks, and 25% from events with c quarks.
The reducible backgrounds with at least one non-prompt lepton are estimated from data. A control region dominated by reducible backgrounds is defined by selecting events with the same kinematics as the signal region, but in which at least one of the leptons fails the lepton MVA requirement. The kinematic distributions for data in this region are consistent with the MC expectation, mostly tt+jets with one non-prompt lepton, as shown in figure 10. The extrapolation to the signal region is then performed by weighting events in the control region by the probability for non-prompt leptons to pass the lepton MVA selection. This probability is measured from same-sign dilepton and lepton+b-tagged-jet data in control regions with fewer jets than the signal region, as a function of the lepton pT and η, separately for muons and electrons.
Events in which a single lepton fails the lepton MVA requirement enter the signal region prediction with weight ε/(1 − ε), where ε denotes the aforementioned probability computed for the pT, η, and flavor of the lepton failing the selection. Events with two leptons failing the requirement are also used, but with a negative weight −ε1ε2/[(1 − ε1)(1 − ε2)]; this small correction is necessary to account for events with two background-like leptons contaminating the sample of events with a single lepton failing the requirement.
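These extrapolation weights can be computed directly from the measured pass probability; `epsilon(lep)` below is a hypothetical stand-in for the measured pT-, η- and flavor-dependent map:

```python
def fake_rate_weight(failing_leptons, epsilon):
    """Weight applied to a control-region event when extrapolating the
    non-prompt background to the signal region.  `failing_leptons` lists
    the leptons that fail the lepton MVA cut; `epsilon(lep)` is the
    measured probability for such a non-prompt lepton to pass it."""
    eps = [epsilon(lep) for lep in failing_leptons]
    if len(eps) == 1:
        return eps[0] / (1.0 - eps[0])
    if len(eps) == 2:
        # Negative weight: corrects the double counting of double-fail
        # events inside the single-fail sample.
        e1, e2 = eps
        return -e1 * e2 / ((1.0 - e1) * (1.0 - e2))
    raise ValueError("expected one or two failing leptons")
```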
The measurement of the probability for non-prompt leptons to pass the lepton MVA cuts, and the weighting of events in the control region, are performed separately for events with at most one jet satisfying the medium CSV requirement and for events with at least two, to account for the different flavor composition and kinematics of the two samples.
Charge misidentification probabilities are determined as a function of the lepton pT and η from the observed yields of same-sign and opposite-sign dilepton pairs with an invariant mass within 10 GeV of the Z boson mass. For electrons, this probability varies from 0.03% in the barrel to 0.3% in the endcaps, while for muons the probability is found to be negligible.
The prediction for background dilepton events with misidentified electron charge in the signal region is computed from opposite-sign dilepton events passing the full selection, except for the charge requirement: events with a single electron enter the prediction with a weight equal to the charge misidentification probability for that electron, while dielectron events enter the prediction with a weight equal to the sum of the charge misidentification probabilities for the two electrons.
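This weighting can be sketched as follows; the barrel/endcap split at |η| = 1.48 and the probability values are only the representative numbers quoted above, not the full pT- and η-dependent map:

```python
def flip_prob(electron):
    """Illustrative charge-flip probability: about 0.03% for barrel
    electrons and 0.3% in the endcaps (for muons it is negligible)."""
    return 0.0003 if abs(electron["eta"]) < 1.48 else 0.003

def charge_flip_weight(electrons):
    """Weight of an opposite-sign event in the same-sign prediction:
    a single electron contributes its flip probability; a dielectron
    event contributes the sum of the two probabilities."""
    return sum(flip_prob(e) for e in electrons)
```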

Signal extraction
After the event selection, overall yields are still dominated by background. The strategy adopted in this search is to fit for the amount of signal in the distribution of a suitable discriminating variable.
In the dilepton analysis, a BDT output is used as the discriminating variable. The BDT is trained with simulated ttH signal and tt+jets background events, with six input variables: the pT and |η| of the trailing lepton, the minimum angular separation between the trailing lepton and the closest jet, the transverse mass of the leading lepton and E_T^miss, HT, and H_T^miss. The same training is used for the ee, eµ, and µµ final states, as the gain in performance from dedicated trainings in each final state is found to be negligible.

Figure 10: Distributions of key discriminating variables for events in which one lepton fails the lepton MVA requirement. The expected distribution for the non-prompt background is taken from simulation (mostly tt+jets), and the yield is fitted from the data. The bottom panel of each plot shows the ratio between data and prediction, as well as the overall uncertainties after the fit (blue). The first row shows the distributions of the trailing-lepton pT for the e±e± (left), e±µ± (center), and µ±µ± (right) final states. The second row shows the distributions of HT (left), the pT of the jet with the highest b-tagging discriminant (center), and the maximum lepton |η| (right) for the trilepton channel.
In the trilepton analysis, a BDT output is also used as the final discriminant. The BDT is trained with simulated ttH signal and a mix of tt+jets, ttW, and ttZ background events, with seven discriminating variables: the number of hadronic jets, the pT of the jet with the highest b-tagging discriminant value, the scalar sum of lepton and jet pT (HT), the fraction of HT from jets and leptons with |η| < 1.2, the maximum of the |η| values of the three leptons, the minimum ∆R separation between any pair of opposite-sign leptons, and the invariant mass of the three-jet system (two jets compatible with the W boson mass plus a b-tagged jet) that is closest to the nominal top-quark mass [82].
As a cross-check in both the dilepton and the trilepton final states, the number of hadronic jets was used instead of the BDT as the discriminating variable. The gain in signal strength precision from the multivariate analysis compared to this simpler cross-check is about 10%.
In the four-lepton analysis, only the number of hadronic jets is used: the sensitivity of this channel is limited by the very small branching fraction, and the estimation of the kinematic distributions of the reducible backgrounds from data is also challenging due to the low event yields.
In the dilepton and trilepton final states, events are divided into categories by the sum of the electric charges of the leptons, to exploit the charge asymmetry present in several SM background processes in pp collisions (ttW, WZ, single top quark t-channel, W+jets). The gain in signal strength precision from this categorization is approximately 5%.
The expected and observed distributions of the number of selected jets and the BDT output, for the different final states of the dilepton analysis, are shown in figure 11. The same distributions are shown for the trilepton analysis in figure 12. The distribution of the number of selected jets is also shown for the four-lepton channel in figure 12. The ttH signal yield in the stack is the SM prediction (µ = 1); additionally, the signal yield for µ = 5 is shown as a dotted line. The background distributions use the best-fit values of all nuisance parameters, with µ fixed at 1, and the uncertainty bands are constructed using the nuisance parameter uncertainties.
The dilepton data are in good agreement with the predictions in the ee and eµ channels, while an excess of signal-like events is visible in the µµ final state. The details of this excess are discussed below. In the trilepton channel the overall data yield matches expectations; the jet multiplicity in data is slightly higher than predicted, but the distribution of the BDT discriminant matches the prediction. In the four-lepton channel only one event is observed, compared with an overall SM prediction (including the expected ttH contribution) of about three events.
Because the excess of signal-like events is most pronounced in the dimuon channel, additional cross-checks were performed. The agreement between expected and observed yields in the ee and eµ channels suggests that the background estimates are reasonable. Detailed studies of various single-muon and dimuon distributions did not reveal any potential additional source of background. Moreover, the analysis of the dimuon final state has been repeated with different lepton selections, using looser working points for the lepton MVA and also with traditional selections on individual variables. These approaches have sensitivities 10-50% worse than the nominal analysis and give compatible results. The consistency of these checks suggests this excess does not arise from a deficiency in the estimation of the backgrounds.

Systematic uncertainties
There are a number of systematic uncertainties that impact the estimated signal or background rates, the shape of the final discriminant, or both. This section describes the various sources of systematic uncertainty. Section 9 will explain how the effects of these uncertainties are accounted for in the likelihood function used to set limits and extract the best-fit Higgs boson signal.
Different systematic uncertainties are relevant for different parts of the overall ttH analysis. Uncertainties related to MC modeling affect all analysis channels, whereas systematic uncertainties related to the background estimation or object identification can be specific to particular channels. Table 7 summarizes the impact of systematic uncertainties on this analysis. For each broad category, table 7 shows the range of effects the systematic uncertainties have on the signal and background rates, and notes whether the uncertainty also affects the shape of the final discriminant. Cases in which a systematic category applies to only one analysis channel are noted in parentheses. Further details are given below.

Table 7: Summary of systematic uncertainties. Each row in the table summarizes a category of systematic uncertainties from a common source or set of related sources. In the statistical implementation, most of these uncertainties are treated via multiple nuisance parameters. The table summarizes the impact of these uncertainties both in terms of the overall effect on signal and background rates and in terms of the shapes of the signal and background distributions. The rate columns show a range of uncertainties, since the size of the rate effect varies with both the analysis channel and the specific event selection category within a channel. The uncertainties quoted here are a priori uncertainties; that is, they are calculated prior to fitting the data, which leads to a reduction in their impact as the data help to constrain them.

Global event uncertainties affect all the analysis channels. The integrated luminosity is varied from its nominal value by ±2.2% for the 7 TeV dataset [34] and by ±2.6% for the 8 TeV dataset [32]. The effect of finite background MC statistics in the analysis is accounted for using the approach described in Refs. [83, 84].
To avoid including thousands of nuisance parameters that have no effect on the result, this uncertainty is not evaluated for any bin in the BDT shapes for which the MC statistical uncertainty is negligible compared to the data statistics or where there is no appreciable contribution from signal. Tests show that the effect on the final result of neglecting the MC statistical uncertainty for these bins is smaller than 2%. In total, there are 190 nuisance parameters used to describe the fluctuations in the bins of the BDT outputs.
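The pruning criterion described above can be sketched as follows. This is a hypothetical illustration with invented threshold values: a per-bin MC-statistics nuisance parameter is retained only if the MC statistical uncertainty is non-negligible compared to the data statistics and the bin has an appreciable signal contribution.

```python
import math

# Hypothetical sketch of the pruning criterion described in the text.
# The threshold values (rel_threshold, signal_threshold) are invented
# for illustration; the paper does not quote its exact cut values.

def keep_mc_stat_nuisance(mc_yield, mc_err, data_yield, signal_yield,
                          rel_threshold=0.1, signal_threshold=0.01):
    """Decide whether a bin needs its own MC-statistics nuisance parameter."""
    data_err = math.sqrt(max(data_yield, 1.0))      # Poisson uncertainty on data
    negligible_mc = mc_err < rel_threshold * data_err
    no_signal = signal_yield < signal_threshold * max(mc_yield, 1e-9)
    return not (negligible_mc or no_signal)
```

Applied over all bins of the BDT outputs, a criterion of this kind is what reduces thousands of potential parameters to the 190 retained in the analysis.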

Rate uncertainty
The reconstructed objects in each event come with their own uncertainties. The uncertainty from the jet energy scale [57] is evaluated by varying the energy scale for all jets in the signal and background simulation simultaneously up or down by one standard deviation as a function of jet pT and η, and reevaluating the yields and discriminant shapes of all processes. These variations have a negligible effect on the mγγ distribution, and shape effects for the H → photons channel are ignored. The jet energy resolution uncertainty is found to have a negligible impact for all channels.

The corrections for the b-tagging efficiencies for light-flavored, c-, and b-quark jets have associated uncertainties [60]. These uncertainties are parameterized as a function of the pT, η, and flavor of the jets. Their effect on the analysis is evaluated by shifting the correction factor of each jet up and down by one standard deviation of the appropriate uncertainty. Because the CSV distribution for jets in the H → hadrons channel receives shape corrections, it requires a different set of shape uncertainties. In deriving the CSV shape corrections, there are uncertainties from background contamination, jet energy scales, and the limited size of the data samples. The statistical uncertainty in the CSV shape corrections has the potential to modify the shape of the CSV distribution in complicated ways. To parameterize this, the shape uncertainties are broken down into two orthogonal components: one varies the overall slope of the CSV distribution, while the other changes the center of the distribution relative to the ends. These uncertainties are evaluated separately for light-flavor and b-quark jets. Twice the b-quark jet uncertainties are applied to c-quark jets, whose nominal scale factor is 1.
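The up/down variation procedure for per-jet correction factors can be illustrated with a short sketch. The function below is hypothetical: it simply propagates ±1 standard deviation shifts of each jet's scale factor into varied event weights and summed yields, which is the mechanism behind the rate uncertainties quoted here.

```python
# Hypothetical sketch of evaluating a rate systematic by shifting per-jet
# correction factors (e.g., b-tagging scale factors) up and down by one
# standard deviation. Each event's weight is the product of its per-jet
# scale factors; the varied total yields bracket the rate uncertainty.

def varied_yields(events):
    """events: list of events, each a list of (sf, sf_err) tuples per jet.
    Returns (down, nominal, up) total yields."""
    totals = {"down": 0.0, "nom": 0.0, "up": 0.0}
    for jets in events:
        weight = {"down": 1.0, "nom": 1.0, "up": 1.0}
        for sf, err in jets:
            weight["down"] *= sf - err   # -1 sigma shift of this jet's SF
            weight["nom"] *= sf
            weight["up"] *= sf + err     # +1 sigma shift of this jet's SF
        for key in totals:
            totals[key] += weight[key]
    return totals["down"], totals["nom"], totals["up"]
```

In the real analysis the shifts are correlated functions of jet pT, η, and flavor rather than independent per-jet errors, but the propagation into yields follows this pattern.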
Electron and muon identification and trigger efficiency uncertainties are estimated by comparing the difference in performance between data and MC simulation using a high-purity sample of Z-boson decays. These uncertainties vary between 1% and 6%. The systematic uncertainty associated with the MVA selection of prompt leptons in the H → leptons channel is derived from tag-and-probe measurements comparing data and simulation in Z-boson dilepton events. The overall uncertainty amounts to about 5% per lepton. The uncertainty in the misidentification probabilities for non-prompt leptons is estimated from simulation. The misidentification rate is determined following the same approach and parameterization used in the QCD-dominated control region, but using only MC samples with a similar composition. This simulation-based misidentification rate is then applied to MC samples with the expected background composition in the signal region, and the disagreement between the number of non-prompt leptons predicted by the parameterized misidentification rate and the number actually observed in this collection of MC samples is used to estimate the systematic uncertainty. The uncertainty is assessed separately for different pT, η, and b-tagged jet multiplicity bins for each lepton flavor. The overall uncertainty amounts to about 40%, which is applied using linear and quadratic deformations of the pT- and η-dependent misidentification rate.
The uncertainties in the τh identification consist of the electron and jet misidentification rates, as well as the uncertainty in the τh identification itself. The last is applied to generator-level matched τh and is estimated to be 6% per object, using a tag-and-probe technique with the Z → ττ → µτh process. The jet misidentification rate uncertainty is determined to be 20% by comparing τh misidentification rates in data and simulated W+jets events in which the W boson decays to µν. Likewise, the electron misidentification rate uncertainty is found to be 5% from Z → ee events using a tag-and-probe technique. The τh energy scale uncertainties are obtained from studies involving Z → ττ events [65].
For photon identification, the uncertainty in the data-to-MC efficiency scale factor for the fiducial region determines the overall uncertainty, as measured using a tag-and-probe technique applied to Z → ee events (3.0% in the ECAL barrel, 4.0% in the ECAL endcap). For the uncertainties related to the photon energy scale and resolution, the energies of both photons are shifted and smeared, respectively, within the known uncertainties.
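The tag-and-probe measurements referred to throughout this section reduce, at their core, to a passing-probe fraction and a data/MC ratio. The following sketch is a hypothetical simplification; real measurements additionally fit the Z-boson mass peak to subtract background under the probe sample.

```python
import math

# Hypothetical simplification of a tag-and-probe efficiency measurement:
# in Z -> ee events, a tightly identified "tag" electron selects the event,
# and the fraction of "probe" electrons passing the identification gives
# the efficiency. The data/MC ratio is the per-object correction factor.

def efficiency(n_pass, n_fail):
    """Passing-probe fraction with a simple binomial uncertainty."""
    n = n_pass + n_fail
    eff = n_pass / n
    err = math.sqrt(eff * (1.0 - eff) / n)
    return eff, err

def scale_factor(eff_data, eff_mc):
    """Data-to-simulation correction factor applied per object."""
    return eff_data / eff_mc
```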
Theoretical uncertainties may affect the yield of signal and background contributions as well as the shape of distributions. Signal and background rates are estimated using cross sections of at least NLO accuracy, which have uncertainties arising primarily from the PDFs and the choice of the factorization and renormalization scales. The cross section uncertainties are each separated into their PDF and scale components and correlated, where appropriate, between processes. For example, the PDF uncertainties for processes originating primarily from gluon-gluon initial states, e.g., tt and ttH production, are treated as fully correlated.
In addition to the rate uncertainties coming from the NLO or better cross section calculations, the modeling of the tt+jets (including tt + bb and tt + cc), ttV, diboson+jets, and W/Z+jets processes is subject to MC modeling uncertainties arising from the extrapolation from inclusive rates to exclusive rates in particular jet or tag categories using the MADGRAPH tree-level matrix element generator matched to the PYTHIA parton shower MC program. Although MADGRAPH incorporates contributions from higher-order diagrams, it does so only at tree level, and is subject to fairly large uncertainties arising from the choice of scales. These uncertainties are evaluated using samples for which the factorization and renormalization scales have been varied up and down by a factor of two. Scale variations are propagated to both the rate and (where significant) the final discriminant shape. Scale variations are treated as uncorrelated for the tt+light flavor, tt + bb, and tt + cc components. The scale variations for W+jets and Z+jets are treated as correlated; all other scale variations are treated as uncorrelated.
A systematic uncertainty in the top-quark pT reweighting for the tt+jets simulation is assessed by using the uncorrected MC shapes as the −1 standard deviation variation and overcorrected MC shapes as the +1 standard deviation variation. The overcorrected shapes are obtained by doubling the deviation of the top-quark pT scale factors from unity. The tt + bb and tt + cc processes represent an important source of irreducible background for the H → hadrons analysis. Neither control region studies nor higher-order theoretical calculations [85] can currently constrain the normalization of these contributions to better than 50% accuracy. Therefore, an extra 50% uncorrelated rate uncertainty is conservatively assigned to each of the tt + bb, tt + b, and tt + cc processes.
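The ±1 standard deviation variations for the top-quark pT reweighting can be written down directly from the prescription above. This is a minimal sketch; `top_pt_variations` is an invented helper name acting on per-event scale factors.

```python
# Hypothetical sketch of the top-quark pT reweighting variations described
# in the text: the -1 sigma shape uses no correction at all (scale factor
# 1 for every event), and the +1 sigma shape doubles the deviation of each
# nominal scale factor from unity.

def top_pt_variations(sf_nominal):
    """Return (down, nominal, up) per-event scale-factor lists."""
    sf_down = [1.0 for _ in sf_nominal]                      # uncorrected MC
    sf_up = [1.0 + 2.0 * (sf - 1.0) for sf in sf_nominal]    # overcorrected MC
    return sf_down, list(sf_nominal), sf_up
```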
In the H → photons analysis, to assess the contamination from Higgs boson production mechanisms other than ttH, it is necessary to extrapolate MC predictions to final states with several jets beyond those included in the matrix elements used for the calculation. As these jets are modeled primarily with parton shower techniques, the uncertainty in these predictions must be carefully assessed. As POWHEG is used to model gg → H production, the uncertainty in the rate of additional jets is estimated from the observed difference between the POWHEG predictions and data in tt events, which are dominated by gluon fusion production, gg → tt [86]. This uncertainty amounts to at most 30%, which includes the uncertainty in the fraction of gg → H plus heavy-flavor jets. Furthermore, the fraction of gg → H plus heavy-flavor jets is scaled by the difference observed between data and the POWHEG predictions [87] in ttbb and ttqq/gg. These large uncertainties apply to a very small subset of the events falling into the signal region, and thus result in a very small uncertainty in the final sensitivity to the signal itself.
In the H → leptons analysis, the normalization uncertainty in the WZ (ZZ) process comes from a variety of sources, several of which are related to the control region used to estimate the normalization, as described in section 7.3. The statistical uncertainty in the control region estimate results in a 10% (12%) uncertainty in the normalization, while residual backgrounds in the control region account for another 10% (4%). Uncertainties in the b-tagging efficiencies result in a 15% (7.5%) normalization uncertainty, and uncertainties in the PDFs [88, 89] and in the extrapolation from the control region to the signal region cause normalization uncertainties of 4% (3%) and 5% (12%), respectively. Taken together, these uncertainties result in an overall WZ (ZZ) normalization uncertainty of 22% (19%).

Results
The statistical methodology employed for these results is identical to that used for other CMS Higgs boson analyses; more details can be found in Ref. [9]. In brief, a binned likelihood spanning all analysis channels included in a given result is constructed. The amount of signal is characterized by the signal strength parameter µ, the ratio of the observed cross section for ttH production to the SM expectation. In extracting µ, some assumption must be made about the branching fractions of the Higgs boson. Unless stated otherwise, µ is extracted assuming SM branching fractions. Under some circumstances the branching fractions are parameterized in a more sophisticated fashion, for example allowing separate scaling of the Higgs boson couplings to different SM particles. Uncertainties in the signal and background predictions are incorporated by means of nuisance parameters. Each distinct source of uncertainty is accounted for with its own nuisance parameter, and when a given source of uncertainty affects more than one analysis channel, a single nuisance parameter is used to capture the correlation between channels. Nuisance parameters are profiled, allowing high-statistics but signal-poor regions of the data to constrain key nuisance parameters.
To assess the consistency of the data with different hypotheses, a profile likelihood ratio test statistic is used: q(µ) = −2 ln[L(µ, θ̂_µ)/L(µ̂, θ̂)], where θ denotes the full suite of nuisance parameters. The parameters µ̂ and θ̂ are the values that maximize the likelihood function globally, while θ̂_µ are the nuisance parameter values that maximize the likelihood function for a given µ. Results are reported both in terms of the best-fit value of µ and its associated uncertainty, and in terms of upper limits on µ at 95% confidence level (CL). Limits are computed using the modified frequentist CLs method [90, 91]. Results are obtained both independently for each of the distinct ttH signatures (bb, τhτh, γγ, same-sign 2l, 3l, and 4l) and combined over all channels.
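As a concrete illustration of the test statistic, the following sketch evaluates q(µ) for a single-bin counting experiment with one Gaussian-constrained nuisance parameter. All yields are invented for illustration; the actual analysis profiles hundreds of nuisance parameters over a binned likelihood spanning many channels.

```python
import math

# Hypothetical single-bin illustration of the profile likelihood ratio
# q(mu) = -2 ln[ L(mu, theta_hat_mu) / L(mu_hat, theta_hat) ].
# The one nuisance parameter is the background yield b, constrained by a
# Gaussian of width SIGMA_B. All numbers below are invented.

S_EXP, B_EXP, SIGMA_B = 5.0, 50.0, 5.0   # expected signal, background, constraint
N_OBS = 62                               # observed event count (hypothetical)

def nll(mu, b):
    """Negative log-likelihood up to a constant: Poisson term + constraint."""
    lam = mu * S_EXP + b
    if lam <= 0.0:
        return float("inf")
    return lam - N_OBS * math.log(lam) + 0.5 * ((b - B_EXP) / SIGMA_B) ** 2

def profiled_nll(mu):
    """Profile out the nuisance parameter b on a coarse grid."""
    return min(nll(mu, B_EXP + 0.1 * k) for k in range(-300, 301))

# Global best fit (mu_hat, theta_hat) from a grid scan of signal strengths
nll_hat, mu_hat = min((profiled_nll(0.1 * k), 0.1 * k) for k in range(-20, 81))

def q(mu):
    """Profile likelihood ratio test statistic for a fixed mu."""
    return 2.0 * (profiled_nll(mu) - nll_hat)
```

By construction q(µ̂) = 0, and q(0) quantifies the incompatibility of the data with the background-only hypothesis.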
The best-fit signal strengths from the individual channels and from the combined fit are given in table 8 and figure 13. The internal consistency of the six results with a common signal strength has been evaluated to be 29%, estimated from the asymptotic behavior of the profile likelihood function [9]. Combining all channels, the best fit value of the common signal strength is µ = 2.8 +1.0 −0.9 (68% CL). For this fit, the rates of Higgs boson production from mechanisms other than ttH production are fixed to their SM expectations; however, allowing all Higgs boson contributions to float with a common signal strength produces a negligible change in the fit result. Although the fit result shows an excess, within uncertainties, the result is consistent with SM expectations. The p-value under the SM hypothesis (µ = 1) is 2.0%. The p-value for the background-only hypothesis (µ = 0) is 0.04%, corresponding to a combined local significance of 3.4 standard deviations. Assuming SM Higgs boson production with m H = 125.6 GeV [29], the expected local significance is 1.2 standard deviations.
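The quoted significances follow from the one-sided Gaussian conversion Z = Φ⁻¹(1 − p). A minimal check using only the Python standard library:

```python
from statistics import NormalDist

# Convert a one-sided p-value into a significance in standard deviations,
# Z = Phi^{-1}(1 - p), as used for the significances quoted in the text.

def significance(p_value):
    """One-sided Gaussian significance corresponding to a p-value."""
    return NormalDist().inv_cdf(1.0 - p_value)

z_bkg = significance(0.0004)  # p = 0.04% under the background-only hypothesis
z_sm = significance(0.020)    # p = 2.0% under the SM (mu = 1) hypothesis
```

These reproduce the roughly 3.4 and 2 standard deviation figures given above.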
Throughout this paper, whenever a specific choice for Higgs boson mass has been required, a mass of 125.6 GeV has been used, corresponding to the most precise Higgs boson mass measurement by CMS at the time these results were obtained [29]. However, the recent CMS measurement of inclusive Higgs boson production with the Higgs boson decaying to a pair of photons [30], obtains a lower Higgs boson mass value. The combination of CMS Higgs boson mass measurements is expected to be very close to 125 GeV. The combined ttH measurement is not very sensitive to the Higgs boson mass value. The combined best-fit signal strength obtained assuming a Higgs boson mass of 125 GeV is µ = 2.9 +1.1 −0.9 . This result corresponds to a 3.5 standard deviation excess over the background-only (µ = 0) hypothesis, and represents a 2.1 standard deviation upward fluctuation on the SM ttH (µ = 1) expectation. These values are very close to the values quoted above for m H = 125.6 GeV.
Although the observed signal strength is consistent with SM expectations, it does represent a roughly 2 standard deviation upward fluctuation. Therefore, it is interesting to look more closely at how the different channels contribute to the observed excess. From figure 13, it can be seen that the same-sign dilepton channel yields the largest signal strength. Within that channel, the same-sign dimuon subsample has the largest signal strength, with µ = 8.5 +3.3 −2.7, compared with µ = 2.7 +4.6 −4.1 for the same-sign dielectron channel and µ = 1.8 +2.5 −2.3 for the same-sign electron-muon channel. The internal consistency of these three channels, together with the three- and four-lepton channels, is 16%. To characterize the impact of the same-sign dimuon channel on the combined fit, the fit was repeated with that channel omitted, resulting in a signal strength of µ = 1.9 +1.0 −0.9. This fit corresponds to a p-value under the SM hypothesis (µ = 1) of 17%. The p-value under the background-only hypothesis for this fit is 1.6%, corresponding to a local significance of 2.2 standard deviations. Although removing the same-sign dimuon channel does result in a lower fitted signal strength, the overall conclusion is unchanged.
In the above, consistency with SM expectations is assessed by varying the ttH signal strength. An alternative approach would be to vary individual couplings between the Higgs boson and other particles. The collected statistics are currently insufficient to allow individual couplings to each SM particle to be probed. However, it is feasible to scale the couplings to vector bosons and fermions separately. This is a useful approach for testing whether the excess observed is consistent with expectations from SM ttH production. Following the methodology used to study the properties of the new boson in the global CMS Higgs boson analysis [9], the scale factors κ V and κ f are introduced to modify the coupling of the Higgs boson to vector bosons and fermions, respectively. Figure 14 shows the 2D likelihood scan over the (κ V ,κ f ) phase space using only the ttH analysis channels. The best-fit values of the coupling modifiers are at (κ V ,κ f ) = (2.2,1.5), which is compatible at the 95% CL with the expectation from the SM Higgs boson (1,1).
As BSM physics can enhance the production rate for the ttH and ttH + X final states, it is also useful to set an upper limit on ttH production. Furthermore, the expected limit serves as a convenient gauge of the sensitivity of the analysis. The 95% CL expected and observed upper limits on µ, combining all channels, are shown in table 8 for mH = 125.6 GeV and as a function of mH in figure 15. The expected limit is quoted both under the background-only hypothesis and under the hypothesis including the SM Higgs boson signal, assuming the SM cross section. In addition to the median expected limit under the background-only hypothesis, the bands containing the one and two standard deviation ranges around the median are also quoted. In the absence of a ttH signal, the median expected upper limit on µ from the combination of all channels is 1.7; the corresponding median expectation under the hypothesis of SM ttH production with mH = 125.6 GeV is 2.7. The observed upper limit on µ is 4.5, larger than both expectations, consistent with the observation that the best-fit value of the signal strength modifier µ is greater than one. The limits for the individual channels at mH = 125.6 GeV are given in the right panel of figure 15.

Table 8: The best-fit values of the signal strength parameter µ = σ/σSM for each ttH channel at mH = 125.6 GeV. The signal strength in the four-lepton final state is not allowed to be below approximately −6 by the requirement that the expected signal-plus-background event yield must not be negative in either of the two jet multiplicity bins. The observed and expected 95% CL upper limits on µ = σ/σSM for each ttH channel at mH = 125.6 GeV are also shown.

Figure 13: Left: The best-fit values of the signal strength parameter µ = σ/σSM for each ttH channel at mH = 125.6 GeV. The signal strength in the four-lepton final state is not allowed to be below approximately −6 by the requirement that the expected signal-plus-background event yield must not be negative in either of the two jet multiplicity bins. Right: The 1D test statistic q(µttH) scan vs. the signal strength parameter µttH for ttH processes, profiling all other nuisance parameters. The lower and upper horizontal lines correspond to the 68% and 95% CL, respectively. The µttH values at which these lines intersect the q(µttH) curve are indicated by the vertical lines.

Summary
The production of the standard model Higgs boson in association with a top-quark pair has been investigated using data recorded by the CMS experiment in 2011 and 2012, corresponding to integrated luminosities of up to 5.1 fb−1 and 19.7 fb−1 at √s = 7 TeV and 8 TeV, respectively. Signatures resulting from different combinations of decay modes for the top-quark pair and the Higgs boson have been analyzed. In particular, the searches have been optimized for the H → bb, τhτh, γγ, WW, and ZZ decay modes. The best-fit value of the signal strength µ is 2.8 ± 1.0 at 68% confidence level. This result represents an excess above the background-only expectation of 3.4 standard deviations. Compared to the SM expectation including the contribution from ttH, the observed excess is equivalent to a 2-standard-deviation upward fluctuation. These results are obtained assuming a Higgs boson mass of 125.6 GeV, but they do not vary significantly for other choices of mass in the vicinity of 125 GeV. These results are more consistent with the SM ttH expectation than with the background-only hypothesis.