Search for $\mathrm{t\overline{t}}$H production in the H $\rightarrow$ $\mathrm{b\overline{b}}$ decay channel with leptonic $\mathrm{t\overline{t}}$ decays in proton-proton collisions at $\sqrt{s} =$ 13 TeV

A search is presented for the associated production of a standard model Higgs boson with a top quark-antiquark pair ($\mathrm{t\overline{t}}$H), in which the Higgs boson decays into a b quark-antiquark pair, in proton-proton collisions at a centre-of-mass energy $\sqrt{s} =$ 13 TeV. The data correspond to an integrated luminosity of 35.9 fb$^{-1}$ recorded with the CMS detector at the CERN LHC. Candidate $\mathrm{t\overline{t}}$H events are selected that contain either one or two electrons or muons from the $\mathrm{t\overline{t}}$ decays and are categorised according to the number of jets. Multivariate techniques are employed to further classify the events and eventually discriminate between signal and background. The results are characterised by an observed $\mathrm{t\overline{t}}$H signal strength relative to the standard model cross section, $\mu = \sigma$ / $\sigma_{\mathrm{SM}}$, under the assumption of a Higgs boson mass of 125 GeV. A combined fit of multivariate discriminant distributions in all categories results in an observed (expected) upper limit on $\mu$ of 1.5 (0.9) at 95% confidence level, and a best fit value of 0.72 $\pm$ 0.24 (stat) $\pm$ 0.38 (syst), corresponding to an observed (expected) signal significance of 1.6 (2.2) standard deviations above the background-only hypothesis.


Introduction
The observation [1][2][3] of a Higgs boson with a mass of approximately 125 GeV [4,5] at the CERN LHC marked the starting point of a broad experimental program to determine the properties of the newly discovered particle. Decays into γγ, ZZ, WW, and ττ final states have been observed, and there is evidence for the direct decay of the particle to the bottom quarkantiquark (bb) final state [6][7][8][9][10]. The measured rates for various production and decay channels are consistent with the standard model (SM) expectations [11,12], and the hypothesis of a spin-0 particle is favoured over other hypotheses [13,14].
In the SM, the Higgs boson couples to fermions with a Yukawa-type interaction, with a coupling strength proportional to the fermion mass. Probing the coupling of the Higgs boson to the heaviest known fermion, the top quark, is therefore very important for testing the SM and for constraining various models of physics beyond the SM (BSM), some of which predict a different coupling strength than the SM. Indirect constraints on the coupling between the top quark and the Higgs boson are available from processes including virtual top quark loops, for example Higgs boson production through gluon-gluon fusion [11,12], as well as from production of four top quarks [15]. On the other hand, the associated production of a Higgs boson and a top quark-antiquark pair (ttH production) as illustrated by the Feynman diagrams in Fig. 1 is a direct probe of the Higgs boson coupling to fermions with weak isospin +1/2 ("up-type") in addition to τ leptons, which carry a weak isospin of −1/2 ("down-type"). The Higgs boson decay into bb, also shown in Fig. 1, is experimentally attractive as a final state because it features the largest branching fraction of 0.58 ± 0.02 for a 125 GeV Higgs boson [16]. Several BSM physics scenarios predict a significantly enhanced production rate of events with ttH final states, while not modifying the branching fractions of Higgs boson decays by a measurable amount [17][18][19][20][21][22][23][24][25][26]. In this context, a measurement of the ttH production cross section has the potential to distinguish the SM Higgs mechanism of generating fermion masses from alternative ones.
Various dedicated searches for ttH production have been conducted during Run 1 of the LHC. The CMS Collaboration searches employed proton-proton (pp) collision data corresponding to an integrated luminosity of 5 fb −1 at a centre-of-mass energy of √ s = 7 TeV and 19.5 fb −1 at √ s = 8 TeV. These searches have been performed by studying Higgs boson decays to b quarks, photons, and leptons using multivariate analysis (MVA) techniques, showing a mild excess of the observed ttH cross section relative to the SM expectation of µ = σ/σ SM = 2.8 ± 1.0 [27]. A similar excess of µ = 2.1 +1.4 −1.2 was observed in a search for ttH production in multilepton final states by the ATLAS Collaboration using data at √ s = 8 TeV, corresponding to an integrated luminosity of 20.3 fb −1 [28]. The searches in the H → bb decay channel were performed with several analysis techniques [27,29,30], yielding a most stringent observed (expected) upper limit on µ of 3.4 (2.2) at the 95% confidence level (CL).
The increased centre-of-mass energy of √ s = 13 TeV results in a ttH production cross section 3.9 times larger than at √ s = 8 TeV based on next-to-leading-order (NLO) calculations; while the cross section for the most important background, tt production, is increased by a factor of 3.3 [31], resulting in a more favourable signal-to-background ratio. The CMS Collaboration has performed searches in the all-jets [32] and multilepton [33] final states with 35.9 fb −1 of data, achieving evidence for ttH production with an observed (expected) significance of 3.2 (2.8) standard deviations in the latter case. Recently, the ATLAS Collaboration reported observed (expected) evidence for ttH production with a significance of 4.2 (3.8) standard deviations, based on an integrated luminosity of 36.1 fb −1 and combining several Higgs boson decay channels [34]; in the H → bb channel alone, an observed (expected) upper limit on µ of 2.0 (1.2) at 95% CL and a best fit value of µ = 0.84 +0. 64 −0.61 were obtained [35].
In this paper, a search for ttH production in the H → bb final state is presented that has been performed using 35.9 fb −1 of data recorded with the CMS detector at √ s = 13 TeV in 2016. In the SM, the top quark is expected to decay into a W boson and a b quark almost exclusively. Hence different tt decay modes can be identified according to the subsequent decays of the W bosons. The event selection is based on the decay topology of ttH events in which the Higgs boson decays into bb and the tt decay involves at least one lepton, resulting in either ν qq bb (single-lepton) or + ν − ν bb (dilepton) tt final states, where = e, µ. Analysis methods established in Run 1 [27,29] have been significantly improved, and novel methods have been added. In particular, two multivariate techniques-namely boosted decision trees (BDTs) and the matrix element method (MEM) [36-40]-that utilise event information differently in order to discriminate signal from background events have been employed in combination. Since the two methods aim at separating signal from different background processes, their combined usage helps to obtain a better sensitivity. In addition, a new multivariate technique based on deep neural networks (DNNs) has been employed to separate signal from background events. The best fit value of the signal strength modifier µ is obtained from a combined profile likelihood fit of the classifier output distributions to the data, correlating processes and their uncertainties where appropriate.
This document is structured as follows. The CMS detector is described in Section 2. In Section 3, the simulated signal and background samples are described. The basic selection of analysis objects and events is discussed in Section 4. The general analysis strategy and background estimation methods are introduced in Section 5. The effect of systematic uncertainties is studied in Section 6. Results of the analysis are presented in Section 7, followed by a summary in Section 8.

The CMS detector
The central feature of the CMS apparatus is a superconducting solenoid of 6 m internal diameter, providing a magnetic field of 3.8 T. Within the solenoid volume are a silicon pixel and strip tracker, a lead tungstate crystal electromagnetic calorimeter, and a brass and scintillator hadron calorimeter, each composed of a barrel and two endcap sections. Forward calorimeters extend the pseudorapidity (η) coverage provided by the barrel and endcap detectors. Muons are detected in gas-ionisation chambers embedded in the steel magnetic flux-return yoke outside the solenoid. A more detailed description of the CMS detector, together with a definition of the coordinate system used and the relevant kinematic variables, can be found in Ref. [41]. Events of interest are selected using a two-tiered trigger system [42]. The first level, composed of custom hardware processors, uses information from the calorimeters and muon detectors to select events, while the second level selects events by running a version of the full event reconstruction software optimised for fast processing on a farm of computer processors.

Simulation of signal and background
Several Monte Carlo event generators, interfaced with a detailed detector simulation, are used to model experimental effects, such as reconstruction and selection efficiencies, as well as detector resolutions. The CMS detector response is simulated using GEANT4 (v.9.4) [43].
The main background contribution originates from tt production, the production of W and Z/γ * bosons with additional jets (referred to as W+jets and Z+jets, or commonly as V+jets), single top quark production (tW and t-channel production), diboson (WW, WZ, and ZZ) processes, and tt production in association with a W or Z boson (referred to as tt+W and tt+Z, or commonly as tt+V). Both the tt and the single top quark processes in the tand tW-channels are simulated with POWHEG [51,52]. The s-channel single top quark processes, as well as V+jets and tt+V processes are simulated at NLO with MADGRAPH5 aMC@NLO, where for the V+jets processes the matching of matrix-element (ME) jets to parton showers (PS) is performed using the FXFX [53] prescription. The PYTHIA event generator is used to simulate diboson events.
Parton showering and hadronisation are simulated with PYTHIA (v. 8.200) for all signal and background processes. The PYTHIA CUETP8M2T4 [54,55] tune is used to characterise the underlying event in the ttH signal and tt and single top quark background processes, while the CUETP8M1 [55] tune is used for all other background processes.
For comparison with the observed distributions, the events in the simulated samples are normalised to the same integrated luminosity of the data sample, according to their predicted cross sections. These are taken from theoretical calculations at next-to-next-to-leading order (NNLO, for V+jets production), approximate NNLO (single top quark tW channel [56]), and NLO (single top quark tand s-channels [57,58], tt+V production [59], and diboson production [60]). The ttH cross section of 507 +35 −50 fb and Higgs boson branching fractions used in the analysis also correspond to NLO accuracy [16]. The tt simulated sample is normalised to the full NNLO calculation with resummation to next-to-next-to-leading-logarithmic accuracy [61][62][63][64][65][66][67], assuming a top quark mass value of 172.5 GeV and using the NNPDF3.0 PDF set. This sample is further separated into the following processes based on the flavour of additional jets that do not originate from the top quark decays in the event: tt+bb, defined at generator level as the events in which two additional b jets are generated within the acceptance requirements (see Section 4), each of which originates from one or more B hadrons; tt+b, for which only one additional b jet within the acceptance originates from a single B hadron; tt+2b, which corresponds to events with two additional B hadrons that are close enough in direction to produce a single b jet; tt+cc, for which events have at least one c jet within the acceptance and no additional b jets; tt + light flavour jets (tt+lf), which corresponds to events that do not belong to any of the above processes. The tt+bb, tt+b, tt+2b, and tt+cc processes are collectively referred to as tt+hf in the following. This categorisation is important because the subsamples originate from different physics processes and have different systematic uncertainties.
Effects from additional pp interactions in the same bunch crossings (pileup) are modelled by adding simulated minimum-bias events (generated with PYTHIA) to all simulated processes. The pileup multiplicity distribution in simulation is reweighted to reflect the luminosity profile of the observed pp collisions. Correction factors described in Section 4 are applied to the simulation where necessary to improve the description of the data.

Object and event reconstruction
The event selection is optimised to identify events from the production of a Higgs boson in association with tt events, where the Higgs boson decays into bb. Two tt decay modes are considered: the single-lepton mode (tt → ν qq bb), where one W boson decays into a charged lepton and a neutrino, and the dilepton mode (tt → + ν − ν bb), where both W bosons decay into a charged lepton and a neutrino. These signatures imply the presence of isolated leptons ( = e, µ), missing transverse momentum due to the neutrinos from W boson decays, and highly energetic jets originating from the final-state quarks. Jets originating from the hadronisation of b quarks are identified through b tagging techniques [68].
Online, events in the single-lepton channel were selected by single-lepton triggers which require the presence of one electron (muon) with a transverse momentum (p T ) threshold of p T > 27(24) GeV. Events in the dilepton channel were selected either by the single-lepton trigger (retaining events with an additional lepton) or by dilepton triggers that require the presence of two electrons or muons. The same-flavour dilepton triggers required two electrons with p T > 23 and 12 GeV, or two muons with p T > 17 and 8 GeV, respectively. The different-flavour dilepton triggers required either a muon with p T > 23 GeV and an electron with p T > 12 GeV, or an electron with p T > 23 GeV and a muon with p T > 8 GeV.
Events are reconstructed using a particle-flow (PF) technique [69], which combines information from all subdetectors to enhance the reconstruction performance by identifying individual particle candidates in pp collisions. Charged tracks identified as hadrons from pileup vertices are omitted in the subsequent event reconstruction.
The electron and muon candidates are required to be sufficiently isolated from nearby jet activity as follows. For each electron (muon) candidate, a cone of ∆R = 0.3 (0.4) is constructed around the direction of the track at the event vertex, where ∆R is defined as √ (∆η) 2 + (∆φ) 2 , and ∆η and ∆φ are the distances in the pseudorapidity and azimuthal angle. Excluding the contribution from the lepton candidate, the scalar p T sum of all particle candidates inside the cone consistent with arising from the chosen primary event vertex is calculated. The neutral component from pileup interactions is subtracted event-by-event, based on the average transverse energy deposited by neutral particles in the event in the case of electrons, and half the transverse momentum carried by charged-particles identified to come from pileup vertices in the case of muons. A relative isolation discriminant I rel is defined as the ratio of this sum to the p T of the lepton candidate. Electron candidates are selected if they have values of I rel < 0.06, while muons are selected if they fulfil the requirement I rel < 0.15 in the single-lepton channel and I rel < 0.25 in the dilepton channel. In addition, electrons from identified photon conversions are rejected [70]. To further increase the purity of muons originating from the primary interaction and to suppress misidentified muons or muons from decay-in-flight processes, additional quality criteria, such as a minimal number of hits associated with the muon track, are required in both the silicon tracker and the muon system [71].
For the single-lepton channel, events are selected containing exactly one energetic, isolated lepton (e or µ), which is required to have p T > 30(26) GeV in the case of the electron (muon), and |η| < 2.1. Electron candidates in the transition region between the barrel and endcap calorimeters, 1.4442 < |η| < 1.5560, are excluded. The flavour of the lepton must match the flavour of the trigger that accepted the event (e.g. if an electron is identified, the single-electron trigger must have accepted the event). For the dilepton channel, events are required to have a pair of oppositely charged energetic leptons (e + e − , µ ± e ∓ , µ + µ − ). The lepton with the highest p T out of the pair is required to have p T > 25 GeV, and the other lepton p T > 15 GeV; both leptons are required to fulfil the requirement |η| < 2.4, excluding electrons in the transition region. The flavours of the lepton pair must match the flavour of the trigger that accepted the event. The events are unambiguously classified as e + e − , µ ± e ∓ , or µ + µ − , depending on the type of the selected lepton pair, and there is no overlap with the other channels under study. The invariant mass of the selected lepton pair, m , is required to be larger than 20 GeV to suppress events from heavy-flavour resonance decays and low-mass Drell-Yan processes. In the same-flavour channels, events are also rejected if 76 < m < 106 GeV, thereby suppressing further contribution from Z+jets events. In both the single-and dilepton channel, events with additional isolated leptons with p T > 15 GeV and |η| < 2.4 are excluded from further analysis.
The missing transverse momentum vector p miss T is defined as the projection of the negative vector sum of the momenta of all reconstructed PF objects in an event on the plane perpendicular to the beams. Its magnitude is referred to as p miss T . Events are required to fulfil p miss T > 20 GeV in the single-lepton and p miss T > 40 GeV in the dilepton same-flavour channels to further suppress background contribution.
Jets are reconstructed from the PF particle candidates using the anti-k T clustering algorithm [72] with a distance parameter of 0.4, as implemented in FASTJET [73]. Charged hadrons that are associated to pileup vertices are discarded from the clustering. The jet energy is corrected for the remaining neutral-hadron pileup component in a manner similar to that used to find the energy within the lepton isolation cone [74]. Jet energy corrections are also applied as a function of jet p T and η [75] to data and simulation. Events in the single-lepton channel are required to have at least four reconstructed jets with p T > 30 GeV and |η| < 2.4. In the dilepton channels, jets with p T > 20 GeV and |η| < 2.4 are required, from which two must satisfy p T > 30 GeV.
Jets originating from the hadronisation of b quarks are identified using a combined secondary vertex algorithm (CSVv2) [68], which provides a b tagging discriminant by combining identified secondary vertices and track-based lifetime information. A discriminant value is chosen such that the probability of tagging jets originating from light-flavour quarks (u, d, or s) or gluons is about 1%, and the corresponding efficiency for tagging jets from b (c) quarks is ≈65% (10%). The shape of the CSVv2 discriminant distribution in simulation is corrected by scale factors to better describe the data. This correction is derived separately for light-flavour and b jets from a tag-and-probe approach using control samples enriched in events with a Z boson and exactly two jets, and tt events with no additional jets. Events are required to have at least two (one) b-tagged jets in the single-lepton (dilepton) channels.
Event yields observed in data and predicted by the simulation after this selection are listed in Table 1 for the single-lepton and dilepton channels. The corresponding jet and b-tagged jet multiplicity distributions are shown in Figs. 2 and 3, respectively. The ttH signal includes H → bb and all other Higgs boson decay modes. Table 1: Event yields observed in data and predicted by the simulation after the selection requirements described in the text: at least four jets, at least two of which are b tagged in the single-lepton (SL) channel, and at least two jets, at least one of which is b tagged in the dilepton (DL) channel.

Analysis strategy and event classification
In both the single-lepton and dilepton channels, events with at least four jets of which at least three are b-tagged are selected among those passing the selection described in Section 4. These events are then further divided into categories with varying signal purity and different background composition. In each category, combinations of several multivariate discriminants are optimised to separate signal from background. The signal is extracted in a simultaneous template fit of the discriminant output obtained from the simulation to the data across all the categories, correlating processes and their uncertainties where appropriate. In this way, the different background composition in the different categories helps to constrain the uncertainties of the different processes and increases the overall sensitivity of the search.
Several methods that classify events as signal-or background-like were explored to achieve optimal sensitivity: BDTs and DNNs, combined with a MEM. In the BDT approach, events are divided into categories based on their jet and b-tagged jet multiplicity ("jet-tag categories"). In the DNN approach, the jet multiplicity and the DNN classification output, described below, are used for the event categorisation ("jet-process categories"). The approach that provided the best expected sensitivity in each channel, evaluated on fits to simulated data, was chosen for obtaining the final result from data. Therefore, in the dilepton channel a BDT+MEM classification is chosen, while in the single-lepton channel the DNN approach is used. The methods and the corresponding categorisation are illustrated in Fig. 4 and described in the following.  In the dilepton channel, events are separated into two jet-tag categories with (≥ 4 jets, 3 b tags) and (≥ 4 jets, ≥ 4 b tags). In each jet-tag category, a dedicated BDT is trained to separate signal from background processes. The BDTs utilise input variables related to kinematic properties of individual objects, event shape, and the jet CSVv2 b tagging discriminant. The training is performed using simulated ttH and tt+jets events as signal and background, respectively, which are weighted to achieve equal yields of signal and background events. In order to avoid a biased performance estimate, the events are separated in half for training and validation. The specific BDT boosting method used is the stochastic gradient boost [36,76], available as part of the TMVA package [38]. The choice of the BDT architecture and the input variables was optimised with a procedure based on the particle swarm algorithm [77,78], selecting the configuration and set of variables that yields the highest discrimination power. They are detailed in Appendix A.
In the (≥ 4 jets, 3 b tags) category the BDT output distribution is used as the final discriminant. The (≥ 4 jets, ≥ 4 b tags) category is further divided into two subcategories, one with small values of the BDT output (background-like) and one with large output value (signallike). The division is taken at the median of the BDT output distribution for simulated signal events. In each subcategory, the MEM discriminant output is used as the final discriminant.
The MEM discriminant is constructed as the ratio of the probability density values for the signal (ttH) and background (tt+bb) hypotheses, following the algorithm described in Ref.
[29]. Each event is assigned a probability density value computed from the four-momenta of the reconstructed particles, which is based on the leading order scattering amplitudes for the ttH and tt+bb processes and integrated over the particle-level quantities that are either unknown or poorly measured. The probability density functions are constructed at leading order, assuming gluon-gluon fusion production both for signal and background processes as it represents the majority of the event rate. In each event, the four jets that are most likely to originate from b quarks are considered explicitly as candidates for the b quarks from the decay of the Higgs boson and the top quarks. All permutations are considered when associating the b-quark-like jets to the top quark or Higgs boson decays in the matrix element. The four b-like jets are selected using a likelihood ratio criterion as follows. The likelihoods are computed under either the hypothesis that four jets or that two jets in the event originate from b quarks, based on the expected b tagging discriminant probability densities from simulation. The used ratio is computed as the four-b-jets likelihood, normalised to the sum of the four-and the two-b-jets likelihoods.
The high BDT output subcategory is expected to be enhanced with signal events and residual tt+bb background events, and the MEM discriminant achieves by construction particularly powerful additional separation against the tt+bb background contributions. The choice of the median contributes to a robust result by ensuring a sufficient number of events in each subcategory. Including the low b tag multiplicity and the low BDT output subcategories into the fit constrains the background contributions and systematic uncertainties for each of the different event topologies. Thus, there are in total three categories in the dilepton channel.
In the single-lepton channel, events are separated depending on the jet multiplicity into three categories with (4 jets, ≥ 3 b tags), (5 jets, ≥ 3 b tags), and (≥ 6 jets, ≥ 3 b tags). Dedicated multi-classification DNNs [79] are trained in each jet multiplicity category to separate signal and each of the five tt+jets background processes tt+bb, tt+2b, tt+b, tt+cc, or tt+lf. The training is conducted in two stages. In the first stage, a DNN is trained to predict which of the expected physics objects originate from the underlying hard process, such as for example the b quark jet from the decay of a top quark, are reconstructed in the final state. In the second stage, the initial network is extended by adding hidden layers, which take as input the variables and the output values of the first stage, and the resulting network is trained to predict the physics process of an event. The values obtained in the output nodes of the second stage are normalised to unity using a "softmax" function [79], and, as a result, can be interpreted as probabilities describing the likelihood of the event being a ttH signal or one of the five tt+jets background processes. Events are divided into subcategories of the most probable process according to this DNN classification. Thus, there are in total 18 jet-process categories in the single-lepton channel.
In each of the jet-process categories, the DNN classifier output distribution of the node that matches the process category is used as the final discriminant.
The DNNs utilise the same input variables as the BDTs and additionally the MEM discriminant output. When computing the MEM in the single-lepton channel, up to four additional light jets, ordered in p T , are permuted over as candidates for the light quarks from the hadronic decay of the W boson. The DNN training is performed using simulated ttH and tt+jets events as signal and background, respectively. The overall set of events is split into a training set (30%), an independent set (20%) for validation and optimisation of the DNN configuration (hyper parameters), such as the number of nodes per layer, and a set that is reserved for the fit to the data (50%). The hyper parameters and input variables are detailed in Appendix A.
In summary, in the dilepton channel events are subdivided into three jet-tag categories and either the BDT or MEM output distribution is used as the final discriminant. In the single-lepton channel events are subdivided into 18 jet-process categories and the DNN output distribution of the most probable process is used as the final discriminant.

Systematic uncertainties
In Table 2, all sources of systematic uncertainties considered in the analysis are listed. They affect either the rate of the signal or background processes, or the discriminant shape, or both.
In the last case, the rate and shape effects are treated as entirely correlated and are varied simultaneously. The uncertainties are taken into account via nuisance parameters in the final fit procedure described in Section 7. The effect of the uncertainties is evaluated individually in each category of each analysis channel, where the effects from the same source are treated as fully correlated. The impact of the uncertainties on the final result is discussed in Section 7. The uncertainty in the integrated luminosity estimate is 2.5% [81]. The trigger efficiency in the single-lepton channel and the electron and muon identification efficiency uncertainties are estimated by comparing variations in measured efficiency between data and simulation using a high-purity sample of Z boson decays. In the dilepton channel, the trigger efficiency is measured in data with a method based on triggers that are uncorrelated with those used in the analysis, in particular based on p miss T requirements. These uncertainties are found to be small, typically below 1-2%. Effects of the uncertainty in the distribution of the number of pileup interactions are evaluated by varying the total inelastic cross section used to predict the number of pileup interactions in the simulated events by ±4.6% from its nominal value [82]. The uncertainty due to the limited knowledge of the jet energy scale (resolution) is determined by variations of the energy scale (resolution) correction of all jets in the signal and background predictions by one standard deviation. In the case of the jet energy scale uncertainty, these variations are divided into 26 sources, which include uncertainties owing to the extrapolation between samples of different jet-flavour composition and the presence of pileup collisions in the derivation of the corrections [75]. The effect of each source is evaluated individually. The uncertainty of the CSVv2 b tagging scale factors is evaluated by applying alternative scale factors based on varying the following systematic effects [68] by one standard deviation, separately for the different jet flavours: the contamination of background processes in the control samples, the jet energy scale uncertainty-which is correlated with the overall jet energy scale uncertainty-and the statistical uncertainty in the scale factor evaluation. The impact of the statistical uncertainty is parameterised as the sum of two contributions: one term with linear dependence on the b tagging discriminant value, allowing an overall tilt of the discriminant distribution, and another term with quadratic dependence, allowing an overall shift of the discriminant distribution.
Theoretical uncertainties of the cross sections used to predict the rates of various processes are propagated to the yield estimates. All rates are estimated using cross sections with at least NLO accuracy, which have uncertainties arising primarily from PDFs and the choice of factorisation and renormalisation scales (both in the ME and the PS). The cross section uncertainties are each separated into their PDF and scale components and are correlated where appropriate between processes. For example, the PDF uncertainties for background processes originating primarily from gluon-gluon initial states are treated as 100% correlated. The PDF uncertainty of the ttH signal production is treated separately from the background processes.
The tt+bb process, and to lesser extent the tt+2b, tt+b, and tt+cc production, represent important sources of irreducible background. Neither previous measurements of tt+hf production [83] nor higher-order theoretical calculations can currently constrain the normalisation of these contributions to better than 35% accuracy [84,85]; therefore, a conservative extra 50% rate uncertainty is assigned to the tt+hf processes to account also for differences in the phase space with respect to Ref. [83]. Additionally, the robustness of the fit model was verified using simulated toy data, which were sampled from the fit model modified in the following way: increasing the normalisation of the tt+bb background template by 30% in accordance with the results in Ref. [83] or replacing the templates of the tt+bb, tt+2b, and tt+b processes obtained with the nominal tt simulation by those obtained from a 4-flavour scheme SHERPA (v.2.2.2) [86] simulation combined with OPENLOOPS (v.1.3.1) [87]. In both cases, the fit model recovers the injected signal within a few percent, well within the uncertainties assigned to these processes.
The uncertainty arising from the missing higher-order terms in the simulation of the tt+jets process at the ME level is assessed by varying the renormalisation and factorisation scales in the simulation up and down by factors of two with respect to the nominal values, using event weights obtained directly from the generator. At the PS level, the corresponding uncertainty is estimated by varying the parameters controlling the amount of initial-and final-state radiation independently by factors of 0.5 and 2 [55]. These sources of uncertainties are treated as uncorrelated. The uncertainty originating from the scheme used to match the ME level calculation to the PS simulation is derived by comparing the reference tt+jets simulation with two samples with varied hdamp parameter [80], which controls the ME and PS matching and effectively regulates the high-p T radiation. The effect on the final discriminators owing to uncertainties in the underlying event tune of the tt+jets event generator are estimated using simulations with varied parameters with respect to those used to derive the CUETP8M2T4 tune in the default setup. The event count in the additional samples required to estimate the modelling uncertainties was small and induced changes to the discriminant distributions comparable in size to the statistical fluctuations of the additional samples. For this reason, the uncertainties were estimated conservatively as the changes in the rates of the different tt subprocesses independently for different jet multiplicities. If the statistical uncertainty owing to the size of the simulated samples was larger than the rate change, the former was assigned as uncertainty. The derived rate uncertainties were then correlated between jet multiplicities to account for migration effects. Possible shape variations of the final discriminant distributions due to the PDF uncertainty have been estimated by evaluating the PDF replicas provided with the NNPDF set [50]. The impact of the mismodelling of the top quark p T spectrum in the tt simulation [88] was found to be negligible.
The impact of statistical fluctuations in the signal and background prediction due to the limited number of simulated events is accounted for using the Barlow-Beeston approach described in Refs. [89,90].

Results
The numbers of events selected in the jet-process categories of the single-lepton channel and in the jet-tag categories of the dilepton channel, before and after the fit of the signal strength modifier and the nuisance parameters, are listed in Tables 3-6. The final discriminants in several representative categories in the single-lepton channel and the three dilepton categories before and after the fit to data are displayed in Figs. 5-6 and Figs. 7-8, respectively. All final discriminants in the single-lepton channel before and after the fit to data are displayed in Appendices B and C.
The signal strength modifier µ = σ/σ SM of the ttH production cross section is determined in a simultaneous binned maximum likelihood fit to the data across all analysis categories. The fit procedure takes into account systematic uncertainties that modify the shape and normalisation of the final discriminant distributions, as described in Section 6. The best fit values of the nuisance parameters correspond to variations that for more than 95% of the parameters are within 1 standard deviation of the prior uncertainties. As expected, the fit constrains the nuisance parameters related to the conservatively assigned 50% prior uncertainties on the tt+hf cross section to 40-60% of the prior. A few other nuisance parameters that are related to jet energy scale and b tagging uncertainties are constrained up to a factor of 50%. These constraints are not due to conservatively assigned prior uncertainties but are attributed to the fact that events are selected according to different, large multiplicities of jets and b-tagged jets, thus increasing the sensitivity of the analysis to changes of the jet energy scale and b tagging efficiency, e.g. by their effect on the event yield per analysis category.
The obtained best fit value of µ is 0.72 ± 0.24 (stat) ± 0.38 (syst) with a total uncertainty of ±0. 45. This corresponds to an observed (expected) significance of 1.6 (2.2) standard deviations above the background-only hypothesis. The observed and predicted event yields in all the bins of the Table 3: Observed and expected event yields per jet-process category (node) in the single-lepton channel with 4 jets and at least 3 b tags, prior to the fit to data (after the fit to data). The quoted uncertainties denote the total statistical and systematic components.   Table 4: Observed and expected event yields per jet-process category (node) in the single-lepton channel with 5 jets and at least 3 b tags, prior to the fit to data (after the fit to data). The quoted uncertainties denote the total statistical and systematic uncertainty.     final discriminants, ordered by the pre-fit expected signal-to-background ratio (S/B) are shown in Fig. 9 (left). The best fit values in each analysis channel and in the combination are listed in Table 7 and displayed in Fig. 9 (right).
Events / Bin  Table 7: Best fit value of the signal strength modifier µ and the observed and median expected 95% CL upper limits in the single-lepton and the dilepton channels as well as the combined results. The one standard deviation confidence intervals of the expected limit and the best fit value are also quoted, split into the statistical and systematic components in the latter case. The contributions of the statistical and various systematic uncertainties to the uncertainty in µ are listed in Table 8. The statistical uncertainty is evaluated by fixing all nuisance parameters to their post-fit values. The impact of the systematic uncertainties is evaluated by repeating the fit fixing only the nuisance parameters related to the uncertainty under scrutiny to their postfit values and subtracting the obtained uncertainty in quadrature from the total uncertainty of the fit where no parameters are fixed. The total uncertainty of the full fit (0.45) is different from the quadratic sum of the listed contributions because of correlations between the nuisance parameters. An upper limit on µ under the background-only hypothesis is also determined, using a modified frequentist CL S procedure [91,92] with the asymptotic method [93]. When combining all categories and channels, an observed (expected) upper limit at 95% CL on µ of 1.5 (0.9) is obtained. The observed and expected upper limits in each channel and in the combination are listed in Table 7 and visualised in Fig. 10.
In addition, the statistical analysis has been performed using the jet-process categorisation and DNN output in both channels and their combination, as well as using the jet-tag categorisation and the BDT or MEM in both channels. The results obtained in each channel and the combination are compatible within 1.7 standard deviations or better, evaluated using a jackknife procedure [94]. This serves as an important cross check and validation of the complex analysis methods.  (13   The inner (green) band and the outer (yellow) band indicate the regions containing 68 and 95%, respectively, of the distribution of limits expected under the background-only hypothesis. Also shown is the limit that is expected in case a SM ttH signal (µ = 1) is present in the data (solid red line).

CMS
are selected in final states compatible with the Higgs boson decaying into a b quark-antiquark pair and the single-lepton and dilepton decay channels of the tt system. Selected events are split into mutually exclusive categories according to their tt decay channel and jet content. In each category a powerful discriminant is constructed to separate the ttH signal from the dominant tt+jets background, based on several multivariate analysis techniques (boosted decision trees, matrix element method, and deep neural networks). An observed (expected) upper limit on the ttH production cross section µ relative to the SM expectations of 1.5 (0.9) at 95% confidence level is obtained. The best fit value of µ is 0.72 ± 0.24 (stat) ± 0.38 (syst). These results correspond to an observed (expected) significance of 1.6 (2.2) standard deviations above the background-only hypothesis.

Acknowledgments
We congratulate our colleagues in the CERN accelerator departments for the excellent performance of the LHC and thank the technical and administrative staffs at CERN and at other CMS institutes for their contributions to the success of the CMS effort. In addition, we gratefully acknowledge the computing centres and personnel of the Worldwide LHC Computing Grid for delivering so effectively the computing infrastructure essential to our analyses. Finally, we acknowledge the enduring support for the construction and operation of the LHC and the CMS detector provided by the following funding agencies: the Austrian Federal Ministry of Science, Research and Economy and the Austrian Science Fund; the Belgian [32] CMS Collaboration, "Search for ttH production in the all-jet final state in proton-proton collisions at √ s = 13 TeV", (2018). arXiv:1803.06986. Submitted to JHEP.
[34] ATLAS Collaboration, "Evidence for the associated production of the Higgs boson and a top quark pair with the ATLAS detector", (2017). arXiv:1712.08891. Submitted to Phys. Rev. D.
[35] ATLAS Collaboration, "Search for the standard model Higgs boson produced in association with top quarks and decaying into a bb pair in pp collisions at √ s = 13 TeV with the ATLAS detector", (2017). arXiv:1712.08895. Submitted to Phys. Rev. D.

A BDT and DNN input variables and configuration
All input variables used in the DNNs and BDTs are listed in Tables 9-11.  Table 9: Input variables used in the DNNs or BDTs in the different categories of the singlelepton and dilepton channels. Variables used in a specific multivariate method and analysis category are denoted by a "+" and unused variables by a "−". (Continued in Tables 10 and  11 Table 9 and continued in Table 11.     ), the learning rate (shrinkage), the fraction of events used for the training of an individual tree (bagging fraction), the granularity of the cuts at each node splitting (N cuts ), and the number of node splittings per tree (depth) are listed in Table 12.

Variable
The DNNs used in the single-lepton channel comprise two layers with 100 nodes each in each of the two network stages. Overtraining is suppressed by random node dropout with a proba-  Figure 13: Final discriminant (DNN) shapes in the single-lepton (SL) channel before the fit to data, in the jet-process categories with (≥ 6 jets, ≥ 3 b tags) and (from upper left to lower right) ttH, tt+bb, tt+2b, tt+b, tt+cc, and tt+lf. The expected background contributions (filled histograms) are stacked, and the expected signal distribution (line), which includes H → bb and all other Higgs boson decay modes, is superimposed. Each contribution is normalised to an integrated luminosity of 35.9 fb −1 , and the signal distribution is additionally scaled by a factor of 15 for better visibility. The hatched uncertainty bands include the total uncertainty of the fit model. The first and the last bins include underflow and overflow events, respectively. The lower plots show the ratio of the data to the background prediction.  Figure 14: Final discriminant (DNN) shapes in the single-lepton (SL) channel after the fit to data, in the jet-process categories with (4 jets, ≥ 3 b tags) and (from upper left to lower right) ttH, tt+bb, tt+2b, tt+b, tt+cc, and tt+lf. The error bands include the total uncertainty after the fit to data. The first and the last bins include underflow and overflow events, respectively. The lower plots show the ratio of the data to the post-fit background plus signal distribution.  Figure 15: Final discriminant (DNN) shapes in the single-lepton (SL) channel after the fit to data, in the jet-process categories with (5 jets, ≥ 3 b tags) and (from upper left to lower right) ttH, tt+bb, tt+2b, tt+b, tt+cc, and tt+lf. The error bands include the total uncertainty after the fit to data. The first and the last bins include underflow and overflow events, respectively. The lower plots show the ratio of the data to the post-fit background plus signal distribution.  Figure 16: Final discriminant (DNN) shapes in the single-lepton (SL) channel after the fit to data, in the jet-process categories with (≥ 6 jets, ≥ 3 b tags) and (from upper left to lower right) ttH, tt+bb, tt+2b, tt+b, tt+cc, and tt+lf. The error bands include the total uncertainty after the fit to data. The first and the last bins include underflow and overflow events, respectively. The lower plots show the ratio of the data to the post-fit background plus signal distribution.