Search for the Standard Model Higgs boson decaying into $b\bar{b}$ produced in association with top quarks decaying hadronically in $pp$ collisions at $\sqrt{s}$=8 TeV with the ATLAS detector

A search for Higgs boson production in association with a pair of top quarks ($t\bar{t}H$) is performed, where the Higgs boson decays to $b\bar{b}$, and both top quarks decay hadronically. The data used correspond to an integrated luminosity of 20.3 fb$^{-1}$ of $pp$ collisions at $\sqrt{s}=8$ TeV collected with the ATLAS detector at the Large Hadron Collider. The search selects events with at least six energetic jets and uses a boosted decision tree algorithm to discriminate between signal and Standard Model background. The dominant multijet background is estimated using a dedicated data-driven technique. For a Higgs boson mass of 125 GeV, an upper limit of 6.4 (5.4) times the Standard Model cross section is observed (expected) at 95% confidence level. The best-fit value for the signal strength is $\mu = 1.6 \pm 2.6$ times the Standard Model expectation for $m_{H} = 125$ GeV. Combining all $t\bar{t}H$ searches carried out by ATLAS at $\sqrt{s}=8$ and 7 TeV, an observed (expected) upper limit of 3.1 (1.4) times the Standard Model expectation is obtained at 95% confidence level, with a signal strength $\mu = 1.7 \pm 0.8$.


Object reconstruction
The all-hadronic ttH final state is composed of jets originating from (u, d, s)-quarks or gluons (light jets) and jets from c-or b-quarks (heavy-flavour jets). Electrons and muons, selected in the same way as in Ref. [16], are used only to veto events that would overlap with the ttH searches in final states with leptons.
At least one reconstructed primary vertex is required, with at least five associated tracks with p T ≥ 400 MeV, and a position consistent with the luminous region of the beams in the transverse plane. If more than one vertex is found, the primary vertex is taken to be the one which has the largest sum of the squared transverse momenta of its associated tracks.
Jets are reconstructed with the anti-k t algorithm [20][21][22], with a radius parameter R = 0.4 in the (η, φ) plane. They are built from calibrated topological clusters of energy deposits in the calorimeters [18]. Prior to jet finding, a local cluster calibration scheme [23,24] is applied to correct the topological cluster energies for the effects of non-compensating calorimeter response, dead material, and out-of-cluster leakage. After energy calibration based on in-situ measurements [25], jets are required to have transverse momentum p T > 25 GeV and |η| < 2.5. During jet reconstruction, no distinction is made between identified electrons and jet energy deposits. To avoid double counting electrons as jets, any jet within a cone of size ∆R = (∆φ) 2 + (∆η) 2 = 0.2 around a reconstructed electron is discarded. After this, electrons within a ∆R = 0.4 of a remaining jet are removed.
To avoid selecting jets from additional pp interactions in the same event (pile-up), a loose selection is applied to the jet vertex fraction (JVF), defined as the ratio of the scalar sum of the p T of tracks matched The five leading jets in p T are required to have p T > 55 GeV with |η| < 2.5 and all other jets are required to have p T > 25 GeV and |η| < 2.5. Events are required to have at least six jets, of which at least two must be b-tagged. Events with well-identified isolated muons or electrons with p T > 25 GeV are discarded in order to avoid overlap with other ttH analyses.
To enhance the sensitivity, the selected events are categorised into various distinct regions, according to their jet and b-tag multiplicities: the region with m jets, of which n are b-jets, is referred to as "(mj, nb)".

Signal model
The ttH signal process is modelled using matrix elements calculations obtained from the HELAC-Oneloop package [28] with next-to-leading order (NLO) accuracy in α s . Powheg-box [29][30][31] serves as an interface to the MC programs used to simulate the parton shower and hadronisation. The samples created using this approach are referred to as PowHel samples [32]. They include all SM Higgs boson and topquark decays and use the CT10NLO [33] parton distribution function (PDF) sets with the factorisation (µ F ) and renormalisation (µ R ) scales set to µ F = µ R = m t + m H /2. The PowHel ttH samples use Pythia 8.1 [34] to simulate the parton shower with the CTEQ6L1 [35] PDF and the AU2 underlying-event set of generator parameters (tune) [36], while Herwig [37] is used to estimate systematic uncertainties due to the fragmentation modelling.
For these ttH samples the cross-section normalisations and the Higgs boson decay branching fractions are taken from the NLO QCD and from the NLO QCD + EW theoretical calculations [13] respectively. The masses of the Higgs boson and the top quark are set to 125 GeV and to 172.5 GeV respectively.

Simulated backgrounds
The dominant background to the all-hadronic ttH signal is multijet production, followed by tt + jets production. Small background contributions come from the production of a single top quark and from the associated production of a vector boson and a tt pair, ttV (V = W, Z). The multijet background is determined from data using a dedicated method described in Section 5.4. The other background contributions are estimated using MC simulations.
The multijet events, which are used for jet trigger studies and for the validation of the data-driven multijet background estimation, are simulated with Pythia 8.1 using the NNPDF2.3 LO [38] PDFs.
The main tt sample is generated using the Powheg NLO generator with the CT10NLO PDF set, assuming a value of the top-quark mass of 172.5 GeV. It is interfaced to Pythia 6.425 [39] with the CTEQ6L1 PDF set and the Perugia2011C [40] underlying-event tune; this combination of generator and showering programs is hereafter referred to as Powheg+Pythia. The sample is normalised to the top++2.0 theoretical calculation performed at next-to-next-to leading order (NNLO) in QCD and includes resummation of next-to-next-to leading logarithmic (NNLL) soft gluon terms [41][42][43][44][45][46]. A second tt sample is generated using fully matched NLO predictions with massive b-quarks [47] within the Sherpa with OpenLoops framework [48,49] henceforth referred to as Sherpa+OpenLoops. The Sherpa+OpenLoops NLO sample is generated following the four-flavour scheme using the Sherpa2.0 pre-release and the CT10NLO PDF set. The renormalisation scale is set to µ R = i=t,t,b,b E 1/4 T,i , where E T,i is the transverse energy of parton i, and the factorisation and resummation scales are both set to (E T,t + E T,t )/2.
The prediction from Sherpa+OpenLoops is expected to model the tt + bb contribution more accurately than Powheg+Pythia, since the latter MC produces tt + bb exclusively via the parton shower. The Sherpa+OpenLoops sample is not passed through full detector simulation. Thus, tt + jets events from Powheg+Pythia are categorised into three non-overlapping samples, tt + bb, tt + cc, and tt + light-jets, hereafter called tt + light, using a labelling based on an algorithm that matches hadrons to particle jets. Then, tt + bb events from Powheg+ Pythia are reweighted to reproduce the Sherpa+OpenLoops NLO tt + bb prediction. The reweighting is done at generator level using a finer categorisation to distinguish events where one particle jet is matched to two b-hadrons, or where only one b-hadron is matched. The reweighting is applied using several kinematic variables such as the top-quark p T , the tt system p T , and, where this can be defined, ∆R and p T of the dijet system not originating from the top-quark decay [16].
Unlike tt + bb, no fully matched NLO predictions exist for tt + cc and tt + light events. A dedicated reweighting is therefore applied to the top-quark p T spectra as well as to the p T spectra of the tt system of tt + light and tt + cc events in Powheg+Pythia, based on the ratio of data to simulation of the measured differential cross sections at √ s = 7 TeV [50]. No such reweighting is applied to the tt + bb sample, which is already corrected to match the best available theory calculation.
Samples of single-top-quark events produced in the s-and Wt-channels are generated with Powhegbox 2.0 using the CT10NLO PDF set. The samples are interfaced to Pythia 6.425 with the CTEQ6L1 set of parton distribution functions and Perugia2011C underlying-event tune. The t-channel production mode is generated with AcerMC [51] interfaced to Pythia 6.425 with the CTEQ6L1 PDF set and the Perugia2011C underlying-event tune. Overlaps between the tt and Wt final states are removed [52]. The single-top-quark samples are normalised to the approximate NNLO theoretical cross sections [53,54] using the MSTW2008 NNLO PDF set [55,56]. The samples of ttV (V = W, Z) events are generated with the MadGraph v5 LO generator [57] and the CTEQ6L1 PDF set. Pythia 6.425 with the AUET2B tune is used to generate the parton shower. The ttV samples are normalised to NLO cross-sections [58,59].
Finally, event samples for single top quark plus Higgs boson production, tHqb and tHW, are generated. The cross sections are computed using the MG5_aMC@NLO generator [60] at NLO in QCD. For tHqb, samples are generated with MadGraph in the four-flavour scheme and µ F = µ R = 75 GeV then showered with Pythia 8.1 with the CTEQ6L1 PDF and the AU2 underlying-event tune. For tHW, computed with the five-flavour scheme, dynamic µ F and µ R scales are used and events are generated at NLO with MG5_aMC@NLO+Herwig++ [61,62]. These two processes together are referred to as tH.
A summary of the cross-section values and their uncertainties for the signal as well as for the MC simulated background processes is given in Table 1.

Common treatment of MC samples
All samples using Herwig are also interfaced to Jimmy v4.31 [63] to simulate the underlying event. With the exception of Sherpa, all MC samples use Photos 2.15 [64] to simulate photon radiation and Tauola 1.20 [65] to simulate τ decays. The samples are then processed through a simulation [66] of the detector geometry and response using Geant4 [67]. The single-top-quark sample produced in the t-channel is simulated with a parameterised calorimeter response [68].
All simulated events are processed through the same reconstruction software as the data. Simulated events are corrected so that the lepton and jet identification efficiencies, energy scales and energy resolutions match those in data.
When selecting based on the output value of the b-tagging algorithm, the number of selected simulated events is significantly reduced, leading to large statistical fluctuations in the resulting distributions for samples with a high b-tag multiplicity. Therefore, rather than tagging the jets individually, the normalisation and the shape of these distributions are predicted by calculating the probability that a jet with a given flavour, p T , and η is b-tagged [69]. The method is validated by verifying that the predictions reproduce the normalisation and shape obtained for a given working point of the b-tagging algorithm. The method is applied to all simulated signal and background samples.

Multijet background estimation using data: the TRF MJ method
A data-driven technique, the tag rate function for multijet events (TRF MJ ) method, is used to estimate the multijet background. After measuring ε MJ , the probability of b-tagging a third jet in a sample of events with at least two b-tagged jets, the TRF MJ method uses ε MJ to extrapolate the multijet background from the regions with lower b-tag multiplicity to the search regions with higher b-tag multiplicity but otherwise identical event selection.
In the first step, the b-tagging rate is measured in data samples selected with various single-jet triggers, which are enriched in multijet events and have limited (≈10%) overlap with the search region. The events in this TRF MJ extraction region are required to have at least three jets with p T > 25 GeV and |η| < 2.5, with at least two b-tagged jets. Excluding the two jets with the highest b-tagging weight in the event, ε MJ is defined as the rate of b-tagging any other jet in the event. It is parameterised as a function of the jet p T and η, and also of the average ∆R between this jet and the two jets in the event with highest b-tagging weight, ∆R ( j,hMV1) . The p T and η dependence of ε MJ reflects the corresponding sensitivity of the b-tagging efficiency to these variables. In multijet events, the ∆R dependence of ε MJ is correlated with the multi-b-jet production mechanism. This affects ε MJ , shown in Figure 1, which decreases by up to a factor two as ∆R increases for fixed p T and η.  In the search region the TRF MJ method starts from the data sample with exactly two b-tagged jets subtracting the contributions from all other backgrounds obtained from MC simulation. Multijet background samples containing m jets (m ≥ 6), out of which n are b-tagged (n ≥ 3) are then constructed, using an event weight w(mj, nb), which is calculated from ε MJ analogously to the method described in Ref. [69], accounting for the fact that the starting sample contains two b-tagged jets. In each multijet event emulated using TRF MJ by means of ε MJ , (m − 2) jets not originally b-tagged can be used for the emulation of the properties of additional b-tagged jets. This procedure allows to emulate observables that depend on the number of b-tagged jets.

Validation of the TRF MJ method in data and simulation
Validation of the TRF MJ method is performed by a 'closure test', separately in data and simulation. This is performed using the same data samples that were employed to estimate ε MJ . In these low jet multiplicity samples, the TRF MJ method, which is applied to the events with exactly two b-tagged jets, is used to predict distributions in events with at least three b-tagged jets. Using ε MJ derived independently in data and simulation, the predicted distributions are compared to those resulting when directly applying b-tagging. This is done for a number of variables, such as b-tagged jet p T , angular distance between btagged jets, and event shapes. As an example, for events with at least three jets and at least three b-tagged jets (≥3j, ≥3b), Figure 2 shows the closure test in data for the third-leading-jet p T , H T (the scalar sum of the p T of all jets), and Centrality Mass (defined as H T divided by the invariant mass of the jets). Figure 3 shows the results of the closure test in simulated multijet events for distributions of the leading-jet p T , the minimum mass of all jet pairs in the event (m min j j ), and the third-leading b-tagged jet p T . The definitions of these variables can be found in Table 3. In both data and simulated multijet events with at least three b-tagged jets, the predicted and observed number of events agree within 5%. In events with a higher b-tagged jet multiplicity the numbers agree within the large statistical uncertainty. For this reason the systematic uncertainties related to the TRF MJ method are not estimated in the validation regions.  Table 3. Events were selected with various single-jet triggers. The TRF MJ prediction is normalised to the same number of events as the data. The uncertainty band for the TRF MJ predictions shown in the ratio plot represents statistical uncertainties only.  Table 3. Distributions are normalised to the same area. The uncertainty band for the TRF MJ predictions shown in the ratio plot represents statistical uncertainties only.

Multijet trigger efficiency
Not all jets are reconstructed at the trigger level, mainly due to the Level-1 sliding window algorithm and the Level-1 resolution [70]. The multijet trigger efficiency with respect to the offline selection is derived in terms of the efficiency for a single jet to be associated with a complete jet trigger chain, i.e., a complete sequence of jets reconstructed at Level-1, Level-2 and EF satisfying the requirements described in Section 4. This single-jet trigger efficiency, trig , is evaluated in intervals of offline reconstructed p T and η: where N trig (p T , η) is the number of jets matched with a trigger chain and N(p T , η) is the total number of jets within a given offline reconstructed p T and η interval. Figure 4 shows that for large jet p T , trig reaches a plateau close to unity.
For both data and simulation, trig (p T , η) is derived using events triggered by a single-jet trigger with a p T threshold of 110 GeV, and only the offline jets which are in the hemisphere opposite to the trigger jet are used. To avoid additional trigger bias, events are discarded if more than one jet with p T ≥ 110 GeV is reconstructed. The ratio of data trig (p T , η) to MC,dijet trig , where the latter is estimated in simulated dijet events, is referred to as SF trig (p T , η). In the analysis, for each MC sample α considered, the final number of events passing the multijet trigger is estimated by weighting each jet by the product of MC,α trig (p T , η) and SF trig (p T , η). The parameters trig (p T , η) and SF trig (p T , η) are estimated for jet p T up to 100 GeV. Figure 4 shows the p T dependence of data trig (p T , η), MC,ttH trig (p T , η), MC,dijet trig (p T , η) and SF trig (p T , η) for jets within |η| < 2.5, together with the uncertainties from the difference between MC,ttH trig (p T , η) and MC,dijet trig (p T , η), which is taken as the systematic uncertainty of the method.  Figure 4: Single-jet trigger efficiencies, trig , (top) for data, simulated dijet events, and ttH events, as a function of jet p T for jets with |η| < 2.5; (bottom) SF trig (p T , η) = data trig (p T , η)/ MC,dijet trig (p T , η). The uncertainty on SF trig , shown as the green shaded area, is estimated from the difference between the efficiencies in dijet and ttH simulated events in the denominator of SF trig .
The regions are analysed separately and combined statistically to maximise the overall sensitivity. The most sensitive regions, (≥8j, 3b) and (≥8j, ≥4b), are expected to contribute more than 50% of the total significance. Table 2: Event yields from simulated backgrounds and the signal as well as data in each of the analysis regions prior to the fit (pre-fit). The quoted uncertainties are the sum in quadrature of the statistical and systematic uncertainties in the yields for all samples but the multijet background. The multijet normalisation and its systematic uncertainty are determined by the fit, so only its statistical uncertainty is quoted here. Since the numbers are rounded, the sum of all contributions may not equal the total value. The signal-to-background ratio, S /B, and the significance, S / √ B, are also given. The tH background is not shown as it amounts to fewer than 1.5 events in each region.

Analysis method
The Toolkit for Multivariate Data Analysis (TMVA) [71] is used to train a BDT to separate the ttH signal from the background. A dedicated BDT is defined and optimised in each of the six analysis regions. The variables entering the BDT and their definitions are listed in Table 3.  The input variables include event-shape variables such as Centrality Mass and aplanarity, global event variables, such as S T (the modulus of the vector sum of the jet p T ), H T 5 (the scalar sum of the jet p T starting from the fifth jet in p T order), m min j j (the smallest invariant mass of all dijet combinations), and the minimum ∆R between jets. The p T of the softest jet in the event is the only individual kinematic variable that enters the BDT directly. Other variables are calculated from pairs of objects: ∆R(b, b) p max T (the ∆R between the two b-tagged jets with highest vector sum p T ), m ∆R(b,b) min bb (the invariant mass of the two b-tagged jets with the smallest ∆R), (E T 1 + E T 2 )/ E jets T (the sum of the transverse energies of the two leading jets divided by the sum of the transverse energies of all jets), m 2 jets (the mass of the dijet pair, which, when combined with any b-tagged jet, maximises the magnitude of the vector sum of the p T of the three-jet system) and m 2 b-jets (the invariant mass of the two b-tagged jets which are selected by requiring that the invariant mass of all the remaining jets is maximal). Two variables are calculated as the invariant mass of three jets: m top,1 is computed from the three jets whose invariant mass is nearest to the top quark mass, taking into account the jet energy resolutions; the m top,2 calculation uses the same algorithm but excludes the jets which enter m top,1 . Finally, a log-likelihood ratio variable, Λ, is used; it is related to the probability of an event to be a signal candidate, compared to the probability of being a background candidate.
The Λ variable is the sum of the logarithms of ratios of relative probability densities for W boson, top quark and Higgs boson resonances to be reconstructed in the event. For a given resonance X decaying to two jets, the Λ component is built as Λ X (m j j ) = ln P sig (m j j ) P bkg (m j j ) within a mass window w X = ±30 GeV around the given particle mass: Here s and b are the probabilities to find a jet pair with an invariant mass within ±w X of m X . They are calculated from the signal simulation and from the multijet background respectively. The signal mass distribution is modelled with a Gaussian G(m j j |m X , σ X ), while the background is modelled with a uniform distribution Rect(m X , w X ) between m X − w X and m X + w X . Both functions P sig (m j j ) and P bkg (m j j ) are normalised to unity. For the top quark resonance the three-particle mass, m j jb , is used. The width of the Gaussian is set to σ X = 18 GeV for all resonances; this value corresponds to the expected experimental width of a Higgs boson with no combinatoric background.
The expression for the complete event Λ is: The three terms refer to W, top, and Higgs resonances respectively. For the top quark and Higgs boson resonances the masses, m j jb and m bb , as well as the p T , defined as the magnitude of the vector sum of the p T of the jets used to reconstruct the top quark, p T, j jb , and to reconstruct the Higgs boson, p T,bb , are used. The value of Λ is calculated for all possible jet combinations and the maximum Λ of the event is chosen. The variables entering the BDT are selected and ranked according to their separation power with an iterative procedure, which stops when adding more variables does not significantly improve the separation between signal and background. The cut-off corresponds to the point when adding a variable increases the significance, defined as i S 2 i /B 2 i where S i and B i are the expected signal and background yields in the i th bin of the BDT discriminant, by less than 1%. Signal and background samples are classified as described in Section 7, and then each subsample is further subdivided randomly into two subsamples of equal size for training and for testing.
The ranking of the input variables in terms of separation power for each analysis region is shown in Table 3. The distributions of the BDT outputs for simulated signal and background events are shown in Figure 5 for each analysis region. The Figure shows a better separation between signal and background for low jet multiplicities than for high jet multiplicities. This is explained by the number of possible jet permutations. The number of jet permutations increases giving the background more configurations to mimic the signal.

Systematic uncertainties
The sources of systematic uncertainty considered in this analysis can be grouped into six main categories as summarised in Table 4. Each systematic uncertainty is represented by an independent parameter, referred to as a nuisance parameter, and is parameterised with a Gaussian function for the shape uncertainties and a log-normal distribution for the normalisations [72]. They are centred around zero and one, respectively, with a width that corresponds to the given uncertainty. The uncertainties in the integrated luminosity, reconstruction of the physics objects, and the signal and background MC models are treated as in Ref. [16]. The uncertainties related to the jet trigger as well as those related to the data-driven method to estimate the multijet background are discussed below. In total, 99 fit parameters are considered. The determination and treatment of the systematic uncertainties are detailed in this section. Their impact on the fitted signal strength is summarised in Table 8 in Section 11. The systematic uncertainty in the luminosity for the data sample is 2.8%. It is derived following the same methodology as that detailed in Ref. [73]. The trigger uncertainty is determined from the difference between trig , estimated using ttH and dijet MC events. Each jet in the event is weighted according to SF trig (p T , η), the uncertainty of which is propagated to the shape and normalisation of the BDT output distribution, as shown in Figure 6(a).
The uncertainties in physics objects are related to the reconstruction and b-tagging of jets. The jet energy Table 4: Sources of systematic uncertainty considered in the analysis grouped in six categories. "N" denotes uncertainties affecting only the normalisation for the relevant processes and channels, whereas "S" denotes uncertainties which are considered to affect only the shape of normalised distributions. "SN" denotes uncertainties affecting both shape and normalisation. Some sources of systematic uncertainty are split into several components. The number of components is also reported. resolution (JER) and the jet energy scale (JES) uncertainties are derived combining the information from test-beam data and simulation [25]. The JES uncertainties are split into 21 uncorrelated components. The largest of these uncertainties is due to the jet flavour composition. The JVF uncertainty is derived from Z(→ + − )+ 1-jet events in data and simulation by varying the nominal cut value by 0.1 up and down.
The uncertainty related to the b-tagging is modelled with six independent parameters, while four parameters model the c-tagging uncertainty [26]. These are eigenvalues obtained by diagonalising the matrix which parameterises the tagging efficiency as a function of p T , taking into account bin-to-bin correl-ations. Twelve parameters, which depend on p T and η, are used to parameterise the light-jet-tagging systematic uncertainties [74]. The per-jet b-tagging uncertainties are 3%-5%, about 10% for c-tagging and 20% for light jet tagging. An additional uncertainty is assigned to the b-tagging efficiency for jets with p T > 300 GeV, which lacks statistics for an accurate calibration from data.
A combined uncertainty of ±6.0% is assigned to the tt+jets production cross section, including modelling components due to the value of α s , the PDF used, the process energy scale, and the top quark mass. Other systematic uncertainties related to tt+jets production are due to the modelling of parton showers and hadronisation.
The systematic uncertainties arising from the reweighting procedure to improve tt background description by simulation (Section 5.2), have been extensively studied in Ref.
[16] and adopted in this analysis. The largest uncertainties in the tt background description arise from radiation modelling, the choice of generator to simulate tt production, the JES, JER, and flavour modelling. These systematic uncertainties are applied to the tt+light and tt + cc components. Two additional systematic uncertainties, the full difference between applying and not applying the reweightings of the tt system p T and top quark p T , are assigned to the tt + cc component.
Four additional systematic uncertainties in the tt + cc estimate are derived from the simultaneous variation of factorisation and renormalisation scales in Madgraph+Pythia. For the tt + bb background, three scale uncertainties are evaluated by varying the renormalisation and resummation scales. The shower recoil model uncertainty and two uncertainties due to the PDF choice in the sherpa+OpenLoops NLO calculation are also taken into account.
The tt+jets background is parameterised to allow a varying percentage of heavy flavours c and b in the additional jets not originating from the top quark decay products. An uncertainty of ±50% is assigned to the tt + bb and tt + cc components of the tt+jets cross section, which are treated as uncorrelated and are derived by comparing Powheg+Pythia with a NLO result based on sherpa+OpenLoops. The uncertainty in the tt + bb contribution represents the dominant systematic effect in this analysis. An uncertainty of ±30% in the total cross section is assumed for tt + V [58,59].
The multijet background is estimated using data in regions with exactly two b-tagged jets after subtraction of contributions from other events using MC simulation. All systematic uncertainties mentioned above are fully propagated to the data-driven multijet background estimation and treated in a correlated manner.
To estimate the uncertainties associated with the multijet background, the values of ε MJ are determined as a function of different sets of variables, listed in the first part of Table 5, which are sensitive to the amount and the mechanism of heavy-flavour production. Alternative variables used are ∆R min ( j, j) , the minimum ∆R between the probed jet and any other jet in the event, ∆R min ( j,hMV1) , the minimum ∆R between the probed jet and the two jets with highest b-tag probability or ∆R ( j,hMV1) , its average value, and ∆R MV1 , the ∆R between the two jets with the highest b-tag probability. In addition, different choices of methods to exclude b-tagged jets when determining ε MJ in the TRF MJ method are considered: the two b-tagged jets with the lowest MV1 weight or a random choice of two jets among all b-tagged jets in the event are chosen. The different sets of variables used to define ε MJ affect the shape of the BDT distribution for the multijet background, as shown in Figure 6(b). Each of these shape variations is taken into account by a nuisance parameter in the fit. These parameterisations also affect the overall normalisation, with a maximum variation of 18% in the 3-b-tag regions and 38% in the ≥4-b-tag regions. Residual mismodelling of H T and S T from the extraction region are also taken into account as systematic uncertainties. The normalisation of the multijet background is evaluated independently in each of the six analysis regions. Table 5: Alternative predictions of the multijet background with the TRF MJ method. Multijet sets 1 to 5 correspond to variations of the nominal set of variables describing ε MJ . The next two sets specify the variation in the nominal set based on the two b-tagged jets which are used to compute ε MJ . The last two refer to changes due to the residual mismodellings of H T and S T . Each of these variations of the multijet background shape is quantified by one nuisance parameter in the fit.
TRF MJ  For the signal MC modelling, the PowHel factorisation and renormalisation scales are varied independently by a factor two and 0.5. The kinematics of the MC simulated samples are then reweighted to reproduce the effects of these variations. The uncertainties related to the choice of PDFs are evaluated using the recommendations of PDF4LHC [75]. The systematic uncertainties from the parton shower and fragmentation models are evaluated using PowHel+Herwig samples. The uncertainty due to the choice of generator is evaluated by comparing PowHel+Pythia8 with Madgraph5_aMC@NLO+Herwig++.

Statistical methods
The binned distributions of the BDT output discriminants for each of the six analysis regions are combined as inputs to a test statistic to search for the presence of a signal. The analysis uses a maximum-likelihood fit [72] to measure the compatibility of the observed data with the background-only hypothesis, i.e., µ = 0, and to make statistical inferences about µ, such as upper limits, using the CL s method [76,77] as implemented in the RooFit package [78].

Results
The yields in the different analysis regions considered in the analysis after the fit (post-fit) are summarised in Table 6. In each region, the variation of background and signal events with respect to the pre-fit values (cf. Table 2) are modest and, in particular, the fitted multijet background component is well constrained by the fit within an uncertainty of 8%.  Figures 7 and 8 show the BDT output distributions for data and the predictions in each analysis region, both before (left panels) and after (right panels) the fit to data. The relative uncertainties decrease significantly in all regions due to the constraints provided by the data, exploiting the correlations between the uncertainties in the different analysis regions.
The signal strength in the all-hadronic ttH decay mode, for m H = 125 GeV, is measured to be: µ(m H = 125 GeV) = 1.6 ± 2.6.
The expected uncertainty in the signal strength (µ = 1) is ±2.8. The observed (expected) significance of the signal is 0.6 (0.4) standard deviations, corresponding to an observed (expected) p-value of 27% (34%), where the p-value is the probability to obtain a result at least as signal-like as observed if no signal were present.
The observed and expected limits are summarised in Table 7. A ttH signal 6.4 times larger than predicted by the SM is excluded at 95% CL. A signal 5.4 times larger than the signal of a SM Higgs boson is expected to be excluded for the background-only hypothesis. Figure 9 summarises the post-fit event yields for data, total background and signal expectations as a function of log 10 (S /B). The signal is normalised to the fitted value of the signal strength (µ = 1.6). A signal strength 6.4 times larger than predicted by the SM is also shown in Figure 9. Figure 10 shows the effect of the major systematic uncertainties on the fitted value of µ and the constraints provided by the data. The ranking, from top to bottom, is determined by the post-fit impact on µ. This effect is calculated by fixing the corresponding nuisance parameter atθ ± σ θ and performing the fit again. Hereθ is the fitted value of the nuisance parameter and σ θ is its post-fit uncertainty. The difference between the default and the modified µ, ∆µ, represents the effect on µ of this particular systematic uncertainty. This is also shown in Table 8. The largest effect arises from the uncertainty in the normalisation of the irreducible tt + bb background. The tt + bb background normalisation is smaller by 30% in the fit than the prediction, resulting in a decrease of the observed tt + bb yield with respect to the Powheg+Pythia prediction. The second largest effect comes from the multijet background normalisation. The data-driven method focuses on modelling the shape of the multijet background while the normalisation is constrained by the regions dominated by multijet background. The uncertainty in the normalisation parameters amounts to few percent and the values from each region are consistent with the variations applied to these parameters to account for systematic uncertainties. Two of the multijet background shape uncertainties are ranked fourth and fifth, and their pulls are slightly positive.
Other important uncertainties include b-tagging and JES. Uncertainties arising from jet energy resolution, jet vertex fraction, jet reconstruction and JES that affect primarily low-p T jets, as well as the tt+light-jet background modelling uncertainties, do not have a significant impact on the result.  Figure 7: Comparison between data and prediction for the BDT discriminant in the, from top to bottom, (6-8j, 3b) regions before (left) and after (right) the fit. The fit is performed under the signal-plus-background hypothesis. Pre-fit plots show an overlay of the multijet distribution normalised to data for illustration purposes only. The bottom panels display the ratios of data to the total prediction. The hashed areas represent the total uncertainty in the background predictions. The ttH signal yield (solid red) is scaled by a fixed factor before the fit.  Figure 8: Comparison between data and prediction for the BDT discriminant in the, from top to bottom, (6-8j, ≥4b) regions before (left) and after (right) the fit. The fit is performed under the signal-plus-background hypothesis. Pre-fit plots show an overlay of the multijet distribution normalised to data for illustration purposes only. The bottom panels display the ratios of data to the total prediction. The hashed areas represent the total uncertainty in the background predictions. The ttH signal yield (solid red) is scaled by a fixed factor before the fit.  Figure 9: Event yields as a function of log 10 (S /B), where S (expected signal yield) and B (expected background yield) are taken from the corresponding BDT discriminant bin. Events from all fitted regions are included. The predicted background is obtained from the global signal-plus-background fit. The ttH signal is shown both for the best-fit value (µ = 1.6) and for the upper limit at 95% CL (µ = 6.4).  The sensitivity of the search for ttH production can be increased by statistically combining different Higgs boson decay channels. This combination is described in the following.

Individual tt H measurements and results
The ttH searches that are combined are: • ttH(H → bb) in the single-lepton and opposite-charge dilepton tt decay channels using data at √ s = 8 TeV [16], • ttH(H → bb) in the all-hadronic tt decay channel using data at √ s = 8 TeV as presented in this paper, • ttH(H → (WW ( * ) , ττ, ZZ ( * ) ) → leptons) with two same-charge leptons (e or µ), three leptons, four leptons, two hadronically decaying τ leptons plus one lepton and one hadronically decaying τ lepton plus two leptons in the final state using data at √ s = 8 TeV [15], • ttH(H → γγ) at √ s = 7 and 8 TeV in both the hadronic and leptonic (e or µ) tt pair decay channels [17].
First all H → bb final states are combined, obtaining a signal strength for the ttH(H → bb) combination, and then the outcome is combined with the remaining (non-H → bb) channels.

H → bb (single lepton and dilepton tt decays)
The search for ttH production with H → bb is performed in both the single-lepton and dilepton tt decay modes [16]. The single-lepton analysis requires one charged lepton with at least four jets, of which at least two need to be b-tagged, while the dilepton analysis requires two opposite-charge leptons with at least two jets, of which at least two must be b-tagged. The events are then categorised according to the jet and b-tagged jet multiplicity. The dominant background in the signal-enriched regions is from tt + bb events. In these regions, neural networks [79] are built using kinematic information in order to separate the ttH signal from tt background. Furthermore, in the single-lepton channel, a matrix-element discriminant is built in the most signal-enriched regions and is used as an input to the neural network.
12.1.2 H → (WW ( * ) , ττ, Z Z ( * ) ) → leptons The ttH search with H → (WW ( * ) , ττ, ZZ ( * ) ) → leptons [15] exploits several multilepton signatures resulting from Higgs boson decays to vector bosons and/or τ leptons. Events are categorised based on the number of charged leptons and/or hadronically decaying τ leptons in the final state. The categorisation includes events with two same-charge leptons, three leptons, four leptons, one lepton and two hadronic τ leptons, as well as two same-charge leptons with one hadronically decaying τ lepton. Backgrounds include events with electron charge misidentification, which are estimated using data-driven techniques, non-prompt leptons arising from semileptonic b-hadron decays, mostly from tt events, again estimated from data-driven techniques, and production of tt + W and tt + Z, which are estimated using MC simulations. Signal and background event yields are obtained from a simultaneous fit to all channels.

H → γγ
The ttH search in the H → γγ channel [17] exploits the sharp peak in the diphoton mass distribution from the H → γγ decay over the continuum background. The analysis is split according to the decay mode of the tt pair. A leptonic selection requires at least one lepton and at least one b-tagged jet, and missing transverse momentum if there is only one b-tagged jet, whereas a hadronic selection requires a combination of jets and b-tagged jets. Contributions from peaking non-ttH Higgs boson production modes are estimated from MC simulations. The signal is extracted with a fit using the diphoton mass distribution as a discriminant.

Correlations
Nuisance parameters corresponding to the same source of uncertainty in different analyses are generally considered to be correlated with each other, except for the following sets: • Nuisance parameters related to b-tagging (also c-tagging and light mis-tagging) are considered to be independent among the analyses as different b-tagging working points are employed.
• The electron identification uncertainty is considered to be uncorrelated between analyses due to different selections used.

Signal strength
The result of the ttH(H → bb) combination for the signal strength is µ = 1.4 ± 1.0. The observed signal strengths for the individual ttH(H → bb) channels and for their combination are summarised in Figure 11. The tt + bb normalisation nuisance parameters obtained in the all-hadronic analysis (−0.6 ± 0.8) and the leptonic analysis (+0.8 ± 0.4) The expected significance increases from 1.0σ for the leptonic final state of ttH(H → bb) to 1.1σ for the combined ttH(H → bb). Because the combined ttH(H → bb) best-fit value of µ is lower than the leptonic-only value, the observed significance for the ttH(H → bb) combination is reduced from 1.4σ (leptonic [16]) to 1.35σ (combined). The combination of all ttH analyses yields an observed (expected) 95% CL upper limit of 3.1 (1.4) times the SM cross section. The observed 95% CL limits for the individual ttH channels and for the combination are shown in Figure 13 and in Table 9.
The result for the best-fit value is µ = 1.7 ± 0.8.

Couplings
Sensitivity to t − H and W − H couplings stems from several sources: from the ttH production itself, from the Higgs boson decay branching fractions, from associated single top and Higgs boson production =125 GeV H for m µ 95% CL limit on 0 2 4 6 8 10 12 (   Figure 13: Upper limits on the signal strength µ for the individual channels as well as for their combination, at 95% CL. The observed limits (solid lines) are compared to the expected median limits under the background-only hypothesis (black dashed lines) and under the signal-plus-background hypothesis assuming the SM prediction for σ(ttH) (red dotted lines). The surrounding green and yellow bands bands correspond to the ±1σ and ±2σ ranges around the expected limits under the background-only hypothesis. Table 9: Observed and expected (median, for the background-only hypothesis) upper limits at 95% CL on σ(ttH) relative to the SM prediction, for the individual channels as well as for their combination. The ±1σ and ±2σ ranges around the expected limit are also given. The expected median upper limits at 95% CL assuming the SM prediction for σ(ttH) are shown in the last column. The parameterisation of the couplings for the ttH and tH production modes and for the different Higgs boson decay modes is taken from Refs. [7,80]. Figure 14 shows the log-likelihood contours of κ F versus κ V for the combined ttH fit. The combination of all analysis channels slightly prefers positive κ F . Additional studies, performed to determine the contribution of the individual analyses to the combined coupling measurement, indicate that the ttH, H → (WW ( * ) , ττ, ZZ ( * ) ) → leptons analysis prefers somewhat enhanced W − H coupling, which can only be compatible with the ttH(H → γγ) rate if the interference between ttH and WWH amplitudes is destructive, as expected in the SM.

Conclusion
A search for the SM Higgs boson produced in association with a pair of top quarks (ttH) has been carried out with the ATLAS detector at the Large Hadron Collider. The search focuses on H → bb decays with tt pairs decaying hadronically. The data used correspond to an integrated luminosity of 20.3 fb −1 of pp collisions at √ s = 8 TeV. The analysis is carried out in six different jet and b-tagged jet multiplicity regions. Discrimination between signal and background is obtained by employing a boosted decision tree multivariate classifier in all regions. No significant excess of events above the background expectation is found for the SM Higgs boson with a mass of 125 GeV. An observed (expected) 95% CL upper limit of 6.4 (5.4) times the SM cross section is obtained. By performing a fit under the signal-plus-background hypothesis, the ratio of the measured signal strength to the SM expectation is found to be µ = 1.6 ± 2.6.
The statistical combination of all ttH analyses performed at √ s = 7 TeV and 8 TeV yields an observed (expected) upper limit of 3.1 (1.4) times the SM cross section at 95% CL. The combined measured signal strength is found to be µ = 1.7 ± 0.8.