Search for the $b\bar{b}$ decay of the Standard Model Higgs boson in associated $(W/Z)H$ production with the ATLAS detector

A search for the $b\bar{b}$ decay of the Standard Model Higgs boson is performed with the ATLAS experiment using the full dataset recorded at the LHC in Run 1. The integrated luminosities used from $pp$ collisions at $\sqrt{s}=7$ and 8 TeV are 4.7 and 20.3 fb$^{-1}$, respectively. The processes considered are associated $(W/Z)H$ production, where $W\to e\nu/\mu\nu$, $Z\to ee/\mu\mu$ and $Z\to\nu\nu$. The observed (expected) deviation from the background-only hypothesis corresponds to a significance of 1.4 (2.6) standard deviations and the ratio of the measured signal yield to the Standard Model expectation is found to be $\mu = 0.52 \pm 0.32 \mathrm{(stat.)} \pm 0.24 \mathrm{(syst.)}$ for a Higgs boson mass of 125.36 GeV. The analysis procedure is validated by a measurement of the yield of $(W/Z)Z$ production with $Z\to b\bar{b}$ in the same final states as for the Higgs boson search, from which the ratio of the observed signal yield to the Standard Model expectation is found to be $0.74 \pm 0.09 \mathrm{(stat.)} \pm 0.14 \mathrm{(syst.)}$.


Introduction
For decades, the Higgs boson [1][2][3][4] of the Standard Model (SM) remained an unconfirmed prediction. In July 2012, the ATLAS and CMS experiments at the LHC reported the observation of a new particle with a mass of about 125 GeV and with properties consistent with those expected for the SM Higgs boson [5,6]. Since then, more precise measurements have strengthened the hypothesis that the new particle is indeed a Higgs boson [7][8][9]. These measurements, however, have mainly been performed in the bosonic decay modes of the new particle ($H\to\gamma\gamma$, $H\to ZZ$ and $H\to WW$). It is also essential to verify whether it decays into fermions as predicted by the SM.
Recently, the CMS Collaboration reported evidence for the $\tau\tau$ decay mode of the Higgs boson at a significance of 3.4 standard deviations ($\sigma$) for $m_H = 125$ GeV [10].
The $H\to b\bar{b}$ decay mode is predicted in the SM to have a branching ratio of 58% for $m_H = 125$ GeV [11]. Accessing $H\to b\bar{b}$ decays is therefore crucial for constraining, under fairly general assumptions [12], the overall Higgs boson decay width and, in a global fit to all accessible combinations of Higgs boson production and decay modes, for allowing measurements of absolute Higgs boson couplings. An inclusive search for $H\to b\bar{b}$ is not feasible at hadron colliders because of the overwhelming background from multijet production. Despite a cross section more than an order of magnitude lower than that of the dominant gluon-fusion process, associated production of a Higgs boson with a vector boson, $W$ or $Z$ [13], offers a viable alternative because the leptonic decays of the vector boson, $W\to\ell\nu$, $Z\to\ell\ell$ ($\ell = e,\mu$) and $Z\to\nu\nu$, can be used efficiently for triggering and background reduction [14,15]. The CDF and D0 experiments at the Tevatron reported an excess of events in their search for associated $(W/Z)H$ production in the $H\to b\bar{b}$ decay mode at a significance of $2.8\sigma$ for $m_H = 125$ GeV [16]. Recently, the CMS experiment reported an excess of events in the $H\to b\bar{b}$ decay mode with a significance of $2.1\sigma$ for $m_H = 125$ GeV [17].
In this paper, a search for associated $(W/Z)H$ production of the SM Higgs boson in the $b\bar{b}$ decay mode is presented, using the full integrated luminosity accumulated by ATLAS during Run 1 of the LHC: 4.7 and 20.3 fb$^{-1}$ from proton-proton ($pp$) collisions at centre-of-mass energies of 7 and 8 TeV in 2011 and 2012, respectively. An analysis of the 7 TeV dataset has already been published by ATLAS [18]. In addition to the larger amount of data analysed, the update presented in this paper benefits from numerous analysis improvements. Some of the improvements to the object reconstruction, however, are available only for the 8 TeV dataset, which leads to separate analysis strategies for the two datasets.
The analysis is performed for events containing zero, one or two charged leptons (electrons or muons), targeting the $Z\to\nu\nu$, $W\to\ell\nu$ and $Z\to\ell\ell$ decay modes of the vector boson, respectively. In addition to $Z\to\nu\nu$ decays, the 0-lepton channel receives a smaller but non-negligible contribution from leptonic $W$ decays in which the lepton is produced outside the detector acceptance or is not identified. A b-tagging algorithm is used to identify jets consistent with having originated from an $H\to b\bar{b}$ decay. To improve the sensitivity, each of the three channels is split according to the vector-boson transverse momentum, the number of jets (two or three) and the number of b-tagged jets. Topological and kinematic selection criteria are applied within each of the resulting categories.
A binned maximum-likelihood fit is used to extract the signal yield and the background normalisations. Systematic uncertainties on the signal and background modelling are implemented as deviations in their respective models in the form of "nuisance" parameters that are varied in the fit. Each nuisance parameter is constrained by a penalty term in the likelihood, associated with its uncertainty. Two versions of the analysis are presented in this paper: in the first, referred to as the dijet-mass analysis, the mass of the dijet system of b-tagged jets is the final discriminating variable used in the statistical analysis; in the second, a multivariate analysis (MVA) incorporating various kinematic variables in addition to the dijet mass, as well as b-tagging information, provides the final discriminating variable. Because the b-tagging information is not available in similar detail for the 7 TeV dataset, the MVA is used only for the 8 TeV dataset. In both analyses, dedicated control samples, typically with loosened b-tagging requirements, constrain the contributions of the dominant background processes. The most significant background sources are $(W/Z)$+heavy-flavour-jet production and $t\bar{t}$ production, whose normalisations are fully determined by the likelihood fit. Other significant background sources are single-top-quark and diboson ($WZ$ and $ZZ$) production, with normalisations taken from theory, as well as multijet events, normalised using multijet-enriched control samples. Since the MVA has higher expected sensitivity, it is chosen as the nominal analysis for the 8 TeV dataset to extract the final results. To validate the analysis procedures, both for the dijet-mass and MVA approaches, a measurement of the yield of $(W/Z)Z$ production is performed in the same final states and with the same event selection, with $H\to b\bar{b}$ replaced by $Z\to b\bar{b}$.
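To make the role of a nuisance parameter and its penalty term concrete, the toy sketch below writes a binned Poisson likelihood with a single nuisance parameter $\theta$ that scales the background by $(1 + r\,\theta)$ for an assumed relative uncertainty $r$, constrained by a unit-Gaussian penalty. It is not the likelihood used in this analysis: the function names, the one-parameter model and the bin contents are illustrative assumptions, and $\theta$ is profiled only on a crude grid.

```python
import math

def nll(mu, theta, data, signal, background, rel_unc):
    """-ln L for a binned Poisson likelihood with one Gaussian-constrained nuisance
    parameter theta that scales the background by (1 + rel_unc * theta)."""
    total = 0.0
    for n, s, b in zip(data, signal, background):
        nu = mu * s + b * (1.0 + rel_unc * theta)   # expected yield in this bin
        total += nu - n * math.log(nu)              # Poisson term, constant ln(n!) dropped
    return total + 0.5 * theta ** 2                 # penalty term for the nuisance parameter

def profile_scan(data, signal, background, rel_unc, mus, thetas):
    """For each signal strength mu, minimise -ln L over a grid of theta values."""
    return [min(nll(mu, t, data, signal, background, rel_unc) for t in thetas)
            for mu in mus]

# Toy inputs (invented): data equal to the expectation for mu = 1, theta = 0.
signal, background = [5.0, 10.0, 5.0], [100.0, 50.0, 20.0]
data = [105, 60, 25]
mus = [0.0, 0.5, 1.0, 1.5, 2.0]
thetas = [i / 10 for i in range(-20, 21)]
scan = profile_scan(data, signal, background, 0.1, mus, thetas)
```

Because the penalty grows quadratically with $\theta$, the fit can absorb part of a signal-like excess into the background only at a cost, which is what "constrained by a penalty term" means in practice.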
This paper is organised as follows. A brief description of the ATLAS detector is given in section 2. Details of the data and simulated samples used in this analysis are provided in section 3. This is followed by sections describing the dijet-mass and multivariate analyses applied to the 8 TeV data. The reconstruction of physics objects such as leptons and jets is addressed in section 4. Section 5 details the event selections applied in the dijet-mass and multivariate analyses, while section 6 explains the construction of the final discriminating variable of the MVA. Section 7 discusses the background composition in the various analysis regions, while the systematic uncertainties are addressed in section 8. The statistical procedure used to extract the results is described in section 9. For the 7 TeV data, only a dijet-mass analysis is used, and differences with respect to the 8 TeV data analysis are specified in section 10. The results are presented and discussed in section 11, and a summary of the paper is given in section 12.

The ATLAS detector
The ATLAS detector [19] is cylindrically symmetric around the beam axis and is structured in a barrel and two endcaps. It consists of three main subsystems. The inner tracking detector is immersed in the 2 T axial magnetic field produced by a superconducting solenoid. Charged-particle position and momentum measurements are made by pixel detectors followed by silicon-strip detectors in the pseudorapidity¹ range $|\eta| < 2.5$ and by a straw-tube transition-radiation tracker (TRT) in the range $|\eta| < 2.0$. The pixel detectors are crucial for b-tagging, and the TRT also contributes to electron identification. The calorimeters, located beyond the solenoid, cover the range $|\eta| < 4.9$ with a variety of detector technologies. The liquid-argon electromagnetic calorimeters are divided into barrel ($|\eta| < 1.475$), endcap ($1.375 < |\eta| < 3.2$) and forward ($3.1 < |\eta| < 4.9$) sections. The hadronic calorimeters (using scintillator tiles or liquid argon as active materials) surround the electromagnetic calorimeters with a coverage of $|\eta| < 4.9$. The muon spectrometer measures the deflection of muon tracks in the field of three large air-core toroidal magnets, each containing eight superconducting coils. It is instrumented with separate trigger and high-precision tracking chambers covering the ranges $|\eta| < 2.4$ and $|\eta| < 2.7$, respectively.
The trigger system is organised in three levels. The first level is based on custom-made hardware and uses coarse-granularity calorimeter and muon information. The second and third levels are implemented as software algorithms and use the full detector granularity. At the second level, only regions deemed interesting at the first level are analysed, while the third level, called the event filter, makes use of the full detector read-out to reconstruct and select events, which are then logged for offline analysis at a rate of up to 400 Hz averaged over an accelerator fill.

Data and simulated samples
The datasets used in this analysis include only $pp$ collision data recorded in stable beam conditions and with all relevant sub-detectors providing high-quality data. The corresponding integrated luminosities are 4.7 and 20.3 fb$^{-1}$ [20] for the 7 and 8 TeV data, respectively.
Events in the 0-lepton channel are selected by triggers based on the magnitude $E_{\mathrm{T}}^{\mathrm{miss}}$ of the missing transverse momentum vector. The $E_{\mathrm{T}}^{\mathrm{miss}}$ trigger configuration evolved during data taking to cope with the increasing luminosity, and the trigger efficiency was improved for the 8 TeV data. The dependence of the $E_{\mathrm{T}}^{\mathrm{miss}}$ trigger efficiency on the offline-reconstructed $E_{\mathrm{T}}^{\mathrm{miss}}$ is measured in $W\to\mu\nu$+jets and $Z\to\mu\mu$+jets events collected with single-muon triggers, with the offline $E_{\mathrm{T}}^{\mathrm{miss}}$ calculated without the muon contribution. As there was a brief period of data-taking in which the $E_{\mathrm{T}}^{\mathrm{miss}}$ triggers were not available for the first bunch crossings of two bunch trains, the integrated luminosity for the 0-lepton channel in the 7 TeV dataset is reduced to 4.6 fb$^{-1}$. Events in the 1-lepton channel are primarily selected by single-lepton triggers. The $E_{\mathrm{T}}$ threshold of the single-electron trigger was raised from 20 to 22 GeV during the 7 TeV data-taking period, and to 24 GeV for the 8 TeV data. The $p_{\mathrm{T}}$ threshold of the single-muon trigger was similarly increased from 18 GeV for the 7 TeV data to 24 GeV at 8 TeV. As the single-lepton triggers for the 8 TeV data include isolation criteria, triggers with higher thresholds (60 GeV for electrons and 36 GeV for muons) but no isolation requirements are used in addition. Single-lepton trigger efficiencies are measured using a tag-and-probe method applied to $Z\to ee$ and $Z\to\mu\mu$ events. In the 1-muon sub-channel, $E_{\mathrm{T}}^{\mathrm{miss}}$ … Contributions from these "pile-up" interactions are simulated both within the same bunch crossing as the hard-scattering process and in neighbouring bunch crossings. The resulting events are then processed through the same reconstruction programs as the data.

Table 1. The generators used for the simulation of the signal and background processes (columns: Process, Generator; e.g. ZZ: powheg+pythia8). (†) For the analysis of the 7 TeV data, pythia8 is used for the simulation of the $gg\to ZH$ process, and herwig for the simulation of diboson processes.

¹ ATLAS uses a right-handed coordinate system with its origin at the nominal interaction point (IP) in the centre of the detector and the $z$-axis coinciding with the axis of the beam pipe. The $x$-axis points from the IP towards the centre of the LHC ring, and the $y$-axis points upward. Cylindrical coordinates $(r,\phi)$ are used in the transverse plane, $\phi$ being the azimuthal angle around the $z$-axis. The pseudorapidity is defined in terms of the polar angle $\theta$ as $\eta = -\ln\tan(\theta/2)$. The distance in $(\eta,\phi)$ coordinates, $\Delta R = \sqrt{(\Delta\phi)^2 + (\Delta\eta)^2}$, is also used to define cone sizes. Transverse momentum and energy are defined as $p_{\mathrm{T}} = p\sin\theta$ and $E_{\mathrm{T}} = E\sin\theta$, respectively. For the purpose of object selections, $\eta$ is calculated relative to the geometric centre of the detector; otherwise, it is relative to the reconstructed primary vertex of each event.
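The coordinate definitions of the footnote translate directly into a few helper functions. This is an illustrative sketch with hypothetical function names; the one subtle point is that $\Delta\phi$ must be wrapped into $[-\pi, \pi)$, otherwise two objects on either side of $\phi = \pm\pi$ would appear far apart.

```python
import math

def pseudorapidity(theta):
    """eta = -ln tan(theta / 2), with theta the polar angle in radians."""
    return -math.log(math.tan(theta / 2.0))

def delta_phi(phi1, phi2):
    """Azimuthal difference wrapped into [-pi, pi)."""
    return (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi

def delta_r(eta1, phi1, eta2, phi2):
    """Distance in (eta, phi) space used to define cone sizes."""
    return math.hypot(delta_phi(phi1, phi2), eta1 - eta2)

def transverse(magnitude, theta):
    """p_T = p sin(theta); the same form gives E_T = E sin(theta)."""
    return magnitude * math.sin(theta)
```

For example, two objects at $\phi = 3.0$ and $\phi = -3.0$ with equal $\eta$ are separated by $\Delta R = 2\pi - 6 \approx 0.28$, not 6.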
Additional generators are used for the assessment of systematic uncertainties as explained in section 8.
Simulated jets are labelled according to which generated hadrons with $p_{\mathrm{T}} > 5$ GeV are found within a cone of size $\Delta R = 0.4$ around the reconstructed jet axis. If a b-hadron is found, the jet is labelled as a b-jet. If not, and a c-hadron is found, the jet is labelled as a c-jet. If neither a b- nor a c-hadron is found, the jet is labelled as a light (i.e. $u$-, $d$- or $s$-quark, or gluon) jet. Simulated $V$+jet events, where $V$ stands for $W$ or $Z$, are then categorised according to the labels of the two jets used to reconstruct the Higgs boson candidate. If one of those jets is labelled as a b-jet, the event belongs to the $Vb$ category. If not, and one of the jets is labelled as a c-jet, the event belongs to the $Vc$ category. Otherwise, the event belongs to the $Vl$ category. Further subdivisions are defined according to the flavour of the other jet from the pair, using the same precedence order: $Vbb$, $Vbc$, $Vbl$, $Vcc$, $Vcl$. The combination of $Vbb$, $Vbc$, $Vbl$ and $Vcc$ is denoted $V$+hf.
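The labelling and categorisation rules above amount to a precedence ordering b > c > light, applied first per jet and then per jet pair. The sketch below encodes them; the tuple-based jet and hadron representations and all function names are hypothetical choices for illustration.

```python
import math

def delta_r(eta1, phi1, eta2, phi2):
    dphi = (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(dphi, eta1 - eta2)

def label_jet(jet, hadrons, cone=0.4, min_pt=5.0):
    """jet: (eta, phi); hadrons: (pt, eta, phi, kind) with kind 'b' or 'c'.
    Returns 'b', 'c' or 'l' using the precedence b > c > light."""
    kinds = {kind for pt, eta, phi, kind in hadrons
             if pt > min_pt and delta_r(jet[0], jet[1], eta, phi) < cone}
    if "b" in kinds:
        return "b"
    if "c" in kinds:
        return "c"
    return "l"

PRECEDENCE = {"b": 0, "c": 1, "l": 2}
V_HF = {"Vbb", "Vbc", "Vbl", "Vcc"}   # categories combined into V+hf

def vjets_category(label1, label2):
    """Category of a V+jets event from the labels of the two Higgs-candidate jets."""
    first, second = sorted((label1, label2), key=PRECEDENCE.__getitem__)
    if first == "l":
        return "Vl"                   # neither jet is a b- or c-jet
    return "V" + first + second       # Vbb, Vbc, Vbl, Vcc or Vcl
```

Sorting the two labels by precedence before concatenating ensures that, for instance, an (l, b) pair and a (b, l) pair both end up in $Vbl$.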

Object reconstruction
In this section, the reconstruction of physics objects used in the analysis of the 8 TeV data is presented. Differences relevant for the analysis of the 7 TeV data are reported in section 10.
Charged-particle tracks are reconstructed with a $p_{\mathrm{T}}$ threshold of 400 MeV. The primary vertex is selected from amongst all reconstructed vertices as the one with the largest sum of the squared transverse momenta of its associated tracks, $\sum p_{\mathrm{T}}^2$, and is required to have at least three associated tracks.
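The primary-vertex choice can be sketched as follows, reading the text literally: the vertex with the largest $\sum p_{\mathrm{T}}^2$ is chosen first, and the event then has no valid primary vertex if that vertex has fewer than three tracks. The input format (a list of track $p_{\mathrm{T}}$ values per vertex) and the function name are assumptions.

```python
from typing import Optional, Sequence

def select_primary_vertex(vertex_track_pts: Sequence[Sequence[float]],
                          min_tracks: int = 3,
                          min_track_pt: float = 0.4) -> Optional[int]:
    """Return the index of the primary vertex, or None if there is no valid one.

    vertex_track_pts[i] lists the pT (GeV) of the tracks associated with vertex i.
    """
    # Only tracks above the 400 MeV reconstruction threshold are considered.
    tracks = [[pt for pt in v if pt > min_track_pt] for v in vertex_track_pts]
    if not tracks:
        return None
    # The vertex with the largest sum of squared track pT ...
    best = max(range(len(tracks)), key=lambda i: sum(pt * pt for pt in tracks[i]))
    # ... must also have at least three associated tracks.
    return best if len(tracks[best]) >= min_tracks else None
```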
Three categories of electrons [53,54] and muons [55] are used in the analysis, referred to as loose, medium and tight leptons in order of increasing purity. Loose leptons are selected with transverse energy $E_{\mathrm{T}} > 7$ GeV. Loose electrons are required to have $|\eta| < 2.47$ and to fulfil the "very loose likelihood" identification criteria defined in ref. [54]. The likelihood-based electron identification combines shower-shape information, track-quality criteria, the matching quality between the track and its associated energy cluster in the calorimeter (direction and momentum/energy), TRT information and a criterion to help identify electrons originating from photon conversions. The electron energies are calibrated using reference processes such as $Z\to ee$ [56]. Three types of muons are included in the loose definition to maximise the acceptance: (1) muons reconstructed in both the muon spectrometer and the inner detector (ID); (2) muons with $p_{\mathrm{T}} > 20$ GeV identified in the calorimeter and associated with an ID track with $|\eta| < 0.1$, where there is limited muon-chamber coverage; and (3) muons with $|\eta| > 2.5$ identified in the muon spectrometer, which do not match full ID tracks because of the limited inner-detector coverage. For muons of the first and second types, the muon-track impact parameters with respect to the primary vertex must be smaller than 0.1 mm and 10 mm in the transverse plane and along the $z$-axis, respectively. Finally, the scalar sum of the transverse momenta of tracks within a cone of size $\Delta R = 0.2$ centred on the lepton-candidate track, excluding the lepton track, is required to be less than 10% of the transverse momentum of the lepton.
Medium leptons must meet the loose identification criteria and have $E_{\mathrm{T}} > 25$ GeV. Medium muons must be reconstructed in both the muon spectrometer and the inner detector and have $|\eta| < 2.5$. Tight electrons are additionally required to fulfil the "very tight likelihood" identification criteria [54]. For both tight electrons and tight muons, more stringent isolation criteria must be satisfied: the sum of the calorimeter energy deposits in a cone of size $\Delta R = 0.3$ around the lepton, excluding the energy associated with the lepton candidate, must be less than 4% of the lepton energy, and the track-based isolation requirement is tightened from 10% to 4%.
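The loose/medium/tight hierarchy can be sketched as nested requirements, assuming (as the wording suggests but does not state outright) that every tight lepton also satisfies the medium criteria. The identification decisions that depend on detector information are represented by hypothetical boolean flags; only the thresholds and isolation fractions come from the text.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Lepton:
    et: float         # E_T (electrons) or p_T (muons), GeV
    energy: float     # lepton energy, GeV
    track_iso: float  # sum of track pT within Delta R = 0.2, lepton track excluded, GeV
    calo_iso: float   # calorimeter energy within Delta R = 0.3, lepton excluded, GeV
    loose_id: bool    # hypothetical flag: loose identification criteria
    medium_id: bool   # hypothetical flag: e.g. combined MS+ID muon with |eta| < 2.5
    tight_id: bool    # hypothetical flag: e.g. "very tight likelihood" for electrons

def lepton_quality(lep: Lepton) -> Optional[str]:
    """Return 'tight', 'medium', 'loose' or None (assumes tight within medium within loose)."""
    if not (lep.loose_id and lep.et > 7.0 and lep.track_iso < 0.10 * lep.et):
        return None
    if not (lep.medium_id and lep.et > 25.0):
        return "loose"
    # Tight: calorimeter isolation below 4% of the energy, track isolation below 4% of pT.
    if lep.tight_id and lep.calo_iso < 0.04 * lep.energy and lep.track_iso < 0.04 * lep.et:
        return "tight"
    return "medium"
```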
Jets are reconstructed from noise-suppressed topological clusters of energy in the calorimeters [57] using the anti-$k_t$ algorithm [58] with a radius parameter of 0.4. Jet energies are corrected for the contribution of pile-up interactions using a jet-area-based technique [59] and calibrated using $p_{\mathrm{T}}$- and $\eta$-dependent correction factors determined from simulation, with residual corrections from in situ measurements applied to data [60,61]. Further adjustments are made based on jet internal properties, which improve the energy resolution without changing the average calibration (global sequential calibration [60]). To reduce the contamination by jets from pile-up interactions, the scalar sum of the $p_{\mathrm{T}}$ of tracks matched to the jet and originating from the primary vertex must be at least 50% of the scalar sum of the $p_{\mathrm{T}}$ of all tracks matched to the jet. This requirement is applied only to jets with $p_{\mathrm{T}} < 50$ GeV and $|\eta| < 2.4$; jets without any matched track are retained. The jets kept for the analysis must have $p_{\mathrm{T}} > 20$ GeV and $|\eta| < 4.5$.
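The kinematic and pile-up requirements on jets can be sketched as a single predicate. The argument names are hypothetical; the thresholds (20 GeV, $|\eta| < 4.5$, the 50% track-$p_{\mathrm{T}}$ fraction applied only for $p_{\mathrm{T}} < 50$ GeV and $|\eta| < 2.4$) are those quoted above.

```python
def keep_jet(pt, eta, pv_track_pts, all_track_pts):
    """Apply the jet requirements quoted above (pt in GeV).

    pv_track_pts: pT of tracks matched to the jet and from the primary vertex;
    all_track_pts: pT of all tracks matched to the jet.
    """
    if pt <= 20.0 or abs(eta) >= 4.5:
        return False
    if pt < 50.0 and abs(eta) < 2.4:
        total = sum(all_track_pts)
        # Jets without matched tracks are retained; otherwise at least 50% of the
        # matched-track pT must come from the primary vertex.
        if total > 0.0 and sum(pv_track_pts) < 0.5 * total:
            return False
    return True
```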
To avoid double-counting, the following procedure is applied to loose leptons and jets. First, if a jet and an electron are separated by $\Delta R < 0.4$, the jet is discarded. Next, if a jet and a muon are separated by $\Delta R < 0.4$, the jet is discarded if it has three or fewer matched tracks, since in this case it is likely to originate from a muon that showered in the calorimeter; otherwise the muon is discarded.⁵ Finally, if an electron and a muon are separated by $\Delta R < 0.2$, the muon is kept unless it is identified only in the calorimeter, in which case the electron is kept.
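The three-step overlap removal can be sketched as follows. The dictionary-based objects (with keys `eta`, `phi`, `ntracks` and `calo_only`) are a hypothetical representation, and the order in which jets are visited when several overlaps occur is an assumption, since the text does not specify it.

```python
import math

def dr(a, b):
    dphi = (a["phi"] - b["phi"] + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(a["eta"] - b["eta"], dphi)

def overlap_removal(jets, electrons, muons):
    # Step 1: drop jets within Delta R < 0.4 of an electron.
    jets = [j for j in jets if all(dr(j, e) >= 0.4 for e in electrons)]
    # Step 2: jet-muon overlaps, decided by the jet's matched-track multiplicity.
    removed = set()
    kept_jets = []
    for j in jets:
        close = [i for i, m in enumerate(muons) if i not in removed and dr(j, m) < 0.4]
        if close and j["ntracks"] <= 3:
            continue              # jet likely from a muon showering in the calorimeter
        removed.update(close)     # otherwise the overlapping muons are discarded
        kept_jets.append(j)
    muons = [m for i, m in enumerate(muons) if i not in removed]
    # Step 3: electron-muon overlaps within Delta R < 0.2.
    kept_muons = [m for m in muons
                  if not (m["calo_only"] and any(dr(m, e) < 0.2 for e in electrons))]
    kept_electrons = [e for e in electrons
                      if not any(dr(e, m) < 0.2 and not m["calo_only"] for m in muons)]
    return kept_jets, kept_electrons, kept_muons
```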
The MV1c b-tagging algorithm is used to identify jets originating from b-quark fragmentation. This algorithm combines in a neural network the information from various algorithms based on track impact-parameter significance or on the explicit reconstruction of b- and c-hadron decay vertices. It is an improved version of the MV1 algorithm [62][63][64] with higher c-jet rejection. Four b-tagging selection criteria (or operating points) are calibrated and used in the analysis, corresponding to average efficiencies of 80%, 70%, 60% and 50% for b-jets with $p_{\mathrm{T}} > 20$ GeV, as measured in simulated $t\bar{t}$ events. In this analysis, the 80%, 70% and 50% operating points are denoted loose, medium and tight, respectively. For the tight (loose) operating point, the rejection factors are 26 (3) against c-jets and 1400 (30) against light jets. For the tight operating point, the c-jet rejection factor is 1.9 times larger than that obtained with the MV1 algorithm.

⁵ Such muons are nevertheless included in the computation of the $E_{\mathrm{T}}^{\mathrm{miss}}$ and in the jet energy corrections described in section 5.
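A quick calculation with the quoted numbers shows why requiring two b-tagged jets suppresses non-b backgrounds so strongly. The sketch treats the per-jet tag probabilities as $\varepsilon_b$ for b-jets and $1/R$ for c- and light jets (with $R$ the rejection factor) and assumes the two tags are independent. This ignores the $p_{\mathrm{T}}$, $\eta$ and $\Delta R$ dependences discussed in the text, so the results are indicative only.

```python
# Per-jet tag probabilities implied by the quoted average efficiencies and rejections.
OPERATING_POINTS = {
    "tight": {"b": 0.50, "c": 1.0 / 26.0, "l": 1.0 / 1400.0},
    "loose": {"b": 0.80, "c": 1.0 / 3.0, "l": 1.0 / 30.0},
}

def double_tag_probability(op, flavour1, flavour2):
    """Probability that both jets pass the operating point, assuming independent tags."""
    probs = OPERATING_POINTS[op]
    return probs[flavour1] * probs[flavour2]

if __name__ == "__main__":
    for op in ("tight", "loose"):
        for pair in (("b", "b"), ("c", "c"), ("l", "l")):
            print(op, pair, f"{double_tag_probability(op, *pair):.2e}")
```

At the tight point, a pair of b-jets passes with probability 0.25, whereas a pair of c-jets passes with probability of only about $1.5\times10^{-3}$. At the loose point, the c-jet pair probability rises to about 0.11, which is consistent with the LL category being useful mainly for constraining backgrounds without two real b-jets.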
The b-tagging efficiencies for b-jets, c-jets and light jets are measured in both data and simulation using dedicated event samples, such as $t\bar{t}$ events for b-jets, events with identified $D^*$ mesons for c-jets, and multijet events for light jets. The small differences observed are used to correct the simulation through so-called "scale factors" (SFs), defined in the intervals between adjacent operating points. These SFs are parameterised as functions of the jet $p_{\mathrm{T}}$ and, for light jets, also of $|\eta|$. The SFs are, however, strictly valid only for the generator used to derive them. The differences observed when the efficiencies are measured with different generators are taken into account by additional "MC-to-MC" SFs. Such differences can be caused, for example, by different production fractions of heavy-flavour hadrons or by the modelling of their decays.
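Because the SFs are defined per interval between adjacent operating points, a simulated event can be corrected by multiplying, over its jets, the SF of the interval into which each jet's tagger output falls. The sketch below assumes hypothetical MV1c cut values and takes the SF and MC-to-MC lookups as user-supplied callables. It illustrates the interval-based bookkeeping only and is not the ATLAS calibration interface.

```python
import bisect

# Hypothetical MV1c output cut values for the 80%, 70%, 60% and 50% operating
# points, in order of increasing tightness (the real values are not given here).
OP_CUTS = [0.10, 0.40, 0.70, 0.90]

def tag_weight_interval(mv1c_output):
    """0 = fails the 80% point; 1-3 = between adjacent points; 4 = passes the 50% point."""
    return bisect.bisect_right(OP_CUTS, mv1c_output)

def event_scale_factor(jets, sf_lookup, mc_to_mc=None):
    """Product of per-jet SFs for the interval each jet falls into.

    jets: iterable of (flavour, pt, eta, mv1c_output);
    sf_lookup(flavour, pt, abs_eta, interval) and the optional
    mc_to_mc(flavour, pt, abs_eta, interval) are hypothetical calibration callables.
    """
    weight = 1.0
    for flavour, pt, eta, output in jets:
        interval = tag_weight_interval(output)
        weight *= sf_lookup(flavour, pt, abs(eta), interval)
        if mc_to_mc is not None:
            weight *= mc_to_mc(flavour, pt, abs(eta), interval)
    return weight
```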
Because of the large cross sections for $Vl$ and $Vc$ production, these backgrounds remain significant despite the powerful rejection of non-b-jets by the b-tagging algorithm. It is impractical to simulate a sufficiently large number of $Vl$ and $Vc$ events to provide a reliable description of these backgrounds in the analysis samples in which two b-tagged jets are required. An alternative procedure, parameterised tagging, is therefore used. Here, instead of directly tagging the c- and l-labelled jets with the MV1c algorithm, parameterisations of their probabilities to be b-tagged, as functions of $p_{\mathrm{T}}$ and $|\eta|$, are used for the $Vl$, $Vc$ and $WW$ processes in all analysis samples in which two b-tagged jets are required. These parameterisations are, however, integrated over other variables that can affect the c- and light-jet tagging efficiencies. In particular, a strong dependence of these efficiencies on $\Delta R$, the angular separation from the closest other jet, is observed, and a significant difference is seen between direct and parameterised tagging for $Vcc$ events with $\Delta R < 1$. No such difference is seen for $Vcl$, $Vl$ and $WW$ events. A dedicated correction, depending on $\Delta R$, is therefore applied to the $Vcc$ events.
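In parameterised tagging, every simulated event is kept, and it is weighted by the probability that its jets would have been tagged, instead of being accepted or rejected by the tagger. A minimal sketch for the two Higgs-candidate jets is given below. It assumes that the efficiency map is a user-supplied function of flavour, $p_{\mathrm{T}}$ and $|\eta|$, and it models the $Vcc$ correction as an optional per-jet factor of $\Delta R$. Both choices are hypothetical, as the text does not specify their form.

```python
def parameterised_tag_weight(jets, eff, dr_correction=None):
    """Event weight replacing the direct b-tag decision for the two Higgs-candidate jets.

    jets: list of (flavour, pt, eta, dr_closest);
    eff(flavour, pt, abs_eta): hypothetical tagging-probability parameterisation;
    dr_correction(dr): optional hypothetical per-jet correction (e.g. for Vcc events).
    """
    weight = 1.0
    for flavour, pt, eta, dr in jets:
        p = eff(flavour, pt, abs(eta))
        if dr_correction is not None:
            p *= dr_correction(dr)
        weight *= p
    return weight
```

Because no event is discarded, the statistical precision of the simulated $Vl$ and $Vc$ samples is retained in the 2-tag regions, at the cost of relying on the parameterisation being adequate. That is precisely why the $\Delta R$ dependence for $Vcc$ requires an extra correction.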
The missing transverse momentum vector $\vec{E}_{\mathrm{T}}^{\mathrm{miss}}$ [65,66] is measured as the negative vector sum of the transverse momenta associated with energy clusters in the calorimeters with $|\eta| < 4.9$. Corrections are applied to the energies of clusters associated with reconstructed objects (jets, electrons, $\tau$ leptons and photons), using the calibrations of these objects. The transverse momenta of reconstructed muons are included, with the energy deposited by these muons in the calorimeters removed to avoid double-counting. In addition, a track-based missing transverse momentum vector, $\vec{p}_{\mathrm{T}}^{\mathrm{miss}}$, is calculated as the negative vector sum of the transverse momenta of tracks with $|\eta| < 2.4$ associated with the primary vertex.
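Both quantities are negative vector sums in the transverse plane. The sketch below computes one from a list of $(p_{\mathrm{T}}, \eta, \phi)$ objects, with an optional $|\eta|$ cut as used for $\vec{p}_{\mathrm{T}}^{\mathrm{miss}}$. The function name is hypothetical, and the object-level calibrations and muon energy-loss subtraction that enter $\vec{E}_{\mathrm{T}}^{\mathrm{miss}}$ are not modelled.

```python
import math

def missing_pt(objects, max_abs_eta=None):
    """Negative vector sum of transverse momenta.

    objects: iterable of (pt, eta, phi); if max_abs_eta is given, only objects
    with |eta| below it contribute (e.g. 2.4 for primary-vertex tracks).
    Returns (px, py, magnitude) in the input units.
    """
    px = py = 0.0
    for pt, eta, phi in objects:
        if max_abs_eta is not None and abs(eta) >= max_abs_eta:
            continue
        px -= pt * math.cos(phi)
        py -= pt * math.sin(phi)
    return px, py, math.hypot(px, py)
```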
Additional corrections are applied to the simulation to account for small differences from data in trigger efficiencies, lepton reconstruction and identification efficiencies, and lepton energy and momentum resolutions.

Event selection
In this section, the event selection applied in the analysis of the 8 TeV data is presented. Differences in the analysis of the 7 TeV data are reported in section 10.
The analysis is optimised for a Higgs boson mass of 125 GeV. Events are first categorised according to the numbers of leptons, jets, and b-tagged jets.
Events containing no loose leptons are assigned to the 0-lepton channel. Events containing one tight lepton and no additional loose leptons are assigned to the 1-lepton channel. Events containing one medium lepton and one additional loose lepton of the same flavour, and no other loose leptons, are assigned to the 2-lepton channel. In the 1- and 2-lepton channels, for at least one of the lepton triggers by which the event was selected, the objects that satisfied the trigger must be matched to the selected leptons.
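The channel assignment can be written as a short decision function over the loose leptons of an event. It assumes that tight leptons also satisfy the medium criteria, and it does not model the trigger-matching requirement. The input format and function name are hypothetical.

```python
def lepton_channel(leptons):
    """Assign the lepton channel (0, 1 or 2), or None if the event fits none.

    leptons: list of (quality, flavour) for every loose lepton in the event, with
    quality in {'loose', 'medium', 'tight'} and flavour 'e' or 'mu'. Tight leptons
    are assumed to also satisfy the medium criteria. Trigger matching is not modelled.
    """
    if not leptons:
        return 0
    if len(leptons) == 1:
        return 1 if leptons[0][0] == "tight" else None
    if len(leptons) == 2:
        (q1, f1), (q2, f2) = leptons
        if f1 == f2 and ("medium" in (q1, q2) or "tight" in (q1, q2)):
            return 2
    return None
```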
The jets used in this analysis, called "selected jets", must have $p_{\mathrm{T}} > 20$ GeV and $|\eta| < 2.5$, the $\eta$ range within which b-tagging can be applied. There must be exactly two or three selected jets. Events containing a jet with $p_{\mathrm{T}} > 30$ GeV and $|\eta| > 2.5$ are discarded to reduce the $t\bar{t}$ background. Only selected jets are considered further, e.g. to define the jet multiplicity or to calculate kinematic variables. The b-tagging algorithm is applied to all selected jets. No more than two selected jets may be loosely b-tagged, and 3-jet events in which the lowest-$p_{\mathrm{T}}$ jet is loosely b-tagged are discarded. At least one of the two b-tagged jets must have $p_{\mathrm{T}} > 45$ GeV. The following b-tagging categories are then defined. Events with two jets satisfying the tight b-tagging criterion form the TT (or Tight) category; those not classified as TT, but with two jets satisfying the medium b-tagging criterion, form the MM (or Medium) category; those not classified as TT or MM, but with two jets satisfying the loose b-tagging criterion, form the LL (or Loose) category. This categorisation improves the sensitivity with respect to a single category, such as TT+MM, with the LL category providing constraints on the backgrounds not containing two real b-jets. Events with exactly one loosely b-tagged jet form the 1-tag category, and those with no loosely b-tagged jet form the 0-tag category. In the 3-jet categories, the dijet system is formed by the two b-tagged jets in any of the 2-tag categories, by the b-tagged jet and the leading (highest-$p_{\mathrm{T}}$) non-b-tagged jet for events in the 1-tag category, and by the two leading jets in the 0-tag category.
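Since the operating points are nested, each jet can be summarised by the tightest point it passes. With exactly two loosely tagged jets, the weaker of the two tags then determines the TT/MM/LL category. The sketch below encodes the rules above, using hypothetical integer tag levels. The $p_{\mathrm{T}} > 45$ GeV requirement and the forward-jet veto are omitted.

```python
# Tag level of a jet: 0 = fails loose, 1 = loose (80%), 2 = medium (70%), 3 = tight (50%).
# A jet passing the 60% point but not the 50% point counts as medium here.
LOOSE, MEDIUM, TIGHT = 1, 2, 3
NAMES = {TIGHT: "TT", MEDIUM: "MM", LOOSE: "LL"}

def btag_category(tag_levels):
    """Return 'TT', 'MM', 'LL', '1-tag', '0-tag', or None if the event is rejected.

    tag_levels: tag level of each selected jet, ordered by decreasing pT.
    """
    n_tagged = sum(1 for level in tag_levels if level >= LOOSE)
    if n_tagged > 2:
        return None      # more than two loosely b-tagged jets
    if len(tag_levels) == 3 and tag_levels[2] >= LOOSE:
        return None      # lowest-pT jet of a 3-jet event is loosely b-tagged
    if n_tagged == 2:
        # The weaker of the two tags decides between TT, MM and LL.
        return NAMES[min(level for level in tag_levels if level >= LOOSE)]
    return "1-tag" if n_tagged == 1 else "0-tag"
```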
Additional topological and kinematic criteria are applied to reject background events and enhance the sensitivity of the search. They are outlined in table 2 and detailed below. In general, the selection criteria are looser in the MVA than in the dijet-mass analysis in order to maximise the information available to the final discriminant.
Further categorisation is performed according to the transverse momentum of the vector boson, $p_{\mathrm{T}}^V$, to take advantage of the better signal-to-background ratio at high $p_{\mathrm{T}}^V$. The transverse momentum of the vector boson is reconstructed as the $E_{\mathrm{T}}^{\mathrm{miss}}$ in the 0-lepton channel, as the magnitude $p_{\mathrm{T}}^W$ of the vector sum of the lepton transverse momentum and the $\vec{E}_{\mathrm{T}}^{\mathrm{miss}}$ in the 1-lepton channel, and as the magnitude $p_{\mathrm{T}}^Z$ of the vector sum of the transverse momenta of the two leptons in the 2-lepton channel. In the dijet-mass analysis, the events are categorised in five $p_{\mathrm{T}}^V$ intervals, with boundaries at 0, 90, 120, 160 and 200 GeV. In the 0-lepton channel, and for events fulfilling the condition on $p_{\mathrm{T}}^{\mathrm{miss}}$ described below, the $E_{\mathrm{T}}^{\mathrm{miss}}$ trigger is fully efficient for $E_{\mathrm{T}}^{\mathrm{miss}} > 160$ GeV, 97% efficient for $E_{\mathrm{T}}^{\mathrm{miss}} = 120$ GeV and 80% efficient for $E_{\mathrm{T}}^{\mathrm{miss}} = 100$ GeV, with an efficiency that decreases rapidly for lower $E_{\mathrm{T}}^{\mathrm{miss}}$. Only four intervals are therefore used in the 0-lepton channel, with a minimum $E_{\mathrm{T}}^{\mathrm{miss}}$ of 100 GeV. In the 1-muon sub-channel, the $E_{\mathrm{T}}^{\mathrm{miss}}$ trigger is used for $p_{\mathrm{T}}^W > 120$ GeV to recover events not selected by the single-muon trigger, increasing the signal acceptance in this channel by 8%. In the MVA, only two intervals are defined, with $p_{\mathrm{T}}^V$ below or above 120 GeV, but the detailed $p_{\mathrm{T}}^V$ information is used in the final discriminant.

Table 2. Event topological and kinematic selections. NU stands for 'Not Used'. (*) In the 0-lepton channel, the lower edge of the second $p_{\mathrm{T}}^V$ interval is set at 100 GeV instead of 90 GeV. For the 1-lepton channel, only the 1-muon sub-channel is used in the $p_{\mathrm{T}}^V < 120$ GeV intervals.

In the dijet-mass analysis, requirements that depend on the $p_{\mathrm{T}}^V$ interval are applied to the angular separation between the two jets of the dijet system, $\Delta R(\mathrm{jet}_1,\mathrm{jet}_2)$. The requirement on the minimum value reduces the background from $V$+jet production, while the requirement on the maximum value, which reduces the background from $t\bar{t}$ production, is tightened with increasing $p_{\mathrm{T}}^V$ to take advantage of the increasing collimation of the dijet system for the signal. To increase the signal acceptance, the requirement on the minimum value is removed in the highest $p_{\mathrm{T}}^V$ interval, where the background is smallest. In the MVA, where the $\Delta R(\mathrm{jet}_1,\mathrm{jet}_2)$ information is used in the final discriminant, only a minimum value is required, and this requirement is also removed for $p_{\mathrm{T}}^V > 200$ GeV. In the 0-lepton channel, the multijet (MJ) background is suppressed by imposing requirements on the magnitude $p_{\mathrm{T}}^{\mathrm{miss}}$ … the scalar sum of the jet transverse momenta, $\sum_i p_{\mathrm{T}}^{\mathrm{jet}_i}$, which depends on the jet multiplicity. Additional requirements are applied in the lowest $p_{\mathrm{T}}^V$ interval of the 0-lepton channel, where the MJ background is largest: $N_{\mathrm{jet}} = 2$; $E_{\mathrm{T}}^{\mathrm{miss}} > 100$ GeV; $\Delta\phi(\mathrm{jet}_1,\mathrm{jet}_2) < 2.7$; $S > 7$; and $L > 0.5$.
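The per-channel $p_{\mathrm{T}}^V$ reconstruction and the dijet-mass-analysis interval boundaries described above can be sketched as follows. Transverse vectors are given as hypothetical $(p_x, p_y)$ tuples in GeV. The 0-lepton channel uses a second interval starting at 100 GeV and has no first interval. The restriction to the 1-muon sub-channel below 120 GeV is not modelled.

```python
import math

DIJET_MASS_EDGES = [0.0, 90.0, 120.0, 160.0, 200.0]   # lower edges of the five intervals, GeV

def vector_pt(*vectors):
    """Magnitude of the vector sum of transverse vectors given as (px, py) in GeV."""
    return math.hypot(sum(v[0] for v in vectors), sum(v[1] for v in vectors))

def pt_v(channel, met, leptons):
    """met: (px, py) of E_T^miss; leptons: list of (px, py) for the selected leptons."""
    if channel == 0:
        return vector_pt(met)                    # E_T^miss
    if channel == 1:
        return vector_pt(leptons[0], met)        # p_T^W
    return vector_pt(leptons[0], leptons[1])     # p_T^Z

def ptv_interval(channel, ptv):
    """Index (0-4) of the dijet-mass-analysis interval, or None if not used."""
    edges = list(DIJET_MASS_EDGES)
    if channel == 0:
        edges[1] = 100.0          # second interval starts at 100 GeV
        if ptv < edges[1]:
            return None           # first interval not used in the 0-lepton channel
    index = 0
    for i, edge in enumerate(edges):
        if ptv >= edge:
            index = i
    return index
```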
Here, $\Delta\phi(\mathrm{jet}_1,\mathrm{jet}_2)$ is the azimuthal angle between the two jets; $S$ is the $E_{\mathrm{T}}^{\mathrm{miss}}$ significance, defined as the ratio of $E_{\mathrm{T}}^{\mathrm{miss}}$ to the square root of $\sum_i p_{\mathrm{T}}^{\mathrm{jet}_i}$; and $L$ is a likelihood ratio constructed to discriminate further against the MJ background.⁶ In the 1-lepton channel, a requirement is imposed on the transverse mass⁷ $m_{\mathrm{T}}^W$ in the dijet-mass analysis. This requirement reduces the contamination from the $t\bar{t}$ background. Requirements are also imposed on $H_{\mathrm{T}}$ for $p_{\mathrm{T}}^V < 120$ GeV and on $E_{\mathrm{T}}^{\mathrm{miss}}$ for $p_{\mathrm{T}}^V > 120$ GeV, where $H_{\mathrm{T}}$ is the scalar sum of $E_{\mathrm{T}}^{\mathrm{miss}}$ and the transverse momenta of the two leading jets and the lepton. These requirements mainly reduce the MJ background. As discussed in section 7.1, the MJ background is difficult to model and remains substantial in the 1-electron sub-channel in the $p_{\mathrm{T}}^V < 120$ GeV intervals. Therefore, only the 1-muon sub-channel is used in these intervals.
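The derived quantities in this paragraph and the previous one are simple functions of the event kinematics. The sketch computes $S$, $H_{\mathrm{T}}$ and the additional lowest-$p_{\mathrm{T}}^V$ 0-lepton requirements from the quoted values. For $m_{\mathrm{T}}^W$, it uses the standard definition $m_{\mathrm{T}}^W = \sqrt{2\,p_{\mathrm{T}}^{\ell}\,E_{\mathrm{T}}^{\mathrm{miss}}\,(1-\cos\Delta\phi)}$ for a massless lepton and neutrino, which is an assumption here since footnote 7 is not reproduced. The likelihood ratio $L$ is taken as an input, and the function names are hypothetical.

```python
import math

def met_significance(met, jet_pts):
    """S = E_T^miss / sqrt(sum of jet pT), in GeV^(1/2)."""
    return met / math.sqrt(sum(jet_pts))

def transverse_mass(lep_pt, lep_phi, met, met_phi):
    """Standard W transverse mass for a massless lepton and neutrino (assumed definition)."""
    return math.sqrt(2.0 * lep_pt * met * (1.0 - math.cos(lep_phi - met_phi)))

def h_t(met, jet_pts, lep_pt):
    """Scalar sum of E_T^miss, the two leading jet pTs and the lepton pT."""
    return met + sum(sorted(jet_pts, reverse=True)[:2]) + lep_pt

def passes_low_ptv_0lep(n_jet, met, dphi_jj, jet_pts, likelihood_ratio):
    """Additional 0-lepton requirements in the lowest pT^V interval (values from the text)."""
    return (n_jet == 2 and met > 100.0 and abs(dphi_jj) < 2.7
            and met_significance(met, jet_pts) > 7.0 and likelihood_ratio > 0.5)
```

As an example, an event with $E_{\mathrm{T}}^{\mathrm{miss}} = 120$ GeV and two jets of 100 GeV each has $S = 120/\sqrt{200} \approx 8.5$ GeV$^{1/2}$, which passes the $S > 7$ requirement.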
In the 2-lepton channel, criteria are imposed on the dilepton invariant mass, $m_{\ell\ell}$, which must be consistent with the mass of the $Z$ boson. In the dijet-mass analysis, a requirement is imposed on $E_{\mathrm{T}}^{\mathrm{miss}}$; in the MVA, this variable is instead used in the final discriminant. For events in which two jets are loosely b-tagged, these selection criteria define a set of "2-tag signal regions", categorised by channel (0, 1 or 2 leptons), $p_{\mathrm{T}}^V$ interval and jet multiplicity (2 or 3). In the dijet-mass analysis, a further division is performed into the TT, MM and LL b-tagging categories. In the MVA, where the b-tagging information is used in the final discriminant, a similar subdivision is performed, except that the TT and MM categories are merged in the 0- and 2-lepton channels. Similarly defined 1-tag and 0-tag "control regions" are used in the analysis to constrain the main backgrounds. In the 1-lepton channel, the 2-tag signal regions with a third selected jet act in practice as control regions because they are largely dominated by $t\bar{t}$ events. In fact, the distinction between 2-tag signal regions and 1-tag control regions is semantic rather than technical, as all of them are used simultaneously in the global fit (described in section 9) used to extract the results. The 0-tag control regions are used only for background modelling studies (reported in section 7).
After event selection, the energy calibration of the b-tagged jets is improved as follows. The energy of muons within a jet is added to the calorimeter-based jet energy after removing the energy deposited by the muon in the calorimeter (muon-in-jet correction), and a $p_{\mathrm{T}}$-dependent correction is applied to account for biases in the response due to resolution effects (resolution correction). The latter correction is determined for the $p_{\mathrm{T}}$ spectrum of jets from the decay of a Higgs boson with $m_H = 125$ GeV in simulated $(W/Z)H$ events. After these corrections, the dijet mass resolution for the signal is improved by 14% and is typically 11% (figure 1(a)). In the 2-lepton channel, where no genuine $E_{\mathrm{T}}^{\mathrm{miss}}$ is expected except possibly from semileptonic heavy-flavour decays, the energy calibration of the jets is further improved by a kinematic likelihood fit. This fit includes a Breit-Wigner constraint on the dilepton mass; Gaussian constraints on each of the transverse components of the $\ell\ell b\bar{b}$ system momentum (with a width of 9 GeV, as determined from simulated $ZH$ events); dedicated transfer functions relating the true jet transverse momenta to their reconstructed values (after the muon-in-jet correction, but without the resolution correction); and a prior built from the expected true jet $p_{\mathrm{T}}$ spectrum in $ZH$ events (playing a role similar to that of the resolution correction). Overall, the $b\bar{b}$ mass resolution is improved by 30% in the 2-lepton channel (figure 1(b)).
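The idea behind the 2-lepton kinematic fit is that the well-measured leptons constrain the poorly measured jets through transverse-momentum balance. The sketch below is a heavily simplified, hypothetical stand-in that demonstrates only that mechanism, not the fit used in this analysis. It keeps the Gaussian constraints (width 9 GeV) on both transverse components of the $\ell\ell b\bar{b}$ momentum. It replaces the transfer functions with Gaussian jet-scale terms of an assumed relative resolution, and it omits the Breit-Wigner term and the $p_{\mathrm{T}}$ prior. Every remaining term is quadratic in the two jet scale factors, so the minimum follows exactly from the $2\times2$ normal equations.

```python
def fit_jet_scales(leptons, jets, jet_rel_res, balance_width=9.0):
    """Simplified kinematic fit returning one energy-scale factor per jet.

    Minimises chi2 = sum_i ((alpha_i - 1) / r_i)^2
                   + |sum(leptons) + sum_i alpha_i * jet_i|^2 / w^2,
    with leptons and the two jets given as (px, py) in GeV, r_i the assumed relative
    jet resolutions and w the width of the transverse-balance constraint.
    """
    (j1x, j1y), (j2x, j2y) = jets
    lx = sum(p[0] for p in leptons)
    ly = sum(p[1] for p in leptons)
    r1, r2 = jet_rel_res
    w2 = balance_width ** 2
    # Normal equations (A^T A) alpha = A^T b of the linear least-squares problem.
    a11 = 1.0 / r1 ** 2 + (j1x ** 2 + j1y ** 2) / w2
    a22 = 1.0 / r2 ** 2 + (j2x ** 2 + j2y ** 2) / w2
    a12 = (j1x * j2x + j1y * j2y) / w2
    b1 = 1.0 / r1 ** 2 - (lx * j1x + ly * j1y) / w2
    b2 = 1.0 / r2 ** 2 - (lx * j2x + ly * j2y) / w2
    det = a11 * a22 - a12 * a12
    return (b1 * a22 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det
```

When the jets are measured 10% low relative to the dilepton recoil, the fitted scale factors move above 1 but not all the way to 1.1. The compromise is set by the relative weight of the jet-resolution and balance terms, which mirrors how the real fit combines its constraints.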
The cross sections times branching ratios for $(W/Z)H$ with $W\to\ell\nu$, $Z\to\ell\ell$, $Z\to\nu\nu$ and $H\to b\bar{b}$, as well as the acceptances in the three channels after full selection, are given in table 3 for the MVA and the dijet-mass analysis. The acceptance for other production and decay modes of the Higgs boson is negligible. The 0-lepton channel adds 7% in acceptance for the $W\to\ell\nu$ process with respect to the 1-lepton channel. Similarly, the 1-lepton channel adds 10% in acceptance for the $Z\to\ell\ell$ process with respect to the 2-lepton channel.

Table 3. The cross section times branching ratio (BR) and acceptance for the three channels at 8 TeV. For $ZH$, the $qq$- and $gg$-initiated processes are shown separately. The branching ratios are calculated considering only decays to muons and electrons for $Z\to\ell\ell$, decays to all three lepton flavours for $W\to\ell\nu$ and decays to neutrinos for $Z\to\nu\nu$. The acceptance is calculated as the fraction of events remaining in the combined 2-tag signal regions of the MVA (dijet-mass analysis) after the full event selection.
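For orientation, the quantities in table 3 combine into an expected signal yield as $N = (\sigma \times \mathrm{BR}) \times L_{\mathrm{int}} \times A$. The sketch below encodes this relation; the cross-section and acceptance values in the usage line are placeholders, not numbers from table 3, and only the 20.3 fb$^{-1}$ integrated luminosity at 8 TeV is taken from the text.

```python
def expected_signal_events(sigma_br_fb: float, lumi_fb_inv: float, acceptance: float) -> float:
    """Expected number of selected signal events.

    sigma_br_fb : cross section times branching ratio, in fb
    lumi_fb_inv : integrated luminosity, in fb^-1
    acceptance  : fraction of generated events passing the full selection
    """
    if not 0.0 <= acceptance <= 1.0:
        raise ValueError("acceptance must be a fraction between 0 and 1")
    return sigma_br_fb * lumi_fb_inv * acceptance


# Placeholder inputs: 100 fb and a 2% acceptance are illustrative only.
n_sig = expected_signal_events(100.0, 20.3, 0.02)
```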

Multivariate analysis
Although the dijet mass is the kinematic variable that provides the best discrimination between signal and backgrounds, the sensitivity of the search is improved by making use of additional kinematic, topological and b-tagging properties of the selected events in a multivariate analysis. The Boosted Decision Tree (BDT) technique [69,70] is used, which, like other multivariate methods, properly accounts for correlations between variables. Dedicated BDTs are constructed, trained and evaluated in each of the 0-, 1- and 2-lepton channels in the 2-tag regions (with the LL, MM and TT categories combined), and separately for events with two and three jets. In the 0-lepton channel, only events with $p_{\mathrm{T}}^{V} > 120$ GeV are used, whereas for the 1- and 2-lepton channels individual BDTs are used for $p_{\mathrm{T}}^{V} < 120$ GeV and $p_{\mathrm{T}}^{V} > 120$ GeV. In the 1- and 2-lepton channels, events in the electron and muon sub-channels are combined since none of the variables used are lepton-flavour specific.
The BDTs are trained to separate the $VH$, $H\to b\bar{b}$ signal from the sum of the expected background processes. The input variables used to construct the BDTs are chosen to maximise the separation, while excluding variables that do not improve the performance significantly. Starting from the dijet mass, additional variables are tried one at a time and the one yielding the best separation gain is kept. This procedure is repeated until adding more variables does not result in a significant performance gain. The final sets of variables for the different channels are listed in table 4. The b-tagged jets belonging to the dijet system (with mass denoted $m_{bb}$) are labelled $b_1$ and $b_2$ in order of decreasing $p_{\mathrm{T}}$, and their separation in pseudorapidity is $|\Delta\eta(b_1,b_2)|$. The b-tagging information is provided by the outputs of the MV1c neural network, MV1c($b_1$) and MV1c($b_2$). The angular separation in the transverse plane between the vector boson and the dijet system of b-tagged jets, and their pseudorapidity separation, are denoted $\Delta\phi(V,bb)$ and $|\Delta\eta(V,bb)|$, respectively. In the 0-lepton channel, $H_{\mathrm{T}}$ is defined as the scalar sum of the transverse momenta of all jets and $E_{\mathrm{T}}^{\mathrm{miss}}$. In the 1-lepton channel, the angle in the transverse plane between the lepton and the closest b-tagged jet is denoted $\min[\Delta\phi(\ell,b)]$. The other variables were defined in the previous sections. In 3-jet events, the third jet is labelled $\mathrm{jet}_3$ and the mass of the 3-jet system is denoted $m_{bbj}$.

Table 4. Variables used in the multivariate analysis for the 0-, 1- and 2-lepton channels (columns: Variable, 0-Lepton, 1-Lepton, 2-Lepton).
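The variable-selection loop described above is a greedy forward selection. A minimal sketch follows; the `separation` callable, the stopping threshold `min_gain` and the toy variable names are assumptions, since the text specifies neither the separation metric nor what counts as a significant gain.

```python
from typing import Callable, List, Sequence


def forward_select(candidates: Sequence[str],
                   separation: Callable[[List[str]], float],
                   start: str,
                   min_gain: float) -> List[str]:
    """Greedy forward selection starting from a single variable (here the dijet mass)."""
    selected = [start]
    remaining = [v for v in candidates if v != start]
    best_score = separation(selected)
    while remaining:
        # Try each remaining variable one at a time on top of the current set.
        scores = {v: separation(selected + [v]) for v in remaining}
        best_var = max(scores, key=scores.get)
        if scores[best_var] - best_score < min_gain:
            break  # no significant improvement: stop adding variables
        selected.append(best_var)
        remaining.remove(best_var)
        best_score = scores[best_var]
    return selected


# Toy separation measure (hypothetical): each variable adds a fixed amount.
TOY_GAIN = {"mbb": 0.5, "dRbb": 0.2, "pTV": 0.1, "MET": 0.005}


def toy_separation(variables: List[str]) -> float:
    return sum(TOY_GAIN[v] for v in variables)


chosen = forward_select(list(TOY_GAIN), toy_separation, start="mbb", min_gain=0.01)
```

In a real BDT setting, each call to `separation` would require retraining, which is why the procedure stops as soon as the gain becomes insignificant.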
The input variables of the BDTs are compared between data and simulation, and good agreement is found within the assessed uncertainties. Selected input-variable distributions are shown in figure 2. In this figure, as for all figures in this section, the MJ background is estimated as described in section 7.1, the corrections to the simulation explained in section 7.2 are applied, and the background normalisations and shapes are adjusted by the global fit of the MVA, as outlined at the beginning of section 7 and presented in more detail in section 9. Similarly good agreement is found for the correlations between pairs of input variables, as can be seen in figure 3.
The Toolkit for Multivariate Data Analysis, TMVA [71], is used to train the BDTs.
The values of the training parameters are found by determining the configuration with the best separation between signal and background in a coarsely binned multi-dimensional training-parameter space, followed by more finely grained one-dimensional scans of individual training parameters. To make use of the complete set of simulated MC events for the BDT training and evaluation in an unbiased way, the MC events are split into two samples of equal size, A and B. The performance of the BDTs trained on sample A (B) is evaluated with sample B (A), so that the same events are never used for both training and evaluation of the same BDT. Half of the data are analysed with the BDTs trained on sample A, and the other half with the BDTs trained on sample B. Finally, the output distributions of the BDTs trained on samples A and B are merged for both the simulated and data events. The values of the BDT outputs have no well-defined interpretation. A dedicated procedure is applied to transform the BDT-output distributions to obtain a smoother distribution for the background processes and a finer binning in the regions with the largest signal contribution, while at the same time preserving a sufficiently large number of background events in each bin. Starting from a very finely binned histogram of the BDT-output distribution, the procedure merges histogram bins, from high to low BDT-output values, until a certain requirement, based on the fractions of signal and background events in the merged bin, is satisfied. To limit the number of bins and to reduce the impact of statistical fluctuations, a further condition is that the statistical uncertainty of the expected total background contribution must be smaller than 10% in each merged bin. The free parameters of the transformation algorithm are optimised to maximise the expected signal sensitivity.
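The two-fold training scheme can be sketched as follows. The rule assigning an event to sample A or B (here, the parity of a hypothetical event number) is an assumption; the text only states that the MC events are split into two equal halves and that the data are likewise analysed half with each set of BDTs.

```python
from typing import Callable, Sequence

Classifier = Callable[[Sequence[float]], float]


def fold_of(event_number: int) -> str:
    """Hypothetical split: even event numbers form sample A, odd ones sample B."""
    return "A" if event_number % 2 == 0 else "B"


def bdt_output(event_number: int, features: Sequence[float],
               bdt_trained_on_a: Classifier, bdt_trained_on_b: Classifier) -> float:
    """Score an event with the BDT that did not see its fold during training."""
    if fold_of(event_number) == "A":
        return bdt_trained_on_b(features)
    return bdt_trained_on_a(features)
```

Applying `bdt_output` to every simulated and data event and filling a single histogram corresponds to merging the outputs of the A- and B-trained BDTs.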
For simplicity, these transformed outputs, which are used in the analysis, are called "BDT$_{VH}$ discriminants" in the following. The number of bins and the bin boundaries are also optimised in a similar way for the $m_{bb}$ distribution used in the dijet-mass analysis, where the free parameters of the transformation algorithm are optimised separately for the different analysis regions. The effect of the transformation on the BDT-output and dijet-mass distributions can be seen in figure 4 for the 1-lepton channel and one signal region. The transformation merges the $m_{bb}$ regions far from the signal, on both the low- and high-mass sides, into a few bins, while it expands the region close to the signal mass, where the signal-to-background ratio is largest. The effect on the BDT output is similar, but simpler to visualise because the signal and the background initially accumulate on the high and low sides of the distribution, respectively.
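The bin-merging step can be illustrated with a short sketch. The text does not specify the requirement on the signal and background fractions; here it is assumed to take the form $z_s\, s/S + z_b\, b/B > 1$, where $s$ and $b$ are the signal and background in the merged bin, $S$ and $B$ their totals, and $z_s$, $z_b$ are free parameters with illustrative values. The 10% limit on the relative statistical uncertainty of the background follows the text.

```python
import math
from typing import List, Sequence


def merge_bins(sig: Sequence[float], bkg: Sequence[float], bkg_err2: Sequence[float],
               z_s: float = 10.0, z_b: float = 5.0,
               max_rel_err: float = 0.10) -> List[int]:
    """Merge fine bins from high to low output and return the merged bin edges.

    sig, bkg : expected signal and background per fine bin (low to high output)
    bkg_err2 : sum of squared MC weights of the background per fine bin
    Returned edges are fine-bin indices; merged bin k spans [edges[k], edges[k+1]).
    """
    s_tot, b_tot = sum(sig), sum(bkg)
    edges = [len(sig)]  # upper edge of the highest merged bin
    s = b = err2 = 0.0
    for i in range(len(sig) - 1, -1, -1):
        s += sig[i]
        b += bkg[i]
        err2 += bkg_err2[i]
        fractions_ok = z_s * s / s_tot + z_b * b / b_tot > 1.0  # assumed criterion
        stats_ok = b > 0.0 and math.sqrt(err2) / b < max_rel_err
        if fractions_ok and stats_ok:
            edges.append(i)  # close the current merged bin at fine bin i
            s = b = err2 = 0.0
    if edges[-1] != 0:
        if len(edges) > 1:
            edges[-1] = 0  # fold leftover low-output bins into the last merged bin
        else:
            edges.append(0)  # no bin satisfied the criteria: keep a single bin
    return sorted(edges)


toy_sig = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 2.0, 4.0]
toy_bkg = [20.0, 20.0, 20.0, 10.0, 10.0, 10.0, 5.0, 3.0, 1.0, 1.0]
toy_err2 = [0.001 * b for b in toy_bkg]
edges = merge_bins(toy_sig, toy_bkg, toy_err2)
```

With these toy inputs, the signal-rich high-output region keeps narrow bins, while the background-dominated low-output region is merged into wide bins.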
Correlations between input variables and the BDT$_{VH}$ discriminant can provide information on the impact of individual variables on the classification. Figure 5 shows such correlations for the dijet mass, which is the BDT input that provides the best single-variable discriminating power.

This section describes the modelling of individual backgrounds. In many cases, the data constrain the normalisations and shapes better than the a priori estimates. A likelihood fit (also called "global fit") is used to extract simultaneously the signal yield and constraints on the background normalisations and shapes. The distributions used by the fit are those of the dijet mass or the BDT$_{VH}$ discriminant in the 2-tag signal regions, as appropriate, as well as those of the MV1c value of the b-tagged jet in the 1-tag control regions. More details are provided in section 9.
For the multijet (MJ) backgrounds, the normalisations and shapes provided as inputs to the fit are estimated from data, as explained below. For the other backgrounds, the inputs are taken from the simulation, except for the normalisations of the $V$+jets and $t\bar{t}$ backgrounds, which are left free to float in the fit. The corrections to these two backgrounds, described below, are applied prior to the fit.
In all distributions presented in this section, unless otherwise specified, the normalisations of the various backgrounds are those extracted from the global fit for the dijet-mass or multivariate analysis, as appropriate. The fit also adjusts the background shapes in those distributions within the constraints from the systematic uncertainties discussed in section 8.

Multijet background
Multijet events are produced with a very large cross section via the strong interaction, and therefore give rise to potentially large backgrounds. A first class of MJ background arises from jets or photon conversions misidentified as electrons, or from semileptonic heavy-flavour decays; the 1- and 2-lepton channels are especially sensitive to this class of background. Another class, which mostly affects the 0-lepton channel, arises from large fluctuations in jet energy measurements in the calorimeters, which create "fake" $E_{\mathrm{T}}^{\mathrm{miss}}$. These MJ backgrounds cannot be determined reliably from simulation, and are estimated from data in each of the 0-, 1- and 2-lepton channels, and in each of the 2- and 3-jet, 0-, 1- and 2-tag regions.
In the 0-lepton channel, the MJ background is estimated using an "ABCD method", in which the data are divided into four regions based on the $\min[\Delta\phi(E_{\mathrm{T}}^{\mathrm{miss}}$ […] above and below $\pi/2$ shows that these two variables are only weakly correlated, and this observation is confirmed in a multijet event sample simulated with pythia8. An MJ template in region A is obtained using events in region C after subtracting the contribution of other backgrounds, taken from simulation. The template is normalised by the ratio of the number of events in region B to that in region D, again after subtracting other backgrounds from those regions. The event populations in the various regions have low statistical precision after the 2-tag requirement. The b-tagging requirement is therefore dropped in regions B, C and D, and an additional b-tagging normalisation factor, taken as the fraction of 2-tag events in region D, is applied to the resulting template. The MJ background in the signal regions is found to amount to about 1% of the total background.
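The ABCD estimate above reduces to a few lines of arithmetic, $N_A(x) = N_C(x) \times (N_B/N_D) \times f_{2\mathrm{tag}}$, with the other backgrounds subtracted in each region. The sketch below assumes binned inputs; clipping negative bins to zero after the subtraction is an assumption, not something stated in the text.

```python
from typing import List, Sequence


def abcd_mj_template(c_data: Sequence[float], c_other: Sequence[float],
                     n_b_data: float, n_b_other: float,
                     n_d_data: float, n_d_other: float,
                     f_2tag_d: float) -> List[float]:
    """MJ template in region A from the shape in region C, normalised by B/D.

    Regions B, C and D are taken without the b-tagging requirement, so the result
    is scaled by the 2-tag fraction measured in region D.
    """
    ratio = (n_b_data - n_b_other) / (n_d_data - n_d_other)
    return [max(d - o, 0.0) * ratio * f_2tag_d for d, o in zip(c_data, c_other)]


# Toy two-bin example with illustrative event counts.
template_a = abcd_mj_template([100.0, 50.0], [10.0, 5.0],
                              n_b_data=220.0, n_b_other=20.0,
                              n_d_data=1050.0, n_d_other=50.0,
                              f_2tag_d=0.05)
```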
In the 1-lepton channel, the MJ background is determined separately for the electron and muon sub-channels. For each signal or control region, an MJ-background template is obtained in an MJ-dominated region after subtracting the small remaining contribution from the other backgrounds. The other backgrounds are taken from a simulation improved by scale factors for the various contributions obtained from a preliminary global fit. The MJ-dominated region is obtained by modifying the nominal selection to use medium, instead of tight, leptons and by loosening both the track- and calorimeter-based isolation criteria. The track-based isolation is changed from < 4% to the intervals 5%-12% and 7%-50% for electrons and muons, respectively, and the calorimeter-based isolation is loosened from < 4% to < 7%. However, the MJ templates contain few events in the 2-tag regions. Since the kinematic properties of the 1-tag and 2-tag events in the MJ-dominated regions are observed to be similar, 1-tag events are used to enrich the 2-tag MJ templates. Events in the 1-tag category are promoted to the 2-tag category by assigning to the untagged jet an emulated MV1c value drawn from the appropriate MV1c distribution observed in the corresponding 2-tag MJ template. This distribution depends on the rank (leading or sub-leading) of the untagged jet and on the MV1c value of the tagged jet. To correct for residual differences observed in some distributions between these pseudo-2-tag MJ events and the actual 2-tag MJ events, a reweighting is applied according to the MV1c value of the tagged jet and, for the electron sub-channel, according to $\Delta R(\mathrm{jet}_1,\mathrm{jet}_2)$ and $p_{\mathrm{T}}^{W}$. This procedure is applied in each of the 2- and 3-jet, LL, MM and TT categories.
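The promotion of 1-tag events can be sketched as sampling from a conditional distribution. In this illustration, the MV1c information is reduced to hypothetical operating-point labels ("L", "M", "T"), and the lookup table `toy_template` stands in for the distributions observed in the 2-tag MJ template; both are assumptions made for clarity, and the subsequent reweighting step is not shown.

```python
import random
from typing import Dict, Tuple

# Key: (rank of the untagged jet, MV1c label of the tagged jet).
# Value: relative frequency of each MV1c label for the other jet in the 2-tag MJ template.
Template = Dict[Tuple[str, str], Dict[str, float]]


def emulate_mv1c(untagged_rank: str, tagged_label: str,
                 template: Template, rng: random.Random) -> str:
    """Draw an emulated MV1c label for the untagged jet of a 1-tag MJ event."""
    distribution = template[(untagged_rank, tagged_label)]
    labels = list(distribution)
    weights = [distribution[label] for label in labels]
    return rng.choices(labels, weights=weights, k=1)[0]


toy_template: Template = {
    ("subleading", "T"): {"L": 0.5, "M": 0.3, "T": 0.2},
    ("leading", "M"): {"L": 0.0, "M": 0.0, "T": 1.0},
}
rng = random.Random(1)
```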
The normalisations of the MJ templates are then obtained from "multijet fits" to the $E_{\mathrm{T}}^{\mathrm{miss}}$ distributions in the 2- and 3-jet, 1- and 2-tag (LL, MM and TT combined) categories, with floating normalisations for the templates of the other background processes. The templates for these other background processes are taken from the improved simulation mentioned above.
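A minimal sketch of a template fit of this kind follows. It assumes a single MJ template, a single combined template for the other backgrounds, and a weighted least-squares (Neyman $\chi^2$) fit statistic. The text does not specify the fit statistic, and in the analysis the other backgrounds have several separately floating templates.

```python
from typing import Sequence, Tuple


def fit_two_templates(data: Sequence[float], t_mj: Sequence[float],
                      t_other: Sequence[float]) -> Tuple[float, float]:
    """Fit data ~ a*t_mj + b*t_other bin by bin, weighting each bin by 1/data."""
    s11 = s12 = s22 = y1 = y2 = 0.0
    for n, u, v in zip(data, t_mj, t_other):
        if n <= 0:
            continue  # empty bins carry no weight in a Neyman chi-square
        w = 1.0 / n
        s11 += w * u * u
        s12 += w * u * v
        s22 += w * v * v
        y1 += w * u * n
        y2 += w * v * n
    # Solve the 2x2 normal equations [[s11, s12], [s12, s22]] (a, b) = (y1, y2).
    det = s11 * s22 - s12 * s12
    a = (y1 * s22 - y2 * s12) / det
    b = (s11 * y2 - s12 * y1) / det
    return a, b


# Toy E_T^miss templates: MJ peaks at low values, the other backgrounds at high values.
toy_mj = [10.0, 5.0, 1.0, 0.0]
toy_other = [1.0, 5.0, 10.0, 20.0]
toy_data = [2.0 * u + 0.5 * v for u, v in zip(toy_mj, toy_other)]
a_mj, b_other = fit_two_templates(toy_data, toy_mj, toy_other)
```

Because the MJ and non-MJ templates have different $E_{\mathrm{T}}^{\mathrm{miss}}$ shapes, the fit can separate their normalisations.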
The MJ background in the 1-lepton channel is concentrated at low $p_{\mathrm{T}}^{W}$; in the 2-jet 2-tag sample with $p_{\mathrm{T}}^{W} < 120$ GeV, it ranges from 11% of the total background in the LL category to 6% in the TT category. The main purpose of including the $p_{\mathrm{T}}^{W} < 120$ GeV intervals is to provide constraints on the largest backgrounds ($V$+jets and $t\bar{t}$) in the global fit. Since for $p_{\mathrm{T}}^{W} < 120$ GeV the MJ background is twice as large in the 1-electron sub-channel as in the 1-muon sub-channel, only the 1-muon sub-channel is kept for $p_{\mathrm{T}}^{W} < 120$ GeV, so as to provide the most reliable constraints on the non-MJ backgrounds. The resulting loss in sensitivity is 0.6%. For $p_{\mathrm{T}}^{W} > 120$ GeV, the MJ background is much smaller: 4% and 2% of the total background in the LL and TT categories, respectively, for 2-jet events.
A template for the MJ background in the 2-electron sub-channel is obtained in a similar way, by loosening the identification and isolation requirements. The normalisation is performed by a fit to the dilepton-mass distribution, in which the $Z$+jets and MJ components are free parameters, while the other backgrounds (mostly $t\bar{t}$) are taken from the simulation. The MJ normalisation factors are found to be consistent in the 0-, 1- and 2-tag regions. To cope with the reduced size of the 2-tag MJ event sample, a procedure similar to that used in the 1-lepton channel is applied: the pretag MJ sample is weighted by its 2-tag fraction, and combinations of MV1c values are randomly assigned to the jets according to their distribution in the 2-tag MJ template. In the 2-muon sub-channel, the MJ background is found to be negligible from a comparison between data and MC prediction in the sidebands of the $Z$ mass peak. Altogether, the MJ background amounts to < 1% of the total background in the 2-lepton channel.

Corrections to the simulation
The large number of events in the 0-tag samples allows detailed investigations of the modelling of the $V$+jets backgrounds by the version of the sherpa generator used in this analysis. Since the search is performed in intervals of $p_{\mathrm{T}}^{V}$, with the higher intervals providing most of the sensitivity, an accurate modelling of the $p_{\mathrm{T}}^{V}$ distribution is important. Figure 6(a) shows that the $p_{\mathrm{T}}^{W}$ spectrum for $W$+jets production in the 1-muon sub-channel is softer in the data than in the simulation. This mismodelling is found to be strongly correlated with a mismodelling of the $\Delta\phi(\mathrm{jet}_1,\mathrm{jet}_2)$ distribution, shown in figure 7(a).
To address this mismodelling, the $Wl$ and $Wcl$ simulations are reweighted based on parameterised fits to the ratio of data to simulation in the $\Delta\phi(\mathrm{jet}_1,\mathrm{jet}_2)$ variable in the 0-tag region, where these backgrounds dominate. Four separate functions are derived: for the 2- and 3-jet categories and for $p_{\mathrm{T}}^{W}$ above and below 120 GeV. The reweighted $\Delta\phi(\mathrm{jet}_1,\mathrm{jet}_2)$ distributions show good agreement between data and simulation (figure 7(b)). This reweighting increases the normalisation of the $p_{\mathrm{T}}^{W} < 120$ GeV region by 0.7% and reduces that of the $p_{\mathrm{T}}^{W} > 120$ GeV region by 5.6%. After this reweighting, the modelling of the whole $p_{\mathrm{T}}^{W}$ distribution is greatly improved, as can be seen in figure 6(b). This reweighting also improves the modelling of other distributions, most notably the dijet mass. It also improves the modelling in the 1-tag control regions, and is therefore applied to the $Wl$ and $Wcl$ backgrounds in all regions of all channels. The numbers of $Wcc$ and $Wb$ background events in the 0- and 1-tag regions are too small to allow conclusive studies of their modelling, so no reweighting is applied to these backgrounds; instead, an associated systematic uncertainty is assessed, as explained in section 8.
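A minimal sketch of such a reweighting fits a function to the binned data-to-simulation ratio in $\Delta\phi(\mathrm{jet}_1,\mathrm{jet}_2)$ and uses it as a per-event weight. The linear functional form, the unweighted least-squares fit and the toy numbers are assumptions. In the analysis, four such functions are derived (2- and 3-jet, $p_{\mathrm{T}}^{W}$ above and below 120 GeV), which would correspond to four separate calls here.

```python
from typing import Callable, Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Unweighted least-squares fit y = a + b*x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def make_dphi_reweighter(bin_centres: Sequence[float],
                         data: Sequence[float],
                         mc: Sequence[float]) -> Callable[[float], float]:
    """Return an event-weight function w(dphi) derived from the data/MC ratio."""
    ratio = [d / m for d, m in zip(data, mc)]
    a, b = fit_line(bin_centres, ratio)
    return lambda dphi: a + b * dphi


# Toy 0-tag histograms (illustrative numbers only).
weight = make_dphi_reweighter([0.5, 1.5, 2.5], data=[110.0, 100.0, 90.0],
                              mc=[100.0, 100.0, 100.0])
```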
A similar, but not identical, procedure is used for the $Z$+jets events in the 2-lepton channel. A $\Delta\phi(\mathrm{jet}_1,\mathrm{jet}_2)$ reweighting is found to improve the modelling of the $p_{\mathrm{T}}^{Z}$ distribution in the 0-tag regions. In the signal-depleted 2-tag regions obtained by excluding the 100-150 GeV dijet-mass interval, there is no evidence of a need for a $\Delta\phi(\mathrm{jet}_1,\mathrm{jet}_2)$ correction, but the $p_{\mathrm{T}}^{Z}$ distribution is mismodelled. A dedicated $p_{\mathrm{T}}^{Z}$ reweighting is therefore determined in the 2-tag regions. Applying the $\Delta\phi(\mathrm{jet}_1,\mathrm{jet}_2)$ reweighting to the $Zl$ component and the $p_{\mathrm{T}}^{Z}$ reweighting to the $Zc$ and $Zb$ components also leads to good modelling in the 1-tag regions. This procedure is therefore used in all regions of all channels.

Figure 7. The $\Delta\phi(\mathrm{jet}_1,\mathrm{jet}_2)$ distribution observed in data (points with error bars) and expected (histograms) for the 2-jet 0-tag control region of the 1-muon sub-channel (MVA selection), (a) before and (b) after reweighting. All $p_{\mathrm{T}}^{W}$ intervals are combined. The multijet and simulated-background normalisations are provided by the multijet fits. The size of the statistical uncertainty is indicated by the shaded band. The data-to-background ratio is shown in the lower panel.
It has been observed in an unfolded measurement of the $p_{\mathrm{T}}$ distribution of top quarks from pair production that the powheg generator interfaced to pythia predicts too hard a spectrum [73]. A correction accounting for this discrepancy is therefore applied at the level of the generated top quarks in the $t\bar{t}$ production process.

Distributions in the dijet-mass analysis
Distributions of $p_{\mathrm{T}}^{V}$ and of the dijet mass are shown in figure 8 and in figures 9 and 10, respectively, for a selection of 2-tag signal regions of the dijet-mass analysis.
The background composition in the signal regions varies greatly from channel to channel, with the $p_{\mathrm{T}}^{V}$ interval, with the jet multiplicity, and with the b-tagging category considered. The signal-to-background ratio is larger in the 2-jet and tighter b-tagging categories, and smaller in the 3-jet and loose b-tagging categories.
In the 2-lepton channel, the dominant background is always $Zb\bar{b}$. There is also a significant contribution from $t\bar{t}$ in the lower $p_{\mathrm{T}}^{Z}$ intervals, and the relative diboson contribution increases with $p_{\mathrm{T}}^{Z}$. In the 2-jet samples of the 1-lepton channel, the combination of $Wb\bar{b}$ and $t\bar{t}$ accounts for most of the background in the most sensitive MM and TT categories, with the relative contributions of $Wb\bar{b}$ and dibosons being largest in the tighter b-tagging categories and increasing with $p_{\mathrm{T}}^{W}$. The flavours of the two selected jets from $t\bar{t}$ depend on the reconstructed $p_{\mathrm{T}}^{W}$ interval. In particular, at high $p_{\mathrm{T}}^{W}$, when the $b$-quark and the $W$ boson from a top-quark decay are collimated, there is a large $bc$ contribution, where the $c$-quark comes from the $W\to cs$ decay. A significant contribution from single-top-quark production processes is also seen. In the 3-jet category, the $t\bar{t}$ contribution is in general dominant, but there are significant contributions from single-top-quark production (mostly in the $Wt$ channel) and from $Wb\bar{b}$, the latter increasing with $p_{\mathrm{T}}^{W}$. A non-negligible MJ background contribution can be seen in the lowest $p_{\mathrm{T}}^{W}$ intervals of the 2-jet category. In the 0-lepton channel, the main backgrounds arise from $Zb\bar{b}$ and $t\bar{t}$, but the $Wb\bar{b}$ background is also significant. The relative $t\bar{t}$ contribution is largest in the lowest $p_{\mathrm{T}}^{V}$ intervals, and larger in the 3-jet than in the 2-jet category.
The variations in the background composition between categories allow the global fit to disentangle the rates of the various background sources. The non-negligible contributions from the $Vcl$ and, to a lesser extent, the $Vl$ backgrounds are constrained in the global fit by the LL b-tagging categories, and also by the MV1c distributions of the b-tagged jet in the 1-tag control regions. The 0-tag control regions are not included in the global fit; they are mainly used to improve the modelling of the $V$+jets backgrounds, as explained in section 7.2.

Distributions in the multivariate analysis
Distributions of the BDT$_{VH}$ discriminants of the MVA are shown in figures 11 to 13 for 2-tag signal regions in the 2- and 3-jet categories of the 0-, 1- and 2-lepton channels. The backgrounds dominated by light jets and, to a lesser extent, $c$-jets accumulate at lower values of the BDT$_{VH}$ discriminants, owing to the inclusion of the MV1c information as input to the BDTs. The composition of the dominant backgrounds accumulating at higher values of the BDT$_{VH}$ discriminant is similar to that already observed in the 2-tag signal regions of the dijet-mass analysis, namely $Vb\bar{b}$ and $t\bar{t}$, but with a larger contribution of the latter due to the looser requirement on $\Delta R(\mathrm{jet}_1,\mathrm{jet}_2)$ in the MVA selection.
Distributions of the output of the MV1c b-tagging algorithm are shown in figure 14 for the b-tagged jet in the 1-tag control regions of the MVA, in the 2-jet category and for $p_{\mathrm{T}}^{V} > 120$ GeV. In these distributions, the four bins correspond to the four b-tagging operating points and are ordered from left to right in increasing b-jet purity. These distributions, which are used in the global fit, provide strong constraints on the $Vc$ and $Vl$ backgrounds. As in the dijet-mass analysis, the 0-tag control regions are not used in the global fit.

8 Systematic uncertainties
The systematic uncertainties discussed in this section are: those of experimental origin; those related to the multijet background estimation; and those associated with the modelling of the simulated backgrounds and Higgs boson signal.

Experimental uncertainties
All relevant experimental systematic uncertainties are considered, such as those affecting the trigger selection, the object reconstruction and identification, and the object energy and momentum calibrations and resolutions. The most relevant ones are discussed in the following.
For the $E_{\mathrm{T}}^{\mathrm{miss}}$ trigger, an efficiency correction is derived from $W\to\mu\nu$+jets and $Z\to\mu^{+}\mu^{-}$+jets events. This correction amounts to 4.5% for events with an $E_{\mathrm{T}}^{\mathrm{miss}}$ of 100 GeV, the threshold required in the analysis, and is below 1% for $E_{\mathrm{T}}^{\mathrm{miss}} > 120$ GeV. The associated uncertainties arise from the statistical uncertainties of this method and from differences observed between the two event classes. They are very small (below 1%) for the high-$E_{\mathrm{T}}^{\mathrm{miss}}$ (and thus high-$p_{\mathrm{T}}^{V}$) intervals, and reach about 3% for the low-$E_{\mathrm{T}}^{\mathrm{miss}}$ interval of the 0-lepton channel (100-120 GeV).
For electrons and muons, uncertainties associated with the corrections for the trigger, reconstruction, identification and isolation efficiencies are taken into account. Uncertainties on energy and resolution corrections of the leptons are also considered. The impact of these uncertainties is very small, typically less than 1%.
Several sources contribute to the uncertainty of the jet energy scale (JES) [61], related, for example, to in situ calibration analyses, pile-up-dependent corrections and the flavour composition of jets in different event classes. After being decomposed into uncorrelated components, these are treated as independent sources in the analysis. The total relative systematic uncertainty on the JES ranges from about 3% for central jets with a $p_{\mathrm{T}}$ of 20 GeV to about 1% for central jets with a $p_{\mathrm{T}}$ of 1 TeV. An additional specific uncertainty of about 1%-2% affects the energy calibration of b-jets. Small uncertainties on the corrections applied to improve the dijet-mass resolution are also included. Corrections and uncertainties are also considered for the jet energy resolution (JER) [74], with a separate contribution for b-jets. The total relative systematic uncertainty on the JER ranges from about 10% to 20%, depending on the $\eta$ range, for jets with $p_{\mathrm{T}} = 20$ GeV, to less than 5% for jets with […]. The JES uncertainties are propagated to the $E_{\mathrm{T}}^{\mathrm{miss}}$, as are the much smaller uncertainties related to the energy and momentum calibration of leptons. An uncertainty on the $E_{\mathrm{T}}^{\mathrm{miss}}$ also comes from the uncertainties on the energy calibration (8%) and resolution (2.5%) of calorimeter energy clusters not associated with any reconstructed object [66].
The b-tagging efficiencies for the different jet flavours are measured in both data and simulation using dedicated event samples [63,64]. The b-tagging efficiencies for simulated jets are corrected within intervals between operating points by MC-to-data SFs, which depend on the jet kinematics. For b-jets, the precision is driven by an analysis of $t\bar{t}$ events in final states containing two leptons. The MC-to-data SFs are close to unity, with uncertainties at the level of 2-3% over most of the jet $p_{\mathrm{T}}$ range, reaching 5% for $p_{\mathrm{T}} = 20$ GeV and 8% above 200 GeV. The uncertainties, which depend on $p_{\mathrm{T}}$ and on the interval between operating points, are decomposed into uncorrelated components, and the ten most significant ones are kept in the analysis. It was checked that the neglected components have a negligible impact. The uncertainties for c-jets are decomposed into 15 components, and those for light jets, to which the analysis is much less sensitive, are decomposed into ten components, accounted for in $p_{\mathrm{T}}$ and $\eta$ ranges. For b- and c-jets, further uncertainties are added for the application of the additional MC-to-MC SFs used to obtain generator-specific MC-to-data SFs, as explained in section 4. Half of the correction is used as the systematic uncertainty. As discussed in section 4, a correction for c-jets in the $Vcc$ samples, for which parameterised tagging is used, is applied at low $\Delta R$ to the closest jet. Half of this correction is assigned as a systematic uncertainty.
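The text does not say how the uncertainties are decomposed into uncorrelated components. A common approach, assumed here, is an eigenvector decomposition of the covariance matrix of the scale factors across kinematic intervals, keeping the components with the largest eigenvalues. The sketch uses NumPy and a toy covariance matrix.

```python
import numpy as np


def uncorrelated_components(cov: np.ndarray, n_keep: int) -> np.ndarray:
    """Return the n_keep leading uncorrelated variations of a covariance matrix.

    Column k is sqrt(lambda_k) * v_k, so summing c c^T over all columns
    reproduces the covariance matrix.
    """
    eigvals, eigvecs = np.linalg.eigh(cov)        # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]             # largest variance first
    eigvals = np.clip(eigvals[order], 0.0, None)  # guard against tiny negative round-off
    components = eigvecs[:, order] * np.sqrt(eigvals)
    return components[:, :n_keep]


# Toy covariance of scale factors in three p_T intervals (illustrative values).
toy_cov = np.array([[4.0, 1.0, 0.0],
                    [1.0, 2.0, 0.5],
                    [0.0, 0.5, 1.0]])
all_comp = uncorrelated_components(toy_cov, 3)
leading = uncorrelated_components(toy_cov, 2)
```

Each column can then be treated as an independent nuisance parameter; dropping the trailing columns discards the components with the smallest variance.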
The uncertainty on the integrated luminosity is 2.8%. It is derived, following the same methodology as that described in ref. [20], from a preliminary calibration of the luminosity scale derived from beam-separation scans performed in November 2012. It is applied to the signal and background estimates that are taken from simulation. A 4% uncertainty on the average number of interactions per bunch crossing is also taken into account.

Uncertainties on the multijet backgrounds
In the 0-lepton channel, the robustness of the MJ background estimation is assessed by varying the $\min[\Delta\phi(E_{\mathrm{T}}^{\mathrm{miss}},\mathrm{jet})]$ values defining the B and D regions of the ABCD method, and by replacing the b-tagging fractions measured in region D by those measured in region B. A systematic uncertainty of 100% is assigned to this small (about 1%) background, uncorrelated between the 2- and 3-jet and the 1- and 2-b-tag categories. The MJ background in the 2-lepton channel is also at the per-cent level, and an uncertainty of 100% is assigned.
In the 1-lepton channel, normalisation uncertainties arise from the statistical uncertainties of the multijet fits and from uncertainties on the non-MJ background subtractions performed to construct the MJ templates. Normalisation uncertainties are also assessed in the LL, MM and TT categories to cover differences between multijet fits performed inclusively in the 2-tag regions and in the individual categories. In the 2-jet 2-tag region of the electron sub-channel, the overall normalisation uncertainties amount to 11%, 14% and 22% in the LL, MM and TT categories, respectively. In the muon sub-channel, the corresponding uncertainties are about three times larger because of the smaller size of the MJ-enriched samples.
In the 1-lepton channel, shape uncertainties are assessed in the various regions by comparing evaluations obtained using MJ-enriched samples defined by isolation requirements different from those applied in the nominal selections. In the electron sub-channel, an alternative template is constructed with a track-based isolation in the 12% to 50% interval, and another with a calorimeter-based isolation in the 0% to 4% interval. In the muon sub-channel, the results obtained with the nominal MJ template are compared with those obtained with tighter or looser isolation requirements, defined by track-based isolation intervals of 7%-9.5% and 9.5%-50%, respectively. Furthermore, half of the $\Delta R(\mathrm{jet}_1,\mathrm{jet}_2)$ and $p_{\mathrm{T}}^{W}$ reweightings mentioned in section 7.1 for the electron sub-channel is taken as a systematic uncertainty.

Uncertainties on the modelling of the simulated backgrounds
The physics-modelling systematic uncertainties evaluated focus on the quantities that are used in the global fit, i.e. those affecting the jet multiplicities, the $p_{\mathrm{T}}^{V}$ distributions, the flavour composition and the $m_{bb}$ distributions. For the MVA, systematic uncertainties affecting the other variables used as inputs to the BDTs are also considered. Whenever possible, dedicated control regions are used to extract information directly from the data. This is the case for $Z$+jets and $W$+light jets. In other cases, uncertainties are assessed by comparing the nominal MC predictions with those of a variety of other generators.
Details of the assessment of systematic uncertainties are provided below in the context of the MVA. When systematic uncertainties are derived from a comparison between generators, all relevant variables are considered independently. The variable showing the largest discrepancy between any alternative generator and the nominal generator is assigned an uncertainty covering this discrepancy, which is symmetrised. If, once propagated to the BDT$_{VH}$ discriminant, this uncertainty covers all variations observed with the different generators, it is considered sufficient. If not, an uncertainty is also assigned to the next most discrepant variable, and the procedure is iterated until all variations of the BDT$_{VH}$ discriminant are covered by the assigned uncertainties.
A given source of systematic uncertainty can affect different analysis regions. Whether such an uncertainty should be treated as correlated or not depends on whether constraints resulting from the global fit should be propagated from one region to another. Details of the procedures leading to such decisions are provided in section 9.2.
A summary of the systematic uncertainties affecting the modelling of the backgrounds can be found in table 5.
Top-quark-pair background: As explained in section 7, the top-quark $p_{\mathrm{T}}$ distribution is reweighted at generator level to bring it into agreement with the measurement [73]. A systematic uncertainty amounting to half of this correction is assigned, correlated across channels.
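Assigning half of a correction as an uncertainty, as done here and for several corrections above, can be expressed per event as up and down weight variations around the corrected (nominal) weight. The symmetric convention below is an assumption made for illustration.

```python
from typing import Tuple


def half_correction_variations(correction: float) -> Tuple[float, float, float]:
    """Return (nominal, up, down) weights for a multiplicative correction.

    The nominal weight is the full correction; the uncertainty is taken as half
    of its deviation from unity, applied symmetrically around the nominal value.
    """
    delta = 0.5 * (correction - 1.0)
    return correction, correction + delta, correction - delta


# Example: an event whose generator-level correction weight is 0.9.
nominal, up, down = half_correction_variations(0.9)
```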
The predictions of the nominal $t\bar{t}$ generator (powheg+pythia) are compared, focussing on the 1-lepton channel selection, with those obtained using a variety of generators differing in the PDF choice (powheg+pythia with HERAPDF [75]), in the parton showering and hadronisation scheme (powheg+herwig), in the implementation of the NLO matrix element and the matching scheme (mc@nlo [76]+herwig), in the amount of initial- and final-state radiation (ISR/FSR) using AcerMC+pythia, or in the implementation of higher-order tree-level matrix elements (alpgen [77]+pythia). In general, the largest deviations are found for alpgen, which is therefore used to assess further systematic uncertainties, as explained below.
In the global fit, the normalisation of the $t\bar{t}$ background in the 2-jet category is left to float freely, independently in each of the lepton channels. An uncertainty of 20% on the 3-to-2-jet ratio is estimated from the generator comparisons explained above. In the global fit, this uncertainty is treated as correlated between the 0- and 1-lepton channels, and uncorrelated with the 2-lepton channel.
The shape of the $m_{bb}$ distribution is also studied with the same set of generators. This leads to shape uncertainties that are correlated between 2- and 3-jet events, and between the $p_{\mathrm{T}}^{V} < 120$ GeV and $p_{\mathrm{T}}^{V} > 120$ GeV intervals. The associated variation is larger in the higher $p_{\mathrm{T}}^{V}$ interval. For 2-jet events, a variation that increases the distribution by 3% at $m_{bb} = 50$ GeV decreases it by 1% at 200 GeV. The effect is similar, but of opposite sign, for 3-jet events.
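Shape uncertainties in this section are quoted as a pair of anchor points, for example "+3% at 50 GeV, −1% at 200 GeV". A minimal sketch of how such a variation could be turned into per-bin factors is shown below. The text does not specify how the variation behaves between or beyond the anchor points, so linear behaviour in $m_{bb}$ is an assumption here. The helper name `linear_shape_factor` and its `theta` argument are hypothetical.

```python
def linear_shape_factor(x, x_lo, delta_lo, x_hi, delta_hi, theta=1.0):
    """Per-bin multiplicative factor for a shape variation assumed linear in x.

    delta_lo, delta_hi: relative changes at the anchor points x_lo and x_hi for
    the +1 sigma variation. theta scales the variation: theta = -1 gives the
    mirrored variation (e.g. the opposite-sign effect quoted for 3-jet events).
    """
    slope = (delta_hi - delta_lo) / (x_hi - x_lo)
    delta = delta_lo + slope * (x - x_lo)
    return 1.0 + theta * delta


# ttbar 2-jet example from the text: +3% at m_bb = 50 GeV, -1% at 200 GeV.
for m_bb in (50.0, 125.0, 200.0):
    print(m_bb, round(linear_shape_factor(m_bb, 50.0, 0.03, 200.0, -0.01), 4))
```

The same helper applies to the other two-point shape uncertainties quoted below, such as the $Z$+jets and $W$+hf $m_{bb}$ variations, under the same linearity assumption.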
The same procedure is applied to the $p_{\mathrm{T}}^{V}$ distribution, from which a 7.5% uncertainty is assessed on the normalisation of the $p_{\mathrm{T}}^{V} > 120$ GeV interval. Finally, the same approach calls for a shape systematic uncertainty on the $E_{\mathrm{T}}^{\mathrm{miss}}$ distribution in the 1-lepton channel. This uncertainty differs between $p_{\mathrm{T}}^{V} < 120$ GeV and $p_{\mathrm{T}}^{V} > 120$ GeV but is correlated between them. It is not applied in the 0- and 2-lepton channels.
Single-top-quark background: The theoretical uncertainties on the cross sections of the three processes contributing to single-top production are 4%, 4% and 7% for the $s$-channel, the $t$-channel and $Wt$ production, respectively [78].
The predictions of the nominal generators (powheg+pythia for the $s$-channel and for $Wt$ production; AcerMC+pythia for the $t$-channel) are compared, after the 1-lepton channel selection, with those of a variety of generators. For the $s$-channel, the comparison is made with AcerMC and mc@nlo. For $Wt$ production, it is made with AcerMC, powheg+herwig and mc@nlo. For the $t$-channel, it is made with amc@nlo$^{12}$ [81,82]+herwig. For all three processes, the impact of ISR/FSR is evaluated using AcerMC. For $Wt$ production, interference effects with $t\bar{t}$ production must be considered. Two methods are available for this: the Diagram Removal (DR) and the Diagram Subtraction (DS) schemes [83]. The former is used in the nominal generation, and the latter for comparison.
Uncertainties on the acceptance for each of the three processes are taken as the largest deviations observed, separately for $p_{\mathrm{T}}^{V} < 120$ GeV and $p_{\mathrm{T}}^{V} > 120$ GeV, and for 2- and 3-jet events. They reach 52% for 2-jet events in the $t$-channel at low $p_{\mathrm{T}}^{V}$. They are of the order of 5% for $Wt$ production (except for 3-jet events at high $p_{\mathrm{T}}^{V}$: 15%), and typically 20% for the $s$-channel.
In addition to the acceptance uncertainties, the effects of the model variations described above on the variables input to the BDT are evaluated. Three shape systematic uncertainties are found to be needed for $Wt$ production. The first is on the shape of the $m_{bb}$ distribution in the high-$p_{\mathrm{T}}^{V}$ interval for 2-jet events: when a shift from the nominal model increases the rate by 20% at $m_{bb} = 50$ GeV, it decreases it by 40% at 200 GeV. The second is on the $m_{bb}$ shape for 3-jet events, where the corresponding shifts are 25% and 20%. The third is on the $p_{\mathrm{T}}$ distribution of the second-leading jet in the low-$p_{\mathrm{T}}^{V}$ interval for 2-jet events.

Z+jets background: As explained in section 7, $\Delta\phi(\mathrm{jet}_1,\mathrm{jet}_2)$ and $p_{\mathrm{T}}^{Z}$ reweightings are applied to the $Zl$ and $Zc+Zb$ components, respectively. For the $\Delta\phi(\mathrm{jet}_1,\mathrm{jet}_2)$ reweighting, a systematic uncertainty amounting to half of the correction is assigned to the $Zl$ component. An uncertainty amounting to the full correction is assigned to the $Zc+Zb$ components. This is done separately for 2- and 3-jet events, and all these uncertainties are treated as uncorrelated. For the $p_{\mathrm{T}}^{Z}$ reweighting, uncorrelated systematic uncertainties of half the correction are assigned to the $Zl$ and $Zc+Zb$ components. The notation $Zc+Zb$ indicates that a systematic uncertainty is treated as correlated between the $Zc$ and $Zb$ components.
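The reweighting uncertainties above take two forms. One is "half of the correction" on a component that is reweighted. The other is "the full correction" on a component that is not reweighted. A minimal per-event sketch of both is shown below. Treating the up and down variations as symmetric around the nominal weight is an assumption, and the function name is hypothetical.

```python
def reweighting_variations(correction, applied, fraction):
    """Return (nominal, up, down) per-event weights for a reweighting systematic.

    correction: data-derived correction factor for this event (1.0 = no change)
    applied:    True if the nominal prediction includes the correction
    fraction:   fraction of the correction taken as the uncertainty
                (0.5 for "half of the correction", 1.0 for "the full correction")
    Assumption: the variation is symmetric around the nominal weight.
    """
    nominal = correction if applied else 1.0
    delta = fraction * (correction - 1.0)
    return nominal, nominal + delta, nominal - delta


# Zl: Delta-phi correction applied, half of it as uncertainty.
print(reweighting_variations(1.2, applied=True, fraction=0.5))
# Zc+Zb: Delta-phi correction not applied, full correction as uncertainty.
print(reweighting_variations(1.2, applied=False, fraction=1.0))
```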
The normalisation and the 3-to-2-jet ratio for the $Zl$ background are determined from data in the 0-tag region of the 2-lepton channel, both with an uncertainty of 5%. The normalisations of the $Zcl$ and $Zbb$ backgrounds are left free in the global fit. The uncertainties on the 3-to-2-jet ratios for the $Zcl$ and $Z$+hf components are assessed by comparing alpgen with the nominal sherpa generator in the 2-tag region of the 2-lepton channel. They are 26% for $Zcl$ and 20% for $Z$+hf. The same procedure is used to estimate uncertainties on the flavour fractions within $Z$+hf events, yielding 12% for each of $bl/bb$, $cc/bb$ and $bc/bb$, with $bl/bb$ uncorrelated between the 2- and 3-jet samples.
The shape of the $m_{bb}$ distribution is compared between data and simulation in the 2-tag region of the 2-lepton channel, excluding the 100–150 GeV range. From this comparison, a shape uncertainty is derived that decreases the dijet-mass distribution by 5% at 200 GeV when it increases it by 3% at 50 GeV. This uncertainty is applied, uncorrelated, to the $Zl$ and $Zc+Zb$ components. It also covers the differences between alpgen and sherpa.
W+jets background: As explained in section 7, a $\Delta\phi(\mathrm{jet}_1,\mathrm{jet}_2)$ reweighting is applied to the $Wl$ and $Wcl$ components. Uncorrelated systematic uncertainties amounting to half of the correction are assigned to these two components, for each of the 2- and 3-jet categories. No reweighting is applied to the $Wcc+Wb$ component. Instead, it is assigned a systematic uncertainty equal to the full correction applied to the $Wl$ and $Wcl$ components, uncorrelated between 2- and 3-jet events.
The normalisation and the 3-to-2-jet ratio for the $Wl$ background are taken directly from simulation, both with a 10% uncertainty. This is based on the agreement observed between data and prediction in the 0-tag sample. The 3-to-2-jet ratio for the $Wcl$ background is also assigned an uncertainty of 10%. The normalisations of the $Wcl$ and $Wbb$ backgrounds are left free in the global fit.
Dedicated control regions for the $Wbb$ background are not available in the data. To assign further uncertainties on it, extensive comparisons are performed at generator level, with kinematic selections mimicking those applied after reconstruction. The predictions of the sherpa generator are compared to those of powheg+pythia8, amc@nlo+herwig++ [84] and alpgen+herwig. Comparisons are also made between samples generated with amc@nlo with the renormalisation ($\mu_R$) and factorisation ($\mu_F$) scales$^{13}$ independently modified by factors of 2 or 0.5, and with different PDF sets (CT10, MSTW2008NLO and NNPDF2.3 [85]). As a result, a 10% uncertainty is assigned to the 3-to-2-jet ratio, taken as correlated between all $W$+hf processes.

Shape uncertainties are also assessed for the $m_{bb}$ and $p_{\mathrm{T}}^{W}$ distributions. When the former increases the dijet-mass distribution by 23% at 50 GeV, it decreases it by 28% at 200 GeV. It is taken as uncorrelated for $Wl$, $Wcl$, $Wbb+Wcc$ and $Wbl+Wbc$. For $Wbb+Wcc$, it is furthermore uncorrelated among $p_{\mathrm{T}}^{W}$ intervals (with the three highest intervals correlated for the dijet-mass analysis). When the latter increases the $p_{\mathrm{T}}^{W}$ distribution by 9% at $p_{\mathrm{T}}^{W} = 50$ GeV, it decreases it by 23% at 200 GeV. It is taken as correlated for all $W$+hf processes, and uncorrelated between the 2- and 3-jet samples.

$^{13}$The nominal scales are taken as $\mu_R = \mu_F = [m^2$ …
Predictions for the inclusive production of all flavours by sherpa and alpgen$^{14}$ are compared after full reconstruction and event selection. This comparison is used to assign uncertainties on the flavour fractions that properly account for heavy-flavour production at both the matrix-element and parton-shower levels. The following uncertainties are assigned in the $W$+hf samples: 35% for $bl/bb$ and 12% for each of $bc/bb$ and $cc/bb$. The uncertainty on $bl/bb$ is uncorrelated between $p_{\mathrm{T}}^{W}$ intervals (with the three highest intervals correlated for the dijet-mass analysis).

Diboson background: The scale uncertainties are evaluated by varying $\mu_R$ and $\mu_F$ simultaneously by factors of 2 or 0.5. The analysis is performed in $p_{\mathrm{T}}^{V}$ intervals and in exclusive 2- and 3-jet categories. The uncertainties are therefore evaluated for each channel separately in those intervals and categories (2 and 3 final-state partons within the nominal selection acceptance), following the prescription of ref. [86]. In each $p_{\mathrm{T}}^{V}$ interval, this procedure leads to two uncorrelated uncertainties in the 2-jet category: one for 2+3 jets inclusively, and one associated with the removal of 3-jet events. It also leads to one uncertainty in the 3-jet category, anti-correlated with the latter uncertainty in the 2-jet category. These uncertainties are largest at high $p_{\mathrm{T}}^{V}$. For $p_{\mathrm{T}}^{V} > 200$ GeV, the two uncertainties affecting the 2-jet category can be as large as 29% and 22% in the $WZ$ channel, roughly half this size in the $ZZ$ channel, and intermediate for $WW$. The uncertainty affecting the 3-jet category is about 17% in all channels.
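The jet-category split described above can be sketched as follows. The details of the prescription of ref. [86] are not given here, so this is only an illustration of the structure stated in the text. There is one uncertainty on the 2+3-jet inclusive prediction and one on the 3-jet part removed to form the 2-jet category, and the latter enters the 2-jet and 3-jet categories with opposite signs. All function names and numbers are hypothetical.

```python
def scale_delta(nominal, variations):
    """Largest absolute deviation of the scale-varied predictions from the nominal one."""
    return max(abs(v - nominal) for v in variations)


def jet_bin_uncertainties(sigma_23, sigma_3, delta_23, delta_3):
    """Split scale uncertainties into exclusive 2-jet and 3-jet categories (sketch).

    sigma_23: prediction for 2+3 jets inclusively
    sigma_3:  prediction for exactly 3 jets (removed to form the 2-jet category)
    delta_23, delta_3: absolute scale uncertainties on sigma_23 and sigma_3,
                       treated as uncorrelated with each other.
    Returns signed relative uncertainties. The two 'migration' entries belong
    to the same nuisance parameter and are therefore anti-correlated.
    """
    sigma_2 = sigma_23 - sigma_3
    return {
        "2jet_inclusive": delta_23 / sigma_2,
        "2jet_migration": -delta_3 / sigma_2,
        "3jet_migration": delta_3 / sigma_3,
    }


# Hypothetical numbers: scale variations of the 2+3-jet and 3-jet predictions.
d23 = scale_delta(100.0, [104.0, 97.0])
d3 = scale_delta(30.0, [33.0, 28.0])
print(jet_bin_uncertainties(100.0, 30.0, d23, d3))
```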
The uncertainties due to the PDF choice are evaluated according to the PDF4LHC recommendation [87], i.e., using the envelope of the predictions from the CT10, MSTW2008NLO and NNPDF2.3 PDF sets and their associated uncertainties. They range from 2% to 4%, with no $p_{\mathrm{T}}^{V}$ dependence observed. The reconstructed $Z\to b\bar{b}$ lineshape in $VZ$ production is affected by the parton-shower and hadronisation model. A shape-only systematic uncertainty is assessed by comparing the lineshapes obtained with the nominal powheg+pythia8 generator and with herwig. The relative difference between the shapes is 20% for a dijet mass around 125 GeV.
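The PDF envelope mentioned above can be sketched as follows. The envelope spans the predictions of several PDF sets, each extended by its own uncertainty. Its midpoint is taken as the central value and its half-width as the symmetric PDF uncertainty. This sketch assumes that midpoint/half-width reading of the PDF4LHC recommendation. The numbers in the test are hypothetical.

```python
def pdf_envelope(predictions):
    """Envelope of several PDF-set predictions including their own uncertainties.

    predictions: {pdf_set: (central, err_up, err_down)} with positive errors.
    Returns (midpoint, half_width): the envelope midpoint and its half-width,
    used here as the central value and the symmetric PDF uncertainty.
    """
    upper = max(c + up for c, up, _ in predictions.values())
    lower = min(c - down for c, _, down in predictions.values())
    return 0.5 * (upper + lower), 0.5 * (upper - lower)


mid, half = pdf_envelope({
    "CT10": (1.00, 0.02, 0.02),
    "MSTW2008NLO": (1.01, 0.02, 0.03),
    "NNPDF2.3": (0.99, 0.03, 0.01),
})
print(mid, half, half / mid)  # relative PDF uncertainty of about 2.5%
```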

Uncertainties on the signal modelling
The $qq\to WH$, $qq\to ZH$ and $gg\to ZH$ signal samples are normalised to their respective inclusive cross sections, as explained in section 3. The uncertainties on these cross sections [88] include those arising from the choice of the scales $\mu_R$ and $\mu_F$ and of the PDFs.
The scale uncertainty is 1% for $WH$ production. It is larger (3%) for $ZH$ production, due to the contribution of the gluon-gluon initiated process. Under the assumption that the scale uncertainties are similar (1%) for $qq\to WH$ and $qq\to ZH$, a conservative uncertainty of 50% is inferred for $gg\to ZH$. The same procedure leads to PDF uncertainties of 2.4% for $qq\to (W/Z)H$ and 17% for $gg\to ZH$. The relative uncertainty on the Higgs boson branching ratio to $b\bar{b}$ is 3.3% for $m_H = 125$ GeV [11]. Decays to final states other than $b\bar{b}$ are verified to contribute less than 1% after selection.
Acceptance uncertainties due to the choice of scales are determined from signal samples generated with powheg interfaced to pythia8, with $\mu_R$ and $\mu_F$ varied independently by factors of 2 or 0.5. The procedure advocated in ref. [86] is applied after kinematic selections at generator level. This leads to acceptance uncertainties of 3.0%, 3.4% and 1.5% for $qq\to WH$, $qq\to ZH$ and $gg\to ZH$, respectively, for the 2- and 3-jet categories combined, and of 4.2%, 3.6% and 3.3% for the 3-jet category. The latter uncertainty is anti-correlated with an acceptance uncertainty associated with the removal of 3-jet events from the 2+3-jet category to form the 2-jet category. In addition, the $p_{\mathrm{T}}^{V}$ spectrum is seen to be affected, and shape uncertainties are derived. For the $qq\to (W/Z)H$ samples, when they increase the distribution by 1% at $p_{\mathrm{T}}^{V} = 50$ GeV, they decrease it by 3% at 200 GeV. These variations are 2% and 8%, respectively, for the $gg\to ZH$ samples.
Acceptance uncertainties due to the PDF choice are determined in a similar way, following the PDF4LHC prescription. They range from 2% in the 2-jet $gg\to ZH$ samples to 5% in the 3-jet $qq\to ZH$ samples. There is no evidence of a need for $p_{\mathrm{T}}^{V}$ shape uncertainties related to the PDFs.
The uncertainties applied to the shape of the $p_{\mathrm{T}}^{V}$ spectrum to account for the NLO electroweak corrections [38] are typically at the level of 2%. They increase with $p_{\mathrm{T}}^{V}$ to reach 2.5% in the highest $p_{\mathrm{T}}^{V}$ interval. The effect of the underlying-event modelling, evaluated using various pythia tunes, is found to be negligible. The effect of the parton-shower modelling is examined by comparing simulations by powheg interfaced with pythia8 and with herwig. Acceptance variations of 8% are seen, except for 3-jet events in the $p_{\mathrm{T}}^{V} > 120$ GeV interval, where the variation is at the level of 13%. These variations are taken as systematic uncertainties.
A summary of the systematic uncertainties affecting the modelling of the Higgs boson signal is given in table 5.

Table 5. Summary of the systematic uncertainties on the signal and background modelling. An "S" symbol is used when only a shape uncertainty is assessed.
9 Statistical procedure

General aspects
A statistical fitting procedure based on the RooStats framework [89,90] is used to extract the signal strength from the data. The signal strength is a parameter, $\mu$, that multiplies the SM Higgs boson production cross section times branching ratio into $b\bar{b}$. A binned likelihood function is constructed as the product of Poisson probability terms over the bins of the input distributions. Each term involves the number of data events and the expected signal and background yields, taking into account the effects of the floating background normalisations and of the systematic uncertainties. The different regions entering the likelihood fit are summarised in table 6.

In the dijet-mass analysis, the inputs to the "global fit" are the $m_{bb}$ distributions in the 81 2-tag signal regions. These are defined by three channels (0, 1 or 2 leptons), up to five $p_{\mathrm{T}}^{V}$ intervals, two number-of-jet categories (2 or 3), and three $b$-tagging categories (LL, MM and TT). Here and in the rest of this section, $m_{bb}$ distributions are to be understood as transformed distributions, as explained in section 6. In the MVA, the inputs are the BDT$_{VH}$ discriminants in the 24 2-tag signal regions. These are defined by the three lepton channels, up to two $p_{\mathrm{T}}^{V}$ intervals, the two number-of-jet categories, and the $b$-tagging categories. In the 1-lepton channel, the $b$-tagging categories are LL, MM and TT. In the 0- and 2-lepton channels, they are the LL category and a combined MM and TT category (MM+TT).$^{15}$

The impact of systematic uncertainties on the signal and background expectations is described by nuisance parameters (NPs), $\theta$. These are constrained by Gaussian or log-normal probability density functions. The log-normal form is used for normalisation uncertainties, to prevent normalisation factors from becoming negative in the fit. The expected numbers of signal and background events in each bin are functions of $\theta$.
The parameterisation of each NP is chosen such that the predicted signal and background yields in each bin are log-normally distributed for a normally distributed θ. For each NP, the prior is added as a penalty term to the likelihood, L(µ, θ), which decreases it as soon as θ is shifted away from its nominal value. The statistical uncertainties of background predictions from simulation are included through bin-by-bin nuisance parameters.
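A minimal sketch of such a binned likelihood is shown below, written as a negative log-likelihood. Each NP multiplies a yield by $(1+\kappa)^{\theta}$, which is log-normal for a unit-Gaussian $\theta$, and adds a $\theta^2/2$ penalty. The sketch is deliberately simplified: the bin-by-bin MC-statistics NPs and the free-floating normalisation factors are omitted. The function names are hypothetical and are not the RooStats API.

```python
import math


def expected_yields(mu, thetas, sig, bkg, kappa_sig, kappa_bkg):
    """Per-bin expected yield mu*s + b, with a log-normal response to each NP.

    kappa_sig[j][i], kappa_bkg[j][i]: relative +1 sigma effect of NP j in bin i.
    A yield scaled by (1 + kappa)**theta is log-normally distributed when theta
    is normally distributed, and it can never become negative.
    """
    nu = []
    for i in range(len(sig)):
        s, b = sig[i], bkg[i]
        for j, theta in enumerate(thetas):
            s *= (1.0 + kappa_sig[j][i]) ** theta
            b *= (1.0 + kappa_bkg[j][i]) ** theta
        nu.append(mu * s + b)
    return nu


def nll(mu, thetas, n_obs, sig, bkg, kappa_sig, kappa_bkg):
    """-ln L(mu, theta): Poisson terms over the bins plus a unit-Gaussian penalty per NP."""
    nu = expected_yields(mu, thetas, sig, bkg, kappa_sig, kappa_bkg)
    poisson = sum(v - n * math.log(v) + math.lgamma(n + 1) for n, v in zip(n_obs, nu))
    penalty = 0.5 * sum(t * t for t in thetas)
    return poisson + penalty
```

Minimising `nll` over `mu` and `thetas` gives the best-fit values. Shifting a NP away from zero raises the penalty term, which is the behaviour described above.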

The test statistic $q_\mu$ is then constructed from the profile likelihood ratio:

$$q_\mu = -2\ln\Lambda_\mu, \qquad \Lambda_\mu = \frac{L(\mu, \hat{\hat{\theta}}_\mu)}{L(\hat{\mu}, \hat{\theta})},$$

where $\hat{\mu}$ and $\hat{\theta}$ are the parameters that maximise the likelihood with the constraint $0 \le \hat{\mu} \le \mu$, and $\hat{\hat{\theta}}_\mu$ are the nuisance-parameter values that maximise the likelihood for a given $\mu$. This test statistic is used for exclusion intervals derived with the CL$_s$ method [91,92]. To measure the compatibility of the background-only hypothesis with the observed data, the test statistic used is $q_0 = -2\ln\Lambda_0$. The results are presented in terms of: the 95% confidence level (CL) upper limit on the signal strength; the probability $p_0$ of the background-only hypothesis; and the best-fit signal-strength value $\hat{\mu}$ with its associated uncertainty $\sigma_\mu$. The fitted $\hat{\mu}$ value is obtained by maximising the likelihood function with respect to all parameters. The uncertainty $\sigma_\mu$ is obtained from the variation of $2\ln\Lambda_\mu$ by one unit, where $\Lambda_\mu$ is now defined without the constraint $0 \le \hat{\mu} \le \mu$. Expected results are obtained in the same way as the observed results. The data in each input bin are replaced by the expectation from simulation, with all NPs set to their best-fit values as obtained from the fit to the data.$^{16}$

While the analysis is optimised for a Higgs boson mass of 125 GeV, results are also extracted for other masses. These are obtained without any change to the dijet-mass analysis, except for the binning of the transformed $m_{bb}$ distribution, which is reoptimised. For the MVA, the performance degrades for masses away from 125 GeV, the mass for which the BDTs are trained, largely because $m_{bb}$ is an input to the BDTs. The MVA results for other masses are therefore obtained using BDTs retrained for each mass tested, at 5 GeV intervals between 100 and 150 GeV.
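To make the definitions of $q_\mu$ and $q_0$ concrete, the sketch below evaluates them for a single-bin counting experiment with expectation $\mu s + b$ and no nuisance parameters, so the profiling over $\theta$ drops out. It also uses the asymptotic relation $Z = \sqrt{q_0}$ for the discovery significance. This toy only illustrates the constraint $0 \le \hat{\mu} \le \mu$ and is not the full fit. The function names are hypothetical.

```python
import math


def _nll(n, nu):
    """-ln of a Poisson probability, dropping the n!-term that cancels in ratios."""
    return nu - n * math.log(nu)


def q0_counting(n, s, b):
    """q0 = -2 ln Lambda(0) for one bin with expectation mu*s + b and no NPs."""
    mu_hat = max((n - b) / s, 0.0)  # only upward fluctuations count as signal-like
    return 2.0 * (_nll(n, b) - _nll(n, mu_hat * s + b))


def q_mu_counting(mu, n, s, b):
    """q_mu = -2 ln Lambda(mu) with the constraint 0 <= mu_hat <= mu."""
    mu_hat = min(max((n - b) / s, 0.0), mu)
    return 2.0 * (_nll(n, mu * s + b) - _nll(n, mu_hat * s + b))


# Asymptotically, the discovery significance is Z = sqrt(q0).
print(math.sqrt(q0_counting(10, 5.0, 5.0)))
```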
The details provided in the rest of this section refer to the analysis performed for a Higgs boson mass of 125 GeV.

Table 7. Factors applied to the nominal normalisations of the $t\bar{t}$, $Wbb$, $Wcl$, $Zbb$ and $Zcl$ backgrounds, as obtained from the global MVA fit to the 8 TeV data. The $t\bar{t}$ background is normalised in the 2-jet category independently in each of the lepton channels. The errors include the statistical and systematic uncertainties.
Process: scale factor
$t\bar{t}$ (0-lepton): 1.36 ± 0.14
$t\bar{t}$ (1-lepton): 1.12 ± 0.09
$t\bar{t}$ (2-lepton): 0.99 ± 0.04
$Wbb$: 0.83 ± 0.15
$Wcl$: 1.14 ± 0.10
$Zbb$: 1.09 ± 0.05
$Zcl$: 0.88 ± 0.12

Technical details
The data have sufficient statistical power to constrain the largest background-normalisation NPs, which are left free to float in the fit. This applies to the $t\bar{t}$, $Wbb$, $Wcl$, $Zbb$ and $Zcl$ processes. The corresponding factors applied to the nominal background normalisations, as obtained from the global fit of the MVA to the 8 TeV data, are shown in table 7. As stated in section 8, the $t\bar{t}$ background is normalised in the 2-jet category independently in each of the lepton channels. The normalisations are uncorrelated between the three lepton channels because the regions of phase space probed in the 2-jet category are very different in each.

In the 2-lepton channel, the $t\bar{t}$ background is almost entirely due to events in which both top quarks decay into $(W\to\ell\nu)b$ (fully leptonic decays), with all final-state objects detected apart from the neutrinos. In the 1-lepton channel, it arises partly from fully leptonic decays with one of the leptons (often a $\tau$ lepton) undetected. It also arises from semileptonic decays, where one top quark decays as above and the other into $(W\to qq')b$, with a missed light-quark jet. Finally, in the 0-lepton channel, the main contributions are from fully leptonic decays with the two leptons undetected, and from semileptonic decays with a missed lepton and a missed light-quark jet; here again, the missed leptons are often $\tau$ leptons. Furthermore, the $p_{\mathrm{T}}^{V}$ range probed is different in the 0-lepton channel: $p_{\mathrm{T}}^{V} > 100$ GeV, whereas the 1- and 2-lepton channels are inclusive in $p_{\mathrm{T}}^{V}$.
As described in detail in section 8, a large number of sources of systematic uncertainty are considered. The number of nuisance parameters is even larger, because the impact of a single source is decorrelated, where appropriate, across background processes or across regions accessing very different parts of phase space. This avoids unduly propagating constraints.

For instance, the $t\bar{t}$ background contributes quite differently in the 2-tag 3-jet regions of the 0- and 1-lepton channels on the one hand, and of the 2-lepton channel on the other. In the 0- and 1-lepton channels, a jet from a $t\to b(W\to qq')$ decay is likely to be missed, while in the 2-lepton channel an ISR or FSR jet is likely to be selected. This is why the systematic uncertainty on the 3-to-2-jet ratio for the $t\bar{t}$ background is not correlated between these two sets of lepton channels.

Another example is the $\Delta\phi$ reweighting of the $W$+jets processes, which is derived in the 0-tag sample and applied to the $Wcl$ and $Wl$ backgrounds. As explained in section 7, this reweighting is not applied to the $Wcc$ and $Wb$ backgrounds. In the absence of further information, an uncertainty is nevertheless assessed for the $\Delta\phi$ distributions of the $Wcc$ and $Wb$ backgrounds, uncorrelated with the uncertainty applied to the $Wcl$ and $Wl$ backgrounds. Altogether, the fit has to handle almost 170 NPs, roughly half of which are of experimental origin.
The fit uses templates constructed from the predicted yields for the signal and the various backgrounds in the bins of the input distribution in each region. The systematic uncertainties are encoded in templates of variations relative to the nominal template, one for each up and down (±1σ) variation. The limited size of the MC samples for some simulated background processes in some regions can cause large local fluctuations in these systematic-variation templates.

When a systematic variation amounts to a reweighting of the nominal template, no statistical fluctuations are expected beyond those already present in the nominal template. This is the case, for instance, for the $b$-tagging uncertainties, and no specific action is taken for them. On the other hand, some systematic variations can change the set of selected events, as is the case for instance with the JES uncertainties. These can introduce additional statistical fluctuations into the systematic-variation templates. In such cases, a smoothing procedure is applied to each systematic-variation template in each region. Bins are merged subject to two constraints. The statistical uncertainty in each bin must be less than 5%. The shapes of the systematic-variation templates must remain physical: monotonic for a BDT$_{VH}$ discriminant, and with at most one local extremum for a dijet mass.
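The bin-merging step of the smoothing procedure can be sketched as follows. Adjacent bins are merged from left to right until each merged bin has a relative MC statistical uncertainty below 5%, and a helper checks the monotonicity condition required for a BDT$_{VH}$ discriminant. The merging direction, the use of $\sqrt{\sum w^2}/\sum w$ as the statistical uncertainty, and all function names are assumptions. The actual procedure also merges further until the shape condition holds, which this sketch only verifies.

```python
import math


def merge_for_precision(sumw, sumw2, max_rel_err=0.05):
    """Greedy left-to-right merging of adjacent bins.

    sumw, sumw2: per-bin sum of MC weights and sum of squared weights.
    A group is closed once sqrt(sumw2) / sumw < max_rel_err. Leftover bins at
    the end are folded into the last closed group. Returns (start, stop) ranges.
    """
    groups, start, w, w2 = [], 0, 0.0, 0.0
    for i, (a, a2) in enumerate(zip(sumw, sumw2)):
        w += a
        w2 += a2
        if w > 0 and math.sqrt(w2) / w < max_rel_err:
            groups.append((start, i + 1))
            start, w, w2 = i + 1, 0.0, 0.0
    if start < len(sumw):
        if groups:
            groups[-1] = (groups[-1][0], len(sumw))
        else:
            groups.append((start, len(sumw)))
    return groups


def rebin(values, groups):
    """Sum per-bin values within each merged group."""
    return [sum(values[a:b]) for a, b in groups]


def is_monotonic(ratios):
    """Check the 'physical shape' condition used for a BDT discriminant."""
    pairs = list(zip(ratios, ratios[1:]))
    return all(x <= y for x, y in pairs) or all(x >= y for x, y in pairs)
```

In use, `rebin` would be applied to both the nominal and the varied template, and `is_monotonic` to their bin-by-bin ratio.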
Altogether, given the number of regions and NPs, the number of systematic-variation template pairs (+1σ and −1σ) is close to twenty thousand, which makes the fits highly time-consuming. To address this issue, systematic uncertainties that have a negligible impact on the final results are pruned away, region by region. A normalisation (shape) uncertainty is dropped if the associated template variation is below 0.5% (below 0.5% in all bins). Additional pruning criteria are applied in regions where the signal contribution is less than 2% of the total background: there, variations that change the total background prediction by less than 0.5% are dropped. Furthermore, shape uncertainties are dropped if the up- and down-varied shapes are more similar to each other than to the nominal shape. This is only done for those systematic uncertainties for which opposite-sign variations are expected. This procedure reduces the number of systematic-variation templates by a factor of two.
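The pruning criteria can be expressed as simple predicates, as in the sketch below. How "more similar to each other than to the nominal shape" is measured is not specified in the text. Here it is assumed to mean the sum of squared differences between unit-area templates. The reading of the low-signal-region criterion is likewise an assumption, and all names are hypothetical.

```python
def keep_normalisation(rel_variation, threshold=0.005):
    """Keep a normalisation uncertainty only if its effect reaches 0.5%."""
    return abs(rel_variation) >= threshold


def _unit_area(template):
    total = sum(template)
    return [x / total for x in template]


def _dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def keep_shape(nominal, up, down, expect_opposite=True, threshold=0.005):
    """Drop the uncertainty if every bin of both variations is within 0.5% of the
    nominal template, or (when opposite-sign variations are expected) if the up
    and down shapes are closer to each other than either is to the nominal."""
    rel = [v / n - 1.0 for n, u, d in zip(nominal, up, down) for v in (u, d)]
    if all(abs(r) < threshold for r in rel):
        return False
    if expect_opposite:
        nom, u, d = _unit_area(nominal), _unit_area(up), _unit_area(down)
        if _dist2(u, d) < min(_dist2(u, nom), _dist2(d, nom)):
            return False
    return True


def keep_in_low_signal_region(signal, total_bkg, bkg_shift):
    """In regions with S/B < 2%, drop variations that move the total background
    by less than 0.5% (assumed reading of the additional criterion)."""
    if signal < 0.02 * total_bkg:
        return abs(bkg_shift) >= 0.005 * total_bkg
    return True
```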
The behaviour of the global fit is evaluated by a number of checks, including how much each NP is pulled away from its nominal value, how much its uncertainty is reduced with respect to its nominal uncertainty, and which correlations develop between initially uncorrelated systematic uncertainties. To assess these effects, comparisons are made between the expectations from simulation and the observations in the data. When differences arise, their source is investigated, and this leads in a number of cases to uncorrelating further systematic uncertainties by means of additional NPs. This is to prevent a constraint from being propagated from one kinematic region to another if this is not considered well motivated. The stability of the results is also tested by performing fits for each lepton channel independently, which can also help to identify from which region each constraint originates.
It is particularly useful to understand which systematic uncertainties have the largest impact on the final results, since these should be considered with greater care. For this purpose, a ranking of the NPs is established. For each systematic uncertainty, the fit is performed again with the corresponding NP fixed to its fitted value, $\hat{\theta}$, shifted up or down by its fitted uncertainty. All the other parameters are allowed to vary, so as to properly take into account the correlations between systematic uncertainties. The magnitude of the shift in the fitted signal strength $\hat{\mu}$ is a measure of the observed impact of the considered NP. The same procedure is repeated using the nominal values of the NP and of its associated uncertainty, to provide its expected impact.

To reduce the computation time, and thereby enable more detailed fit studies, some of the NPs that have a negligible effect on the expected fitted uncertainty on $\hat{\mu}$ are dropped. These are: those associated with the muon momentum scale and resolution and with the electron energy resolution; one of those associated with the jet energy scale; and those associated with the quark-gluon composition of the backgrounds. The last of these turn out to be fully correlated with those associated with the difference in energy response between quark and gluon jets.

The ranking of the systematic uncertainties obtained with the MVA applied to the 8 TeV data is shown in figure 15, with the NPs ordered by decreasing post-fit impact on $\hat{\mu}$. The five systematic uncertainties with the largest impact are, in descending order, those on:
- the dijet-mass shape for the $Wbb$ and $Wcc$ backgrounds for $p_{\mathrm{T}}^{W} > 120$ GeV;
- the $Wbl/Wbb$ normalisation ratio for $p_{\mathrm{T}}^{W} > 120$ GeV;
- the $Wbb$ background normalisation;
- the $p_{\mathrm{T}}^{W}$ shape in the 3-jet category for the $W$+hf background;
- the signal acceptance due to the parton-shower modelling.
Since the same data sample is used for both the dijet-mass analysis and the MVA, the consistency of the two final results, i.e., the two fitted signal strengths, is assessed using the "bootstrap" method [93]. A large number of event samples are randomly extracted from the simulated samples, with the signal strength $\mu$ set to unity, the SM value. Each of them is representative of the integrated luminosity used for the data analysis, in terms of both the expected yields and the associated Poisson fluctuations. Each of these event samples is subjected to both the dijet-mass analysis and the MVA. This allows the two fitted $\hat{\mu}$ values to be compared and their statistical correlation to be extracted. At the same time, the expected distributions of $\hat{\mu}$ and of its uncertainty are determined for both the dijet-mass analysis and the MVA.
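The bootstrap comparison can be sketched with Poisson resampling of simulated events. Each event enters a pseudo-dataset $k \sim \mathrm{Poisson}(1)$ times, and two estimators of $\mu$ are evaluated on every pseudo-dataset to measure their correlation. This is one common way to implement such a bootstrap; it is not necessarily the method of ref. [93]. The two "analyses" here are hypothetical counting estimators standing in for the dijet-mass and MVA fits, and each simulated event is assumed to carry unit weight at the target luminosity.

```python
import math
import random


def poisson(lam, rng):
    """Knuth's multiplication method; adequate for small lam."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def bootstrap_correlation(events, n_toys=500, seed=1):
    """events: list of (is_signal, selected_by_A, selected_by_B), one entry per
    expected event at the target luminosity (unit weights, mu = 1)."""
    rng = random.Random(seed)

    def expected(idx, signal):
        return sum(1 for e in events if e[idx] and e[0] == signal)

    s_a, b_a = expected(1, True), expected(1, False)
    s_b, b_b = expected(2, True), expected(2, False)
    mu_a, mu_b = [], []
    for _ in range(n_toys):
        n_a = n_b = 0
        for _sig, in_a, in_b in events:
            k = poisson(1.0, rng)  # this event appears k times in the pseudo-dataset
            n_a += k * in_a
            n_b += k * in_b
        # Hypothetical stand-ins for the two fits: simple counting estimators of mu.
        mu_a.append((n_a - b_a) / s_a)
        mu_b.append((n_b - b_b) / s_b)
    return pearson(mu_a, mu_b), mu_a, mu_b
```

The lists `mu_a` and `mu_b` also provide the expected spread of $\hat{\mu}$ for each estimator, which parallels the last sentence of the paragraph above.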

Cross checks using diboson production
Diboson production with a $Z$ boson decaying to a pair of $b$-quarks, in association with either a $W$ or a $Z$ boson, has a signature very similar to the one considered in this analysis. It has, however, a softer $p_{\mathrm{T}}^{bb}$ spectrum and an $m_{bb}$ distribution peaking at lower values. Its cross section is about five times larger than that of the SM Higgs boson with a mass of 125 GeV. Diboson production is therefore used to validate the analysis procedure. For the dijet-mass analysis, the binning of the transformed $m_{bb}$ distribution is reoptimised for the $Z$ boson mass. For the MVA, the BDTs are retrained to discriminate the diboson signal from all backgrounds (including the Higgs boson). So-called "$VZ$ fits" are performed, in which the normalisation of the diboson contributions is allowed to vary with a multiplicative scale factor $\mu_{VZ}$ with respect to the SM expectation. The exception is the small contribution from $WW$ production, which is treated as a background and constrained within its uncertainty. An SM Higgs boson with $m_H = 125$ GeV is included as a background, with a production cross section at the SM value and an uncertainty of 50%.

For the 7 TeV dataset, only a dijet-mass analysis is performed. It is similar but not identical to the corresponding analysis for the 8 TeV data, since some of the object reconstruction tools, such as the simultaneous use of multiple b-tagging operating points, are not available for the 7 TeV data. In this section, the main differences between the two analyses are summarised.

Object reconstruction
The three categories of electrons are selected according to the loose, medium, and tight criteria defined in ref. [53]. The transverse energy threshold for loose electrons is set at 10 GeV, instead of 7 GeV. For tight electrons and muons, the calorimeter isolation requirement is loosened from 4% to 7%. The procedure used to avoid double-counting of reconstructed muon and jet objects removes muons separated by ∆R < 0.4 from any jet, irrespective of the multiplicity of tracks associated with the jet. For jets, the global sequential calibration is not used and the requirement on the fraction of track p T carried by tracks originating from the primary vertex is raised from 50% to 75%. The b-tagging algorithm used is MV1 [94][95][96][97] instead of MV1c, with a single operating point to define b-tagged jets corresponding to an efficiency of 70%.

Event selection
The selection criteria are those used in the dijet-mass analysis of the 8 TeV data, with the following differences. With only one $b$-tagging operating point, a single 2-tag category is defined. In the 0-lepton channel, the 100–120 GeV $p_{\mathrm{T}}^{V}$ interval is not used, and the criterion on $p_{\mathrm{T}}^{\mathrm{jet}_i}$ is not applied. In the 1-muon sub-channel, the $E_{\mathrm{T}}^{\mathrm{miss}}$ trigger is used only in the 2-jet 2-tag category for $p_{\mathrm{T}}^{W} > 160$ GeV, and the events selected only by the $E_{\mathrm{T}}^{\mathrm{miss}}$ trigger constitute distinct signal regions. In the 1-lepton channel, $m_{\mathrm{T}}^{W} > 40$ GeV is required for $p_{\mathrm{T}}^{W} < 160$ GeV. There is no requirement on $H_{\mathrm{T}}$, but $E_{\mathrm{T}}^{\mathrm{miss}} > 25$ GeV is imposed for $p_{\mathrm{T}}^{W} < 200$ GeV. In the 2-lepton channel, no kinematic fit is performed. Events with leptons of different flavours are used to define a 2-tag $t\bar{t}$-dominated $e$-$\mu$ control region in the 2-lepton channel. This region is inclusive in jet multiplicity ($\geq 2$ jets).

Background composition and modelling
The templates used to model the MJ background in the 1-lepton channel are obtained by inverting the track-based isolation criterion. The normalisations are determined from the $m_{\mathrm{T}}^{W}$ and $E_{\mathrm{T}}^{\mathrm{miss}}$ distributions in the electron and muon sub-channels, respectively.
Corrections to the simulation of the $V$+jets backgrounds are determined in the 1- and 2-lepton 0-tag samples, inclusively in $p_{\mathrm{T}}^{V}$. They are applied as $\Delta\phi(\mathrm{jet}_1,\mathrm{jet}_2)$ reweightings to the $W$+jets and $Z$+jets components in all channels.

Systematic uncertainties
The differences with respect to the 8 TeV data analysis arise mainly from experimental systematic uncertainties. Many of them are evaluated using independent data samples (7 TeV data vs. 8 TeV data), e.g., $E_{\mathrm{T}}^{\mathrm{miss}}$ trigger efficiencies or JES. Others refer to different identification algorithms, e.g., electron identification or $b$-tagging. The uncertainty on the integrated luminosity is 1.8% for the 2011 dataset [20].
The uncertainties affecting the signal and background simulation are estimated in a similar way to those for the 8 TeV data, i.e., from comparisons between the baseline and alternative generators. For $V$+jets, the $Vbc$ and $Vbb$ backgrounds are merged into a single component. For dibosons, the baseline generator is herwig instead of pythia8. Systematic uncertainties on the 3-to-2-jet ratios and on the $p_{\mathrm{T}}^{V}$ distributions are estimated at generator level for the different diboson processes, by comparison with mcfm at NLO. For the signal, the $gg\to ZH$ samples are generated with pythia8 instead of powheg. For all processes, the alternative generators used are pythia6 and herwig.
Due to these differences, and because the phase space within which the systematic uncertainties are evaluated is more restricted than for the MVA applied to the 8 TeV data, all systematic uncertainties, except for the theoretical uncertainties on the signal, are treated as uncorrelated between the analyses of the 7 TeV and 8 TeV data in the global fit to the combined dataset, in which the MVA is used for the 8 TeV data.

Statistical procedure
The inputs to the likelihood fits are the $m_{bb}$ distributions (not transformed) in the 28 $p_{\mathrm{T}}^{V}$ intervals of the 2-tag signal regions. Additional inputs are the event yields in the five $p_{\mathrm{T}}^{V}$ intervals of the 2-tag $e$-$\mu$ control region and in the 26 $p_{\mathrm{T}}^{V}$ intervals of the 1-tag control regions. For the $t\bar{t}$ background, the global fit determines a single floating normalisation, instead of one in each of the 0-, 1- and 2-lepton channels. In addition to the other floating normalisations mentioned for the 8 TeV data analysis, the MJ background normalisation is also left freely floating in all regions of the 1-lepton channel. The exception is the 2-tag 3-jet regions, where the statistical power of the data is not sufficient to provide a reliable constraint. In these regions, an uncertainty of 30% is assigned to the MJ background normalisation, using a method similar to that used for the analysis of the 8 TeV data.

As explained in section 9, the results are obtained from maximum-likelihood fits to the data, where the inputs are the distributions of the final discriminants in the 2-tag signal regions and the MV1c distributions of the $b$-tagged jet in the 1-tag control regions, with nuisance parameters either floating or constrained by priors. The final discriminants are the transformed $m_{bb}$ for the dijet-mass analysis and the BDT$_{VH}$ discriminant for the MVA.
Results are extracted independently for the dijet-mass and multivariate analyses. Since the MVA has better expected sensitivity to a Higgs boson signal, it is used for the nominal results, while the dijet-mass analysis provides a cross-check (cf. section 11.2). For the 7 TeV data, however, only a dijet-mass analysis is performed. Unless otherwise specified, all results refer to a Higgs boson mass of 125 GeV. In the following, the fitted signal-strength parameters are simply denoted $\mu$ and $\mu_{VZ}$, rather than $\hat{\mu}$ and $\hat{\mu}_{VZ}$.

Nominal results
The nominal results are obtained from global fits using the MVA for the 8 TeV data and the dijet-mass analysis of the 7 TeV data.
Distributions of the BDT$_{VH}$ discriminant and of MV1c, with background normalisations and nuisance parameters adjusted by the global fit to the 8 TeV data, were already presented in section 7.4. Dijet-mass distributions in the 7 TeV data analysis were shown in section 10. Agreement between data and estimated background is observed within the uncertainties shown by the hatched bands. Figure 18 shows the 95% CL upper limits on the cross section times branching ratio for $pp \to (W/Z)(H \to b\bar{b})$ in the Higgs boson mass range 110-140 GeV. The observed limit for $m_H = 125$ GeV is 1.2 times the SM value, compared with an expected limit of 0.8 in the absence of signal. For the 8 TeV (7 TeV) data only, the observed and expected limits are 1.4 (2.3) and 0.8 (3.2), respectively.
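Under strong simplifying assumptions, the combined limits can be roughly cross-checked by hand. The sketch below assumes the CLs prescription commonly used at the LHC, treats the combined signal strength as a single Gaussian measurement $\hat{\mu} \pm \sigma$ with statistical and systematic uncertainties added in quadrature, and ignores the difference between 125 and 125.36 GeV; the function name `cls_upper_limit` is hypothetical. With $\hat{\mu} = 0.52$ and $\sigma = 0.40$ taken from the summary, it gives about 1.2 for the observed limit and about 0.78 for the median expected limit at $\hat{\mu} = 0$.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sigma 1


def cls_upper_limit(mu_hat, sigma, alpha=0.05):
    """Upper limit on mu at (1 - alpha) CL with the CLs method, for a single
    Gaussian measurement mu_hat +/- sigma (a strong simplification).

    CLs+b = Phi((mu_hat - mu) / sigma) and CLb = Phi(mu_hat / sigma);
    solving CLs+b / CLb = alpha for mu gives the expression returned below.
    """
    clb = STD_NORMAL.cdf(mu_hat / sigma)
    return mu_hat + sigma * STD_NORMAL.inv_cdf(1.0 - alpha * clb)


# Total uncertainty on the combined mu, stat. and syst. added in quadrature: 0.40.
sigma = (0.32 ** 2 + 0.24 ** 2) ** 0.5
observed_limit = cls_upper_limit(0.52, sigma)  # about 1.20
expected_limit = cls_upper_limit(0.0, sigma)   # median under background only: about 0.78
print(f"observed {observed_limit:.1f}, expected {expected_limit:.2f}")
```

For $\hat{\mu} = 0$ the expression reduces to the familiar $1.96\,\sigma$. The agreement with the quoted 1.2 and 0.8 is only approximate, since the actual limits come from the full profile-likelihood fit.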
The probability $p_0$ of obtaining from background alone a result at least as signal-like as the observation is 8% for a tested Higgs boson mass of 125 GeV; in the presence of a Higgs boson with that mass and the SM signal strength, the expected $p_0$ value is 0.5%. This corresponds to an excess observed with a significance of 1.4σ, compared with an expected significance of 2.6σ. For the 8 TeV data alone, the observed and expected significances are 1.7σ and 2.5σ, respectively. For the 7 TeV data alone, the expected significance is 0.7σ, and the data show a deficit rather than an excess, as can be seen in figure 17. Figure 19 […] For the 8 TeV data, the fitted value of the signal-strength parameter is $\mu = 0.65 \pm 0.32 \mathrm{(stat.)} \pm 0.26 \mathrm{(syst.)}$. For the 7 TeV data, it is $\mu = -1.6 \pm 1.2 \mathrm{(stat.)} \pm 0.9 \mathrm{(syst.)}$.
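The quoted $p_0$ values and significances are related by the usual one-sided Gaussian convention $Z = \Phi^{-1}(1 - p_0)$, where $\Phi$ is the standard normal cumulative distribution. A minimal sketch using only the Python standard library (the function names are illustrative):

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def significance_from_p0(p0):
    """One-sided significance Z with P(x > Z) = p0 for a standard normal x."""
    return STD_NORMAL.inv_cdf(1.0 - p0)


def p0_from_significance(z):
    """Inverse conversion: one-sided upper-tail probability beyond Z."""
    return 1.0 - STD_NORMAL.cdf(z)


print(round(significance_from_p0(0.08), 1))   # 1.4 (observed)
print(round(significance_from_p0(0.005), 1))  # 2.6 (expected)
print(f"{p0_from_significance(2.6):.4f}")     # 0.0047
```

The quoted probabilities are rounded, which is why converting 2.6σ back gives about 0.47% rather than exactly 0.5%.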
Fits are also performed in which the signal strengths are floated independently for (i) the $WH$ and $ZH$ production processes, or (ii) the three lepton channels. The results of these fits are shown in figures 21 and 22, respectively. The fitted signal strengths for the $WH$ and $ZH$ processes are consistent at the level of 20%. For the lepton channels, the consistency between the three fitted signal strengths is at the level of 72% for the 7 TeV data and 8% for the 8 TeV data. The low fitted signal strengths for the $ZH$ process and in the 0-lepton channel are associated with the data deficit observed in the most sensitive bins of the BDT$_{VH}$ discriminant in the 0-lepton channel, shown in figure 11(a).
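The "level of consistency" quoted for several fitted signal strengths is a p-value for the hypothesis that they share a common value. The quoted numbers come from the full fit; as a simplified illustration only, the sketch below assumes independent, symmetric Gaussian uncertainties, computes $\chi^2$ about the weighted mean, and converts it to a p-value with $N-1$ degrees of freedom using closed-form $\chi^2$ tail expressions. The helper names and the inputs in the usage lines are hypothetical, not the fitted values of figures 21 and 22.

```python
import math


def chi2_survival(x, dof):
    """Upper-tail probability P(chi2 > x) for an integer number of degrees of freedom >= 1."""
    half = x / 2.0
    if dof % 2 == 0:
        # Even dof: exp(-x/2) * sum_{i < dof/2} (x/2)^i / i!
        return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(dof // 2))
    # Odd dof: erfc(sqrt(x/2)) plus a finite sum of half-integer terms.
    tail = math.erfc(math.sqrt(half))
    tail += math.exp(-half) * sum(
        half ** (i - 0.5) / math.gamma(i + 0.5) for i in range(1, (dof - 1) // 2 + 1)
    )
    return tail


def compatibility(measurements):
    """p-value that independent Gaussian measurements (value, sigma) share one common value."""
    weights = [1.0 / sigma ** 2 for _, sigma in measurements]
    mean = sum(w * value for w, (value, _) in zip(weights, measurements)) / sum(weights)
    chi2 = sum(((value - mean) / sigma) ** 2 for value, sigma in measurements)
    return chi2_survival(chi2, len(measurements) - 1)


# Hypothetical per-channel signal strengths (value, total uncertainty), not the paper's fit results.
channels = [(0.0, 1.0), (2.0, 1.0)]
print(f"{compatibility(channels):.3f}")  # 0.157
```

Correlated uncertainties between channels, which the full fit accounts for, would require replacing the diagonal weights with the inverse covariance matrix.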

Cross-check with the dijet-mass analysis
The $m_{bb}$ distributions in the dijet-mass analysis, with background normalisations and nuisance parameters adjusted by the global fit to the 8 TeV data, were already presented in section 7.3. Agreement between data and estimated background is observed within the uncertainties shown by the hatched bands.
In the dijet-mass analysis, a $\mu$ value of $1.23 \pm 0.44 \mathrm{(stat.)} \pm 0.41 \mathrm{(syst.)}$ is obtained for the 8 TeV dataset. The results of the three lepton channels are consistent at the level of 8%. Using the "bootstrap" method mentioned in section 9.2, the results for the 8 TeV data from the dijet-mass analysis and from the MVA are expected to be 67% correlated, and the observed results are found to be statistically consistent at the level of 8%. The observed significance in the dijet-mass analysis is 2.2σ. The expected significance is 1.9σ, compared with 2.5σ for the MVA, which is why the MVA is chosen for the nominal results. Figure 24 shows the $m_{bb}$ distribution in data after subtraction of all backgrounds except diboson production, for the 7 and 8 TeV data, as obtained with the dijet-mass analysis. In this figure, the contributions of all 2-tag signal regions in all channels are summed, each weighted by its ratio of expected Higgs boson signal to fitted background. The $VZ$ contribution is clearly visible, located at the expected $Z$ boson mass. The Higgs boson signal contribution is shown as expected for the SM cross section.
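The weighting used for figure 24 can be sketched as a bin-by-bin sum over regions, each histogram scaled by that region's expected signal-to-background ratio, so that the most sensitive regions dominate the summed distribution. The helper name, the requirement of a common binning, and the absence of any overall normalisation of the weights are assumptions of this sketch, which uses made-up inputs.

```python
def weighted_sum(histograms, s_over_b):
    """Bin-by-bin sum of per-region histograms, each scaled by that region's expected S/B."""
    if len(histograms) != len(s_over_b):
        raise ValueError("need exactly one S/B weight per region")
    n_bins = len(histograms[0])
    if any(len(hist) != n_bins for hist in histograms):
        raise ValueError("all regions must share the same m_bb binning")
    total = [0.0] * n_bins
    for hist, weight in zip(histograms, s_over_b):
        for i, content in enumerate(hist):
            total[i] += weight * content
    return total


# Made-up background-subtracted m_bb contents for two regions with two bins each.
regions = [[1.0, 2.0], [3.0, 4.0]]
weights = [0.1, 0.5]  # hypothetical expected signal-to-background ratios
print(weighted_sum(regions, weights))
```

Weighting by S/B rather than summing with unit weights suppresses regions that carry mostly background, which makes a small resonant contribution such as $VZ$ more visible by eye without changing any fit result.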

Cross-check with the diboson analysis
To validate the analysis procedures, $VZ$ fits are performed; their technical details were discussed in section 9.3.
The measured signal strength for the 8 TeV dataset with the MVA is $\mu_{VZ} = 0.77 \pm 0.10 \mathrm{(stat.)} \pm 0.15 \mathrm{(syst.)}$. This result is consistent with the observations already made from figure 24. The signal strengths obtained for the three lepton channels are consistent at the 85% level. In the dijet-mass analysis at 8 TeV, a $\mu_{VZ}$ value of $0.79 \pm 0.11 \mathrm{(stat.)} \pm 0.16 \mathrm{(syst.)}$ is obtained. The correlation of the systematic uncertainties on $\mu_{VZ}$ and $\mu$ is 35% in the MVA and 67% in the dijet-mass analysis.

Fits are also performed with the same final discriminants as used to obtain the Higgs boson results for the 8 TeV dataset, but with both signal-strength parameters $\mu_{VZ}$ and $\mu$ left freely floating. The result for the Higgs boson signal strength is unchanged from the nominal result, and the statistical correlation between the two signal-strength parameters is found to be −3% in the MVA and 9% in the dijet-mass analysis. The main reason for these low correlations is the different shape of the $p_{\mathrm{T}}^{V}$ distributions for $VZ$ and for the Higgs boson signal, $p_{\mathrm{T}}^{V}$ being used by both the MVA and the dijet-mass analysis. The yield tables in the appendix show that the ratio of the diboson contribution to that of the Higgs boson is indeed smaller in the higher $p_{\mathrm{T}}^{V}$ interval than in the lower one. The additional BDT input variables provide further separation in the MVA, leading to a very small diboson contribution in the most significant bins of the BDT$_{VH}$ discriminant, as can be seen in table 8.

A value of $\mu_{VZ} = 0.50 \pm 0.30 \mathrm{(stat.)} \pm 0.38 \mathrm{(syst.)}$ is obtained for the 7 TeV dataset. The signal strength obtained for the combined 7 and 8 TeV dataset is $0.74 \pm 0.09 \mathrm{(stat.)} \pm 0.14 \mathrm{(syst.)}$. The $VZ$ signal is observed with a significance of 4.9σ, compared with an expected significance of 6.3σ.
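A rough cross-check of the combined $\mu_{VZ}$ is an inverse-variance average of the 7 TeV and 8 TeV values, with statistical and systematic uncertainties added in quadrature. This treats the two datasets as independent, which is only approximately true: as described above, all systematic uncertainties except the theoretical ones on the signal are treated as uncorrelated between them, and the quoted result comes from the global fit, not from such an average. The helper name is illustrative.

```python
import math

# VZ signal strengths quoted in the text as (value, stat., syst.):
# 8 TeV with the MVA, 7 TeV with the dijet-mass analysis.
MEASUREMENTS = {
    "8 TeV": (0.77, 0.10, 0.15),
    "7 TeV": (0.50, 0.30, 0.38),
}


def inverse_variance_average(measurements):
    """Weighted mean and its uncertainty, assuming independent Gaussian uncertainties."""
    weights = []
    values = []
    for value, stat, syst in measurements:
        variance = stat ** 2 + syst ** 2  # stat. and syst. added in quadrature
        weights.append(1.0 / variance)
        values.append(value)
    total_weight = sum(weights)
    mean = sum(w * v for w, v in zip(weights, values)) / total_weight
    return mean, 1.0 / math.sqrt(total_weight)


mean, sigma = inverse_variance_average(MEASUREMENTS.values())
print(f"{mean:.2f} +/- {sigma:.2f}")  # 0.74 +/- 0.17
```

This is close to the fitted $0.74 \pm 0.09 \mathrm{(stat.)} \pm 0.14 \mathrm{(syst.)}$, whose total uncertainty is also about 0.17, indicating that the combination is dominated by the more precise 8 TeV measurement.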
The fitted $\mu_{VZ}$ values are shown in figure 25 for the 7 TeV, 8 TeV and combined datasets, and for the three lepton channels separately for the combined dataset, all with the MVA used for the 8 TeV data.

Summary
A search for the Standard Model Higgs boson produced in association with a $W$ or $Z$ boson and decaying into $b\bar{b}$ has been presented. The $(W/Z)$ decay channels considered are $W \to \ell\nu$, $Z \to \ell\ell$ and $Z \to \nu\nu$, with $\ell = e, \mu$. The dataset corresponds to integrated luminosities of 4.7 fb$^{-1}$ and 20.3 fb$^{-1}$ from $pp$ collisions at 7 TeV and 8 TeV, respectively, recorded by the ATLAS experiment during Run 1 of the LHC.
The analysis is carried out in event categories based on the numbers of leptons, jets, and jets tagged as originating from b-quark fragmentation, and on the transverse momentum of the vector-boson candidate. A multivariate analysis provides the nominal results. An alternative analysis using invariant-mass distributions of the Higgs boson candidates leads to consistent results.
For a Higgs boson mass of 125.36 GeV, the observed (expected) deviation from the background-only hypothesis corresponds to a significance of 1.4 (2.6) standard deviations, and the ratio of the measured signal yield to the Standard Model expectation is found to be $\mu = 0.52 \pm 0.32 \mathrm{(stat.)} \pm 0.24 \mathrm{(syst.)}$. The analysis procedure is validated by a measurement of the yield of $(W/Z)Z$ production with $Z \to b\bar{b}$, from which the ratio of the observed signal yield to the Standard Model expectation is found to be $0.74 \pm 0.09 \mathrm{(stat.)} \pm 0.14 \mathrm{(syst.)}$.