Evidence for the $H \to b\bar{b}$ decay with the ATLAS detector

A search for the decay of the Standard Model Higgs boson into a $b\bar{b}$ pair when produced in association with a $W$ or $Z$ boson is performed with the ATLAS detector. The analysed data, corresponding to an integrated luminosity of 36.1 fb$^{-1}$, were collected in proton-proton collisions in Run 2 of the Large Hadron Collider at a centre-of-mass energy of 13 TeV. Final states containing zero, one and two charged leptons (electrons or muons) are considered, targeting the decays $Z\to\nu\nu$, $W\to\ell\nu$ and $Z\to\ell\ell$. For a Higgs boson mass of 125 GeV, an excess of events over the expected background from other Standard Model processes is found with an observed significance of 3.5 standard deviations, compared to an expectation of 3.0 standard deviations. This excess provides evidence for the Higgs boson decay into $b$-quarks and for its production in association with a vector boson. The combination of this result with that of the Run 1 analysis yields a ratio of the measured signal events to the Standard Model expectation equal to $0.90 \pm 0.18 \rm{(stat.)} ^{+0.21}_{-0.19} \rm{(syst.)}$. Assuming the Standard Model production cross-section, the results are consistent with the value of the Yukawa coupling to $b$-quarks in the Standard Model.

1 Introduction signal yield is extracted using the mass of the dijet system of b-tagged jets as the main fit observable, and the diboson analysis, where the nominal multivariate analysis is modified to extract the (W/Z)Z diboson process, with the Z boson decaying into bb. The combination of the results of the Higgs boson search with those of the previously published analysis of the Run 1 dataset [18] is also presented.

ATLAS detector
ATLAS [24] is a general-purpose particle detector covering nearly the entire solid angle 1 around the collision point. It consists of an inner tracking detector surrounded by a thin superconducting solenoid, electromagnetic and hadronic calorimeters, and a muon spectrometer incorporating three large superconducting toroidal magnets.
The inner tracking detector (ID or inner detector in the rest of the article), located within a 2 T axial magnetic field generated by the superconducting solenoid, is used to measure the trajectories and momenta of charged particles. The inner layers, consisting of high-granularity silicon pixel detectors, instrument a pseudorapidity range |η| < 2.5. A new innermost silicon pixel layer, the insertable B-layer [25] (IBL), was added to the detector between Run 1 and Run 2. The IBL improves the ability to identify displaced vertices and thereby significantly improves the b-tagging performance [26]. Silicon strip detectors covering |η| < 2.5 are located beyond the pixel detectors. Outside the strip detectors and covering |η| < 2.0, there are straw-tube tracking detectors, which also provide measurements of transition radiation that are used in electron identification.
The outermost part of the detector is the muon spectrometer, which measures the curved trajectories of muons in the field of three large air-core toroidal magnets. High-precision tracking is performed within the range |η| < 2.7 and there are chambers for fast triggering within the range |η| < 2.4.
A two-level trigger system [27] is used to reduce the recorded data rate. The first level is a hardware implementation that makes use of only a subset of the total available information to make fast decisions to accept or reject an event, aiming to reduce the rate to approximately 100 kHz, and the second level is the software-based high-level trigger that provides the remaining rate reduction to approximately 1 kHz.

Dataset and simulated event samples
The data used in this analysis were collected at a centre-of-mass energy of 13 TeV during the 2015 and 2016 running periods, and correspond to integrated luminosities of 3.2 ± 0.1 fb −1 and 32.9 ± 1.1 fb −1 , respectively [28]. They were collected using missing transverse momentum (E miss T ) triggers for the 0and 1-lepton channels and single-lepton triggers for the 1-and 2-lepton channels. Events are selected for analysis only if they are of good quality and if all the relevant detector components are known to be in good operating condition. In the combined dataset, the recorded events have an average of 25 inelastic pp collisions (the collisions other than the hard scatter are referred to as pile-up).
Monte Carlo (MC) simulated events are used to model the SM background and V H, H → bb signal processes. All simulated processes are normalised using the most accurate theoretical predictions currently available for their cross-sections. Data-driven methods are used to estimate the multi-jet background from strong interactions (QCD) for the 1-lepton channel, as discussed in Section 6. This background is negligible in the other channels, as a result either of the high E miss T requirement and dedicated selection criteria (0-lepton channel) or of the two lepton selection (2-lepton channel).
All samples of simulated events were passed through the ATLAS detector simulation [29] based on GEANT 4 [30] and are reconstructed with the standard ATLAS reconstruction software. The effects of pile-up from multiple interactions in the same and nearby bunch crossings were modelled by overlaying minimum-bias events, simulated using the soft QCD processes of Pythia 8.186 [31] with the A2 [32] set of tuned parameters (tune) and MSTW2008LO [33] parton distribution functions (PDF). For all samples of simulated events, except for those generated using Sherpa [34], the EvtGen v1.2.0 program [35] was used to describe the decays of bottom and charm hadrons. A summary of all the generators used for the simulation of the signal and background processes is shown in Table 1.
Simulated events for qq → V H plus zero or one jet production at next-to-leading order (NLO) were generated with the Powheg-Box v2 + GoSam + MiNLO generator [37,[40][41][42] (named Powheg+MiNLO in the rest of the article). The contribution from gg → ZH (gluon-induced) production was simulated using the leading-order (LO) Powheg-Box v2 matrix-element generator. An additional scale factor is applied to the qq → V H processes as a function of the vector boson's transverse momentum to account for electroweak (EW) corrections at NLO. This makes use of the V H differential cross-section computed with Hawk [71,72]. The samples of simulated events include all final states where the Higgs boson decays into bb and the vector boson to a leptonic final state, including those with a τ-lepton. The analysis has only a small acceptance for other Higgs boson production and decay modes which are therefore neglected. The mass of the Higgs boson was fixed at 125 GeV and the H → bb branching fraction was fixed at 58%. The inclusive pp → V H cross-sections [43][44][45][46][47][48][49] were calculated at next-to-next-to-leading order (NNLO) (QCD) and NLO (EW). Electroweak corrections include the photon-induced contributions, which are of the order of 5% for the WH → νbb process and 1% for the ZH → bb process. For the gluon-induced ZH production, the cross-section is calculated at next-to-leading order and next-to-leadinglogarithm accuracy (NLO+NLL) in QCD [50][51][52][53][54]. This is then subtracted from the inclusive pp → ZH production cross-section to estimate the quark-induced contribution to the cross-section.
For the generation of tt at NLO, the Powheg-Box v2 generator [55] was used. Single top quark events in the Wt-, s-and t-channels were generated using the Powheg-Box v1 generator [58,64]. The top quark mass was set to 172.5 GeV. Events were filtered such that at least one W boson in each event decays leptonically. The overall yield predicted for the tt process is rescaled according to the NNLO crosssection, including the resummation of soft gluon emission at next-to-next-to-leading-logarithm accuracy Table 1: The generators used for the simulation of the signal and background processes. If not specified, the order of the cross-section calculation refers to the expansion in the strong coupling constant (α S ). The acronyms ME, PS and UE stand for matrix element, parton shower and underlying event, respectively. ( ) The events were generated using the first PDF in the NNPDF3.0NLO set and subsequentially reweighted to PDF4LHC15NLO set [36] using the internal algorithm in Powheg-Box v2. ( †) The NNLO(QCD)+NLO(EW) cross-section calculation for the pp → ZH process already includes the gg → ZH contribution. The qq → ZH process is normalised using the NNLO(QCD)+NLO(EW) cross-section for the pp → ZH process, after subtracting the gg → ZH contribution. scale correction (JES) derived from both simulation and in situ calibration using data [82,83]. Jet cleaning criteria are applied to find jets arising from non-collision sources or noise in the calorimeters and any event containing such a jet is removed [84,85]. Jets with p T below 60 GeV and with |η| < 2.4 have to pass a requirement on the jet vertex tagger (JVT) [86], a likelihood discriminant that uses track and vertex information in order to suppress jets originating from pile-up activity. Jets in the central region (|η| < 2.5) are required to have p T > 20 GeV. For jets in the forward region (2.5 ≤ |η| < 4.5), thus outside the acceptance of the inner detector, a stricter requirement of p T > 30 GeV is applied in order to suppress jets from pile-up activity.
Jets in the central region can be tagged as containing b-hadrons by using a multivariate discriminant (MV2c10) [87, 88] that combines information from an impact-parameter-based algorithm, from the explicit reconstruction of a secondary vertex and from a multi-vertex fitter that attempts to reconstruct the full b-to c-hadron decay chain. A significantly improved algorithm, which also profits from the addition of the IBL detector, was developed for Run 2 [88]. At the chosen working point, the improved algorithm provides nominal light-flavour (u,d,s-quark and gluon) and c-jet misidentification efficiencies of 0.3% and 8.2%, respectively, for an average 70% b-jet tagging efficiency, as estimated from simulated tt events for jets with p T > 20 GeV and |η| < 2.5. The flavour tagging efficiencies in simulation are corrected separately for b-, c-and light-flavour jets, based on the respective data-based calibration analyses. The ratio of the efficiencies in data and simulation is close to unity for b-jets, while more significant corrections are needed for c-and light-flavour jets, up to ≈ 1.4 and ≈ 2, respectively.
Simulated jets are labelled according to which hadrons with p T > 5 GeV are found within a cone of size ∆R = 0.3 around the jet axis. If a b-hadron is found the jet is labelled as a b-jet. If no b-hadron is found, but a c-hadron is present, then the jet is labelled as a c-jet. Otherwise the jet is labelled as a light (i.e., u,d,s-quark, or gluon) jet. Simulated V + jets events are categorised depending on the generator-level truth labels of the jets in the event that are selected to form the Higgs boson candidate: V + bb, V + bc, V + cc, V + bl, V + cl, V + ll where b, c, l stand for b-jet, c-jet and light-jet respectively. An inclusive V + heavy flavour (V + HF) category is defined as containing the first four: V + bb, V + bc, V + cc, V + bl.
The V + bb component is dominant: its fraction ranges from 70% to 90% of V + HF events, depending on the channel and analysis region.
Hadronically decaying τ-leptons are reconstructed [89, 90] as jets from noise-suppressed energy clusters, using the anti-k t algorithm with radius parameter R = 0.4. They are required to have exactly one or three matching tracks in the inner detector within a cone of size ∆R = 0.2 around the jet axis, to have p T > 20 GeV and |η| < 2.5, and to be outside the transition region between the barrel and endcap calorimeters (1.37 < |η| < 1.52). To reject jets being reconstructed and identified as τ-leptons, a multivariate approach using boosted decision trees is employed, based on information from the calorimeters and from the tracking detectors; and the medium quality criteria described in Ref.
[90] are applied. Hadronically decaying τ-leptons are only used in the analysis in the overlap removal procedure described at the end of this subsection. This has an impact on the determination of the event's jet multiplicity.
The uncertainty in the expected number of events depends on the size of the samples of simulated events. The combination of processes with large production cross-section and small selection efficiencies can make the production of samples exceeding the integrated luminosity of the data challenging. For cases where the small selection efficiency is due to the high rejection achieved by the application of b-tagging, a method called parameterised tagging is applied. Unlike when explicitly applying the b-tagging algorithm (direct tagging), in parameterised tagging all jets are kept but the event is weighted by the expected probability for a jet with a certain flavour label (b, c or light) to be tagged as a b-jet. These probabilities are parameterised as a function of jet kinematics (p T and η) based on a large sample of simulated tt events. Parameterised tagging is used for the V + cc, V + cl, V + ll and WW samples, which simulate small background contributions (< 2% of the total background). For all other samples, direct tagging is applied.
In addition to the JES correction, two more corrections are applied to b-tagged jets. The muon-in-jet correction is applied when a medium quality muon with p T > 5 GeV is found within ∆R = 0.4 of a jet, to account for the presence of b-and c-hadron decays into muons which do not deposit their full energy in the calorimeter. Unlike in the lepton selection introduced previously, no isolation criteria are applied. When more than one muon is found, the one closest to the jet axis is chosen. The muon four-momentum is added to that of the jet, and the energy deposited by the muon in the calorimeter is removed. To further improve the jet response, a second correction, denoted PtReco, is applied as a function of jet p T . This correction is based on the residual difference in jet response expected from the signal simulation between the reconstructed b-jets (with all corrections previously applied) and the corresponding truth jets (formed by clustering final-state particles taken from the Monte Carlo truth record, including muons and neutrinos). This correction increases the energy of jets with p T ∼ 20 GeV by 12% and the energy of those with p T > 100 GeV by 1%. A larger correction is applied in case a muon or electron is identified within ∆R = 0.4 of the jet axis, to account for the missing neutrino energy.
In the 2-lepton channel, where the ZH → bb event kinematics can be fully reconstructed, a per-event kinematic likelihood fit, described in more detail in Ref.
[18], is used to improve the estimate of the energy of the two b-jets, in place of the PtReco correction. These corrections result in an improved m bb mass distribution in the region of the Higgs boson signal, as illustrated in Figure 1; the central value is moved closer to its nominal value, and the resolution is improved by up to about 40%.
[ The object reconstruction and identification algorithms do not always result in unambiguous identifications. An overlap removal procedure is therefore applied, with the following actions taken in sequence. Any hadronically decaying τ-lepton reconstructed closer than ∆R = 0.2 to an electron or muon is removed, except in cases where the muon is deemed to be of low quality. If a reconstructed muon shares an electron's ID track, the electron is removed. Jets within a cone of size ∆R = 0.2 around an electron are removed, since a jet is always expected from clustering an electron's energy deposits in the calorimeter. Any electrons reconstructed within ∆R = min(0.4, 0.04 + 10 GeV/p electron T ) of the axis of any surviving jet are removed. Such electrons are likely to originate from semileptonic b-or c-hadron decays. If a jet is reconstructed within ∆R = 0.2 of a muon and the jet has fewer than three associated tracks or the muon energy constitutes most of the jet energy then the jet is removed. Muons reconstructed within a cone of size ∆R = min(0.4, 0.04 + 10 GeV/p muon T ) around the jet axis of any surviving jet are removed. Jets that are reconstructed within a cone of size ∆R = 0.2 around the axis of a hadronically decaying τ-lepton are removed.

Event selection and categorisation
The online event selection relies on either the E miss T or the single-charged-lepton triggers. Events passing the trigger selection and satisfying basic quality requirements are then categorised according to the charged lepton multiplicity, the vector boson's transverse momentum, and jet multiplicity. Events are assigned to the 0-, 1-and 2-lepton channels depending on the number of charged leptons , targeting the ZH → vvbb, WH → νbb and ZH → bb signatures, respectively. Although τ-leptons from vectorboson decays are not targeted explicitly, they pass the selection with reduced efficiency through leptonic decays of the τ-lepton into muons and electrons. All events are required to have at least two jets, and exactly two must pass the b-tagging requirement. The Higgs boson candidate is reconstructed from the two b-tagged jets and the highest-p T (leading) b-tagged jet is required to have p T > 45 GeV.
The analysis covers the phase space at large Higgs boson (and equivalently vector boson) transverse momentum, which has the highest signal-to-background ratio. For the same reason, events are categorised according to the reconstructed vector boson's transverse momentum p V T . This observable corresponds to E miss T in the 0-lepton channel, to the size of the vectorial sum of E miss T and the charged lepton's transverse momentum in the 1-lepton channel, and the transverse momentum of the 2-lepton system in the 2-lepton channel. In the 0-and 1-lepton channels a single region is defined, with p V T > 150 GeV. In the 2-lepton channel two regions are considered, 75 GeV < p V T < 150 GeV and p V T > 150 GeV. Events are further split into two categories according to jet multiplicity. In the 0-and 1-lepton channels, events are considered with exactly two or exactly three jets. Events with four or more jets are rejected in these channels to reduce the large background arising from tt production. In the 2-lepton channel, extra sensitivity is gained by accepting events with higher jet multiplicity due to the lower level of the tt background, thus the categories become either exactly two jets or three or more jets. For simplicity, these two selection categories are referred to as the 2-and 3-jet categories for all three lepton channels.
The event selection criteria for the three channels are detailed below and summarised in Table 2. The 1and 2-lepton selections are both divided into two sub-channels depending on the flavour of the leptons: either electron or muon. There are small differences between these two sub-channels and these are mentioned when appropriate. The two sub-channels are merged to form the single 1-and 2-lepton channels used for the statistical analysis. The statistical analysis uses eight signal regions (SRs) and six control regions (CRs). Multivariate discriminants are used as the main observables to extract the signal, as described in Section 5.
The predicted cross-sections times branching ratios for (W/Z)H with W → ν, Z → , Z → νν, and H → bb, as well as the acceptances in the three channels after full selection are given in Table 3. The nonnegligible acceptance for the WH process in the 0-lepton channel is mostly due to events with hadronically decaying τ-leptons produced in the W decay, and the larger acceptance for the gg → ZH process with respect to qq → ZH is due to the harder p V T distribution from the gluon-induced process.   Table 3: The cross-section times branching ratio (B) and acceptance for the three channels at √ s = 13 TeV. The qqand gg-initiated ZH processes are shown separately. The branching ratios are calculated considering only decays to muons and electrons for Z → , decays to all three lepton flavours for W → ν and decays to all neutrino flavours for Z → νν. The acceptance is calculated as the fraction of events remaining in the combined signal and control regions after the full event selection.

Zero-lepton selection
The online event selection relies on an E miss T trigger. The threshold for this trigger was 70 GeV for the 2015 data, and it was initially raised to 90 GeV and then to 110 GeV during 2016. In the offline analysis events are required to have no loose leptons and E miss T > 150 GeV. When compared to the offline selection, the E miss T trigger is fully efficient for E miss T > 180 GeV, and it is 85 − 90% efficient at E miss T = 150 GeV, depending on the data taking period. The trigger efficiency is measured in W + jets and tt events in data using an orthogonal set of single-muon triggers; these measurements are utilised to determine data-oversimulation scaling factors, used to correct the simulation. The scaling factors are within 5% of unity and parameterised as a function of E miss T . A selection based on the scalar sum of the transverse momenta of the jets in the event, H T , is used to remove a marginal region of phase space in which the trigger efficiency exhibits a small dependence on the jet multiplicity. For 2-jet events the requirement is H T > 120 GeV, and H T > 150 GeV is required for 3-jet events.
In order to suppress the multi-jet background, which is mostly due to jets mismeasured in the calorimeters, four angular selection criteria are applied: • ∆φ(E miss T , bb) > 120 • , • min[∆φ(E miss T , jets)] > 20 • for 2 jets, > 30 • for 3 jets. Here ∆φ(a, b) indicates the difference in azimuthal angle between objects a and b; b 1 and b 2 are the two b-tagged jets forming the Higgs boson candidate's dijet system bb; E miss T,trk is defined as the missing transverse momentum calculated from the negative vector sum of the transverse momenta of tracks reconstructed in the inner detector and identified as originating from the primary vertex. The final selection is a requirement on the azimuthal angle between the E miss T vector and the closest jet.

One-lepton selection
For the electron sub-channel, events are selected using a logical OR of single-electron triggers with p T thresholds of 24 GeV, 60 GeV and 120 GeV for the 2015 data and with increased thresholds of 26 GeV, 60 GeV and 140 GeV in 2016. The lowest-threshold trigger in 2016 includes isolation and identification requirements that are looser than any of the isolation and identification requirements applied in the analysis. These requirements are removed or relaxed for the higher-threshold triggers. The muon sub-channel uses the same E miss T triggers as the 0-lepton channel. Since muons are not included in the E miss T calculation at trigger level, in events where a muon is present this trigger is in effect selecting events based on p V T , and is therefore fully efficient for values of p V T above 180 GeV. This trigger is preferred because it has an overall signal efficiency (with respect to the offline selection) of 98%, compared to ∼ 80% efficiency for the combination of single-muon triggers, which is due to the limited muon trigger chamber coverage in the central |η| region of the detector. Events are required to contain exactly one tight electron with p T above 27 GeV (electron sub-channel) or one medium muon with p T above 25 GeV (muon sub-channel), and no additional loose leptons. In the electron sub-channel, where multi-jet production is a significant background, an additional selection of E miss T > 30 GeV is applied.
Control regions enhanced in the W + HF background are defined for both the 2-and 3-jet categories. These are obtained by applying two additional selection requirements beyond the respective nominal selection criteria: m bb < 75 GeV and m top > 225 GeV. To calculate the reconstructed top quark mass, m top , an estimate of the four-momentum of the neutrino from the W boson decay is required. The vector E miss T is assumed to give an estimate of the neutrino's transverse momentum components and then p ν z can be determined up to a possible two-fold ambiguity by constraining the mass of the lepton-plus-neutrino system to be the W boson mass. 4 The top quark is then reconstructed by considering the reconstructed W boson and one of the two b-tagged jets. The combination of b-tagged jet and p ν z resulting in m top closest to 172.5 GeV is selected. The requirement on the reconstructed top quark mass significantly reduces the contamination from tt and single-top-quark events in the W + HF CRs. The events in the control regions are removed from the corresponding signal regions. In the W + HF CRs, between 75% and 78% of the events are expected to be from W + HF production.

Two-lepton selection
Events are selected in the electron sub-channel using the same single-electron triggers as for the 1-lepton channel. For the muon sub-channel a logical OR of single-muon triggers with p T thresholds of 20 GeV and 40 GeV is used for 2015 data, and 24-26 GeV and 40-50 GeV for 2016 data, with the increase of the thresholds applied to cope with the increasing instantaneous luminosity. The lowest-threshold triggers include an isolation requirement that is removed for the higher-threshold triggers. The trigger efficiency with respect to the offline selection ranges from 97% to 99.5% for the electron sub-channel and from 87% to 90% for the muon sub-channel, depending on the p V T region. To ensure that the trigger efficiency reached its plateau, the lepton that triggered the event is required to have p T > 27 GeV. Exactly two loose leptons of the same flavour are required. In dimuon events, the two muons are required to have opposite-sign charge. This is not used in the electron sub-channel, where the charge misidentification rate is not negligible. The invariant mass of the dilepton system must be consistent with the Z boson mass: 81 GeV < m < 101 GeV. This requirement suppresses backgrounds with non-resonant lepton pairs, such as tt and multi-jet production.
Control regions are defined to be very pure in tt and Wt background by applying the nominal selection but requiring an eµ lepton flavour combination instead of ee or µµ, and with no opposite-charge requirement. The tt and Wt events in these control regions are kinematically identical to those in the signal region, except for slight differences in acceptance between electrons and muons. These regions are called eµ CR in the following. In the eµ CRs, more than 99% of the events are expected to be from tt and single top quark production, and between 88% and 97% from tt production alone.

Selection for the dijet-mass analysis
To validate the result of the multivariate analysis, a second analysis is performed where the multivariate discriminants are replaced by the dijet invariant mass of the two b-tagged jets, m bb . This second analysis adopts the same objects and event selection criteria as described in Table 2, with the additional selection criteria shown in Table 4. With respect to the p V T regions described earlier, the events with p V T > 150 GeV are further split into two categories: 150 GeV < p V T ≤ 200 GeV and p V T > 200 GeV. Events with In the 1-lepton channel, since the low m bb range in the dijet mass spectrum provides sufficient information to constrain the W + HF background normalisation, no dedicated W + HF control region is defined. Also, a requirement on the W boson's transverse mass m W T < 120 GeV is used to suppress events from tt background. The W boson's transverse mass is defined as m W T = 2p T E miss T (1 − cos(∆φ( , E miss T ))), where p T is the lepton's transverse momentum.
In the 2-lepton channel, the tt background is suppressed thanks to the additional requirement E miss T / GeV, where S T is defined as the scalar sum of the transverse momenta of all jets and leptons in the event. Events with p V T > 150 GeV in the eµ CR are used inclusively in p V T . Table 4: Summary of the event selection criteria in the 0-, 1-and 2-lepton channels for the dijet-mass analysis, applied in addition to those described in Table 2 for the multivariate analysis.

Multivariate analysis
Multivariate discriminants making use of boosted decision trees (BDTs) are constructed, trained and evaluated in each lepton channel and analysis region separately. Two versions of the BDTs, using the same input variables, are trained. The nominal version is designed to separate the V H, H → bb signal from the sum of the expected background processes, and is referred to as BDT V H . A second one, which is used to validate the analysis, aims at separating the VZ, Z → bb diboson process from the sum of all other expected background processes (including V H), and is referred to as BDT VZ .
The input variables used for the BDTs are chosen in order to maximise the separation in the V H search. Starting from the dijet mass (m bb ), additional variables describing the event kinematics and topology are tried one at a time and the one yielding the best separation gain is added to the list. This procedure is repeated until adding more variables results in a negligible performance gain. The final selections of variables for the different channels are listed in Table 5. The b-tagged jets are labelled in decreasing p T as b 1 and b 2 , and |∆η(b 1 , b 2 )| is their separation in pseudorapidity. In 3-jet events, the third jet is labelled as jet 3 and the mass of the 3-jet system is denoted m bb j . The azimuthal angle between the vector boson and the system of b-tagged jets is denoted ∆φ(V, bb), and their pseudorapidity separation is denoted |∆η(V, bb)|. In the 0-lepton channel, m eff is defined as the scalar sum of the transverse momenta of all jets and E miss T (m eff = H T + E miss T ). In the 1-lepton channel, the angle between the lepton and the closest b-tagged jet in the transverse plane is denoted min[∆φ( , b)]. In the 1-lepton channel, two variables are used to improve the rejection of the tt background: the rapidity difference between the W and Higgs boson candidates, |∆Y(V, bb)| and, assuming that the event is tt, the reconstructed top quark mass, m top . To construct the |∆Y(V, bb)| variable, the four-vector of the neutrino in the W boson decay is estimated as explained in Section 4.2.2 for m top . The distributions of input variables of the BDTs are compared between data and simulation, and good agreement is found within the uncertainties.
The Toolkit for Multivariate Data Analysis, TMVA [95], is used to train the BDTs, with values of the training parameters similar to those described in Ref. [18]. In order to make use of the complete set of simulated MC events for the BDT training and evaluation in an unbiased way, the MC events are split into two samples of equal size, A and B. The performance of the BDTs trained on sample A (B) is evaluated with sample B (A) in order to avoid using identical events for both training and evaluation of the same BDT. Half of the data are analysed with the BDTs trained on sample A, and the other half with the BDTs trained on sample B. At the end, the output distributions of the BDTs trained on samples A and B are merged for both the simulated and data events. A dedicated procedure is applied to transform the BDT output distributions to obtain a smoother distribution for the background processes and finer binning in the regions with the largest signal contribution, whilst ensuring that the statistical uncertainty of the simulated background is less than 20% in each bin. The binning procedure is described in more detail in Ref. [18].

Estimation of the multi-jet background
The MC samples summarised in Section 3 are used to model background processes with W or Z boson decays into leptons; these are defined as electroweak (EW) backgrounds in the following. Multi-jet backgrounds are produced with large cross-sections and thus, despite not providing genuine leptonic signatures, have the potential to contribute a non-negligible background component. In the following this background contribution is discussed channel by channel.

0-lepton channel
As described in Section 4, specific criteria are applied in the event selection to suppress the multi-jet backgrounds. A data-driven method is used to estimate the residual contribution. After removing the selection applied to the min[∆φ(E miss T , jets)] variable, a fit to this distribution in the 3-jet category is performed to extract the multi-jet contribution while allowing the tt and Z + jets background normalisations to float. In multi-jet background events, a fake E miss T can arise from a jet energy fluctuation, and it is expected that its direction is close to the direction of the poorly measured jet. Therefore, the min[∆φ(E miss T , jets)] variable is very effective in suppressing the multi-jet contribution, which is confined to low values of x = min[∆φ(E miss T , jets)] and is parameterised with a falling exponential (exp (−x/c)). The parameter c is determined in the fit itself, while the templates for the other backgrounds are taken directly from simulation. After the nominal selection criteria are applied, the residual multi-jet contamination within an 80 GeV < m bb < 160 GeV mass window is found to be ∼ 10% of the signal contribution and negligible (< 0.1%) with respect to the total background. The BDT distribution for the multi-jet background is estimated from the data at low min[∆φ(E miss T , jets)], and found to have a shape similar to the one expected for the sum of the remaining backgrounds. The small multi-jet contribution is therefore absorbed in the floating normalisation factors of the EW backgrounds in the global likelihood fit. The same data-driven estimation technique cannot be used in the 2-jet region, where events at low values of min[∆φ(E miss T , jets)] are removed by the other selection requirements. A multi-jet Pythia8 MC sample generated with the A14 tune and NNPDF2.3LO PDFs is used to extrapolate the data-driven estimate from the 3-to the 2-jet region, with the extrapolation factor derived after removing any b-tagging requirement. The contribution in the 2-jet region is found to be negligible. Multi-jet production in the 0-lepton channel is therefore found to be a small enough background that it can be neglected in the global likelihood fit.

1-lepton channel
Both the electron and muon sub-channels have contributions from multi-jet events. The dominant contribution to this background stems from real muons or electrons from heavy-flavour hadrons that undergo semileptonic decays. In the electron sub-channel a second contribution arises from γ → e + e − conversions of photons produced in the decay of neutral pions in jets, or directly from π 0 Dalitz decays. Although those leptons are not expected to be isolated, a small but non-negligible fraction passes the lepton isolation requirements. This background is estimated separately in the electron and muon sub-channels, and in the 2-and 3-jet categories, using similar procedures.
In each signal region, a template fit to the W boson candidate's transverse mass (m W T ) distribution is performed in order to extract the multi-jet yield. The variable m W T is chosen as it offers the clearest discrimination between the multi-jet and EW processes. The template used for the multi-jet contribution is obtained from data in a control region after subtraction of the residual EW contribution, based on MC predictions, while the template for the EW contribution in the signal region is obtained directly from MC predictions. The control region is enriched in multi-jet events that are kinematically close to the corresponding signal region but not overlapping with it, and is defined by applying the nominal selection but inverting the tight isolation requirement. To increase the statistical precision of the datadriven estimate, the number of required b-tags is reduced from two to one. The template fit determines the normalisation of the multi-jet contribution in the signal region, while the shape of the BDT discriminant (or of other relevant observables) is obtained analogously to the m W T template. Both the normalisation and shape derived for the BDT discriminant are then used in the global likelihood fit.
Since the efficiency of the tight isolation requirement on multi-jet events depends in general on lepton kinematics, and on the composition of the multi-jet background, the control regions that are based on inverting such a requirement provide biased estimators for the multi-jet templates in the corresponding signal regions. The templates are therefore corrected for such a bias, by applying event-by-event extrapolation factors that depend on lepton p T and η, and, in the electron sub-channel, also on the value of E miss T . These extrapolation factors are derived in additional control regions where the 2-and 3-jet requirements of the nominal selection are replaced by a 1-jet requirement, and the b-tagging requirement is removed. The extrapolation factors are computed as the ratio of the number of events with an isolated lepton to the number of events with a non-isolated lepton, after removing the MC-predicted EW background contribution.
The estimate of the normalisations of the W + jet and top quark (tt and single top quark) background contributions in the signal region provided by Monte Carlo simulations is subject to significant uncertainties. In addition, the m W T distributions of the W + jet and top quark backgrounds are sufficiently different that a common normalisation factor induces a bias in the multi-jet estimate. The normalisation of these two backgrounds is therefore left free to be determined in the template fit used to extract the multi-jet contribution. In order to improve their relative separation, the fit to the m W T distribution in the signal region is performed together with a fit to the overall yield in the corresponding W + HF control region. Furthermore, in order to improve the statistical precision in the determination of the W + jet and top quark background normalisation factors, the multi-jet template fit is performed simultaneously in the electron and muon sub-channels. This corresponds to performing separate fits for the two sub-channels, but with common W + jet and top quark background normalisation factors.
The multi-jet contribution in the 2-jet region is found to be 4.8% (4.6%) of the total background contribution in the electron (muon) sub-channel, while in the 3-jet region it is found to be 0.3% (0.5%). These estimates are subject to sizeable systematic uncertainties, which are described in Section 7.

2-lepton channel
Requiring two isolated leptons with a dilepton invariant mass compatible with that of the Z boson strongly suppresses the contributions from multi-jet events. The residual contribution is estimated using a fit to the dilepton mass distribution in a sample of events where the two lepton candidates have the same charge. The fit model includes expected contributions from EW backgrounds from simulation and an exponential model for the multi-jet background. An estimate is then made of the fraction of the background in a mass window around the Z boson peak in the signal region that could be attributed to multi-jet events based on the assumption that the numbers of opposite-charge and same-charge events are equal for the multi-jet background. Inside a mass window 81 GeV < m < 101 GeV the fraction of the background in the signal region coming from multi-jet events is estimated to be 0.03% and 0.2% for the muon and electron sub-channels, respectively. The residual multi-jet contamination within a 100 GeV < m bb < 140 GeV mass window is found to be ∼ 8% of the signal contribution, without an m bb resonant shape, and found to have a BDT shape similar to the one expected for the sum of the remaining backgrounds. The multi-jet contamination is also extracted in the eµ control region and found to be 0.3% of the total background. The multi-jet contribution in the 2-lepton channel is thus small enough to have a negligible impact on the signal extraction and is therefore not included in the global likelihood fit.

Systematic uncertainties
The sources of systematic uncertainty can be broadly divided into four groups: those of experimental nature, those related to the modelling of the simulated backgrounds, those related to the multi-jet background estimation, and those associated with the Higgs boson signal simulation. The finite size of the simulated background samples is also an important source of systematic uncertainty, and, whenever possible, generator-level filters are employed to enhance the amount of simulated events in the phase-space region that is most relevant for the analysis.

Experimental uncertainties
The dominant experimental uncertainties originate from the flavour-tagging simulation-to-data efficiency correction factors, from the jet energy scale corrections and the modelling of the jet energy resolution. Flavour-tagging simulation-to-data efficiency correction factors are derived [87] separately for b-jets, c-jets and light-flavour jets. All three correction factors depend on jet p T (or p T and |η|) and have uncertainties estimated from multiple sources. These are decomposed into uncorrelated components which are then treated independently, resulting in three uncertainties for b-jets and for c-jets, and five for lightflavour jets. The approximate size of the uncertainty in the tagging efficiency is 2% for b-jets, 10% for c-jets and 30% for light jets. Additional uncertainties are considered in the extrapolation of the b-jet efficiency calibration above p T = 300 GeV and in the misidentification of hadronically decaying τ-leptons as b-jets. The uncertainties in the jet energy scale and resolution are based on their respective measurements in data [83,96]. The many sources of uncertainty in the jet energy scale correction are decomposed into 21 uncorrelated components which are treated as independent. An additional specific uncertainty is considered that affects the energy calibration of b-and c-jets.
Uncertainties in the reconstruction, identification, isolation and trigger efficiencies of muons [78] and electrons [75], along with the uncertainty in their energy scale and resolution, are estimated based upon 13 TeV data. These are found to have only a small impact on the result. The uncertainties in the energy scale and resolution of the jets and leptons are propagated to the calculation of E miss T , which also has additional uncertainties from the scale, resolution and efficiency of the tracks used to define the soft term [93], along with the modelling of the underlying event. An uncertainty is assigned to the simulationto-data E miss T trigger scale factors to account for the statistical uncertainty in the measured scale factors and differences between the scale factors determined from W + jets and tt events. The uncertainty in the luminosity is 2.1% for the 2015 data and 3.4% for the 2016 data, resulting in an uncertainty of 3.2% for the combined dataset. It is derived, following a methodology similar to that detailed in Ref.
[28], from a preliminary calibration of the luminosity scale using x-y beam-separation scans performed in 2015 and 2016. The average number of interactions per bunch crossing is rescaled by 9% to improve the agreement between simulation with data, and an uncertainty, as large as the correction, is included.

Simulated background uncertainties
Modelling uncertainties are derived for the simulated backgrounds and broadly cover three areas: normalisation, acceptance differences that affect the relative normalisation between analysis regions with a common background normalisation, and the differential distributions of the most important kinematic variables. These uncertainties are derived either from particle-level comparisons between nominal and alternative samples using the RIVET [97] framework, or from comparisons to data in control regions. The particle-level comparisons are cross-checked with detector-level simulations whenever these are available, and good agreement is found. When acceptance uncertainties are estimated all the nominal and alternative samples are normalised using the same production cross-section. Such uncertainties are estimated by adding the differences between the nominal and alternative samples in quadrature. Shape uncertainties are considered in each of the analysis regions separately, with the samples scaled to have the same normalisation in each region. In this case, the uncertainty is taken from the alternative generator which has the largest shape difference compared to the nominal sample. Shape uncertainties are only derived for the m bb and p V T variables, as it was found that it is sufficient to only consider the changes induced in these variables by an alternative generator to cover the overall shape variation of the BDT V H discriminant. The systematic uncertainties affecting the modelling of the background samples are reported in Tables 6 and 7, and the specific details of how the uncertainties are estimated are provided below for each simulated background sample.
V + jets production The V + jets backgrounds are subdivided into three different components based upon the jet flavour labels of the two b-tagged jets in the event. The main background contributions (V + bb, V + bc, V + bl and V + cc) are jointly considered as the V + HF background. Their overall normalisation, separately in the 2-and 3-jet categories, is free to float in the global likelihood fit, as detailed in Section 8. The remaining flavour components, V + cl and V + ll, make up less than ∼ 1% of the background in each analysis region, so only uncertainties in the normalisation of these backgrounds are included.
Acceptance uncertainties are estimated for the relative normalisations of the different regions that share a common floating normalisation parameter. In the case of the W + HF background, this includes the uncertainties in the ratio of the event yield in the 0-lepton channel to that in the 1-lepton channel and, in the 1-lepton channel, in the ratio of the event yield in the W +HF control region to that in the signal region. For the Z + HF background, there is an uncertainty in the ratio of the event yield in the 0-lepton channel Table 6: Summary of the systematic uncertainties in the background modelling for Z + jets, W + jets, tt, single top quark and multi-jet production. An "S" symbol is used when only a shape uncertainty is assessed. The regions for which the normalisations float independently are listed in brackets.
Uncertainties are also estimated in the relative normalisation of the four heavy-flavour components that make up the V + HF background. These are taken as uncertainties in the bc, cc and bl yields compared to the dominant bb yield and are estimated separately for the 0-and 1-lepton channels in the case of W + HF and separately for the 0-lepton, 2-lepton 2-jet and 2-lepton 3-jet regions in the case of Z + HF.
The normalisation and acceptance uncertainties are all calculated by adding the differences between the nominal Sherpa 2.2.1 sample and its associated systematic variations in quadrature, including a variation of (i) the renormalisation scale by factors of 0.5 and 2; (ii) the factorisation scale by factors of 0.5 and 2; (iii) the CKKW merging scale from 30 GeV to 15 GeV; (iv) the parton-shower/resummation scale by factors of 0.5 and 2. In addition, the difference between the Sherpa 2.2.1 nominal sample and an alternative sample produced with a different matrix-element generator is added in quadrature to the rest to yield the total uncertainty. The alternative sample is produced with Madgraph5_aMC@NLO v2.2.2 [98], with up to four extra partons at LO, and interfaced to Pythia 8.212; the A14 tune is used together with the NNPDF2.3LO PDF set.
Uncertainties in the shapes of the m bb and p V T distributions are estimated for Z + HF by comparing the Z + jets background to data in signal-depleted regions with a very high Z + jets purity, specifically the 1and 2-tag regions of the 2-lepton channel, with the m bb region around the Higgs boson mass excluded in the 2-tag case. In order to remove most of the residual tt contamination, a selection requirement is made on E miss T / √ S T < 3.5 √ GeV as done for the dijet-mass analysis. Single top quark production In the Wt and t-channels, uncertainties are derived in the normalisation, acceptance and shapes of the m bb and p V T distributions. The s-channel only has a normalisation uncertainty derived as its contribution is negligible overall. For the t-channel, the nominal samples (Powheg+Pythia6) are compared to alternative samples, which are similar to those used in the tt case using different parton-shower generation (Powheg+Herwig++), and matrix-element generation (Madgraph5_aMC@NLO+Herwig++). For the Wt channel, uncertainties related to the interference between the Wt and tt production processes are assessed by using a diagram subtraction scheme instead of the nominal diagram removal scheme [64,101]. For both the t-and Wtchannels, the settings of the nominal generator are varied so as to maximise or minimise the amount of radiation. The normalisation uncertainties take into account variations of the renormalisation and factorisation scales, α S and PDFs. Uncertainties in the acceptance in both the 2-and 3-jet regions are derived by comparing the alternative generators and summing the differences with respect to the nominal sample in quadrature. Shape uncertainties are derived for the m bb and p V T distributions. These uncertainties cover all the differences in the shapes of the kinematic distributions investigated by comparing nominal and alternative samples.
Diboson production The diboson backgrounds are composed of three distinct processes, WZ, WW and ZZ production. Given the small contribution from WW production (< 0.1% of the total background) only a normalisation uncertainty is assigned. The more important contributions from the WZ and ZZ backgrounds have uncertainties derived for the overall normalisation, the relative acceptance between regions and for the m bb and p V T shapes. Uncertainties are derived by comparing the nominal sample (Sherpa 2.2.1) to the alternative samples with varied factorisation, renormalisation and resummation scales, and using the Stewart-Tackmann method [102] to estimate scale variation uncertainties for the acceptance in the jet multiplicity categories. Additional uncertainties in the overall acceptance, in the relative acceptance across jet multiplicities and in the shape of the m bb and p V T distributions are estimated in the parton-shower and underlying-event model. These are estimated by considering the difference between Powheg+Pythia8 and Powheg+Herwig++, as well as changes in the Pythia8 parton-shower tune. The envelope of the two effects is considered to define these uncertainties. A systematic uncertainty in the shape of the m bb distribution results from the comparison of Sherpa 2.2.1 and Powheg+Pythia8. This changes the shape of the m bb distribution for values in the range 100 -130 GeV by 10 -20%. Acceptance uncertainties are derived for the ratio of 0-to-1 lepton channels and for the ratio of the 2-to-3 jet regions for WZ production. In the ZZ production case the acceptance uncertainties are derived for the ratio of the 0-to-2 lepton channels and of the 2-to-3 jet regions. Uncertainties in the acceptance and m bb or p V T shapes of the diboson background due to PDF and α S variations were found to have a negligible impact.

Multi-jet background uncertainties
The multi-jet background in the 1-lepton channel is estimated from data as outlined in Section 6. Systematic uncertainties can have an impact on the multi-jet estimates in two ways: either changing the m W T distributions used in the multi-jet template fits, therefore impacting the extracted multi-jet normalisations, or directly changing the multi-jet BDT distributions used in the global likelihood fit. Several sources of uncertainty are considered, uncorrelated between the electron and muon sub-channels. The respective variations are added in quadrature for the normalisations, or considered as separate shape uncertainties. The variations are obtained by changing the definition of the multi-jet control region (2 versus 1 b-tag, more stringent isolation requirements, a different single-electron trigger to probe a potential trigger bias in the isolation requirements); removing the bias correction that makes use of the p T -, η-, and E miss Tdependent extrapolation factors derived in the (1-jet, 0 b-tag) region; varying the normalisation of the contamination from the top (tt and Wt) and V + jets processes in the multi-jet control region. In addition, the following sources of uncertainty that only have an impact on the multi-jet normalisation, are considered: use of the E miss T variable instead of m W T for the multi-jet template fit and, for the electron sub-channel only, the inclusion of the E miss T < 30 GeV region, which significantly enhances the multi-jet contribution in the template fit.

Signal uncertainties
The signal samples are normalised using their inclusive cross-sections, as described in Section 3, and an additional scale factor computed using the Hawk generator is applied as a function of p V T to correct for the sizeable impact of the NLO (EW) corrections to the p V T distributions. The systematic uncertainties that affect the modelling of the signal are summarised in Table 8.
Uncertainties in the calculations of the V H production cross-sections and the H → bb branching ratio are assigned following the recommendations of the LHC Higgs Cross Section working group [15, 53,54,103,104]. The uncertainties in the overall V H production cross-section from missing higher-order terms in the QCD perturbative expansion are obtained by varying the renormalisation scale µ R and factorisation scale µ F independently, from 1/3 to 3 times their original value. The PDF+α S uncertainty in the overall V H production cross-section is calculated from the 68% CL interval using the PDF4LHC15_nnlo_mc PDF set. The latest recommendations of the LHC Higgs working group [105] do not distinguish between uncertainties in qq → ZH production and gg → ZH production. To obtain the scale uncertainties separately for these two processes, it is assumed that the uncertainty in qq → ZH production is identical to the uncertainty in WH production. The gg → ZH production uncertainty is then derived such that the sum in quadrature of the qq → ZH and gg → ZH production uncertainties (considering their respective production cross-sections) amount to the scale uncertainty in the overall ZH production. Since the PDF+α S uncertainty is larger for WH production than ZH production, the method used for the scale uncertainty cannot be used for this uncertainty. The PDF+α S uncertainty in the gg → ZH production is taken from previous recommendations [15] and the uncertainty in the qq → ZH production is taken from the latest recommendation [105].
Another systematic uncertainty in the overall V H cross-section originates from missing higher-order electroweak corrections. This is estimated as the maximum variation among three quantities: the maximum size expected for the missing NNLO EW effects (1%), the size of the NLO EW correction and the uncertainty in the photon-induced cross-section relative to the total (W/Z)H cross-section described in Section 3. The systematic uncertainty in the H → bb branching ratio is 1.7% [13]. This uncertainty takes into account missing higher-order QCD and EW corrections as well as uncertainties in the b-quark mass and in the value of α S .
Acceptance and shape systematic uncertainties are derived to account for missing higher-order QCD and EW corrections, for PDF+α S uncertainty, and for variations of the parton-shower and underlying-event models. Uncertainties in the acceptance and in the shape of the m bb and p V T distributions, originating from missing higher-order terms in QCD, are estimated by comparing the nominal samples to those generated with weights corresponding to varied factorisation and renormalisation scales applied. The Stewart-Tackmann method is used to assign scale variation uncertainties in the acceptance in the jet multiplicity categories. Uncertainties due to the parton-shower and underlying-event models are estimated by considering the difference between Powheg+MiNLO+Pythia8 and Powheg+MiNLO+Herwig7, as well as changes in the Pythia8 parton-shower tune. The latter effect is assessed in events generated with Madgraph5_aMC@NLO and showered with Pythia8, using the A14 tune and its variations. The envelope of the two effects is considered to define uncertainties separately in the overall acceptance, in the relative acceptance across jet multiplicities and in the shape of the m bb and p V T distributions. The PDF+α S uncertainty in the acceptance between regions and in the m bb and p V T shapes is estimated applying the PDF4LHC15_30 PDF set and its uncertainties, according to the PDF4LHC recommendations [36]. A statistical fitting procedure based on the Roostats framework [106,107] is used to extract the strength of the Higgs boson signal from the data.
The signal strength is a parameter, µ, that multiplies the SM Higgs boson production cross-section times branching ratio into bb. A binned likelihood function is constructed as the product of Poisson probability terms over the bins of the input distributions involving the numbers of data events and the expected signal and background yields, taking into account the effects of the floating background normalisations and the systematic uncertainties. The different regions entering the likelihood fit are summarised in Table 9. The primary inputs to the global fit are the BDT V H discriminants in the eight 2-b-tag signal regions defined by the three lepton channels, up to two p V T intervals and the two jet multiplicity categories. Additional inputs are the event yields in the two W + HF control regions in the 1-lepton channel subdivided into the two number-of-jet categories, and the m bb distributions or the event yields for the four eµ control regions defined by the two p V T intervals and the two number-of-jet categories. The electron and muon sub-channels are combined in the fit. Altogether, there are 141 bins in the 14 regions used in the global fit. In addition to the global fit with all channels combined, separate 0-, 1-and 2-lepton channel fits are performed, where only the analysis regions specific to a single channel are considered and a channel-specific signal strength is obtained.
The effect of systematic uncertainties in the signal and background predictions is described by nuisance parameters (NPs), θ, which are constrained by Gaussian or log-normal probability density functions, the latter being used for normalisation uncertainties to prevent normalisation factors from becoming negative in the fit. The expected numbers of signal and background events in each bin are functions of µ and θ. For each NP, the prior is added as a penalty term to the likelihood, L(µ, θ), which decreases as soon as the nuisance parameter θ is shifted away from its nominal value. The statistical uncertainties of background predictions from simulation are included through one nuisance parameter per bin, using the Beeston-Barlow technique [108].

Process
Normalisation factor tt 0-and 1-lepton 0.90 ± 0.08 tt 2-lepton 2-jet 0.97 ± 0.09 tt 2-lepton 3-jet 1.04 ± 0.06 W + HF 2-jet 1.22 ± 0.14 W + HF 3-jet 1.27 ± 0.14 Z + HF 2-jet 1.30 ± 0.10 Z + HF 3-jet 1.22 ± 0.09 The test statistic q µ is constructed from the profile likelihood ratio whereμ andθ are the parameters that maximise the likelihood, andθ µ are the nuisance parameter values that maximise the likelihood for a given µ. To measure the compatibility of the background-only hypothesis with the observed data, the test statistic used is q 0 = −2 ln Λ 0 . The results are presented in terms of the probability p 0 of the background-only hypothesis, and the best-fit signal strength valueμ with its associated uncertainty σ µ . The fittedμ value is obtained by maximising the likelihood function with respect to all parameters. The uncertainty σ µ is obtained from the variation of q µ by one unit. Expected results are obtained in the same way as the observed results by replacing the data in each input bin by the prediction from simulation with all NPs set to their best-fit values, as obtained from the fit to the data, except for the signal strength parameter, which is kept at its nominal value.
The data have sufficient statistical power to constrain the largest background normalisation NPs, which are left free to be determined in the fit without having priors. This applies to the tt, W + HF and Z + HF processes. The corresponding normalisation factors expressed with respect to their expected nominal value and resulting from the global fit to the 13 TeV data, are shown in Table 10. As stated in Section 7, the tt background is normalised independently for the 2-lepton channel and for the 0-and 1-lepton channels.
In the 2-lepton channel, the tt background is almost entirely due to events in which both top quarks decay into (W → ν)b (dileptonic decays) with all final-state objects detected (apart from the neutrinos). In the 0-and 1-lepton channels, it is in part due to dileptonic decays with one or two of the leptons (often a τ-lepton) undetected, and in part due to cases where one of the top quarks decays into (W → qq )b (semileptonic decays) with at least one undetected light-or c-quark jet. Furthermore, the p V T range probed is different in the 0-and 1-lepton channels: p V T > 150 GeV in contrast to p V T > 75 GeV in the 2-lepton channel. For the Z + HF and W + HF backgrounds, the data have enough statistical power to constrain the normalisations in the 2-jet and in the 3-jet categories independently. The normalisation factors for these backgrounds can deviate significantly from one due to the large theoretical uncertainty in the crosssections of the contributing processes.
The systematic uncertainties are encoded in variations of the nominal BDT V H or m bb templates, and of the nominal yields across analysis categories, for each up-and-down (±1σ) variation. The limited size of the MC samples for some simulated background processes in some regions can cause large local fluctuations in templates of systematic variations. When the impact of a systematic variation translates into a reweighting of the nominal template, no statistical fluctuations are expected beyond those already present in the nominal template. This is the case, for instance, for the b-tagging uncertainties. For those, no specific action is taken. On the other hand, when a systematic variation may introduce changes in the events selected, as is the case for instance with the JES uncertainties, additional statistical fluctuations may be introduced, which affect the templates of systematic variations. In such cases, a smoothing procedure is applied to each systematic-variation template in each region. To reduce the complexity of the fit, systematic uncertainties that have a negligible impact on the final results are pruned away, region by region. Studies were performed to verify that the smoothing and pruning procedures do not induce any bias in the result. More details about the smoothing and pruning procedures can be found in Ref. [18].
In order to understand the effect of systematic uncertainties on the final results, the breakdown of the contributions to the uncertainties inμ is reported in Table 11. The individual sources of systematic uncertainty detailed in Section 7 are combined into categories. To assess the contribution of a category to the total systematic uncertainty, all NPs associated with the uncertainties within the category are fixed to their fitted values and the fit is repeated. The difference in quadrature between the uncertainties forμ from this fit and from the nominal fit provides an estimate of the systematic uncertainty attached to the considered category of uncertainties. As shown in the table, the systematic uncertainties for the modelling of the signal play a dominant role, followed by the uncertainty due to the limited size of the simulated samples, the modelling of the backgrounds and the b-jet tagging uncertainty.

Dijet-mass analysis
In the dijet-mass analysis, the BDT V H discriminant is replaced by the m bb variable as the main input used in the global fit, and the number of signal regions is increased from eight to fourteen, as a consequence of splitting the event categories with p V T > 150 GeV in two in each of the three lepton channels. The different regions entering the likelihood fit are summarised in Table 12. Altogether, for the dijet-mass analysis, there are 283 bins in the 18 regions used in the global fit.

Diboson analysis
The diboson analysis targets diboson production with a Z boson decaying into a pair of b-quarks and produced in association with either a W or Z boson. This process has a signature that is similar to the one considered in this analysis, and therefore provides an important validation of the V H result. The cross-section is about nine times larger than for the SM Higgs boson with a mass of 125 GeV, the m bb distribution peaks at lower values, and the p bb T spectrum is softer. The multivariate discriminant BDT VZ is used to extract the diboson signal. In the diboson-analysis fits, the normalisation of the diboson contributions is allowed to vary with a multiplicative scale factor µ VZ with respect to the SM prediction, except for the small contribution from WW production, which is treated as a background and constrained within its uncertainty. The overall normalisation uncertainties for the WZ and ZZ processes are removed, while all other systematic uncertainties are kept identical to those in the nominal fit used to extract the Higgs boson signal. A SM Higgs boson with m H = 125 GeV is included as a background, with a production cross-section at the SM value with an uncertainty of 50%. The diboson and Higgs boson BDTs provide sufficient separation between the VZ and V H processes that they only have a weak direct correlation (< 1%) in their results.

Combination with Run 1 data
The statistical analysis of the 13 TeV data is combined with the results of the data recorded at 7 TeV and 8 TeV, reported in Ref. [18]. No change is implemented in the analysis of the 7 TeV and 8 TeV data, but several studies were carried out on the correlation and compatibility of the 13 TeV results and the 7 TeV and 8 TeV results. Studies on the correlation of the experimental systematic uncertainties between the 7 TeV, 8 TeV and 13 TeV analyses were performed for the dominant uncertainties.
The changes in the detector layout (inclusion of the IBL), in the tagging discriminating variable, in the used working points, in the b-tagging calibration analyses, and in the way the discriminating variable is used in the analysis support the choice of assuming a negligible correlation in the experimental systematic uncertainties affecting the b-tagging across datasets. Nevertheless, even correlating the leading systematic uncertainties for the b-jet efficiencies measured in data affects the combined measurement of µ by less than 5%, and has a negligible impact on its uncertainty. Different correlation schemes for the jet energy scale uncertainties were tested, with no significant impact on the combined result observed: a weak correlation scheme was finally adopted, where only the b-jet-specific jet energy scale uncertainty is correlated across the 7 TeV, 8 TeV and 13 TeV analyses.
Studying the impact of potential correlations in the modelling of the background processes is difficult, due to the changes in centre-of-mass energy, Monte Carlo generators, object and event selection, and in the software tools used for simulation, reconstruction and analysis. However, the potential impact of underestimating or omitting correlations is limited by the fact that each of these modelling systematic uncertainties only constitutes a fraction of the total uncertainty, and, furthermore, that this fraction in most cases varies with the centre-of-mass energy following variations in cross-section and acceptance. To evaluate the maximum potential effect of these correlations, a χ 2 -combination of the two measurements, the signal strengths from the Run 1 and Run 2 datasets, was performed and studied as a function of different linear correlation coefficients, expressing the degree of correlation between the two measurements. These coefficients were chosen to correspond to different correlation schemes, from uncorrelated to fully correlated, between the tt, Z + HF, and W + HF normalisations and systematic shape variations across the two datasets, and they were computed based on the assumed correlation and the relative contribution of a specific uncertainty to the total uncertainty for µ. In all cases considered, the impact on the combined signal strength was found to be smaller than 1%, while the effect on the signal strength uncertainty was found to be smaller than 4%.
As a result of these studies, among the experimental uncertainties, only the b-jet-specific jet energy scale uncertainty is correlated across the 7 TeV, 8 TeV and 13 TeV datasets for the combined results. For the Higgs boson signal, theory uncertainties in the overall cross-section, in the H → bb branching ratio and in the p V T -dependent NLO EW corrections, are correlated across the different datasets.

Results
The results of the Higgs boson search and diboson analysis are reported below. In the following the fitted signal strength parameters are denoted µ and µ VZ rather thanμ andμ VZ .

Results of the SM Higgs boson search at
√ s = 13 TeV Figure 2 shows a selection of characteristic post-fit distributions for each of the lepton channels, while Figure 3 shows the BDT output distributions in the most sensitive (high-p V T ) categories. The background prediction in all post-fit distributions is obtained by normalising the backgrounds and setting the systematic uncertainties according to the values of the floating normalisations and nuisance parameters obtained in the signal extraction fit. The post-global likelihood fit signal and background yields are shown in Table 13 for all the analysis regions.
s For the tested Higgs boson mass of 125 GeV, when all lepton channels are combined, the probability p 0 of obtaining from background alone a result at least as signal-like as the observation is 0.019%. In the presence of a Higgs boson with that mass and the SM signal strength, the expected p 0 value is 0.12%. The observation corresponds to an excess with a significance of 3.5 standard deviations, to be compared to an expectation of 3.0 standard deviations. Table 14  Combined fits are also performed with floating signal strength parameters separately for (i) the three lepton channels, or (ii) the WH and ZH production processes, but leaving all other NPs with the same correlation scheme as for the nominal result. The results of these fits are shown in Figures 4 and 5. The compatibility of the signal strength parameters measured in the three lepton channels is 10%. The WH and ZH production modes are observed with a significance of 2.4 and 2.6 standard deviations, respectively. The linear correlation term between the signal strengths related to the WH and the ZH production modes is 0.6%. Assuming that the observed signal is due to the SM Higgs boson, with corresponding modeldependent extrapolation corrections to the inclusive phase space, the signal strengths can be interpreted as measurements of the WH and ZH production cross-sections times the H → bb branching ratio. After removing the theoretical uncertainties for the production cross-sections and branching ratio, these are determined to be   Table 15.     Stat.     Combined 0.12% 0.019% 3.0 3.5 Table 15: The numbers of fitted signal and background events and the observed numbers of events in the four highest S /B bins of Figure 6. An entry of "-" indicates that a specific background component is negligible in a certain bin, or that no simulated events are left after the analysis selection.

Results of the dijet-mass analysis
The distributions of m bb in the dijet-mass analysis are shown in Figure 7 for the 2-jet category and the most sensitive analysis regions with p V T > 200 GeV for the 0-, 1-and 2-lepton channels separately. The m bb distribution for all channels and regions summed, weighted by their respective value of the ratio of fitted Higgs boson signal and background yields, and after subtraction of all backgrounds except for the (W/Z)Z diboson processes, is shown in Figure 8. The data and the sum of expected signal and backgrounds are found to be in good agreement.  compared to those for the multivariate analysis, with the largest difference between the respective central values of the two analyses being within 15%.

Results of the diboson analysis
The measurement of VZ production based on the multivariate analysis described in Section 8 returns a value of signal strength in good agreement with the Standard Model prediction. The VZ signal is observed with a significance of 5.8 standard deviations, to be compared to an expected significance of 5.3 standard deviations. Analogously to the V H signal, fits are also performed with separate signal strength parameters for the WZ and ZZ production modes, and the results are shown in Figure 9. Figure

Results of the combination with Run 1
The combination of the Run 1 and Run 2 analyses is used to estimate the combined probability p 0 of obtaining from a background-only experiment a signal at least as large as the one observed, to measure the combined signal strength, and to check the compatibility of the results from the two datasets.
For the tested Higgs boson mass of 125 GeV, the observed p 0 value is 0.018%. In the presence of a Higgs boson with that mass and the SM signal strength, the expected p 0 value is 3 × 10 −5 . The observation corresponds to an excess with a significance of 3.6 standard deviations, to be compared to an expectation of 4.0 standard deviations. For all channels combined the fitted value of the signal strength parameter is µ = 0.90 ± 0.18(stat.) +0.21 −0.19 (syst.).
The Run 1 and Run 2 analyses each contribute three measurements, corresponding to the three lepton channels, yielding a total of six measurements. Their compatibility is estimated to be 7%. Fits are also performed with the signal strength parameters floated independently for the WH and ZH production processes, and for Run 1 and Run 2. The compatibility of the signal strengths for the WH and ZH production processes is 34%, and the results of this fit are shown in Figure 11. The compatibility of the signal strength parameters measured in Run 1 with those measured in Run 2 is 21%. Figure 12 shows the signal strengths as measured separately for the 7 TeV, 8 TeV and 13 TeV datasets and their combination. Figure 13 shows the data, background and signal yields, where final-discriminant bins in all regions are combined into bins of log(S /B). Here, S and B are the fitted signal and background yields, respectively.

Conclusion
Evidence for a Standard Model Higgs boson decaying into a bb pair and produced in association with a W or Z boson is presented, using data collected by the ATLAS experiment in proton-proton collisions from Run 2 of the Large Hadron Collider. This dataset corresponds to an integrated luminosity of 36.1 fb −1 collected at a centre-of-mass energy of √ s =13 TeV. An excess over the expected background is observed, with a significance of 3.5 standard deviations compared to an expectation of 3.0. The measured signal strength with respect to the SM prediction for m H = 125 GeV is found to be µ = 1.20 +0.24 −0.23 (stat.) +0.34 −0.28 (syst.). The analysis procedure adopted to extract the Higgs boson signal is also used to measure the yield of (W/Z)Z production with Z → bb, where the ratio of the observed yield to that expected in the Standard Model is found to be 1.11 +0.12 The crucial computing support from all WLCG partners is acknowledged gratefully, in particular from CERN, the ATLAS Tier-1 facilities at TRIUMF (Canada), NDGF (Denmark, Norway, Sweden), CC-IN2P3 (France), KIT/GridKA (Germany), INFN-CNAF (Italy), NL-T1 (Netherlands), PIC (Spain), ASGC (Taiwan), RAL (UK) and BNL (USA), the Tier-2 facilities worldwide and large non-WLCG resource providers. Major contributors of computing resources are listed in Ref. [109].    [90] ATLAS Collaboration, Measurement of the tau lepton reconstruction and identification performance in the ATLAS experiment using pp collisions at √ s = 13 TeV, ATLAS-CONF-2017-029, 2017, url: https://cds.cern.ch/record/2261772.

The ATLAS Collaboration
Italy y Also at Fakultät für Mathematik und Physik, Albert-Ludwigs-Universität, Freiburg, Germany z Also at Institute for Mathematics, Astrophysics and Particle Physics, Radboud University Nijmegen/Nikhef, Nijmegen, Netherlands aa Also at Department of Physics, The University of Texas at Austin, Austin TX, United States of America ab Also at Institute of Theoretical Physics, Ilia State University, Tbilisi, Georgia ac Also at CERN, Geneva, Switzerland ad Also at Georgian Technical University (GTU),Tbilisi, Georgia ae Also at Ochadai Academic Production, Ochanomizu University, Tokyo, Japan a f Also