Evidence for the Higgs-boson Yukawa coupling to tau leptons with the ATLAS detector

Results of a search for H → ττ decays are presented, based on the full set of proton-proton collision data recorded by the ATLAS experiment at the LHC during 2011 and 2012. The data correspond to integrated luminosities of 4.5 fb−1 and 20.3 fb−1 at centre-of-mass energies of √s = 7 TeV and √s = 8 TeV respectively. All combinations of leptonic (τ → ℓνν̄ with ℓ = e, μ) and hadronic (τ → hadrons ν) tau decays are considered. An excess of events over the expected background from other Standard Model processes is found with an observed (expected) significance of 4.5 (3.4) standard deviations. This excess provides evidence for the direct coupling of the recently discovered Higgs boson to fermions. The measured signal strength, normalised to the Standard Model expectation, of μ = 1.43 +0.43/−0.37 is consistent with the predicted Yukawa coupling strength in the Standard Model.

Introduction
The investigation of the origin of electroweak symmetry breaking and, related to this, the experimental confirmation of the Brout-Englert-Higgs mechanism [1][2][3][4][5][6] is one of the prime goals of the physics programme at the Large Hadron Collider (LHC) [7]. With the discovery of a Higgs boson with a mass of approximately 125 GeV by the ATLAS [8] and CMS [9] collaborations, an important milestone has been reached. More precise measurements of the properties of the discovered particle [10,11] as well as tests of the spin-parity quantum numbers [12][13][14] continue to be consistent with the predictions for the Standard Model (SM) Higgs boson.
These measurements rely predominantly on studies of the bosonic decay modes, H → γγ, H → ZZ* and H → WW*. To establish the mass generation mechanism for fermions as implemented in the SM, it is of prime importance to demonstrate the direct coupling of the Higgs boson to fermions and the proportionality of its strength to mass [15]. The most promising candidate decay modes are the decays into tau leptons, H → ττ, and bottom quarks (b-quarks), H → bb̄. Due to the high background, the search for decays to bb̄ is restricted to Higgs bosons produced in modes which have a more distinct signature but a lower cross-section, such as H production with an associated vector boson. The smaller rate of these processes in the presence of still large background makes their detection challenging. More favourable signal-to-background conditions are expected for H → ττ decays. Recently, the CMS Collaboration published evidence for H → ττ decays with a significance of 3.2 standard deviations (3.2σ) [16], and an excess corresponding to a significance of 2.1σ in the search for H → bb̄ decays [17]. The combination of channels provides evidence for fermionic couplings with a significance of 3.8σ [18]. The yield of events in the search for H → bb̄ decays observed by the ATLAS Collaboration has a signal significance of 1.4σ [19]. The Tevatron experiments have observed an excess corresponding to 2.8σ in the H → bb̄ search [20].
In this paper, the results of a search for H → ττ decays are presented, based on the full proton-proton dataset collected by the ATLAS experiment during the 2011 and 2012 data-taking periods, corresponding to integrated luminosities of 4.5 fb −1 at a centre-of-mass energy of √s = 7 TeV and 20.3 fb −1 at √s = 8 TeV. These results supersede the earlier upper limits on the cross section times the branching ratio obtained with the 7 TeV data [21]. All combinations of leptonic (τ → ℓνν̄ with ℓ = e, µ) and hadronic (τ → hadrons ν) tau decays are considered. The corresponding three analysis channels are denoted by τ lep τ lep , τ lep τ had , and τ had τ had in the following. The search is designed to be sensitive to the major production processes of a SM Higgs boson, i.e. production via gluon fusion (ggF) [22], vector-boson fusion (VBF) [23], and associated production (V H) with V = W or Z. These production processes lead to different final-state signatures, which are exploited by defining an event categorisation. Two dedicated categories are considered to achieve both a good signal-to-background ratio and good resolution for the reconstructed ττ invariant mass. The VBF category, enriched in events produced via vector-boson fusion, is defined by the presence of two jets with a large separation in pseudorapidity. The boosted category contains events where the reconstructed Higgs boson candidate has a large transverse momentum. It is dominated by events produced via gluon fusion with additional jets from gluon radiation. In view of the signal-to-background conditions, and in order to exploit correlations between final-state observables, a multivariate analysis technique, based on boosted decision trees (BDTs) [24][25][26], is used to extract the final results. As a cross-check, a separate analysis where cuts on kinematic variables are applied is carried out.

The ATLAS detector and object reconstruction
The ATLAS detector [27] is a multi-purpose detector with a cylindrical geometry. It comprises an inner detector (ID) surrounded by a thin superconducting solenoid, a calorimeter system and an extensive muon spectrometer in a toroidal magnetic field. The ID tracking system consists of a silicon pixel detector, a silicon microstrip detector (SCT), and a transition radiation tracker (TRT). It provides precise position and momentum measurements for charged particles and allows efficient identification of jets containing b-hadrons (b-jets) in the pseudorapidity range |η| < 2.5. The ID is immersed in a 2 T axial magnetic field and is surrounded by high-granularity lead/liquid-argon (LAr) sampling electromagnetic calorimeters which cover the pseudorapidity range |η| < 3.2. A steel/scintillator tile calorimeter provides hadronic energy measurements in the central pseudorapidity range (|η| < 1.7). In the forward regions (1.5 < |η| < 4.9), the system is complemented by two end-cap calorimeters using LAr as active material and copper or tungsten as absorbers. The muon spectrometer (MS) surrounds the calorimeters and consists of three large superconducting eight-coil toroids, a system of tracking chambers, and detectors for triggering. The deflection of muons is measured within |η| < 2.7 by three layers of precision drift tubes, and cathode strip chambers in the innermost layer for |η| > 2.0. The trigger chambers consist of resistive plate chambers in the barrel (|η| < 1.05) and thin-gap chambers in the end-cap regions (1.05 < |η| < 2.4).
A three-level trigger system [28] is used to select events. A hardware-based Level-1 trigger uses a subset of detector information to reduce the event rate to a value of 75 kHz or less. The rate of accepted events is then reduced to about 400 Hz by two software-based trigger levels, Level-2 and the Event Filter.
The reconstruction of the basic physics objects used in this analysis is described in the following. The primary vertex referenced below is chosen as the proton-proton vertex candidate with the highest sum of the squared transverse momenta of all associated tracks.
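The primary-vertex choice described above can be sketched in a few lines. This is only an illustration: the representation of each vertex candidate as a plain list of track p T values, and the function name, are assumptions made here and not part of the ATLAS software.

```python
def select_primary_vertex(vertices):
    """Return the index of the vertex with the largest sum of squared track p_T.

    `vertices` is a list of lists; each inner list holds the transverse
    momenta (in GeV) of the tracks associated with one vertex candidate.
    """
    return max(range(len(vertices)),
               key=lambda i: sum(pt * pt for pt in vertices[i]))


# Three hypothetical vertex candidates: many soft tracks, one hard track, two medium tracks.
candidates = [[1.0, 1.0, 1.0, 1.0], [3.0, 0.5], [2.0, 2.0]]
primary_index = select_primary_vertex(candidates)
```

Squaring the p T favours vertices with a few hard tracks over vertices with many soft tracks, which is why the second candidate wins in this toy input.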
Electron candidates are reconstructed from energy clusters in the electromagnetic calorimeters matched to a track in the ID. They are required to have a transverse energy, E T = E sin θ, greater than 15 GeV, to be within the pseudorapidity range |η| < 2.47, and to satisfy the medium shower shape and track selection criteria defined in ref. [29]. Candidates found in the transition region between the end-cap and barrel calorimeters (1.37 < |η| < 1.52) are not considered. Typical reconstruction and identification efficiencies for electrons satisfying these selection criteria range between 80% and 90% depending on E T and η.
The ATLAS experiment uses a right-handed coordinate system with its origin at the nominal interaction point (IP) in the centre of the detector and the z-axis along the beam direction. The x-axis points from the IP to the centre of the LHC ring, and the y-axis points upward. Cylindrical coordinates (r, φ) are used in the transverse (x, y) plane, φ being the azimuthal angle around the beam direction. The pseudorapidity is defined in terms of the polar angle θ as η = − ln tan(θ/2). The distance ∆R in the η-φ space is defined as ∆R = √((∆η)² + (∆φ)²).
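The definitions of η and ∆R translate directly into code. The sketch below follows the formulas η = − ln tan(θ/2) and ∆R = √((∆η)² + (∆φ)²); wrapping ∆φ into the interval (−π, π] is an assumption made here, since the text does not spell out the convention, and the function names are illustrative.

```python
import math


def pseudorapidity(theta):
    """Pseudorapidity from the polar angle theta (radians)."""
    return -math.log(math.tan(theta / 2.0))


def delta_phi(phi1, phi2):
    """Azimuthal separation wrapped into (-pi, pi]."""
    dphi = (phi1 - phi2) % (2.0 * math.pi)  # now in [0, 2*pi)
    if dphi > math.pi:
        dphi -= 2.0 * math.pi
    return dphi


def delta_r(eta1, phi1, eta2, phi2):
    """Distance in eta-phi space."""
    return math.hypot(eta1 - eta2, delta_phi(phi1, phi2))
```

For example, two objects at φ = 3.0 and φ = −3.0 are close in azimuth once the wrap-around is taken into account, even though their raw φ difference is 6.0.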
Muon candidates are reconstructed using an algorithm [30] that combines information from the ID and the MS. The distance between the z-position of the point of closest approach of the muon inner-detector track to the beam-line and the z-coordinate of the primary vertex is required to be less than 1 cm. This requirement reduces the contamination due to cosmic-ray muons and beam-induced backgrounds. Muon quality criteria such as inner detector hit requirements are applied to achieve a precise measurement of the muon momentum and reduce the misidentification rate. Muons are required to have a momentum in the transverse plane p T > 10 GeV and to be within |η| < 2.5. Typical efficiencies for muons satisfying these selection criteria are above 95% [30].
Jets are reconstructed using the anti-k t jet clustering algorithm [31,32] with a radius parameter R = 0.4, taking topological energy clusters [33] in the calorimeters as inputs. Jet energies are corrected for the contribution of multiple interactions using a technique based on jet area [34] and are calibrated using p T - and η-dependent correction factors determined from data and simulation [35][36][37]. Jets are required to be reconstructed in the range |η| < 4.5 and to have p T > 30 GeV. To reduce the contamination of jets by additional interactions in the same or neighbouring bunch crossings (pile-up), tracks originating from the primary vertex must contribute a large fraction of the p T when summing the scalar p T of all tracks in the jet. This jet vertex fraction (JVF) is required to be at least 75% (50%) for jets with |η| < 2.4 in the 7 TeV (8 TeV) dataset. Moreover, for the 8 TeV dataset, the JVF selection is applied only to jets with p T < 50 GeV. Jets with no associated tracks are retained.
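One way to read the JVF requirement is sketched below. The data layout (a list of track p T values with a flag for primary-vertex association) and the function names are hypothetical; the thresholds are the ones quoted in the paragraph above.

```python
def jet_vertex_fraction(track_pts, from_primary):
    """Fraction of the scalar track p_T sum that comes from primary-vertex tracks.

    Returns None for a jet with no associated tracks.
    """
    total = sum(track_pts)
    if total == 0:
        return None
    from_pv = sum(pt for pt, is_pv in zip(track_pts, from_primary) if is_pv)
    return from_pv / total


def passes_jvf(jet_pt, jet_eta, jvf, sqrt_s_tev):
    """Apply the JVF selection for the 7 TeV or 8 TeV dataset."""
    if jvf is None:
        return True  # jets with no associated tracks are retained
    if abs(jet_eta) >= 2.4:
        return True  # the requirement only applies to jets with |eta| < 2.4
    if sqrt_s_tev == 8 and jet_pt >= 50.0:
        return True  # at 8 TeV the selection only applies to jets with p_T < 50 GeV
    threshold = 0.75 if sqrt_s_tev == 7 else 0.50
    return jvf >= threshold


example_jvf = jet_vertex_fraction([10.0, 5.0, 5.0], [True, False, True])
```

The same jet with a JVF of 0.6 would pass at 8 TeV but fail at 7 TeV, reflecting the looser threshold used for the later dataset.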
In the pseudorapidity range |η| < 2.5, b-jets are selected using a tagging algorithm [38]. The b-jet tagging algorithm has an efficiency of 60-70% for b-jets in simulated tt̄ events. The corresponding light-quark jet misidentification probability is 0.1-1%, depending on the jet's p T and η.
Hadronically decaying tau leptons are reconstructed starting from clusters of energy in the electromagnetic and hadronic calorimeters. The τ had reconstruction is seeded by the anti-k t jet finding algorithm with a radius parameter R = 0.4. Tracks with p T > 1 GeV within a cone of radius 0.2 around the cluster barycentre are matched to the τ had candidate, and the τ had charge is determined from the sum of the charges of its associated tracks. The rejection of jets is provided in a separate identification step using discriminating variables based on tracks with p T > 1 GeV and the energy deposited in calorimeter cells found in the core region (∆R < 0.2) and in the region 0.2 < ∆R < 0.4 around the τ had candidate's direction. Such discriminating variables are combined in a boosted decision tree and three working points, labelled tight, medium and loose [39], are defined, corresponding to different τ had identification efficiency values. In this analysis, τ had candidates with p T > 20 GeV and |η| < 2.47 are used. The τ had candidates are required to have charge ±1, and must be 1- or 3-track (prong) candidates. In addition, a sample without the charge and track multiplicity requirements is retained for background modelling in the τ had τ had channel, as described in section 6.2. The identification efficiency for τ had candidates satisfying the medium criteria is of the order of 55-60%. Dedicated criteria [39] to separate τ had candidates from misidentified electrons are also applied, with a selection efficiency for true τ had decays of 95%. The probability to misidentify a jet with p T > 20 GeV as a τ had candidate is typically 1-2%.
Following their reconstruction, candidate leptons, hadronically decaying taus and jets may point to the same energy deposits in the calorimeters (within ∆R < 0.2). Such overlaps are resolved by selecting objects in the following order of priority (from highest to lowest): muons, electrons, τ had , and jet candidates. For all channels, the leptons that are considered in overlap removal with τ had candidates need only satisfy looser criteria than those defined above, to reduce misidentified τ had candidates from leptons. The p T threshold of muons considered in overlap removal is also lowered to 4 GeV.
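The priority-based overlap removal can be illustrated as follows. The dictionary layout and names are hypothetical, and two choices are assumptions made for this sketch: objects of the same kind are not compared with each other, and only objects that survive can veto lower-priority ones.

```python
import math

# Priority order from the text: muons, electrons, hadronic taus, jets.
PRIORITY = {"muon": 0, "electron": 1, "tau_had": 2, "jet": 3}


def _delta_r(a, b):
    """Eta-phi distance between two objects, with delta-phi wrapped into [-pi, pi)."""
    dphi = (a["phi"] - b["phi"] + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(a["eta"] - b["eta"], dphi)


def remove_overlaps(objects, max_dr=0.2):
    """Keep an object unless it lies within max_dr of a kept higher-priority object."""
    kept = []
    for obj in sorted(objects, key=lambda o: PRIORITY[o["kind"]]):
        if all(k["kind"] == obj["kind"] or _delta_r(obj, k) >= max_dr for k in kept):
            kept.append(obj)
    return kept


event = [
    {"kind": "jet", "eta": 0.10, "phi": 0.0},   # overlaps the muon
    {"kind": "muon", "eta": 0.00, "phi": 0.0},
    {"kind": "jet", "eta": 1.05, "phi": 1.0},   # overlaps the hadronic tau
    {"kind": "tau_had", "eta": 1.00, "phi": 1.0},
    {"kind": "jet", "eta": 2.50, "phi": -1.0},  # isolated jet
]
cleaned = remove_overlaps(event)
```

In this toy event the two jets that coincide with the muon and the τ had candidate are dropped, leaving the muon, the τ had candidate and the isolated jet.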
The missing transverse momentum (with magnitude E miss T ) is reconstructed using the energy deposits in calorimeter cells calibrated according to the reconstructed physics objects (e, γ, τ had , jets and µ) with which they are associated [40]. The transverse momenta of reconstructed muons are included in the E miss T calculation, with the energy deposited by these muons in the calorimeters taken into account. The energy from calorimeter cells not associated with any physics objects is scaled by a soft-term vertex fraction and also included in the E miss T calculation. This fraction is the ratio of the summed scalar p T of tracks from the primary vertex not matched with objects to the summed scalar p T of all tracks in the event also not matched to objects. This method achieves a more accurate reconstruction of the E miss T in high pile-up conditions [41].

Data and simulated samples
After data quality requirements, the integrated luminosities of the samples used are 4.5 fb −1 at √ s = 7 TeV and 20.3 fb −1 at √ s = 8 TeV. Samples of signal and background events were simulated using various Monte Carlo (MC) generators, as summarised in table 1. The generators used for the simulation of the hard-scattering process and the model used for the simulation of the parton shower, of the hadronisation and of the underlying-event activity are listed. In addition, the cross section values to which the simulation is normalised and the perturbative order in QCD of the respective calculations are provided.
The signal contributions considered include the three main processes for Higgs boson production at the LHC: ggF, VBF, and associated V H production processes. The contributions from the associated tt̄H production process are found to be small and are neglected. The ggF and VBF production processes are simulated with Powheg [42][43][44][45] interfaced to Pythia8 [46]. In the Powheg event generator, the CT10 [47] parameterisation of the parton distribution functions (PDFs) is used. The overall normalisation of the ggF process is taken from a calculation at next-to-next-to-leading-order (NNLO) [48][49][50][51][52][53] in QCD, including soft-gluon resummation up to next-to-next-to-leading logarithm terms (NNLL) [54]. Next-to-leading order (NLO) electroweak (EW) corrections are also included [55,56]. Production by VBF is normalised to a cross section calculated with full NLO QCD and EW corrections [57][58][59] with an approximate NNLO QCD correction applied [60]. The associated V H production process is simulated with Pythia8. The Cteq6L1 [61] parameterisation of PDFs is used for the Pythia8 event generator. The predictions for V H production are normalised to cross sections calculated at NNLO in QCD [62], with NLO EW radiative corrections [63] applied.
Additional corrections to the shape of the generated p T distribution of Higgs bosons produced via ggF are applied to match the distribution from a calculation at NNLO including the NNLL corrections provided by the HRes2.1 [64] program. In this calculation, the effects of finite masses of the top and bottom quarks [64,65] are included and dynamical renormalisation and factorisation scales, µ R = µ F = √(m_H² + p_T²), are used. A reweighting is performed separately for events with no more than one jet at particle level and for events with two or more jets. In the latter case, the Higgs boson p T spectrum is reweighted to match the MinLo HJJ predictions [66]. The reweighting is derived such that the inclusive Higgs boson p T spectrum and the p T spectrum of events with at least two jets match the HRes2.1 and MinLo HJJ predictions respectively, and that the jet multiplicities are in agreement with (N)NLO calculations from JetVHeto [67][68][69].
The NLO EW corrections for VBF production depend on the p T of the Higgs boson, varying from a few percent at low p T to ∼ 20% at p T = 300 GeV [70]. The p T spectrum of the VBF-produced Higgs boson is therefore reweighted, based on the difference between the Powheg+Pythia calculation and the Hawk [57,58] calculation which includes these corrections.
The main and largely irreducible Z/γ * → ττ background is modelled using Z/γ * → µµ events from data, where the muon tracks and associated energy depositions in the calorimeters are replaced by the corresponding simulated signatures of the final-state particles of the tau decay. In this approach, essential features such as the modelling of the kinematics of the produced boson, the modelling of the hadronic activity of the event (jets and underlying event) as well as contributions from pile-up are taken from data. Thereby the dependence on the simulation is minimised and only the τ decays and the detector response to the tau-lepton decay products are based on simulation. By requiring two isolated, high-energy muons with opposite charge and a dimuon invariant mass m µµ > 40 GeV, Z → µµ events can be selected from the data with high efficiency and purity. To replace the muons in the selected events, all tracks associated with the muons are removed and calorimeter cell energies associated with the muons are corrected by subtracting the corresponding energy depositions in a single simulated Z → µµ event with the same kinematics. Finally, both the track information and the calorimeter cell energies from a simulated Z → ττ decay are added to the data event. The decays of the tau leptons are simulated by Tauola [71].
The tau lepton kinematics are matched to the kinematics of the muons they are replacing, including polarisation and spin correlations [72], and the mass difference between the muons and the tau leptons is accounted for. This hybrid sample is referred to as embedded data in the following.
Other background processes are simulated using different generators, each interfaced to Pythia [46,73] or Herwig [74,75] to provide the parton shower, hadronisation and the modelling of the underlying event, as indicated in table 1. For the Herwig samples, the decays of tau leptons are also simulated using Tauola [71]. Photon radiation from charged leptons for all samples is provided by Photos [76]. The samples for W/Z+jets production are generated with Alpgen [77], employing the MLM matching scheme [78] between the hard process (calculated with LO matrix elements for up to five partons) and the parton shower. For W W production, the loop-induced gg → W W process is also generated, using the gg2WW [79] program. In the AcerMC [80], Alpgen, and Herwig event generators, the Cteq6L1 parameterisation of the PDFs is used, while the CT10 parameterisation is used for the generation of events with gg2WW. The normalisation of these background contributions is either estimated from control regions using data, as described in section 6, or the cross sections quoted in table 1 are used. For all samples, a full simulation of the ATLAS detector response [81] using the Geant4 program [82] was performed. In addition, events from minimum-bias interactions were simulated using the AU2 [83] parameter tuning of Pythia8. The AU2 tune includes the set of optimized parameters for the parton shower, hadronisation, and multiple parton interactions. They are overlaid on the simulated signal and background events according to the luminosity profile of the recorded data. The contributions from these pile-up interactions are simulated both within the same bunch crossing as the hard-scattering process and in neighbouring bunch crossings. Finally, the resulting simulated events are processed through the same reconstruction programs as the data.

Event selection
Single lepton, dilepton and di-τ had triggers were used to select the events for the analysis. A summary of the triggers used by each channel at the two centre-of-mass energies is reported in table 2. Due to the increasing luminosity and the different pile-up conditions, the online p T thresholds increased during data-taking in 2011 and again for 2012, and more stringent identification requirements were applied for the data-taking in 2012. The p T requirements on the objects in the analysis are usually 2 GeV higher than the trigger requirements, to ensure that the trigger is fully efficient.
In addition to applying criteria to ensure that the detector was functioning properly, requirements to increase the purity and quality of the data sample are applied by rejecting non-collision events such as cosmic rays and beam-halo events. At least one reconstructed vertex is required with at least four associated tracks with p T > 400 MeV and a position consistent with the beam spot.
Table 1. Monte Carlo generators used to model the signal and the background processes at √s = 8 TeV. The cross sections times branching fractions (σ × B) used for the normalisation of some processes (many of these are subsequently normalised to data) are included in the last column together with the perturbative order of the QCD calculation. For the signal processes the H → ττ SM branching ratio is included, and for the W and Z/γ * background processes the branching ratios for leptonic decays (ℓ = e, µ, τ) of the bosons are included. For all other background processes, inclusive cross sections are quoted (marked with a †).
With respect to the object identification requirements described in section 2, tighter criteria are applied to address the different background contributions and compositions in the different analysis channels. Higher p T thresholds are applied to electrons, muons, and τ had candidates according to the trigger conditions satisfied by the event, as listed in table 2. For the channels involving leptonic tau decays, τ lep τ lep and τ lep τ had , additional isolation criteria for electrons and muons, based on tracking and calorimeter information, are used to suppress the background from misidentified jets or from semileptonic decays of charm and bottom hadrons. The calorimeter isolation variable I(E T , ∆R) is defined as the sum of the total transverse energy in the calorimeter in a cone of size ∆R around the electron cluster or the muon track, divided by the E T of the electron cluster or the p T of the muon respectively. The track-based isolation I(p T , ∆R) is defined as the sum of the transverse momenta of tracks within a cone of ∆R around the electron or muon track, divided by the E T of the electron cluster or the muon p T respectively. The isolation requirements applied are slightly different for the two centre-of-mass energies and are listed in table 3.
Table 2. Summary of the triggers used to select events for the different analysis channels at the two centre-of-mass energies. The transverse momentum thresholds applied at trigger level and in the analysis are listed. When more than one trigger is used, a logical OR is taken and the trigger efficiencies are calculated accordingly.
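Both isolation variables have the same structure: a sum over nearby activity inside a cone, divided by the lepton's own E T or p T. The sketch below implements that structure once. The tuple layout, the function name and the cone size of 0.2 used in the example are assumptions for illustration; the input list is assumed not to contain the lepton's own deposit or track.

```python
import math


def relative_isolation(lepton_eta, lepton_phi, lepton_scale, activity, cone_size):
    """Sum the transverse quantity of nearby activity inside the cone, relative to the lepton.

    `activity` is a list of (eta, phi, value) tuples, where value is a calorimeter
    E_T for I(E_T, dR) or a track p_T for I(p_T, dR), in GeV.
    `lepton_scale` is the electron-cluster E_T or the muon p_T, in GeV.
    """
    total = 0.0
    for eta, phi, value in activity:
        dphi = (phi - lepton_phi + math.pi) % (2.0 * math.pi) - math.pi
        if math.hypot(eta - lepton_eta, dphi) < cone_size:
            total += value
    return total / lepton_scale


# Hypothetical 40 GeV lepton at the origin with three nearby deposits.
deposits = [(0.1, 0.1, 2.0), (0.5, 0.0, 10.0), (0.0, -0.15, 2.0)]
iso = relative_isolation(0.0, 0.0, 40.0, deposits, cone_size=0.2)
```

Only the two deposits inside the cone contribute, so the 10 GeV deposit at ∆R = 0.5 does not spoil the isolation of this lepton.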
In the τ had τ had channel, isolated taus are selected by requiring that there are no tracks with p T > 0.5 GeV in an isolation region of 0.2 < ∆R < 0.6 around the tau direction. This requirement leads to a 12% (4%) efficiency loss for hadronic taus, while 30% (10%) of contamination from jets is rejected in 8 (7) TeV data.
After the basic lepton selection, further channel-dependent cuts are applied, as detailed in the following. The full event selection is summarised in table 4.
Table 3. Summary of isolation requirements applied for the selection of isolated electrons and muons at the two centre-of-mass energies. The isolation variables are defined in the text.
The τ lep τ lep channel: exactly two isolated leptons with opposite-sign (OS) electric charges, passing the p T threshold listed in table 2, are required. Events containing a τ had candidate are vetoed. For the τ had candidates considered, the criteria used to reject electrons misidentified as τ had candidates are tightened to a working-point of 85% signal efficiency [39].
In addition to the irreducible Z → ττ background, sizeable background contributions from Z → ℓℓ and from tt̄ production are expected in this channel. Background contributions from Z decays, but also from low mass resonances (charmonium and bottomonium), are rejected by requirements on the invariant mass m vis τ τ of the visible tau decay products, on the angle ∆φ between the two leptons in the transverse plane and on E miss T . To reject the large Z → ℓℓ contribution in events with same-flavour (SF) leptons (ee, µµ), more stringent cuts on the visible mass and on E miss T are applied for these events than for events with different-flavour (DF) leptons (eµ). For SF final states, an additional variable named high-p T objects E miss T (E miss,HPTO T ) is also used to reject background from Z/γ * production. It is calculated from the high-p T objects in the event, i.e. from the two leptons and from jets with p T > 25 GeV. Due to the presence of real neutrinos, the two E miss T variables are strongly correlated for signal events but only loosely correlated for background from Z → ee and Z → µµ decays.
To further suppress background contributions from misidentified leptons, a minimum value of the scalar sum of the transverse momenta of the two leptons is required. Contributions from tt̄ events are further reduced by rejecting events with a b-jet with p T > 25 GeV.
Within the collinear approximation [99], i.e. assuming that the tau directions are given by the directions of the visible tau decay products and that the momenta of the neutrinos constitute the missing transverse momentum, the tau momenta can be reconstructed. For tau decays, the fractions of the tau momenta carried by the visible decay products, x τ,i = p vis,i /(p vis,i + p mis,i ), with i = 1, 2, are expected to lie in the interval 0 < x τ,i < 1, and hence corresponding requirements are applied to further reject non-tau background contributions.
Finally, to avoid overlap between this analysis and the search for H → WW* → ℓνℓν decays, the ττ mass in the collinear approximation is required to satisfy the condition m coll τ τ > m Z − 25 GeV.
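The collinear approximation reduces to a 2×2 linear system in the transverse plane. Writing the neutrino momentum from each tau as a multiple a i of the corresponding visible momentum, the missing transverse momentum is a 1 p vis,1 + a 2 p vis,2, and x τ,i = 1/(1 + a i). The sketch below solves this system. The relation m coll = m vis /√(x 1 x 2), which neglects the tau and visible masses, is the commonly used form and is an assumption here, as are the function names and the numerical value of m Z.

```python
import math

M_Z = 91.1876  # GeV; Z boson mass, a value assumed here (not quoted in the text)


def collinear_fractions(vis1, vis2, met):
    """Return (x1, x2) from transverse momenta given as (px, py) in GeV, or None."""
    det = vis1[0] * vis2[1] - vis1[1] * vis2[0]
    if det == 0.0:
        return None  # visible momenta are parallel or back-to-back: no unique solution
    a1 = (met[0] * vis2[1] - met[1] * vis2[0]) / det
    a2 = (vis1[0] * met[1] - vis1[1] * met[0]) / det
    if a1 == -1.0 or a2 == -1.0:
        return None
    return 1.0 / (1.0 + a1), 1.0 / (1.0 + a2)


def collinear_mass(m_vis, x1, x2):
    """Collinear di-tau mass, neglecting the tau and visible masses."""
    return m_vis / math.sqrt(x1 * x2)


def passes_collinear_cuts(m_vis, x1, x2):
    """Require 0 < x_i < 1 and m_coll > m_Z - 25 GeV."""
    if not (0.0 < x1 < 1.0 and 0.0 < x2 < 1.0):
        return False
    return collinear_mass(m_vis, x1, x2) > M_Z - 25.0


# Toy event: visible products carry 50% and 80% of the two tau momenta.
fractions = collinear_fractions((30.0, 0.0), (0.0, 40.0), (30.0, 10.0))
```

The degenerate case with a vanishing determinant corresponds to back-to-back or parallel visible decay products, where the approximation cannot assign the missing momentum to the two taus.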
The τ lep τ had channel: exactly one isolated lepton and one τ had candidate with OS charges, passing the p T thresholds listed in table 2, are required. The criteria used to reject electrons misidentified as τ had are also tightened in this channel to a working-point of 85% signal efficiency [39]. The production of W +jets and of top quarks constitutes the dominant reducible background in this channel. To substantially reduce the W +jets contribution, a cut on the transverse mass constructed from the lepton and the missing transverse momentum is applied and events with m T > 70 GeV are rejected. Contributions from tt̄ events are reduced by rejecting events with a b-jet with p T > 30 GeV.
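The transverse-mass cut can be sketched as below, assuming the usual definition m T = √(2 p T(lepton) E miss T (1 − cos ∆φ)), with ∆φ the azimuthal separation between the lepton and the missing transverse momentum. The function names are illustrative.

```python
import math


def transverse_mass(lepton_pt, met, dphi):
    """Transverse mass of the lepton + missing-transverse-momentum system (GeV)."""
    return math.sqrt(2.0 * lepton_pt * met * (1.0 - math.cos(dphi)))


def passes_mt_cut(lepton_pt, met, dphi, mt_max=70.0):
    """Events with m_T above mt_max (70 GeV in the text) are rejected."""
    return transverse_mass(lepton_pt, met, dphi) <= mt_max
```

A lepton and missing momentum that are back to back with 40 GeV each give m T = 80 GeV, typical of W → ℓν decays, and are rejected; when the two are nearly collinear, as expected for a boosted tau decay, m T is small and the event is kept.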
The τ had τ had channel: one isolated medium τ had candidate and one isolated tight τ had candidate with OS charges are required. Events with electron or muon candidates are rejected. For all data, the missing transverse momentum must satisfy E miss T > 20 GeV and its direction must either be between the two visible τ had candidates in φ or within ∆φ < π/4 of the nearest τ had candidate. To further reduce the background from multijet production, additional cuts on the ∆R and pseudorapidity separation ∆η between the two τ had candidates are applied.
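The requirement on the direction of the missing transverse momentum can be expressed with azimuthal angles only. In this sketch, "between the two τ had candidates" is interpreted as lying in the smaller angular sector spanned by them, which is an assumption; the function names and the numerical tolerance are also illustrative.

```python
import math


def _abs_dphi(phi1, phi2):
    """Absolute azimuthal separation in [0, pi]."""
    return abs((phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi)


def met_direction_ok(phi_tau1, phi_tau2, phi_met, tol=1e-9):
    """True if the MET points between the two taus or within pi/4 of the nearest one."""
    d1 = _abs_dphi(phi_tau1, phi_met)
    d2 = _abs_dphi(phi_tau2, phi_met)
    d12 = _abs_dphi(phi_tau1, phi_tau2)
    between = abs(d1 + d2 - d12) < tol  # inside the smaller sector spanned by the taus
    return between or min(d1, d2) < math.pi / 4.0


def passes_hadhad_met(met, phi_tau1, phi_tau2, phi_met):
    """Combine the E_T^miss > 20 GeV cut with the direction requirement."""
    return met > 20.0 and met_direction_ok(phi_tau1, phi_tau2, phi_met)
```

The "between" test relies on the fact that the two separations from the missing momentum add up to the tau-tau separation only when the missing momentum lies inside the sector.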
With these selections, there is no overlap between the individual channels.

Analysis categories
To exploit signal-sensitive event topologies, two analysis categories are defined in an exclusive way.
• The VBF category targets events with a Higgs boson produced via vector boson fusion and is characterised by the presence of two high-p T jets with a large pseudorapidity separation (see table 4). The ∆η(j 1 , j 2 ) requirement is applied to the two highest-p T jets in the event. In the τ lep τ had channel, there is an additional requirement that m vis τ τ > 40 GeV, to eliminate low-mass Z/γ * events. Although this category is dominated by VBF events, it also includes smaller contributions from ggF and V H production.
• The boosted category targets events with a boosted Higgs boson produced via ggF. Higgs boson candidates are therefore required to have large transverse momentum, p H T > 100 GeV. The p H T is reconstructed using the vector sum of the missing transverse momentum and the transverse momentum of the visible tau decay products. In the τ lep τ lep channel, at least one jet with p T > 40 GeV is required. The jet requirement selects a region of the phase space where the E miss T of same-flavour events is well modelled by simulation. In order to define an orthogonal category, events passing the VBF category selection are not considered. This category also includes small contributions from VBF and V H production.

Table 4. Summary of the event selection for the three analysis channels. The requirements used in both the preselection and for the definition of the analysis categories are given. The labels (1) and (2) refer to the leading (highest p T ) and subleading final-state objects (leptons, τ had , jets). The variables are defined in the text.
Preselection (τ lep τ had ): exactly one isolated lepton and one medium τ had candidate with opposite charges; m T < 70 GeV; events with a b-tagged jet with p T > 30 GeV are rejected.
Preselection (τ had τ had ): one isolated medium and one isolated tight opposite-sign τ had candidate; events with leptons are vetoed; E miss T > 20 GeV; E miss T points between the two visible taus in φ, or min[∆φ(τ, E miss T )] < π/4.
VBF category selection cuts: at least two jets with p j1 T > 40 GeV and p j2 …; at least two jets with p j1 T > 50 GeV and p j2 T > 30 GeV; p j2 T > 35 GeV for jets with |η| > 2.4; ∆η(j 1 , j 2 ) > 2.0.
Boosted category selection cuts: at least one jet with p T > 40 GeV (τ lep τ lep ); failing the VBF selection and p H T > 100 GeV (all channels).

The transverse mass is defined as m T = √(2 p T(ℓ) E miss T (1 − cos ∆φ)), where ∆φ is the azimuthal separation between the directions of the lepton and the missing transverse momentum.
While these categories are conceptually identical across the three channels, differences in the dominant background contributions require different selection criteria. For both categories, the requirement on jets is inclusive and additional jets, apart from those passing the category requirements, are allowed.
For the τ had τ had channel, the so-called rest category is used as a control region. In this category, events passing the preselection requirements but not passing the VBF or boosted category selections are considered. This category is used to constrain the Z → τ τ and multijet background contributions. The signal contamination in this category is negligible.
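The exclusive categorisation can be summarised as a short decision sequence: test the VBF selection first, then the boosted selection on the remaining events, and assign everything else to the rest category. The sketch below is a simplification: the jet thresholds are passed as parameters because they differ between channels, the default values are example entries from table 4 whose channel assignment is an assumption, and the tighter p T requirement for jets with |η| > 2.4 is omitted.

```python
def categorise(jets, higgs_pt, lead_pt_min=50.0, sublead_pt_min=30.0,
               min_delta_eta=2.0, higgs_pt_min=100.0, boosted_jet_pt_min=None):
    """Assign an event to 'VBF', 'boosted' or 'rest'.

    `jets` is a list of (pt, eta) tuples; `higgs_pt` is the reconstructed
    Higgs-candidate transverse momentum. All momenta are in GeV.
    """
    ordered = sorted(jets, key=lambda j: j[0], reverse=True)

    # VBF: two leading jets above threshold with a large pseudorapidity gap.
    is_vbf = (len(ordered) >= 2
              and ordered[0][0] > lead_pt_min
              and ordered[1][0] > sublead_pt_min
              and abs(ordered[0][1] - ordered[1][1]) > min_delta_eta)
    if is_vbf:
        return "VBF"

    # Boosted: fails VBF, high Higgs-candidate p_T, optionally one hard jet.
    has_jet = boosted_jet_pt_min is None or (
        bool(ordered) and ordered[0][0] > boosted_jet_pt_min)
    if higgs_pt > higgs_pt_min and has_jet:
        return "boosted"

    return "rest"
```

Checking the VBF selection before the boosted one is what makes the two categories orthogonal: an event with a high-p T Higgs candidate and two widely separated jets is counted only once, in the VBF category.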

Higgs boson candidate mass reconstruction
The di-tau invariant mass (m MMC τ τ ) is reconstructed using the missing mass calculator (MMC) [100]. This requires solving an underconstrained system of equations for six to eight unknowns, depending on the number of neutrinos in the τ τ final state. These unknowns include the x-, y-, and z-components of the momentum carried by the neutrinos for each of the two tau leptons in the event, and the invariant mass of the two neutrinos from any leptonic tau decays. The calculation uses the constraints from the measured x- and y-components of the missing transverse momentum, and the visible masses of both tau candidates. A scan is performed over the two components of the missing transverse momentum vector and the yet undetermined variables. Each scan point is weighted by its probability according to the E miss T resolution and the tau decay topologies. The estimator for the τ τ mass is defined as the most probable value of the scan points.
The MMC algorithm provides a solution for ∼99% of the H → τ τ and Z → τ τ events. This is a distinct advantage compared to the mass calculation using the collinear approximation where the failure rate is higher due to the implicit collinearity assumptions. The small loss rate of about 1% for signal events is due to large fluctuations of the E miss T measurement or other scan variables. Figure 1 shows reconstructed m MMC τ τ mass distributions for H → τ τ and Z → τ τ events in the τ lep τ had VBF and boosted categories. The mass resolution, defined as the ratio between the full width at half maximum (FWHM) and the peak value of the mass distribution (m peak ), is found to be ≈ 30% for all categories and channels.

Figure 1. The reconstructed m MMC τ τ mass distributions for H → τ τ (m H = 125 GeV) and Z → τ τ events in MC simulation and embedding respectively, for events passing (a) the VBF category selection and (b) the boosted category selection in the τ lep τ had channel.
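The resolution figure of merit quoted here, FWHM divided by the peak position, can be illustrated with a minimal sketch. The function name, the use of bin centres, and the linear interpolation at the half-maximum crossings are assumptions made for illustration only; they are not taken from the analysis.

```python
def fwhm_over_peak(bin_centres, counts):
    """Return FWHM / m_peak for a single-peaked histogram.

    Assumption: the half-maximum crossings are found by linear
    interpolation between neighbouring bin centres.
    """
    # Index of the highest bin (first one if there are ties).
    i_peak = max(range(len(counts)), key=counts.__getitem__)
    half = counts[i_peak] / 2.0

    # Walk outwards from the peak while the bins stay at or above half maximum.
    lo = i_peak
    while lo > 0 and counts[lo - 1] >= half:
        lo -= 1
    hi = i_peak
    while hi < len(counts) - 1 and counts[hi + 1] >= half:
        hi += 1
    if lo == 0 or hi == len(counts) - 1:
        raise ValueError("histogram does not drop below half maximum on both sides")

    def crossing(i_in, i_out):
        # Interpolate between a bin at/above half maximum and its neighbour below it.
        x0, y0 = bin_centres[i_in], counts[i_in]
        x1, y1 = bin_centres[i_out], counts[i_out]
        return x0 + (half - y0) * (x1 - x0) / (y1 - y0)

    left = crossing(lo, lo - 1)
    right = crossing(hi, hi + 1)
    return (right - left) / bin_centres[i_peak]
```

Applied to a mass histogram, a returned value near 0.3 would correspond to the ≈ 30% resolution quoted in the text.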

Boosted decision trees
Boosted decision trees are used in each category to extract the Higgs boson signal from the large number of background events. Decision trees [24] recursively partition the parameter space into multiple regions where signal or background purities are enhanced. Boosting is a method which improves the performance and stability of decision trees and involves the combination of many trees into a single final discriminant [25,26]. After boosting, the final score undergoes a transformation to map the scores on the interval −1 to +1. The most signal-like events have scores near 1 while the most background-like events have scores near −1.
Separate BDTs are trained for each analysis category and channel with signal and background samples, described in section 6, at √ s = 8 TeV. They are then applied to the analysis of the data at both centre-of-mass energies. The separate training naturally exploits differences in event kinematics between different Higgs boson production modes. It also allows different discriminating variables to be used to address the different background compositions in each channel. For the training in the VBF category, only a VBF Higgs production signal sample is used, while training in the boosted category uses ggF, VBF, and V H signal samples. The Higgs boson mass is chosen to be m H = 125 GeV for all signal samples. The BDT input variables used at both centre-of-mass energies are listed in table 5. Most of these variables have straightforward definitions, and the more complex ones are defined in the following.
• ∆R(τ 1 , τ 2 ): the distance ∆R between the two leptons, between the lepton and τ had , or between the two τ had candidates, depending on the decay mode.
• p Total T : magnitude of the vector sum of the transverse momenta of the visible tau decay products, the two leading jets, and E miss T .
• Sum p T : scalar sum of the p T of the visible components of the tau decay products and of the jets.
• E miss T φ centrality: a variable that quantifies the relative angular position of the missing transverse momentum with respect to the visible tau decay products in the transverse plane. The transverse plane is transformed such that the directions of the tau decay products are orthogonal, and that the smaller φ angle between the tau decay products defines the positive quadrant of the transformed plane. The E miss T φ centrality is defined as the sum of the x- and y-components of the E miss T unit vector in this transformed plane.
• Sphericity: a variable that describes the isotropy of the energy flow in the event [101]. It is based on the quadratic momentum tensor
$$ S^{\alpha\beta} = \frac{\sum_i p_i^{\alpha} p_i^{\beta}}{\sum_i |\vec{p}_i|^2}. $$
In this equation, α and β are the indices of the tensor. The summation is performed over the momenta of the selected leptons and jets in the event. The sphericity of the event (S) is then defined in terms of the two smallest eigenvalues of this tensor, λ 2 and λ 3 ,
$$ S = \frac{3}{2}\,(\lambda_2 + \lambda_3). $$
• min(∆η ℓ1ℓ2,jets ): the minimum ∆η between the dilepton system and either of the two jets.
• Object η centrality: a variable that quantifies the η position of an object (an isolated lepton, a τ had candidate or a jet) with respect to the two leading jets in the event. It is defined as
$$ C_{\eta_1,\eta_2}(\eta) = \exp\left[ \frac{-4}{(\eta_1 - \eta_2)^2} \left( \eta - \frac{\eta_1 + \eta_2}{2} \right)^2 \right], $$
where η, η 1 and η 2 are the pseudorapidities of the object and the two leading jets respectively. This variable has a value of 1 when the object is halfway in η between the two jets, 1/e when the object is aligned with one of the jets, and < 1/e when the object is not between the jets in η. In the τ lep τ lep channel the η centrality of a third jet in the event, C η 1 ,η 2 (η j 3 ), and the product of the η centralities of the two leptons are used as BDT input variables, while in the τ lep τ had channel the η centrality of the lepton, C η 1 ,η 2 (η ℓ ), is used, and in the τ had τ had channel the η centrality of each τ , C η 1 ,η 2 (η τ 1 ) and C η 1 ,η 2 (η τ 2 ), is used. Events with only two jets are assigned a dummy value of −0.5 for C η 1 ,η 2 (η j 3 ).
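Several of the variables listed above can be illustrated with a short sketch. All function names are hypothetical. The η centrality is written as a Gaussian-like function chosen to reproduce the values quoted in the text (1 at the midpoint of the two jets, 1/e on a jet). The closed form used for the E miss T φ centrality, obtained by decomposing the E miss T direction along the two tau directions, is an assumption about how the transformed-plane definition can be implemented, and it is undefined when the two taus are exactly collinear or back-to-back.

```python
import math

def pt_total(objects_pxpy, met_pxpy):
    """Magnitude of the vector sum of transverse momenta.

    objects_pxpy: (px, py) of the visible tau decay products and the
    two leading jets; met_pxpy: (px, py) of the missing transverse momentum.
    """
    px = sum(p[0] for p in objects_pxpy) + met_pxpy[0]
    py = sum(p[1] for p in objects_pxpy) + met_pxpy[1]
    return math.hypot(px, py)

def sum_pt(objects_pxpy):
    """Scalar sum of the pT of the given objects."""
    return sum(math.hypot(px, py) for px, py in objects_pxpy)

def eta_centrality(eta, eta1, eta2):
    """1 halfway between the jets, 1/e on a jet, below 1/e outside them."""
    mid = 0.5 * (eta1 + eta2)
    return math.exp(-4.0 / (eta1 - eta2) ** 2 * (eta - mid) ** 2)

def met_phi_centrality(phi_met, phi1, phi2):
    """Sum of the components of the MET unit vector in the plane where
    the two tau directions are mapped onto orthogonal axes (assumed form)."""
    d = math.sin(phi2 - phi1)          # zero for collinear or back-to-back taus
    a = math.sin(phi_met - phi1) / d   # component along tau 2
    b = math.sin(phi2 - phi_met) / d   # component along tau 1
    return (a + b) / math.hypot(a, b)
```

With this form the E miss T φ centrality equals √2 when the missing transverse momentum lies exactly between the two taus, 1 when it is aligned with one of them, and is smaller than 1 when it points outside the region between them.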
Among these variables the most discriminating ones include m MMC τ τ , ∆R(τ 1 , τ 2 ) and ∆η(j 1 , j 2 ). Figure 2 shows the distributions of selected BDT input variables. For the VBF category, the distributions of ∆η(j 1 , j 2 ) are shown for all three channels. For the boosted category, the distributions of ∆R(τ 1 , τ 2 ) are shown for the τ lep τ had and τ had τ had channels and the distribution of the p T of the leading jet is shown for the τ lep τ lep channel. For all distributions, the data are compared to the predicted SM backgrounds at √ s = 8 TeV. The corresponding uncertainties are indicated by the shaded bands. All input distributions are well described, giving confidence that the background models (from simulation and data) describe well the relevant input variables of the BDT. Similarly, good agreement is found for the distributions at √ s = 7 TeV.

Figure 2. Distributions of selected BDT input variables at √ s = 8 TeV, for (a) ∆η(j 1 , j 2 ) and (b) the p T of the leading jet, p j1 T , in the τ lep τ lep channel, for (c) ∆η(j 1 , j 2 ) and (d) ∆R(τ 1 , τ 2 ), the distance ∆R between the lepton and τ had , in the τ lep τ had channel and for (e) ∆η(j 1 , j 2 ) and (f) ∆R(τ 1 , τ 2 ), the distance ∆R between the two τ had candidates, in the τ had τ had channel. The contributions from a Standard Model Higgs boson with m H = 125 GeV are superimposed, multiplied by a factor of 50. These figures use background predictions made without the global fit defined in section 8. The error band includes statistical and pre-fit systematic uncertainties.

Background estimation
The different final-state topologies of the three analysis channels have different background compositions which necessitate different strategies for the background estimation. In general, the number of expected background events and the associated kinematic distributions are derived from a mixture of data-driven methods and simulation. The normalisation of several important background contributions is performed by comparing the simulated samples of individual background sources to data in regions which only have a small or negligible contamination from signal or other background events. The control regions used in the analysis are summarised in table 6. Common to all channels is the dominant Z → τ τ background, for which the kinematic distributions are taken from data by employing the embedding technique, as described in section 3. Background contributions from jets that are misidentified as hadronically decaying taus (fake backgrounds) are estimated by using either a fake-factor method or samples of non-isolated τ had candidates. Likewise, samples of non-isolated leptons are used to estimate fake-lepton contributions from jets or hadronically decaying taus and leptons from other sources, such as heavy-quark decays. Contributions from various other physics processes with leptons and/or τ had candidates in the final state are estimated using the simulation, normalised to the theoretical cross sections, as given in table 1. A more detailed discussion of the estimation of the various background components in the different channels is given in the following.

Background from Z → τ τ production
A reliable modelling of the irreducible Z → τ τ background is an important ingredient of the analysis. It has been shown in other ATLAS analyses that existing Z+jets Monte Carlo simulation needs to be reweighted to model data correctly [102][103][104]. Additionally, it is not possible to select a sufficiently pure and signal-free Z → τ τ control sample from data to model the background in the signal region. Therefore this background is estimated using embedded data, as described in section 3. This procedure was extensively validated using both data and simulation. To validate the subtraction procedure of the muon cell energies and tracks from data and the subsequent embedding of the corresponding information from simulation, the muons in Z → µµ events are replaced by simulated muons. The calorimeter isolation energy in a cone of ∆R = 0.3 around the muons from data before and after embedding is compared in figure 3(a). Good agreement is found, which indicates that no deterioration (e.g. possible energy biases) in the muon environment is introduced. Another important test validates the embedding of more complex Z → τ τ events, which can only be performed in the simulation. To achieve a meaningful validation, the same MC generator with identical settings was used to simulate both the Z → µµ and Z → τ τ events. The sample of embedded events is corrected for the bias due to the trigger, reconstruction and acceptance of the original muons. These corrections are determined from data as a function of p µ T and η(µ), and allow the acceptance of the original selection to be corrected. The tau decay products are treated like any other objects obtained from the simulation, with one important difference due to the absence of trigger simulation in this sample. Trigger effects are parameterised from the simulation as a function of the tau decay product p T . 
After replacing the muons with simulated taus, kinematic distributions of the embedded sample can be directly compared to the fully simulated ones. As an example, the reconstructed invariant mass, m MMC τ τ , is shown in figure 3(b), for the τ lep τ had final state. Good agreement is found and the observed differences are covered by the systematic uncertainties. Similarly, good agreement is found for other variables, such as the missing transverse momentum, the kinematic variables of the hadronically decaying tau lepton or of the associated jets in the event. A direct comparison of the Z → τ τ background in data and the modelling using the embedding technique also shows good agreement. This can be seen in several kinematic quantity distributions, which are dominated by Z → τ τ events, shown in figure 2.
The normalisation of this background process is taken from the final fit described in section 8. The normalisation is independent for the τ lep τ lep , τ lep τ had , and τ had τ had analysis channels.
Figure 3. (a) The calorimeter isolation energy in a cone of ∆R = 0.3 around the muons in Z → µµ events from data, before and after the embedding of simulated muons. (b) The reconstructed invariant mass, m MMC τ τ , in the τ lep τ had final state, for simulated Z → τ τ events, compared to the one obtained from simulated Z → µµ events after tau embedding. The ratios of the values before and after the embedding and between the embedded Z → µµ and Z → τ τ events are given in (a) and (b) respectively. The errors in (a) and (b) on the ratios (points) represent the statistical uncertainties, while the systematic uncertainties are indicated by the hatched bands in (b). The shaded bands represent the statistical uncertainties from the Z → µµ data events in (a) and from the Z → τ τ simulation in (b).

Background from misidentified leptons or hadronically decaying taus
For the τ lep τ lep channel, all background sources resulting from misidentified leptons are treated together. In this approach, contributions from multijet and W +jets production, as well as the part of the tt background resulting from decays to leptons and hadrons (tt → ℓνb qqb) are included. A control sample is defined in data by inverting the isolation requirements for one of the two leptons, while applying all other signal region requirements. The contributions from other background channels (dileptonic tt decays (tt → ℓνb ℓνb), Z → ee, Z → µµ, and diboson production) are obtained from the simulation and are subtracted. From this control sample a template is created. The normalisation factor is obtained by fitting the p T distribution of the subleading lepton at an early stage of the preselection.
For the τ lep τ had channel, the fake-factor method is used to derive estimates for the multijet, W +jets, Z+jets, and semileptonic tt background events that pass the τ lep τ had selection due to a misidentified τ had candidate. The fake factor is defined as the ratio of the number of jets identified as medium τ had candidates to the number satisfying the loose, but not the medium, criteria. Since the fake factor depends on the type of parton initiating the jet and on the p T of the jet, it is determined as a function of p T separately for samples enriched in quark- and gluon-initiated jets. In addition, the fake factor is found to be different for 1-track and 3-track candidates. Three different, quark-jet dominated samples are used separately for the W +jets, tt and Z+jets background components. They are defined by selecting the high-m T region (m T > 70 GeV), by inverting the b-jet veto and by requiring two leptons with an invariant mass consistent with m Z (80 GeV < m ℓℓ < 100 GeV) respectively. In addition, a multijet sample dominated by gluon-initiated jets is selected by relaxing the lepton identification and requiring it to satisfy the loose identification criteria. The derived fake factors are found to vary from 0.124 (0.082) for p T = 20 GeV to 0.088 (0.038) for p T = 150 GeV for 1-track (3-track) candidates in the VBF category. The corresponding values for the boosted category are 0.146 (0.084) for p T = 20 GeV and 0.057 (0.033) for p T = 150 GeV. To obtain the fake-background estimate for the VBF and boosted signal regions, these factors are then applied, weighted by the expected relative W +jets, Z+jets, multijet, and tt fractions, to the events in regions defined by applying the selections of the corresponding signal region, except that the τ had candidate is required to pass the loose and to fail the medium τ had identification.
As an example, the good agreement between data and background estimates is shown in figure 4(a) for the reconstructed τ τ mass for events in the high-m T region, which is dominated by W +jets production.
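The application of the fake factors in the τ lep τ had channel can be sketched as follows. This is a minimal illustration, not the analysis code: the function names, the `ff_lookup` callback and the event representation are hypothetical, and any numerical inputs a user passes in are their own. The sketch assumes that the effective fake factor is the composition-weighted average of the per-process fake factors and that it is applied as a per-event weight to candidates that pass the loose but fail the medium identification.

```python
def effective_fake_factor(fake_factors, fractions):
    """Composition-weighted average of per-process fake factors.

    fake_factors: dict process -> fake factor for one candidate
    fractions:    dict process -> expected relative fraction
                  (normalised here, so they need not sum to one)
    """
    total = sum(fractions.values())
    return sum(fake_factors[p] * fractions[p] for p in fractions) / total

def fake_background_yield(anti_id_events, ff_lookup, fractions):
    """Sum of effective fake factors over the loose-not-medium events.

    anti_id_events: iterable of (pt, n_tracks) for tau candidates that
                    pass loose but fail medium identification
    ff_lookup:      hypothetical callable (pt, n_tracks) -> dict of
                    per-process fake factors
    """
    total = 0.0
    for pt, n_tracks in anti_id_events:
        total += effective_fake_factor(ff_lookup(pt, n_tracks), fractions)
    return total
```

Varying each entry of `fractions` up and down and recomputing the yield gives a simple handle on how sensitive the estimate is to the assumed background composition.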
For the τ had τ had channel, the multijet background is modelled using a template extracted from data that pass the VBF or boosted category selection, where, however, the taus fail the isolation and opposite-sign charge requirements (the number-of-tracks requirement is not enforced). The normalisation of the multijet background is first determined by performing a simultaneous fit of the multijet (modelled by the data sample just mentioned) and Z → τ τ (modelled by embedding) templates after the preselection cuts. The fit is performed for the distribution of the difference in pseudorapidity between the two hadronic tau candidates, ∆η(τ had , τ had ). The signal contribution is expected to be small in this category.
The agreement between data and the background estimate for this distribution is shown in figure 4(b) for the rest category defined in section 4. The preselection normalisation is used as a reference point and starting value for the global fit (see below) and is used for validation plots. The final normalisations of the two important background components, from multijet and Z → τ τ events, are extracted from the final global fit, as described in section 8, in which the ∆η(τ had , τ had ) distribution for the rest category is included.
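The idea of a simultaneous two-template normalisation fit can be illustrated with a deliberately simplified sketch. It uses an unweighted least-squares fit of the binned ∆η(τ had , τ had ) distribution, which is an assumption made to keep the example short; the fit actually used in the analysis is not specified here and is likely likelihood-based. The function name and inputs are hypothetical.

```python
def fit_two_templates(data, multijet, ztautau):
    """Least-squares scale factors (a, b) such that
    data[i] ~ a * multijet[i] + b * ztautau[i] in every bin.

    Solves the 2x2 normal equations with Cramer's rule.
    """
    smm = sum(m * m for m in multijet)
    szz = sum(z * z for z in ztautau)
    smz = sum(m * z for m, z in zip(multijet, ztautau))
    smd = sum(m * d for m, d in zip(multijet, data))
    szd = sum(z * d for z, d in zip(ztautau, data))
    det = smm * szz - smz * smz
    if det == 0:
        raise ValueError("templates are linearly dependent; fit is degenerate")
    a = (smd * szz - szd * smz) / det
    b = (szd * smm - smd * smz) / det
    return a, b
```

The fit can only separate the two components if their template shapes differ, which is why a variable with distinct multijet and Z → τ τ shapes is chosen.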

Z → ee and Z → µµ background
The Drell-Yan Z/γ * → ee and Z/γ * → µµ background channels are important contributions to the final states with two same-flavour leptons. They also contribute to the other channels. As described below, a simulation based on Alpgen is used to estimate these background sources. Correction factors are applied to account for differences between data and simulation.
In the τ lep τ lep channel, the Alpgen simulation is normalised to the data in the Z-mass control region, 80 GeV < m ℓℓ < 100 GeV, for each category, and separately for Z → ee and Z → µµ events. The normalisation factors are determined from the final fit described in section 8. The distribution of the reconstructed τ τ mass for events in this control region is shown in figure 5(a).
In the τ lep τ had channel, the Z → ee and Z → µµ background estimates are also based on simulation. The corrections applied for a τ had candidate depend on whether it originates from a lepton from the Z boson decay or from a jet. In the first case, corrections from data, derived from dedicated tag-and-probe studies, are applied to account for the difference in the rate of misidentified τ had candidates resulting from leptons [21,105]. This is particularly important for Z → ee events with a misidentified τ had candidate originating from a true electron. In the second case, the fake-factor method described in section 6.2 is applied.
In the τ had τ had channel, the contribution of this background is very small and is taken from simulation.

W +jets background
Events with W bosons and jets constitute a background to all channels since leptonic W decays can feed into all signatures when the true lepton is accompanied by a jet which is falsely identified as a τ had or a lepton candidate. This process can also contribute via semileptonic heavy quark decays that provide identified leptons.
As stated in section 6.2, for the τ lep τ lep and τ lep τ had channels, the W +jets contributions are determined with data-driven methods. For the τ had τ had channel, the W → τ had ν background is estimated from simulation. A correction is applied to account for differences in the τ had misidentification probability between data and simulation.

Background from top-quark production
Background contributions from tt and single top-quark production, where leptons or hadronically decaying taus appear in decays of top quarks, are estimated from simulation in the τ lep τ lep and τ lep τ had channels. The normalisation is obtained from data control regions defined by requiring a b-jet instead of a b-veto. In the τ lep τ had channel, a large value of the transverse mass m T is also required, to enhance the background from top-quark production and to suppress the signal contribution. This background is also found to be small for the τ had τ had channel and it is estimated using simulation. The distribution of ∆η(j 1 , j 2 ) for events in the top control region, for the τ lep τ had channel, is shown in figure 5(b).

Diboson background
The production of pairs of vector bosons (W + W − , ZZ and W ± Z), with subsequent decays to leptons or jets, contributes especially to the background in the τ lep τ lep channel. For all analysis channels, these contributions are estimated from simulation, normalised to the NLO cross sections indicated in table 1.

Contributions from other Higgs boson decays
In the τ lep τ lep channel, a non-negligible contribution from H → W W → ℓνℓν exists and this process is considered as background. Its contribution is estimated for m H = 125 GeV using simulation. The corresponding signal cross section is assumed to be the SM value and is indicated in table 1.

Validation of background estimates
As described above, the normalisations of important background sources that are modelled with simulation are determined by fitting to data in control regions. These normalisations are compared in table 7 to predictions based on the theoretical cross sections for the 8 TeV analysis. In most cases, the values obtained are compatible with unity within the statistical uncertainties shown. For the top control region in the VBF category of the τ lep τ had channel, the value is also in agreement with unity if the experimental and theoretical systematic uncertainties are included. The control-region normalisations are used for validation plots, and they are used as starting values in the final global fit described in section 8. The global fit does not change any of these normalisations by more than 2%.
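As a rough illustration of what a control-region normalisation means, the sketch below computes a single-bin normalisation factor by counting. This is a simplification and an assumption: the analysis determines the normalisations by fitting, and the function name, arguments and the purely Poisson treatment of the data uncertainty are hypothetical.

```python
import math

def normalisation_factor(n_data, n_target_mc, n_other_bkg):
    """Single-bin normalisation of one simulated background to data.

    n_data:      observed events in the control region
    n_target_mc: simulated yield of the background being normalised
    n_other_bkg: predicted yield of all other contributions
    Returns (factor, statistical uncertainty from the data count only).
    """
    factor = (n_data - n_other_bkg) / n_target_mc
    stat = math.sqrt(n_data) / n_target_mc
    return factor, stat
```

A factor compatible with unity within its uncertainty indicates that the simulation, normalised to the theoretical cross section, already describes the data in that control region.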
It is important to verify that the BDT output distributions in data control regions are well described after the various background determinations. Figure 6

Systematic uncertainties
The numbers of expected signal and background events, the input variables to the BDT, and thereby the BDT output and the final discrimination between signal and background are affected by systematic uncertainties. They are discussed below, grouped into three categories: experimental uncertainties, background modelling uncertainties, and theoretical uncertainties. For all uncertainties, the effects on both the total signal and background yields and on the shape of the BDT output distribution are evaluated. Table 8 gives a summary of the systematic uncertainties and their impact on the number of expected events for the signal and the total background for the analysis of the data taken at √ s = 8 TeV. The dominant sources that affect the shape of the BDT output distribution are marked in the table. All uncertainties are treated either as fully correlated or uncorrelated across channels. The latter are also marked in table 8. The effects of the systematic uncertainties at √ s = 7 TeV are found to be similar and are not discussed here. The inclusion of the uncertainties in the profile likelihood global fit is described in section 8 and the effect of the most significant systematic uncertainties is presented in table 13.

Experimental uncertainties
The major experimental systematic uncertainties result from uncertainties on efficiencies for triggering, object reconstruction and identification, as well as from uncertainties on the energy scale and resolution of jets, hadronically decaying taus and leptons. In general, the effects resulting from lepton-related uncertainties are smaller than those from jets and taus. They are not discussed in detail; however, their impact is included in table 8. In addition, uncertainties on the luminosity affect the number of signal and background events from simulation.
• Luminosity: the uncertainty on the integrated luminosity is ±2.8% for the 8 TeV dataset and ±1.8% for the 7 TeV dataset. It is determined from a calibration of the luminosity scale derived from beam-separation scans performed in 2011 and 2012 using the method described in ref. [106].
• Efficiencies: the efficiencies for triggering, reconstructing and identifying electrons, muons, and τ had candidates are measured in data using tag-and-probe techniques. The uncertainties on the τ had identification efficiency are ±(2-3)% for 1-prong and ±(3-5)% for 3-prong tau decays [39]. The b-jet tagging efficiency has been measured from data using tt events, where both top quarks decay to leptons, with a total uncertainty of about ±2% for jets with transverse momenta up to 100 GeV [38, 107]. The MC samples used are corrected for differences in these efficiencies between data and simulation and the associated uncertainties are propagated through the analysis.
• Energy scales: the uncertainties on the jet energy scale (JES) arise from several sources. These include, among others, varied response due to the jet flavour composition (quark- versus gluon-initiated jets), pile-up, η intercalibration, and detector response and modelling of in situ jet calibration [35,36]. The impact of the JES uncertainty in this analysis is reduced because many of the background components are estimated using data. The tau energy scale is obtained by fitting the reconstructed visible mass for Z → τ τ events in data, which can be selected with a satisfactory purity. It is measured with a precision of ±(2-4)% [108]. Since systematic uncertainties on the energy scales of all objects affect the reconstructed missing transverse momentum, it is recalculated after each variation is applied. The scale uncertainty on E miss T due to the energy in calorimeter cells not associated with physics objects is also taken into account.
• Energy resolutions: systematic uncertainties on the energy resolution of taus, electrons, muons, jets, and E miss T affect the final discriminant. The effects resulting from uncertainties on the tau energy resolution are small. The impact of changes in the amount of material (inactive material in the detector, e.g. support structures), in the hadronic shower model and in the underlying-event tune were studied in the simulation. They result in systematic uncertainties below 1% on the tau energy resolution. The jet energy resolution is determined by in situ measurements, as described in ref. [109], and affects signal modelling and background components modelled by the simulation. The uncertainty of the resolution on E miss T is estimated by evaluating the energy resolution of each of the E miss T terms. The largest impact results from the soft term (see section 2), arising both from the MC modelling and the effects of pile-up. It is evaluated using simulated Z(→ µµ)+jets events.

Background modelling uncertainties
The most significant systematic uncertainties on the background estimation techniques, as described in section 6, are detailed in the following for the three decay modes considered.
In the τ lep τ lep channel, systematic uncertainties on the shape and normalisation of fake-lepton background sources are estimated by comparing samples of same-sign lepton events that pass and fail the lepton isolation criteria. These uncertainties amount to ±33% (±20%) at 8 TeV and ±10.5% (±13%) at 7 TeV for the boosted (VBF) category. The extrapolation uncertainty for the Z → ℓℓ background is obtained by varying the m ℓℓ window that defines the control region for this background, and amounts to about ±6%. The corresponding extrapolation uncertainty for top-quark background sources is ±(3-6)%, obtained from the differences in event yields in the top-quark control regions when using different MC generators. Neither of these extrapolation uncertainties is significant for the final result. The dominant uncertainties on the normalisation of the tt background, obtained from the global fit, are the systematic uncertainties on the b-jet tagging efficiency and the jet energy scale.
In the τ lep τ had channel, an important systematic uncertainty on the background determination comes from the estimated fake background, for which several sources of systematic uncertainty are considered. The statistical uncertainty on the effective fake factor is ±4.3% (±2.3%) in the 8 TeV VBF (boosted) category, and about ±22% (±11%) in the 7 TeV VBF (boosted) category. The dominant systematic uncertainty on the methodology itself arises from the composition of the combined fake background (W +jets, Z+jets, multijet, and tt fractions), which is largely estimated based on simulated event samples as explained in section 6.2. The uncertainty is estimated by varying each fractional contribution by ±50%, which affects the effective fake factor by ±3% (±6%) and by ±10% (±15%) in the 8 TeV and 7 TeV boosted (VBF) categories respectively. As a closure test, the method was also applied in a region of data where the lepton and τ had candidate have the same charge, rich in fake τ had candidates. Very good agreement was observed between data and the method's prediction, so that no additional in situ uncertainty was deemed necessary. In addition, the uncertainties on the normalisation of the tt background are important. As in the case of the τ lep τ lep channel, the dominant contribution obtained from the global fit originates from systematic uncertainties on the b-jet tagging efficiency and the jet energy scale, along with statistical uncertainties on the observed data in the respective control regions.
In the τ had τ had channel, the major background from multijet production is determined using a data-driven template method. The default multijet template, derived from a sample in data where the τ had candidates fail the isolation and opposite-sign charge requirements, is compared with an alternative template derived from a sample where the τ had candidates fail just the opposite-sign charge requirement. The normalisation of the alternative template is fixed to that of the default template at preselection; the alternative multijet template is propagated into the various categories and gives a different set of yields from the default template. This difference, along with the difference in shape between the two templates, constitutes the systematic uncertainty on the background estimate. This leads to an overall multijet yield variation of 10% (3%) in the VBF (boosted) category at √ s = 8 TeV and of 10% (30%) in the VBF (boosted) category at √ s = 7 TeV. However, there is a very strong shape dependence, such that the uncertainties on the BDT output are much larger at higher output values.
For the embedding method used in all channels, the major systematic uncertainties are related to the selection of Z → µµ events in data and to the subtraction of the muon energy deposits in the calorimeters. The selection uncertainties are estimated by varying the muon isolation criteria in the selection from the nominal value of I(p T , 0.2) < 0.2 (see section 4) to tighter (I(p T , 0.4) < 0.06 and I(E T , 0.2) < 0.04) and looser (no isolation requirements) values. The muon-related cell energies to be subtracted are varied within ±20% (±30%) for the 8 TeV (7 TeV) data. In addition, systematic uncertainties on the corrections for trigger and reconstruction efficiencies are taken into account. Due to the combination of single-lepton and dilepton triggers used, the uncertainties are largest for the τ lep τ lep channel. All experimental systematic uncertainties relating to the embedded τ decay products (such as tau energy scale or identification uncertainties) are applied normally. The combined effect of all uncertainties on the signal and background yields is included in table 8. Because the Z → τ τ normalisation is determined in the final fit, the impact on the final result is much smaller.

Theoretical uncertainties
Theoretical uncertainties are estimated for the signal and for all background contributions modelled with the simulation. Since the major background contributions, from Z → τ τ and misidentification of hadronically decaying τ leptons, are estimated using data-driven methods, they are not affected by these uncertainties. Uncertainties on the signal cross sections are assigned from missing higher-order corrections, from uncertainties in the PDFs, and from uncertainties in the modelling of the underlying event.
For VBF and VH Higgs boson production cross sections, the uncertainties due to missing higher-order QCD corrections are estimated by varying the factorisation and renormalisation scales by factors of two around the nominal scale m W , as prescribed by the LHC Higgs Cross Section Working Group [110]. The resulting uncertainties range from ±2% to ±4%, depending on the process and the category-specific selection considered. In addition, a 2% uncertainty related to the inclusion of the NLO EWK corrections (see section 3) is assigned.
For Higgs boson production via ggF, the uncertainties on the cross sections due to missing higher-order QCD corrections are estimated by varying the renormalisation and factorisation scales around the central values $\mu_R = \mu_F = \sqrt{m_H^2 + p_T^2}$ in the NLO cross section calculations of H + 1-jet and H + 2-jet production. In the calculation of the uncertainties, appropriate cuts on the Higgs p T (p H T > 100 GeV) and on the jet kinematics (∆η, p T ) are applied at parton level for the boosted and VBF categories respectively. The resulting uncertainties on the ggF contributions are found to be about ±24% in the boosted category and ±23% in the VBF category. The ggF contribution is dominant in the boosted category, whereas it is only about 20% of the signal in the VBF category. Since the two categories are exclusive, their anti-correlation is taken into account following the prescription in ref. [111].
In the present analysis, no explicit veto on jets is applied in the VBF selection, but enough kinematical information is provided as input to the BDT so that the high BDT-output region corresponds to a more exclusive region, where the probability of finding a third jet is reduced. Since the cross section for gluon-fusion events produced with a third jet is only known at LO, this could introduce a large uncertainty on the gluon-fusion contamination in the highest (and most sensitive) BDT-output bins. The uncertainty on the BDT shape of the ggF contribution is evaluated using the MCFM Monte Carlo program [98], which calculates H + 3 jets at LO. Scale variations induce changes of the ggF contribution in the highest BDT bin of about ±30%. They are taken into account in the final fit.
Uncertainties related to the simulation of the underlying event and parton shower are estimated by comparing the acceptance from Powheg+Pythia to Powheg+Herwig for both VBF and ggF Higgs boson production modes. Differences in the signal yields range from ±1% to ±8% for the VBF and from ±1% to ±9% for ggF production, depending on the channel and category. The BDT-score distributions of the Powheg+Pythia and Powheg+Herwig samples are compatible with each other within statistical uncertainties.
The PDF uncertainties are estimated by studying the change in the acceptance when using different PDF sets or varying the CT10 PDF set within its uncertainties. The standard VBF Powheg sample and a MC@NLO [112] ggF sample, both generated with the CT10 PDFs, are reweighted to the MSTW2008NLO [113], NNPDF [114] and the CT10 eigen-tunes parameterisation. The largest variation in acceptance for each category is used as a constant PDF uncertainty; it varies between approximately ±4.5% and ±6% for ggF production and between about ±0.8% and ±1.0% for VBF production. A shape uncertainty is also included to cover any difference between the BDT score in the default sample and the reweighted ones. The uncertainty on the total cross section for the VBF, VH and ggF production modes due to the PDFs is also considered.
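The reweighting step described above can be sketched as follows. This is a simplified illustration: the parton densities are hypothetical one-dimensional toy functions (real PDFs also depend on the parton flavour and the factorisation scale), and the event list is invented. Each event is reweighted by the ratio of alternative to nominal densities for both incoming partons, and the largest relative change in acceptance is taken as the uncertainty.

```python
def reweighted_acceptance(events, pdf_nominal, pdf_alt):
    """Acceptance after reweighting events from pdf_nominal to pdf_alt.

    events: iterable of (x1, x2, selected), where x1 and x2 are the parton
    momentum fractions and selected says whether the event passes the category.
    """
    total = 0.0
    passed = 0.0
    for x1, x2, selected in events:
        weight = (pdf_alt(x1) * pdf_alt(x2)) / (pdf_nominal(x1) * pdf_nominal(x2))
        total += weight
        if selected:
            passed += weight
    return passed / total


def pdf_acceptance_uncertainty(events, pdf_nominal, alternatives):
    """Largest relative change in acceptance over the alternative PDF sets."""
    nominal = reweighted_acceptance(events, pdf_nominal, pdf_nominal)
    return max(abs(reweighted_acceptance(events, pdf_nominal, alt) / nominal - 1.0)
               for alt in alternatives)


# Hypothetical toy densities and events, for illustration only.
def nominal_pdf(x):
    return x ** -1.20 * (1.0 - x) ** 5.0


def alt_pdf_a(x):
    return x ** -1.25 * (1.0 - x) ** 5.0


def alt_pdf_b(x):
    return x ** -1.15 * (1.0 - x) ** 5.5


toy_events = [(0.01, 0.02, True), (0.05, 0.01, False),
              (0.10, 0.03, True), (0.02, 0.02, False)]

uncertainty = pdf_acceptance_uncertainty(toy_events, nominal_pdf, [alt_pdf_a, alt_pdf_b])
print(f"PDF acceptance uncertainty: {100 * uncertainty:.2f}%")
```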
Variations in the acceptance for different Monte Carlo generators are also included, comparing Powheg+Herwig samples to MC@NLO+Herwig samples for ggF, and to aMC@NLO+Herwig [115] samples for VBF. The generator modelling uncertainty is around ±2% for ggF and ±4% for VBF production modes.
The theoretical systematic uncertainties on the background predictions taken from the simulation are evaluated by applying the same procedures as used for the signal samples. Uncertainties resulting from the choice of QCD scales, PDF parameterisation and underlying-event model are estimated. The results are reported in table 8.
Table 8. Impact of systematic uncertainties on the total signal, S, (sum of all production modes) and on the sum of all background estimates, B, for each of the three channels and the two signal categories for the analysis of the data taken at √ s = 8 TeV. Each systematic uncertainty is assumed to be correlated across the analysis channels, except those marked with a *. Uncertainties that affect the shape of the BDT-output distribution in a non-negligible way are marked with a †. All values are given before the global fit.


Signal extraction procedure
The BDT output in the six analysis categories provides the final discrimination between signal and background for both the 7 and 8 TeV datasets. A maximum-likelihood fit is performed on all categories simultaneously to extract the signal strength, µ, defined as the ratio of the measured signal yield to the Standard Model expectation. The value µ = 0 (µ = 1) corresponds to the absence (presence) of a Higgs boson signal with the SM production cross section. The statistical analysis of the data employs a binned likelihood function L(µ, θ), constructed as the product of Poisson probability terms, to estimate µ.
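As a minimal sketch of the binned likelihood just described, the example below builds the negative log-likelihood as a sum over Poisson terms and finds the best-fit µ. The yields are hypothetical, nuisance parameters are omitted, and the golden-section search is simply a convenient stand-in for the minimiser used in the analysis.

```python
import math


def nll(mu, observed, signal, background):
    """Negative log of a product of Poisson terms, one per histogram bin."""
    total = 0.0
    for n, s, b in zip(observed, signal, background):
        expected = mu * s + b
        total += expected - n * math.log(expected) + math.lgamma(n + 1.0)
    return total


def golden_argmin(f, lo, hi, tol=1e-7):
    """Location of the minimum of a unimodal function on [lo, hi]."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    while hi - lo > tol:
        c = hi - ratio * (hi - lo)
        d = lo + ratio * (hi - lo)
        if f(c) < f(d):
            hi = d
        else:
            lo = c
    return 0.5 * (lo + hi)


# Hypothetical yields per BDT bin (not taken from the paper).
signal = [1.0, 3.0, 6.0]
background = [100.0, 30.0, 8.0]
observed = [101.5, 34.5, 17.0]  # chosen as background + 1.5 * signal

mu_hat = golden_argmin(lambda mu: nll(mu, observed, signal, background), 0.0, 5.0)
print(f"best-fit mu = {mu_hat:.3f}")
```

Because the observed yields are constructed as exactly background plus 1.5 times the signal, the fit should return a value very close to 1.5, which is a quick sanity check of the likelihood.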
The impact of systematic uncertainties on the signal and background expectations is described by nuisance parameters, θ, which are each parameterised by a Gaussian or lognormal constraint. The expected numbers of signal and background events in each bin are functions of θ. The test statistic $q_\mu$ is then constructed according to the profile likelihood ratio: $q_\mu = -2 \ln[L(\mu, \hat{\hat{\theta}})/L(\hat{\mu}, \hat{\theta})]$, where $\hat{\mu}$ and $\hat{\theta}$ are the parameters that maximise the likelihood, and $\hat{\hat{\theta}}$ are the nuisance parameter values that maximise the likelihood for a given µ. This test statistic is used to measure the compatibility of the background-only hypothesis with the observed data.
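The profile likelihood ratio can be sketched with a toy model containing a single nuisance parameter. Everything numerical here is an assumption made for illustration: the yields are invented, the one nuisance parameter scales the background normalisation by a hypothetical 10% per unit of θ, and a Gaussian constraint term is used. The conditional fit (θ profiled at fixed µ) and the global fit are both done by nested golden-section searches.

```python
import math

SIGNAL = [1.0, 3.0, 6.0]         # hypothetical SM signal yield per bin
BACKGROUND = [100.0, 30.0, 8.0]  # hypothetical nominal background per bin
DELTA = 0.10                     # hypothetical 10% background normalisation uncertainty


def golden_min(f, lo, hi, tol=1e-7):
    """Minimum value of a unimodal function on [lo, hi]."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    while hi - lo > tol:
        c = hi - ratio * (hi - lo)
        d = lo + ratio * (hi - lo)
        if f(c) < f(d):
            hi = d
        else:
            lo = c
    return f(0.5 * (lo + hi))


def nll(mu, theta, observed):
    """Poisson terms plus a unit-Gaussian constraint on theta (constants dropped)."""
    total = 0.5 * theta ** 2
    for n, s, b in zip(observed, SIGNAL, BACKGROUND):
        expected = mu * s + b * (1.0 + DELTA * theta)
        total += expected - n * math.log(expected)
    return total


def profiled_nll(mu, observed):
    """NLL minimised over theta at fixed mu (the conditional fit)."""
    return golden_min(lambda theta: nll(mu, theta, observed), -5.0, 5.0)


def q_mu(mu, observed):
    """Profile likelihood ratio test statistic for the hypothesis mu."""
    global_min = golden_min(lambda m: profiled_nll(m, observed), 0.0, 5.0)
    return 2.0 * (profiled_nll(mu, observed) - global_min)


observed = [102, 34, 16]  # hypothetical observed counts
print(f"q(mu=0) = {q_mu(0.0, observed):.2f}, q(mu=1) = {q_mu(1.0, observed):.2f}")
```

The nested one-dimensional searches work here because the toy likelihood is convex in both parameters; a realistic fit with many nuisance parameters would need a general multidimensional minimiser.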
The likelihood is maximised on the BDT distributions in the signal regions, with information from control regions included to constrain background normalisations. The fit includes the event yields from the Z → ℓℓ and top control regions in the τ lep τ lep channel, and from the top control region of the τ lep τ had channel; furthermore the ∆η(τ had , τ had ) distribution in the rest control region of the τ had τ had channel is also included.
The Z → τ τ background is constrained primarily in the signal regions, due to the difference between the BDT distributions for Z → τ τ events and the signal. For the τ had τ had channel, the Z → τ τ and multijet background rates are constrained by the simultaneous fit of the two signal regions and the ∆η(τ had , τ had ) distribution in the rest category control region. The top and Z → ℓℓ background components for the τ lep τ lep and τ lep τ had channels are also allowed to float freely, but are primarily constrained by the inclusion of the respective control regions.
As described in section 7, a large number of systematic uncertainties, taken into account via nuisance parameters, affect the final results. It is important to investigate the behaviour of the global fit and in particular to investigate how far the nuisance parameters are pulled away from their nominal values and how well their uncertainties are constrained. Furthermore, it is important to understand which systematic uncertainties have the most impact on the final result. For this purpose a ranking of nuisance parameters is introduced. For each parameter, the fit is performed again with the parameter fixed to its fitted value shifted up or down by its fitted uncertainty, with all the other parameters allowed to vary. The ranking obtained for those nuisance parameters contributing most to the uncertainty on the signal strength is shown in figure 7 for the combined fit of the three channels at the two centre-of-mass energies. The parameters contributing most are those related to the jet energy scale, the normalisation uncertainties for Z → τ τ and top-quark events, and the tau energy scale. The uncertainties on the jet energy scale are decomposed into several uncorrelated components (among others: η intercalibration of different calorimeter regions, jet energy response, and response to jets of different flavour). In addition, theoretical uncertainties on the branching ratio BR (H → τ τ ) are found to have a significant impact. In general, good agreement is found between the pre-fit and post-fit values for these nuisance parameters and neither large pulls nor large constraints are observed.
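The ranking idea can be illustrated without repeating the fit, under a stated simplification: if the likelihood is approximately Gaussian near its maximum, fixing a nuisance parameter at its fitted value plus one fitted standard deviation and re-minimising over everything else shifts µ by cov(µ, θ)/σ_θ. The sketch below uses this shortcut in place of the explicit refits described in the text. The covariance matrix and parameter names are hypothetical.

```python
import math


def rank_impacts(cov, names):
    """Rank nuisance parameters by their impact on mu (Gaussian approximation).

    cov: post-fit covariance matrix as a list of rows; index 0 is mu.
    Returns (name, delta_mu) pairs sorted by decreasing |delta_mu|, where
    delta_mu is the shift in mu for a +1 sigma shift of that parameter.
    """
    impacts = []
    for i in range(1, len(names)):
        sigma_theta = math.sqrt(cov[i][i])
        impacts.append((names[i], cov[0][i] / sigma_theta))
    impacts.sort(key=lambda item: abs(item[1]), reverse=True)
    return impacts


# Hypothetical post-fit covariance matrix, for illustration only.
names = ["mu", "jet_energy_scale", "ztt_norm", "tau_energy_scale"]
cov = [
    [0.160, 0.060, -0.040, 0.020],
    [0.060, 0.810, 0.000, 0.000],
    [-0.040, 0.000, 0.250, 0.000],
    [0.020, 0.000, 0.000, 0.640],
]

ranking = rank_impacts(cov, names)
for name, delta_mu in ranking:
    print(f"{name:>18s}: delta mu = {delta_mu:+.3f}")
```

For a non-Gaussian likelihood the upward and downward shifts differ, which is why the procedure in the text repeats the fit separately for each direction.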
The distributions of the BDT discriminants for all channels and categories for the data at 8 TeV are shown in figure 8, with background normalisations, signal normalisation, and nuisance parameters adjusted by the profile likelihood global fit.
The results for the numbers of fitted signal and background events, split into the various contributions, are summarised in tables 9, 10 and 11 for the three channels separately, for the dataset collected at 8 TeV centre-of-mass energy. In addition to the total number of events, the expected number of events in each of the two highest BDT output bins is given. The number of events observed in the data is also included. Within the uncertainties, good agreement is observed between the data and the model predictions for the sum of background components and a Standard Model Higgs boson with m H = 125 GeV.

Results
As explained in the previous section, the observed signal strength is determined from a global maximum likelihood fit to the BDT output distributions in data, with nuisance parameters that are either free or constrained. The results are extracted for each channel and for each category individually as well as for combinations of categories and for the overall combination. At the value of the Higgs boson mass obtained from the combination of the ATLAS H → γγ and H → ZZ* measurements [116], m H = 125.36 GeV, the signal strength obtained from the combined H → τ τ analysis is $\mu = 1.43^{+0.43}_{-0.37}$.
The systematic uncertainties are split into two groups: systematic uncertainties (syst.) including all experimental effects as well as theoretical uncertainties on the signal region acceptance, such as those due to the QCD scales, the PDF choice, and the underlying event and parton shower; and, separately, theoretical uncertainties on the inclusive Higgs boson production cross section and H → τ τ branching ratio (theory syst.). The results for each individual channel and for each category as well as for their combination are shown in figure 9. They are based on the full dataset; however, separate combined results are given for the two centre-of-mass energies.
The probability p 0 of obtaining a result at least as signal-like as observed in the data if no signal were present is calculated using the test statistic $q_{\mu=0} = -2 \ln[L(0, \hat{\hat{\theta}})/L(\hat{\mu}, \hat{\theta})]$ in the asymptotic approximation [117]. For m H = 125.36 GeV, the observed p 0 value is $2.7 \times 10^{-6}$, which corresponds to a deviation from the background-only hypothesis of 4.5σ. This can be compared to an expected significance of 3.4σ. This provides evidence at the level of 4.5σ for the decay of the Higgs boson into tau leptons. Table 12 shows the expected and observed significances for the signal strength measured in each channel separately.
Figure 10 shows the expected and observed number of events, in bins of log 10 (S/B), for all signal region bins. Here, S/B is the signal-to-background ratio calculated assuming µ = 1.4 for each BDT bin in the signal regions. The expected signal yield for both µ = 1 and the best-fit value µ = 1.4 for m H = 125 GeV is shown on top of the background prediction from the best-fit values. The background expectation where the signal-strength parameter is fixed to µ = 0 is also shown for comparison.
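The regrouping of signal-region bins by log 10 (S/B) can be sketched as below. The input yields and the histogram edges are hypothetical; only the procedure (compute S/B per BDT bin with the signal scaled by µ, then sum the yields of bins that fall in the same log 10 (S/B) interval) follows the description in the text.

```python
import math
from bisect import bisect_right


def rebin_by_log_sb(region_bins, mu, edges):
    """Sum data, signal and background of BDT bins grouped by log10(S/B).

    region_bins: iterable of (sm_signal, background, observed) per BDT bin.
    edges: increasing bin edges in log10(S/B); bins outside the range are dropped.
    """
    n_out = len(edges) - 1
    summed = [{"data": 0.0, "signal": 0.0, "background": 0.0} for _ in range(n_out)]
    for sm_signal, background, n_observed in region_bins:
        signal = mu * sm_signal
        index = bisect_right(edges, math.log10(signal / background)) - 1
        if 0 <= index < n_out:
            summed[index]["data"] += n_observed
            summed[index]["signal"] += signal
            summed[index]["background"] += background
    return summed


# Hypothetical (SM signal, background, observed) yields for four BDT bins.
toy_bins = [(0.5, 500.0, 510), (2.0, 40.0, 45), (4.0, 5.0, 11), (1.0, 90.0, 88)]
edges = (-4.0, -3.0, -2.0, -1.0, 0.0, 1.0)

histogram = rebin_by_log_sb(toy_bins, mu=1.4, edges=edges)
for low, high, content in zip(edges[:-1], edges[1:], histogram):
    print(f"[{low:+.0f}, {high:+.0f}): {content}")
```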
Table 13. Important sources of uncertainty on the measured signal-strength parameter µ. The contributions are given as absolute uncertainties on the best-fit value of µ = 1.43. Various subcomponents are combined assuming no correlations.
As discussed in section 8, the dominant uncertainties on the measurement of the signal-strength parameter include statistical uncertainties on the data from the signal regions, uncertainties on the jet and tau energy scales, uncertainties on the normalisation of the Z → τ τ and $t\bar{t}$ background components as well as theoretical uncertainties. The contributions of each of these significant sources to the uncertainty of the measured signal strength are summarised in table 13.
The normalisation uncertainties on the Z → τ τ embedded sample are correlated across the categories in each respective channel. The global fit also constrains the normalisation for Z → τ τ more strongly than for the Z → ℓℓ and top-quark background components, as the low BDT-score region is dominated by Z → τ τ events.
Figure 9. The best-fit value for the signal strength µ in the individual channels and their combination for the full ATLAS datasets at √ s = 7 TeV and √ s = 8 TeV. The total ±1σ uncertainty is indicated by the shaded green band, with the individual contributions from the statistical uncertainty (top, black), the experimental systematic uncertainty (middle, blue), and the theory uncertainty (bottom, red) on the signal cross section (from QCD scale, PDF, and branching ratios) shown by the error bars and printed in the central column.
The measurement of the overall signal strength discussed above does not give direct information on the relative contributions of the different production mechanisms. Therefore, the signal strengths of different production processes contributing to the H → τ τ decay mode are determined, exploiting the sensitivity offered by the use of the event categories in the analyses of the three channels. The data are fitted separating the vector-boson-mediated VBF and VH processes from gluon-mediated ggF processes. Two signal strength parameters, µ τ τ ggF and µ τ τ VBF+VH , which scale the SM-predicted rates to those observed, are introduced. The two-dimensional 68% and 95% confidence level (CL) contours in the plane of µ τ τ ggF and µ τ τ VBF+VH [118] are shown in figure 12 for m H = 125.36 GeV. The observed (expected) significances of the µ τ τ ggF and µ τ τ VBF+VH signal strengths are 1.74σ (0.95σ) and 2.25σ (1.72σ) respectively. A total cross section times branching ratio for H → τ τ with m H = 125 GeV can also be measured. The central value is obtained from the product of the measured µ and the predicted cross section used to define it. The uncertainties are similarly obtained by scaling the uncertainties on µ by the predicted cross section, noting that theoretical uncertainties on the inclusive cross section cancel between µ and the predicted cross section and thus are not included for the production processes under consideration. The uncertainties excluded in this way are those on the inclusive cross section due to the QCD scale and the PDF choice, as well as the uncertainty on the H → τ τ branching ratio; however, theoretical uncertainties on the acceptance of the signal regions from the QCD scale and PDF choice are retained, along with the uncertainties due to underlying event and parton shower, and the electroweak correction on VBF production.
Table 14 gives the measured values for the total cross section at 7 and at 8 TeV, as well as the measured values at 8 TeV for gluon fusion production and for VBF and VH production separately.
Figure 12. Likelihood contours for the combination of all channels in the (µ τ τ ggF , µ τ τ VBF+VH ) plane. The signal strength µ is the ratio of the measured signal yield to the Standard Model expectation, for each production mode. The 68% and 95% CL contours are shown as dashed and solid lines respectively, for m H = 125.36 GeV. The SM expectation is shown by a filled plus symbol, and the best fit to the data is shown as a star.

Cut-based analysis
The search for the SM Higgs boson presented above is cross-checked for the dataset collected at √ s = 8 TeV in an analysis where cuts on kinematic variables are applied. This search uses improved definitions of event categories and an improved fit model with respect to results previously published for the √ s = 7 TeV dataset [21]. To allow a straightforward comparison of results, the multivariate and cut-based analyses have common components. The two analyses are performed for the same three channels, τ lep τ lep , τ lep τ had and τ had τ had ; they use the same preselection and share the same strategy for the estimation of background contributions and systematic uncertainties. As in the multivariate analysis, the irreducible Z → τ τ background is estimated using the embedding procedure and the reducible ones are estimated using similar data-driven methods, as described in section 6. Finally, the same statistical methods are used to extract the results, although these are applied to different discriminating variables. While the multivariate analysis performs a fit to the BDT output distribution, the cut-based analysis relies on a fit to the τ τ invariant mass distribution. The τ τ invariant mass is calculated using the missing mass calculator, as described in section 4.3. The analysis is not designed to be sensitive to a specific value of the Higgs boson mass m H . The use of the mass as the discriminating variable is motivated not only by its power to separate the irreducible Z → τ τ background from signal, but also by its sensitivity to the mass of the signal itself.
In the cut-based analysis, a categorisation is performed similar to that in the multivariate analysis, i.e. VBF and boosted categories are defined. To increase the separation power, subcategories are introduced for the τ lep τ had and τ had τ had channels. These subcategories target events produced via the same production mode, but select different phase-space regions with different signal-to-background ratios. With this strategy the most sensitive subcategories have a small number of events, but a high signal-to-background ratio. Although the combined sensitivity is dominated by the few highly sensitive subcategories, the others are important not just to increase the sensitivity but also to constrain the various background components.
An overview of the defined categories in the three channels is given in table 15. In all channels, the event categorisation is designed by splitting events first according to the production mode, either VBF-like or boosted ggF-like, and second, for the τ lep τ had and τ had τ had channels, by signal-to-background ratio. The events accepted in the VBF categories pass a common selection that requires the presence of the two forward jets distinctive of VBF production. In the τ lep τ had channel, tight and loose VBF subcategories are defined, via cuts on the mass of the dijet system, m jj , and p H T , the transverse momentum of the Higgs boson candidate. In the τ had τ had channel, the variables used to select the most sensitive categories for both production modes are p H T and the separation ∆R(τ 1 , τ 2 ) between the two τ had candidates. In the VBF-like events, correlations between the invariant mass of the selected jets m jj and ∆η jj of the jets characteristic of VBF production are also used. The subcategory with the highest purity is the VBF high-p H T subcategory, where tight cuts on p H T and ∆R(τ 1 , τ 2 ) reject almost all non-resonant background sources. The other two VBF-like subcategories are distinguished by a different signal-to-background ratio due to a tighter selection applied to the forward jets. For the τ had τ had channel, boosted subcategories are also defined. The division is based on the same cuts on p H T and ∆R(τ 1 , τ 2 ) as used in the VBF high-p H T category. Events with low transverse momentum are not used in any category because in such events the signal cannot be effectively distinguished from background processes. The proportion of the signal yield produced via VBF in the VBF-like subcategories is found to be 80% in the τ lep τ lep channel, between 67% and 85% in the τ lep τ had channel and between 58% and 78% in the τ had τ had channel.
The final results are derived from the combined fit of the m τ τ distributions observed in the various subcategories. The number of fitted signal and background events in each channel and category is given in table 16. The combined mass distribution for the three channels is shown in figure 13, where events are weighted by ln(1 + S/B), based on the signal and background content of their channel and category. An excess of events above the expected SM background is visible in the mass region around 125 GeV. The signal strengths extracted in the three analysis channels and their combination are given in table 17; the corresponding results of the multivariate analysis for the data at √ s = 8 TeV are also included in table 17. Good agreement between the results of the two analyses is found for the individual channels as well as for their combination. To further quantify the level of agreement, the correlation ρ and the uncertainties on the differences between the µ values obtained, i.e. ∆µ ± δ(∆µ), were evaluated using the so-called jackknife technique [119,120]. Using this method, the correlation between the µ values obtained in the two analyses is found to be between 0.55 and 0.75 for each of the three analysis channels. The results of the analyses are found to be fully compatible, with deviations ∆µ/δ(∆µ) below 1 for all analysis channels as well as for the combined result.
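The jackknife estimate of ρ and δ(∆µ) can be sketched as follows. This is a simplified delete-one version under stated assumptions: the two "analyses" are stand-in estimator functions applied to the same toy event list, whereas the actual study would repeat the two full fits on each subsample. The data values are invented.

```python
import math


def jackknife_correlation(events, estimator_a, estimator_b):
    """Delete-one jackknife for two estimators evaluated on the same events.

    Returns (rho, delta), where rho is the correlation between the estimators
    and delta is the uncertainty on their difference.
    """
    n = len(events)
    a_values = []
    b_values = []
    for i in range(n):
        subsample = events[:i] + events[i + 1:]
        a_values.append(estimator_a(subsample))
        b_values.append(estimator_b(subsample))
    a_mean = sum(a_values) / n
    b_mean = sum(b_values) / n
    scale = (n - 1) / n  # standard jackknife variance factor
    var_a = scale * sum((a - a_mean) * (a - a_mean) for a in a_values)
    var_b = scale * sum((b - b_mean) * (b - b_mean) for b in b_values)
    cov = scale * sum((a - a_mean) * (b - b_mean) for a, b in zip(a_values, b_values))
    rho = cov / math.sqrt(var_a * var_b)
    delta = math.sqrt(max(var_a + var_b - 2.0 * cov, 0.0))
    return rho, delta


def mean_x(events):
    """Stand-in for analysis A: uses the first observable of each event."""
    return sum(x for x, _ in events) / len(events)


def mean_y(events):
    """Stand-in for analysis B: uses the second observable of each event."""
    return sum(y for _, y in events) / len(events)


# Hypothetical events with two correlated observables.
toy_events = [(1.0, 1.2), (2.0, 1.9), (3.0, 3.4), (4.0, 3.8), (5.0, 5.1), (6.0, 5.7)]

rho, delta = jackknife_correlation(toy_events, mean_x, mean_y)
print(f"rho = {rho:.3f}, delta(difference) = {delta:.3f}")
```

The compatibility measure quoted in the text is then the difference between the two estimates divided by `delta`.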
The probability p 0 of obtaining a result at least as signal-like as observed if no signal were present is shown as a function of the mass in figure 14 for the cut-based analysis for the combined dataset at √ s = 8 TeV. The observed p 0 values show a shallow minimum around 125 GeV, corresponding to a significance of 3.2σ. The expected significance for the cut-based analysis is superimposed on the figure and reaches a significance of 2.5σ at m H = 125.36 GeV. The corresponding significance values for the multivariate analysis for the dataset at √ s = 8 TeV are found to be 4.5σ (observed) and 3.3σ (expected). They are also indicated in the figure.
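The correspondence between p 0 and a significance in standard deviations is the one-sided Gaussian tail relation, Z = Φ⁻¹(1 − p 0 ), and in the asymptotic approximation Z = √q 0 . A minimal sketch using only the Python standard library is given below; the function names are illustrative and not taken from the analysis software.

```python
import math
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()


def significance_from_p0(p0):
    """One-sided Gaussian significance Z for a given p-value."""
    return -STANDARD_NORMAL.inv_cdf(p0)


def p0_from_significance(z):
    """One-sided tail probability above z standard deviations."""
    return STANDARD_NORMAL.cdf(-z)


def significance_from_q0(q0):
    """Asymptotic approximation: Z = sqrt(q0) for q0 >= 0."""
    return math.sqrt(max(q0, 0.0))


z_observed = significance_from_p0(2.7e-6)  # p0 quoted for the multivariate analysis
print(f"Z = {z_observed:.2f} sigma")
print(f"p0 for 3.2 sigma = {p0_from_significance(3.2):.2e}")
```

Applied to the observed p 0 of 2.7 × 10⁻⁶ this gives a value that rounds to the quoted 4.5σ.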
Given the mass sensitivity of the cut-based analysis, a two-dimensional likelihood fit for the signal strength µ and the mass m H is performed. The mass points are tested in steps of 5 GeV in the range between 100 GeV and 150 GeV. The best fit value is found at µ = 1.[…]
Table 15. Summary of the selection criteria used to define the VBF and boosted subcategories in the cut-based analysis for the three analysis channels. The labels (1) and (2) […]
VBF selection, τ lep τ lep channel:
    At least two jets with p j1 T > 40 GeV and p j2 T > 30 GeV
    |∆η j1,j2 | > 3.0
    m j1,j2 > 400 GeV
    b-jet veto for jets with p T > 25 GeV
    Jet veto: no additional jet with p T > 25 GeV within |η| < 2.4
VBF selection, τ lep τ had channel:
    At least two jets with p j1 T > 40 GeV and p j2 T > 30 GeV
    E miss T > 20 GeV
    |∆η j1,j2 | > 3.0 and η(j 1 ) · η(j 2 ) < 0, m j1,j2 > 300 GeV
    p Total T = | p ℓ T + p τ had T + p j1 T + p j2 T + E miss T | < 30 GeV
    b-jet veto for jets with p T > 30 GeV
    min(η (j1) , η (j2) ) < η (ℓ) , η (τ had ) < max(η (j1) , η (j2) )
    VBF tight: m j1,j2 > 500 GeV, p H T > 100 GeV, p τ had T > 30 GeV, m vis > 40 GeV
    VBF loose: non-tight VBF
VBF selection, τ had τ had channel:
    At least two jets with p j1 T > 50 GeV and p j2 T > 30 GeV
    |∆η(τ 1 , τ 2 )| < 1.5
    |∆η j1,j2 | > 2.6 and m j1,j2 > 250 GeV
    min(η (j1) , η (j2) ) < η (τ1) , η (τ2) < max(η (j1) , η (j2) )
    VBF high p H T : ∆R(τ 1 , τ 2 ) < 1.5 and […]
    VBF low p H T , tight: ∆R(τ 1 , τ 2 ) > 1.5 or […]
    VBF low p H T , loose: ∆R(τ 1 , τ 2 ) > 1.5 or […]
Table 17. Fitted values of the signal strength for the different channels at √ s = 8 TeV for the multivariate and cut-based analyses, measured at m H = 125.36 GeV. The results for the combinations of all channels are also given. The total uncertainties (statistical and systematic) are quoted.
Figure 15. The results of the two-dimensional likelihood fit in the (m H , µ) plane for the cut-based analysis for the data taken at √ s = 8 TeV. The signal strength µ is the ratio of the measured signal yield to the Standard Model expectation. The 68% and 95% CL contours are shown as dashed and solid red lines respectively. The best-fit value is indicated as a red cross. The dashed and solid blue lines correspond to the expected 68% and 95% CL contours for m H = 125.36 GeV and µ = 1.43.
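To make the cut-based categorisation concrete, the sketch below applies the τ lep τ had VBF criteria listed in table 15 to a single event. The `Event` structure and its field names are hypothetical, and the assignment of the four tight-subcategory cuts follows one reading of the flattened table, so it should be treated as an assumption rather than the definitive selection.

```python
from dataclasses import dataclass, replace


@dataclass
class Event:
    """Hypothetical event record; momenta and masses in GeV."""
    jet_pt: list          # jet transverse momenta, leading jet first
    jet_eta: list         # jet pseudorapidities, same order as jet_pt
    n_bjets_pt30: int     # number of b-tagged jets with pT > 30 GeV
    met: float            # missing transverse energy
    m_jj: float           # invariant mass of the two leading jets
    pt_total: float       # |vector sum of lepton, tau, two jets and MET|
    lep_eta: float
    tau_eta: float
    tau_pt: float
    higgs_pt: float       # pT of the Higgs boson candidate
    m_vis: float          # visible mass of the lepton-tau system


def vbf_lephad_category(ev):
    """Return 'VBF tight', 'VBF loose', or None for the tau_lep tau_had channel."""
    if len(ev.jet_pt) < 2 or ev.jet_pt[0] <= 40.0 or ev.jet_pt[1] <= 30.0:
        return None
    eta1, eta2 = ev.jet_eta[0], ev.jet_eta[1]
    if ev.met <= 20.0:
        return None
    if abs(eta1 - eta2) <= 3.0 or eta1 * eta2 >= 0.0 or ev.m_jj <= 300.0:
        return None
    if ev.pt_total >= 30.0 or ev.n_bjets_pt30 > 0:
        return None
    low, high = min(eta1, eta2), max(eta1, eta2)
    if not (low < ev.lep_eta < high and low < ev.tau_eta < high):
        return None
    is_tight = (ev.m_jj > 500.0 and ev.higgs_pt > 100.0
                and ev.tau_pt > 30.0 and ev.m_vis > 40.0)
    return "VBF tight" if is_tight else "VBF loose"


# Hypothetical example event.
example = Event(jet_pt=[80.0, 45.0], jet_eta=[2.1, -1.8], n_bjets_pt30=0,
                met=35.0, m_jj=650.0, pt_total=12.0, lep_eta=0.3,
                tau_eta=-0.5, tau_pt=42.0, higgs_pt=130.0, m_vis=75.0)
print(vbf_lephad_category(example))
```

Lowering the dijet mass of the example event below 500 GeV while keeping it above 300 GeV moves it from the tight to the loose subcategory.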

Conclusions
Evidence for decays of the recently discovered Higgs boson into pairs of tau leptons is presented. The analysis is based on the full set of proton-proton collision data recorded by the ATLAS experiment at the LHC during Run 1. The data correspond to integrated luminosities of 4.5 fb −1 and 20.3 fb −1 at centre-of-mass energies of √ s = 7 TeV and √ s = 8 TeV respectively. All combinations of leptonic and hadronic tau decay channels are included and event categories selecting both the vector boson fusion and highly boosted τ τ signatures are considered in a multivariate analysis. An excess of events over the expected background from other Standard Model processes is found with an observed (expected) significance