Measurements of Higgs boson production in the decay channel with a pair of τ leptons in proton-proton collisions at √s = 13 TeV

Measurements of Higgs boson production, where the Higgs boson decays into a pair of τ leptons, are presented, using a sample of proton-proton collisions collected with the CMS experiment at a center-of-mass energy of 13 TeV, corresponding to an integrated luminosity of 138 fb−1. Three analyses are presented. Two target Higgs boson production via gluon fusion and vector boson fusion: a neural network based analysis and an analysis based on an event categorization optimized on the ratio of signal over background events. These are complemented by an analysis targeting vector boson associated Higgs boson production. Results are presented in the form of signal strengths relative to the standard model predictions and products of cross sections and branching fraction to τ leptons, in up to 16 different kinematic regions. For the simultaneous measurements of the neural network based analysis and the analysis targeting vector boson associated Higgs boson production, the signal strengths are found to be 0.82 ± 0.11 for inclusive Higgs boson production, 0.67 ± 0.19 (0.81 ± 0.17) for production mainly via gluon fusion (vector boson fusion), and 1.79 ± 0.45 for vector boson associated Higgs boson production.


Introduction
In the standard model (SM) of particle physics [1][2][3], the masses of the W and Z bosons are obtained through their interaction with a fundamental field that enters the theory via the Brout-Englert-Higgs mechanism [4][5][6][7][8][9], in a process known as electroweak symmetry breaking. The Higgs boson (H) is the quantized manifestation of this field. A particle compatible with H was observed at the CERN LHC by the ATLAS and CMS experiments in the γγ, ZZ, and WW final states using data collected in 2011-2012 at center-of-mass energies of √s = 7 and 8 TeV [10][11][12]. The properties of the new particle, including its couplings, spin, and CP eigenstate, are so far consistent with those expected for a Higgs boson with a mass of 125.35 ± 0.14 GeV [13], as predicted by the SM [14][15][16][17][18][19][20][21][22][23][24][25].
In the SM, the mass generation of fermions is introduced in the form of Yukawa couplings to the Brout-Englert-Higgs field. Extensions of the SM, like supersymmetry [26,27], predict deviations of the H couplings, particularly to down-type fermions, such as the τ lepton or b quark, which further increases interest in the H → ττ decay [28,29].
The H → ττ decay, which in the SM and its extensions offers a much larger branching fraction than H → µµ and reduced background compared to H → bb, is the most promising channel to study H decays to fermions. Accordingly, the first evidence for the H coupling to (down-type) fermions was found in the H → ττ decay channel using data collected at √ s = 7 and 8 TeV [30][31][32]. A combination of the measurements performed by the ATLAS and CMS experiments at the same √ s led to the first measurement of the H coupling to the τ lepton with a statistical significance of more than 5 standard deviations (s.d.) [15]. The first observation of H → ττ decays with a single experiment was achieved by the CMS experiment adding data collected at 13 TeV in 2016 to the data collected at 7 and 8 TeV [33].
In recent years, H production rates have been investigated in the framework of the simplified template cross section (STXS) scheme, introduced by the LHC Higgs Working Group [34]. This scheme defines a set of kinematic and topological phase space regions, referred to as STXS bins, for differential measurements. The STXS scheme supports the investigation of each production mode individually and facilitates the combination of measurements across different H decay channels and across experiments. The STXS bins have been chosen to reduce the dependence of the measurements on any underlying theoretical models. They have been defined in stages, corresponding to the anticipated statistical power of the data required to perform the measurements. In STXS stage-0, which was used in the analyses of the LHC Run-1 data, events were assigned to basic categories according to their main H production mechanisms, such as gluon fusion, vector boson fusion (VBF), production in association with a vector boson (VH), and production in association with top quarks. Subsequent refinements of the STXS scheme resulted in a division of the basic categories into finer bins, called stage-1.1 [35], and a redefinition of some of the stage-0 categories: gluon fusion and gluon-initiated gg → Z(qq)H production with hadronic Z boson decays, which can be viewed as part of inclusive H production via gluon fusion at higher order in perturbation theory, are defined as a combined category, referred to as ggH. The VBF and quark-initiated qq → V(qq)H production with hadronic V decays are defined as a combined category, referred to as qqH. The VH category in turn refers to leptonic V decays with at least one charged lepton. One further update resulted in the STXS stage-1.2 scheme, which forms the basis for this paper. With respect to STXS stage-1.1, this update introduces a finer split of the ggH bins with H transverse momentum (p_T^H) larger than 200 GeV, which is the only difference of relevance for this paper.
The first STXS measurements combining the H → γγ, ZZ, WW, bb, ττ, and µµ decay modes have been performed by the ATLAS [17] and CMS [16] Collaborations. The first STXS measurements in the H → ττ decay channel alone have been performed by the ATLAS Collaboration [36,37]. In this paper, we report three STXS analyses in the H → ττ decay channel performed by the CMS Collaboration. All analyses are based on the LHC data sets of the years 2016-2018, corresponding to an integrated luminosity of 138 fb−1. The events are analyzed in the eµ, eτh, µτh, and τhτh final states, defined by the number of electrons, muons, and τh candidates in the event, where τh refers to a hadronic τ lepton decay. Results are reported in the form of signal strengths relative to the SM predictions and products of cross sections and branching fraction for the decay into τ leptons. They are provided in the redefined STXS stage-0 and -1.2 schemes. Some of the STXS stage-1.2 bins, to which the analyses are not yet sensitive, have been combined into supersets.
Two analyses target the simultaneous measurement of the ggH and qqH processes. One, referred to as the "cut-based" (CB) analysis, exploits an event categorization optimized on the ratio of signal over background events, together with one- (1D) and two-dimensional (2D) discriminants related to the properties of the ττ and jet final state to distinguish between signal and backgrounds. The other, referred to as the neural network (NN) analysis, exploits NN event multiclassification, with the outputs of the NNs as the only discriminating observables. The NN-analysis is found to be more sensitive, with a 30% stronger constraint on the signal strength for the STXS stage-0 qqH bin and 30% (40%) stronger constraints, on average, on the STXS stage-1.2 ggH (qqH) bins, relative to the CB-analysis. In addition, a third analysis, referred to as the VH-analysis, targets VH production. The inclusive, STXS stage-0, and -1.2 signal strengths and products of the cross sections and branching fraction for the decay into τ leptons obtained from the combination of the NN- and VH-analyses are considered the main results of the paper. A description of the CB-analysis is given as it facilitates comparisons of the selections documented in the paper with alternative simulations or theory models. It also serves as a reference for a previous publication of differential cross section measurements in the H → ττ decay channel [38], with which it shares the same object selections, signal extraction strategy, and methods for the estimation of data-driven backgrounds, and as a detailed verification of the statistical methods exploited by the NN-analysis.
The remainder of this paper is organized as follows. In Sections 2 and 3 the CMS detector and the reconstruction of the objects and event variables used in the analyses are introduced. The event selections commonly used in all three analyses are described in Section 4. The modeling of signals and backgrounds is described in Section 5. The STXS scheme used to classify events is introduced in Section 6. In Sections 7, 8, and 9, the selection steps and event classifications specific to the individual analyses are described in more detail. Systematic uncertainties are discussed in Section 10. The results of the analyses are discussed in Section 11. The paper is summarized in Section 12.

The CMS detector
The central feature of the CMS apparatus is a superconducting solenoid of 6 m internal diameter, providing a magnetic field of 3.8 T. Within the solenoid volume are a silicon pixel and strip tracker, a lead tungstate crystal electromagnetic calorimeter (ECAL), and a brass and scintillator hadron calorimeter (HCAL), each composed of a barrel and two endcap sections. Forward calorimeters extend the pseudorapidity (η) coverage provided by the barrel and endcap detectors. Muons are detected in gas-ionization chambers embedded in the steel flux-return yoke outside the solenoid.
Events of interest are selected using a two-tiered trigger system. The first level (L1), composed of custom hardware processors, uses information from the calorimeters and muon detectors to select events at a rate of around 100 kHz within a fixed latency of about 4 µs [39]. The second level, known as the high-level trigger (HLT), consists of a farm of processors running a version of the full event reconstruction software optimized for fast processing, and reduces the event rate to around 1 kHz before data storage [40].
A more detailed description of the CMS detector, together with a definition of the coordinate system used and the relevant kinematic variables, can be found in Ref. [41].

Event reconstruction
The reconstruction of the proton-proton (pp) collision products is based on the particle-flow (PF) algorithm [42], which combines the information from all CMS subdetectors to reconstruct a set of particle candidates (PF candidates), identified as charged and neutral hadrons, electrons, photons, and muons. In the 2016 (2017-2018) data sets the average number of interactions per bunch crossing was 23 (32). The fully recorded detector data of a bunch crossing defines an event for further processing. The primary vertex (PV) is taken to be the vertex corresponding to the hardest scattering in the event, evaluated using tracking information alone, as described in Ref. [43]. Secondary vertices, which are detached from the PV, might be associated with decays of long-lived particles emerging from the PV. Any other collision vertices in the event are associated with additional mostly soft inelastic pp collisions called pileup (PU). Electron candidates are reconstructed by combining clusters of energy deposits in the ECAL with hits in the tracker [44,45]. To increase their purity, reconstructed electrons are required to pass a multivariate electron identification discriminant, which combines information on track quality, shower shape, and kinematic quantities. For the analyses presented here, a working point with an identification efficiency of 90% is used, with a misidentification rate of ≈1% from jets, in the kinematic region of interest. Muons in the event are reconstructed by performing a simultaneous track fit to hits in the tracker and in the muon chambers [46,47]. The presence of hits in the muon chambers already leads to a strong suppression of particles misidentified as muons. Additional identification requirements on the track fit quality and the compatibility of individual track segments with the fitted track further reduce the misidentification rate. 
For the analyses presented here, muon identification requirements with an efficiency of ≈99% are chosen, with a misidentification rate below 0.2% for pions.
The contributions from backgrounds to the electron (muon) selection are further reduced by requiring the corresponding lepton to be isolated from any hadronic activity in the detector. This property is quantified by the isolation variable

I_rel^e(µ) = [ ∑p_T^charged + max( 0, ∑E_T^neutral + ∑E_T^γ − p_T^PU ) ] / p_T^e(µ),

where p_T^e(µ) corresponds to the electron (muon) transverse momentum p_T, and ∑p_T^charged, ∑E_T^neutral, and ∑E_T^γ to the p_T (transverse energy E_T) sums of all charged particles, neutral hadrons, and photons, in a predefined cone of radius ∆R = √((∆η)² + (∆φ)²) around the lepton direction at the PV, where ∆η and ∆φ (measured in radians) correspond to the angular distances of the particle to the lepton in the η and azimuthal φ directions. The chosen cone size is ∆R = 0.3 (0.4) for electrons (muons). The lepton itself is excluded from the calculation. To mitigate the contamination from PU, only those charged particles whose tracks are associated with the PV are taken into account. Since for neutral hadrons and photons an unambiguous association with the PV or PU is not possible, an estimate of the contribution from PU (p_T^PU) is subtracted from the sum of ∑E_T^neutral and ∑E_T^γ. This estimate is obtained from tracks not associated with the PV in the case of I_rel^µ and from the mean energy flow per unit area in the case of I_rel^e. If this difference is negative, it is set to zero. The isolation criteria given in Section 4 have an efficiency for isolated electrons and muons from τ decays well above 95%.
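The PU-corrected isolation sum described above can be sketched as follows. This is a minimal illustration with hypothetical per-event inputs; in the actual reconstruction the sums run over PF candidates inside the ∆R cone.

```python
def relative_isolation(pt_lepton, charged_pts, neutral_ets, photon_ets, pu_estimate):
    """Relative isolation: PV-associated charged particles, plus PU-corrected
    neutral-hadron and photon sums, divided by the lepton pT.
    All inputs are hypothetical per-event values in GeV."""
    neutral_sum = sum(neutral_ets) + sum(photon_ets) - pu_estimate
    # a negative PU-corrected neutral sum is set to zero
    iso = sum(charged_pts) + max(0.0, neutral_sum)
    return iso / pt_lepton

# e.g. a muon with pT = 40 GeV and little surrounding activity
print(relative_isolation(40.0, [1.0, 0.5], [0.8], [0.4], 2.0))  # 0.0375, passes I_rel < 0.15
```

With these illustrative numbers the PU estimate exceeds the neutral activity, so only the charged sum contributes.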
For further characterization of the event, all reconstructed PF candidates are clustered into jets using the anti-kT jet clustering algorithm as implemented in the FASTJET software package [48,49] with a distance parameter of 0.4. Jets resulting from the hadronization of b quarks are used to separate signal from top quark-antiquark pair (tt) events. These are identified exploiting either the DEEPCSV (CB- and VH-analyses) or the DEEPJET (NN-analysis) algorithm, as described in Refs. [50,51]. The working points chosen for the DEEPCSV algorithm correspond to b jet identification efficiencies of 70 and 80%, for misidentification rates for jets originating from light quarks and gluons of 1 and 11%, respectively. For the DEEPJET algorithm, a working point is chosen with an identification efficiency for b jets of 80% for a misidentification rate for jets originating from light quarks and gluons of 1% [52]. Jets with p_T > 30 GeV and |η| < 4.7, and b jets with p_T > 20 GeV and |η| < 2.4 (2.5 for 2017 and 2018), are used. In 2017, the ECAL endcaps were subject to an increased noise level affecting the reconstruction of jets. Jets with p_T < 50 GeV in the corresponding range 2.65 < |η| < 3.14 have been excluded from the analysis, resulting in an expected efficiency loss of ≈4% for H → ττ events produced via VBF.
Jets are also used as seeds for the reconstruction of τh candidates. This is done by exploiting the substructure of the jets, using the "hadrons-plus-strips" algorithm [53,54]. Decays into one or three charged hadrons with up to two neutral pions with p_T > 2.5 GeV are used. Neutral pions are reconstructed as strips with dynamic size from reconstructed electrons and photons contained in the seeding jet, where the strip size varies as a function of the p_T of the electron or photon candidates. The τh decay mode is then obtained by combining the charged hadrons with the strips. To distinguish τh candidates from jets originating from the hadronization of quarks or gluons, and from electrons or muons, the DEEPTAU (DT) algorithm [54] is used. This algorithm exploits the information of the reconstructed event record, comprising tracking, impact parameter, and ECAL and HCAL cluster information; the kinematic and object identification properties of the PF candidates in the vicinity of the τh candidate and the τh candidate itself; and several global characterizing quantities of the event. It results in a multiclassification output y_α^DT (α = τ, e, µ, jet), equivalent to the Bayesian probability that the τh candidate originated from a τ lepton, an electron, a muon, or the hadronization of a quark or gluon. From this output three discriminants are built according to

D_α = y_τ^DT / ( y_τ^DT + y_α^DT ),   α = e, µ, jet.

For the analyses presented here, predefined working points of D_e, D_µ, and D_jet are chosen [54]. The exact choice of working point depends on the analysis and ττ final state, and is given in Table 1. For D_e, the efficiencies vary from 54 (Tight) to 71% (VVVLoose). For D_jet, the efficiencies vary between the VTight and Medium working points, reaching 49% at Medium, for misidentification rates from 0.14-0.43%. The misidentification rate of D_jet strongly depends on the p_T and quark flavor of the misidentified jet. The estimated value of this rate should therefore be viewed as approximate.

Table 1: Selected working points for each of the analyses presented in this paper. For each analysis, the working point of D_jet is the same for all final states. The working points for D_e and D_µ vary by final state. The labels WH and ZH stand for the individual production modes addressed by the VH-analysis. More details on the efficiency and misidentification rates of the individual working points are given in the text and in Ref. [54].
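The discriminant construction from the multiclass outputs can be illustrated with a short sketch; the scores below are hypothetical, standing in for the outputs of the trained network.

```python
def deeptau_discriminants(y):
    """Build D_alpha = y_tau / (y_tau + y_alpha) for alpha in {e, mu, jet}
    from multiclass outputs y = {'tau': ..., 'e': ..., 'mu': ..., 'jet': ...}.
    Illustrative only; the scores here are hypothetical."""
    return {alpha: y['tau'] / (y['tau'] + y[alpha]) for alpha in ('e', 'mu', 'jet')}

scores = {'tau': 0.90, 'e': 0.02, 'mu': 0.01, 'jet': 0.07}
d = deeptau_discriminants(scores)
# values close to 1 favor the genuine-tau hypothesis over each alternative
print(d['jet'])  # ≈ 0.928
```

By construction each D_α lies in (0, 1), so a working point is simply a threshold on this ratio.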
The missing transverse momentum vector p_T^miss is also used in the event characterization. For the CB- and VH-analyses, it is calculated from the negative vector p_T sum of all PF candidates [55]. For the NN-analysis, the pileup-per-particle identification algorithm [56] is applied to reduce the dependence of p_T^miss on PU. In this case, p_T^miss is computed from the PF candidates weighted by their probability to originate from the PV [57]. The p_T^miss is used to discriminate W boson production in association with jets (W+jets) from signal by exploiting the transverse mass

m_T(ℓ) = √( 2 p_T^ℓ p_T^miss (1 − cos ∆φ) ),

where ℓ stands for an electron or a muon, but might also refer to the vector sum of p_T^e and p_T^µ, and ∆φ stands for the angular difference between p_T^ℓ and p_T^miss. The p_T^miss is also used for a likelihood-based estimate of the invariant mass m_ττ of the ττ system before the decays of the τ leptons [58]. This estimate combines the measurement of p_T^miss and its covariance matrix with the measurements of the visible ττ decay products, utilizing the matrix elements for unpolarized τ decays [59] for the decay into leptons and the two-body phase space [60] for the decay into hadrons. On average, the resolution of this estimate amounts to 10% in the τhτh, 15% in the eτh and µτh, and 20% in the eµ final states, related to the number of neutrinos that escape detection.
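The transverse mass defined above can be computed directly from the lepton and missing transverse momenta; a minimal sketch (inputs in GeV, angle in radians):

```python
import math

def transverse_mass(pt_lep, pt_miss, dphi):
    """m_T = sqrt(2 * pT^l * pT^miss * (1 - cos(dphi))).
    Used to separate W+jets from signal, e.g. via m_T < 70 GeV
    in the l-tau_h final states described later in the text."""
    return math.sqrt(2.0 * pt_lep * pt_miss * (1.0 - math.cos(dphi)))

# a back-to-back lepton and missing momentum (dphi = pi) give the maximum m_T
print(transverse_mass(40.0, 40.0, math.pi))  # 80.0
```

For W → ℓν decays the m_T distribution peaks near the W boson mass, while for genuine ττ events the neutrinos tend to be aligned with the visible lepton, yielding small m_T.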

Event selection
Common to all analyses presented in this paper is the selection of a τ pair. Depending on the final state, the online selection in the HLT step is based on the presence of an eµ pair, a single electron or muon, an eτh or µτh pair, or a τhτh pair in the event [61][62][63]. In the eτh and µτh final states, the presence of a lepton pair in the HLT step allows lower p_T thresholds on the light lepton candidate. The efficiency of the online selection is generally above 90%, without strong kinematic dependencies on the leptons selected for the offline analysis, as checked with independent monitoring triggers. In the offline selection, further requirements on p_T, η, and I_rel are applied, in addition to the object identification requirements described in Section 3, and are summarized in Table 2.

Table 2: Offline selection requirements applied to the electron, muon, and τh candidates used for the selection of the τ pair. The expressions first and second lepton refer to the label of the final state in the first column. The p_T requirements are given in GeV. For the eµ final state two lepton pair trigger paths, with a stronger requirement on the p_T of the electron (muon), are used for the online selection of the event. For the eτh and µτh final states, the values (in parentheses) correspond to the lepton pair (single lepton) trigger paths that have been used in the online selection. A detailed discussion is given in the text.
In the eµ final state, an electron and a muon with p T > 15 GeV and |η| < 2.4 are required.
Depending on the trigger path that has led to the online selection of an event, a stricter requirement of p_T > 24 GeV is imposed on one of the two leptons to ensure a sufficiently high efficiency of the HLT selection. Both leptons are required to be isolated from any hadronic activity in the detector according to I_rel^e(µ) < 0.15 (0.20). In the eτh (µτh) final state, an electron (muon) with p_T > 25 (20) GeV is required, if the event was selected by a trigger based on the presence of the eτh (µτh) pair in the event. From 2017 onward, the threshold on the muon p_T is raised to 21 GeV. If the event was selected only by a single-electron trigger, the p_T requirement on the electron is increased to 26, 28, or 33 GeV for the years 2016, 2017, or 2018, respectively. For muons, the p_T requirement is increased to 23 (25) GeV for 2016 (2017-2018), if selected only by a single-muon trigger. The electron (muon) is required to be contained in the central detector with |η| < 2.1, and to be isolated according to I_rel^e(µ) < 0.15. The τh candidate is required to have |η| < 2.3 and p_T > 35 (32) GeV if selected by an eτh (µτh) pair trigger, or p_T > 30 GeV if selected by a single-electron (single-muon) trigger. In the τhτh final state, both τh candidates are required to have |η| < 2.1 and p_T > 40 GeV. The working points of the DT discriminants described in Section 3 are chosen depending on the final state and are given in Table 1.
The selected τ decay candidates are required to have opposite charges and to be separated by more than ∆R = 0.3 in the η-φ plane in the eµ final state, and by more than 0.5 otherwise. This applies also to all selected τ decay candidates in the VH-analysis, where the final states comprise a combination of one or more light leptons (e or µ) and one or two τh candidates. The closest distance of the tracks to the PV is required to be d_z < 0.2 cm along the beam axis. For electrons and muons, an additional requirement of d_xy < 0.045 cm in the transverse plane is applied. In rare cases in which more than the expected number of τh candidates fulfilling all selection requirements is found in an event, in the CB- and NN-analyses, the candidates with the highest D_jet scores are chosen until the expected number of τh candidates is met. In subsequent steps, the analyses differ slightly in their selection requirements, as a consequence of the different analysis strategies employed in the CB- and NN-analyses, and of the different measurement target of the VH-analysis. For the CB- and NN-analyses, events with additional leptons fulfilling looser selection criteria are excluded to avoid the assignment of single events to more than one ττ final state. For the NN-analysis, which is more inclusive than the CB-analysis before event classification, a requirement of m_T < 70 GeV is imposed in the eτh (µτh) final state, to keep an orthogonal control region for the estimate of the background from events with quark- or gluon-induced jets that are misidentified as τh candidates (jet → τh), as discussed in Section 5.2. In the eµ final state, events with at least one b jet are excluded from the selection in order to suppress the background from tt production. Finally, to prevent kinematic overlap with the analysis of H → WW events, m_T^eµ, calculated from p_T^e + p_T^µ and p_T^miss, is required to be less than 60 GeV in the eµ final state.
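The core pair requirements for the τhτh final state (p_T and η thresholds, opposite charge, ∆R separation, track distance to the PV) can be sketched as a single predicate. The dictionary-based candidates are a hypothetical stand-in for reconstructed objects.

```python
import math

def passes_tautau_offline(tau1, tau2):
    """Illustrative offline tau_h tau_h pair selection (Section 4 values):
    pT > 40 GeV, |eta| < 2.1, |dz| < 0.2 cm, opposite charge, dR > 0.5.
    Candidates are hypothetical dicts with pt/eta/phi (GeV, rad), charge, dz (cm)."""
    kinematics = all(t['pt'] > 40.0 and abs(t['eta']) < 2.1 and abs(t['dz']) < 0.2
                     for t in (tau1, tau2))
    opposite_charge = tau1['charge'] * tau2['charge'] < 0
    deta = tau1['eta'] - tau2['eta']
    dphi = math.pi - abs(abs(tau1['phi'] - tau2['phi']) - math.pi)  # wrap to [0, pi]
    separated = math.hypot(deta, dphi) > 0.5
    return kinematics and opposite_charge and separated

t1 = {'pt': 45.0, 'eta': 0.5, 'phi': 0.0, 'charge': +1, 'dz': 0.01}
t2 = {'pt': 50.0, 'eta': -0.5, 'phi': 2.0, 'charge': -1, 'dz': 0.02}
print(passes_tautau_offline(t1, t2))  # True
```

The ∆φ wrapping keeps the azimuthal distance in [0, π] so that ∆R is well defined across the ±π boundary.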
Further selection details of the CB-and VH-analyses are discussed in Sections 8, 9.1, and 9.2.

Background and signal modeling
The main backgrounds in the CB- and NN-analyses originate from Z boson production in association with jets in the ττ decay channel (Z → ττ), W+jets, tt production, and SM events where light quark- or gluon-induced jets are produced through the strong interaction, referred to as quantum chromodynamics (QCD) multijet production. Minor backgrounds originate from the production of two V bosons (diboson), single t quark production, and Z boson production in the ee and µµ final states (also denoted as Z → ℓℓ). We distinguish three ways in which these processes may contribute to the selected event samples: (1) they contain two genuine τ leptons in their final states; (2) at least one quark- or gluon-induced jet is misidentified as a τh (jet → τh), an electron, or a muon (jet → ℓ); (3) an isolated high-p_T electron or muon is misidentified as originating from a τ lepton decay or mistakenly identified as a τh candidate.
Event groups 1 and 2 are estimated from data, as will be discussed in the following sections. Event group 3 and the signal are estimated from simulation. Group 1, which still relies on the simulation of the τ decays, group 3, and the signal are subject to simulation-to-data corrections, which have been determined from dedicated control regions, as will be discussed in Section 10.1.
For the VH-analysis, the backgrounds generally involve the production of an additional V boson. Details of the background estimation are given in Section 9.4.

Backgrounds with genuine τ lepton pairs
For events in the CB-and NN-analyses in which, e.g., the decay of a Z boson results in two genuine τ leptons, the τ-embedding method is used, as described in Ref. [64]. For this purpose µµ events are selected in data. All energy deposits of the muons are removed from the event record and replaced by simulated τ lepton decays with the same kinematic properties as the selected muons. In this way, the method relies only on the simulation of the τ lepton decay and its energy deposits in the detector, while all other parts of the event, such as the reconstructed jets, their identification as originating from the PV, the identification of b jets, or the non-τ related parts of p miss T , are obtained from data. This results in an improved modeling of the data compared to the simulation of the full process and removes the need for several of the simulation-to-data corrections detailed in Section 10.1 for these events. A detailed discussion of the selection of the original µµ events, the exact procedure itself, its range of validity, and related uncertainties can be found in Ref. [64].
Although the selected muons predominantly originate from Z boson decays, there are also contributions from other processes that result in two genuine τ leptons. For example, tt and diboson events where both W bosons decay into a muon and neutrino are included in the original selection of µµ events, and replacing the selected muons by simulated τ lepton decays naturally leads to an estimate of these processes, as well. For the selection described in Section 4, 97% of the µµ events selected for the τ-embedding method originate from Z boson decays, ≈1% from tt production, and the rest from other processes.

Backgrounds with jets misidentified as hadronic τ lepton decays
The main processes contributing to jet → τh events in the eτh, µτh, and τhτh final states are QCD multijet, W+jets, and tt production. These events are estimated using the "fake factor" (F_F) method described in Refs. [29,65], adapted to each analysis described in this paper. For this purpose, the signal region (SR) as defined by the event selection given in Section 4 is complemented by the disjoint application region (AR) and determination regions (DR_i, where i stands for QCD, W+jets, or tt). All other processes are estimated either from simulation or from the τ-embedding method and subtracted from the data in the AR, DR_QCD, and DR_W+jets. The SR and the AR differ only in the working point chosen for the identification of the τh candidate, where for the AR a looser working point is chosen and the events from the SR are excluded.
Depending on the final state, one or three independent sets of extrapolation factors F_F^i are then derived. For the τhτh final state, where QCD multijet production contributes ≈94% of the events in the AR, only F_F^QCD is determined, and F_F^W+jets and F_F^tt are assumed to be similar. In the eτh and µτh final states, where the sharing is more equal, separate F_F^i are used for QCD multijet, W+jets, and tt production. In these final states the largest fraction in the AR, in the range of 55-70%, is expected to originate from W+jets production; the smallest fraction, in the range of 2-5%, is expected to originate from tt production.
Each F_F^i is determined in a dedicated DR_i, defined to enrich the corresponding process. The F_F^i are then used to estimate the yield N_SR and kinematic properties of the combination of these backgrounds in the SR from the number of events N_AR in the AR, according to

N_SR = N_AR × ∑_i w_i F_F^i ,

where the F_F^i are combined into a weighted sum using the simulation-based estimate of the fractions w_i of each process in the AR. A template fit to the data in the AR yields a similar result for the w_i.
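The weighted combination of the extrapolation factors can be sketched as follows; the fractions and fake factors below are hypothetical placeholders, not measured values.

```python
def ff_weight(fractions, fake_factors):
    """Combined fake factor F_F = sum_i w_i * F_F^i for i in {qcd, wjets, tt},
    applied as a per-event weight to AR events to predict the jet->tau_h
    contribution in the SR. All numbers used here are hypothetical."""
    assert abs(sum(fractions.values()) - 1.0) < 1e-6  # fractions must sum to 1
    return sum(fractions[i] * fake_factors[i] for i in fractions)

w = {'qcd': 0.30, 'wjets': 0.65, 'tt': 0.05}   # process fractions w_i in the AR
ff = {'qcd': 0.12, 'wjets': 0.18, 'tt': 0.10}  # per-process fake factors F_F^i
print(ff_weight(w, ff))  # 0.158
```

Summing this weight over all AR events then gives the estimate N_SR, with the w_i evaluated per event from simulation.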
For the estimation of F_F^QCD, the charges of the two selected τ decay products are required to be of the same sign. For the estimation of F_F^W+jets, high m_T and the absence of b jets are required. For tt production, a sufficiently pure DR with kinematic properties still similar to the SR cannot be defined in data. Instead, the F_F^tt are obtained from simulation and corrected to data in a selection with more than two jets, at least one b jet, and more than two leptons in an event.
The F_F^i depend mainly on the p_T of the (leading) τh candidate and are derived on an event-by-event basis. For F_F^QCD, an additional dependency on the number of selected jets N_jet being 0, 1, or ≥2, and for F_F^W+jets, a similar dependency on N_jet and the distance ∆R(τh, e (µ)) between the τh and the e (µ) in η-φ are introduced. Subleading dependencies on p_T^e(µ), I_rel^e(µ), the p_T of the second-leading τh candidate, or the mass of the visible ττ decay products (m_vis) enter via bias corrections obtained from additional control regions in data. These corrections are usually close to 1 but can range between 0.8 and 1.2, depending on the process and observable.

Backgrounds with jets misidentified as an electron or muon
The number of jet → ℓ events in the eµ final state is estimated in a way similar to the F_F-method. In this case, an AR is distinguished from the SR by requiring the charges of the electron and muon to be of the same sign. A DR is defined requiring 0.2 < I_rel^µ < 0.5, from which a transfer factor F_T is obtained to extrapolate the number N_AR of events in the AR to the number N_SR of events in the SR, according to

N_SR = F_T × N_AR .

The main dependencies of F_T are on the distance ∆R(e, µ) between the e and µ trajectories at the PV in η-φ, and on N_jet being 0, 1, or ≥2. Subleading dependencies on p_T^e and p_T^µ are introduced via a closure correction in the DR and a bias correction to account for the fact that F_T has been determined from less-isolated muons. The latter is obtained from another control region with 0.15 (0.20) < I_rel^e < 0.50 for the NN- (CB-) analysis.
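Since F_T is evaluated per event in categories of N_jet (and, in the real analysis, of ∆R(e, µ)), the extrapolation amounts to a weighted event count. A minimal sketch with hypothetical transfer-factor values, binned in N_jet only:

```python
def jet_to_lepton_estimate(ar_events, f_t):
    """Extrapolate same-sign AR events to the SR: N_SR = sum over AR events
    of F_T evaluated per event. Here F_T is binned only in N_jet (0, 1, >=2);
    the values in f_t are hypothetical placeholders."""
    njet_category = lambda n: min(n, 2)  # collapse N_jet >= 2 into one category
    return sum(f_t[njet_category(ev['njet'])] for ev in ar_events)

f_t = {0: 2.1, 1: 1.8, 2: 1.5}                   # per-category transfer factors
events = [{'njet': 0}, {'njet': 1}, {'njet': 3}]  # same-sign AR events
print(jet_to_lepton_estimate(events, f_t))        # 5.4
```

The closure and bias corrections mentioned above would enter as additional multiplicative factors on each event's F_T.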

Simulated backgrounds and signal
In the eτh and µτh final states, more than 80% of all backgrounds after the selection described in Section 4 are obtained from one of the methods described in the previous sections. In the τhτh final state, this fraction is 95%. All remaining backgrounds, for example Z boson, tt, or diboson production where at least one decay of a V boson into an electron or muon is not covered by either of the methods, are obtained from simulation.
The production of Z bosons in the ee and µµ final states and W boson production are simulated at leading order (LO) precision in the strong coupling constant α_S, using the MADGRAPH5_aMC@NLO 2.2.2 (2.4.2) event generator [66,67] for the simulation of the data taken in 2016 (2017-2018), exploiting the MLM scheme [68] for the matching and merging of the matrix element calculation with the parton shower. To increase the number of simulated events in regions of high signal purity, supplementary samples are generated with up to four outgoing partons in the hard interaction. For diboson production, MADGRAPH5_aMC@NLO is used at next-to-LO (NLO) precision in α_S. For the simulation of tt and single t quark production, the POWHEG 2 [69][70][71][72][73][74] event generator is used at NLO precision in α_S.
The signal samples are also obtained at NLO precision in α S using POWHEG for the five main production modes, gluon fusion [71,74], VBF [75], VH (split by WH and ZH) [76], and ttH [77]. For the production via gluon fusion the distributions of p H T and the jet multiplicity in the simulation are tuned to match the next-to-NLO (NNLO) accuracy obtained from full phase space calculations with the NNLOPS generator [78,79].
All signal samples have been produced assuming m H = 125 GeV and rescaled to correspond to the cross sections and branching fraction for the decay into τ leptons as expected for m H = 125.38 GeV, based on the recommendations of the LHC Higgs Working Group [80]. This change is less than −1% for the cross sections for H production via gluon fusion and VBF, and −1% for the branching fraction for the decay into τ leptons.
For the generation of all processes, the NNPDF3.0 [81] (NNPDF3.1 [82]) set of parton distribution functions (PDFs) is used for the data taken in 2016 (2017-2018). All matrix element generators are interfaced with the PYTHIA 8.230 event generator [83], which is used to model the effects of parton showering, hadronization, and fragmentation, as well as the decay of the τ lepton. For this purpose, two different tunes (CUETP8M1 [84] and CP5 [85]) are used for the data taken in 2016 and from 2017 onward, for the parameterization of multiparton interactions and the underlying event.
When comparing to data, Z boson, tt, and single t quark events in the tW-channel are normalized to their cross sections at NNLO precision in α S [86][87][88]. Single t quark production in the t-channel and diboson events are normalized to their cross sections at NLO precision in α S or higher [88][89][90]. The signal samples are normalized to their inclusive cross sections and branching fractions as recommended by the LHC Higgs Working Group [34], assuming an H mass of 125.38 GeV [13].
For all simulated events, additional inclusive inelastic pp collisions generated with PYTHIA are added according to the expected PU profile in data to take the effect of the observed PU into account. All events generated are passed through a GEANT4-based [91] simulation of the CMS detector and reconstructed using the same version of the CMS event reconstruction software as used for the data.

Simplified template cross section schemes
The analyses presented in this paper aim for measurements of inclusive and differential production cross sections for the signal at increasing levels of granularity following the STXS scheme specified by the LHC Higgs Working Group [34]. In this scheme two stages are defined. Stage-0 assumes the separation by production modes, of which ggH, qqH, and VH are of relevance for this paper. For ggH, which subsumes gluon fusion and gg → Z(qq)H production, and qqH, which comprises VBF and qq → V(qq)H production, the simulated signals of each corresponding process are combined for signal extraction, assuming relations between the processes as expected from the SM.
At stage-1.1 [35] and -1.2, these production modes are further split into STXS bins according to the jet multiplicity at stable-particle level, the invariant mass of the two leading jets m jj , if present in an event, and p H T . The VH process is further split by V type and V boson p T (p V T ). For all cross section measurements, the absolute value of the H rapidity is required to be less than 2.5. The only difference of stage-1.2 with respect to stage-1.1 that is of relevance for this paper is the further splitting of the ggH bin at high p H T . Throughout the text the qqH bin with <2 jets or 0 < m jj < 350 GeV is also labeled as "non-VBFtopo". For all STXS bins, histogram template distributions are obtained from the simulated signal processes discussed in Section 5.4. These are fitted to the data together with corresponding template distributions from each considered background process, for signal extraction in each given STXS bin, as discussed in Section 11.

Figure 1: Binning for ggH production in the reduced STXS stage-1.2 scheme as exploited by the analyses presented in this paper. The dark gray boxes indicate the STXS bins measured by the analyses. Depending on the analysis, a split is applied to the 0-jet bin, as is indicated by the dashed line. The boxes colored in light gray indicate a finer subdivision of the NN signal classes for N jet ≥ 2 used for the classification task, as explained in Section 7. Thresholds on p H T are given in GeV.

Neural network based analysis
For the NN-analysis, all selected events are provided as input to a set of multiclassification NNs. The outputs of these NNs are used to distribute the events into background classes depending on the ττ final state, as shown in Table 3, and a number of signal classes.
The background classes, which are based on the experimental signatures of groups of processes rather than individual processes, closely resemble the background model discussed in Section 5. The genuine τ, jet → τ h , and jet → ℓ classes are trained on data. The tt (tt) and diboson (db) classes are trained on simulated events from tt and diboson production, excluding those parts of the processes which are already covered by the τ-embedding method. The db class in addition subsumes single t quark production. The Z → ℓℓ (zll) classes in the eτ h and µτ h final states are trained on simulated Z → ee and Z → µµ events. Finally, the miscellaneous (misc) classes comprise those processes that are either difficult to isolate or too small to be treated as single templates for signal extraction. These are Z → ℓℓ events in the eµ final state; diboson and single t quark production in the eτ h and µτ h final states; and tt, Z → ℓℓ, diboson, and single t quark production in the τ h τ h final state.
Depending on the stage of the STXS measurement that the analysis is targeting, two different sets of NNs are used that differ by the number of signal classes. Two signal classes are defined for the stage-0 measurement, corresponding to the ggH and qqH processes. For the stage-1.2 measurement, the stage-0 processes are divided into 15 subclasses, splitting the signal events for training by their jet multiplicity and kinematic properties at the stable-particle level. This subdivision follows the STXS stage-1.2 scheme as shown in Figs. 1 and 2, with the exception that the signal class for ggH events with N jet ≥ 2 is subdivided into four additional STXS bins, according to p H T and m jj , to accommodate future combined coupling measurements. The goal of this strategy is to achieve not only the best possible separation between the signal and each of the most relevant backgrounds, but also across all individual STXS stage-1.2 bins.

Table 3: Background processes and event classes for each ττ final state for the NN-analysis. All event classes enter the NN trainings for the event multiclassification with the same statistical weight (i.e., with uniform prevalence). The label tt (e/µ + X) refers to the fraction of tt production that is not covered by any of the estimation methods from data, as listed in the first two rows. The classes tt, misc, zll, and db are defined in the introduction of Section 7.

Neural network layout
For each measurement, a distinct NN is trained for each of the four ττ final states. All NNs have a fully connected feed-forward architecture with two hidden layers of 200 nodes each. The activation function for the hidden nodes is the hyperbolic tangent. For the output layers the nodes are defined by the classes discussed in the previous section and the activation function is chosen to be the softmax function [92]. The NN output function y l of each output node l can be interpreted as a Bayesian conditional probability for an event to be associated with event class l, given its input features x. This conditional probability interpretation does not take the prior of the production rate of each corresponding process into account.
In the eτ h , µτ h , and τ h τ h final states each NN has the same 14 input features comprising the p T of both τ candidates and their vector sum; the p T of the two leading jets, their vector sum, their difference in η, and m jj ; N jet ; the number of b jets N Btag ; m τ τ ; m vis ; and the estimates of the momentum transfer of each exchanged vector boson under the VBF hypothesis, as used in Ref. [93]. In the eµ final state, where events containing b jets have been excluded from the analysis, N Btag carries no discriminating information. Instead, m eµ T has been added as an input feature with some separating power. These variables have been selected from a larger feature space based on their importance for the classification task derived from the metric defined in Ref. [94]. Moreover, to account for differences in data-taking conditions, the data-taking year is an additional input to each NN that is provided through one-hot-encoding, such that the correct data-taking year obtains the value 1, while all other data-taking years obtain the value 0.
Before entering the NNs, all input features are standardized so that their distributions have mean 0 and standard deviation 1. Potentially missing features in a given event, such as m jj for events with fewer than two selected jets, are assigned a default value that is close enough to the transformed value space to not influence the NN decision.
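This preprocessing can be sketched as follows; the default value and the feature statistics are assumptions chosen only for illustration:

```python
import numpy as np

# Sketch of the input standardization described in the text: each feature is
# shifted and scaled to mean 0, s.d. 1 using training-sample statistics;
# features undefined for an event (e.g. m_jj with fewer than two jets)
# receive a fixed default near the transformed value range.

DEFAULT = -10.0  # assumed placeholder value in standardized space

def standardize(x, mean, std, defined):
    z = (x - mean) / std
    return np.where(defined, z, DEFAULT)

mean, std = 250.0, 120.0                  # assumed m_jj statistics (GeV)
x = np.array([490.0, 0.0])                # second event has no jet pair
defined = np.array([True, False])
print(standardize(x, mean, std, defined))
```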

Neural network training
The samples discussed in Section 5 are used for the training of the NNs. Those processes, which are part of the misc event class, are weighted according to their expected production rates to represent the event mixture as expected in the test samples.
The parameters to be optimized during training are the weights ({w a }) and biases ({b b }) of y l . The classification task is encoded in the NN loss function, chosen to be the categorical cross entropy L (k) = −∑ l ŷ (k) l ln y (k) l , where k indicates the event on which L is evaluated. All training events are implicitly labeled by the true process p to which they belong. The NN output function for event k to belong to event class l is given by y (k) l ; the corresponding truth label ŷ (k) l is 1 if the class l of event k coincides with p, and is 0 otherwise. The y (k) l depend on the weights, biases, and input features { x (k) p } of event k to the corresponding NN. Before training, the weights are initialized with random numbers using the Glorot initialization technique [95] with values drawn from a uniform distribution. The biases are initialized with zero. The training is then performed as a minimization task on the empirical risk functional R = (1/N) ∑ k L (k) , evaluated over the N events of a mini-batch, in the space of {w a } and {b b }, using randomly sampled mini-batches of 30 events per signal and background class and data-taking year, drawn from the training data set using a balanced batch approach [96]. This approach has shown improved convergence properties on training samples with highly imbalanced lengths. The batch definition guarantees that all true event classes enter the training with equal weight in the evaluation of R, i.e., uniform prevalence. On each mini-batch a gradient step is applied, defined by the partial derivatives of L in each weight, w a , and bias, b b , using the Adam minimization algorithm [97], with a constant learning rate of 10 −4 .
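The categorical cross entropy and its mini-batch average can be sketched as below; the seven-class setup mirrors the stage-0 training, while the batch size and random inputs are illustrative assumptions:

```python
import numpy as np

# Sketch of the categorical cross-entropy loss and empirical risk described
# in the text: L = -sum_l yhat_l * ln(y_l) per event, averaged over a
# mini-batch with one-hot truth labels.

def softmax(a):
    e = np.exp(a - a.max(axis=1, keepdims=True))  # numerically stable
    return e / e.sum(axis=1, keepdims=True)

def empirical_risk(y_pred, y_true):
    # mean of L^(k) over the batch
    return -np.mean(np.sum(y_true * np.log(y_pred), axis=1))

rng = np.random.default_rng(0)
logits = rng.normal(size=(60, 7))             # 7 output classes (stage-0)
labels = np.eye(7)[rng.integers(0, 7, 60)]    # one-hot truth labels
risk = empirical_risk(softmax(logits), labels)
print(risk)
```

A gradient step of the actual training would then update the weights and biases along the partial derivatives of this quantity.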
To guarantee statistical independence, events that are used for training are not used for any other step of the analysis. The performance of the NN during training is monitored evaluating R on a validation subset that contains a fraction of 25% of randomly chosen events from the training sample, which are excluded from the gradient computation. The training is stopped if the evaluation of R on the validation data set does not indicate any further decrease for a sequence of 50 epochs, where an epoch is defined by 1000 mini-batches. The NNs used for the analysis are then defined by the weights and biases of the epoch with the minimal value of R on the validation sample. To improve the generalization property of the NNs, two regularization techniques are introduced. First, after each hidden layer, a layer with a dropout probability of 30% is added. Second, the weights of the NNs are subject to an L2 (Tikhonov) regularization [98] with a regularization factor of 10 −5 .
The NNs draw their power not only from the isolated values of each corresponding feature, but also from correlations across features. To provide an objective statistical measure of an adequate modeling of the inclusive NN feature space, goodness-of-fit (GoF) tests have been performed on 1D and 2D histograms of all features and their pairwise combinations. This has been done prior to the application of the NNs to the test data set and the data. These GoF tests are based on a saturated likelihood model, as described in Ref. [99], exploiting the data model as described in Section 5 and including all systematic uncertainties of the model and their correlations as used for signal extraction. During these tests we generally observe an agreement of the model with the data within 5-10% of the event density, well contained in the combined systematic variations of the uncertainty model typically ranging between 10-15%, prior to the maximum likelihood fit used for signal extraction.

Characterization of the classification task
The overall success of the NNs in adapting to the given classification task is monitored with the help of confusion matrices. Examples of such confusion matrices for the STXS stage-0 and -1.2 measurements, evaluated on the full analyzed data set in the µτ h final state, are shown in Fig. 4. For the given representation the matrices have been normalized such that all entries in each column, corresponding to a given true event class, sum to unity. The values on the diagonals in the figure thus represent the sensitivity of the NN to each given true event class.
Random association would lead to a uniform distribution across predicted event classes with a weight of one out of seven (twenty) for the stage-0 (-1.2) training.
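The column normalization of the confusion matrices can be sketched with a toy example (the counts below are purely illustrative):

```python
import numpy as np

# Column-normalized confusion matrix as described in the text: each column
# (true class) sums to unity, so the diagonal gives the sensitivity to each
# true event class.

def column_normalize(counts):
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum(axis=0, keepdims=True)

# Toy 3-class example (rows: predicted class, columns: true class):
counts = [[80, 10,  5],
          [15, 70, 15],
          [ 5, 20, 80]]
m = column_normalize(counts)
print(np.diag(m))   # per-class sensitivities
```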
For the stage-0 classification, sensitivities of 70% and larger can be observed for the qqH, genuine τ, tt, and zll classes. The jet → τ h and misc classes have less prominent features to identify them, which partially relates to the fact that they comprise several different processes. The ggH class also reveals ambiguities, especially with respect to the qqH and genuine τ classes.
Adding the fractions of true ggH events in these three NN output classes results in a sensitivity of 79% to distinguish ggH events from events without genuine τ leptons in the final state, hinting at the importance of τ-related features for the NN response in this case. The distinction of ggH from genuine τ events mostly relies on features related to p H T , such as the vector sum p T of the two τ candidates. On the other hand, ggH events with high p H T tend to have higher jet multiplicities, which makes them harder to distinguish from qqH events. These trends are confirmed by the confusion matrix of the stage-1.2 classification, which reveals larger off-diagonal elements relating ggH events with N jet = 0 and low p H T with genuine τ events, and ggH events with N jet ≥ 2 and m jj > 350 GeV with qqH events.
The stage-1.2 classification reveals high sensitivities to the qqH signal classes with high m jj and the ggH signal classes with high p H T . Larger confusion, standing out from the general trend, is observed for qqH events with N jet < 2 or m jj < 350 GeV. We observe that 70% of these events migrate from their original truth-labeled class into one of three ggH NN output classes with N jet ≥ 2 and m jj < 350 GeV. This can be explained by the similarity of the observable signatures. For ggH events with N jet ≥ 2 and m jj < 350 GeV the reconstructed jets may originate from the matrix element calculation, but most probably they emerge from the parton-shower model. In the case of initial-state radiation, m jj can take large values and thus mimic the VBF signature in the detector. The fact that the NN associates 76% of the qqH events originating from the bin under discussion with NN output classes with N jet ≥ 2 and m jj < 350 GeV indicates the experimental signature that the NN has identified to be decisive for this classification. The fact that no migrations from the corresponding ggH bins into the qqH bin are observed can be explained by the more specific signatures in the ggH bins, which are additionally split in p H T . The qqH bin, which is inclusive in p H T , acts like a superclass to these ggH bins in this respect. An event that is compatible with a certain p H T hypothesis will be associated with one of the ggH classes rather than the qqH class. The background processes are identified with a sensitivity comparable to the stage-0 training. The slightly larger confusion can be understood by the increased number of classes and therefore increased variety of signatures that a given event can be associated with.
We visualize the influence of single features and their pairwise linear correlations on the NN response exploiting the metric t α based on Taylor expansions of the y l after training, with respect to the features x up to second order, as described in Ref. [94]. For a classic gradient descent training, t α corresponds to the mean of the absolute values of the given Taylor coefficient obtained from the whole sampled input space. In this way we identify m τ τ , m vis , m jj , and especially correlations across these features as most influential for the NN classification. For the stage-0 ggH event class we identify m τ τ , m vis , and corresponding (self-)correlations of m τ τ and m vis as the most important characteristics for identification; the term self-correlation corresponds to the second derivative in the Taylor expansion and reveals, e.g., that the signal is peaking in the m τ τ and m vis distributions. The fact that these characteristics are shared across the ggH, qqH, and genuine τ event classes explains the relatively high degree of confusion across these categories for the stage-0 training. The separation of ggH from qqH events mostly relies on characteristics related to m jj . As previously discussed, this is more difficult for ggH events with N jet ≥ 2. The distinction of genuine τ events mostly relies on the vector p T sum of the two τ candidates and on m τ τ . For the tt event class, we find that the information that tt events are nonpeaking in m vis and m τ τ contributes as much as N Btag and the jet properties to the observed high sensitivity.

Figure 4: Confusion matrices evaluated on the full test data set, comprising all data-taking years in the µτ h final state. They are normalized such that all entries in each column, corresponding to a given true event class, sum to unity.
These findings, which similarly apply to all final states, demonstrate that the NNs have indeed captured the features that are expected to provide the best discrimination between event classes.

Classification and discriminants for signal extraction
For each event, the maximum of the y l obtained from all classes l defines the class to which the event is assigned. This maximum takes values ranging from one over the number of classes, for events that cannot be determined without ambiguity, to one, for events that can clearly be associated with a corresponding class. Histogrammed distributions of y l for each corresponding class are also used as input for signal extraction. For the stage-0 measurement, the training with two signal categories is used, resulting in five event classes in the τ h τ h final state and seven event classes in the eµ, eτ h , and µτ h final states, for each data-taking year. For this measurement, the signal classes are combined into a 2D discriminant depending on y ggH and y qqH . The binning scheme used for this discriminant and the distribution of the unrolled discriminant for all data-taking years combined, in the µτ h final state, are shown in Fig. 6. The binning is grouped in up to 11 bins in y qqH and up to seven bins in y ggH . Vertical gray dashed lines in Fig. 6 (left) indicate the main groups of bins in y qqH in this scheme. Each main group corresponds to increasing values in y qqH from left to right. From bin 0-20, the binning within each main group indicates increasing values in y ggH . The last main group on the right of the figure is ordered by y qqH only. The same input distributions are also used for a measurement of the inclusive H production cross section. In both figures, the differences either of the corresponding additional signals or the data relative to the background expectation after the fit used for signal extraction are shown in the lower panel of each plot.
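The class assignment described above reduces to an argmax over the NN outputs; a minimal sketch with illustrative values for the seven-class case:

```python
import numpy as np

# Event-to-class assignment as described in the text: the class with the
# largest NN output y_l wins; the maximum itself ranges from 1/n_classes
# (fully ambiguous) to 1 (unambiguous).

def assign(y):
    y = np.asarray(y)
    return int(np.argmax(y)), float(np.max(y))

# Illustrative softmax outputs for one event, seven classes:
cls, score = assign([0.05, 0.62, 0.08, 0.10, 0.05, 0.06, 0.04])
print(cls, score)   # -> 1 0.62
```

The same y_l values, histogrammed per assigned class, then serve as the discriminants for signal extraction.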

Cut-based analysis
For the CB-analysis the event selection as described in Section 4 is modified and extended in a few places. To reduce the contamination from W+jets production in the eτ h and µτ h final states in the CB-analysis, a requirement of m T < 50 GeV is imposed. To suppress the background from tt production, events that contain at least one b jet are excluded from the selection not only in the eµ, but also in the eτ h and µτ h final states. For similar reasons, in the eµ final state an additional requirement of D ζ > −30 GeV is imposed on the event variable D ζ defined as D ζ = p miss ζ − 0.85 p vis ζ , where p miss ζ (p vis ζ ) is the projection of p miss T (of the vectorial p T sum of the electron and muon) onto the direction ζ, and ζ corresponds to the bisectional direction between the electron and muon trajectories at the PV in the transverse plane [100]. The variables p miss ζ and p vis ζ can each take positive or negative values. Their linear combination has been chosen to optimize the sensitivity of the analysis in the eµ final state. A more detailed discussion of the variable D ζ is given in Ref. [65]. Finally, the requirement on I µ rel is tightened from 0.20 to 0.15 in the eµ final state. After selection, event categories are designed to increase the sensitivity to the signal by isolating regions with large signal-to-background ratios, and to provide sensitivity to the stage-0 and -1.2 ggH and qqH STXS bins.
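The D ζ construction can be sketched as below. The bisector definition follows the text; the relative coefficient of 0.85 between the two projections and all kinematic inputs are assumptions of this illustration:

```python
import numpy as np

# Sketch of the D_zeta selection variable: project p_T_miss and the visible
# dilepton p_T sum onto the bisector (zeta) of the electron and muon
# transverse directions, then take their linear combination.
# The 0.85 coefficient is an assumed optimized value for illustration.

def d_zeta(pt_e, pt_mu, pt_miss, coeff=0.85):
    # zeta: unit vector bisecting the e and mu transverse directions
    zeta = pt_e / np.linalg.norm(pt_e) + pt_mu / np.linalg.norm(pt_mu)
    zeta /= np.linalg.norm(zeta)
    p_zeta_vis = np.dot(pt_e + pt_mu, zeta)    # can be negative
    p_zeta_miss = np.dot(pt_miss, zeta)        # can be negative
    return p_zeta_miss - coeff * p_zeta_vis

# Illustrative transverse momentum vectors (GeV):
pt_e = np.array([30.0, 0.0])
pt_mu = np.array([0.0, 25.0])
pt_miss = np.array([20.0, 20.0])
print(d_zeta(pt_e, pt_mu, pt_miss))
```

Events with p miss T roughly aligned with the visible leptons (as in Z → ττ → eµ decays) give larger D ζ, so the D ζ > −30 GeV cut suppresses backgrounds with mismeasured or oppositely directed p miss T.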
Events are distributed in different categories, which separate the different H production modes, corresponding to the stage-0 processes in the STXS scheme. A 0-jet category is used to collect events with no reconstructed jet. This category predominantly contains background, but also some signal events, which mainly originate from ggH production. It therefore mainly acts as a control region for backgrounds. Two event categories target qqH production. Depending on the ττ final state, these are defined by the presence of at least two jets with m jj > 350 GeV. All other events enter two so-called "boosted" categories, which are distinguished by the presence of exactly one or at least two jets in an event. The boosted categories are expected to contain mostly ggH events with H recoiling against one or several jets, but they also contain contributions from qqH events that did not pass the VBF category selection. This leads to five categories for each ττ final state.
In each of these categories, 2D distributions are then built to provide more granularity for the analysis. The observables for these distributions are chosen to separate the signal from the backgrounds, but also to provide additional sensitivity to the individual STXS stage-1.2 bins. One of the observables is always chosen to be m τ τ . In the 0-jet category of the eτ h and µτ h final states, the p T of the τ h candidate is taken as a second observable, as the contribution from backgrounds with misidentified τ h candidates significantly decreases with p T . In the eµ and τ h τ h final states, where the sensitivity to 0-jet signal events is low, no second observable is chosen, and 1D distributions are used. In the VBF categories, the second observable is m jj . In addition to aligning with the definition of the STXS stage-1.2 qqH bins, using this variable as an observable increases the analysis sensitivity to the qqH process as a whole, since the signal-to-background ratio quickly increases with increasing values of m jj . In the boosted categories, the second observable is chosen to be p H T . The category definitions, as well as the observables per category, are summarized in Table 4. Figure 7 shows the composition of the categories in terms of signal in the individual STXS stage-1.2 bins, integrated over all ττ final states. The subcategorization on the vertical axis of the figure corresponds to the categorization given in Table 4.

VH production modes
Higgs boson production in association with a W or Z boson has a cross section that is much lower than the cross section for ggH or qqH production, but it provides an additional check of the predictions of the SM. The best sensitivity for VH production is obtained using H decay modes with large branching fractions, such as H → ττ and H → bb.

WH final states
After the trigger selection detailed in Section 4, e, µ, and τ h candidates comprising the eµτ h , eτ h τ h , µµτ h , and µτ h τ h final states are required to satisfy the additional selection criteria listed in Table 5. Final states with a µ benefit from a lower p T threshold at the trigger level. Beyond that, events with additional electrons, muons, or b jets are rejected for all WH final states. In the eµτ h final state, the e and µ can come from either the W boson or from one of the τ leptons of the H → ττ decay. The light lepton with the largest p T is assigned to the W boson, and the other to H, which gives the correct pairing more than 75% of the time. The e and µ are required to have the same charge to reduce backgrounds with prompt and isolated electrons and muons, e.g., from Z → ττ events in the eµ final state. The τ h candidate is required to have the opposite charge relative to the light leptons, to give the proper charge for H. The p T threshold for both the electron and muon is set to 15 GeV, instead of the baseline value of 10 GeV for the muon, to reduce backgrounds where jets are misidentified as electrons or muons. Three additional selection requirements are imposed in this final state to increase the purity of the expected signal selection. Specifically:
1. the scalar p T sum of the two light leptons and the τ h (L T ) is required to be larger than 100 GeV;
2. the |∆η| between the lepton associated with the W boson and the other two leptons is required to be less than 2;
3. the |∆φ| between the light lepton associated with the W boson and the other two leptons is required to be larger than 2.
The same selection requirements are applied for the µµτ h final state, where the same charge requirement on the muons already significantly reduces the background from Z → µµ events.
For the eτ h τ h and µτ h τ h final states, the τ h that has the same sign as the electron (muon) is required to have a p T > 30 GeV, L T is required to be larger than 130 GeV, and the magnitude of the p T sum of the e (µ), the τ h candidates, and p miss T (S T ) is required to be S T < 70 GeV.

ZH final states
The ZH analysis proceeds by identifying a pair of light leptons from either a Z → ee or Z → µµ decay, and a pair of τ candidates comprising the eτ h , µτ h , or τ h τ h final states. At least one of the light leptons from the Z boson, referred to here as the triggering lepton, is required to satisfy the online selection. The selection criteria for the light leptons are summarized in Table 6. The selection criteria for the ττ final state are summarized in Table 7. The charges of the light leptons associated with the Z → ℓℓ decay, as well as the charges of the individual τ decays associated with the ττ final state, are required to be of opposite sign. As is the case for the WH final states, events with additional electrons, muons, or b jets are rejected for all ZH final states. Moreover, to increase the signal purity and to reject jet → τ h background events, the scalar p T sum of the τ decay products in the τ h τ h final state is required to be larger than 60 GeV.

VH observables
The results are extracted by fitting 2D distributions of m τ τ and the p T of the reconstructed vector boson (p V T ), where m τ τ is chosen because it provides the best discrimination between signal and background, and p V T is chosen to separate the different VH STXS stage-1.2 bins. For the WH process, p miss T originates from both the W and H decays, resulting in a total of three or four neutrinos that escape detection. The system is underconstrained and the individual neutrinos cannot be fully reconstructed. However, an estimate of p V T can still be made: simulation studies indicate that the invisible p T of the H → ττ decay is on average 47 (69)% of the visible p T of the ττ system in the τ h τ h (ℓτ h ) final state. Assuming that the sum of the neutrino momenta in the ττ final state points in the direction of the visible p T , the neutrino system from the H → ττ decay is assigned the corresponding proportion of the visible ττ four-momentum; this is known as the collinear approximation. The p miss T associated with the W decay is taken to be the difference between the reconstructed p miss T and the assumed p T of the neutrino system from the H → ττ decay. No significant dependence of this assumption on the p T of the visible ττ decay products is observed. This estimate of p V T provides a better separation between the STXS stage-1.2 bins than other estimates of p V T .
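The collinear-approximation estimate described above can be sketched as follows; the invisible fractions come from the text, while the kinematic inputs and the simple vector bookkeeping are illustrative assumptions:

```python
import numpy as np

# Sketch of the p_V_T estimate for WH: the invisible part of the
# H -> tau tau decay is taken as a fixed fraction of the visible di-tau p_T
# (0.47 in the tau_h tau_h final state, 0.69 in l tau_h, per the text),
# pointing along the visible direction (collinear approximation); the
# remainder of p_T_miss is attributed to the W decay.

def w_pt_estimate(pt_vis_tautau, pt_miss, invisible_fraction):
    pt_nu_h = invisible_fraction * pt_vis_tautau   # collinear approximation
    pt_w = pt_miss - pt_nu_h                       # leftover p_T_miss -> W
    return np.linalg.norm(pt_w)

# Illustrative transverse momentum vectors (GeV), tau_h tau_h final state:
pt_vis = np.array([50.0, 0.0])
pt_miss = np.array([60.0, 30.0])
print(w_pt_estimate(pt_vis, pt_miss, invisible_fraction=0.47))
```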

VH Background estimation
Backgrounds to the VH final states are estimated using methods similar to those described in Sections 5.2 and 5.4. There are, however, some notable differences for the estimates, which are obtained from data, as described below.

Backgrounds with the same final state as the signal
In the WH analysis, the dominant irreducible background is W(ℓν)Z(ττ), which can only be separated from the signal by using m τ τ . Other small irreducible backgrounds are ZZ → 4ℓ, ttZ, ttW, WWW, WWZ, WZZ, and ZZZ. In the ZH analysis, the dominant irreducible background is ZZ → 4ℓ, with small contributions from WWZ, WZZ, ZZZ, and ttZ. The H → WW process is treated as an irreducible background to both WH and ZH production. All of the above processes are estimated from simulation. Simulated events where one of the reconstructed e, µ, or τ h candidates corresponds to a jet at the generator level are vetoed, because they are estimated as part of the backgrounds described below.

Backgrounds with jets misidentified as an electron, muon, or hadronic τ lepton decay
Reducible backgrounds have at least one jet that is misidentified as one of the objects in the final state (e, µ, or τ h ). In the WH analysis, they mostly consist of Z boson and tt production, with small contributions from W+jets and diboson production, where one boson subsequently decays leptonically and the other hadronically. In the ZH analysis, the reducible backgrounds mostly include Z → ℓℓ decays with one or two jets that are misidentified as τ decays.
These backgrounds are estimated from data using a method that is similar to the one described in Section 5.2, but differing in detail, as described below. The F i F , with i = e, µ, τ h , denote the probabilities that a loosely selected e, µ, or τ h candidate, which is assumed to originate from a jet, will satisfy a tighter selection criterion, where in this context "tight" and "loose" refer to the object identification criteria of each corresponding object used in the analyses. The F i F are measured in ℓℓ+jets events and are common for all VH final states. They are applied in slightly different ways for the ℓℓτ h and ℓτ h τ h final states in the WH analysis, because of the different background compositions. The applications of the F i F in the WH and ZH analyses differ because the number of final state objects subject to misidentification is not the same.

Jet → τ h misidentification measurement
The probability that a jet is misidentified as a τ h candidate is measured using Z → µµ events with additional jets. These events are selected by requiring two muons of opposite charge that satisfy the muon identification criteria, as given in Section 3, and have I µ rel < 0.15, p T > 10 GeV, and |η| < 2.4. The leading muon must have p T > 23 (25) GeV in 2016 (2017-2018) data and the event must be selected through the single-muon trigger path in the online selection. The muons must be separated by ∆R > 0.3 from each other. In addition, the event must have a τ h candidate that satisfies the VVVLoose working point of D jet , as defined in Ref. [54]. The τ h candidate must have p T > 20 GeV and |η| < 2.3. Almost all τ h candidates selected in this way are expected to originate from misidentified quark- or gluon-induced jets. The F τ h F are then measured as a function of the p T of the τ h candidate for different τ h decay modes. They are taken as the ratio between the number of τ h candidates, selected as described above and passing the tighter D jet working point used in the analysis, and the number of τ h candidates selected as above, without any additional requirements. Genuine τ h decays contribute to the F τ h F measurement at the percent level, mainly originating from WZ and ZZ production with leptonic V boson decays. Their contribution is estimated from simulation and subtracted from the data before taking the ratio. The resulting F τ h F values depend on the decay mode and the p T of the τ h candidate.
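The tight-to-loose ratio with the genuine-τ subtraction can be sketched numerically; the yields below are invented purely for illustration:

```python
# Sketch of the jet -> tau_h misidentification-rate measurement described in
# the text: ratio of tight to loose tau_h candidates in Z -> mu mu + jets
# events, after subtracting the simulated genuine-tau contamination.
# All yields are illustrative assumptions.

def fake_rate(n_loose_data, n_tight_data, n_loose_genuine, n_tight_genuine):
    """F_F = (tight data - tight genuine) / (loose data - loose genuine)."""
    return (n_tight_data - n_tight_genuine) / (n_loose_data - n_loose_genuine)

# One hypothetical (decay mode, p_T) bin:
print(fake_rate(n_loose_data=5000.0, n_tight_data=600.0,
                n_loose_genuine=50.0, n_tight_genuine=40.0))
```

In the analysis this ratio is measured per τ h decay mode and in bins of the candidate p T , and the genuine-τ terms come from simulated WZ and ZZ events.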

Jet → e misidentification measurement
The probability that a jet is misidentified as an electron is also measured in Z → µµ events with additional jets. These events are selected in the same way as described above, but with the loosely selected τ_h candidate replaced by a reconstructed electron with no specific identification or isolation requirements. This electron must satisfy p_T > 10 GeV and |η| < 2.5, and be separated by ∆R > 0.3 from the muons. In addition, we veto all events with m_T^e > 40 GeV to increase the purity of the measurement sample. The F_F^e are then measured as a function of the electron candidate p_T. They are taken as the ratio between the number of electron candidates selected as described above that in addition fulfill the identification requirement given in Section 3 and I_rel^e < 0.15, and the number of electron candidates selected as described above without any additional requirements. The fraction of events selected in data with a genuine electron, which ranges between 20 and 30%, is estimated from simulation and subtracted from the data before taking the ratio. The values of F_F^e thus obtained fall in the range 0.004-0.013.

Jet → µ misidentification measurement
Similarly to the jet → e misidentification measurement, the probability that a jet is misidentified as a muon is measured using Z → ee events with additional jets. These events are selected in the same way as described above, but with the muons replaced by electrons. The electrons must satisfy the criteria outlined in Section 3, p_T > 10 GeV, and |η| < 2.5. The leading electron must have p_T > 26, 28, and 33 GeV in 2016, 2017, and 2018, respectively, and the event must be selected through the single-electron trigger path in the online selection. The electrons are required to be separated from each other by ∆R > 0.3. In addition, we veto all events with m_T^µ > 40 GeV to increase the purity of the measurement sample, and select a muon with no specific identification or isolation criteria. This muon must satisfy p_T > 10 GeV, |η| < 2.5, and be separated by ∆R > 0.3 from the electrons. The F_F^µ are then measured as a function of the muon candidate p_T. They are taken as the ratio between the number of muon candidates selected as described above that in addition satisfy the muon identification criteria given in Section 3 and I_rel^µ < 0.15, and the number of muon candidates selected as described above without any additional requirements. The fraction of events selected in data with a genuine muon, which ranges between 20 and 40%, is estimated from simulation and subtracted from the data before taking the ratio. The values of F_F^µ thus obtained fall in the range 0.01-0.03.

Systematic uncertainties
Control regions, comprising event samples that are not used to carry out the measurements, are used to ascertain how well the model describes the data and to derive corrections and their corresponding uncertainties as needed.
A summary of all systematic uncertainties considered in the analyses is given in Table 8. They mainly comprise uncertainties in the object reconstruction and identification, in the signal and background modeling, and due to the limited number of events available to populate the signal and background templates. The last group of uncertainties is incorporated for each bin of each corresponding template individually, following the approach proposed in Ref. [102]. All other uncertainties lead to correlated changes across bins, taking the form of either normalization changes or general nontrivial shape-altering variations. Depending on the way they are derived, correlations may also arise across data-taking years, individual signal and background samples, or individual uncertainties.

Corrections to the model
The following corrections apply equally to simulated and τ-embedded events, where the τ lepton decay is also simulated. Since the simulation of τ-embedded events happens under different detector conditions, corrections and related uncertainties may differ, as detailed in Ref. [64]. Corrections are derived for residual differences in the efficiency of the selected triggers, in the electron and muon tracking efficiencies, and in the efficiency of the identification and isolation requirements for electrons and muons. These corrections are obtained in bins of p_T and η of the corresponding lepton, using the "tag-and-probe" method described in Ref. [103] with Z → ee and Z → µµ events. They usually amount to not more than a few percent. In a similar way, corrections are obtained for the efficiency of triggering on τ_h decays and for the τ_h identification efficiency, following the procedures described in Ref. [53]. The latter is derived as a function of the p_T of the τ_h candidate, in four bins below 40 GeV and one bin above. For p_T^τh > 40 GeV a correction is also derived for each τ_h decay mode, which is used only in the τ_hτ_h final state. Corrections to the energy scale of the τ_h decays, and for electrons misidentified as τ_h candidates, are derived for each data-taking year and each τ_h decay mode individually, from likelihood scans of discriminating observables, like the mass of the visible decay products of the τ_h candidate, as detailed in Ref. [53]. For the trigger efficiency, corrections are obtained from parametric fits to the trigger efficiency curves derived for each corresponding sample and for data.
The following corrections apply only to fully simulated events. During the 2016-2017 data taking, a gradual shift in the timing of the inputs of the ECAL L1 trigger in the region of |η| > 2.0 caused a specific trigger inefficiency [39]. For events containing an electron (a jet) with p_T ≳ 50 (100) GeV in the region of 2.5 < |η| < 3.0, the efficiency loss is 10-20%, depending on p_T, η, and time. The corresponding corrections depend on the multiplicity, η, and p_T distributions of the jets in an event. They are at the level of 1% for tt events and up to 5% for VBF signal events. In the qqH STXS stage-1.2 bins they can be up to 15%. Uncertainties range between 0.2 and 3%, depending on the size of the correction.
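A correction of this kind is commonly implemented as a per-event weight given by the probability that none of the affected forward objects caused the event to be lost; the following is a hedged sketch under that assumption, with a hypothetical inefficiency map eps(pt, eta) and invented numbers:

```python
# Hedged sketch: weight each simulated event by the survival probability of
# all its objects in the affected forward region. eps(pt, eta) is a
# hypothetical per-object inefficiency lookup, not the CMS correction.

def l1_timing_weight(objects, eps, eta_min=2.0, eta_max=3.0):
    """objects: list of (pt, eta); returns the per-event correction weight."""
    w = 1.0
    for pt, eta in objects:
        if eta_min < abs(eta) < eta_max:   # only forward objects are affected
            w *= 1.0 - eps(pt, eta)        # survival probability of this object
    return w

# toy inefficiency: 10% loss for objects above 100 GeV in the affected region
eps = lambda pt, eta: 0.10 if pt > 100.0 else 0.0

w = l1_timing_weight([(120.0, 2.7), (40.0, 0.5)], eps)   # only the first jet contributes
```

Events with no objects in the affected region keep a weight of exactly one.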
The energies of jets are corrected to the expected response of the jet at the stable-hadron level, using corrections measured in bins of the jet p_T and η, as described in Ref. [104]. These corrections are usually not larger than 10-15%. Residual data-to-simulation corrections are applied to the simulated event samples, which are also propagated to p_T^miss. They usually range from less than 1% at high jet p_T in the central part of the detector to a few percent in the forward region. An explicit correction to the direction and magnitude of p_T^miss is obtained from differences between estimates of the hadronic recoil in Z → µµ events in data and simulation, as described in Ref. [57]. This correction is applied to the simulated Z → ℓℓ, W+jets, and signal events, where a hadronic recoil against a single particle is well defined, and replaces the propagation of jet energy scale corrections to p_T^miss. The efficiencies for genuine and misidentified b jets to pass the working points of the b jet identification discriminants, as given in Section 3, are determined from data, using tt events for genuine b jets and Z boson production in association with jets for jets originating from light quarks or gluons. Data-to-simulation corrections are obtained for these efficiencies and used to correct the number of b jets in the simulation, as described in Ref. [50].

Table 8: Summary of the most important systematic uncertainties discussed in the text. The columns indicate the source of uncertainty, the process class that it applies to (simulated, τ-embedded, or F_F), the variation, and how it is correlated (across years and processes). A checkmark is given also for partial correlations. More details are given in the text.
Data-to-simulation corrections are further applied to Z → ee (µµ) events in the eτ_h (µτ_h) and τ_hτ_h final states, in which an electron (muon) is reconstructed as a τ_h candidate, to account for residual differences in the e (µ) → τ_h misidentification rate between data and simulation. Deficiencies in the modeling of Z → ℓℓ events, which have been simulated only at LO precision in α_S, are corrected for by a weighting of the simulated Z → µµ events to data in bins of p_T^µµ and m_µµ. In addition, all simulated tt events are weighted to better match the top quark p_T distribution observed in data [105].
In the NN-analysis the background classes described in Section 7 and summarized in Table 3 are included in the likelihood model discussed in Section 11 to further constrain these backgrounds during the signal extraction process.

Uncertainties related to the τ-embedding method or the simulation
The following uncertainties, related to the level of control of the reconstruction of electrons, muons, and τ_h decays after selection, refer to both simulated and τ-embedded events. Unless stated otherwise, they are partially correlated across τ-embedded and simulated events.
Uncertainties in the electron and muon tracking, reconstruction, identification, and isolation efficiencies amount to 2% each for electrons and muons [45,47]. They are introduced as normalization uncertainties. Uncertainties in the electron or muon trigger efficiencies contribute an additional normalization uncertainty of 2% for events selected with each corresponding trigger. Due to differences in the trigger leg definitions, they are treated as uncorrelated for single-lepton and lepton-pair triggers, which may result in shape-altering effects in the overall model, since the two triggers act on different regimes in lepton p_T.
Uncertainties in the muon momentum scale range between 0.4 and 2.7%, depending on the muon η [47]. For fully simulated events an uncertainty in the electron energy scale is derived from the calibration of the ECAL crystals and applied on an event-by-event basis [44]. It usually amounts to less than 1.5%, with a dependence at the 0.1% level on the electron p_T and η. For τ-embedded events, uncertainties of 0.50-1.25%, split by the ECAL barrel and endcap regions, are derived for the corrections described in Section 10.1. Due to the different ways the uncertainties are determined, and to differences in detector conditions, they are treated as uncorrelated across simulated and τ-embedded events. They lead to shape-altering variations and are treated as correlated across years.
Uncertainties in the τ_h identification efficiency range from 3 to 5% across the bins of p_T and decay mode. They are obtained from the corrections described in Section 10.1, following the procedures described in Ref. [53]. These uncertainties are statistically dominated and are therefore treated as uncorrelated across decay modes, p_T bins, and data-taking years. An additional normalization uncertainty of 3% is added in quadrature in the µτ_h and τ_hτ_h final states, to account for the use of different working points of D_e and D_µ compared to the selection used for the determination of the corrections. Uncertainties in the τ_h trigger efficiency usually range between 5 and 10%, depending on the τ_h p_T. They are obtained from parametric fits to data and simulation, and lead to shape-altering effects. They are treated as uncorrelated across trigger paths and data-taking years. Uncertainties in the τ_h momentum scale range between 0.2 and 1.1%, depending on the τ_h p_T and decay mode. They are treated as uncorrelated across decay modes, p_T bins, and data-taking years, for the same reason as the uncertainties in the τ_h identification efficiency.
Two further sources of uncertainty are considered for τ-embedded events, as discussed in Section 5.1. A 4% normalization uncertainty accounts for the level of control of the efficiency of the µµ selection in data, which is unfolded during the τ-embedding procedure. The dominant part of this uncertainty originates from the trigger used for the selection. Since the trigger setups differed across data-taking years, this uncertainty is treated as uncorrelated across years. Another shape- and normalization-altering uncertainty relates to the normalization of tt → µµ + X decays, which are part of the τ-embedded event samples. It ranges from less than 1% to 3%, depending on the event composition of the model. For this uncertainty the number and shape of tt events contained in the τ-embedded event samples are estimated from simulation, for which the corresponding decay has been selected at the parton level, and this estimate is varied by ±10%. A more detailed discussion of these uncertainties is given in Ref. [64].
For fully simulated events, as discussed in Section 5.4, the following additional uncertainties apply. For the rare cases of electrons or muons misidentified as τ_h candidates, an uncertainty is derived in bins of p_T, η, and decay mode of the misidentified τ_h candidate. It amounts to up to 40% for electrons and ranges from 10 to 70% for muons. The relatively large size of these uncertainties originates from the rareness of such cases in the control regions used to measure the rates; they influence the measurements at the subpercent level. Uncertainties in the momentum scale of these misidentified leptons range from 0.8 to 6.6% for electrons [45] and amount to 1% for muons [47]. Uncertainties in the energy calibration and resolution of jets are applied with different correlations depending on their sources, comprising the limited sample size of the measurements used for calibration, the time dependence of the energy measurements in data due to detector aging, and corrections introduced to cover residual differences between simulation and data [104]. They range between the subpercent level and O(10%), depending on the kinematic properties of the jets in an event. They have the largest impact on the m_ττ distribution for simulated events, since they are propagated to p_T^miss, and lead to migration effects in N_jet or m_jj. Uncertainties in the jet energy resolution typically have a lower impact than the sum of the jet energy scale uncertainties; they are only relevant through migrations in m_jj.
Depending on the process under consideration, two independent uncertainties in p_T^miss are applied. For processes that are subject to recoil corrections, i.e., Z boson, W+jets, and signal production, uncertainties in the calibration and resolution of the hadronic recoil are applied, ranging from 1 to 5%. For all other processes, the uncertainty in p_T^miss results from the uncertainties in the jet energy calibration, which are propagated to p_T^miss. An additional uncertainty is obtained from the amount of unclustered energy in the event, which is not subject to the jet energy calibration [57]. These uncertainties mostly affect the event selections based on m_T and D_ζ, the distribution of m_ττ, and the p_T^H used for event classification in the CB-analysis. Uncertainties in the rate at which light quark- or gluon-induced jets are misidentified as b jets amount to 1%. The uncertainty in the efficiency to identify b jets amounts to 10% [50,51].
Additional uncertainties account for the timing shift of the inputs to the ECAL L1 trigger described in Section 10.1. These result in a normalization variation of 1-3% for the inclusive VBF signal sample. A shape-altering uncertainty is derived for the reweighting of the top quark p_T distribution, described in Section 10.1, by applying the correction twice or not applying it at all; it usually has only a small effect on the final discriminants. The integrated luminosities of the 2016-2018 data-taking periods are individually known with uncertainties in the 1.2-2.5% range [106][107][108], while the total Run-2 (2016-2018) integrated luminosity has an uncertainty of 1.6%, the improvement in precision reflecting the (uncorrelated) time evolution of some systematic effects. Finally, uncertainties in the predicted normalizations of all simulated processes amount to 4% for Z → ℓℓ and W+jets production [86], 6% for tt production [87,88], and 5% for diboson and single t quark production [88][89][90]. These uncertainties are applied wherever the corresponding simulated samples are used in the analyses and are correlated across years.

Uncertainties related to jets misidentified as hadronic τ lepton decays, electrons, or muons
The F_F^i described in Section 5.2 and the F_F^τh described in Section 9.4, as well as their corrections, are subject to statistical fluctuations in each corresponding DR_i. These uncertainties are split into normalization- and shape-altering parts and propagated to the final discriminants for each analysis. They usually amount to a few percent and are treated as uncorrelated across the kinematic and topological bins they are derived in. Additional normalization- and shape-altering uncertainties, varying from a few percent to O(10%) depending on the kinematic properties of the τ_h candidate and the topology of the event, account for the bias corrections and extrapolation factors. A further source of uncertainty concerns the subtraction of processes other than the enriched process in each corresponding DR_i. These are subtracted from the data using simulated or τ-embedded events; the combined shape of the events to be removed is varied by 7%, and the measurements are repeated. The impacts of these variations are then propagated to the final discriminants as shape-altering uncertainties. An uncertainty in the estimate of the three main background fractions in the AR, as described in Section 5.2, is obtained from a variation of each individual contribution, also by 7%, increasing or decreasing the remaining fractions such that the sum of all contributions remains unchanged. The amount of variation is motivated by the uncertainty in the production cross sections and acceptances of the involved processes, bounded by the constraint on the process composition that can be clearly obtained from the AR. The effect of this variation is observed to be very small, since usually one of the contributions dominates the event composition in the AR. Due to their mostly statistical nature, and differences across years, all uncertainties related to the F_F-method are treated as uncorrelated across years.
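The sum-preserving variation of one background fraction in the AR can be sketched as follows; the process names and fractions are invented for illustration:

```python
# Hedged sketch of the background-fraction variation: one fraction is scaled
# by +/-7% and the remaining fractions are rescaled so that all contributions
# still sum to unity.

def vary_fraction(fractions, key, factor=1.07):
    """Return a copy of `fractions` with `key` scaled and the rest rescaled."""
    varied = dict(fractions)
    varied[key] = fractions[key] * factor
    rest = sum(v for k, v in fractions.items() if k != key)
    scale = (1.0 - varied[key]) / rest   # rescale the other contributions
    for k in varied:
        if k != key:
            varied[k] = fractions[k] * scale
    return varied

# illustrative AR composition (invented numbers)
fractions = {"QCD": 0.6, "Wjets": 0.3, "tt": 0.1}
up   = vary_fraction(fractions, "QCD", 1.07)
down = vary_fraction(fractions, "QCD", 0.93)
```

When one contribution dominates, the rescaling of the subdominant fractions is small, which matches the observation in the text that the effect of this variation is minor.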
Several shape-altering uncertainties affect the estimation of the backgrounds where a jet is misidentified as an electron or muon, as described in Sections 5.3 and 9.4. The first group covers statistical uncertainties in the determination of the F_T, F_F^e, and F_F^µ. The second covers statistical uncertainties in the bias corrections in p_T^e and p_T^µ. Due to the way they are derived, these uncertainties are treated as uncorrelated across individual kinematic bins. The last uncertainty covers the bias correction due to the determination of the F_T from a control region with nonisolated muons, for the method described in Section 5.3. The combined effect of these uncertainties is usually not larger than 10%.

Uncertainties related only to the signal
For the measurement of the signal strengths µ_s of the H production cross sections with respect to the SM expectations, the following uncertainties are taken into account, following the recommendations of the LHC Higgs Working Group [34]. For B(H → ττ) an uncertainty of 1.2% is assumed. For B(H → WW), which is considered as background in the eµ final state of the CB- and NN-analyses and in the VH-analysis, an uncertainty of 1.5% is assumed. The uncertainties due to PDF variations and the uncertainty in α_S are obtained following the PDF4LHC recommendations, taking the root mean square of the variation of the results when using different replicas of the default NNPDF sets, as described, e.g., in Ref. [109]. Uncertainties due to the choice of the renormalization (µ_r) and factorization (µ_f) scales in the calculation of the matrix elements are obtained from an independent variation of these scales by factors of 0.5 and 2, omitting the variations where one scale is multiplied by 2 and the other by 0.5. The uncertainties are then obtained from the envelope of these variations.
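The 7-point scale-variation envelope described above can be sketched as follows; the toy cross-section model is purely illustrative and not a physical prediction:

```python
# Hedged sketch of the 7-point mu_r/mu_f envelope: vary both scales
# independently by factors of 0.5 and 2, drop the two opposite variations
# (0.5, 2) and (2, 0.5), and take the extrema relative to the nominal value.
from itertools import product

def scale_envelope(xsec):
    """xsec: callable mapping (k_r, k_f) scale factors to a cross section."""
    points = [
        (kr, kf)
        for kr, kf in product((0.5, 1.0, 2.0), repeat=2)
        if {kr, kf} != {0.5, 2.0}   # omit the anticorrelated variations
    ]
    nominal = xsec(1.0, 1.0)
    values = [xsec(kr, kf) for kr, kf in points]
    return max(values) - nominal, min(values) - nominal

# toy model: mild linear dependence on the scale factors (invented)
toy = lambda kr, kf: 50.0 * (1.0 + 0.04 * (kr - 1.0) - 0.02 * (kf - 1.0))
up, down = scale_envelope(toy)
```

Seven of the nine (k_r, k_f) combinations survive the filter, which is where the conventional name "7-point variation" comes from.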
For the inclusive and STXS stage-0 measurements, the uncertainties due to PDF variations and the uncertainty in α_S amount to 3.2, 2.1, 1.9, and 1.6% for the gluon fusion, VBF, WH, and ZH production modes, respectively. The uncertainties from the variations of µ_r and µ_f amount to 3.9, 0.4, 0.7, and 3.8% for the corresponding processes.
For the STXS stage-1.2 measurements, additional uncertainties are obtained by varying µ_r and µ_f within each STXS bin, using the simulation of the signal with POWHEG as described in Section 5.4. These uncertainties account for relative variations across STXS stage-1.2 bins and for migration effects. Acceptance effects within a given STXS stage-1.2 bin are also taken into account. Their combined effect is typically below 1%. An uncertainty in the parton-shower model of PYTHIA is obtained by varying the scales in the initial- and final-state radiation models; the observed effect typically ranges from 1 to 3%, but can become as large as 10% for gluon fusion production with VBF-like topologies. For the measurements of H production cross sections, the effects of theoretical uncertainties in the normalizations of the signal templates within each STXS bin are removed from the uncertainty model.

Results
The model used to extract the signal from the data is defined by an extended binned likelihood of the form

L = ∏_i P(N_i | ∑_s µ_s S_i^s + ∑_b B_i^b),

where i labels all bins of the input distributions for each signal class, with index s, and background class, with index b, defined for each of the analyses as described in Sections 7-9. Signal and background templates are obtained from the data model as discussed in Section 5, with further specifications for the VH-analysis in Section 9. The function P(N_i | x) denotes the Poisson probability to observe N_i events for an expectation x, and the signal strengths µ_s are the parameters of interest (POIs). In particular, the µ_s are not bound to be ≥ 0 in the absence of signal, but are allowed to take also negative values, to obtain unbiased confidence intervals and facilitate combinations with independent data.
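As a toy illustration of such an extended binned maximum likelihood estimate, a single signal-strength parameter can be fit to Poisson-distributed bin counts by scanning the negative log-likelihood; the yields are invented, and the scan deliberately allows negative values of µ:

```python
# Toy sketch (not the CMS fit): scan a single signal strength mu over a grid,
# minimizing the binned Poisson negative log-likelihood.
import math

def nll(mu, n_obs, sig, bkg):
    """Negative log-likelihood, up to mu-independent constants."""
    total = 0.0
    for n, s, b in zip(n_obs, sig, bkg):
        lam = mu * s + b              # expected yield in this bin
        total += lam - n * math.log(lam)
    return total

# invented observed counts and signal/background templates for three bins
n_obs = [12.0, 30.0, 55.0]
sig   = [ 4.0,  6.0,  2.0]
bkg   = [ 8.0, 25.0, 52.0]

grid = [i / 100.0 for i in range(-100, 301)]   # mu in [-1, 3], including mu < 0
mu_hat = min(grid, key=lambda m: nll(m, n_obs, sig, bkg))
```

In a realistic fit the nuisance parameters enter the expected yields as well, with the constraint terms described below, and the minimization is done with a dedicated tool rather than a grid scan.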
For the NN-analysis the input distributions are obtained from the NN outputs y_l, where l runs over the signal and background classes. Depending on the measurement, two different sets of input distributions are used for the signal. For the stage-0 measurement, single 2D histograms, binned in y_qqH and y_ggH, are used for the signal. An example distribution in the µτ_h final state is shown in Fig. 6. Together with the input distributions for the background classes, this results in 66 input distributions, split by ττ final state and data-taking year. For the stage-1.2 measurement, 15 1D histograms are used, one for each individual STXS stage-1.2 bin. We note that the event categories targeting the ggH N_jet ≥ 2 bin have been subdivided in anticipation of future measurements, as discussed in Section 7; therefore, the number of categories is slightly larger than the number of POIs used for signal extraction. Together with the input distributions for the background classes, this results in 234 input distributions, split by ττ final state and data-taking year. Example distributions for three adjacent bins in p_T^H in the ggH N_jet = 1 subspace, and for the qqH STXS bin with N_jet ≥ 2 and m_jj > 350 GeV, are shown in Fig. 5.
For the CB-analysis, the signal extraction is based on 60 input distributions provided for five event categories, split by ττ final state and data-taking year. The inputs are usually provided as 2D histograms binned in two discriminating observables, with the exception of the 0-jet categories in the eµ and τ_hτ_h final states, for which 1D distributions are used. An overview of the discriminating observables from which the input distributions are obtained is given in Table 4. Example distributions are shown in Fig. 8.
For the VH-analysis, 30 input distributions are used, split by the four (six) final states of the WH (ZH) analysis and the data-taking years. The inputs are provided as 2D histograms binned in m_ττ and p_T^V, as defined in Section 9. Input distributions with all final states and all data-taking years combined are shown in Fig. 9.
Systematic uncertainties are incorporated in the form of penalty terms for additional nuisance parameters {θ_j} in the likelihood, appearing as a product with predefined probability density functions C(θ̃_j | θ_j), where θ̃_j corresponds to the nominal value of θ_j. The predefined uncertainties in the θ̃_j, as discussed in Section 10, may be constrained by the fit to the data. We do not observe any strong constraints or shifts of the nuisance parameters in the fits from which the most probable values of the POIs and their uncertainties are obtained.
Maximum likelihood estimates for the POIs are obtained from the combination of the CB- and VH-analyses (labeled CB in the figures) on the one hand, and the NN- and VH-analyses (labeled NN in the figures) on the other hand. In all cases the ggH, qqH, and VH STXS stage-0 processes are treated as signals. A 1% contribution of b quark associated H production is additionally subsumed into the ggH process, assuming relations between the production modes as expected from the SM. The H production in association with a tt pair, which contributes at the 1% level, is treated as background with a production cross section fixed to the SM expectation. The same is true for the contamination from H → WW events, mostly contributing to the eµ final state in the CB- and NN-analyses, and in the VH-analysis, with the only exception of the likelihood scans discussed at the end of this section, where this contribution is also treated as signal.
The measurements of the inclusive signal strength parameter µ_incl and of the STXS stage-0 processes µ_ggH, µ_qqH, and µ_VH are shown in Fig. 10. Here and in the following, in addition to the central value that maximizes the likelihood, the combined systematic and statistical uncertainty (tot) is given, as well as a split of the uncertainty into its purely statistical component (stat), experimental (syst) and theoretical (theo) systematic uncertainties, and uncertainties due to the finite sample sizes used for template production (bbb); the expression bbb stands for "bin-by-bin", indicating that these uncertainties are incorporated for each template bin individually, as discussed in the introduction of Section 10. This split of uncertainties is obtained by successively fixing the nuisance parameters in each group to their maximum likelihood estimates and subtracting the resulting uncertainty in quadrature from the previous total. Correlations of uncertainties across these groups are assumed to be small, so that this procedure is justified.

Figure 10: Measurements of the signal strength parameters for inclusive H production (µ_incl) and the ggH (µ_ggH), qqH (µ_qqH), and VH (µ_VH) STXS stage-0 processes. The combination of the CB- and VH-analyses is labeled CB. The combination of the NN- and VH-analyses is labeled NN. Central values maximizing the likelihood and a split of uncertainties as explained in the text are provided with each result.
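The successive freezing of nuisance groups and quadrature subtraction can be sketched as follows; the uncertainty values are invented:

```python
# Hedged sketch of the uncertainty breakdown: groups of nuisance parameters
# are successively fixed to their best-fit values, and each component is the
# quadrature difference of consecutive total uncertainties.
import math

def split_uncertainties(totals):
    """totals: fit uncertainty after freezing each successive nuisance group,
    starting from the full fit; the last entry is the statistical remainder."""
    return [
        math.sqrt(before**2 - after**2)
        for before, after in zip(totals, totals[1:])
    ]

# toy sequence: full fit -> freeze theo -> freeze syst -> freeze bbb
totals = [0.110, 0.105, 0.090, 0.085]
theo, syst, bbb = split_uncertainties(totals)
stat = totals[-1]   # uncertainty with all groups frozen
```

By construction the components add back in quadrature to the total, which is why the procedure requires the correlations across groups to be small.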
For the combination of the NN- and VH-analyses an inclusive signal strength of µ_incl = 0.82 ± 0.11 is obtained, compatible within two s.d. with the SM expectation. The p-value for such an outcome of the measurement under the assumption of the SM hypothesis is found to be 0.1. For the ggH process, for which the overall uncertainty is already dominated by systematic uncertainties, a signal strength of µ_ggH = 0.67 ± 0.19 is obtained. For the qqH and VH processes, which are still dominated by statistical uncertainties, the signal strengths are found to be µ_qqH = 0.81 ± 0.17 and µ_VH = 1.79 ± 0.45. The result for µ_VH significantly supersedes a previous measurement that was obtained from a smaller data set [101].
The correlation between µ_ggH and µ_qqH is found to be −0.35, compatible with the observed migration of ggH events with N_jet ≥ 2 into the qqH category. As a result of the vetoes on additional leptons in the CB- and NN-analyses, very little or no correlation is expected between µ_VH and any of the other POIs, which is confirmed by the observation.
For the combination of the CB- and VH-analyses, similar results are obtained, with a p-value for the outcome of the inclusive measurement under the assumption of the SM hypothesis of 0.6. The CB- and NN-analyses obtain compatible results also in terms of the constraints on µ_incl and µ_ggH, which are a measure of the sensitivity of the analyses to these quantities. This can be understood in the following way:
1. For the ggH process, systematic uncertainties start to dominate over statistical uncertainties. The NNs are trained with the cross-entropy loss function, as defined in Eq. (6), which corresponds to a maximum likelihood estimate for the separation of signals and backgrounds in the presence of only statistical uncertainties. Taking systematic uncertainties into account during the training of the NNs can help to improve the overall constraining power in future versions of the NN-analysis.
2. We observe that the separation of ggH from qqH and Z → ττ is challenging, as visible from Fig. 4 (upper) and discussed in Section 7.3. The challenging phase space regions are ggH with N_jet ≥ 2, where the ggH process becomes similar to the qqH process, contributing roughly 60% to the expected inclusive ggH template cross section; and ggH with low p_T^H, where the signal is difficult to separate from the Z → ττ background. The template cross section for ggH is expected to be roughly 10 times larger than for the qqH process. For this reason we anticipate the findings for the inclusive result to be similar to those for the stage-0 ggH result.
The constraint on µ_qqH obtained from the NN-analysis, on the other hand, is 30% stronger than that from the CB-analysis. This can be understood from the fact that (1) the qqH process is easier to distinguish from ggH and Z → ττ events, and (2) the uncertainty in µ_qqH is still dominated by the statistical variance of the measurement.
Another observation is that the systematic uncertainties, especially in µ qqH , are smaller for the NN-analysis than for the CB-analysis. This trend is even more visible for some STXS stage-1.2 bins, as will be discussed below. The effect can be understood as follows:

1. The CB-analysis relies on one or two input distributions, and a small number of inputs for the primary categorization, to discriminate signal from background. The NN-analysis uses a 14-dimensional (14D) feature space to distinguish signals from backgrounds. Usually more than one feature, as well as correlations across features, contribute considerably to the separation task, as discussed in Section 7.3. Because several features, which are subject to complementary systematic variations, contribute to the NN response with roughly equal weight, systematic variations in a single feature may have less impact on the NN response. We find this best illustrated by the classification of the qqH processes, where both the leading dijet system and the ττ system contribute to the separation of signals from backgrounds, and by the classification of tt events, to which not only b-tagging information but also a large variety of kinematic observables contribute with similar weight to the NN response.

2. The likelihood of the NN-analysis includes the discriminating distributions not only for the signal, but also for the background classes, as given in Table 3. All discriminating distributions are ordered by purity in the corresponding event class, for increasing discriminant values. This results in high-purity control regions for each background class, and a smooth transition from these control regions to the signal regions. This helps to constrain the nuisance parameters related to the main background processes of the analysis, in the maximum likelihood estimate, within the given uncertainties.

3. Finally, the smaller uncertainties due to the finite sample sizes used for template production are explained by the use of larger sets of simulated signal samples, which are also used for the training of the NNs. In addition, the NN-analysis uses roughly a factor of five fewer bins for the input distributions to the likelihood than the CB-analysis.
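The multiclass event assignment described above can be sketched as follows. This is a toy illustration only: the class count, event count, and random scores are hypothetical stand-ins, not the analysis configuration. Each event enters exactly one event class, the one with the highest NN output, and that output serves as the discriminant, so larger values correspond to higher purity:

```python
import numpy as np

rng = np.random.default_rng(42)

def softmax(scores):
    """Convert raw network outputs to per-class probabilities."""
    e = np.exp(scores - scores.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Toy stand-in for NN outputs: 1000 events, 3 classes
# (e.g. ggH-like, qqH-like, background-like); the real analysis
# works in a 14D input space with more classes.
scores = rng.normal(size=(1000, 3))
probs = softmax(scores)

# Assign each event to its highest-scoring class; the maximum
# probability is used as the discriminant within that class.
event_class = probs.argmax(axis=1)
discriminant = probs.max(axis=1)
```

Binning the discriminant in each event class then yields the high-purity control regions and signal regions entering the likelihood.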
A visualization of the signal in the m ττ distribution is shown in Fig. 11. For the CB-, WH-, and ZH-analyses these distributions are obtained by histogramming m ττ from all event categories, in all ττ final states and data-taking years, in a predefined window around the assumed H mass [13], weighted by the signal (S) over combined background (B) ratio in each event category. For the NN-analysis, the events fulfilling y ggH + y qqH > 0.8 are shown without further weight. In all cases, the points with the error bars correspond to the observation, and the stacked filled histograms to the expectation from the background model. The inclusive contribution of the H signal, with µ incl as obtained from the maximum likelihood fit to the data, is also shown in each of the figures. A clear signal over the background is visible, especially for the CB- and NN-analyses.
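The S/B-weighted combination of per-category m ττ histograms described above can be illustrated with a short sketch; the yields, spectral shapes, and mass window below are hypothetical stand-ins for the per-category inputs:

```python
import numpy as np

rng = np.random.default_rng(7)

# Toy m_tautau spectra (GeV) for two event categories with different
# signal-to-background ratios; numbers are purely illustrative.
bins = np.linspace(50, 200, 31)
categories = [
    # (signal events, background events)
    (rng.normal(125, 15, 200), rng.exponential(60, 5000) + 50),
    (rng.normal(125, 15, 50),  rng.exponential(60, 500) + 50),
]

weighted_total = np.zeros(len(bins) - 1)
for sig, bkg in categories:
    s, _ = np.histogram(sig, bins=bins)
    b, _ = np.histogram(bkg, bins=bins)
    # S/B ratio of the category in a window around m_H = 125 GeV
    window = (bins[:-1] >= 100) & (bins[:-1] < 150)
    weight = s[window].sum() / b[window].sum()
    # All events of the category enter the summed histogram with
    # this per-category weight, enhancing high-purity categories.
    weighted_total += weight * (s + b)
```

The weighting suppresses categories dominated by background, so the summed histogram visually emphasizes the signal excess near the H mass.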
For the STXS stage-1.2 measurements, the ggH process is split into 7 (8) POIs for the CB- (NN-)analysis; the qqH and VH processes are split into 4 POIs each, resulting in a total of up to 16 POIs, which correspond exactly to the gray boxes shown in Figs. 1-3. We note that, for the CB-analysis, the POIs for the ggH N jet = 0 bins have been combined into one, due to a lack of sensitivity. For the CB- and VH-analyses, the STXS stage-1.2 estimates are obtained from the same set of input distributions as the inclusive and STXS stage-0 estimates. For the NN-analysis, two different sets of input distributions are used: one for the inclusive and STXS stage-0 estimates, and one for the STXS stage-1.2 estimates, as stated above and discussed in more detail in Section 7.4. In Fig. 12, the results of the STXS stage-1.2 measurement are shown. Tabulated values of the inclusive, STXS stage-0, and -1.2 results are given in Table 9. On average, the constraints achieved by the NN-analysis are 30 (40)% stronger in the ggH (qqH) bins when compared to the CB-analysis. In the qqH STXS bins in particular, an increase in sensitivity of up to 200% is observed. This can be understood in the following way: (1) for the more differential analysis, statistical uncertainties again dominate over systematic uncertainties, hence the NN training setup is optimally suited to the measurement target; (2) the kinematic restrictions allow the signal to be more clearly distinguished from backgrounds and competing signals; (3) at stage-1.2, the signal in general obtains higher emphasis in the training, which is performed without prevalence, since there are more signal classes relative to the background classes at stage-1.2 than at stage-0.

Figure 11: For the CB-, WH-, and ZH-analyses, m ττ from all event categories and data-taking years in a predefined window around the assumed H mass [13] is histogrammed, weighted by the signal (S) over combined background (B) ratio in each event category. For the (upper right) NN-analysis, the events fulfilling y ggH + y qqH > 0.8 are shown without further weight applied. The inclusive contribution of the H signal, with µ incl as obtained from the maximum likelihood estimates, is also shown in each subfigure. For the upper right subfigure, µ incl is obtained from the combination of the VH- and NN-analyses; for the other subfigures, µ incl is obtained from the combination of the VH- and CB-analyses.

An important point for the assessment of the STXS stage-1.2 results of the CB- and NN-analyses, and their constraining power, relates to the separation of individual STXS bins. The CB-analysis primarily targets the separation of signal from background, while the distinction of STXS bins is left to the selection-based event categorization. In contrast to this strategy, the multiclassification ansatz of the NN-analysis targets the best possible separation of individual STXS bins and backgrounds in the 14D input space of the NNs. In this way, the NN-analysis generally achieves smaller migration effects, increased separation of individual STXS bins, and reduced correlations across the measured STXS bins.

Table 9: Tabulated values of the STXS stage-0 and -1.2 signal strengths for the combination of the (CB) CB-, resp. (NN) NN-analysis with the VH-analysis. The upper four lines refer to the inclusive and STXS stage-0 measurements. The values in braces correspond to the expected 68% confidence intervals for an assumed SM signal. The products of cross sections and branching fraction to τ leptons, as expected from the SM with the uncertainties discussed in Section 10.4, are also given.

Both the CB- and NN-analyses observe the same trends across the STXS stage-1.2 bins. In particular, both analyses observe no signal in the 0-jet STXS bins, while their sensitivity to observe a signal larger than zero for a signal as expected from the SM is at the 2-3 s.d. level. In all other STXS bins, a signal compatible with the expectation of the SM is observed.
The linear correlation matrices of the STXS stage-1.2 measurements are shown in Fig. 13. For the NN-analysis the observed correlations are generally below 0.4, with a trend of a more global anticorrelation of the ggH N jet = 1 and N jet ≥ 2 STXS bins with the qqH STXS bins, reflecting the difficulty of experimentally distinguishing these processes in the corresponding topological regions, especially for p H T > 200 GeV. As expected, no correlation of the ggH and qqH with the VH STXS bins is observed. For the CB-analysis, the global and local trends are similar, which confirms that both analyses capture the same features in the data. Individual correlation coefficients, however, are generally significantly higher for the CB-analysis, demonstrating the ability of the NN-analysis to separate the individual STXS bins.
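The linear correlation matrix quoted here is obtained from the covariance matrix of the fitted POIs by normalizing with the per-parameter standard deviations. A minimal sketch, using a hypothetical 3×3 covariance matrix in place of the fit output:

```python
import numpy as np

# Hypothetical covariance matrix of three POIs from a fit;
# the numbers are illustrative only.
cov = np.array([
    [0.04, -0.012, 0.000],
    [-0.012, 0.09, 0.003],
    [0.000, 0.003, 0.16],
])

# Linear correlation: rho_ij = cov_ij / (sigma_i * sigma_j),
# with sigma_i the square roots of the diagonal entries.
sigma = np.sqrt(np.diag(cov))
corr = cov / np.outer(sigma, sigma)
```

By construction the diagonal of `corr` is 1 and off-diagonal entries lie in [-1, 1], which is what makes the matrices of the two analyses directly comparable.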
The same results as shown in Figs. 10 and 12 may be obtained as measurements of the product of cross sections and branching fraction to τ leptons, maximizing the same likelihood function, where those theoretical uncertainties related to pure normalization changes of the signal templates, for individual processes or within individual STXS stage-1.2 bins, are dropped from the uncertainty model. A summary of these measurements is shown in Fig. 14. The upper left panel of the figure refers to the inclusive and STXS stage-0 results, the lower left panel to the STXS stage-1.2 measurements in the VH bins, and the right panel to the STXS stage-1.2 measurements in the ggH and qqH bins. The measured cross sections span several orders of magnitude and otherwise follow the same trends as discussed in the previous paragraphs. Figure 15 shows the most probable value, as well as the 68 and 95% confidence interval contours, in the plane of the multiplicative modifiers of the H coupling strengths to vector bosons and fermions, κ V -κ F . This type of figure has been extensively referred to in the past [80]. For this figure, H → WW decays have been treated as signal, giving increased sensitivity in the eµ final state of the CB-, NN-, and VH-analyses. Also here the measurement, as obtained from the NN-analysis, is compatible with the expectation from the SM within the 95% confidence interval. The measured value of κ V is close to 1, while the value of κ F is roughly 15% lower than the SM expectation. These findings coincide with the previously discussed findings for µ ggH and µ qqH , shown in Fig. 10.
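Signal-strength extraction by maximizing a binned likelihood, as used throughout these results, can be sketched with a Poisson counting model. The templates and observed counts below are toy numbers, and a simple grid scan stands in for the full profile-likelihood machinery with nuisance parameters:

```python
import numpy as np

# Toy binned templates: expected signal (at SM rate, mu = 1) and
# background yields per bin, plus an "observed" dataset.
signal = np.array([2.0, 5.0, 9.0, 4.0, 1.0])
background = np.array([50.0, 40.0, 30.0, 20.0, 10.0])
observed = np.array([52.0, 44.0, 38.0, 23.0, 11.0])

def nll(mu):
    """Poisson negative log-likelihood (up to a constant) for a
    single signal-strength modifier mu scaling the signal template."""
    expected = mu * signal + background
    return np.sum(expected - observed * np.log(expected))

# Grid scan as a stand-in for the minimization; the 68% confidence
# interval corresponds to the points where the NLL rises by 0.5
# above its minimum.
grid = np.linspace(-1.0, 3.0, 4001)
values = np.array([nll(m) for m in grid])
mu_hat = grid[values.argmin()]
interval = grid[values - values.min() < 0.5]
mu_lo, mu_hi = interval.min(), interval.max()
```

Dropping pure normalization uncertainties from the model, as described above, then turns the fitted rate modifier directly into a cross section times branching fraction measurement.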
Typically, less than 10% of the events in the most signal-sensitive bins of the CB-analysis are shared with the NN-analysis. The fraction of shared events with respect to the events used by the NN-analysis typically amounts to not more than 20%. Given this small overlap, we conclude that both analyses are consistent with each other in their trends and general results. The measurement of the CB-analysis is less precise, but closer to the SM expectation. These findings again coincide with the p-values under the assumption of the SM hypothesis, previously reported for the corresponding inclusive measurements. A complete set of the reported measurements is available in the HEPData database [110].

Summary
Measurements of Higgs boson production, where the Higgs boson decays into a pair of τ leptons, have been presented. The analyzed data were selected from proton-proton collisions at a center-of-mass energy of 13 TeV, collected by the CMS experiment at the LHC in 2016-2018, corresponding to an integrated luminosity of 138 fb −1 . Results are presented in the form of signal strengths relative to the standard model predictions, and of products of cross sections and branching fraction to τ leptons, in up to 16 different kinematic regions, following the simplified template cross section scheme of the LHC Higgs Working Group. For the simultaneous measurement of a neural network based analysis and an analysis targeting vector boson associated Higgs boson production, the signal strengths are found to be 0.82 ± 0.11 for inclusive Higgs boson production, 0.67 ± 0.19 (0.81 ± 0.17) for production mainly via gluon fusion (vector boson fusion), and 1.79 ± 0.45 for vector boson associated Higgs boson production. The latter result significantly improves on an earlier measurement performed by the CMS Collaboration on a smaller data set.