Measurement of the Higgs boson production rate in association with top quarks in final states with electrons, muons, and hadronically decaying tau leptons at $\sqrt{s} = 13\,\text{TeV}$

The rate of Higgs ($\mathrm{H}$) boson production in association with either one ($\mathrm{tH}$) or two ($\mathrm{t\bar{t}H}$) top quarks is measured in final states containing multiple electrons, muons, or tau leptons decaying to hadrons and a neutrino. The measurement uses proton–proton collisions recorded by the CMS experiment at a center-of-mass energy of $13\,\text{TeV}$. The analyzed data correspond to an integrated luminosity of $137\,\text{fb}^{-1}$.
The analysis targets events that contain $\mathrm{H}\to\mathrm{WW}$, $\mathrm{H}\to\tau\tau$, or $\mathrm{H}\to\mathrm{ZZ}$ decays and in which each top quark decays either to a lepton+jets or to an all-jets final state. Sensitivity to the signal is maximized by including ten signatures in the analysis, defined by the lepton multiplicity. The separation among $\mathrm{tH}$, $\mathrm{t\bar{t}H}$, and the backgrounds is enhanced through machine-learning techniques and matrix-element methods.
The measured production rates for the $\mathrm{t\bar{t}H}$ and $\mathrm{tH}$ signals correspond to $0.92 \pm 0.19\,\text{(stat)}\,^{+0.17}_{-0.13}\,\text{(syst)}$ and $5.7 \pm 2.7\,\text{(stat)} \pm 3.0\,\text{(syst)}$ times their respective standard model (SM) expectations.
The corresponding observed (expected) significance is 4.7 (5.2) standard deviations for $\mathrm{t\bar{t}H}$ production and 1.4 (0.3) standard deviations for $\mathrm{tH}$ production. Assume that the Higgs boson coupling to the tau lepton is equal in strength to its SM expectation. Under this assumption, the coupling $y_{\mathrm{t}}$ of the Higgs boson to the top quark divided by its SM expectation, $\kappa_{\mathrm{t}} = y_{\mathrm{t}}/y_{\mathrm{t}}^{\mathrm{SM}}$, is constrained to be within $-0.9 < \kappa_{\mathrm{t}} < -0.7$


Introduction
The discovery of a Higgs (H) boson by the ATLAS and CMS experiments at the CERN LHC [1][2][3] opened a new field of exploration in particle physics. Detailed measurements of the properties of this particle are important to ascertain whether the discovered resonance is indeed the Higgs boson predicted by the standard model (SM) [4][5][6][7]. In the SM, the Yukawa coupling yf of the Higgs boson to a fermion is proportional to the fermion mass mf, namely yf = mf/v. Here v = 246 GeV denotes the vacuum expectation value of the Higgs field. The top quark has a mass of mt = 172.76 ± 0.30 GeV [8], which makes it by far the heaviest known fermion. Its Yukawa coupling is therefore of order unity. The large mass of the top quark may indicate that it plays a special role in the mechanism of electroweak symmetry breaking [9][10][11]. Deviations of yt from the SM prediction of mt/v would indicate the presence of physics beyond the SM.
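The following sketch is a quick numerical check that the top quark Yukawa coupling is of order unity. It evaluates yt = mt/v with the values quoted above. The function name `yukawa_sm` is hypothetical. Some references define the Yukawa coupling with an extra factor of √2, yf = √2 mf/v, which gives yt ≈ 0.99 instead. Either convention yields a value of order unity.

```python
# Values quoted in the text, in GeV.
VEV_GEV = 246.0
TOP_MASS_GEV = 172.76

def yukawa_sm(mass_gev: float, vev_gev: float = VEV_GEV) -> float:
    """SM Yukawa coupling in the convention y_f = m_f / v used in the text."""
    return mass_gev / vev_gev

y_t = yukawa_sm(TOP_MASS_GEV)
print(f"y_t = {y_t:.3f}")  # prints: y_t = 0.702
```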
The measurement of the Higgs boson production rate in association with a top quark pair (ttH) provides a model-independent determination of the magnitude of yt, but not of its sign. The sign of yt can be determined from the associated production of a Higgs boson with a single top quark (tH). Leading-order (LO) Feynman diagrams for ttH and tH production are shown in Figs. 1 and 2, respectively. The diagrams for tH production are separated into three contributions:
- the t-channel process (tHq), which proceeds via the exchange of a virtual W boson;
- the s-channel process, which also proceeds via the exchange of a virtual W boson;
- the associated production of a Higgs boson with a single top quark and a W boson (tHW).

Let gW denote the coupling of the Higgs boson to the W boson. Some diagrams have the Higgs boson coupling to the top quark (Fig. 2 upper and lower left). Others have it coupling to the W boson (Fig. 2 upper and lower right). The interference between these two sets of diagrams is destructive when yt and gW have the same sign. This reduces the tH cross section and changes the kinematical properties of the event as a function of yt and gW. The interference becomes constructive when gW and yt have opposite signs, increasing the cross section by up to one order of magnitude. This scenario is referred to as the inverted top quark coupling.
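A schematic decomposition of the tH amplitude makes the interference pattern explicit. In the sketch below, A_t and A_W denote the SM amplitudes of the diagrams in which the Higgs boson couples to the top quark and to the W boson, respectively. The symbol κW = gW/gW^SM is a shorthand introduced here, by analogy with κt. The decomposition is schematic: it omits, for example, the sum over helicities and the phase-space integration.

```latex
\begin{align}
  \mathcal{M}_\mathrm{tH} &= \kappa_\mathrm{t}\,\mathcal{A}_\mathrm{t} + \kappa_\mathrm{W}\,\mathcal{A}_\mathrm{W},
  \qquad \kappa_\mathrm{t} = y_\mathrm{t}/y_\mathrm{t}^\mathrm{SM},
  \quad \kappa_\mathrm{W} = g_\mathrm{W}/g_\mathrm{W}^\mathrm{SM}, \\
  |\mathcal{M}_\mathrm{tH}|^2 &= \kappa_\mathrm{t}^2\,|\mathcal{A}_\mathrm{t}|^2
    + \kappa_\mathrm{W}^2\,|\mathcal{A}_\mathrm{W}|^2
    + 2\,\kappa_\mathrm{t}\,\kappa_\mathrm{W}\,\mathrm{Re}\!\left(\mathcal{A}_\mathrm{t}\,\mathcal{A}_\mathrm{W}^{*}\right).
\end{align}
```

In the SM (κt = κW = 1), destructive interference corresponds to a negative interference term. Reversing the relative sign of κt and κW flips the sign of that term, which is the origin of the enhanced cross section for the inverted top quark coupling.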
Indirect constraints on the magnitude of yt are obtained from the rate of Higgs boson production via gluon fusion and from the decay rate of Higgs bosons to photon pairs [12]. In both cases, yt enters through top quark loops. The H → γγ decay rate also provides sensitivity to the sign of yt [13], as does the rate for associated production of a Higgs boson with a Z boson [14]. The measured rates of these processes suggest that the Higgs boson coupling to top quarks is SM-like. However, contributions from non-SM particles to these loops can compensate for, and therefore mask, deviations of yt from its SM value. A model-independent direct measurement of the top quark Yukawa coupling in ttH and tH production is therefore very important. In ttH and tH production, yt enters at the lowest ("tree") level. Comparing the magnitude and sign of yt obtained from these production rates with the values obtained from loop-induced processes can therefore provide evidence for such non-SM contributions.
This manuscript presents the measurement of the ttH and tH production rates in final states containing multiple electrons, muons, or τ leptons that decay to hadrons and a neutrino (τh). In the following, we refer to τh as "hadronically decaying τ". We also refer to electrons and muons collectively as "leptons" (ℓ). The measurement is based on pp collisions at √s = 13 TeV recorded by the CMS experiment during Run 2 of the LHC. The data correspond to an integrated luminosity of 137 fb⁻¹.
The measurement of the ttH and tH production rates presented in this manuscript is their first simultaneous analysis in this channel. This approach is motivated by the large overlap between the experimental signatures of the two production processes. It also takes into account how the ttH and tH production rates depend on yt. Compared to previous work [23], the sensitivity of the present analysis is enhanced in three ways:
- improved identification of τh decays;
- improved identification of jets originating from the hadronization of bottom quarks;
- four additional experimental signatures (also referred to as analysis channels), bringing the total to ten.

The signatures involve Higgs boson decays to WW, ττ, and ZZ, and are defined according to the lepton and τh multiplicities in the events. Some of them require leptons to have the same (opposite) sign of electric charge and are therefore referred to as SS (OS).

Eight signatures target events where at least one top quark decays via t → bW⁺ → bℓ⁺ν:
- 2ℓSS + 0τh, 3ℓ + 0τh, 2ℓSS + 1τh, 2ℓOS + 1τh;
- 1ℓ + 2τh, 4ℓ + 0τh, 3ℓ + 1τh, 2ℓ + 2τh.

The remaining two signatures, 1ℓ + 1τh and 0ℓ + 2τh, target events where all top quarks decay via t → bW⁺ → bqq̄′. We refer to top quarks decaying in the former and latter way as semi-leptonically and hadronically decaying top quarks, respectively. Here and in the following, statements about top quark decays also cover the charge-conjugate decays of top antiquarks.

As in previous analyses, the separation of the ttH and tH signals from backgrounds is improved in two ways. The first is machine-learning techniques, specifically boosted decision trees (BDTs) and artificial neural networks (ANNs) [32][33][34]. The second is the matrix-element method [35,36]. Machine-learning techniques are also used to improve the separation between the ttH and tH signals. We use the measured ttH and tH production rates to set limits on the magnitude and sign of yt.
This paper is organized as follows:
- Sect. 2 briefly describes the CMS detector.
- Sect. 3 discusses the data and simulated events used in the measurement.
- Sect. 4 covers the reconstruction and selection of objects from signals recorded in the detector.
- Sect. 5 describes the selection criteria applied to events in the analysis.
- Sect. 6 defines the categories into which these events are grouped.
- Sect. 7 describes the estimation of background contributions in these categories.
- Sect. 8 gives the systematic uncertainties affecting the measurements.
- Sect. 9 presents the statistical analysis and the results of the measurements.
- Sect. 10 gives a brief summary.

Fig. 2 Feynman diagrams at LO for tH production via the t-channel (tHq, upper left and upper right) and s-channel (middle) processes, and for associated production of a Higgs boson with a single top quark and a W boson (tHW, lower left and lower right). The tHq and tHW production processes are shown for the five-flavor scheme.

The CMS detector
The central feature of the CMS apparatus is a superconducting solenoid of 6 m internal diameter, providing a magnetic field of 3.8 T. Three detector systems are positioned within the solenoid volume: a silicon pixel and strip tracker, a lead tungstate crystal electromagnetic calorimeter (ECAL), and a brass and scintillator hadron calorimeter (HCAL). Each is composed of a barrel and two endcap sections. The detector coverage in pseudorapidity η is as follows:
- The silicon tracker measures charged particles within |η| < 2.5.
- The ECAL is a fine-grained hermetic calorimeter with quasi-projective geometry. It is segmented into a barrel region covering |η| < 1.48 and two endcaps that extend the coverage up to |η| < 3.0.
- The HCAL barrel and endcaps similarly cover |η| < 3.0.
- Forward calorimeters extend the coverage up to |η| < 5.0.
- Muons are measured and identified within |η| < 2.4 by gas-ionization detectors embedded in the steel flux-return yoke outside the solenoid.

A two-level trigger system [37] reduces the rate of recorded events to a level suitable for data acquisition and storage. The first level, composed of custom hardware processors, uses information from the calorimeters and muon detectors to select the most interesting events with a latency of 4 μs. The high-level trigger processor farm further decreases the event rate from around 100 kHz to about 1 kHz. Details of the CMS detector and its performance, including the coordinate system and the kinematic variables used in the analysis, are reported in Ref. [38].
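Pseudorapidity is defined as η = −ln tan(θ/2), where θ is the polar angle with respect to the beam axis. The short Python sketch below inverts this relation to translate the |η| boundaries quoted above into polar angles. It only illustrates the standard definition. The helper name `polar_angle_deg` is hypothetical.

```python
import math

def polar_angle_deg(eta: float) -> float:
    """Polar angle in degrees, measured from the beam axis, for pseudorapidity eta.

    Inverts eta = -ln(tan(theta / 2)), i.e. theta = 2 * atan(exp(-eta)).
    """
    return math.degrees(2.0 * math.atan(math.exp(-eta)))

# |eta| boundaries quoted in the text; a coverage |eta| < x spans
# theta_min < theta < 180 deg - theta_min.
for label, eta_max in [
    ("silicon tracker", 2.5),
    ("muon detectors", 2.4),
    ("ECAL barrel", 1.48),
    ("ECAL endcaps / HCAL", 3.0),
    ("forward calorimeters", 5.0),
]:
    theta_min = polar_angle_deg(eta_max)
    print(f"{label:22s} |eta| < {eta_max:4.2f}: "
          f"{theta_min:5.2f} < theta < {180.0 - theta_min:6.2f} deg")
```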

Data samples and Monte Carlo simulation
The analysis uses pp collision data recorded at √s = 13 TeV at the LHC during 2016–2018. Only data-taking periods during which the CMS detector was fully operational are included in the analysis. The total integrated luminosity of the analyzed data set is 137 fb⁻¹, split by year as follows:
- 2016: 35.9 fb⁻¹ [39]
- 2017: 41.5 fb⁻¹ [40]
- 2018: 59.7 fb⁻¹ [41]
Event samples produced via Monte Carlo (MC) simulation are used to calculate selection efficiencies for the ttH and tH signals, to estimate background contributions, and to train machine-learning algorithms.

The following processes are generated at next-to-LO (NLO) accuracy in perturbative quantum chromodynamics (pQCD) with the program MadGraph5_amc@nlo 2.2.2 or 2.3.3 [42][43][44][45]:
- the ttH signal;
- tt production in association with W and Z bosons (ttW, ttZ);
- triboson production (WWW, WWZ, WZZ, ZZZ, WZγ);
- the production of four top quarks (tttt).

The same program is used at LO accuracy for the following processes:
- the tH signal;
- the ttγ, ttγ*, tZ, and ttWW backgrounds;
- the W+jets, Drell-Yan (DY), Wγ, and Zγ backgrounds.

The symbols γ* and γ distinguish virtual photons from real ones. The event samples with virtual photons also include contributions from virtual Z bosons. DY production of electron, muon, and τ lepton pairs is referred to as Z/γ* → ee, Z/γ* → μμ, and Z/γ* → ττ, respectively. The modeling of the ttW background includes additional electroweak corrections of order αSα³ [46,47], simulated using MadGraph5_amc@nlo.

The NLO program powheg v2.0 [48][49][50] is used to simulate the following backgrounds:
- tt+jets, tW, and diboson (W±W∓, WZ, ZZ) production;
- the production of single top quarks;
- SM Higgs boson production via gluon fusion (ggH) and vector boson fusion (qqH);
- SM Higgs boson production in association with a W or Z boson (WH, ZH);
- SM Higgs boson production in association with a W or Z boson and a pair of top quarks (ttWH, ttZH).

The modeling of the top quark transverse momentum (pT) distribution in tt+jets events simulated with powheg is improved by reweighting. The events are reweighted to the differential cross section computed at next-to-NLO (NNLO) accuracy in pQCD, including electroweak corrections computed at NLO accuracy [51].
We denote the sum of the WH and ZH contributions by VH, and the sum of the ttWH and ttZH contributions by ttVH. Some SM processes are not considered as backgrounds in this analysis, because their impact on the event yields in all categories is found to be negligible. These are the production of Higgs boson pairs and of a Higgs boson in association with a pair of b quarks. The production of same-sign W pairs (SSW) is simulated with MadGraph5_amc@nlo at LO accuracy. The exception is the contribution from double-parton interactions, which is simulated with pythia v8.2 [52] (referred to as pythia hereafter). The NNPDF3.0LO (NNPDF3.0NLO) [53][54][55]

Different flavor schemes are chosen to simulate the tHq and tHW processes:
- In the five-flavor scheme (5 FS), bottom quarks are considered as sea quarks of the proton and may appear in the initial state of pp scattering processes.
- In the four-flavor scheme (4 FS), only up, down, strange, and charm quarks are considered as valence or sea quarks of the proton. Bottom quarks are produced by gluon splitting at the matrix-element level and therefore appear only in the final state [57].

In the 5 FS, the distinction among the tHq, s-channel, and tHW contributions to tH production is well defined up to NLO. At higher orders in perturbation theory, the tHq and s-channel processes start to interfere and can no longer be uniquely separated [58]. In the 5 FS, the tHW process also starts to interfere with ttH production at NLO. In the 4 FS, the separation among the tHq, s-channel, and tHW processes holds only up to LO; for tHW this applies when the W boson decays hadronically. In the 4 FS, the tHW process also interferes with ttH production already at tree level [58].
The tHq process is simulated at LO in the 4 FS and the tHW process in the 5 FS. As a result, interference between tHW and ttH production is not present in the simulation. The contribution from s-channel tH production is negligible and is not considered in this analysis.
Parton showering, hadronization, and the underlying event are modeled using pythia with the tune CP5, CUETP8M1, CUETP8M2, or CUETP8M2T4 [59][60][61], depending on the dataset, as are the decays of τ leptons, including polarization effects. The matching of matrix elements to parton showers is done using the MLM scheme [42] for the LO samples and the FxFx scheme [44] for the samples simulated at NLO accuracy.
The modeling of the ttH and tH signals, as well as of the backgrounds, is improved by normalizing the simulated event samples to cross sections computed at higher order in pQCD. All cross sections below are for pp collisions at √s = 13 TeV:
- tHq: the SM cross section, computed in the 5 FS at NLO accuracy in pQCD, is 74.3 fb [62].
- ttH: the SM cross section, computed at NLO accuracy in pQCD with electroweak corrections at the same order in perturbation theory, is 506.5 fb [62].
- tHW: the cross section is 15.2 fb, computed at NLO in the 5 FS. The DR2 scheme [63] is used to remove overlapping contributions between the tHW process and ttH production.
- tt+jets, W+jets, DY, and diboson production: the cross sections are computed at NNLO accuracy [64][65][66].
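Multiplying the cross sections above by the integrated luminosity gives a rough idea of how many signal events were produced in the analyzed data set. These counts come before any Higgs boson or top quark branching fractions, acceptance, or selection efficiencies are applied. The sketch below uses the per-year luminosities given earlier in this section, which sum to the quoted 137 fb⁻¹, and the 13 TeV SM cross sections given above. The helper name `produced_events` is hypothetical.

```python
# Integrated luminosity per year (fb^-1) and SM cross sections (fb), as quoted in the text.
LUMI_FB_INV = {"2016": 35.9, "2017": 41.5, "2018": 59.7}
XSEC_FB = {"ttH": 506.5, "tHq": 74.3, "tHW": 15.2}

def produced_events(xsec_fb: float, lumi_fb_inv: float) -> float:
    """Expected number of produced events, N = sigma * L.

    No branching fractions, acceptance, or efficiencies are applied.
    """
    return xsec_fb * lumi_fb_inv

total_lumi = sum(LUMI_FB_INV.values())  # about 137.1 fb^-1
for process, xsec in XSEC_FB.items():
    print(f"{process}: {produced_events(xsec, total_lumi):8.0f} events produced")
```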
Event samples containing Higgs bosons are normalized using the SM cross sections published in Ref. [62]. Event samples of ttZ production are also normalized to the cross sections published in Ref. [62]. Simulated ttW samples are normalized to the cross section in the same reference, increased by the contribution from the αSα³ electroweak corrections [46,47]. The SM cross sections for the ttH and tH signals and for the most relevant background processes are given in Table 1.
The ttH and tH samples are produced assuming that all couplings of the Higgs boson have their SM values. For values of yt and gW that differ from the SM expectation, the kinematical properties of tH signal events change. This variation stems from the interference of the diagrams in Fig. 2 described in Sect. 1. It is accounted for by applying weights calculated for each tH signal event with MadGraph5_amc@nlo, following the approach suggested in Refs. [67,68]. No such reweighting is necessary for the ttH signal. A variation of yt only changes the inclusive ttH cross section, which scales as yt², and leaves the kinematical properties of ttH signal events unaltered.
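A toy calculation illustrates the contrast between the two signals. The ttH yield simply scales as κt². A tH event weight, by contrast, is the ratio of squared matrix elements evaluated at the new couplings and at the SM couplings. In the sketch below, each event is represented by only two hypothetical complex sub-amplitudes: one for the diagrams in which the Higgs boson couples to the top quark, and one for those in which it couples to the W boson. The actual weights are computed with the full matrix elements in MadGraph5_amc@nlo, as described above. The amplitude values here are placeholders, not analysis inputs. The symbol κW = gW/gW^SM is a shorthand introduced for this example.

```python
def tth_rate_scale(kappa_t: float) -> float:
    """ttH cross section relative to the SM: proportional to kappa_t squared."""
    return kappa_t ** 2

def th_event_weight(amp_top: complex, amp_w: complex,
                    kappa_t: float, kappa_w: float = 1.0) -> float:
    """Toy per-event weight |kt*A_t + kW*A_W|^2 / |A_t + A_W|^2.

    amp_top and amp_w are hypothetical SM sub-amplitudes for the diagrams in
    which the Higgs boson couples to the top quark and to the W boson.
    """
    sm = abs(amp_top + amp_w) ** 2
    bsm = abs(kappa_t * amp_top + kappa_w * amp_w) ** 2
    return bsm / sm

# Placeholder amplitudes with opposite signs, i.e. destructive interference in the SM.
a_t, a_w = 1.0 + 0.0j, -1.5 + 0.0j
print(tth_rate_scale(-1.0))             # 1.0: ttH is blind to the sign of y_t
print(th_event_weight(a_t, a_w, -1.0))  # 25.0: tH is enhanced for the inverted coupling
```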
Simultaneous pp collisions in the same or nearby bunch crossings are referred to as pileup (PU). PU is modeled by superimposing inelastic pp interactions, simulated using pythia, onto all MC events. Simulated events are then weighted so that the PU distribution of the simulated samples matches the one observed in data.
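A common way to implement this pileup reweighting has two steps. First, take the bin-by-bin ratio of the normalized pileup distributions in data and simulation. Then assign each simulated event the weight of the bin it falls into. The NumPy sketch below shows this idea under simple assumptions:
- a one-dimensional distribution of the number of pileup interactions;
- fixed bin edges;
- zero weight for bins that are empty in simulation.

The function names are hypothetical, and this is not the CMS implementation.

```python
import numpy as np

def pileup_weights(pu_data, pu_mc, bin_edges):
    """Per-bin weights w_i = p_data(i) / p_mc(i) from normalized histograms."""
    h_data, _ = np.histogram(pu_data, bins=bin_edges)
    h_mc, _ = np.histogram(pu_mc, bins=bin_edges)
    p_data = h_data / h_data.sum()
    p_mc = h_mc / h_mc.sum()
    # Bins without simulated events get weight 0 instead of a division by zero.
    return np.divide(p_data, p_mc, out=np.zeros_like(p_data), where=p_mc > 0)

def event_weights(pu_mc, weights, bin_edges):
    """Look up the weight of the bin that each simulated event falls into."""
    idx = np.digitize(pu_mc, bin_edges) - 1
    # Events outside the binned range are assigned the weight of the nearest edge bin.
    idx = np.clip(idx, 0, len(weights) - 1)
    return weights[idx]

edges = np.array([0.5, 1.5, 2.5, 3.5])
w = pileup_weights([1, 1, 2, 2, 2, 3], [1, 2, 3, 3], edges)
print(w)                                          # per-bin weights 4/3, 2, 1/3
print(event_weights(np.array([1, 2, 3, 3]), w, edges))
```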
All MC events are passed through a detailed simulation of the CMS apparatus, based on Geant4 [69,70], and are processed using the same version of the CMS event reconstruction software used for the data.
Simulated events are corrected, either by means of weights or by varying the relevant quantities, to account for residual differences between data and simulation. These differences arise in:
- trigger efficiencies;
- reconstruction and identification efficiencies for electrons, muons, and τh;
- the energy scale of τh and jets;
- the efficiency to identify jets originating from the hadronization of bottom quarks, and the corresponding misidentification rates for light-quark and gluon jets;
- the resolution in missing transverse momentum.

The corrections are typically at the level of a few percent [71][72][73][74][75]. They are measured using a variety of SM processes, such as Z/γ* → ee, Z/γ* → μμ, Z/γ* → ττ, tt+jets, and γ+jets production.

Event reconstruction
The CMS particle-flow (PF) algorithm [76] provides a global event description that optimally combines the information from all subdetectors, to reconstruct and identify all individual particles in the event. The particles are subsequently classified into five mutually exclusive categories: electrons, muons, photons, and charged and neutral hadrons.
Electrons are reconstructed by combining information from the tracker and the ECAL [77] and are required to satisfy pT > 7 GeV and |η| < 2.5. Their identification is based on a multivariate (MVA) algorithm that combines observables sensitive to:
- the agreement between the electron energy and direction measured in the tracker and in the calorimeter;
- the compactness of the electron cluster;
- the bremsstrahlung emitted along the electron trajectory.

Electron candidates resulting from photon conversions are removed in two ways. The track must have no missing hits in the innermost layers of the silicon tracker, and candidates matched to a reconstructed conversion vertex are vetoed. In the 2ℓSS + 0τh and 2ℓSS + 1τh channels (see Sect. 5 for channel definitions), we apply a further electron requirement. Three independent measurements of the electron charge must be consistent, as described for the "selective algorithm" in Ref. [77].
The reconstruction of muons is based on linking track segments reconstructed in the silicon tracker to hits in the muon detectors embedded in the steel flux-return yoke [78]. The quality of the spatial matching between the individual measurements in the tracker and in the muon detectors is used to discriminate genuine muons from two other sources: hadrons punching through the calorimeters, and muons produced by in-flight decays of kaons and pions. Muons selected in the analysis are required to have pT > 5 GeV and |η| < 2.4. For events selected in the 2ℓSS + 0τh and 2ℓSS + 1τh channels, the relative uncertainty in the curvature of the muon track is required to be less than 20%, to ensure a high-quality charge measurement.
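The kinematic part of the loose-lepton selection can be summarized in a few lines of Python, together with the additional charge-quality requirements used in the 2ℓSS + 0τh and 2ℓSS + 1τh channels. This is a sketch only. The `Lepton` record and its field names are hypothetical. The electron MVA identification and photon-conversion rejection described above are not modeled and are assumed to have been applied beforehand.

```python
from dataclasses import dataclass

@dataclass
class Lepton:
    flavor: str                     # "e" or "mu"
    pt: float                       # transverse momentum in GeV
    eta: float                      # pseudorapidity
    charge_consistent: bool = True  # electrons: the three charge measurements agree
    rel_curvature_unc: float = 0.0  # muons: relative uncertainty in track curvature

def is_loose(lep: Lepton) -> bool:
    """Kinematic acceptance of loose electrons and muons (ID criteria not modeled)."""
    if lep.flavor == "e":
        return lep.pt > 7.0 and abs(lep.eta) < 2.5
    if lep.flavor == "mu":
        return lep.pt > 5.0 and abs(lep.eta) < 2.4
    raise ValueError(f"unknown lepton flavor: {lep.flavor!r}")

def passes_charge_quality(lep: Lepton, same_sign_channel: bool) -> bool:
    """Extra charge requirements, applied only in the 2lSS + 0/1 tau_h channels."""
    if not same_sign_channel:
        return True
    if lep.flavor == "e":
        return lep.charge_consistent
    return lep.rel_curvature_unc < 0.2
```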
The electrons and muons satisfying the aforementioned selection criteria are referred to as "loose leptons" in the following. Additional selection criteria separate "prompt" from "nonprompt" leptons. Prompt electrons and muons are produced in decays of W and Z bosons and in leptonic τ decays. Nonprompt electrons and muons are produced in decays of b hadrons. Removing nonprompt leptons reduces, in particular, the background arising from tt+jets production. To exploit the information available in each event as fully as possible, we use MVA discriminants. Besides observables related to the lepton itself, they take as input the charged and neutral particles reconstructed in a cone around the lepton direction. The jet reconstruction and b tagging algorithms are also applied, and the resulting reconstructed jets are used as additional MVA inputs. Two jet-related inputs are found to enhance the separation of prompt leptons from leptons originating from b hadron decays:
- the ratio of the lepton pT to the pT of the reconstructed jet;
- the component of the lepton momentum perpendicular to the jet direction.

These inputs complement more conventional observables:
- the relative isolation of the lepton, calculated within a cone whose size depends on the lepton pT [79,80];
- the longitudinal and transverse impact parameters of the lepton trajectory with respect to the primary pp interaction vertex.

Electrons and muons passing a selection on the MVA discriminants are referred to as "tight leptons".
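Two of the MVA inputs named above are simple kinematic quantities: the ratio of lepton pT to the pT of the nearby jet, and the lepton momentum component perpendicular to the jet axis. The sketch below computes both from Cartesian momentum vectors with NumPy. The variable isolation cone is illustrated with the form R = 10 GeV/pT, bounded between 0.05 and 0.2. These constants are an assumption made for this example and are not stated in the text. All function names are hypothetical.

```python
import numpy as np

def pt_ratio(lep_pt: float, jet_pt: float) -> float:
    """Ratio of the lepton pT to the pT of the jet it is matched to."""
    return lep_pt / jet_pt

def p_perp_to_jet(lep_p3, jet_p3) -> float:
    """Magnitude of the lepton momentum component perpendicular to the jet axis."""
    lep = np.asarray(lep_p3, dtype=float)
    axis = np.asarray(jet_p3, dtype=float)
    axis = axis / np.linalg.norm(axis)
    # |p x n| equals |p| sin(angle between p and the unit jet axis n).
    return float(np.linalg.norm(np.cross(lep, axis)))

def isolation_cone_size(lep_pt: float) -> float:
    """Assumed pT-dependent cone: R = 10 GeV / pT, clamped to [0.05, 0.2]."""
    return 10.0 / min(max(lep_pt, 50.0), 200.0)

print(p_perp_to_jet([3.0, 4.0, 0.0], [1.0, 0.0, 0.0]))  # 4.0
print(isolation_cone_size(100.0))                        # 0.1
```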
Because of PU, several vertex candidates are typically reconstructed in each pp collision event, and the primary pp interaction vertex must be chosen among them. The candidate with the largest value of summed physics-object pT² is taken to be the primary vertex. Two kinds of physics objects enter this sum:
- jets, clustered with the anti-kT jet finding algorithm [81,82] using the tracks assigned to the candidate vertex as inputs;
- the associated missing transverse momentum, taken as the negative vector sum of the pT of those jets.
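The vertex choice described above amounts to maximizing a score over the candidate vertices. In this sketch, each vertex is represented by the transverse-momentum components (px, py) of the jets clustered from its tracks. The score is the sum of the squared pT of every jet plus the squared magnitude of the associated missing transverse momentum, that is, the negative vector sum of those jets. The data layout and function names are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Jet = Tuple[float, float]  # (px, py) in GeV

def vertex_score(jets: Sequence[Jet]) -> float:
    """Sum of pT^2 over the track jets of a vertex plus the squared missing pT."""
    sum_pt2 = sum(px * px + py * py for px, py in jets)
    miss_px = -sum(px for px, _ in jets)
    miss_py = -sum(py for _, py in jets)
    return sum_pt2 + miss_px * miss_px + miss_py * miss_py

def primary_vertex_index(vertices: List[Sequence[Jet]]) -> int:
    """Index of the candidate vertex with the largest summed physics-object pT^2."""
    if not vertices:
        raise ValueError("no vertex candidates")
    return max(range(len(vertices)), key=lambda i: vertex_score(vertices[i]))

candidates = [[(10.0, 0.0), (0.0, 10.0)], [(30.0, 0.0)], [(1.0, 1.0)]]
print(primary_vertex_index(candidates))  # 1
```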
Leptonic decay products of τ leptons are selected by the algorithms described above. Hadronic τ decays are instead reconstructed and identified by the "hadrons-plus-strips" (HPS) algorithm [74]. The algorithm is based on reconstructing individual hadronic decay modes of the τ lepton: and all the charge-conjugate decays, where the symbols h⁻ and h⁺ denote either a charged pion or a charged kaon.

Neutral pions produced in the τ decay decay to photons, and these photons have a sizeable probability to convert into an electron-positron pair when traversing the silicon tracker. The conversions broaden the energy deposits in the ECAL for two reasons. The magnetic field bends the electrons and positrons in opposite azimuthal directions, and these particles may also emit bremsstrahlung photons. The HPS algorithm accounts for this broadening when it reconstructs the neutral pions. It does so by clustering photons and electrons in rectangular strips that are narrow in η but wide in φ.

The subsequent identification of τh candidates is performed by the "DeepTau" algorithm [83]. The algorithm is based on a convolutional ANN [84]. Its inputs are a set of 42 high-level observables, combined with low-level information from the silicon tracker, the electromagnetic and hadronic calorimeters, and the muon detectors. The high-level observables comprise:
- the pT, η, φ, and mass of the τh candidate;
- the reconstructed τh decay mode;
- observables that quantify the isolation of the τh with respect to charged and neutral particles;
- observables sensitive to the small distance that a τ lepton typically traverses between its production and decay.

The low-level information quantifies the particle activity within two η × φ grids:
- an "inner" grid of size 0.2 × 0.2, filled with cells of size 0.02 × 0.02;
- an "outer" grid of size 0.5 × 0.5, partially overlapping with the inner grid, with cells of size 0.05 × 0.05.
Both grids are centered on the direction of the τh candidate. The τh considered in the analysis are required to have pT > 20 GeV and |η| < 2.3 and to pass a selection on the output of the convolutional ANN. The selection differs by analysis channel, targeting different efficiency and purity levels. Depending on the requirement imposed on the ANN output, we refer to these as the very loose, loose, medium, and tight τh selections.
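The grid geometry can be made concrete with the sizes quoted above. The inner grid (0.2 × 0.2 in cells of 0.02 × 0.02) and the outer grid (0.5 × 0.5 in cells of 0.05 × 0.05) each contain 10 × 10 cells. The sketch below maps a particle's offset (Δη, Δφ) from the τh direction to a cell index. It illustrates the binning only and uses hypothetical names. It makes no claim about how DeepTau fills its cells or which per-cell features it uses.

```python
import math
from typing import Optional, Tuple

def wrap_phi(dphi: float) -> float:
    """Map an azimuthal difference into [-pi, pi)."""
    return (dphi + math.pi) % (2.0 * math.pi) - math.pi

def cell_index(d_eta: float, d_phi: float,
               grid_size: float, cell_size: float) -> Optional[Tuple[int, int]]:
    """(i_eta, i_phi) cell of a square grid centred on the tau_h, or None if outside."""
    n_cells = round(grid_size / cell_size)
    half = 0.5 * grid_size
    d_phi = wrap_phi(d_phi)
    if not (-half <= d_eta < half and -half <= d_phi < half):
        return None
    # min() guards against floating-point round-up at the upper edge.
    i_eta = min(int((d_eta + half) / cell_size), n_cells - 1)
    i_phi = min(int((d_phi + half) / cell_size), n_cells - 1)
    return i_eta, i_phi

INNER = (0.2, 0.02)  # grid size, cell size
OUTER = (0.5, 0.05)
print(round(INNER[0] / INNER[1]), round(OUTER[0] / OUTER[1]))  # 10 10
print(cell_index(0.0, 0.0, *INNER))  # (5, 5)
print(cell_index(0.3, 0.0, *INNER))  # None
```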
Jets are reconstructed with the anti-kT algorithm [81,82], using a distance parameter of 0.4 and the particles reconstructed by the PF algorithm as inputs. Charged hadrons associated with PU vertices are excluded from the clustering. The energy of the reconstructed jets is corrected for residual PU effects using the method described in Refs. [85,86] and calibrated as a function of jet pT and η [72]. The jets considered in the analysis are required to:
- satisfy pT > 25 GeV and |η| < 5.0;
- pass identification criteria that reject spurious jets arising from calorimeter noise [87];
- not overlap with any identified electron, muon, or τh within ΔR = √((Δη)² + (Δφ)²) < 0.4.

For jets within 2.7 < |η| < 3.0, the requirement is tightened to pT > 60 GeV, to further reduce the effect of calorimeter noise, which is sizeable in this detector region. Jets passing these criteria are then categorized as central jets (|η| < 2.4) or forward jets (2.4 < |η| < 5.0). A high-pT forward jet is a characteristic signature of t-channel tH production. Its presence is used to separate ttH from tH in the signal extraction stage of the analysis.
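The jet selection and the central/forward categorization described above can be sketched as follows. ΔR is computed from the pseudorapidity and azimuthal differences, with the latter wrapped into [−π, π). The sketch does not model the jet identification criteria against calorimeter noise. The treatment of the boundary |η| = 2.4 is an arbitrary choice, and the function names are hypothetical.

```python
import math
from typing import Iterable, Optional, Tuple

def delta_phi(phi1: float, phi2: float) -> float:
    """Azimuthal difference wrapped into [-pi, pi)."""
    return (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi

def delta_r(eta1: float, phi1: float, eta2: float, phi2: float) -> float:
    """Delta R = sqrt((Delta eta)^2 + (Delta phi)^2)."""
    return math.hypot(eta1 - eta2, delta_phi(phi1, phi2))

def categorize_jet(pt: float, eta: float, phi: float,
                   leptons_and_taus: Iterable[Tuple[float, float]]) -> Optional[str]:
    """Return "central", "forward", or None if the jet fails the selection.

    leptons_and_taus holds (eta, phi) of identified electrons, muons, and tau_h.
    """
    abs_eta = abs(eta)
    if pt <= 25.0 or abs_eta >= 5.0:
        return None
    if 2.7 < abs_eta < 3.0 and pt <= 60.0:  # tighter cut against calorimeter noise
        return None
    if any(delta_r(eta, phi, e, p) < 0.4 for e, p in leptons_and_taus):
        return None                          # overlaps with a lepton or tau_h
    return "central" if abs_eta < 2.4 else "forward"
```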
Jets within |η| < 2.4 that originate from the hadronization of bottom quarks are denoted as b jets and identified by the DeepJet algorithm [88]. The algorithm exploits two features that distinguish b jets from light-quark and gluon jets: the long lifetime of b hadrons, and the higher particle multiplicity and mass of b jets. The inputs to its convolutional ANN are the properties of the charged and neutral particle constituents of the jet and of secondary vertices reconstructed within the jet. The analysis uses two selections on the algorithm output:

| Selection | b jet efficiency | Light-quark and gluon jet misidentification rate | c jet misidentification rate |
|---|---|---|---|
| Loose | 84% | 11% | 50% |
| Tight | 70% | 1.1% | 15% |
The missing transverse momentum vector, p⃗Tmiss, is computed as the negative of the vector pT sum of all particles reconstructed by the PF algorithm. Its magnitude is denoted by pTmiss. The analysis employs a linear discriminant, LD, to remove backgrounds in which the reconstructed pTmiss arises from resolution effects. The discriminant also reduces PU effects and is defined as LD = 0.6 pTmiss + 0.4 HTmiss. Here HTmiss is the magnitude of the vector pT sum of electrons, muons, τh, and jets [23]. The discriminant combines the higher resolution of pTmiss with the robustness of HTmiss against PU.
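The definition of LD can be illustrated with a short sketch. Objects are represented as hypothetical (pT, φ) tuples in GeV and radians. Only the weights 0.6 and 0.4 and the two vector sums come from the text.

```python
import math


def vector_sum(objects):
    """Vector pT sum (px, py) of objects given as hypothetical (pt, phi) tuples."""
    px = sum(pt * math.cos(phi) for pt, phi in objects)
    py = sum(pt * math.sin(phi) for pt, phi in objects)
    return px, py


def linear_discriminant(pf_candidates, leptons_taus_jets):
    """LD = 0.6 * pTmiss + 0.4 * HTmiss, in GeV."""
    # pTmiss: magnitude of the (negative) vector pT sum of all PF candidates;
    # negating the vector does not change its magnitude.
    pt_miss = math.hypot(*vector_sum(pf_candidates))
    # HTmiss: same construction, restricted to electrons, muons, tau_h, and jets.
    ht_miss = math.hypot(*vector_sum(leptons_taus_jets))
    return 0.6 * pt_miss + 0.4 * ht_miss


pf = [(50.0, 0.0), (30.0, math.pi)]  # net pT imbalance of 20 GeV
selected = [(40.0, 0.0)]             # HTmiss = 40 GeV
print(round(linear_discriminant(pf, selected), 6))  # 28.0
```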

Event selection
The analysis targets tt̄H and tH production in events where the Higgs boson decays via H → WW, H → ττ, or H → ZZ. The subsequent decays are:
- WW → ℓ+νqq̄′ or ℓ+νℓ−ν̄;
- ττ → ℓ+νν̄τ ℓ−ν̄ντ, ℓ+νν̄τ τhντ, or τhν̄τ τhντ;
- ZZ → ℓ+ℓ−qq̄ or ℓ+ℓ−νν̄;
- and the corresponding charge-conjugate decays.

The decays H → ZZ → ℓ+ℓ−ℓ+ℓ− are covered by the analysis published in Ref. [20]. The top quark may decay either semi-leptonically via t → bW+ → bℓ+ν or hadronically via t → bW+ → bqq̄′, and analogously for the top antiquark. The experimental signature of tt̄H and tH signal events consists of:
- multiple electrons, muons, and τh;
- pTmiss from the neutrinos produced in W boson, Z boson, and τ lepton decays;
- one (tH) or two (tt̄H) b jets from top quark decays;
- further light-quark jets, produced in the decays of either the Higgs boson or the top quark(s).
The 1ℓ + 1τh and 0ℓ + 2τh channels specifically target events in which the Higgs boson decays via H → ττ and the top quarks decay hadronically. The other channels target a mixture of H → WW, H → ττ, and H → ZZ decays in events with either one or two semi-leptonically decaying top quarks.
Events are selected at the trigger level using a combination of single-, double-, and triple-lepton triggers, lepton+τh triggers, and double-τh triggers. Spurious triggers are discarded by demanding that the electrons, muons, and τh reconstructed at the trigger level match those reconstructed offline. The pT thresholds of the triggers typically vary by a few GeV between data-taking periods, depending on the instantaneous luminosity. For example, in the analyzed data set the threshold of the single-electron trigger ranges between 25 and 35 GeV, and that of the single-muon trigger between 22 and 27 GeV.

The double-lepton (triple-lepton) triggers reduce the pT threshold applied to the highest-pT lepton. The threshold becomes 23 (16) GeV if this lepton is an electron and 17 (8) GeV if it is a muon. The electron+τh trigger requires an electron with pT > 24 GeV in combination with a τh with pT > 20 or 30 GeV. The muon+τh trigger requires a muon with pT > 19 or 20 GeV in combination with a τh with pT > 20 or 27 GeV. For both triggers, the lower pT thresholds were used in 2016 and the higher ones in 2017 and 2018. The threshold of the double-τh trigger ranges between 35 and 40 GeV and is applied to both τh. To attain these pT thresholds, the geometric acceptance of the lepton+τh and double-τh triggers is restricted to |η| < 2.1 for electrons, muons, and τh. In the offline event selection, the pT thresholds applied to electrons, muons, and τh are chosen above the trigger thresholds.
The charges of the leptons and τh are required to match the signature expected for the tt̄H and tH signals.
- 0ℓ + 2τh and 1ℓ + 2τh: these channels target events in which the Higgs boson decays to a τ lepton pair and both τ leptons decay hadronically. Consequently, the two τh are required to have OS charges.
- 4ℓ + 0τh, 3ℓ + 1τh, and 2ℓ + 2τh: the leptons and τh are expected to originate from either the Higgs boson decay or the decay of the top quark-antiquark pair. The sum of their charges is therefore required to be zero.
- 3ℓ + 0τh, 2ℓss + 1τh, 2ℓos + 1τh, and 1ℓ + 2τh: the charge sum of leptons plus τh is required to be either +1 or −1.
No requirement on the charges of the lepton and τh is applied in the 1ℓ + 1τh channel. Studies performed with simulated samples of signal and background events indicate that this channel is more sensitive without a charge requirement. The 2ℓss + 0τh channel targets events in which one lepton originates from the decay of the Higgs boson and the other from a top quark decay. Requiring SS leptons reduces the signal yield by about half. However, it increases the signal-to-background ratio by a large factor, in particular by removing the large background from tt̄+jets production with dileptonic decays of the top quarks. Because events with SS lepton pairs have a more favorable signal-to-background ratio than those with OS pairs, events containing two leptons and one τh are analyzed separately in two channels, 2ℓss + 1τh and 2ℓos + 1τh.
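The charge requirements of the ten channels can be summarized in a small lookup function. This is a sketch that assumes hypothetical channel labels such as `"2lss_0tau"`. The same-sign and opposite-sign lepton conditions for the 2ℓss + 1τh and 2ℓos + 1τh channels are read off the channel names.

```python
# Hypothetical channel labels; the charge rules follow the text.
ZERO_SUM = {"4l_0tau", "3l_1tau", "2l_2tau"}
UNIT_SUM = {"3l_0tau", "2lss_1tau", "2los_1tau", "1l_2tau"}
OS_TAU_PAIR = {"0l_2tau", "1l_2tau"}


def passes_charge_requirements(channel, lepton_charges, tau_charges):
    """Return True if the lepton and tau_h charges match the expected signal signature."""
    if channel in OS_TAU_PAIR and tau_charges[0] * tau_charges[1] > 0:
        return False  # the two tau_h must have opposite charges
    if channel in ("2lss_0tau", "2lss_1tau") and lepton_charges[0] != lepton_charges[1]:
        return False  # same-sign lepton pair required
    if channel == "2los_1tau" and lepton_charges[0] == lepton_charges[1]:
        return False  # opposite-sign lepton pair required
    total = sum(lepton_charges) + sum(tau_charges)
    if channel in ZERO_SUM:
        return total == 0
    if channel in UNIT_SUM:
        return abs(total) == 1
    return True  # 1l_1tau: no charge requirement; other channels fully handled above
```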
The selection criteria on b jets are designed to maintain a high efficiency for the tt̄H signal. One b jet may fall outside the pT and η acceptance of the jet selection or fail the b tagging criteria, provided that the other b jet passes the tight b tagging criteria. The main backgrounds arise from the associated production of single top quarks or top quark pairs with W and Z bosons, photons, and jets. They feature genuine b jets with a multiplicity resembling that of the tt̄H and tH signals. Tightening the b jet requirements would therefore reduce the signal and these backgrounds alike, which motivates this choice.
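One possible reading of the b jet requirement is: at least two jets passing the loose working point, or at least one jet passing the tight working point. The sketch below encodes that reading using hypothetical per-jet (loose, tight) flags for central jets. The exact logic used in the analysis may differ.

```python
def passes_bjet_requirement(central_jets):
    """central_jets: hypothetical (passes_loose, passes_tight) flags for jets with |eta| < 2.4.

    Assumed reading of the text: two b-tagged jets are expected, but one may be lost
    (outside the acceptance or failing b tagging) if the other passes the tight working point.
    """
    n_loose = sum(1 for loose, _ in central_jets if loose)
    n_tight = sum(1 for _, tight in central_jets if tight)
    return n_loose >= 2 or n_tight >= 1
```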
The requirements on the overall multiplicity of jets, including b jets, exploit the fact that the jet multiplicity is typically higher in signal events than in background events. In tt̄H (tH) signal events with the H boson decaying into WW, ZZ, or ττ, the total number of jets expected decreases as the number of reconstructed leptons and τh increases. We denote by Nj, Nℓ, and Nτ the total numbers of jets, electrons or muons, and hadronic τ decays, respectively. The requirements on Nj applied in each channel permit up to two jets to be outside the pT and η acceptance of the jet selection. In the 2ℓss + 0τh channel, the requirement on Nj is relaxed further to increase the signal efficiency, in particular for the tH process.
Background contributions from tt̄Z, tZ, WZ, and DY production are suppressed by vetoing events that contain certain OS pairs of leptons of the same flavor, referred to as SFOS lepton pairs. The veto applies to SFOS pairs that pass the loose lepton selection criteria and have an invariant mass mℓℓ within 10 GeV of the Z boson mass, mZ = 91.19 GeV [8]. We refer to this selection criterion as the "Z boson veto". In the 2ℓss + 0τh and 2ℓss + 1τh channels, the Z boson veto is also applied to SS electron pairs, because the probability to mismeasure the charge of electrons is significantly higher than that for muons.
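The following sketch implements the Z boson veto, assuming massless leptons described by a hypothetical `Lepton` record. The 10 GeV window and mZ = 91.19 GeV are taken from the text. The optional SS electron check corresponds to the 2ℓss channels.

```python
import itertools
import math
from collections import namedtuple

# Hypothetical lepton record: pT (GeV), eta, phi, flavor ("e" or "mu"), charge (+1 or -1).
Lepton = namedtuple("Lepton", "pt eta phi flavor charge")
M_Z = 91.19  # GeV


def pair_mass(a, b):
    """Invariant mass of two leptons, neglecting the lepton masses."""
    m2 = 2.0 * a.pt * b.pt * (math.cosh(a.eta - b.eta) - math.cos(a.phi - b.phi))
    return math.sqrt(max(m2, 0.0))


def vetoed_by_z_veto(loose_leptons, include_ss_electrons=False):
    """True if an SFOS pair (or, optionally, an SS electron pair) has |m_ll - m_Z| < 10 GeV."""
    for a, b in itertools.combinations(loose_leptons, 2):
        same_flavor = a.flavor == b.flavor
        sfos = same_flavor and a.charge == -b.charge
        ss_ee = (include_ss_electrons and same_flavor and a.flavor == "e"
                 and a.charge == b.charge)
        if (sfos or ss_ee) and abs(pair_mass(a, b) - M_Z) < 10.0:
            return True
    return False
```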
Background contributions from DY production in the 2ℓss + 0τh, 3ℓ + 0τh, 2ℓss + 1τh, 4ℓ + 0τh, 3ℓ + 1τh, and 2ℓ + 2τh channels are further reduced by a requirement on the linear discriminant, LD > 30 GeV. This requirement is relaxed or tightened depending on whether the event meets certain conditions. The aim is either to increase the efficiency for selecting tt̄H and tH signal events or to reject more background:
- 2ℓss + 0τh and 2ℓss + 1τh: the LD requirement is applied only to events in which both reconstructed leptons are electrons. This suppresses DY events that enter the selection through a mismeasurement of the electron charge.
- 3ℓ + 0τh, 4ℓ + 0τh, 3ℓ + 1τh, and 2ℓ + 2τh: the Nj distribution of the DY background falls steeply, so the expected contribution of this background is small in events with many jets. We take advantage of this by applying the LD requirement only to events with three or fewer jets. If such an event contains an SFOS lepton pair, the requirement is tightened to LD > 45 GeV. Events with Nj ≤ 3 and no SFOS lepton pair must satisfy the nominal condition LD > 30 GeV.
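The channel-dependent LD requirement can be written as a decision function. The channel labels are hypothetical, and `has_sfos_pair` is assumed to be computed from the loose leptons.

```python
# Hypothetical channel labels; thresholds in GeV as quoted in the text.
SS_CHANNELS = {"2lss_0tau", "2lss_1tau"}
JET_DEPENDENT = {"3l_0tau", "4l_0tau", "3l_1tau", "2l_2tau"}


def passes_ld_requirement(channel, ld, n_jets, lepton_flavors, has_sfos_pair):
    """Return True if the event satisfies the channel-dependent LD requirement."""
    if channel in SS_CHANNELS:
        # Applied only to ee events, where DY enters via electron charge mismeasurement.
        if all(flavor == "e" for flavor in lepton_flavors):
            return ld > 30.0
        return True
    if channel in JET_DEPENDENT:
        if n_jets > 3:
            return True  # the DY contribution is small at high jet multiplicity
        return ld > (45.0 if has_sfos_pair else 30.0)
    return True  # no LD requirement in the remaining channels
```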
Events containing a pair of leptons that pass the loose selection criteria and have an invariant mass mℓℓ below 12 GeV are vetoed. This removes events in which the leptons originate from quarkonium decays, cascade decays of heavy-flavor hadrons, or low-mass DY production, because such events are not well modeled by the MC simulation.
In the 3ℓ + 0τh and 4ℓ + 0τh channels, events containing four leptons that pass the loose selection criteria and have a four-lepton invariant mass m4ℓ below 140 GeV are vetoed. This removes tt̄H and tH signal events in which the Higgs boson decays via H → ZZ → ℓ+ℓ−ℓ+ℓ−, thereby avoiding overlap with the analysis published in Ref. [20].
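The two mass vetoes can be sketched as follows, assuming massless leptons. The additional condition that the four leptons form two SFOS pairs is taken from the footnote (¶) to the selection tables. The `Lepton` record and the helper names are hypothetical.

```python
import itertools
import math
from collections import namedtuple

# Hypothetical lepton record: pT (GeV), eta, phi, flavor ("e" or "mu"), charge (+1 or -1).
Lepton = namedtuple("Lepton", "pt eta phi flavor charge")


def four_vector(lep):
    """(E, px, py, pz) of a lepton, neglecting its mass."""
    return (lep.pt * math.cosh(lep.eta), lep.pt * math.cos(lep.phi),
            lep.pt * math.sin(lep.phi), lep.pt * math.sinh(lep.eta))


def invariant_mass(leptons):
    """Invariant mass of the summed four-vectors."""
    e, px, py, pz = (sum(c) for c in zip(*(four_vector(lep) for lep in leptons)))
    return math.sqrt(max(e * e - px * px - py * py - pz * pz, 0.0))


def low_mass_veto(loose_leptons):
    """True if any pair of loose leptons has m_ll < 12 GeV (any flavor and charge)."""
    return any(invariant_mass(pair) < 12.0
               for pair in itertools.combinations(loose_leptons, 2))


def has_two_sfos_pairs(four):
    """True if four e/mu leptons split into two SFOS pairs (charges cancel per flavor)."""
    return all(sum(lep.charge for lep in four if lep.flavor == f) == 0 for f in ("e", "mu"))


def four_lepton_veto(loose_leptons):
    """True if four loose leptons forming two SFOS pairs have m_4l < 140 GeV."""
    return any(has_two_sfos_pairs(four) and invariant_mass(four) < 140.0
               for four in itertools.combinations(loose_leptons, 4))
```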
A summary of the event selection criteria applied in the different channels is given in Tables 2, 3, and 4.

Table 2: Event selections applied in the 2ℓss + 0τh, 2ℓss + 1τh, 3ℓ + 0τh, and 3ℓ + 1τh channels. The pT thresholds applied to the leptons of highest, second-highest, and third-highest pT are separated by slashes. The symbol "-" indicates that no requirement is applied.
Partial contents:
- Trigger: single- and double-lepton triggers.
- Charge requirements: 2 SS leptons and charge quality requirements.
- Four-lepton invariant mass: m4ℓ > 140 GeV¶.

Footnotes:
† A complete description of this requirement can be found in the main text.
‡ Applied to all SFOS lepton pairs and to pairs of electrons of SS charge.
§ Applied to all SFOS lepton pairs.
¶ If the event contains two SFOS pairs of leptons that pass the loose lepton selection criteria.

Table 3: Event selections applied in the 0ℓ + 2τh, 1ℓ + 1τh, 1ℓ + 2τh, and 2ℓ + 2τh channels. The pT thresholds applied to the lepton and to the τh of highest and second-highest pT are separated by slashes. The symbol "-" indicates that no requirement is applied.
Partial contents:
- Trigger: double-τh trigger (0ℓ + 2τh); single-lepton and lepton+τh triggers (1ℓ + 1τh and 1ℓ + 2τh); single- and double-lepton triggers (2ℓ + 2τh).
- Missing transverse momentum: LD > 0/30/45 GeV†.
- Dilepton invariant mass: mℓℓ > 12 GeV.

Footnotes:
† A complete description of this requirement can be found in the main text.

Event classification, signal extraction, and analysis strategy
Contributions from background processes that pass the event selection criteria detailed in Sect. 5 significantly exceed the expected tt̄H and tH signal rates. The ratio of expected signal to background yields is particularly unfavorable in channels with a low multiplicity of leptons and τh, even though these channels also provide the highest acceptance for the tt̄H and tH signals. To separate the tt̄H and tH signals from the backgrounds, we employ a maximum-likelihood (ML) fit to the distributions of a number of discriminating observables. These observables were chosen through studies with simulated samples of signal and background events that aim to maximize the expected sensitivity of the analysis.

The alternative would be to reduce the background by applying more stringent event selection criteria. Compared with that approach, the chosen strategy has the advantage of retaining for analysis the events reconstructed in kinematic regions of low signal-to-background ratio. These events enter the ML fit with a lower "weight" than events reconstructed in kinematic regions of high signal-to-background ratio. Nevertheless, they increase the overall sensitivity of the statistical analysis in two ways: they increase the overall tt̄H and tH signal yield, and they simultaneously constrain the background contributions. The likelihood function used in the ML fit is described in Sect. 9.

The diagram displayed in Fig. 3 describes the classification employed in each of the categories, which defines the regions that are fitted in the signal extraction fit. In addition to the ten channels, the ML fit receives input from two control regions (CRs), defined in Sect. 7.3. The chosen discriminating observables are the outputs of machine-learning algorithms. These algorithms are trained on simulated samples of tt̄H and tH signal events and of tt̄W, tt̄Z, tt̄+jets, and diboson background events. The 2ℓss + 0τh, 3ℓ + 0τh, and 2ℓss + 1τh channels employ ANNs, which discriminate between the two signals and the backgrounds simultaneously. The other channels use BDTs.

Table 4: Event selections applied in the 2ℓos + 1τh and 4ℓ + 0τh channels. The symbol "-" indicates that no requirement is applied.
Partial contents:
- Trigger: single- and double-lepton triggers (2ℓos + 1τh); single-, double-, and triple-lepton triggers (4ℓ + 0τh).
- Four-lepton invariant mass: m4ℓ > 140 GeV¶.

Footnotes:
† Only applied to events containing two electrons.
‡ A complete description of this requirement can be found in the main text.
§ Applied to all SFOS lepton pairs.
¶ If the event contains two SFOS pairs of leptons passing the loose lepton selection criteria.
The observables used as inputs to the ANNs and BDTs are outlined in Table 5. They are chosen to maximize the discrimination power of the classifiers, with the objective of maximizing the expected sensitivity of the analysis. The optimization is performed separately for each of the ten analysis channels. Typical observables are:
- the numbers of leptons, τh, and jets reconstructed in the event, where electrons and muons are counted separately, as are forward jets, central jets, and jets passing the loose and tight b tagging criteria;
- the three-momenta of leptons, τh, and jets;
- the magnitude of the missing transverse momentum, quantified by the linear discriminant LD;
- the angular separation between leptons, τh, and jets;
- the average ΔR separation between pairs of jets;
- the sum of charges for different combinations of leptons and τh;
- observables related to the reconstruction of specific top quark and Higgs boson decay modes;
- a few other observables that provide discrimination between the tt̄H and tH signals.

In regions with at least three leptons in the final state, the inputs also include a Boolean variable indicating whether the event contains an SFOS lepton pair passing looser isolation criteria.
The input variables related to the reconstruction of specific top quark and Higgs boson decay modes comprise:
- the transverse mass of a given lepton, mT = √(2 pTℓ pTmiss (1 − cos Δφ)), where pTℓ is the lepton transverse momentum and Δφ refers to the angle in the transverse plane between the lepton momentum and the p⃗Tmiss vector;
- the invariant masses of different combinations of leptons and τh;
- the invariant mass of the pair of jets with the highest and second-highest values of the b tagging discriminant.

These observables are complemented by the outputs of MVA-based algorithms, documented in Ref. [23], that reconstruct hadronic top quark decays and identify the jets originating from H → WW → ℓνqq̄′ decays.
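For reference, the transverse mass used as an input variable can be computed as shown below. Momenta are in GeV and Δφ is in radians. The function name is illustrative only.

```python
import math


def transverse_mass(pt_lepton, pt_miss, delta_phi):
    """mT = sqrt(2 * pT(lepton) * pTmiss * (1 - cos(delta_phi)))."""
    return math.sqrt(2.0 * pt_lepton * pt_miss * (1.0 - math.cos(delta_phi)))


print(transverse_mass(40.0, 40.0, math.pi))  # 80.0 (back-to-back)
print(transverse_mass(40.0, 40.0, 0.0))      # 0.0 (collinear)
```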
In the 0ℓ + 2τh channel, two additional inputs are used. The first is the invariant mass of the τ lepton pair, reconstructed with the SVFit algorithm documented in Ref. [89]; in signal events it is expected to be close to the Higgs boson mass. The second is the cosine of the decay angle, cos θ*, of the two τ leptons in the Higgs boson rest frame.
In the 2ℓss + 0τh, 3ℓ + 0τh, and 2ℓss + 1τh channels, the ANN receives three additional inputs to improve the separation of the tH from the tt̄H signal. These are the pT and η of the highest-pT forward jet, and the distance Δη between this jet and the jet nearest to it in pseudorapidity. The presence of such a jet is a characteristic signature of tH production in the t-channel. In tH signal events, the forward jet is expected to be separated from the other jets by a pseudorapidity gap. This is because, at tree level, there is no color flow between this jet and the jets originating from the top quark and Higgs boson decays.
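The forward-jet inputs can be sketched as follows. Jets are hypothetical (pT, η) tuples. "The jet nearest in pseudorapidity" is interpreted as the minimum |Δη| to any other selected jet, as in the corresponding Table 5 entry.

```python
def forward_jet_features(central_jets, forward_jets):
    """Return (pT, eta, min |delta eta| to any other jet) of the leading forward jet.

    Jets are hypothetical (pt, eta) tuples. Returns None if there is no forward jet;
    the third entry is None if the forward jet is the only jet in the event.
    """
    if not forward_jets:
        return None
    lead = max(range(len(forward_jets)), key=lambda i: forward_jets[i][0])
    lead_pt, lead_eta = forward_jets[lead]
    others = central_jets + [jet for i, jet in enumerate(forward_jets) if i != lead]
    # A large gap to the nearest jet is characteristic of t-channel tH production.
    min_deta = min((abs(lead_eta - eta) for _, eta in others), default=None)
    return lead_pt, lead_eta, min_deta
```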
The number of simulated signal and background events that pass the event selection criteria described in Sect. 5, and are thus available for training the BDTs and ANNs, typically amounts to a few thousand. To increase the number of events in the training samples, we relax the identification criteria for electrons, muons, and τh. This matters most for the channels with a high multiplicity of leptons and τh, where the number of available events is most limited. The resulting increase in the ratio of misidentified to genuine leptons and τh is corrected for. With these corrections applied, we have checked that the distributions of the observables used in the BDT and ANN training agree within statistical uncertainties between events selected with the relaxed and with the nominal lepton and τh selection criteria.
The ANNs used in the 2ℓss + 0τh, 3ℓ + 0τh, and 2ℓss + 1τh channels are of the multiclass type. Such ANNs have multiple output nodes. Besides discriminating the tt̄H and tH signals from the backgrounds, these nodes separate the tH from the tt̄H signal and distinguish between individual types of background.
- 2ℓss + 0τh: four output nodes, to distinguish between tt̄H signal, tH signal, tt̄W background, and other backgrounds.
- 3ℓ + 0τh and 2ℓss + 1τh: no attempt is made to distinguish between individual types of background, so these channels use three output nodes (tt̄H signal, tH signal, and background).

The ANNs in the 2ℓss + 0τh, 3ℓ + 0τh, and 2ℓss + 1τh channels have 16, 5, and 3 hidden layers, respectively, each containing 8 to 32 neurons. The rectified linear activation function [91] is used for the hidden layers, and the softmax [90] function for all output nodes. The softmax allows the activation value of each output node to be interpreted as the probability for a given event to belong to the corresponding class. Each event is assigned to the category that corresponds to the output node with the highest probability value; we refer to these as ANN output node categories. There are four such categories in the 2ℓss + 0τh channel and three in the 3ℓ + 0τh and 2ℓss + 1τh channels. The probability distributions of the output nodes, four in the 2ℓss + 0τh channel and three in the 3ℓ + 0τh and 2ℓss + 1τh channels, are used as inputs to the ML fit. Each event contributes only to the distribution of the output node with the highest activation value, so no event enters more than one of these distributions.
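The multiclass output handling described above can be illustrated without a neural network library: softmax probabilities are computed, and each event is assigned to the node with the highest probability. The input `logits` stand in for the pre-activation values of the output layer. The class labels are hypothetical.

```python
import math

# Output classes of the four-node ANN in the 2lss + 0tau_h channel (hypothetical labels).
CLASSES_2LSS = ("ttH", "tH", "ttW", "other")


def softmax(logits):
    """Numerically stable softmax: subtract the maximum before exponentiating."""
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def assign_category(logits, classes=CLASSES_2LSS):
    """Return (category, probability) for the output node with the highest softmax value.

    The event enters only the distribution of that node, with this probability as the
    fitted observable.
    """
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return classes[best], probs[best]
```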
The training is performed using the TensorFlow [92] package with the Keras [93] interface. The objective of the training is to minimize the cross-entropy loss function [94]. Batch gradient descent is used to update the weights of the ANN during the training. Overtraining is minimized by using Tikhonov regularization [95] and dropout [96].

Of the three channels using multiclass ANNs, the 2ℓss + 0τh and 3ℓ + 0τh channels have the largest event yields. Their sensitivity is further improved by analyzing selected events in subcategories based on the flavor (electron or muon) of the leptons and on the number of jets passing the tight b tagging criteria. Lepton flavor matters because the rate for misidentifying nonprompt leptons as prompt ones is significantly higher for electrons than for muons. In the 2ℓss + 0τh channel, the same holds for the probability of mismeasuring the lepton charge. Distinguishing events by b jet multiplicity improves, in particular, the separation of the tt̄H signal from the tt̄+jets background. Consider a nonprompt lepton, produced in the decay of a b hadron, that is misidentified as a prompt lepton. The remaining particles from the hadronization of the bottom quark are then less likely to pass the b jet identification criteria. This reduces the number of b jets in such tt̄+jets background events.

Table 5: Input variables to the multivariate discriminants in each of the ten analysis channels. The symbol "-" indicates that the variable is not used. For all objects, the three-momentum is constituted by the pT, η, and φ components of the object momentum.
Partial contents:
- three-momenta of leptons and/or τh;
- minimum |Δη| between leading forward jet and jets.
Figure 4 shows the b jet multiplicity for events selected in the 2ℓss + 0τh channel. It compares tt̄+jets background events in which a nonprompt lepton is misidentified as a prompt lepton ("nonprompt") with tt̄+jets background events in which this is not the case ("prompt"). The figure also compares the pT and η distributions of bottom quarks produced in top quark decays in tt̄H signal events with those in tt̄+jets background events. The tt̄H signal contains more high-pT bottom quarks, whereas the η distribution is similar for the tt̄H signal and the tt̄+jets background.

The number of subcategories is optimized individually for each ANN output node category of the 2ℓss + 0τh channel (four categories) and the 3ℓ + 0τh channel (three categories).
- 2ℓss + 0τh: each of the four ANN output node categories is subdivided into three subcategories based on the flavor of the two leptons (ee, eμ, μμ).
- 3ℓ + 0τh, signal nodes: the tt̄H and tH output node categories are each subdivided into two subcategories based on the multiplicity of jets passing the tight b tagging criteria (bl: <2 tight b-tagged jets; bt: ≥2 tight b-tagged jets).
- 3ℓ + 0τh, background node: this category is subdivided into seven subcategories based on the flavor of the three leptons and on the multiplicity of tight b-tagged jets (eee; eeμ bl, eeμ bt; eμμ bl, eμμ bt; μμμ bl, μμμ bt).

The eee subcategory is not further subdivided by the number of b-tagged jets, because events containing three electrons are fewer than those in the other categories. These event categories are constructed from the output of the BDTs and ANNs. The goal is to enhance the analysis sensitivity while keeping the rate of background events high enough for a precise estimation.
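The subcategory assignment can be expressed as a labeling function. The label strings are hypothetical. The splitting rules follow the text, with `n_tight_b` the number of jets passing the tight b tagging criteria.

```python
# Hypothetical label scheme for the subcategories described in the text.
FLAVOR_2L = {2: "ee", 1: "emu", 0: "mumu"}                  # keyed by number of electrons
FLAVOR_3L = {3: "eee", 2: "eemu", 1: "emumu", 0: "mumumu"}


def subcategory_2lss(node, flavors):
    """2lss + 0tau_h: every output node category is split by lepton flavor."""
    return f"{node}_{FLAVOR_2L[flavors.count('e')]}"


def subcategory_3l(node, flavors, n_tight_b):
    """3l + 0tau_h: signal nodes split by b multiplicity; background node by flavor and b."""
    btag = "bt" if n_tight_b >= 2 else "bl"
    if node in ("ttH", "tH"):
        return f"{node}_{btag}"
    flavor = FLAVOR_3L[flavors.count("e")]
    # The eee subcategory is not split further, because it contains few events.
    return f"bkg_{flavor}" if flavor == "eee" else f"bkg_{flavor}_{btag}"
```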
The BDTs used in the 1ℓ + 1τh, 0ℓ + 2τh, 2ℓos + 1τh, 1ℓ + 2τh, 4ℓ + 0τh, 3ℓ + 1τh, and 2ℓ + 2τh channels address a binary classification problem: separating the sum of the tt̄H and tH signals from the aggregate of all backgrounds. The training is performed using the scikit-learn [34] package with the XGBoost [33] algorithm. The training parameters are chosen to maximize the area under the receiver operating characteristic (ROC) curve of the BDT output.
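The figure of merit used to choose the training parameters, the area under the ROC curve, can be computed directly from its rank interpretation. The sketch below is a dependency-free illustration substituted for the analysis tooling. In practice, `sklearn.metrics.roc_auc_score` computes the same quantity.

```python
def roc_auc(signal_scores, background_scores):
    """Area under the ROC curve via its rank (Mann-Whitney U) interpretation.

    Equals the probability that a randomly chosen signal event scores higher than a
    randomly chosen background event; ties count one half. O(n*m), for illustration only.
    """
    wins = 0.0
    for s in signal_scores:
        for b in background_scores:
            if s > b:
                wins += 1.0
            elif s == b:
                wins += 0.5
    return wins / (len(signal_scores) * len(background_scores))
```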

Background estimation
The dominant background in most channels comes from the production of top quarks in association with W and Z bosons. We refer collectively to the sum of the tt̄W and tt̄WW backgrounds as tt̄W(W). In tt̄W(W) and tt̄Z background events selected in the signal regions (SRs), reconstructed leptons are typically genuine prompt leptons, and reconstructed b jets arise from the hadronization of bottom quarks. Reconstructed τh, by contrast, are a mixture of genuine hadronic τ decays and misidentified quark or gluon jets.

The Z boson veto is applied in the 2ℓss + 0τh, 3ℓ + 0τh, 2ℓss + 1τh, 2ℓos + 1τh, 4ℓ + 0τh, and 3ℓ + 1τh channels. Background events from tt̄Z production may pass it in two cases: the Z boson decays to leptons and one of them fails the selection, or the Z boson decays to τ leptons that subsequently decay to electrons or muons. In the latter case, the invariant mass mℓℓ of the lepton pair is shifted to lower values because of the neutrinos produced in the τ decays. Additional background contributions arise from off-shell tt̄γ* and tγ* production, which we include in the tt̄Z background.

The tt̄+jets production cross section is about three orders of magnitude larger than the cross sections for the associated production of top quarks with W and Z bosons. In most channels, however, the tt̄+jets background is strongly reduced by the lepton and τh identification criteria. Except in the 1ℓ + 1τh and 0ℓ + 2τh channels, the tt̄+jets background contributes only if one of the following occurs:
- a nonprompt lepton (or a jet) is misidentified as a prompt lepton;
- a quark or gluon jet is misidentified as τh;
- the charge of a genuine prompt lepton is mismeasured.

Photon conversions are a relevant background in the event categories with one or more reconstructed electrons in the 2ℓss + 0τh and 3ℓ + 0τh channels. The production of WZ and ZZ pairs in events with two or more jets constitutes another relevant background in most channels.
In the 1ℓ + 1τh and 0ℓ + 2τh channels, an additional background arises from DY production of τ lepton pairs.
We categorize the background contributions into reducible and irreducible ones. A background is considered irreducible if all reconstructed electrons and muons are genuine prompt leptons and all reconstructed τh are genuine hadronic τ decays. In the 2ℓss + 0τh and 2ℓss + 1τh channels, we further require that the measured charge of the reconstructed electrons and muons matches their true charge. The irreducible backgrounds are modeled using simulated events that fulfill these criteria, which avoids double-counting with all other background contributions. These other contributions are considered reducible and are mostly determined from data.
Throughout the analysis, we distinguish three sources of reducible background:
- misidentified leptons and τh ("misidentified leptons");
- asymmetric conversions of a photon into electrons ("conversions");
- mismeasurement of the lepton charge ("flips").
The background from misidentified leptons and τh refers to events that satisfy at least one of two conditions. Either at least one reconstructed electron or muon is caused by the misidentification of a nonprompt lepton or hadron, or at least one reconstructed τh arises from the misidentification of a quark or gluon jet. The main contribution to this background stems from tt̄+jets production, reflecting its large cross section.
The conversions background consists of events in which one or more reconstructed electrons are due to the conversion of a photon. It is typically caused by tt̄γ events in which the photon converts into an electron-positron pair. One particle of the pair carries most of the energy of the converted photon, whereas the other has low energy and fails to be reconstructed. We refer to such photon conversions as asymmetric conversions.
The flips background is specific to the 2ℓss + 0τh and 2ℓss + 1τh channels and consists of events in which the charge of a reconstructed lepton is mismeasured. The main contribution to the flips background stems from tt̄+jets events in which both top quarks decay semi-leptonically. In the 2ℓss + 1τh channel, a quark or gluon jet is in addition misidentified as τh.

The mismeasurement of the electron charge typically results from the emission of a hard bremsstrahlung photon, followed by an asymmetric conversion of this photon. The reconstructed electron is typically the electron or positron that carries most of the energy of the converted photon. It is therefore equally likely to have the same charge as, or the opposite charge to, the electron or positron that emitted the bremsstrahlung photon [77]. The probability of mismeasuring the charge of muons is negligible in this analysis.
The three types of reducible background are made mutually exclusive by a precedence rule for events that qualify for more than one type. The misidentified leptons type takes precedence over the flips and conversions types, and the flips type takes precedence over the conversions type. The misidentified leptons and flips backgrounds are determined from data, whereas the conversions background is modeled using the MC simulation. The procedures for estimating the misidentified leptons and flips backgrounds are described in Sects. 7.1 and 7.2, respectively. Dedicated studies in data, similar to those performed in Ref. [97], confirm that the MC simulation adequately models photon conversions.

Double-counting could arise between the background estimates obtained from data and the background contributions modeled using the MC simulation. To avoid it, we match reconstructed electrons, muons, and τh to their generator-level equivalents. Simulated signal and background events selected in the SR that qualify as misidentified leptons or flips backgrounds are then vetoed.
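The irreducible/reducible split and the precedence among the reducible types can be sketched as a classification of simulated events by their generator-level matches. The `RecoObject` record, its `origin` strings, and the channel labels are hypothetical.

```python
from dataclasses import dataclass

SS_CHANNELS = {"2lss_0tau", "2lss_1tau"}  # hypothetical channel labels


@dataclass
class RecoObject:
    """Hypothetical generator-level match of a reconstructed lepton or tau_h."""
    origin: str                   # leptons: "prompt", "nonprompt", "conversion"; tau_h: "tau", "jet"
    charge_flipped: bool = False  # reconstructed charge differs from the true charge


def classify_background(channel, leptons, taus):
    """Assign a simulated event to exactly one background type, using the stated precedence."""
    if any(lep.origin == "nonprompt" for lep in leptons) or any(t.origin == "jet" for t in taus):
        return "misidentified leptons"  # highest precedence
    if channel in SS_CHANNELS and any(lep.charge_flipped for lep in leptons):
        return "flips"  # only defined for the two SS channels
    if any(lep.origin == "conversion" for lep in leptons):
        return "conversions"
    return "irreducible"
```

Events classified as "misidentified leptons" or "flips" would be vetoed from the simulated samples, because those backgrounds are estimated from data.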
Among the irreducible backgrounds, some contributions do not arise from tt̄W(W), tt̄Z, tt̄+jets, DY, or diboson production, nor from SM Higgs boson production via the ggH, qqH, WH, ZH, tt̄WH, and tt̄ZH processes. We refer to their aggregate as "rare" backgrounds. The rare backgrounds typically make a minor contribution to each of the ten analysis channels. They include processes such as tW and tZ production, the production of same-sign W boson pairs, triboson production, and tt̄tt̄ production.
We validate the modeling of the tt̄W(W), tt̄Z, WZ, and ZZ backgrounds in dedicated control regions (CRs), whose definitions are detailed in Sect. 7.3.

Estimation of the "misidentified leptons" background
The background from misidentified leptons and τh is estimated using the misidentification probability (MP) method [23]. The method selects a sample of events that satisfy all selection criteria of the SR, detailed in Sect. 5, with one exception. The electrons, muons, and τh used to construct the SRs are required to pass relaxed rather than nominal selections. We refer to this sample of events as the application region (AR) of the MP method. Events in which all leptons and τh satisfy the nominal selections are vetoed, to avoid overlap with the SR.
An estimate of the background from misidentified leptons and τh in the SR is obtained by applying suitably chosen weights to the events selected in the AR. The weights w are given by

    w = (−1)^(n+1) · Π_i [f_i / (1 − f_i)],  i = 1, …, n,

where the product extends over all electrons, muons, and τh that pass the relaxed, but fail the nominal, selection criteria, and n is the total number of such leptons and τh. The symbol f_i denotes the probability for an electron, muon, or τh passing the relaxed selection to also satisfy the nominal one. The contributions of irreducible backgrounds to the AR are subtracted based on their MC expectation. The tt̄H and tH signal yields in the AR are found to be negligible.

The probabilities f_i for leptons are measured in multijet events, separately for electrons and muons, in bins of pT and η of the lepton candidate. The measurement uses events containing exactly one electron or muon that passes the relaxed selection, together with at least one jet separated from the lepton by ΔR > 0.7. Selected events are then subdivided into "pass" and "fail" samples, depending on whether the lepton candidate passes the nominal selection. The fail sample is dominated by multijet events. The contributions of other processes, predominantly W+jets, DY, diboson, and tt̄+jets production, are subtracted based on MC estimates. The number of multijet events in the pass sample is obtained from an ML fit to the distribution of the observable

    mTfix = √(2 pTfix pTmiss (1 − cos Δφ)),

where pTfix is a constant set to 35 GeV and Δφ refers to the angle in the transverse plane between the lepton momentum and the p⃗Tmiss vector. The constant pTfix is used instead of the lepton pT to reduce the correlation between mTfix and the lepton pT. The ML fit is similar to the one used in the measurement of the tt̄H and tH signal rates, described in Sect. 9.
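The event weight of the MP method can be sketched as follows. The list of f_i is assumed to contain one entry per lepton or τh that passes the relaxed but fails the nominal selection. The function name is illustrative.

```python
def mp_weight(fake_probabilities):
    """Weight of an AR event: w = (-1)**(n+1) * prod_i f_i / (1 - f_i).

    fake_probabilities holds f_i for each lepton or tau_h that passes the relaxed but
    fails the nominal selection; n = len(fake_probabilities) must be at least 1.
    """
    n = len(fake_probabilities)
    if n == 0:
        raise ValueError("events with all objects passing the nominal selection belong to the SR")
    weight = (-1.0) ** (n + 1)
    for f in fake_probabilities:
        weight *= f / (1.0 - f)
    return weight
```

With this sign convention, AR events with two failing objects receive negative weights. In the usual fake-factor reasoning, this corrects for events with two misidentified objects that would otherwise be counted twice through the single-failure terms.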
The distribution of W+jets, DY, diboson, tt+jets, and rare backgrounds in the observable m fix T is modeled using the MC simulation, whereas the distribution of multijet events in the pass sample is obtained from data in the fail region, from which the W+jets, DY, diboson, and tt+jets contributions are subtracted based on their MC estimate. The observable m fix T exploits the fact that the p miss T reconstructed in multijet events is mainly caused by resolution effects and is typically small, resulting in a falling distribution of m fix T , whereas W+jets and tt+jets events exhibit a broad maximum around m W ≈ 80 GeV. Compared to the usual transverse mass, the observable m fix T has the advantage of not depending on the p T of the lepton, and is therefore better suited for the purpose of measuring the probabilities f i in bins of lepton p T . For illustration, the distributions of m fix T in the pass and fail samples are shown in Fig. 5 for events containing an electron of 25 < p T < 35 GeV in the ECAL barrel. The contributions from W+jets, DY, and diboson production are assumed to scale by a common factor with respect to their MC expectation in the fit; we refer to their sum as "electroweak" (EWK) background. Finally, denoting the number of multijet events in the pass and fail samples by the symbols N pass and N fail , the probabilities f i are given by f i = N pass /(N pass + N fail ).
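The two quantities used in the measurement of f i can be sketched in a few lines of Python. The names below are hypothetical: `mt_fix` evaluates the m fix T expression with p fix T = 35 GeV, and `misid_probability` forms f i = N pass /(N pass + N fail ) from multijet yields that, in the analysis, come from the ML fit and from the subtraction of other processes. The counts in the example are placeholders, not measured values.

```python
import math

PT_FIX = 35.0  # GeV, constant that replaces the lepton pT in m_T^fix

def mt_fix(ptmiss, dphi, pt_fix=PT_FIX):
    """m_T^fix = sqrt(2 * pT^fix * pT^miss * (1 - cos(dphi))), in GeV."""
    return math.sqrt(2.0 * pt_fix * ptmiss * (1.0 - math.cos(dphi)))

def misid_probability(n_pass, n_fail):
    """f_i = N_pass / (N_pass + N_fail) for one (pT, eta) bin of the MR."""
    total = n_pass + n_fail
    if total <= 0:
        raise ValueError("no multijet events in this bin")
    return n_pass / total

# Lepton back-to-back with pTmiss = 20 GeV: sqrt(2 * 35 * 20 * 2) = sqrt(2800)
print(round(mt_fix(20.0, math.pi), 2))  # 52.92
# Placeholder multijet yields in one bin
print(misid_probability(130.0, 870.0))
```

Because p fix T is a constant, m fix T depends on the lepton only through Δφ, which is why the same observable can be used in every lepton-p T bin.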
The f i for τ h are determined as a function of p T and η of the τ h candidate in a region enriched in tt+jets events containing a reconstructed opposite-sign electron-muon pair and at least two loose b-tagged jets in addition to the τ h candidate. Contributions of genuine τ h are modeled using the MC simulation and subtracted.
The event samples used to measure the f i are referred to as measurement regions (MRs) of the MP method. Potential biases in the estimate of the background from misidentified leptons and τ h, arising from differences between AR and MR in the p T spectrum of the lepton and τ h candidates and in the mixture of nonprompt leptons and hadrons that are misidentified as prompt leptons, are mitigated as detailed in Ref. [80]. A closure test performed using simulated tt+jets and multijet events reveals a residual difference between the probabilities f i for electrons in tt+jets and those in multijet events. The test is illustrated in Fig. 6, which compares the p T distributions of nonprompt electrons in simulated tt+jets events for three cases: nonprompt electrons passing the nominal selection criteria ("nominal"); nonprompt electrons passing the relaxed, but failing the nominal selection criteria, weighted by probabilities f i determined in simulated tt+jets events ("relaxed, f i from tt+jets"); and nonprompt electrons passing the relaxed, but failing the nominal selection criteria, weighted by probabilities f i determined in simulated multijet events ("relaxed, f i from multijet"). The electron and muon p T distributions obtained in the first and second cases agree, validating the MP method. The ratio of the distributions obtained in the second and third cases is fitted by a linear function of the lepton p T and is applied as a multiplicative correction to the f i measured in data; this correction accounts for the different flavor composition of jets between AR and MR. For the lepton and τ h selections used in this analysis, the probabilities f i range from 0.04 to 0.13, 0.02 to 0.20, and 0.10 to 0.50 for electrons, muons, and τ h, respectively.
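The nonclosure correction can be sketched as a least-squares fit of a straight line to the per-bin ratio of the "relaxed, f i from tt+jets" and "relaxed, f i from multijet" distributions, followed by a p T-dependent rescaling of f i. This is a minimal pure-Python sketch; the function names, the use of bin centers, and the optional per-bin uncertainties are assumptions, and the yields in the example are synthetic.

```python
def fit_line(xs, ys, sigmas=None):
    """Weighted least-squares fit of y = a + b*x; returns (a, b)."""
    if sigmas is None:
        sigmas = [1.0] * len(xs)
    w = [1.0 / s ** 2 for s in sigmas]
    s0 = sum(w)
    sx = sum(wi * x for wi, x in zip(w, xs))
    sy = sum(wi * y for wi, y in zip(w, ys))
    sxx = sum(wi * x * x for wi, x in zip(w, xs))
    sxy = sum(wi * x * y for wi, x, y in zip(w, xs, ys))
    det = s0 * sxx - sx * sx
    return (sxx * sy - sx * sxy) / det, (s0 * sxy - sx * sy) / det

def nonclosure_coeffs(pt_centers, yields_f_ttjets, yields_f_multijet):
    """Fit the ratio (f_i from tt+jets) / (f_i from multijet) versus lepton pT."""
    ratios = [t / m for t, m in zip(yields_f_ttjets, yields_f_multijet)]
    return fit_line(pt_centers, ratios)

def corrected_f(f_data, pt, coeffs):
    """Apply the linear correction a + b*pT multiplicatively to f_i from data."""
    a, b = coeffs
    return f_data * (a + b * pt)

# Synthetic closure histograms: weighted yields per pT bin (bin centers in GeV)
coeffs = nonclosure_coeffs([20, 40, 60, 80], [110, 120, 130, 140], [100, 100, 100, 100])
print(coeffs)                           # close to (1.0, 0.005)
print(corrected_f(0.08, 50.0, coeffs))  # close to 0.08 * 1.25 = 0.10
```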
The probabilities f i for electrons and muons obtained as described above are validated in a CR dominated by semileptonic tt+jets events. The events are selected by requiring the presence of two SS leptons and exactly three jets, exactly one of which passes the tight b tagging criteria. The three jets are interpreted as originating from the hadronic decay of one of the top quarks, while the other top quark decays semileptonically. One of the two reconstructed leptons is assumed to be a nonprompt lepton from the decay of a b hadron originating from the semileptonically decaying top quark. A kinematic fit using the constraints from kinematic relations between the top quark decay products is employed to increase the purity of correctly reconstructed semileptonic tt+jets events in this CR. The level of compatibility of selected events with the aforementioned experimental signature is quantified using a χ2 criterion; events with a high value of χ2, corresponding to a poor-quality fit, are discarded. Good agreement is observed between semileptonic tt+jets events where both leptons pass the nominal selection and semileptonic tt+jets events where both leptons pass the relaxed selection, but one or both leptons fail the nominal selection, provided that the weights given by Eq. (1) are applied to the latter events using the probabilities f i measured in multijet events and corrected (for electrons) as described in the previous paragraph.

Fig. 5 Distributions of m fix T for events containing an electron candidate of 25 < p T < 35 GeV in the ECAL barrel, which (left) passes the nominal selection and (right) passes the relaxed, but fails the nominal selection. The "electroweak" (EWK) background refers to the sum of W+jets, DY, and diboson production. The "rare" backgrounds are defined in the text. The data in the fail sample agree with the sum of multijet, EWK, tt+jets, and rare backgrounds by construction, as the number of multijet events in the fail sample is computed by subtracting the sum of EWK, tt+jets, and rare background contributions from the data. The misidentification probabilities are derived separately for each era; this figure shows, as an example, the results obtained with the 2017 data set. The uncertainty band represents the total uncertainty after the fit has been performed.

Fig. 6 Transverse momentum distributions of nonprompt (left) electrons and (right) muons in simulated tt+jets events, for the three cases "nominal", "relaxed, f i from tt+jets", and "relaxed, f i from multijet" discussed in the text. The figure illustrates that a nonclosure correction needs to be applied to the probabilities f i measured for electrons in data, while no such correction is needed for muons.
The MP method is applied in all channels except for 2ℓSS + 1τ h and 3ℓ + 1τ h, where a modified version of the method is used in which only the selections for the leptons are relaxed in the AR, while the τ h is required to satisfy the nominal selection. Correspondingly, only the leptons are considered when computing the weights w, given by Eq. (1), that are applied to events in the AR of the 2ℓSS + 1τ h and 3ℓ + 1τ h channels. Background contributions in which the reconstructed leptons are genuine prompt leptons and the reconstructed τ h results from the misidentification of a quark or gluon jet are modeled using the MC simulation. Weights are applied to these simulated events to correct for differences in the τ h misidentification rates between data and simulation. Using the modified version of the MP method in the 2ℓSS + 1τ h and 3ℓ + 1τ h channels allows ttH and tH signal events in which the reconstructed τ h is not a genuine hadronic τ decay, but instead arises from a misidentified quark or gluon jet, to be retained as signal. These events amount to approximately 30% of the total ttH and tH signal yield in the 2ℓSS + 1τ h and 3ℓ + 1τ h channels.
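In the 2ℓSS + 1τ h and 3ℓ + 1τ h channels, the modification only changes which objects enter Eq. (1). A minimal Python sketch, assuming a hypothetical per-object record with a `kind` field ("e", "mu", or "tau_h"), a `passes_nominal` flag, and the probability `f`; returning None to mark events outside the AR is also a convention of the sketch.

```python
from math import prod

def modified_mp_weight(objects):
    """Weight for the 2lSS+1tau_h and 3l+1tau_h application region.

    Returns None when the event does not belong to the AR of the modified method.
    """
    # In this variant the tau_h must pass the nominal selection.
    if any(o["kind"] == "tau_h" and not o["passes_nominal"] for o in objects):
        return None
    # Only electrons and muons failing the nominal selection enter Eq. (1).
    failing = [o["f"] for o in objects
               if o["kind"] in ("e", "mu") and not o["passes_nominal"]]
    if not failing:
        return None  # all leptons pass: the event is in the SR, not the AR
    sign = 1.0 if len(failing) % 2 == 1 else -1.0
    return sign * prod(f / (1.0 - f) for f in failing)

event = [
    {"kind": "e", "passes_nominal": True, "f": 0.05},
    {"kind": "mu", "passes_nominal": False, "f": 0.10},
    {"kind": "tau_h", "passes_nominal": True, "f": 0.30},
]
print(modified_mp_weight(event))  # 0.1 / 0.9
```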

Estimation of the "flips" background
The flips background, relevant for events containing one or two reconstructed electrons in the 2ℓSS + 0τ h and 2ℓSS + 1τ h channels, is estimated using a procedure similar to the MP method. A sample of events passing all selection criteria of the SR, except that the two leptons are required to have OS instead of SS charges, is selected, and these events are assigned appropriately chosen weights. In the 2ℓSS + 0τ h channel, the weight is given by the sum of the probabilities for the charge of either lepton to be mismeasured, whereas in the 2ℓSS + 1τ h channel only the lepton that has the same charge as the τ h is considered, since only those events in which the charge of this lepton is mismeasured satisfy the condition Σ q = ±1, with the sum running over the two leptons and the τ h, that is applied in the SR of this channel.
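The two weighting rules can be written as a short Python sketch. The function names and the (charge, probability) tuples are hypothetical; the sketch assumes per-lepton charge misidentification probabilities are available, and it assigns muons a probability of zero, since the flips background is described here only for events with electrons.

```python
def flips_weight_2lss_0tau(p_flip_1, p_flip_2):
    """Either lepton of the OS pair may have its charge mismeasured."""
    return p_flip_1 + p_flip_2

def flips_weight_2lss_1tau(leptons, tau_charge):
    """leptons: [(charge, p_flip), (charge, p_flip)] for the OS lepton pair.

    Only flipping the lepton with the same charge as the tau_h yields two SS
    leptons whose charge sum together with the tau_h is +1 or -1.
    """
    same_sign_as_tau = [p for q, p in leptons if q == tau_charge]
    if len(same_sign_as_tau) != 1:
        raise ValueError("expected exactly one lepton with the tau_h charge")
    return same_sign_as_tau[0]

# e+ e- pair with illustrative flip probabilities
print(flips_weight_2lss_0tau(2e-4, 1e-3))
# e- mu+ tau_h-: only the electron shares the tau_h charge (muon probability set to 0)
print(flips_weight_2lss_1tau([(-1, 1e-3), (+1, 0.0)], -1))
```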
The probability for the charge of electrons to be mismeasured, referred to as the electron charge misidentification rate, is determined using Z/γ* → ee events. The events are selected by requiring the presence of an electron pair of invariant mass m ee within the range 60 < m ee < 120 GeV. No requirement is imposed on the charge of the electron pair. Contributions to the selected event sample arising from processes other than DY production of electron pairs are determined by performing an ML fit to the m ee distribution. Denoting the numbers of Z/γ* → ee events containing reconstructed SS and OS electron pairs by the symbols N SS and N OS, respectively, the electron charge misidentification rate is given by the ratio N SS /(N OS + N SS). The ratio is measured as a function of electron p T and η and varies between 5.1 × 10^−5 for electrons of low p T in the ECAL barrel and 1.6 × 10^−3 for electrons of high p T in the ECAL endcap. For illustration, the m ee distributions for SS and OS electron pairs are shown in Fig. 7 for events in which both electrons are reconstructed in the ECAL barrel and have p T within the range 25 < p T < 50 GeV.
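Given the fitted numbers of Z/γ* → ee events with SS and OS electron pairs in one (p T, η) bin, the rate follows directly from the ratio quoted above. A minimal Python sketch; the binomial uncertainty is an illustrative assumption not taken from the analysis, and the counts are placeholders chosen to reproduce the lowest rate quoted in the text.

```python
import math

def charge_misid_rate(n_ss, n_os):
    """Return N_SS / (N_OS + N_SS) and a simple binomial uncertainty."""
    total = n_ss + n_os
    if total <= 0:
        raise ValueError("no Z/gamma* -> ee events in this bin")
    rate = n_ss / total
    # Binomial approximation (illustrative assumption)
    err = math.sqrt(rate * (1.0 - rate) / total)
    return rate, err

rate, err = charge_misid_rate(51, 999_949)
print(f"{rate:.2e} +- {err:.1e}")
```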

Control regions for irreducible backgrounds
The accuracy of the simulation-based modeling of the main irreducible backgrounds, arising from ttW(W), ttZ, WZ, and ZZ production, is validated in three CRs. The first CR is based on the SR for the 3ℓ + 0τ h channel and targets the ttZ and WZ backgrounds. We refer to this CR as the 3ℓ-CR. The selection criteria applied in the 3ℓ-CR differ from those applied in the SR of the 3ℓ + 0τ h channel in that: no Z boson veto is applied; instead, the presence of at least one SFOS lepton pair of invariant mass m ℓℓ with |m ℓℓ − m Z | < 10 GeV is demanded; the requirement on the multiplicity of jets is relaxed to demanding the presence of at least one jet; and no requirement on the presence of b-tagged jets is applied. The contributions arising from ttZ and from WZ production are separated by binning the events selected in the 3ℓ-CR in the flavor of the three leptons (eee, eeμ, eμμ, μμμ) and in the multiplicity of jets and of b-tagged jets. The second CR targets the ZZ background. We refer to it as the 4ℓ-CR, since it is based on the SR for the 4ℓ + 0τ h channel. Compared to the latter, the event selection criteria applied in the 4ℓ-CR are modified by applying no Z boson veto, instead requiring the presence of at least one SFOS lepton pair with |m ℓℓ − m Z | < 10 GeV, and by applying no requirements on the multiplicity of jets and of b-tagged jets. To separate the ZZ background from other backgrounds, predominantly arising from ttZ production, the events selected in the 4ℓ-CR are binned in the multiplicity of SFOS lepton pairs with |m ℓℓ − m Z | < 10 GeV and in the number of jets passing the tight b tagging criteria. The third CR targets the ttW(W) background and is identical to the SR of the 2ℓSS + 0τ h channel, except that the ANN output node with the highest activation value is required to be the node corresponding to the ttW background.
The numbers of events observed in the 3ℓ- and 4ℓ-CRs and in the CR for the ttW(W) background are given in Table 6. The contributions arising from the misidentified leptons and flips backgrounds are estimated using the methods described in Sects. 7.1 and 7.2, respectively. The systematic uncertainties that are relevant for the CRs are similar to the ones applied to the SR; the latter are detailed in Sect. 8. Figure 12, discussed in Sect. 9, shows the distributions of events selected in the 3ℓ- and 4ℓ-CRs in the binning scheme employed to separate the WZ and ZZ backgrounds from the ttZ background. The events selected in the 3ℓ-CR are first subdivided by lepton flavor and then by the multiplicity of jets and b-tagged jets. For each lepton flavor, 12 bins are used, defined as follows (in order of increasing bin number): 0 jets passing the tight b tagging criteria with 1, 2, 3, or ≥4 jets in total; 1 jet passing the tight b tagging criteria with 2, 3, 4, or ≥5 jets in total; and ≥2 jets passing the tight b tagging criteria with 2, 3, 4, or ≥5 jets in total. In the 4ℓ-CR, 4 bins are used in total, defined as (again in order of increasing bin number): 2 SFOS lepton pairs with |m ℓℓ − m Z | < 10 GeV; and 1 such SFOS lepton pair with 0, 1, or ≥2 jets passing the tight b tagging criteria.
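The bin numbering described above can be expressed as a small lookup. This Python sketch rests on several assumptions: bins are numbered from 0, the flavor order follows the list given for the 3ℓ-CR, two or more SFOS pairs are treated like two, and combinations that the text does not assign to a bin (for example a single jet that is also b tagged) return None.

```python
FLAVORS = ("eee", "eemu", "emumu", "mumumu")  # order as listed in the text

def bin_3l_cr(flavor, n_jets, n_btight):
    """0-based bin index in the 3l-CR: 12 bins per lepton-flavor category."""
    if n_btight == 0:
        if n_jets < 1:
            return None
        local = min(n_jets, 4) - 1  # 1, 2, 3, >=4 jets -> 0..3
    else:
        if n_jets < 2:
            return None  # not covered by the binning described in the text
        block = 1 if n_btight == 1 else 2  # 1 or >=2 tight b-tagged jets
        local = 4 * block + min(n_jets, 5) - 2  # 2, 3, 4, >=5 jets
    return 12 * FLAVORS.index(flavor) + local

def bin_4l_cr(n_sfos_z, n_btight):
    """0-based bin index in the 4l-CR (4 bins in total)."""
    if n_sfos_z >= 2:
        return 0  # two SFOS pairs compatible with m_Z
    if n_sfos_z == 1:
        return 1 + min(n_btight, 2)  # 0, 1, >=2 tight b-tagged jets
    return None

print(bin_3l_cr("eemu", 3, 1))  # 12 + 4 + 1 = 17
print(bin_4l_cr(1, 3))          # 3
```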
The data in the 3ℓ- and 4ℓ-CRs and in the CR for the ttW(W) background agree with the background estimates within the quoted uncertainties.

Systematic uncertainties
The event rates and the distributions of the discriminating observables used for signal extraction may be altered by several experiment- or theory-related effects, referred to as systematic uncertainties. Experimental sources comprise the uncertainties in auxiliary measurements, performed to validate and, if necessary, correct the modeling of the data by the MC simulation, and the uncertainties in the data-driven estimates of the misidentified leptons and flips backgrounds. The latter are largely unaffected by potential inaccuracies of the MC simulation. Theoretical uncertainties mainly arise from missing higher-order corrections to the perturbative expansions employed for the computation of cross sections and from uncertainties in the PDFs.
Table 6 Number of events selected in the 3ℓ- and 4ℓ-CRs and in the CR for the ttW(W) background, compared to the event yields expected from different types of background and from the ttH and tH signals, after the fit to data is performed as described in Sect. 9. Uncertainties shown include all systematic components. The symbol "-" indicates that the corresponding background does not apply.

The efficiencies of triggers based on the presence of one, two, or three electrons or muons are measured as a function of the lepton multiplicity with an uncertainty ranging from 1 to 2%, using samples of tt+jets and diboson events that have been recorded using triggers based on p miss T. The efficiencies for electrons and muons to pass the offline reconstruction and identification criteria are measured as a function of the lepton p T and η by applying the "tag-and-probe" method detailed in Ref. [71] to Z/γ* → ee and Z/γ* → μμ events. Additionally, we cross-check these efficiencies in a CR enriched in tt+jets events to account for differences in event topology between DY events and the events in the SR of this analysis, which may cause a change in the efficiencies for electrons and muons to pass isolation requirements. Events in the tt+jets CR are selected by requiring the presence of an OS e+μ pair and at least two jets. Nonprompt-lepton backgrounds in this CR are subtracted using a sideband region of SS e+μ events. The difference between the efficiency measured in the tt+jets CR and the one measured in DY events is included as a systematic uncertainty, amounting to 1-2%. The τ h identification efficiency and energy scale are measured with uncertainties of 5 and 1.2%, respectively, using Z/γ* → ττ events [74]. The energy scale of jets is measured with an uncertainty amounting to a few percent, depending on the jet p T and η, using the p T-balance method, which is applied to Z/γ* → ee, Z/γ* → μμ, γ+jets, dijet, and multijet events [72].
The resulting effect on signal and background expectations is evaluated by varying the energies of jets in simulated events within their uncertainties, recalculating all kinematic observables, and reapplying the event selection criteria. The effect of uncertainties in the jet energy resolution is evaluated in a similar way, but is smaller than the effect of the uncertainties in the jet energy scale.
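The variation procedure can be illustrated with a toy Python sketch in which every jet energy is scaled by a common factor 1 ± δ, the jets are recounted, and a jet-multiplicity requirement is reapplied. The single relative uncertainty δ, the 25 GeV jet threshold, and the event format are simplifying assumptions; in the analysis the uncertainty depends on jet p T and η, and all kinematic observables are recomputed.

```python
def count_jets(jet_pts, threshold):
    """Number of jets above the pT threshold (GeV)."""
    return sum(1 for pt in jet_pts if pt > threshold)

def jes_yield_variation(events, rel_unc, min_jets, threshold=25.0):
    """Return (down, nominal, up) yields for events with >= min_jets jets.

    events: list of (weight, [jet pT in GeV]); all jet pT values are scaled
    by (1 - rel_unc), 1, or (1 + rel_unc) before the selection is reapplied.
    """
    def yield_for(scale):
        return sum(w for w, jets in events
                   if count_jets([pt * scale for pt in jets], threshold) >= min_jets)
    return yield_for(1.0 - rel_unc), yield_for(1.0), yield_for(1.0 + rel_unc)

toy_events = [
    (1.0, [30.0, 26.0, 40.0]),
    (1.0, [24.0, 50.0, 60.0]),
    (0.5, [100.0, 24.5, 80.0]),
]
print(jes_yield_variation(toy_events, rel_unc=0.03, min_jets=3))
```

In this toy, the upward variation promotes the 24.5 GeV jet of the third event above threshold, so only the "up" yield changes.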
The b tagging efficiency is measured with an uncertainty of a few per cent in tt+jets and multijet events as a function of jet p T and η. The heavy-flavor content of the multijet events is enriched by requiring the presence of a muon in the event. The mistag rates for light-quark and gluon jets are measured in multijet events yielding an uncertainty of 5-10% for the loose and 20-30% for the tight b tagging criteria, depending on p T and η [73].
The integrated luminosities of the 2016, 2017, and 2018 data-taking periods are individually known with uncertainties in the 2.3-2.5% range [39][40][41], while the total Run 2 (2016-2018) integrated luminosity has an uncertainty of 1.8%, the improvement in precision reflecting the (uncorrelated) time evolution of some systematic effects.
The uncertainties related to the number of PU interactions are evaluated by varying the number of inelastic pp interactions that are superimposed on simulated events by 4.6% [98]. The resulting effect on the ttH and tH signal yields and on the yields of background contributions modeled using the MC simulation amounts to less than 1%.
The effect of theory-related uncertainties on the event yields and on the distributions of the BDT and ANN classifier outputs that are used for the signal extraction is assessed for the ttH and tH signals, as well as for the main irreducible backgrounds that arise from ttW, ttWW, and ttZ production. The uncertainties in the production cross sections amount to +6.8/−9.9% and +5.1/−7.3% for the ttH and tH signals, and to +13.5/−12.2%, +8.6/−11.3%, and +11.7/−10.2% for the ttW, ttWW, and ttZ backgrounds, respectively. These uncertainties are taken from Ref. [62] and consist of the sum in quadrature of three sources: missing higher-order corrections in the perturbative expansion, different choices of PDFs, and uncertainties in the value of the strong coupling constant α S. The uncertainties in the cross sections are relevant for quoting the measured production rates relative to their SM expectations. In addition, the uncertainty in the ttH and tH production cross sections is relevant for setting limits on the coupling of the Higgs boson to the top quark. The effect of missing higher-order corrections on the distributions of the discriminating observables is estimated by varying the renormalization and factorization scales up and down by a factor of two with respect to their nominal values, following the recommendations of Refs. [99-101], and excluding the combinations in which the two scales are varied in opposite directions. The effect of uncertainties in the PDFs on these distributions is evaluated following the recommendations given in Ref. [102]. The uncertainties in the branching fractions of the Higgs boson decay modes H → WW, H → ττ, and H → ZZ are taken from Ref. [62] and amount to 1.5, 1.7, and 1.5%, respectively.
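The scale-variation prescription can be sketched as an envelope over the allowed (μR, μF) combinations. In this Python sketch, the factors are relative to the nominal scales, the two combinations with opposite-direction variations are excluded as stated above, and the per-bin minimum and maximum are taken as the uncertainty band; using the envelope, rather than treating each variation separately, is an assumption of the sketch. The yields in the example are synthetic.

```python
import math

# (muR, muF) factors relative to nominal; the (0.5, 2) and (2, 0.5) combinations,
# in which the two scales move in opposite directions, are excluded.
SCALE_POINTS = [(r, f) for r in (0.5, 1.0, 2.0) for f in (0.5, 1.0, 2.0)
                if {r, f} != {0.5, 2.0}]

def scale_envelope(yields_by_scale):
    """yields_by_scale: dict (muR, muF) -> list of per-bin yields.

    Returns (down, nominal, up) per-bin lists, where down/up are the minimum/
    maximum over the allowed scale points.
    """
    nominal = yields_by_scale[(1.0, 1.0)]
    down, up = [], []
    for b in range(len(nominal)):
        values = [yields_by_scale[p][b] for p in SCALE_POINTS]
        down.append(min(values))
        up.append(max(values))
    return down, nominal, up

# Synthetic single-bin yields; the excluded points carry extreme values on purpose
toy = {(r, f): [10.0 + math.log2(r) + 0.5 * math.log2(f)]
       for r in (0.5, 1.0, 2.0) for f in (0.5, 1.0, 2.0)}
toy[(0.5, 2.0)] = [100.0]
toy[(2.0, 0.5)] = [0.0]
print(len(SCALE_POINTS), scale_envelope(toy))  # 7 ([8.5], [10.0], [11.5])
```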
In the 1ℓ + 1τ h and 0ℓ + 2τ h channels, tt+jets and DY production may contribute as irreducible backgrounds and are modeled using the MC simulation. The tt+jets and DY production cross sections are known with uncertainties of 5% [65] and 4% [103], respectively. An additional uncertainty in the modeling of the top quark p T distribution in tt+jets events is considered, defined as the difference between the nominal powheg sample and the same sample reweighted to improve the modeling of the top quark p T, as described in Sect. 3. The modeling of the multiplicity of jets and of b-tagged jets in simulated DY events is improved by comparing these multiplicities between MC simulation and data using Z/γ* → ee and Z/γ* → μμ events. The average ratio of data and MC simulation in the Z/γ* → ee and Z/γ* → μμ event samples is taken as a correction, while the difference between the ratios measured in Z/γ* → ee and Z/γ* → μμ events is taken as the systematic uncertainty and added in quadrature to the statistical uncertainties in these ratios. The Z/γ* → ee and Z/γ* → μμ event samples used to determine this correction have little overlap with the SRs of the 1ℓ + 1τ h and 0ℓ + 2τ h channels, since most of the DY background in these channels arises from Z/γ* → ττ events.
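The DY jet-multiplicity correction can be sketched for a single (N j, N b) bin as follows. The function name and inputs are hypothetical, and the statistical uncertainty of the average is computed from the two ratio uncertainties treated as independent, which is an assumption about how the statistical components are combined. The ratios in the example are illustrative.

```python
import math

def dy_jet_correction(r_ee, stat_ee, r_mumu, stat_mumu):
    """Data/MC correction and total uncertainty for one (N_j, N_b) bin.

    correction:  average of the ee and mumu data/MC ratios
    systematic:  |r_ee - r_mumu|
    statistical: uncertainty of the average, assuming independent ratios
    """
    correction = 0.5 * (r_ee + r_mumu)
    syst = abs(r_ee - r_mumu)
    stat = 0.5 * math.sqrt(stat_ee ** 2 + stat_mumu ** 2)
    return correction, math.sqrt(syst ** 2 + stat ** 2)

corr, unc = dy_jet_correction(1.10, 0.03, 1.06, 0.04)
print(f"{corr:.3f} +- {unc:.3f}")  # 1.080 +- 0.047
```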
Other background processes, notably the conversions and rare backgrounds, are modeled using the MC simulation; the uncertainty in their event yields is conservatively taken to be 50%. This choice accounts for the extrapolation from the inclusive phase space to the phase space relevant for this analysis, in particular to events with a high multiplicity of jets and b-tagged jets, as required to pass the event selection criteria detailed in Sect. 5. The inclusive cross sections for most of these background processes have been measured with uncertainties amounting to significantly less than 50% by previous analyses of the LHC data.
The extrapolation of the WZ and ZZ background rates from the 3ℓ- and 4ℓ-CRs to the SR depends on the heavy-flavor content of WZ and ZZ background events. According to the MC simulation, most of the b jets reconstructed in WZ and ZZ background events arise from the misidentification of light-quark or gluon jets rather than from charm or bottom quarks. We assign an uncertainty of 40% to the modeling of the heavy-flavor content in WZ and ZZ background events, accounting for the differences in the jet multiplicity distribution between data and simulation in the 3ℓ-CR. The misidentification of light-quark or gluon jets as b jets is covered by a separate systematic uncertainty.
The uncertainties in the rate and in the distribution of the discriminating observables for the background from misidentified leptons and τ h stem from statistical uncertainties in the events selected in the MR and AR, as well as from systematic uncertainties related to the subtraction of the prompt-lepton contributions from the data selected in the MR and AR of the MP method. The effect of these uncertainties on the analysis is evaluated by applying independent variations of the probabilities f i for electrons and muons in different bins of lepton-candidate p T and η and determining the resulting change in the yield and distribution of the misidentified leptons background estimate. We introduce an additional uncertainty in the nonclosure correction to the f i for electrons and muons, accounting for the differences between the probabilities f i in tt+jets and multijet events shown in Fig. 6; the size of this uncertainty is equal to the magnitude of the correction. In the case of τ h, the misidentification rates f i measured in each bin of η and reconstructed τ h decay mode are fitted by a linear function of the τ h candidate p T, and the uncertainties in the slope and offset of this fit are propagated to the final result. The uncertainty in the rate of the misidentified leptons background is, in general, larger for channels with τ h; it varies between 10% in the 2ℓSS + 0τ h channel and 60% in the 2ℓ + 2τ h channel. The resulting uncertainty in the distribution of the discriminating observables is of moderate size. Additional nonclosure uncertainties account for small differences between two estimates obtained in simulation: the misidentified leptons background estimate obtained by computing the probabilities f i in simulated events and applying the weights w given by Eq. (1) to simulated events selected in the AR, and the estimate obtained by modeling the background from misidentified leptons and τ h in the SR directly with the MC simulation.
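For τ h, the linear fit and the propagation of the uncertainties in its offset and slope can be sketched as a weighted least-squares fit with the standard covariance matrix of the two parameters. This is a minimal Python sketch; the function names are hypothetical, the per-bin uncertainties are assumed Gaussian, propagating the full covariance at a given p T is one possible choice, and the input points are synthetic values lying exactly on a line.

```python
import math

def fit_line_with_cov(xs, ys, sigmas):
    """Weighted least-squares fit y = a + b*x.

    Returns (a, b, var_a, var_b, cov_ab) from the standard closed-form solution.
    """
    w = [1.0 / s ** 2 for s in sigmas]
    s0 = sum(w)
    sx = sum(wi * x for wi, x in zip(w, xs))
    sy = sum(wi * y for wi, y in zip(w, ys))
    sxx = sum(wi * x * x for wi, x in zip(w, xs))
    sxy = sum(wi * x * y for wi, x, y in zip(w, xs, ys))
    det = s0 * sxx - sx * sx
    a = (sxx * sy - sx * sxy) / det
    b = (s0 * sxy - sx * sy) / det
    return a, b, sxx / det, s0 / det, -sx / det

def f_tau(pt, fit):
    """Fitted f_i at a given tau_h pT and its uncertainty from the (a, b) covariance."""
    a, b, var_a, var_b, cov_ab = fit
    value = a + b * pt
    variance = var_a + pt * pt * var_b + 2.0 * pt * cov_ab
    return value, math.sqrt(variance)

# Synthetic f_i values in one (eta, decay mode) bin, uncertainty 0.01 each
fit = fit_line_with_cov([25, 35, 50, 70], [0.25, 0.23, 0.20, 0.16], [0.01] * 4)
print(f_tau(40.0, fit))
```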
The uncertainty in the flips background in the 2ℓSS + 0τ h and 2ℓSS + 1τ h channels is evaluated in a similar way; it amounts to 30% in each channel.
The effects of systematic uncertainties representing the same source are treated as fully correlated among all ten analysis channels. Theoretical uncertainties are furthermore treated as fully correlated among all data-taking periods, whereas the uncertainties arising from experimental sources are treated as uncorrelated between the data recorded in each of the years 2016, 2017, and 2018. The latter treatment is justified because the uncertainties related to the auxiliary measurements that are performed to validate, and if necessary correct, the modeling of the data by the MC simulation are mainly of statistical origin; since these measurements are performed independently for each of the three data-taking periods, to account for changes in the detector conditions from one period to another, their uncertainties are likewise independent.
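One common way to encode this correlation scheme, assumed here for illustration, is through the naming of nuisance parameters: all channels that use the same parameter name are fully correlated, and experimental parameters receive a per-year suffix. The source names in this Python sketch are hypothetical.

```python
YEARS = (2016, 2017, 2018)

def nuisance_name(source, kind, year):
    """kind: 'theory' or 'experimental' (hypothetical labels)."""
    if kind == "theory":
        return source  # one parameter shared by all periods and channels
    if kind == "experimental":
        return f"{source}_{year}"  # independent parameter per data-taking period
    raise ValueError(f"unknown kind: {kind}")

def nuisances_for(sources):
    """Collect the distinct nuisance parameters for (source, kind) pairs."""
    names = set()
    for source, kind in sources:
        for year in YEARS:
            names.add(nuisance_name(source, kind, year))
    return sorted(names)

print(nuisances_for([("QCDscale_ttH", "theory"), ("CMS_eff_tau", "experimental")]))
# ['CMS_eff_tau_2016', 'CMS_eff_tau_2017', 'CMS_eff_tau_2018', 'QCDscale_ttH']
```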
The impact of the systematic and statistical uncertainties on the measurement of the ttH and tH signal rates is summarized in Table 7. The largest impacts are due to: the statistical uncertainty of the observed data; the uncertainty in the efficiency to reconstruct and identify τ h; the uncertainties related to the estimation of the misidentified leptons and flips backgrounds; and the theoretical uncertainties, which affect the yield and the distribution of the discriminating observables.

Table 7 Summary of the sources of systematic and statistical uncertainties and their impact on the measurement of the ttH and tH signal rates, and the measured values of the unconstrained nuisance parameters. The quantity Δμ x /μ x corresponds to the change in uncertainty when fixing the nuisance parameters associated with that uncertainty in the fit. The label "MC and sideband statistical uncertainty" refers to the uncertainties associated with the limited number of simulated MC events and the number of data events in the application region of the MP method.

Additional checks
As a cross-check, and to highlight the enhancement in sensitivity provided by machine-learning techniques, a complementary measurement of the ttH signal rate is performed using a set of alternative observables in the ML fit. We refer to this cross-check as the control analysis (CA), as distinguished from the analysis previously discussed, which we refer to as the main analysis. The CA is restricted to the 2ℓSS + 0τ h, 3ℓ + 0τ h, 2ℓSS + 1τ h, and 4ℓ + 0τ h channels. The production rate of the tH signal is fixed to its SM expectation in the CA. In the 2ℓSS + 0τ h channel, the invariant mass of the lepton pair is used as the discriminating observable. The event selection criteria applied in the CA in this channel are modified to require N j ≥ 4, and the events are analyzed in subcategories based on lepton flavor, the charge sum of the leptons (+2 or −2), and the multiplicity of jets. In the 3ℓ + 0τ h channel, the invariant mass of the three-lepton system is used as the discriminating observable, and the events are analyzed in subcategories based on the multiplicity of jets and on the charge sum of the leptons (+1 or −1). A discriminant based on the matrix-element method [35,36] is used as the discriminating observable in the 2ℓSS + 1τ h channel, and the events are analyzed in two subcategories based on the multiplicity of jets, defined by the conditions N j = 3 and N j ≥ 4 and referred to as the "missing-jet" and "no-missing-jet" subcategories, respectively. The computation of the discriminant exploits the fact that the differential cross sections for the ttH signal, as well as for the dominant background processes in the 2ℓSS + 1τ h channel, are well known; this permits the computation of the probabilities for a given event to be either signal or background, given the measured values of kinematic observables in the event and taking into account the experimental resolution of the detector.
The probabilities are computed for the ttH signal hypothesis and for three types of background hypotheses: ttZ events in which the Z boson decays into a pair of τ leptons; ttZ events in which the Z boson decays into a pair of electrons or muons and one lepton is misidentified as τ h; and tt → bℓν bτν events with one additional nonprompt lepton originating from a b hadron decay. Details on the computation of these probabilities are given in Ref. [23]. The ratio of the probability for a given event to be ttH signal to the sum of the probabilities for the event to be one of the three backgrounds constitutes, according to the Neyman-Pearson lemma [104], an optimal observable for separating the ttH signal from the backgrounds, and it is taken as the discriminant used for the signal extraction. In the 4ℓ + 0τ h channel, the invariant mass of the four-lepton system, m 4ℓ, is used as the discriminating observable.
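Once the signal and background probabilities of an event have been computed (the matrix-element integration itself is beyond the scope of this illustration), the discriminant is a simple ratio. In this Python sketch the probabilities are placeholder numbers; the optional bounded form D/(1 + D) is a monotonic transformation added for convenience, which leaves the ordering of events, and hence the separation power, unchanged.

```python
def me_discriminant(p_signal, p_backgrounds, bounded=False):
    """Ratio of the ttH probability to the summed probabilities of the
    background hypotheses; bounded=True maps it monotonically onto [0, 1)."""
    p_bkg = sum(p_backgrounds)
    if p_bkg <= 0.0:
        raise ValueError("background probabilities must sum to a positive value")
    ratio = p_signal / p_bkg
    return ratio / (1.0 + ratio) if bounded else ratio

# Placeholder probabilities: ttH versus ttZ(Z->tautau), ttZ(Z->ll, l->tau_h), tt+nonprompt
print(me_discriminant(2e-3, [1e-3, 2e-3, 1e-3]))
print(me_discriminant(2e-3, [1e-3, 2e-3, 1e-3], bounded=True))
```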

Statistical analysis and results
The production rates of the ttH and tH signals are determined through a binned simultaneous ML fit to a total of 105 distributions: the outputs of the BDTs in each of the seven channels 1ℓ + 1τ h, 0ℓ + 2τ h, 2ℓOS + 1τ h, 1ℓ + 2τ h, 4ℓ + 0τ h, 3ℓ + 1τ h, and 2ℓ + 2τ h; the distributions of the 10 output nodes of the ANNs in the 2ℓSS + 0τ h, 3ℓ + 0τ h, and 2ℓSS + 1τ h channels in the categories described in Fig. 3; and the distributions of the observables that discriminate the ttZ background from the WZ and ZZ backgrounds in the 3ℓ- and 4ℓ-CRs, respectively. All of these distributions are included separately for each of the three data-taking periods considered in the analysis. The 2ℓSS + 0τ h (3ℓ + 0τ h) channel contributes a total of 12 (11) distributions per data-taking period to the ML fit, reflecting the subdivision of these channels into event categories based on lepton flavor and on the multiplicity of b-tagged jets.

Fig. 9 Distributions of the activation value of the ANN output node with the highest activation value for events selected in the 3ℓ + 0τ h channel and classified as ttH signal (upper left), tH signal (upper right), and background (lower left), and for events selected in the 2ℓSS + 1τ h channel (lower right). In the case of the 2ℓSS + 1τ h channel, the activation values of the ANN output nodes for ttH signal, tH signal, and background are shown together in a single histogram, concatenating histogram bins as appropriate and enumerating the bins by a monotonically increasing number. The distributions expected for the ttH and tH signals and for background processes are shown for the values of the parameters of interest and of the nuisance parameters obtained from the ML fit. The best fit values of the ttH and tH production rates amount to μ̂ ttH = 0.92 and μ̂ tH = 5.7 times the rates expected in the SM.
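The total number of distributions can be cross-checked with simple arithmetic. In this Python sketch, the contribution of 3 distributions per period from the 2ℓSS + 1τ h channel is inferred from its three ANN output nodes (ttH, tH, and background) and from the total of 105; it is not stated explicitly in the text.

```python
PERIODS = 3  # 2016, 2017, 2018

per_period = {
    "BDT outputs, one per channel (7 channels)": 7,
    "2lSS+0tau_h ANN categories": 12,
    "3l+0tau_h ANN categories": 11,
    "2lSS+1tau_h ANN output nodes (inferred)": 3,
    "3l-CR and 4l-CR observables": 2,
}

total = PERIODS * sum(per_period.values())
print(sum(per_period.values()), total)  # 35 105
```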
The production rates of the ttH and tH signals constitute the parameters of interest (POI) in the fit. We denote by the symbols μ ttH and μ tH the ratio of these production rates to their SM expectation and use the notation μ to refer to the set of both POIs.
The likelihood function, denoted by the symbol L, is given by

$$\mathcal{L}(\mathbf{n}\,|\,\boldsymbol{\mu},\boldsymbol{\theta}) = \prod_{i} \mathcal{P}(n_i\,|\,\boldsymbol{\mu},\boldsymbol{\theta}) \, \prod_{k} p(\tilde{\theta}_k\,|\,\theta_k),$$

where the index i runs over the individual bins of the 105 distributions of the discriminating observables that are included in the fit, and the factor P(n_i|μ, θ) represents the probability to observe n_i events in bin i when ν_i(μ, θ) events are expected from the sum of signal and background contributions in that bin. The number of expected events is a linear function of the two POIs μttH and μtH,

$$\nu_i(\boldsymbol{\mu},\boldsymbol{\theta}) = \mu_{\mathrm{ttH}}\,\nu_i^{\mathrm{ttH}}(\boldsymbol{\theta}) + \mu_{\mathrm{tH}}\,\nu_i^{\mathrm{tH}}(\boldsymbol{\theta}) + \nu_i^{\mathrm{B}}(\boldsymbol{\theta}),$$

where ν_i^ttH, ν_i^tH, and ν_i^B denote, respectively, the SM expectations for the ttH and tH signal contributions and the sum of the contributions expected from background processes in bin i. The notation ν_i(μ, θ) indicates that the number of events expected from signal and background processes in each bin i depends on a set of parameters θ, referred to as nuisance parameters, which represent the systematic uncertainties detailed in Sect. 8. Through this dependence, the nuisance parameters accommodate variations of both the event yields and the distributions of the discriminating observables during the fit. The probability P(n_i|μ, θ) is given by the Poisson distribution:

$$\mathcal{P}(n_i\,|\,\boldsymbol{\mu},\boldsymbol{\theta}) = \frac{[\nu_i(\boldsymbol{\mu},\boldsymbol{\theta})]^{n_i}}{n_i!}\, e^{-\nu_i(\boldsymbol{\mu},\boldsymbol{\theta})}.$$

Individual elements of the set of nuisance parameters θ are denoted by θ_k, where each θ_k represents a specific source of systematic uncertainty. The function p(θ̃_k|θ_k) represents the probability to observe a value θ̃_k in an auxiliary measurement of the nuisance parameter, given that its true value is θ_k. Systematic uncertainties that affect only the normalization, but not the shape, of the distributions of the discriminating observables are represented by a Gamma probability density function if they are statistical in origin (e.g. if they correspond to the number of events observed in a CR) and by a log-normal probability density function otherwise. Systematic uncertainties that also affect the shape of the distributions are incorporated into the ML fit via the technique detailed in Ref. [105] and represented by a Gaussian probability density function. The rates of the ttW and ttZ backgrounds are left unconstrained in the fit, each with an independent scale factor. The rate of the small ttWW background is constrained to scale by the same factor, relative to its SM expectation, as the rate of the ttW background.
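The structure of this likelihood can be illustrated with a minimal numerical sketch. It is not the CMS implementation: it assumes a single log-normal normalization uncertainty on the total background, implemented (as is common practice, but assumed here) by scaling the background by κ^θ with a unit-Gaussian constraint on θ. All bin contents and the value of `kappa_bkg` are hypothetical.

```python
import math


def expected_yields(mu_tth, mu_th, nu_tth, nu_th, nu_bkg):
    """nu_i = mu_ttH * nu_i^ttH + mu_tH * nu_i^tH + nu_i^B, evaluated bin by bin."""
    return [mu_tth * s1 + mu_th * s2 + b for s1, s2, b in zip(nu_tth, nu_th, nu_bkg)]


def poisson_nll(n, nu):
    """-ln P(n | nu) for a Poisson distribution (lgamma also allows non-integer n)."""
    return nu - n * math.log(nu) + math.lgamma(n + 1.0)


def negative_log_likelihood(mu_tth, mu_th, theta, n_obs, nu_tth, nu_th, nu_bkg, kappa_bkg):
    """-ln L with one hypothetical log-normal background nuisance parameter theta.

    The background in every bin is scaled by kappa_bkg**theta (assumed log-normal
    implementation), and theta is constrained by a unit Gaussian centred on 0.
    """
    scaled_bkg = [b * kappa_bkg ** theta for b in nu_bkg]
    nu = expected_yields(mu_tth, mu_th, nu_tth, nu_th, scaled_bkg)
    nll = sum(poisson_nll(n, v) for n, v in zip(n_obs, nu))
    return nll + 0.5 * theta ** 2  # -ln of the Gaussian constraint, up to a constant


# Hypothetical three-bin example with "Asimov" data equal to the SM expectation
nu_tth = [2.0, 4.0, 1.0]
nu_th = [0.3, 0.2, 0.1]
nu_bkg = [20.0, 10.0, 2.0]
asimov = expected_yields(1.0, 1.0, nu_tth, nu_th, nu_bkg)
print(negative_log_likelihood(1.0, 1.0, 0.0, asimov, nu_tth, nu_th, nu_bkg, 1.1))
```

With Asimov data, −ln L is smallest at μttH = μtH = 1 and θ = 0; an actual fit would minimize it numerically over all POIs and nuisance parameters simultaneously.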
Statistical fluctuations in the background predictions arise from the limited number of events in the MC simulation as well as in the ARs that are used to estimate the backgrounds from misidentified leptons and from flips from data. These fluctuations are incorporated into the likelihood function via the approach described in Ref. [106].
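For a single bin, the treatment of template statistical fluctuations can be illustrated with the widely used "Barlow–Beeston lite" approach, in which one scale factor γ per bin multiplies the total expected yield and is constrained by a Gaussian whose width equals the relative statistical uncertainty of that yield. We assume here that this reflects the spirit of Ref. [106]; the function name and inputs are hypothetical. With a Gaussian constraint, minimizing the per-bin negative log-likelihood over γ has a closed-form solution:

```python
import math


def profiled_bin_scale(n_obs, nu, rel_unc):
    """Return the gamma that minimizes, for a single bin,

        gamma*nu - n_obs*ln(gamma*nu) + (gamma - 1)**2 / (2*rel_unc**2),

    i.e. a Poisson term for the observed count times a Gaussian constraint on gamma.
    Setting the derivative to zero gives
        gamma**2 + (nu*rel_unc**2 - 1)*gamma - n_obs*rel_unc**2 = 0,
    whose larger (non-negative) root is returned.
    """
    s2 = rel_unc ** 2
    b = nu * s2 - 1.0
    return 0.5 * (-b + math.sqrt(b * b + 4.0 * n_obs * s2))


# Hypothetical inputs: observed count equal to the expectation, so no pull
print(profiled_bin_scale(10, 10.0, 0.2))
# Hypothetical excess: gamma is pulled upward, but by less than n_obs/nu = 1.5
print(profiled_bin_scale(15, 10.0, 0.2))
```

The closed form avoids adding one free parameter per bin to the numerical minimization, which matters when, as here, the fit contains many bins.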
Further details on the treatment of systematic uncertainties and on the choice of the functions p(θ̃_k|θ_k) are given in Refs. [105,107,108].
A complication in the signal extraction arises because a deviation of the top quark Yukawa coupling yt from its SM expectation mt/v would change the distributions of kinematic observables for the tH signal and alter the ratio between the tH and ttH signal rates. We address this complication by first determining the production rates for the tH and ttH signals, assuming that the distributions of kinematic observables for the tH signal conform to the distributions expected in the SM; we then determine the Yukawa coupling yt of the Higgs boson to the top quark, accounting for the modifications of the interference effects in the tH signal. These studies assume a Higgs boson mass of 125 GeV.
Assuming that the distributions of the discriminating observables for the tH and ttH signals agree with their SM expectations, the production rate for the ttH signal is measured to be μttH = 0.92 ± 0.19 (stat) +0.17/−0.13 (syst) times the SM expectation, equivalent to a ttH production cross section of 466 ± 96 (stat) +70/−56 (syst) fb, and that of the tH signal is measured to be μtH = 5.7 ± 2.7 (stat) ± 3.0 (syst) times the SM expectation, equivalent to a tH production cross section of 510 ± 200 (stat) ± 220 (syst) fb. The corresponding observed (expected) significance of the ttH signal amounts to 4.7 (5.2) standard deviations, assuming the tH process to have the SM production rate, and that of the tH signal amounts to 1.4 (0.3) standard deviations, assuming the ttH process to have the SM production rate. We have estimated the agreement between the data and our statistical model using a goodness-of-fit test with respect to the saturated model, obtaining a p-value of 0.097, which shows no indication of a significant difference between the data and the assumed model.

The distributions included in the ML fit are shown in Figs. 8, 9, 10, 11, and 12. In the 2ℓSS + 0τh and 3ℓ + 0τh channels, we show the distributions of the activation values of the ANN output nodes for the different subcategories, defined by lepton flavor and by the multiplicity of b-tagged jets, in a single histogram, concatenating the histogram bins as appropriate and enumerating them by a monotonically increasing number. The distributions expected for the ttH and tH signals, as well as the expected background contributions, are shown for the values of the POIs and nuisance parameters obtained from the ML fit. The uncertainty bands shown in the figures represent the total uncertainty in the sum of signal and background contributions that remains after the values of the nuisance parameters have been determined through the ML fit.
These bands are computed by randomly sampling the nuisance parameters from their covariance matrix, as determined by the ML fit, and adding the statistical uncertainties in the background predictions in quadrature. The data agree with the sum of the contributions estimated by the ML fit for the ttH and tH signals and for the background processes. The corresponding event yields are given in Table 8. The event yields of the background processes obtained from the ML fit agree reasonably well with their expected production rates, given the uncertainties. In particular, the production rates of the ttZ and ttW backgrounds are determined to be μttZ = 1.03 ± 0.14 (stat+syst) and μttW = 1.43 ± 0.21 (stat+syst) times their SM expectations as predicted by the MC simulation.
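The band construction described above can be sketched for a single bin as follows. This is a minimal illustration under stated assumptions: the post-fit nuisance parameters are taken to follow a multivariate Gaussian with the fitted values and covariance, the bin yield as a function of θ (`bin_yield`) is a hypothetical linear model, and all numbers are invented.

```python
import math
import random


def cholesky(cov):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(cov)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(cov[i][i] - s)
            else:
                low[i][j] = (cov[i][j] - s) / low[j][j]
    return low


def postfit_band(yield_fn, theta_hat, cov, mc_stat_unc, n_samples=20000, seed=7):
    """Mean and total uncertainty of a bin's post-fit yield.

    Nuisance parameters are sampled from a multivariate Gaussian with mean theta_hat
    and covariance cov; the spread of yield_fn over the samples is then combined in
    quadrature with the statistical uncertainty of the background template.
    """
    rng = random.Random(seed)
    low = cholesky(cov)
    dim = len(theta_hat)
    values = []
    for _ in range(n_samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        theta = [theta_hat[i] + sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(dim)]
        values.append(yield_fn(theta))
    mean = sum(values) / n_samples
    var = sum((v - mean) ** 2 for v in values) / (n_samples - 1)
    return mean, math.sqrt(var + mc_stat_unc ** 2)


# Hypothetical bin whose yield responds linearly to two correlated nuisance parameters
def bin_yield(theta):
    return 100.0 + 5.0 * theta[0] + 3.0 * theta[1]


mean, width = postfit_band(bin_yield, [0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]], mc_stat_unc=4.0)
print(f"{mean:.1f} +- {width:.1f}")
```

For this linear toy the result can be checked analytically: the systematic variance is 5² + 3² + 2·5·3·0.5 = 49, so the total width is √(49 + 16) ≈ 8.06. Sampling becomes useful when the yields depend non-linearly on θ, as for log-normal uncertainties.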
The evidence for the ttH and tH signals in the data is illustrated in Fig. 13, in which each bin of the distributions included in the ML fit is classified according to its expected ratio of the number of ttH + tH signal (S) to background (B) events. A significant excess of events over the background expectation is visible in the bins with the highest expected S/B ratio.
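The classification used for Fig. 13 can be sketched as follows: each bin of the fitted distributions is assigned to a group according to log10 of its expected S/B, and the expected signal, expected background, and observed counts are summed within each group. The group boundaries and all numbers below are hypothetical; the exact grouping used in the paper is not specified in this section.

```python
import bisect
import math


def group_by_log_sb(signal, background, observed, edges):
    """Sum expected signal, expected background, and observed counts in log10(S/B) groups.

    edges: increasing group boundaries in log10(S/B). Bins below edges[0] fall into the
    first group and bins above edges[-1] into the last, giving len(edges) + 1 groups.
    Each group is returned as [S, B, N].
    """
    groups = [[0.0, 0.0, 0] for _ in range(len(edges) + 1)]
    for s, b, n in zip(signal, background, observed):
        idx = bisect.bisect_right(edges, math.log10(s / b))
        groups[idx][0] += s
        groups[idx][1] += b
        groups[idx][2] += n
    return groups


# Hypothetical bins: expected signal, expected background, observed events
signal = [1.0, 0.1, 5.0]
background = [20.0, 100.0, 4.0]
observed = [12, 98, 11]
for s_sum, b_sum, n_sum in group_by_log_sb(signal, background, observed, edges=[-2.0, -1.0, 0.0]):
    print(s_sum, b_sum, n_sum)
```

Grouping by S/B concentrates the signal-rich bins from all channels into a few groups on the right of such a plot, which is where an excess over the background expectation becomes visible.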
The ttH signal rates measured in the ten individual channels are shown in Fig. 14. They are obtained from a likelihood fit in which the signal rate in each channel is represented by an independent parameter. The tH production rate is shown only for the 2ℓSS + 0τh, 3ℓ + 0τh, and 2ℓSS + 1τh channels, which employ a multiclass ANN to separate the tH from the ttH signal; the sensitivity of the other channels to the tH signal is small. The figure also shows the ttH and tH production rates obtained from the simultaneous fit of all channels. The signal rates measured in individual channels are compatible with each other and with the ttH and tH production rates obtained from the simultaneous fit. The largest deviation from the SM expectation is observed for the ttH production rate in the 2ℓ + 2τh channel, where the best fit value of the ttH signal rate is negative, reflecting the deficit of observed events relative to the background expectation in this channel, as shown in Fig. 11. The value and uncertainty shown in Fig. 14 for this channel are obtained after requiring the ttH production rate in this channel to be positive; with this constraint, the value measured in the 2ℓ + 2τh channel is compatible with the SM expectation at the level of 1.94 standard deviations. The sensitivity of individual channels can be inferred from the size of the uncertainties in the measured signal strengths. The most sensitive channel is 2ℓSS + 0τh, which also provides the largest signal yield, followed by the 3ℓ + 0τh and 2ℓSS + 1τh channels. Figure 15 shows the correlations between the measured ttH and tH signal rates, as well as those between the signal rates and the production rates of the ttZ and ttW backgrounds.
All correlations are of moderate size, reflecting the ability of the multiclass ANN to distinguish the tH from the ttH signal and to separate both signals from the ttZ and ttW backgrounds.
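Fig. 15 displays these correlations as two-dimensional likelihood contours. In the Gaussian approximation around the minimum of −ln L, they can be summarized by the linear correlation coefficient ρij = Cij/√(Cii Cjj), where C is the covariance matrix of the fitted rates (e.g. the inverse of the Hessian of −ln L). A minimal sketch with a hypothetical covariance matrix:

```python
import math


def correlation_matrix(cov):
    """rho_ij = C_ij / sqrt(C_ii * C_jj) for a covariance matrix given as nested lists."""
    n = len(cov)
    return [[cov[i][j] / math.sqrt(cov[i][i] * cov[j][j]) for j in range(n)] for i in range(n)]


# Hypothetical covariance of (mu_ttH, mu_tH): variances 0.04 and 9.0, covariance -0.12
rho = correlation_matrix([[0.04, -0.12], [-0.12, 9.0]])
print(rho[0][1])
```

A coefficient close to zero indicates that the two rates are determined nearly independently; the contour plots in Fig. 15 carry more information, since they also capture non-Gaussian features of the likelihood.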
[Fig. 15: The two production rates that are not shown on either the x or the y axis are profiled, such that the function L attains its minimum at each point of the x-y plane.]

The observed (expected) significance of the ttH signal in the CA amounts to 3.8 (4.0) standard deviations.

We now drop the assumption that the distributions of kinematic observables for the tH signal conform to the distributions expected in the SM and determine the Yukawa coupling yt of the Higgs boson to the top quark. We parametrize the production rates μttH and μtH of the ttH and tH signals as functions of the ratio of the top quark Yukawa coupling yt to its SM expectation mt/v. We refer to this ratio as the coupling modifier and denote it by the symbol κt. The effect of the interference between the diagrams in Fig. 2, described in Sect. 1, on the distributions of kinematic observables is parametrized as a function of κt and fully taken into account, adjusting both the event yield of the tH signal and the distributions of the BDT and ANN outputs for each value of κt. The changes in the kinematic properties of the events affect the probability for tH signal events to pass the event selection criteria. This effect is illustrated in Fig. 16, which shows the variation of the product of acceptance and efficiency for the tHq and tHW signal contributions in each decay mode of the Higgs boson as a function of the ratio κt/κV, where κV denotes the ratio of the coupling of the Higgs boson to the W boson to its SM expectation. The corresponding coupling modifier for the Z boson is assumed to take the same value κV. Variations of κV from its SM value κV = 1 affect the interference between the diagrams in Fig. 2 as well as the branching fractions of the Higgs boson decay modes H → WW and H → ZZ. We compute the compatibility of the data with different values of κt and κV, as shown in Fig. 17.
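The dependence of the signal rates on the coupling modifiers can be sketched at the level of total cross sections. This is a simplified illustration, not the parametrization used in the paper: it assumes that the tH cross section is a quadratic form in κt and κV (one term for each diagram class in Fig. 2 plus their interference), that the ttH cross section scales as κt², and it ignores the κ dependence of the Higgs boson branching fractions and of the acceptance. The coefficients A, B, C are hypothetical.

```python
def th_rate_scaling(kappa_t, kappa_v, a, b, c):
    """Ratio sigma(tH) / sigma_SM(tH) for an assumed quadratic dependence on the modifiers.

    a*kappa_t**2 and b*kappa_v**2 stand for the diagrams in which the Higgs boson couples
    to the top quark or to the W boson; c*kappa_t*kappa_v is their interference.
    Normalization to the SM requires a + b + c = 1.
    """
    return a * kappa_t ** 2 + b * kappa_v ** 2 + c * kappa_t * kappa_v


def tth_rate_scaling(kappa_t):
    """Assumed leading-order scaling of ttH production with the top quark Yukawa coupling."""
    return kappa_t ** 2


# Hypothetical coefficients, chosen only so that a + b + c = 1 with destructive
# interference at the SM point; real values must come from a dedicated calculation.
A, B, C = 2.6, 3.6, -5.2

for kt in (1.0, 0.0, -1.0):
    print(kt, tth_rate_scaling(kt), th_rate_scaling(kt, 1.0, A, B, C))
```

The toy reproduces the qualitative behavior that motivates the analysis: the ttH rate cannot distinguish κt = 1 from κt = −1, whereas the tH rate is strongly enhanced for negative κt because the interference becomes constructive.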
We obtain a 95% confidence level (CL) region on κt consisting of the union of the two intervals −0.9 < κt < −0.7 and 0.7 < κt < 1.1. At 95% CL, both the inverted top quark coupling scenario and the SM expectation κt = 1 are compatible with the data.
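An allowed region that is a union of disjoint intervals arises naturally when the profile likelihood has two separated minima. The sketch below extracts such a region from a one-dimensional scan of q = −2Δln L, using the asymptotic threshold q < 3.84 for 95% CL in one parameter and linear interpolation between scan points. The scan curve `toy_q` is hypothetical and is not the result shown in Fig. 17.

```python
def _crossing(x0, q0, x1, q1, threshold):
    """Linear interpolation of the point where q crosses the threshold between x0 and x1."""
    return x0 + (threshold - q0) * (x1 - x0) / (q1 - q0)


def allowed_intervals(scan, threshold=3.84):
    """Union of intervals where q = -2*Delta(ln L) lies below the threshold.

    scan: (x, q) pairs sorted in x. threshold = 3.84 corresponds to 95% CL for one
    parameter in the asymptotic (Wilks) approximation.
    """
    intervals = []
    start = scan[0][0] if scan[0][1] < threshold else None
    for (x0, q0), (x1, q1) in zip(scan, scan[1:]):
        if q0 >= threshold > q1:  # entering the allowed region
            start = _crossing(x0, q0, x1, q1, threshold)
        elif q0 < threshold <= q1:  # leaving the allowed region
            intervals.append((start, _crossing(x0, q0, x1, q1, threshold)))
            start = None
    if start is not None:  # region still open at the end of the scan
        intervals.append((start, scan[-1][0]))
    return intervals


# Hypothetical two-minimum curve with a secondary minimum at negative kappa_t
def toy_q(kt):
    return min(50.0 * (kt - 1.0) ** 2, 50.0 * (kt + 1.0) ** 2 + 1.0)


scan = [(x, toy_q(x)) for x in (-2.0 + 0.01 * i for i in range(401))]
for lo, hi in allowed_intervals(scan):
    print(f"{lo:.2f} < kappa_t < {hi:.2f}")
```

Because the secondary minimum in this toy lies 1 unit above the global one, its interval is narrower: only the part of that well with q < 3.84 survives the threshold.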

Summary
The rate for Higgs boson production in association with either one or two top quarks has been measured in events containing multiple electrons, muons, and hadronically decaying tau leptons, using data recorded by the CMS experiment in pp collisions at √s = 13 TeV in 2016, 2017, and 2018. The analyzed data correspond to an integrated luminosity of 137 fb−1. Ten different experimental signatures are considered in the analysis, differing in the multiplicity of electrons, muons, and hadronically decaying tau leptons, and targeting events in which the Higgs boson decays via H → WW, H → ττ, or H → ZZ and the top quark(s) decay either semileptonically or hadronically. The measured production rates for the ttH and tH signals amount to 0.92 ± 0.19 (stat) +0.17/−0.13 (syst) and 5.7 ± 2.7 (stat) ± 3.0 (syst) times their respective standard model (SM) expectations. The corresponding observed (expected) significance amounts to 4.7 (5.2) standard deviations for ttH and to 1.4 (0.3) standard deviations for tH production. Assuming that the coupling of the Higgs boson to the tau lepton is equal in strength to its SM value, the coupling yt of the Higgs boson to the top quark divided by its SM expectation, κt = yt/yt^SM, is constrained to −0.9 < κt < −0.7 or 0.7 < κt < 1.1 at 95% confidence level. This result is the most sensitive measurement of the ttH production rate to date.

Acknowledgements
We thank the technical and administrative staffs at CERN and at other CMS institutes for their contributions to the success of the CMS effort. In addition, we gratefully acknowledge the computing centers and personnel of the Worldwide LHC Computing Grid for delivering so effectively the computing infrastructure essential to our analyses. Finally, we acknowledge the enduring support for the construction and operation of the LHC and the CMS detector provided by the following funding agencies: BMBWF

Data Availability Statement
This manuscript has no associated data or the data will not be deposited. [Authors' comment: Release and preservation of data used by the CMS Collaboration as the basis for publications is guided by the CMS policy as written in its document "CMS data preservation, re-use and open access policy" (https://cms-docdb.cern.ch/cgi-bin/PublicDocDB/RetrieveFile?docid=6032&filename=CMS DataPolicyV1.2.pdf&version=2).]

Declarations
Conflict of interest The authors declare that they have no conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. Funded by SCOAP³