1 Introduction

The discovery of a Higgs (\({\mathrm{H}} \)) boson by the ATLAS and CMS experiments at the CERN LHC [1,2,3] opened a new field for exploration in the realm of particle physics. Detailed measurements of the properties of this new particle are important to ascertain if the discovered resonance is indeed the Higgs boson predicted by the standard model (SM) [4,5,6,7]. In the SM, the Yukawa coupling \(y_{\mathrm {f}}\) of the Higgs boson to fermions is proportional to the mass \(m_{\mathrm {f}}\) of the fermion, namely \(y_{\mathrm {f}} = m_{\mathrm {f}}/v\), where \(v = 246\,\text {GeV} \) denotes the vacuum expectation value of the Higgs field. With a mass of \(m_{{\mathrm{t}}}= 172.76 \pm 0.30\,\text {GeV} \) [8], the top quark is by far the heaviest fermion known to date, and its Yukawa coupling is of order unity. The large mass of the top quark may indicate that it plays a special role in the mechanism of electroweak symmetry breaking [9,10,11]. Deviations of \(y_{{\mathrm{t}}}\) from the SM prediction of \(m_{{\mathrm{t}}}/v\) would indicate the presence of physics beyond the SM.
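As a quick numerical illustration of the relation \(y_{\mathrm {f}} = m_{\mathrm {f}}/v\), the following sketch evaluates the top quark Yukawa coupling from the values quoted above. The function name is an illustrative choice, and the quoted uncertainty propagates only the uncertainty in the top quark mass.

```python
# Illustrative sketch: evaluate y_f = m_f / v with the values quoted in the text.
V_GEV = 246.0          # vacuum expectation value of the Higgs field (GeV)
M_TOP_GEV = 172.76     # top quark mass (GeV)
M_TOP_UNC_GEV = 0.30   # quoted uncertainty in the top quark mass (GeV)


def yukawa_coupling(mass_gev, v_gev=V_GEV):
    """Return y_f = m_f / v, using the convention given in the text."""
    return mass_gev / v_gev


y_top = yukawa_coupling(M_TOP_GEV)
y_top_unc = M_TOP_UNC_GEV / V_GEV  # linear propagation of the mass uncertainty only
print(f"y_t = {y_top:.3f} +/- {y_top_unc:.3f}")
```

With this convention the coupling comes out at about 0.70, consistent with the statement that it is of order unity.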

Fig. 1 Feynman diagrams at LO for \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) production

The measurement of the Higgs boson production rate in association with a top quark pair (\({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \)) provides a model-independent determination of the magnitude of \(y_{{\mathrm{t}}}\), but not of its sign. The sign of \(y_{{\mathrm{t}}}\) is determined from the associated production of a Higgs boson with a single top quark (\({\mathrm{t}} {\mathrm{H}} \)). Leading-order (LO) Feynman diagrams for \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) and \({\mathrm{t}} {\mathrm{H}} \) production are shown in Figs. 1 and 2, respectively. The diagrams for \({\mathrm{t}} {\mathrm{H}} \) production are separated into three contributions: the t-channel (\({\mathrm{t}} {\mathrm{H}} {{\mathrm{q}}} \)) and the s-channel, which proceed via the exchange of a virtual \({\mathrm{W}} \) boson, and the associated production of a Higgs boson with a single top quark and a \({\mathrm{W}} \) boson (\({\mathrm{t}} {\mathrm{H}} {\mathrm{W}} \)). The interference between the diagrams where the Higgs boson couples to the top quark (Fig. 2 upper and lower left) and those where the Higgs boson couples to the \({\mathrm{W}} \) boson (Fig. 2 upper and lower right) is destructive when \(y_{{\mathrm{t}}}\) and \(g_{{\mathrm{W}}}\) have the same sign, where the latter denotes the coupling of the Higgs boson to the \({\mathrm{W}} \) boson. This reduces the \({\mathrm{t}} {\mathrm{H}} \) cross section and influences the kinematical properties of the event as a function of \(y_{{\mathrm{t}}}\) and \(g_{{\mathrm{W}}}\). The interference becomes constructive when \(g_{{\mathrm{W}}}\) and \(y_{{\mathrm{t}}}\) have opposite signs, causing an increase in the cross section of up to one order of magnitude. This is referred to as inverted top quark coupling.
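The effect of the interference on the \({\mathrm{t}} {\mathrm{H}} \) rate can be sketched with a quadratic form in coupling modifiers. In the example below, `kappa_t` and `kappa_w` denote \(y_{{\mathrm{t}}}\) and \(g_{{\mathrm{W}}}\) divided by their SM values; this notation and the numerical coefficients are assumptions introduced for illustration and are not given in this section. Each coefficient set is normalized to unity at the SM point, and the negative cross term encodes the destructive interference for same-sign couplings.

```python
# Illustrative sketch of the interference pattern in tH production.
# The coefficients below are ASSUMED example values (not given in this text);
# each set is normalized so that the SM point (kappa_t = kappa_w = 1) returns 1.
THQ_COEFFS = (2.63, 3.58, -5.21)  # (kappa_t^2, kappa_w^2, kappa_t*kappa_w) terms
THW_COEFFS = (2.91, 2.31, -4.22)


def relative_cross_section(kappa_t, kappa_w, coeffs):
    """Cross section relative to the SM for coupling modifiers kappa_t, kappa_w."""
    a, b, c = coeffs
    return a * kappa_t**2 + b * kappa_w**2 + c * kappa_t * kappa_w


for kappa_t in (1.0, -1.0):
    thq = relative_cross_section(kappa_t, 1.0, THQ_COEFFS)
    thw = relative_cross_section(kappa_t, 1.0, THW_COEFFS)
    print(f"kappa_t = {kappa_t:+.0f}: tHq x{thq:.2f}, tHW x{thw:.2f}")
```

With these assumed coefficients, flipping the sign of `kappa_t` turns the cross term positive and raises the relative rate by roughly an order of magnitude, in line with the behavior described above.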

Indirect constraints on the magnitude of \(y_{{\mathrm{t}}}\) are obtained from the rate of Higgs boson production via gluon fusion and from the decay rate of Higgs bosons to photon pairs [12], where in both cases, \(y_{{\mathrm{t}}}\) enters through top quark loops. The \({\mathrm{H}} \rightarrow {\upgamma }{}{} {\upgamma }{}{} \) decay rate also provides sensitivity to the sign of \(y_{{\mathrm{t}}}\) [13], as does the rate for associated production of a Higgs boson with a \({\mathrm{Z}} \) boson [14]. The measured rates of these processes suggest that the Higgs boson coupling to top quarks is SM-like. However, contributions from non-SM particles to these loops can compensate, and therefore mask, deviations of \(y_{{\mathrm{t}}}\) from its SM value. A model-independent direct measurement of the top quark Yukawa coupling in \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) and \({\mathrm{t}} {\mathrm{H}} \) production is therefore very important. The comparison of the magnitude and sign of \(y_{{\mathrm{t}}}\) obtained from the measurement of the \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) and \({\mathrm{t}} {\mathrm{H}} \) production rates, where \(y_{{\mathrm{t}}}\) enters at lowest “tree” level, with the value of \(y_{{\mathrm{t}}}\) obtained from processes where \(y_{{\mathrm{t}}}\) enters via loop contributions can provide evidence about such contributions.

This manuscript presents the measurement of the \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) and \({\mathrm{t}} {\mathrm{H}} \) production rates in final states containing multiple electrons, muons, or \({\uptau } \) leptons that decay to hadrons and a neutrino (\({\uptau } _\mathrm {h}\)). In the following, we refer to \({\uptau } _\mathrm {h}\) as “hadronically decaying \({\uptau }\)”. We also refer to electrons and muons collectively as “leptons” (\(\ell \)). The measurement is based on data recorded by the CMS experiment in \({{\mathrm{p}}_{\mathrm{}}^{\mathrm{}}} {{\mathrm{p}}_{\mathrm{}}^{\mathrm{}}} \) collisions at \(\sqrt{s} = 13\,\text {TeV} \) during Run 2 of the LHC, corresponding to an integrated luminosity of 137\(\,\text {fb}^{-1}\).

The associated production of Higgs bosons with top quark pairs was previously studied by the ATLAS and CMS experiments, with up to 24.8\(\,\text {fb}^{-1}\) of data recorded at \(\sqrt{s} = 7\) and \(8\,\text {TeV} \) during LHC Run 1 [15,16,17,18,19], and up to 79.8\(\,\text {fb}^{-1}\) of data recorded at \(\sqrt{s} = 13\,\text {TeV} \) during LHC Run 2 [20,21,22,23,24,25,26]. The combined analysis of data recorded at \(\sqrt{s} = 7\), 8, and \(13\,\text {TeV} \) resulted in the observation of \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) production by CMS and ATLAS [27, 28]. The production of Higgs bosons in association with a single top quark was also studied using the data recorded during LHC Run 1 [29] and Run 2 [30, 31]. These analyses covered Higgs boson decays to \({{\mathrm{b}}} {{\overline{{{{\mathrm{b}}}}}}} \), \({\upgamma }{}{} {\upgamma }{}{} \), \({\mathrm{W}} {\mathrm{W}} \), \({\mathrm{Z}} {\mathrm{Z}} \), and \({\uptau } {\uptau } \).

The measurement of the \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) and \({\mathrm{t}} {\mathrm{H}} \) production rates presented in this manuscript constitutes their first simultaneous analysis in this channel. This approach is motivated by the high degree of overlap between the experimental signatures of both production processes and takes into account the dependence of the \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) and \({\mathrm{t}} {\mathrm{H}} \) production rates on \(y_{{\mathrm{t}}}\). Compared to previous work [23], the sensitivity of the present analysis is enhanced by improvements in the identification of \({\uptau } _\mathrm {h}\) decays and of jets originating from the hadronization of bottom quarks, as well as by performing the analysis in four additional experimental signatures, also referred to as analysis channels, bringing the total to ten. The signatures involve Higgs boson decays to \({\mathrm{W}} {\mathrm{W}} \), \({\uptau } {\uptau } \), and \({\mathrm{Z}} {\mathrm{Z}} \), and are defined according to the lepton and \({\uptau } _\mathrm {h}\) multiplicities in the events. Some of them require leptons to have the same (opposite) sign of electric charge and are therefore referred to as \({\mathrm {SS}}\) (\({\mathrm {OS}}\)). 
The signatures \(2\ell {\mathrm {SS}}+ 0{\uptau } _\mathrm {h} \), \(3\ell + 0{\uptau } _\mathrm {h} \), \(2\ell {\mathrm {SS}}+ 1{\uptau } _\mathrm {h} \), \(2\ell {\mathrm {OS}}+ 1{\uptau } _\mathrm {h} \), \(1\ell + 2{\uptau } _\mathrm {h} \), \(4\ell + 0{\uptau } _\mathrm {h} \), \(3\ell + 1{\uptau } _\mathrm {h} \), and \(2\ell + 2{\uptau } _\mathrm {h} \) target events where at least one top quark decays via \({\mathrm{t}} \rightarrow {{\mathrm{b}}} {\mathrm{W}} ^{+} \rightarrow {{\mathrm{b}}} \ell ^{+}{\upnu {}{}} _{\ell }\), whereas the signatures \(1\ell + 1{\uptau } _\mathrm {h} \) and \(0\ell + 2{\uptau } _\mathrm {h} \) target events where all top quarks decay via \({\mathrm{t}} \rightarrow {{\mathrm{b}}} {\mathrm{W}} ^{+} \rightarrow {{\mathrm{b}}} {{\mathrm{q}}} {{\overline{{{{\mathrm{q}}}}}}} '\). We refer to top quarks decaying via the former and the latter mode as semi-leptonically and hadronically decaying top quarks, respectively. Here and in the following, references to top quark decays include the corresponding charge-conjugate decays of top antiquarks. As in previous analyses, the separation of the \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) and \({\mathrm{t}} {\mathrm{H}} \) signals from backgrounds is improved through machine-learning techniques, specifically boosted decision trees (BDTs) and artificial neural networks (ANNs) [32,33,34], and through the matrix-element method [35, 36]. Machine-learning techniques are also employed to improve the separation between the \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) and \({\mathrm{t}} {\mathrm{H}} \) signals. We use the measured \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) and \({\mathrm{t}} {\mathrm{H}} \) production rates to set limits on the magnitude and sign of \(y_{{\mathrm{t}}}\).

Fig. 2 Feynman diagrams at LO for \({\mathrm{t}} {\mathrm{H}} \) production via the t-channel (\({\mathrm{t}} {\mathrm{H}} {{\mathrm{q}}} \) in upper left and upper right) and s-channel (middle) processes, and for associated production of a Higgs boson with a single top quark and a \({\mathrm{W}} \) boson (\({\mathrm{t}} {\mathrm{H}} {\mathrm{W}} \) in lower left and lower right). The \({\mathrm{t}} {\mathrm{H}} {{\mathrm{q}}} \) and \({\mathrm{t}} {\mathrm{H}} {\mathrm{W}} \) production processes are shown for the five-flavor scheme

This paper is organized as follows. After briefly describing the CMS detector in Sect. 2, we proceed to discuss the data and simulated events used in the measurement in Sect. 3. Section 4 covers the object reconstruction and selection from signals recorded in the detector, while Sect. 5 describes the selection criteria applied to events in the analysis. These events are grouped in categories, defined in Sect. 6, while the estimation of background contributions in these categories is described in Sect. 7. The systematic uncertainties affecting the measurements are given in Sect. 8, and the statistical analysis and the results of the measurements in Sect. 9. We end the paper with a brief summary in Sect. 10.

2 The CMS detector

The central feature of the CMS apparatus is a superconducting solenoid of 6\(\,\text {m}\) internal diameter, providing a magnetic field of 3.8\(\,\text {T}\). A silicon pixel and strip tracker, a lead tungstate crystal electromagnetic calorimeter (ECAL), and a brass and scintillator hadron calorimeter (HCAL), each composed of a barrel and two endcap sections, are positioned within the solenoid volume. The silicon tracker measures charged particles within the pseudorapidity range \(|\eta | < 2.5\). The ECAL is a fine-grained hermetic calorimeter with quasi-projective geometry, and is segmented into the barrel region of \(|\eta | < 1.48\) and in two endcaps that extend up to \(|\eta | < 3.0\). The HCAL barrel and endcaps similarly cover the region \(|\eta | < 3.0\). Forward calorimeters extend the coverage up to \(|\eta | < 5.0\). Muons are measured and identified in the range \(|\eta | < 2.4\) by gas-ionization detectors embedded in the steel flux-return yoke outside the solenoid. A two-level trigger system [37] is used to reduce the rate of recorded events to a level suitable for data acquisition and storage. The first level of the CMS trigger system, composed of custom hardware processors, uses information from the calorimeters and muon detectors to select the most interesting events with a latency of 4\(\,\mu \text {s}\). The high-level trigger processor farm further decreases the event rate from around 100\(\,\text {kHz}\) to about 1\(\,\text {kHz}\). Details of the CMS detector and its performance, together with a definition of the coordinate system and the kinematic variables used in the analysis, are reported in Ref. [38].

3 Data samples and Monte Carlo simulation

The analysis uses \({{\mathrm{p}}_{\mathrm{}}^{\mathrm{}}} {{\mathrm{p}}_{\mathrm{}}^{\mathrm{}}} \) collision data recorded at \(\sqrt{s} = 13\,\text {TeV} \) at the LHC during 2016-2018. Only the data-taking periods during which the CMS detector was fully operational are included in the analysis. The total integrated luminosity of the analyzed data set amounts to 137\(\,\text {fb}^{-1}\), of which 35.9 [39], 41.5 [40], and 59.7 [41]\(\,\text {fb}^{-1}\) have been recorded in 2016, 2017, and 2018, respectively.
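The consistency of the per-year luminosities with the quoted total can be checked with a few lines of arithmetic. The dictionary name below is an illustrative choice; the numbers are those given in the text.

```python
# Integrated luminosity per data-taking year, in fb^-1, as quoted in the text.
LUMI_FB = {2016: 35.9, 2017: 41.5, 2018: 59.7}

total_lumi = sum(LUMI_FB.values())
print(f"Total: {total_lumi:.1f} fb^-1")  # rounds to the quoted 137 fb^-1

# Fraction of the full data set contributed by each year.
for year, lumi in LUMI_FB.items():
    print(f"{year}: {lumi / total_lumi:.1%} of the data set")
```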

The event samples produced via Monte Carlo (MC) simulation are used for the purpose of calculating selection efficiencies for the \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) and \({\mathrm{t}} {\mathrm{H}} \) signals, estimating background contributions, and training machine-learning algorithms. The contribution from \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) signal and the backgrounds arising from \({{\mathrm{t}} {}{{\overline{{{\mathrm{t}}}}}}} \) production in association with \({\mathrm{W}} \) and \({\mathrm{Z}} \) bosons (\({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{W}} \), \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{Z}} \)), from triboson (\({\mathrm{W}} {\mathrm{W}} {\mathrm{W}} \), \({\mathrm{W}} {\mathrm{W}} {\mathrm{Z}} \), \({\mathrm{W}} {\mathrm{Z}} {\mathrm{Z}} \), \({\mathrm{Z}} {\mathrm{Z}} {\mathrm{Z}} \), \({\mathrm{W}} {\mathrm{Z}} {\upgamma }{}{} \)) production, as well as from the production of four top quarks (\({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} \)) are generated at next-to-LO (NLO) accuracy in perturbative quantum chromodynamics (pQCD) making use of the program \(\textsc {MadGraph} {}5\_\mathrm{a}\textsc {mc@nlo} \) 2.2.2 or 2.3.3 [42,43,44,45], whereas the \({\mathrm{t}} {\mathrm{H}} \) signal and the \({{\mathrm{t}} {}{{\overline{{{\mathrm{t}}}}}}} {\upgamma }{}{} \), \({{\mathrm{t}} {}{{\overline{{{\mathrm{t}}}}}}} {\upgamma }{}{} ^{*} \), \({\mathrm{t}} {\mathrm{Z}} \), \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{W}} {\mathrm{W}} \), \({\mathrm{W}} \)+jets, Drell–Yan (DY), \({\mathrm{W}} {\upgamma }{}{} \), and \({\mathrm{Z}} {\upgamma }{}{} \) backgrounds are generated at LO accuracy using the same program. The symbols \({\upgamma }{}{} ^{*} \) and \({\upgamma }{}{} \) are employed to distinguish virtual photons from the real ones. 
The event samples with virtual photons also include contributions from virtual \({\mathrm{Z}} \) bosons. The DY production of electron, muon, and \({\uptau } \) lepton pairs is referred to as \({\mathrm{Z}}/{\upgamma }{}{} ^{*} \rightarrow {\mathrm{e}} {\mathrm{e}} \), \({\mathrm{Z}}/{\upgamma }{}{} ^{*} \rightarrow {\upmu {}{}} {\upmu {}{}} \), and \({\mathrm{Z}}/{\upgamma }{}{} ^{*} \rightarrow {\uptau } {\uptau } \), respectively. The modeling of the \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{W}} \) background includes additional \(\alpha _\mathrm {S} \alpha ^3\) electroweak corrections [46, 47], simulated using MadGraph 5_amc@nlo. The NLO program powheg v2.0 [48,49,50] is used to simulate the backgrounds arising from \({{\mathrm{t}} {}{{\overline{{{\mathrm{t}}}}}}} \)+jets, \({\mathrm{t}} {\mathrm{W}} \), and diboson (\({\mathrm{W}} ^{\pm }{\mathrm{W}} ^{\mp }\), \({\mathrm{W}} {\mathrm{Z}} \), \({\mathrm{Z}} {\mathrm{Z}} \)) production; from the production of single top quarks; from SM Higgs boson production via gluon fusion (\({\mathrm{g}} {\mathrm{g}} {\mathrm{H}} \)) and vector boson fusion (\({{\mathrm{q}}} {{\mathrm{q}}} {\mathrm{H}} \)) processes; and from the production of SM Higgs bosons in association with \({\mathrm{W}} \) and \({\mathrm{Z}} \) bosons (\({\mathrm{W}} {\mathrm{H}} \), \({\mathrm{Z}} {\mathrm{H}} \)) and with \({\mathrm{W}} \) and \({\mathrm{Z}} \) bosons along with a pair of top quarks (\({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{W}} {\mathrm{H}} \), \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{Z}} {\mathrm{H}} \)). The modeling of the top quark transverse momentum (\(p_{\mathrm {T}}\)) distribution of \({{\mathrm{t}} {}{{\overline{{{\mathrm{t}}}}}}} \)+jets events simulated with the program powheg is improved by reweighting the events to the differential cross section computed at next-to-NLO (NNLO) accuracy in pQCD, including electroweak corrections computed at NLO accuracy [51]. 
We refer to the sum of \({\mathrm{W}} {\mathrm{H}} \) plus \({\mathrm{Z}} {\mathrm{H}} \) contributions by using the symbol \({\mathrm{V}} {\mathrm{H}} \) and to the sum of \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{W}} {\mathrm{H}} \) plus \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{Z}} {\mathrm{H}} \) contributions by using the symbol \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{V}} {\mathrm{H}} \). The SM production of Higgs boson pairs or a Higgs boson in association with a pair of b quarks is not considered as a background to this analysis, because its impact on the event yields in all categories is found to be negligible. The production of same-sign W pairs (\({\mathrm {SS}}\)W) is simulated using the program MadGraph 5_amc@nlo at LO accuracy, except for the contribution from double-parton interactions, which is simulated with pythia v8.2 [52] (referred to as pythia hereafter). The NNPDF3.0LO (NNPDF3.0NLO) [53,54,55] set of parton distribution functions (PDFs) is used for the simulation of LO (NLO) 2016 samples, while NNPDF3.1 NNLO [56] is used for 2017 and 2018 LO and NLO samples.

Different flavor schemes are chosen to simulate the \({\mathrm{t}} {\mathrm{H}} {{\mathrm{q}}} \) and \({\mathrm{t}} {\mathrm{H}} {\mathrm{W}} \) processes. In the five-flavor scheme (\(5{\mathrm {\,FS}}\)), bottom quarks are considered as sea quarks of the proton and may appear in the initial state of proton–proton (\({{\mathrm{p}}_{\mathrm{}}^{\mathrm{}}} {{\mathrm{p}}_{\mathrm{}}^{\mathrm{}}} \)) scattering processes, as opposed to the four-flavor scheme (\(4{\mathrm {\,FS}}\)), where only up, down, strange, and charm quarks are considered as valence or sea quarks of the proton, whereas bottom quarks are produced by gluon splitting at the matrix-element level, and therefore appear only in the final state [57]. In the \(5{\mathrm {\,FS}}\) the distinction of \({\mathrm{t}} {\mathrm{H}} {{\mathrm{q}}} \), s-channel, and \({\mathrm{t}} {\mathrm{H}} {\mathrm{W}} \) contributions to \({\mathrm{t}} {\mathrm{H}} \) production is well-defined up to NLO, whereas at higher orders in perturbation theory the \({\mathrm{t}} {\mathrm{H}} {{\mathrm{q}}} \) and s-channel production processes start to interfere and can no longer be uniquely separated [58]. Similarly, in the same regime the \({\mathrm{t}} {\mathrm{H}} {\mathrm{W}} \) process starts to interfere with \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) production at NLO. In the \(4{\mathrm {\,FS}}\), the separation among the \({\mathrm{t}} {\mathrm{H}} {{\mathrm{q}}} \), s-channel, and \({\mathrm{t}} {\mathrm{H}} {\mathrm{W}} \) (if the \({\mathrm{W}} \) boson decays hadronically) processes holds only up to LO, and the \({\mathrm{t}} {\mathrm{H}} {\mathrm{W}} \) process starts to interfere with \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) production already at tree level [58].

The \({\mathrm{t}} {\mathrm{H}} {{\mathrm{q}}} \) process is simulated at LO in the \(4{\mathrm {\,FS}}\) and the \({\mathrm{t}} {\mathrm{H}} {\mathrm{W}} \) process in the \(5{\mathrm {\,FS}}\), so that interference contributions of the latter with \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) production are not present in the simulation. The contribution from s-channel \({\mathrm{t}} {\mathrm{H}} \) production is negligible and is not considered in this analysis.

Parton showering, hadronization, and the underlying event are modeled using pythia with the tune CP5, CUETP8M1, CUETP8M2, or CUETP8M2T4 [59,60,61], depending on the dataset, as are the decays of \({\uptau } \) leptons, including polarization effects. The matching of matrix elements to parton showers is done using the MLM scheme [42] for the LO samples and the FxFx scheme [44] for the samples simulated at NLO accuracy.

The modeling of the \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) and \({\mathrm{t}} {\mathrm{H}} \) signals, as well as of the backgrounds, is improved by normalizing the simulated event samples to cross sections computed at higher order in pQCD. The cross section for \({\mathrm{t}} {\mathrm{H}} \) production is computed in the \(5{\mathrm {\,FS}}\). The SM cross section for \({\mathrm{t}} {\mathrm{H}} {{\mathrm{q}}} \) production has been computed at NLO accuracy in pQCD as \(74.3\,\text {fb} \) [62], and the SM cross section for \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) production has been computed at NLO accuracy in pQCD as \(506.5\,\text {fb} \) with electroweak corrections calculated at the same order in perturbation theory [62]. Both cross sections are computed for \({{\mathrm{p}}_{\mathrm{}}^{\mathrm{}}} {{\mathrm{p}}_{\mathrm{}}^{\mathrm{}}} \) collisions at \(\sqrt{s} = 13\,\text {TeV} \). The \({\mathrm{t}} {\mathrm{H}} {\mathrm{W}} \) cross section is computed to be \(15.2\,\text {fb} \) at NLO in the \(5{\mathrm {\,FS}}\), using the DR2 scheme [63] to remove overlapping contributions between the \({\mathrm{t}} {\mathrm{H}} {\mathrm{W}} \) process and \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) production. The cross sections for \({{\mathrm{t}} {}{{\overline{{{\mathrm{t}}}}}}} \)+jets, \({\mathrm{W}} \)+jets, DY, and diboson production are computed at NNLO accuracy [64,65,66].
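To give a sense of scale, the number of produced signal events follows from \(N = \sigma \, L\). The sketch below uses the SM cross sections quoted above and the integrated luminosity of 137\(\,\text {fb}^{-1}\); the function and variable names are illustrative, and the resulting numbers are counts before Higgs boson and top quark branching fractions, detector acceptance, and selection efficiency are applied.

```python
# Illustrative sketch: number of produced signal events, N = sigma * L,
# before branching fractions, acceptance, and selection efficiency.
LUMI_FB_INV = 137.0  # integrated luminosity (fb^-1), from the text

SM_CROSS_SECTIONS_FB = {  # SM cross sections at 13 TeV (fb), from the text
    "ttH": 506.5,
    "tHq": 74.3,
    "tHW": 15.2,
}


def produced_events(cross_section_fb, lumi_fb_inv=LUMI_FB_INV):
    """Return the expected number of produced events for a given cross section."""
    return cross_section_fb * lumi_fb_inv


for process, sigma in SM_CROSS_SECTIONS_FB.items():
    print(f"{process}: about {produced_events(sigma):,.0f} events produced")
```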

Event samples containing Higgs bosons are normalized using the SM cross sections published in Ref. [62]. Event samples of \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{Z}} \) production are normalized to the cross sections published in Ref. [62], while \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{W}} \) simulated samples are normalized to the cross section published in the same reference increased by the contribution from the \(\alpha _\mathrm {S} \alpha ^3\) electroweak corrections [46, 47]. The SM cross sections for the \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) and \({\mathrm{t}} {\mathrm{H}} \) signals and for the most relevant background processes are given in Table 1.

The \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) and \({\mathrm{t}} {\mathrm{H}} \) samples are produced assuming all couplings of the Higgs boson have the values expected in the SM. The variation in kinematical properties of \({\mathrm{t}} {\mathrm{H}} \) signal events, which stems from the interference of the diagrams in Fig. 2 described in Sect. 1, for values of \(y_{{\mathrm{t}}}\) and \(g_{{\mathrm{W}}}\) that differ from the SM expectation, is accounted for by applying weights calculated for each \({\mathrm{t}} {\mathrm{H}} \) signal event with MadGraph 5_amc@nlo, following the approach suggested in Refs. [67, 68]. No such reweighting is necessary for the \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) signal, because any variation of \(y_{{\mathrm{t}}}\) would only affect the inclusive cross section for \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) production, which is proportional to \(y_{{\mathrm{t}}}^{2}\), leaving the kinematical properties of \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) signal events unaltered.

The presence of simultaneous \({{\mathrm{p}}_{\mathrm{}}^{\mathrm{}}} {{\mathrm{p}}_{\mathrm{}}^{\mathrm{}}} \) collisions in the same or nearby bunch crossings, referred to as pileup (PU), is modeled by superimposing inelastic \({{\mathrm{p}}_{\mathrm{}}^{\mathrm{}}} {{\mathrm{p}}_{\mathrm{}}^{\mathrm{}}} \) interactions, simulated using pythia, on all MC events. Simulated events are weighted so that the PU distribution of simulated samples matches the one observed in the data.
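The PU weighting can be sketched as a per-bin ratio of normalized distributions. The example below is a minimal illustration with made-up histograms; the function names and bin contents are hypothetical and do not reproduce the actual procedure or inputs used in the analysis.

```python
# Illustrative sketch of pileup reweighting with made-up histograms.
# Each list holds the number of events per bin of the pileup multiplicity.


def normalize(histogram):
    """Scale a histogram so that its bin contents sum to one."""
    total = float(sum(histogram))
    return [count / total for count in histogram]


def pileup_weights(data_hist, mc_hist):
    """Per-bin weights that map the simulated PU distribution onto the data."""
    data_pdf = normalize(data_hist)
    mc_pdf = normalize(mc_hist)
    # Bins that are empty in simulation cannot be reweighted; assign weight 0.
    return [d / m if m > 0 else 0.0 for d, m in zip(data_pdf, mc_pdf)]


# Hypothetical inputs: observed and simulated PU distributions in four bins.
data_hist = [10, 30, 40, 20]
mc_hist = [25, 25, 25, 25]
weights = pileup_weights(data_hist, mc_hist)

# After reweighting, the simulated shape reproduces the data shape.
reweighted = normalize([w * m for w, m in zip(weights, mc_hist)])
print(weights)
```

Each simulated event would then receive the weight of the bin that contains its PU multiplicity.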

All MC events are passed through a detailed simulation of the CMS apparatus, based on Geant4 [69, 70], and are processed using the same version of the CMS event reconstruction software used for the data.

Simulated events are corrected by means of weights or by varying the relevant quantities to account for residual differences between data and simulation. These differences arise in: trigger efficiencies; reconstruction and identification efficiencies for electrons, muons, and \({\uptau } _\mathrm {h} \); the energy scale of \({\uptau } _\mathrm {h} \) and jets; the efficiency to identify jets originating from the hadronization of bottom quarks and the corresponding misidentification rates for light-quark and gluon jets; and the resolution in missing transverse momentum. The corrections are typically at the level of a few percent [71,72,73,74,75]. They are measured using a variety of SM processes, such as \({\mathrm{Z}}/{\upgamma }{}{} ^{*} \rightarrow {\mathrm{e}} {\mathrm{e}} \), \({\mathrm{Z}}/{\upgamma }{}{} ^{*} \rightarrow {\upmu {}{}} {\upmu {}{}} \), \({\mathrm{Z}}/{\upgamma }{}{} ^{*} \rightarrow {\uptau } {\uptau } \), \({{\mathrm{t}} {}{{\overline{{{\mathrm{t}}}}}}} \)+jets, and \({\upgamma }{}{} \)+jets production.

Table 1 Standard model cross sections for the \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) and \({\mathrm{t}} {\mathrm{H}} \) signals as well as for the most relevant background processes. The cross sections are quoted for \({{\mathrm{p}}_{\mathrm{}}^{\mathrm{}}} {{\mathrm{p}}_{\mathrm{}}^{\mathrm{}}} \) collisions at \(\sqrt{s} = 13\,\text {TeV} \). The quoted value for DY production includes a generator-level requirement of \(m_{{\mathrm{Z}}/{\upgamma }{}{} ^*}>50\,\text {GeV} \)

4 Event reconstruction

The CMS particle-flow (PF) algorithm [76] provides a global event description that optimally combines the information from all subdetectors, to reconstruct and identify all individual particles in the event. The particles are subsequently classified into five mutually exclusive categories: electrons, muons, photons, and charged and neutral hadrons.

Electrons are reconstructed by combining the information from the tracker and the ECAL [77] and are required to satisfy \(p_{\mathrm {T}} > 7\,\text {GeV} \) and \(|\eta | < 2.5\). Their identification is based on a multivariate (MVA) algorithm that combines observables sensitive to: the matching of measurements of the electron energy and direction obtained from the tracker and the calorimeter; the compactness of the electron cluster; and the bremsstrahlung emitted along the electron trajectory. Electron candidates resulting from photon conversions are removed by requiring that the track has no missing hits in the innermost layers of the silicon tracker and by vetoing candidates that are matched to a reconstructed conversion vertex. In the \(2\ell {\mathrm {SS}}+ 0{\uptau } _\mathrm {h} \) and \(2\ell {\mathrm {SS}}+ 1{\uptau } _\mathrm {h} \) channels (see Sect. 5 for channel definitions), we apply further electron selection criteria that demand consistency among three independent measurements of the electron charge, described as “selective algorithm” in Ref. [77].

The reconstruction of muons is based on linking track segments reconstructed in the silicon tracker to hits in the muon detectors that are embedded in the steel flux-return yoke [78]. The quality of the spatial matching between the individual measurements in the tracker and in the muon detectors is used to discriminate genuine muons from hadrons punching through the calorimeters and from muons produced by in-flight decays of kaons and pions. Muons selected in the analysis are required to have \(p_{\mathrm {T}} > 5\,\text {GeV} \) and \(|\eta | < 2.4\). For events selected in the \(2\ell {\mathrm {SS}}+ 0{\uptau } _\mathrm {h} \) and \(2\ell {\mathrm {SS}}+ 1{\uptau } _\mathrm {h} \) channels, the relative uncertainty in the curvature of the muon track is required to be less than 20% to ensure a high-quality charge measurement.

The electrons and muons satisfying the aforementioned selection criteria are referred to as “loose leptons” in the following. Additional selection criteria are applied to discriminate electrons and muons produced in decays of \({\mathrm{W}} \) and \({\mathrm{Z}} \) bosons and leptonic \({\uptau } \) decays (“prompt”) from electrons and muons produced in decays of \({{\mathrm{b}}} \) hadrons (“nonprompt”). The removal of nonprompt leptons reduces, in particular, the background arising from \({{\mathrm{t}} {}{{\overline{{{\mathrm{t}}}}}}} \)+jets production. To maximally exploit the information available in each event, we use MVA discriminants that take as input the charged and neutral particles reconstructed in a cone around the lepton direction besides the observables related to the lepton itself. The jet reconstruction and \({{\mathrm{b}}} \) tagging algorithms are applied, and the resulting reconstructed jets are used as additional inputs to the MVA. In particular, the ratio of the lepton \(p_{\mathrm {T}}\) to the reconstructed jet \(p_{\mathrm {T}}\) and the component of the lepton momentum in a direction perpendicular to the jet direction are found to enhance the separation of prompt leptons from leptons originating from \({{\mathrm{b}}} \) hadron decays, complementing more conventional observables such as the relative isolation of the lepton, calculated in a variable cone size depending on the lepton \(p_{\mathrm {T}}\) [79, 80], and the longitudinal and transverse impact parameters of the lepton trajectory with respect to the primary \({{\mathrm{p}}_{\mathrm{}}^{\mathrm{}}} {{\mathrm{p}}_{\mathrm{}}^{\mathrm{}}} \) interaction vertex. Electrons and muons passing a selection on the MVA discriminants are referred to as “tight leptons”.
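Two of the jet-related inputs named above, the ratio of the lepton \(p_{\mathrm {T}}\) to the jet \(p_{\mathrm {T}}\) and the lepton momentum component perpendicular to the jet direction, can be computed as in the following sketch. It follows the wording of the text literally; the function names and the example kinematics are hypothetical, and the exact definitions used in the analysis may differ in detail.

```python
import math


def momentum_vector(pt, eta, phi):
    """Cartesian momentum (px, py, pz) from pT, pseudorapidity, and azimuth."""
    return (pt * math.cos(phi), pt * math.sin(phi), pt * math.sinh(eta))


def pt_ratio(lepton_pt, jet_pt):
    """Ratio of the lepton pT to the pT of the reconstructed jet."""
    return lepton_pt / jet_pt


def pt_rel(lepton, jet):
    """Lepton momentum component perpendicular to the jet direction."""
    lx, ly, lz = momentum_vector(*lepton)
    jx, jy, jz = momentum_vector(*jet)
    jet_norm = math.sqrt(jx * jx + jy * jy + jz * jz)
    # Projection of the lepton momentum onto the jet axis.
    parallel = (lx * jx + ly * jy + lz * jz) / jet_norm
    lepton_sq = lx * lx + ly * ly + lz * lz
    # Guard against tiny negative values from floating-point rounding.
    return math.sqrt(max(lepton_sq - parallel * parallel, 0.0))


# Hypothetical (pT in GeV, eta, phi) values for a lepton and its nearest jet.
lepton = (20.0, 0.50, 1.00)
jet = (25.0, 0.52, 1.03)
print(pt_ratio(lepton[0], jet[0]), pt_rel(lepton, jet))
```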

Because of the presence of PU, the primary \({{\mathrm{p}}_{\mathrm{}}^{\mathrm{}}} {{\mathrm{p}}_{\mathrm{}}^{\mathrm{}}} \) interaction vertex typically needs to be chosen among the several vertex candidates that are reconstructed in each \({{\mathrm{p}}_{\mathrm{}}^{\mathrm{}}} {{\mathrm{p}}_{\mathrm{}}^{\mathrm{}}} \) collision event. The candidate vertex with the largest value of summed physics-object \(p_{\mathrm {T}} ^2\) is taken to be the primary \({{\mathrm{p}}_{\mathrm{}}^{\mathrm{}}} {{\mathrm{p}}_{\mathrm{}}^{\mathrm{}}} \) interaction vertex. The physics objects are the jets, clustered using the jet finding algorithm [81, 82] with the tracks assigned to candidate vertices as inputs, and the associated missing transverse momentum, taken as the negative vector sum of the \(p_{\mathrm {T}}\) of those jets.
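The choice of the primary vertex described above amounts to a maximization over candidates. The following minimal sketch assumes that each candidate is represented simply by the list of \(p_{\mathrm {T}}\) values of its associated physics objects; this data layout and the numbers are hypothetical.

```python
# Illustrative sketch: pick the primary vertex as the candidate with the
# largest sum of squared transverse momenta of its physics objects.


def sum_pt_squared(object_pts):
    """Sum of pT^2 over the physics objects assigned to one vertex."""
    return sum(pt * pt for pt in object_pts)


def select_primary_vertex(candidates):
    """Return the index of the candidate vertex with the largest sum of pT^2."""
    return max(range(len(candidates)), key=lambda i: sum_pt_squared(candidates[i]))


# Hypothetical pT values (GeV) of the objects attached to three vertex candidates.
candidates = [
    [12.0, 8.0, 5.0],    # sum pT^2 = 233
    [45.0, 38.0, 22.0],  # sum pT^2 = 3953
    [6.0, 4.0],          # sum pT^2 = 52
]
primary_index = select_primary_vertex(candidates)
print(primary_index)
```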

While leptonic decay products of \({\uptau } \) leptons are selected by the algorithms described above, hadronic decays are reconstructed and identified by the “hadrons-plus-strips” (HPS) algorithm [74]. The algorithm is based on reconstructing individual hadronic decay modes of the \({\uptau } \) lepton: \({\uptau } ^{-} \rightarrow {{\mathrm{h}}_{\mathrm{}}^{\mathrm{}}} ^{-}{\upnu {}{}} _{{\uptau }}\), \({\uptau } ^{-} \rightarrow {{\mathrm{h}}_{\mathrm{}}^{\mathrm{}}} ^{-}{{\uppi {}{}} {}^{0}} {\upnu {}{}} _{{\uptau }}\), \({\uptau } ^{-} \rightarrow {{\mathrm{h}}_{\mathrm{}}^{\mathrm{}}} ^{-}{{\uppi {}{}} {}^{0}} {{\uppi {}{}} {}^{0}} {\upnu {}{}} _{{\uptau }}\), \({\uptau } ^{-} \rightarrow {{\mathrm{h}}_{\mathrm{}}^{\mathrm{}}} ^{-}{{\mathrm{h}}_{\mathrm{}}^{\mathrm{}}} ^{+}{{\mathrm{h}}_{\mathrm{}}^{\mathrm{}}} ^{-}{\upnu {}{}} _{{\uptau }}\), \({\uptau } ^{-} \rightarrow {{\mathrm{h}}_{\mathrm{}}^{\mathrm{}}} ^{-}{{\mathrm{h}}_{\mathrm{}}^{\mathrm{}}} ^{+}{{\mathrm{h}}_{\mathrm{}}^{\mathrm{}}} ^{-}{{\uppi {}{}} {}^{0}} {\upnu {}{}} _{{\uptau }}\), and all the charge-conjugate decays, where the symbols \({{\mathrm{h}}_{\mathrm{}}^{\mathrm{}}} ^{-}\) and \({{\mathrm{h}}_{\mathrm{}}^{\mathrm{}}} ^{+}\) denote either a charged pion or a charged kaon. The photons resulting from the decay of neutral pions that are produced in the \({\uptau } \) decay have a sizeable probability to convert into an electron-positron pair when traversing the silicon tracker. The conversions cause a broadening of energy deposits in the ECAL, since the electrons and positrons produced in these conversions are bent in opposite azimuthal directions by the magnetic field and may also emit bremsstrahlung photons. The HPS algorithm accounts for this broadening when it reconstructs the neutral pions, by means of clustering photons and electrons in rectangular strips that are narrow in \(\eta \) but wide in \(\phi \). 
The subsequent identification of \({\uptau } _\mathrm {h} \) candidates is performed by the “DeepTau” algorithm [83]. The algorithm is based on a convolutional ANN [84], using as input a set of 42 high-level observables in combination with low-level information obtained from the silicon tracker, the electromagnetic and hadronic calorimeters, and the muon detectors. The high-level observables comprise the \(p_{\mathrm {T}} \), \(\eta \), \(\phi \), and mass of the \({\uptau } _\mathrm {h} \) candidate; the reconstructed \({\uptau } _\mathrm {h} \) decay mode; observables that quantify the isolation of the \({\uptau } _\mathrm {h} \) with respect to charged and neutral particles; as well as observables that provide sensitivity to the small distance that a \({\uptau } \) lepton typically traverses between its production and decay. The low-level information quantifies the particle activity within two \(\eta \times \phi \) grids, an “inner” grid of size \(0.2 \times 0.2\), filled with cells of size \(0.02 \times 0.02\), and an “outer” grid of size \(0.5 \times 0.5\) (partially overlapping with the inner grid) and cells of size \(0.05 \times 0.05\). Both grids are centered on the direction of the \({\uptau } _\mathrm {h} \) candidate. The \({\uptau } _\mathrm {h} \) considered in the analysis are required to have \(p_{\mathrm {T}} > 20\,\text {GeV} \) and \(|\eta | < 2.3\) and to pass a selection on the output of the convolutional ANN. The selection differs by analysis channel, targeting different efficiency and purity levels. We refer to these as the very loose, loose, medium, and tight \({\uptau } _\mathrm {h} \) selections, depending on the requirement imposed on the ANN output.

Jets are reconstructed using the anti-\(k_{\mathrm {T}}\) algorithm [81, 82] with a distance parameter of 0.4 and with the particles reconstructed by the PF algorithm as inputs. Charged hadrons associated with PU vertices are excluded from the clustering. The energy of the reconstructed jets is corrected for residual PU effects using the method described in Refs. [85, 86] and calibrated as a function of jet \(p_{\mathrm {T}}\) and \(\eta \) [72]. The jets considered in the analysis are required to: satisfy \(p_{\mathrm {T}} > 25\,\text {GeV} \) and \(|\eta | < 5.0\); pass identification criteria that reject spurious jets arising from calorimeter noise [87]; and not overlap with any identified electron, muon or hadronic \({\uptau } \) within \(\varDelta R = \sqrt{\smash [b]{(\varDelta \eta )^2+(\varDelta \phi )^2}} < 0.4\). We tighten the requirement on the transverse momentum to the condition \(p_{\mathrm {T}} > 60\,\text {GeV} \) for jets reconstructed within the range \(2.7< |\eta | < 3.0\), to further reduce the effect of calorimeter noise, which is sizeable in this detector region. Jets passing these selection criteria are then categorized into central and forward jets, the former satisfying the condition \(|\eta | < 2.4\) and the latter \(2.4< |\eta | < 5.0\). The presence of a high-\(p_{\mathrm {T}}\) forward jet in the event is a characteristic signature of \({\mathrm{t}} {\mathrm{H}} \) production in the t-channel and is used to separate the \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) from the \({\mathrm{t}} {\mathrm{H}} \) process in the signal extraction stage of the analysis.
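The kinematic part of the jet selection above can be summarized in a short sketch. It covers only the \(p_{\mathrm {T}}\) and \(\eta \) requirements; the noise-rejection identification criteria and the overlap removal with leptons and \({\uptau } _\mathrm {h} \) are omitted. The function name `classify_jet` is hypothetical, and assigning a jet with exactly \(|\eta | = 2.4\) to the forward category is an assumption, since the text leaves that boundary unspecified.

```python
def classify_jet(pt, eta):
    """Return "central", "forward", or None if the jet fails the selection.

    pt is in GeV. Only the kinematic requirements quoted in the text
    are applied here.
    """
    abs_eta = abs(eta)
    # Baseline acceptance: pT > 25 GeV and |eta| < 5.0.
    if pt <= 25.0 or abs_eta >= 5.0:
        return None
    # Tighter threshold in the noisy region 2.7 < |eta| < 3.0.
    if 2.7 < abs_eta < 3.0 and pt <= 60.0:
        return None
    # Assumption: |eta| == 2.4 exactly is treated as forward.
    return "central" if abs_eta < 2.4 else "forward"


# Example jets as (pt, eta) pairs; the values are made up for illustration.
example_jets = [(30.0, 1.0), (30.0, 3.5), (40.0, 2.8), (70.0, 2.8)]
categories = [classify_jet(pt, eta) for pt, eta in example_jets]
```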

Jets reconstructed within the region \(|\eta | < 2.4\) and originating from the hadronization of bottom quarks are denoted as \({{\mathrm{b}}} \) jets and identified by the DeepJet algorithm [88]. The algorithm exploits observables related to the long lifetime of \({{\mathrm{b}}} \) hadrons as well as to the higher particle multiplicity and mass of \({{\mathrm{b}}} \) jets compared to light-quark and gluon jets. The properties of charged and neutral particle constituents of the jet, as well as of secondary vertices reconstructed within the jet, are used as inputs to a convolutional ANN. Two different selections on the output of the algorithm are employed in the analysis, corresponding to \({{\mathrm{b}}} \) jet selection efficiencies of 84% (“loose”) and 70% (“tight”). The respective mistag rates for light-quark and gluon jets (c jets) are 11% and 1.1% (50% and 15%).

The missing transverse momentum vector, denoted by the symbol \({\vec p}_{\mathrm {T}}^{\text {miss}} \), is computed as the negative of the vector \(p_{\mathrm {T}}\) sum of all particles reconstructed by the PF algorithm. The magnitude of this vector is denoted by the symbol \(p_{\mathrm {T}} ^\text {miss} \). The analysis employs a linear discriminant, denoted by the symbol \(L_{\mathrm {D}} \), to remove backgrounds in which the reconstructed \(p_{\mathrm {T}} ^\text {miss} \) arises from resolution effects. The discriminant also reduces PU effects and is defined by the relation \(L_{\mathrm {D}} = 0.6 p_{\mathrm {T}} ^\text {miss} + 0.4 H_{\mathrm {T}}^{\text {miss}} \), where the observable \(H_{\mathrm {T}}^{\text {miss}} \) corresponds to the magnitude of the vector \(p_{\mathrm {T}}\) sum of electrons, muons, \({\uptau } _\mathrm {h} \), and jets [23]. The discriminant is constructed to combine the higher resolution of \(p_{\mathrm {T}} ^\text {miss} \) with the robustness to PU of \(H_{\mathrm {T}}^{\text {miss}} \).
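As a numerical illustration of the definition above, the following sketch computes \(H_{\mathrm {T}}^{\text {miss}} \) from a list of selected objects and combines it with \(p_{\mathrm {T}} ^\text {miss} \). The (pT, phi) input format, the example values, and the function names are assumptions made for this example.

```python
import math


def vector_sum_pt(objects):
    """Magnitude of the vector pT sum of (pt, phi) pairs, in GeV."""
    px = sum(pt * math.cos(phi) for pt, phi in objects)
    py = sum(pt * math.sin(phi) for pt, phi in objects)
    return math.hypot(px, py)


def linear_discriminant(pt_miss, ht_miss):
    """L_D = 0.6 * pT^miss + 0.4 * HT^miss, all quantities in GeV."""
    return 0.6 * pt_miss + 0.4 * ht_miss


# Selected electrons, muons, tau_h, and jets of a made-up event.
selected_objects = [(40.0, 0.0), (30.0, math.pi / 2)]
ht_miss = vector_sum_pt(selected_objects)          # 50 GeV
l_d = linear_discriminant(pt_miss=60.0, ht_miss=ht_miss)  # 56 GeV
```

With these inputs the event would pass the nominal requirement \(L_{\mathrm {D}} > 30\,\text {GeV} \) used later in the event selection.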

5 Event selection

The analysis targets \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) and \({\mathrm{t}} {\mathrm{H}} \) production in events where the Higgs boson decays via \({\mathrm{H}} \rightarrow {\mathrm{W}} {\mathrm{W}} \), \({\mathrm{H}} \rightarrow {\uptau } {\uptau } \), or \({\mathrm{H}} \rightarrow {\mathrm{Z}} {\mathrm{Z}} \), with subsequent decays \({\mathrm{W}} {\mathrm{W}} \rightarrow \ell ^{+}{\upnu {}{}} _{\ell }{{\mathrm{q}}} {{\mathrm{q}}} '\) or \(\ell ^{+}{\upnu {}{}} _{\ell }\ell ^{-}{\overline{{{\upnu {}{}}}}{}{}} _{\ell }\); \({\uptau } {\uptau } \rightarrow \ell ^{+}{\upnu {}{}} _{\ell }{\overline{{{\upnu {}{}}}}{}{}} _{{\uptau }}\ell ^{-}{\overline{{{\upnu {}{}}}}{}{}} _{\ell }{\upnu {}{}} _{{\uptau }}\), \(\ell ^{+}{\upnu {}{}} _{\ell }{\overline{{{\upnu {}{}}}}{}{}} _{{\uptau }}{\uptau } _\mathrm {h} {\upnu {}{}} _{{\uptau }}\), or \({\uptau } _\mathrm {h} {\overline{{{\upnu {}{}}}}{}{}} _{{\uptau }}{\uptau } _\mathrm {h} {\upnu {}{}} _{{\uptau }}\); \({\mathrm{Z}} {\mathrm{Z}} \rightarrow \ell ^{+}\ell ^{-}{{\mathrm{q}}} {{\mathrm{q}}} '\) or \(\ell ^{+}\ell ^{-}{\upnu {}{}} {\overline{{{\upnu {}{}}}}{}{}} \); and the corresponding charge-conjugate decays. The decays \({\mathrm{H}} \rightarrow {\mathrm{Z}} {\mathrm{Z}} \rightarrow \ell ^{+}\ell ^{-}\ell ^{+}\ell ^{-}\) are covered by the analysis published in Ref. [20]. The top quark may decay either semi-leptonically via \({\mathrm{t}} \rightarrow {{\mathrm{b}}} {\mathrm{W}} ^{+} \rightarrow {{\mathrm{b}}} \ell ^{+}{\upnu {}{}} _{\ell }\) or hadronically via \({\mathrm{t}} \rightarrow {{\mathrm{b}}} {\mathrm{W}} ^{+} \rightarrow {{\mathrm{b}}} {{\mathrm{q}}} {{\mathrm{q}}} '\), and analogously for the top antiquarks. 
The experimental signature of \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) and \({\mathrm{t}} {\mathrm{H}} \) signal events consists of: multiple electrons, muons, and \({\uptau } _\mathrm {h} \); \(p_{\mathrm {T}} ^\text {miss} \) caused by the neutrinos produced in \({\mathrm{W}} \) boson, \({\mathrm{Z}} \) boson, and tau lepton decays; one (\({\mathrm{t}} {\mathrm{H}} \)) or two (\({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \)) \({{\mathrm{b}}} \) jets from top quark decays; and further light-quark jets, produced in the decays of either the Higgs boson or the top quark(s).

The events considered in the analysis are selected in ten nonoverlapping channels, targeting the signatures \(2\ell {\mathrm {SS}}+ 0{\uptau } _\mathrm {h} \), \(3\ell + 0{\uptau } _\mathrm {h} \), \(2\ell {\mathrm {SS}}+ 1{\uptau } _\mathrm {h} \), \(1\ell + 1{\uptau } _\mathrm {h} \), \(0\ell + 2{\uptau } _\mathrm {h} \), \(2\ell {\mathrm {OS}}+ 1{\uptau } _\mathrm {h} \), \(1\ell + 2{\uptau } _\mathrm {h} \), \(4\ell + 0{\uptau } _\mathrm {h} \), \(3\ell + 1{\uptau } _\mathrm {h} \), and \(2\ell + 2{\uptau } _\mathrm {h} \), as stated earlier. The channels \(1\ell + 1{\uptau } _\mathrm {h} \) and \(0\ell + 2{\uptau } _\mathrm {h} \) specifically target events in which the Higgs boson decays via \({\mathrm{H}} \rightarrow {\uptau } {\uptau } \) and the top quarks decay hadronically, whereas the other channels target a mixture of \({\mathrm{H}} \rightarrow {\mathrm{W}} {\mathrm{W}} \), \({\mathrm{H}} \rightarrow {\uptau } {\uptau } \), and \({\mathrm{H}} \rightarrow {\mathrm{Z}} {\mathrm{Z}} \) decays in events with either one or two semi-leptonically decaying top quarks.

Events are selected at the trigger level using a combination of single-, double-, and triple-lepton triggers, lepton\(+{\uptau } _\mathrm {h} \) triggers, and double-\({\uptau } _\mathrm {h} \) triggers. Spurious triggers are discarded by demanding that electrons, muons, and \({\uptau } _\mathrm {h} \) reconstructed at the trigger level match electrons, muons, and \({\uptau } _\mathrm {h} \) reconstructed offline. The \(p_{\mathrm {T}}\) thresholds of the triggers typically vary by a few GeV during different data-taking periods, depending on the instantaneous luminosity. For example, the threshold of the single-electron trigger ranges between 25 and 35\(\,\text {GeV}\) in the analyzed data set, and that of the single-muon trigger varies between 22 and 27\(\,\text {GeV}\). The double-lepton (triple-lepton) triggers reduce the \(p_{\mathrm {T}}\) threshold that is applied to the lepton of highest \(p_{\mathrm {T}}\) to 23 (16)\(\,\text {GeV}\) in case this lepton is an electron and to 17 (8)\(\,\text {GeV}\) in case it is a muon. The electron\(+{\uptau } _\mathrm {h} \) (muon\(+{\uptau } _\mathrm {h} \)) trigger requires the presence of an electron of \(p_{\mathrm {T}} > 24\,\text {GeV} \) (muon of \(p_{\mathrm {T}} > 19\) or 20\(\,\text {GeV}\)) in combination with a \({\uptau } _\mathrm {h} \) of \(p_{\mathrm {T}} > 20\) or 30\(\,\text {GeV}\) (\(p_{\mathrm {T}} > 20\) or 27\(\,\text {GeV}\)), where the lower \(p_{\mathrm {T}}\) thresholds were used in 2016 and the higher ones in 2017 and 2018. The threshold of the double-\({\uptau } _\mathrm {h} \) trigger ranges between 35 and 40\(\,\text {GeV}\) and is applied to both \({\uptau } _\mathrm {h} \). In order to attain these \(p_{\mathrm {T}}\) thresholds, the geometric acceptance of the lepton\(+{\uptau } _\mathrm {h} \) and double-\({\uptau } _\mathrm {h} \) triggers is restricted to the range \(|\eta | < 2.1\) for electrons, muons, and \({\uptau } _\mathrm {h} \). 
The \(p_{\mathrm {T}}\) thresholds applied to electrons, muons, and \({\uptau } _\mathrm {h} \) in the offline event selection are chosen above the trigger thresholds.

The charge of leptons and \({\uptau } _\mathrm {h} \) is required to match the signature expected for the \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) and \({\mathrm{t}} {\mathrm{H}} \) signals. The \(0\ell + 2{\uptau } _\mathrm {h} \) and \(1\ell + 2{\uptau } _\mathrm {h} \) channels target events where the Higgs boson decays to a \({\uptau } \) lepton pair and both \({\uptau } \) leptons decay hadronically. Consequently, the two \({\uptau } _\mathrm {h} \) are required to have \({\mathrm {OS}}\) charges in these channels. In events selected in the channels \(4\ell + 0{\uptau } _\mathrm {h} \), \(3\ell + 1{\uptau } _\mathrm {h} \), and \(2\ell + 2{\uptau } _\mathrm {h} \), the leptons and \({\uptau } _\mathrm {h} \) are expected to originate from either the Higgs boson decay or from the decay of the top quark–antiquark pair and the sum of their charges is required to be zero. In the \(3\ell + 0{\uptau } _\mathrm {h} \), \(2\ell {\mathrm {SS}}+ 1{\uptau } _\mathrm {h} \), \(2\ell {\mathrm {OS}}+ 1{\uptau } _\mathrm {h} \), and \(1\ell + 2{\uptau } _\mathrm {h} \) channels the charge-sum of leptons plus \({\uptau } _\mathrm {h} \) is required to be either \(+1\) or \(-1\). No requirement on the charge of the lepton and of the \({\uptau } _\mathrm {h} \) is applied in the \(1\ell + 1{\uptau } _\mathrm {h} \) channel, because studies performed with simulated samples of signal and background events indicate that the sensitivity of this channel is higher when no charge requirement is applied. The \(2\ell {\mathrm {SS}}+ 0{\uptau } _\mathrm {h} \) channel targets events in which one lepton originates from the decay of the Higgs boson and the other lepton from a top quark decay. 
Requiring \({\mathrm {SS}}\) leptons reduces the signal yield by about half, but increases the signal-to-background ratio by a large factor by removing in particular the large background arising from \({{\mathrm{t}} {}{{\overline{{{\mathrm{t}}}}}}} \)+jets production with dileptonic decays of the top quarks. The more favorable signal-to-background ratio for events with SS, rather than OS, lepton pairs motivates the choice of analyzing the events containing two leptons and one \({\uptau } _\mathrm {h} \) separately, in the two channels \(2\ell {\mathrm {SS}}+ 1{\uptau } _\mathrm {h} \) and \(2\ell {\mathrm {OS}}+ 1{\uptau } _\mathrm {h} \).
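The charge requirements of the ten channels can be collected into a single function, as sketched below. The channel labels (for example `"2lss_0tau"`) and the function name are hypothetical. The same-sign and opposite-sign lepton requirements of the \(2\ell {\mathrm {SS}}\) and \(2\ell {\mathrm {OS}}\) channels are inferred here from the channel names rather than from an explicit statement in this section.

```python
def passes_charge_requirement(channel, lepton_charges, tau_charges):
    """Check the charge requirement of one channel.

    lepton_charges and tau_charges are lists of +1 or -1 values for the
    selected leptons and tau_h candidates.
    """
    total = sum(lepton_charges) + sum(tau_charges)
    # The two tau_h must have opposite-sign charges.
    if channel in ("0l_2tau", "1l_2tau") and sum(tau_charges) != 0:
        return False
    # Charge sum of leptons plus tau_h must vanish.
    if channel in ("4l_0tau", "3l_1tau", "2l_2tau"):
        return total == 0
    # Charge sum must be +1 or -1.
    if channel in ("3l_0tau", "2lss_1tau", "2los_1tau", "1l_2tau"):
        if abs(total) != 1:
            return False
    # Assumption from the channel names: SS or OS lepton pair.
    if channel in ("2lss_0tau", "2lss_1tau"):
        return abs(sum(lepton_charges)) == 2
    if channel == "2los_1tau":
        return sum(lepton_charges) == 0
    # The 1l+1tau_h channel applies no charge requirement.
    return True


ok_2lss = passes_charge_requirement("2lss_1tau", [+1, +1], [-1])
```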

The selection criteria on \({{\mathrm{b}}} \) jets are designed to maintain a high efficiency for the \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) signal: one \({{\mathrm{b}}} \) jet can be outside of the \(p_{\mathrm {T}}\) and \(\eta \) acceptance of the jet selection or can fail the \({{\mathrm{b}}} \) tagging criteria, provided that the other \({{\mathrm{b}}} \) jet passes the tight \({{\mathrm{b}}} \) tagging criteria. This choice is motivated by the observation that the main background contributions, arising from the associated production of single top quarks or top quark pairs with \({\mathrm{W}} \) and \({\mathrm{Z}} \) bosons, photons, and jets, feature genuine \({{\mathrm{b}}} \) jets with a multiplicity resembling that of the \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) and \({\mathrm{t}} {\mathrm{H}} \) signals.

The requirements on the overall multiplicity of jets, including \({{\mathrm{b}}} \) jets, take advantage of the fact that the multiplicity of jets is typically higher in signal events compared to the background. The total number of jets expected in \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) (\({\mathrm{t}} {\mathrm{H}} \)) signal events with the \({\mathrm{H}} \) boson decaying into \({\mathrm{W}} {\mathrm{W}} \), \({\mathrm{Z}} {\mathrm{Z}} \), and \({\uptau } {\uptau } \) amounts to \(N_{\mathrm {j}} = 10 - 2 N_{\ell } - 2 N_{{\uptau }}\) (\(N_{\mathrm {j}} = 7 - 2 N_{\ell } - 2 N_{{\uptau }}\)), where \(N_{\mathrm {j}} \), \(N_{\ell }\) and \(N_{{\uptau }}\) denote the total number of jets, electrons or muons, and hadronic \({\uptau } \) decays, respectively. The requirements on \(N_{\mathrm {j}} \) applied in each channel permit up to two jets to be outside of the \(p_{\mathrm {T}}\) and \(\eta \) acceptance of the jet selection. In the \(2\ell {\mathrm {SS}}+ 0{\uptau } _\mathrm {h} \) channel, the requirement on \(N_{\mathrm {j}} \) is relaxed further, to increase the signal efficiency in particular for the \({\mathrm{t}} {\mathrm{H}} \) process.
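The jet-counting relations above are easy to evaluate per channel. The sketch below does so for a few channels; the function name `expected_jets` and the process labels are chosen for this example only.

```python
def expected_jets(n_leptons, n_taus, process="ttH"):
    """Expected jet multiplicity N_j for H -> WW, ZZ, or tautau decays.

    Implements N_j = 10 - 2 N_l - 2 N_tau for ttH and
    N_j = 7 - 2 N_l - 2 N_tau for tH.
    """
    offset = {"ttH": 10, "tH": 7}[process]
    return offset - 2 * n_leptons - 2 * n_taus


# A few channels as (N_l, N_tau) pairs.
channels = {"2lSS+0tau": (2, 0), "3l+0tau": (3, 0), "1l+2tau": (1, 2)}
expected = {
    name: (expected_jets(nl, nt, "ttH"), expected_jets(nl, nt, "tH"))
    for name, (nl, nt) in channels.items()
}
```

For the \(2\ell {\mathrm {SS}}+ 0{\uptau } _\mathrm {h} \) channel this gives six jets for \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) and three for \({\mathrm{t}} {\mathrm{H}} \), which illustrates why a relaxed \(N_{\mathrm {j}} \) requirement benefits the \({\mathrm{t}} {\mathrm{H}} \) process in particular.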

Background contributions arising from \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{Z}} \), \({\mathrm{t}} {\mathrm{Z}} \), \({\mathrm{W}} {\mathrm{Z}} \), and DY production are suppressed by vetoing events containing \({\mathrm {OS}}\) pairs of leptons of the same flavor, referred to as SFOS lepton pairs, passing the loose lepton selection criteria and having an invariant mass \(m_{\ell \ell }\) within 10\(\,\text {GeV}\) of the \({\mathrm{Z}} \) boson mass, \(m_{{\mathrm{Z}}} = 91.19\,\text {GeV} \) [8]. We refer to this selection criterion as the “\({\mathrm{Z}} \) boson veto”. In the \(2\ell {\mathrm {SS}}+ 0{\uptau } _\mathrm {h} \) and \(2\ell {\mathrm {SS}}+ 1{\uptau } _\mathrm {h} \) channels, the \({\mathrm{Z}} \) boson veto is also applied to \({\mathrm {SS}}\) electron pairs, because the probability to mismeasure the charge of electrons is significantly higher than the corresponding probability for muons.
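A minimal sketch of the \({\mathrm{Z}} \) boson veto follows. It assumes massless leptons, for which \(m_{\ell \ell }^2 = 2 p_{\mathrm {T},1} p_{\mathrm {T},2} (\cosh \varDelta \eta - \cos \varDelta \phi )\), and represents each loose lepton as a dictionary. The data layout, the function names, and the `veto_ss_electrons` flag are hypothetical.

```python
import math
from itertools import combinations

M_Z = 91.19        # GeV, value quoted in the text
Z_WINDOW = 10.0    # GeV


def dilepton_mass(l1, l2):
    """Invariant mass of two leptons, neglecting the lepton masses."""
    m2 = 2.0 * l1["pt"] * l2["pt"] * (
        math.cosh(l1["eta"] - l2["eta"]) - math.cos(l1["phi"] - l2["phi"])
    )
    return math.sqrt(max(m2, 0.0))


def fails_z_veto(leptons, veto_ss_electrons=False):
    """Return True if the event is rejected by the Z boson veto.

    Set veto_ss_electrons=True for the 2lSS channels, where same-sign
    electron pairs are checked as well.
    """
    for l1, l2 in combinations(leptons, 2):
        same_flavor = l1["flavor"] == l2["flavor"]
        opposite_sign = l1["charge"] * l2["charge"] < 0
        sfos = same_flavor and opposite_sign
        ss_ee = (veto_ss_electrons and same_flavor
                 and l1["flavor"] == "e" and not opposite_sign)
        if (sfos or ss_ee) and abs(dilepton_mass(l1, l2) - M_Z) < Z_WINDOW:
            return True
    return False


def lepton(pt, eta, phi, charge, flavor):
    return {"pt": pt, "eta": eta, "phi": phi, "charge": charge, "flavor": flavor}


# Back-to-back muons with m_ll equal to the Z boson mass.
z_like = [lepton(45.595, 0.0, 0.0, +1, "mu"), lepton(45.595, 0.0, math.pi, -1, "mu")]
rejected = fails_z_veto(z_like)
```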

Background contributions arising from DY production in the \(2\ell {\mathrm {SS}}+ 0{\uptau } _\mathrm {h} \), \(3\ell + 0{\uptau } _\mathrm {h} \), \(2\ell {\mathrm {SS}}+ 1{\uptau } _\mathrm {h} \), \(4\ell + 0{\uptau } _\mathrm {h} \), \(3\ell + 1{\uptau } _\mathrm {h} \), and \(2\ell + 2{\uptau } _\mathrm {h} \) channels are further reduced by imposing a requirement on the linear discriminant, \(L_{\mathrm {D}} > 30\,\text {GeV} \). The requirement on \(L_{\mathrm {D}} \) is relaxed or tightened, depending on whether or not the event meets certain conditions, in order to either increase the efficiency to select \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) and \({\mathrm{t}} {\mathrm{H}} \) signal events or to reject more background. In the \(2\ell {\mathrm {SS}}+ 0{\uptau } _\mathrm {h} \) and \(2\ell {\mathrm {SS}}+ 1{\uptau } _\mathrm {h} \) channels, the requirement on \(L_{\mathrm {D}} \) is only applied to events where both reconstructed leptons are electrons, to suppress the contribution of DY production entering the selection through a mismeasurement of the electron charge. In the \(3\ell + 0{\uptau } _\mathrm {h} \), \(4\ell + 0{\uptau } _\mathrm {h} \), \(3\ell + 1{\uptau } _\mathrm {h} \), and \(2\ell + 2{\uptau } _\mathrm {h} \) channels, the distribution of \(N_{\mathrm {j}} \) is steeply falling for the DY background, thus rendering the expected contribution of this background small if the event contains a high number of jets; we take advantage of this fact by applying the requirement on \(L_{\mathrm {D}} \) only to events with three or fewer jets. If events with \(N_{\mathrm {j}} \le 3\) contain an SFOS lepton pair, the requirement on \(L_{\mathrm {D}} \) is tightened to the condition \(L_{\mathrm {D}} > 45\,\text {GeV} \). 
Events considered in the \(3\ell + 0{\uptau } _\mathrm {h} \), \(4\ell + 0{\uptau } _\mathrm {h} \), \(3\ell + 1{\uptau } _\mathrm {h} \), and \(2\ell + 2{\uptau } _\mathrm {h} \) channels containing three or fewer jets and no SFOS lepton pair are required to satisfy the nominal condition \(L_{\mathrm {D}} > 30\,\text {GeV} \).
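The channel-dependent logic of the \(L_{\mathrm {D}} \) requirement can be written compactly, as in the sketch below. The channel labels and the function signature are hypothetical; the thresholds of 30 and 45\(\,\text {GeV}\) and the conditions under which they apply are those given in the text.

```python
SS_CHANNELS = ("2lss_0tau", "2lss_1tau")
MULTILEPTON_CHANNELS = ("3l_0tau", "4l_0tau", "3l_1tau", "2l_2tau")


def passes_ld_requirement(channel, l_d, n_jets=0, has_sfos_pair=False,
                          both_electrons=False):
    """Apply the L_D requirement (in GeV) for the given channel."""
    if channel in SS_CHANNELS:
        # Only applied to events in which both leptons are electrons.
        return l_d > 30.0 if both_electrons else True
    if channel in MULTILEPTON_CHANNELS:
        # Only applied to events with three or fewer jets.
        if n_jets > 3:
            return True
        # Tightened threshold if an SFOS lepton pair is present.
        threshold = 45.0 if has_sfos_pair else 30.0
        return l_d > threshold
    # No L_D requirement in the remaining channels.
    return True


decision = passes_ld_requirement("3l_0tau", l_d=40.0, n_jets=3, has_sfos_pair=True)
```

In the example, an event in the \(3\ell + 0{\uptau } _\mathrm {h} \) channel with three jets, an SFOS lepton pair, and \(L_{\mathrm {D}} = 40\,\text {GeV} \) is rejected, whereas the same event without an SFOS pair would be accepted.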

Events containing a pair of leptons passing the loose selection criteria and having an invariant mass \(m_{\ell \ell }\) of less than 12\(\,\text {GeV}\) are vetoed, to remove events in which the leptons originate from quarkonium decays, cascade decays of heavy-flavor hadrons, and low-mass DY production, because such events are not well modeled by the MC simulation.

In the \(3\ell + 0{\uptau } _\mathrm {h} \) and \(4\ell + 0{\uptau } _\mathrm {h} \) channels, events containing four leptons that pass the loose selection criteria and have an invariant mass \(m_{4\ell }\) of the four-lepton system of less than 140\(\,\text {GeV}\) are vetoed, to remove \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) and \({\mathrm{t}} {\mathrm{H}} \) signal events in which the Higgs boson decays via \({\mathrm{H}} \rightarrow {\mathrm{Z}} {\mathrm{Z}} \rightarrow \ell ^{+}\ell ^{-}\ell ^{+}\ell ^{-}\), thereby avoiding overlap with the analysis published in Ref. [20].

A summary of the event selection criteria applied in the different channels is given in Tables 2, 3 and 4.

Table 2 Event selections applied in the \(2\ell {\mathrm {SS}}+ 0{\uptau } _\mathrm {h} \), \(2\ell {\mathrm {SS}}+ 1{\uptau } _\mathrm {h} \), \(3\ell + 0{\uptau } _\mathrm {h} \), and \(3\ell + 1{\uptau } _\mathrm {h} \) channels. The \(p_{\mathrm {T}}\) thresholds applied to the lepton of highest, second-highest, and third-highest \(p_{\mathrm {T}}\) are separated by slashes. The symbol “–” indicates that no requirement is applied
Table 3 Event selections applied in the \(0\ell + 2{\uptau } _\mathrm {h} \), \(1\ell + 1{\uptau } _\mathrm {h} \), \(1\ell + 2{\uptau } _\mathrm {h} \), and \(2\ell + 2{\uptau } _\mathrm {h} \) channels. The \(p_{\mathrm {T}}\) thresholds applied to the lepton and to the \({\uptau } _\mathrm {h} \) of highest and second-highest \(p_{\mathrm {T}}\) are separated by slashes. The symbol “–” indicates that no requirement is applied
Table 4 Event selections applied in the \(2\ell {\mathrm {OS}}+ 1{\uptau } _\mathrm {h} \) and \(4\ell + 0{\uptau } _\mathrm {h} \) channels. The symbol “–” indicates that no requirement is applied

6 Event classification, signal extraction, and analysis strategy

Contributions from background processes that pass the event selection criteria detailed in Sect. 5 significantly exceed the expected \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) and \({\mathrm{t}} {\mathrm{H}} \) signal rates. The ratio of expected signal to background yields is particularly unfavorable in channels with a low multiplicity of leptons and \({\uptau } _\mathrm {h} \), notwithstanding that these channels also provide the highest acceptance for the \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) and \({\mathrm{t}} {\mathrm{H}} \) signals. In order to separate the \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) and \({\mathrm{t}} {\mathrm{H}} \) signals from the background contributions, we employ a maximum-likelihood (ML) fit to the distributions of a number of discriminating observables. The choice of these observables is based on studies, performed with simulated samples of signal and background events, that aim at maximizing the expected sensitivity of the analysis. Compared to the alternative of reducing the background by applying more stringent event selection criteria, the chosen strategy has the advantage of retaining events reconstructed in kinematic regions of low signal-to-background ratio for analysis. Even though these events enter the ML fit with a lower “weight” compared to the signal events reconstructed in kinematic regions where the signal-to-background ratio is high, the retained events increase the overall sensitivity of the statistical analysis, firstly by increasing the overall \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) and \({\mathrm{t}} {\mathrm{H}} \) signal yield and secondly by simultaneously constraining the background contributions. The likelihood function used in the ML fit is described in Sect. 9. The diagram displayed in Fig. 3 describes the classification employed in each of the categories, which defines the regions that are fitted in the signal extraction fit.

Fig. 3

Diagram showing the categorization strategy used for the signal extraction, making use of MVA-based algorithms and topological variables. In addition to the ten channels, the ML fit receives input from two control regions (CRs) defined in Sect. 7.3

The chosen discriminating observables are the outputs of machine-learning algorithms that are trained using simulated samples of \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) and \({\mathrm{t}} {\mathrm{H}} \) signal events as well as \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{W}} \), \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{Z}} \), \({{\mathrm{t}} {}{{\overline{{{\mathrm{t}}}}}}} \)+jets, and diboson background samples. For the purpose of separating the \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) and \({\mathrm{t}} {\mathrm{H}} \) signals from backgrounds, the \(2\ell {\mathrm {SS}}+ 0{\uptau } _\mathrm {h} \), \(3\ell + 0{\uptau } _\mathrm {h} \), and \(2\ell {\mathrm {SS}}+ 1{\uptau } _\mathrm {h} \) channels employ ANNs, which make it possible to discriminate between the two signals and the background simultaneously, while the other channels use BDTs.

The observables used as input to the ANNs and BDTs are outlined in Table 5. These are chosen to maximize the discrimination power of the discriminators, with the objective of maximizing the expected sensitivity of the analysis. The optimization is performed separately for each of the ten analysis channels. Typical observables used are: the number of leptons, \({\uptau } _\mathrm {h} \), and jets that are reconstructed in the event, where electrons and muons, as well as forward jets, central jets, and jets passing the loose and the tight \({{\mathrm{b}}} \) tagging criteria are counted separately; the 3-momentum of leptons, \({\uptau } _\mathrm {h} \), and jets; the magnitude of the missing transverse momentum, quantified by the linear discriminant \(L_{\mathrm {D}} \); the angular separation between leptons, \({\uptau } _\mathrm {h} \), and jets; the average \(\varDelta R\) separation between pairs of jets; the sum of charges for different combinations of leptons and \({\uptau } _\mathrm {h} \); observables related to the reconstruction of specific top quark and Higgs boson decay modes; as well as a few other observables that provide discrimination between the \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) and \({\mathrm{t}} {\mathrm{H}} \) signals. A boolean variable that indicates whether the event has an SFOS lepton pair passing looser isolation criteria is included in regions with at least three leptons in the final state.

Table 5 Input variables to the multivariate discriminants in each of the ten analysis channels. The symbol “–” indicates that the variable is not used. For all objects, the three-momentum is constituted by the \(p_{\mathrm {T}}\), \(\eta \), and \(\phi \) components of the object momentum

Input variables related to the reconstruction of specific top quark and Higgs boson decay modes comprise the transverse mass of a given lepton, \(m_{\mathrm {T}} = \sqrt{\smash [b]{2 p_{\mathrm {T}} ^{\ell } p_{\mathrm {T}} ^\text {miss} \left( 1 - \cos \varDelta \phi \right) }}\), where \(\varDelta \phi \) refers to the angle in the transverse plane between the lepton momentum and the \({\vec p}_{\mathrm {T}}^{\text {miss}} \) vector; the invariant masses of different combinations of leptons and \({\uptau } _\mathrm {h} \); and the invariant mass of the pair of jets with the highest and second-highest values of the \({{\mathrm{b}}} \) tagging discriminant. These observables are complemented by the outputs of MVA-based algorithms, documented in Ref. [23], that reconstruct hadronic top quark decays and identify the jets originating from \({\mathrm{H}} \rightarrow {\mathrm{W}} {\mathrm{W}} \rightarrow \ell ^{+}{\upnu {}{}} _{\ell }{{\mathrm{q}}} {{\overline{{{{\mathrm{q}}}}}}} '\) decays.
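The transverse mass formula can be checked with a short numerical example. The function name and the input values below are chosen for illustration only.

```python
import math


def transverse_mass(pt_lepton, pt_miss, delta_phi):
    """m_T = sqrt(2 pT^l pT^miss (1 - cos(delta_phi))), in GeV."""
    return math.sqrt(2.0 * pt_lepton * pt_miss * (1.0 - math.cos(delta_phi)))


# Lepton and missing pT back to back: m_T reaches its maximum, 2*sqrt(40*40).
m_t_back_to_back = transverse_mass(40.0, 40.0, math.pi)
# Lepton and missing pT collinear: m_T vanishes.
m_t_collinear = transverse_mass(40.0, 40.0, 0.0)
```

For equal lepton and missing transverse momenta of 40\(\,\text {GeV}\), \(m_{\mathrm {T}}\) ranges from 0 for collinear configurations to 80\(\,\text {GeV}\) for back-to-back configurations.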

In the \(0\ell + 2{\uptau } _\mathrm {h} \) channel, we use as additional inputs the invariant mass of the \({\uptau } \) lepton pair, which is expected to be close to the Higgs boson mass in signal events and is reconstructed using the algorithm documented in Ref. [89] (SVFit), in conjunction with the decay angle, denoted by \(\cos \theta ^{*}\), of the two tau leptons in the Higgs boson rest frame.

In the \(2\ell {\mathrm {SS}}+ 0{\uptau } _\mathrm {h} \), \(3\ell + 0{\uptau } _\mathrm {h} \), and \(2\ell {\mathrm {SS}}+ 1{\uptau } _\mathrm {h} \) channels, the \(p_{\mathrm {T}}\) and \(\eta \) of the forward jet of highest \(p_{\mathrm {T}}\), as well as the distance \(\varDelta \eta \) of this jet to the jet nearest in pseudorapidity, are used as additional inputs to the ANN, in order to improve the separation of the \({\mathrm{t}} {\mathrm{H}} \) from the \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) signal. The presence of such a jet is a characteristic signature of \({\mathrm{t}} {\mathrm{H}} \) production in the t-channel. The forward jet in such \({\mathrm{t}} {\mathrm{H}} \) signal events is expected to be separated from other jets in the event by a pseudorapidity gap, since there is no color flow at tree level between this jet and the jets originating from the top quark and Higgs boson decays.
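The forward-jet inputs described above can be derived from a list of selected jets as in the sketch below. Jets are represented as (pT, eta) pairs, the forward region is taken as \(2.4 \le |\eta | < 5.0\) following the jet categorization of the analysis, and the function name is hypothetical. Treating "the jet nearest in pseudorapidity" as the minimum \(|\varDelta \eta |\) over all other selected jets is an assumption of this example.

```python
def forward_jet_inputs(jets):
    """Return (pt, eta, delta_eta) for the leading forward jet, or None.

    jets is a list of (pt, eta) pairs. delta_eta is the distance to the
    jet nearest in pseudorapidity, or None if there is no other jet.
    """
    forward = [jet for jet in jets if 2.4 <= abs(jet[1]) < 5.0]
    if not forward:
        return None
    leading = max(forward, key=lambda jet: jet[0])
    others = [jet for jet in jets if jet is not leading]
    delta_eta = min((abs(leading[1] - eta) for _, eta in others), default=None)
    return leading[0], leading[1], delta_eta


# Made-up event with two central and two forward jets.
event_jets = [(80.0, 0.3), (45.0, -1.1), (60.0, 3.2), (30.0, -2.9)]
inputs = forward_jet_inputs(event_jets)
```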

The number of simulated signal and background events that pass the event selection criteria described in Sect. 5 and are available for training the BDTs and ANNs typically amounts to a few thousand. In order to increase the number of events in the training samples, in particular for the channels with a high multiplicity of leptons and \({\uptau } _\mathrm {h} \) where the number of available events is most limited, we relax the identification criteria for electrons, muons, and hadronically decaying tau leptons. The resulting increase in the ratio of misidentified to genuine leptons and \({\uptau } _\mathrm {h} \) is corrected. We have checked that the distributions of the observables used for the BDT and ANN training are compatible, within statistical uncertainties, between events selected with relaxed and with nominal lepton and \({\uptau } _\mathrm {h} \) selection criteria, provided that these corrections are applied.

The ANNs used in the \(2\ell {\mathrm {SS}}+ 0{\uptau } _\mathrm {h} \), \(3\ell + 0{\uptau } _\mathrm {h} \), and \(2\ell {\mathrm {SS}}+ 1{\uptau } _\mathrm {h} \) channels are of the multiclass type. Such ANNs have multiple output nodes that, besides discriminating the \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) and \({\mathrm{t}} {\mathrm{H}} \) signals from backgrounds, accomplish both the separation of the \({\mathrm{t}} {\mathrm{H}} \) from the \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) signal and the distinction between individual types of backgrounds. In the \(2\ell {\mathrm {SS}}+ 0{\uptau } _\mathrm {h} \) channel, we use four output nodes, to distinguish between \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) signal, \({\mathrm{t}} {\mathrm{H}} \) signal, \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{W}} \) background, and other backgrounds. No attempt is made to distinguish between individual types of backgrounds in the \(3\ell + 0{\uptau } _\mathrm {h} \) and \(2\ell {\mathrm {SS}}+ 1{\uptau } _\mathrm {h} \) channels, which therefore use three output nodes. The ANNs in the \(2\ell {\mathrm {SS}}+ 0{\uptau } _\mathrm {h} \), \(3\ell + 0{\uptau } _\mathrm {h} \), and \(2\ell {\mathrm {SS}}+ 1{\uptau } _\mathrm {h} \) channels implement 16, 5 and 3 hidden layers, respectively, each one of them containing 8 to 32 neurons. 
The softmax [90] function is chosen as the activation function for all output nodes, permitting the interpretation of their activation values as the probability for a given event to be either \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) signal, \({\mathrm{t}} {\mathrm{H}} \) signal, \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{W}} \) background, or other background (\({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) signal, \({\mathrm{t}} {\mathrm{H}} \) signal, or background) in the \(2\ell {\mathrm {SS}}+ 0{\uptau } _\mathrm {h} \) channel (in the \(3\ell + 0{\uptau } _\mathrm {h} \) and \(2\ell {\mathrm {SS}}+ 1{\uptau } _\mathrm {h} \) channels). The events selected in the \(2\ell {\mathrm {SS}}+ 0{\uptau } _\mathrm {h} \) channel (\(3\ell + 0{\uptau } _\mathrm {h} \) and \(2\ell {\mathrm {SS}}+ 1{\uptau } _\mathrm {h} \) channels) are classified into four (three) categories, corresponding to the \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) signal, \({\mathrm{t}} {\mathrm{H}} \) signal, \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{W}} \) background, or other background (\({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) signal, \({\mathrm{t}} {\mathrm{H}} \) signal, or background), according to the output node that has the highest such probability value. We refer to these categories as ANN output node categories. The four (three) distributions of the probability values of the output nodes in the \(2\ell {\mathrm {SS}}+ 0{\uptau } _\mathrm {h} \) channel (in the \(3\ell + 0{\uptau } _\mathrm {h} \) and \(2\ell {\mathrm {SS}}+ 1{\uptau } _\mathrm {h} \) channels) are used as input to the ML fit. Events are prevented from entering more than one of these distributions by assigning each event only to the distribution corresponding to the output node that has the highest activation value. The rectified linear activation function [91] is used for the hidden layers. 
The training is performed using the TensorFlow [92] package with the Keras [93] interface. The objective of the training is to minimize the cross-entropy loss function [94]. Batch gradient descent is used to update the weights of the ANN during the training. Overtraining is minimized by using Tikhonov regularization [95] and dropout [96].
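As a minimal illustration of how the softmax outputs are turned into ANN output node categories, the following Python sketch normalizes a set of raw output-node values and assigns the event to the node with the highest probability. The node labels, the function names, and the example input values are illustrative assumptions; the sketch does not reproduce the TensorFlow/Keras implementation used in the analysis.

```python
import math

# Output-node labels of the 2lSS + 0tau_h channel (four nodes), following the text.
NODES_2LSS = ("ttH", "tH", "ttW", "other")


def softmax(values):
    """Map raw output-node values to probabilities that sum to one."""
    shift = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def assign_category(values, nodes=NODES_2LSS):
    """Return (category, probability) of the node with the highest probability."""
    probs = softmax(values)
    best = max(range(len(probs)), key=probs.__getitem__)
    return nodes[best], probs[best]


# Hypothetical raw node values for a single event.
category, prob = assign_category([0.2, 1.5, -0.3, 0.1])
print(category, round(prob, 3))
```

Because each event is assigned to exactly one node, the per-node distributions entering the fit are filled with disjoint sets of events.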

The sensitivity of the \(2\ell {\mathrm {SS}}+ 0{\uptau } _\mathrm {h} \) and \(3\ell + 0{\uptau } _\mathrm {h} \) channels, which are the channels with the largest event yields out of the three using multiclass ANN, is further improved by analyzing selected events in subcategories based on the flavor (electron or muon) of the leptons and on the number of jets passing the tight \({{\mathrm{b}}} \) tagging criteria. The motivation for distinguishing events by lepton flavor is that the rate for misidentifying nonprompt leptons as prompt ones and, in the \(2\ell {\mathrm {SS}}+ 0{\uptau } _\mathrm {h} \) channel, also the probability for mismeasuring the lepton charge is significantly higher for electrons compared to muons. Distinguishing events by the multiplicity of \({{\mathrm{b}}} \) jets improves in particular the separation of the \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) signal from the \({{\mathrm{t}} {}{{\overline{{{\mathrm{t}}}}}}} \)+jets background. This occurs because if a nonprompt lepton produced in the decay of a \({{\mathrm{b}}} \) hadron gets misidentified as a prompt lepton, the remaining particles resulting from the hadronization of the bottom quark are less likely to pass the \({{\mathrm{b}}} \) jet identification criteria, thereby reducing the number of \({{\mathrm{b}}} \) jets in such \({{\mathrm{t}} {}{{\overline{{{\mathrm{t}}}}}}} \)+jets background events. The distribution of the multiplicity of \({{\mathrm{b}}} \) jets in \({{\mathrm{t}} {}{{\overline{{{\mathrm{t}}}}}}} \)+jets background events in which a nonprompt lepton is misidentified as prompt lepton (“nonprompt”) and in \({{\mathrm{t}} {}{{\overline{{{\mathrm{t}}}}}}} \)+jets background events in which this is not the case (“prompt”) is shown in Fig. 4. 
The figure also shows the distributions of \(p_{\mathrm {T}}\) and \(\eta \) of bottom quarks produced in top quark decays in \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) signal events compared to those in \({{\mathrm{t}} {}{{\overline{{{\mathrm{t}}}}}}} \)+jets background events. The \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) signal features more bottom quarks of high \(p_{\mathrm {T}}\), whereas the distribution of \(\eta \) is similar for the \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) signal and for the \({{\mathrm{t}} {}{{\overline{{{\mathrm{t}}}}}}} \)+jets background.

Fig. 4
figure 4

Transverse momentum (left) and pseudorapidity (middle) distributions of bottom quarks produced in top quark decays in \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) signal events compared to \({{\mathrm{t}} {}{{\overline{{{\mathrm{t}}}}}}} \)+jets background events, and multiplicity of jets passing tight \({{\mathrm{b}}} \) jet identification criteria (right). The latter distribution is shown separately for \({{\mathrm{t}} {}{{\overline{{{\mathrm{t}}}}}}} \)+jets background events in which a nonprompt lepton is misidentified as a prompt lepton and for those background events in which all reconstructed leptons are prompt leptons. The events are selected in the \(2\ell {\mathrm {SS}}+ 0{\uptau } _\mathrm {h} \) channel

The number of subcategories is optimized for each of the four (three) ANN output categories of the \(2\ell {\mathrm {SS}}+ 0{\uptau } _\mathrm {h} \) (\(3\ell + 0{\uptau } _\mathrm {h} \)) channel individually. In the \(2\ell {\mathrm {SS}}+ 0{\uptau } _\mathrm {h} \) channel, each of the 4 ANN output node categories is subdivided into three subcategories, based on the flavor of the two leptons (\({\mathrm{e}} {\mathrm{e}} \), \({\mathrm{e}} {\upmu {}{}} \), \({\upmu {}{}} {\upmu {}{}} \)). In the \(3\ell + 0{\uptau } _\mathrm {h} \) channel, the ANN output node categories corresponding to the \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) signal and to the \({\mathrm{t}} {\mathrm{H}} \) signal are subdivided into two subcategories, based on the multiplicity of jets passing tight \({{\mathrm{b}}} \) tagging criteria (bl: <2 tight \({{\mathrm{b}}} \)-tagged jets, bt: \(\ge \)2 tight \({{\mathrm{b}}} \)-tagged jets), while the output node category corresponding to the backgrounds is subdivided into seven subcategories, based on the flavor of the three leptons and on the multiplicity of jets passing tight \({{\mathrm{b}}} \) tagging criteria (\({\mathrm{e}} {\mathrm{e}} {\mathrm{e}} \); \({\mathrm{e}} {\mathrm{e}} {\upmu {}{}} \) bl, \({\mathrm{e}} {\mathrm{e}} {\upmu {}{}} \) bt; \({\mathrm{e}} {\upmu {}{}} {\upmu {}{}} \) bl, \({\mathrm{e}} {\upmu {}{}} {\upmu {}{}} \) bt; \({\upmu {}{}} {\upmu {}{}} {\upmu {}{}} \) bl, \({\upmu {}{}} {\upmu {}{}} {\upmu {}{}} \) bt), where bl (bt) again corresponds to the condition of <2 (\(\ge \)2) tight \({{\mathrm{b}}} \)-tagged jets. The \({\mathrm{e}} {\mathrm{e}} {\mathrm{e}} \) subcategory is not further subdivided by the number of \({{\mathrm{b}}} \)-tagged jets, because of the lower number of events containing three electrons compared to events in other categories. 
The aforementioned event categories are constructed based on the output of the BDTs and ANNs with the goal of enhancing the analysis sensitivity, while keeping a sufficiently high rate of background events for a precise estimation.
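The subcategorization scheme described above can be summarized in a short Python sketch. The channel identifiers, node names, and label format are hypothetical choices made for illustration; only the splitting logic (lepton flavor, and fewer than two versus at least two tight b-tagged jets) follows the text.

```python
def flavor_label(flavors):
    """Order-independent flavor label, e.g. ['mu', 'e'] -> 'emu'."""
    n_e = sum(1 for f in flavors if f == "e")
    n_mu = sum(1 for f in flavors if f == "mu")
    if n_e + n_mu != len(flavors):
        raise ValueError("flavors must be 'e' or 'mu'")
    return "e" * n_e + "mu" * n_mu


def subcategory(channel, node, flavors, n_tight_b):
    """Return the subcategory label of an event.

    channel: '2lss_0tau' or '3l_0tau' (hypothetical identifiers)
    node: ANN output node category of the event
    """
    btag = "bt" if n_tight_b >= 2 else "bl"
    if channel == "2lss_0tau":
        # All four node categories are split by lepton flavor only.
        return f"{node}_{flavor_label(flavors)}"
    if channel == "3l_0tau":
        if node in ("ttH", "tH"):
            # Signal nodes are split by b-tag multiplicity only.
            return f"{node}_{btag}"
        label = flavor_label(flavors)
        # The eee subcategory is not split further by b-tag multiplicity.
        return f"{node}_{label}" if label == "eee" else f"{node}_{label}_{btag}"
    raise ValueError(f"no subcategories defined for channel {channel!r}")
```

With this scheme the background node of the \(3\ell + 0{\uptau } _\mathrm {h} \) channel yields seven distinct labels, matching the count given in the text.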

The BDTs used in the \(1\ell + 1{\uptau } _\mathrm {h} \), \(0\ell + 2{\uptau } _\mathrm {h} \), \(2\ell {\mathrm {OS}}+ 1{\uptau } _\mathrm {h} \), \(1\ell + 2{\uptau } _\mathrm {h} \), \(4\ell + 0{\uptau } _\mathrm {h} \), \(3\ell + 1{\uptau } _\mathrm {h} \), and \(2\ell + 2{\uptau } _\mathrm {h} \) channels address the binary classification problem of separating the sum of \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) and \({\mathrm{t}} {\mathrm{H}} \) signals from the aggregate of all backgrounds. The training is performed using the scikit-learn [34] package with the XGBoost [33] algorithm. The training parameters are chosen to maximize the integral, or area-under-the-curve, of the receiver-operating-characteristic curve of the BDT output.
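The figure of merit used for the BDT optimization can be illustrated with a small, library-free Python sketch. It computes the area under the receiver-operating-characteristic curve as the probability that a randomly chosen signal event receives a higher score than a randomly chosen background event. The function name is hypothetical, and the sketch ignores event weights, which a realistic evaluation on simulated samples would need to include.

```python
def roc_auc(signal_scores, background_scores):
    """Area under the ROC curve for unweighted events.

    Computed as the fraction of (signal, background) pairs in which the
    signal event has the higher score; ties count one half.
    """
    if not signal_scores or not background_scores:
        raise ValueError("need at least one signal and one background score")
    wins = 0.0
    for s in signal_scores:
        for b in background_scores:
            if s > b:
                wins += 1.0
            elif s == b:
                wins += 0.5
    return wins / (len(signal_scores) * len(background_scores))
```

A value of 0.5 corresponds to no separation and a value of 1 to perfect separation between signal and background.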

7 Background estimation

The dominant background in most channels comes from the production of top quarks in association with \({\mathrm{W}} \) and \({\mathrm{Z}} \) bosons. We collectively refer to the sum of \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{W}} \) and \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{W}} {\mathrm{W}} \) backgrounds using the notation \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{W}} ({\mathrm{W}})\). In \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{W}} ({\mathrm{W}})\) and \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{Z}} \) background events selected in the signal regions (SRs), reconstructed leptons typically originate from genuine prompt leptons or reconstructed \({{\mathrm{b}}} \) jets arising from the hadronization of bottom quarks, whereas reconstructed \({\uptau } _\mathrm {h} \) are a mixture of genuine hadronic \({\uptau } \) decays and misidentified quark or gluon jets. Background events from \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{Z}} \) production may pass the \({\mathrm{Z}} \) boson veto applied in the \(2\ell {\mathrm {SS}}+ 0{\uptau } _\mathrm {h} \), \(3\ell + 0{\uptau } _\mathrm {h} \), \(2\ell {\mathrm {SS}}+ 1{\uptau } _\mathrm {h} \), \(2\ell {\mathrm {OS}}+ 1{\uptau } _\mathrm {h} \), \(4\ell + 0{\uptau } _\mathrm {h} \), and \(3\ell + 1{\uptau } _\mathrm {h} \) channels in the case that the \({\mathrm{Z}} \) boson either decays to leptons and one of the leptons fails to get selected, or the \({\mathrm{Z}} \) boson decays to \({\uptau } \) leptons and the \({\uptau } \) leptons subsequently decay to electrons or muons. In the latter case, the invariant mass \(m_{\ell \ell }\) of the lepton pair is shifted to lower values because of the neutrinos produced in the \({\uptau } \) decays. 
Additional background contributions arise from off-shell \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\upgamma }{}{} ^{*} \) and \({\mathrm{t}} {\upgamma }{}{} ^{*} \) production: we include them in the \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{Z}} \) background. The \({{\mathrm{t}} {}{{\overline{{{\mathrm{t}}}}}}} \)+jets production cross section is about three orders of magnitude larger than the cross section for associated production of top quarks with \({\mathrm{W}} \) and \({\mathrm{Z}} \) bosons, but in most channels the \({{\mathrm{t}} {}{{\overline{{{\mathrm{t}}}}}}} \)+jets background is strongly reduced by the lepton and \({\uptau } _\mathrm {h} \) identification criteria. Except for the channels \(1\ell + 1{\uptau } _\mathrm {h} \) and \(0\ell + 2{\uptau } _\mathrm {h} \), the \({{\mathrm{t}} {}{{\overline{{{\mathrm{t}}}}}}} \)+jets background contributes solely in the cases that a nonprompt lepton (or a jet) is misidentified as a prompt lepton, a quark or gluon jet is misidentified as \({\uptau } _\mathrm {h} \), or the charge of a genuine prompt lepton is mismeasured. Photon conversions are a relevant background in the event categories with one or more reconstructed electrons in the \(2\ell {\mathrm {SS}}+ 0{\uptau } _\mathrm {h} \) and \(3\ell + 0{\uptau } _\mathrm {h} \) channels. The production of \({\mathrm{W}} {\mathrm{Z}} \) and \({\mathrm{Z}} {\mathrm{Z}} \) pairs in events with two or more jets constitutes another relevant background in most channels. In the \(1\ell + 1{\uptau } _\mathrm {h} \) and \(0\ell + 2{\uptau } _\mathrm {h} \) channels, an additional background arises from DY production of \({\uptau } \) lepton pairs.

We categorize the contributions of background processes into reducible and irreducible ones. A background is considered irreducible if all reconstructed electrons and muons are genuine prompt leptons and all reconstructed \({\uptau } _\mathrm {h} \) are genuine hadronic \({\uptau } \) decays; in the \(2\ell {\mathrm {SS}}+ 0{\uptau } _\mathrm {h} \) and \(2\ell {\mathrm {SS}}+ 1{\uptau } _\mathrm {h} \) channels, we further require that the measured charge of reconstructed electrons and muons matches their true charge. The irreducible background contributions are modeled using simulated events fulfilling the above criteria to avoid double-counting of all the other background contributions, which are considered to be reducible and are mostly determined from data.

Throughout the analysis, we distinguish three sources of reducible background contributions: misidentified leptons and \({\uptau } _\mathrm {h} \) (“misidentified leptons”), asymmetric conversions of a photon into electrons (“conversions”), and mismeasurement of the lepton charge (“flips”).

The background from misidentified leptons and \({\uptau } _\mathrm {h} \) refers to events in which at least one reconstructed electron or muon is caused by the misidentification of a nonprompt lepton or hadron, or at least one reconstructed \({\uptau } _\mathrm {h} \) arises from the misidentification of a quark or gluon jet. The main contribution to this background stems from \({{\mathrm{t}} {}{{\overline{{{\mathrm{t}}}}}}} \)+jets production, reflecting the large cross section for this background process.

The conversions background consists of events in which one or more reconstructed electrons are due to the conversion of a photon. The conversions background is typically caused by \({{\mathrm{t}} {}{{\overline{{{\mathrm{t}}}}}}} {\upgamma }{}{} \) events in which one electron or positron produced in the photon conversion carries most of the energy of the converted photon, whereas the other electron or positron is of low energy and fails to get reconstructed. We refer to such photon conversions as asymmetric conversions.

The flips background is specific to the \(2\ell {\mathrm {SS}}+ 0{\uptau } _\mathrm {h} \) and \(2\ell {\mathrm {SS}}+ 1{\uptau } _\mathrm {h} \) channels and consists of events where the charge of a reconstructed lepton is mismeasured. The main contribution to the flips background stems from \({{\mathrm{t}} {}{{\overline{{{\mathrm{t}}}}}}} \)+jets events in which both top quarks decay semi-leptonically. In case of the \(2\ell {\mathrm {SS}}+ 1{\uptau } _\mathrm {h} \) channel, a quark or gluon jet is additionally misidentified as \({\uptau } _\mathrm {h} \). The mismeasurement of the electron charge typically results from the emission of a hard bremsstrahlung photon, followed by an asymmetric conversion of this photon. The reconstructed electron is typically the electron or positron that carries most of the energy of the converted photon, resulting in an equal probability for the reconstructed electron to have either the same or opposite charge compared to the charge of the electron or positron that emitted the bremsstrahlung photon [77]. The probability of mismeasuring the charge of muons is negligible in this analysis.

The three types of reducible background are made mutually exclusive by giving preference to the misidentified leptons type over the flips and conversions types and by giving preference to the flips type over the conversions type when an event qualifies for more than one type of reducible background. The misidentified leptons and flips backgrounds are determined from data, whereas the conversions background is modeled using the MC simulation. The procedures for estimating the misidentified leptons and flips backgrounds are described in Sects. 7.1 and 7.2, respectively. We performed dedicated studies in data, similar to the ones performed in Ref. [97], to ascertain that photon conversions are adequately modeled by the MC simulation. To avoid potential double-counting of the background estimates obtained from data with background contributions modeled using the MC simulation, we match reconstructed electrons, muons, and \({\uptau } _\mathrm {h} \) to their generator-level equivalents and veto simulated signal and background events selected in the SR that qualify as misidentified leptons or flips backgrounds.
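The priority ordering that makes the three reducible background types mutually exclusive can be written as a short Python sketch. The function name and the boolean inputs are hypothetical; in practice these flags would be derived from the generator-level matching of the reconstructed objects.

```python
def reducible_type(has_misidentified, has_flip, has_conversion):
    """Return the single reducible-background type assigned to an event.

    Priority: misidentified leptons > flips > conversions.
    Returns None if the event qualifies for none of the three types.
    """
    if has_misidentified:
        return "misidentified leptons"
    if has_flip:
        return "flips"
    if has_conversion:
        return "conversions"
    return None
```

For example, an event that contains both a misidentified lepton and a charge-flipped electron is counted only in the misidentified leptons type.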

Concerning the irreducible backgrounds, we refer to the aggregate of background contributions other than those arising from \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{W}} ({\mathrm{W}})\), \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{Z}} \), \({{\mathrm{t}} {}{{\overline{{{\mathrm{t}}}}}}} \)+jets, DY, and diboson backgrounds, or from SM Higgs boson production via the processes \({\mathrm{g}} {\mathrm{g}} {\mathrm{H}} \), \({{\mathrm{q}}} {{\mathrm{q}}} {\mathrm{H}} \), \({\mathrm{W}} {\mathrm{H}} \), \({\mathrm{Z}} {\mathrm{H}} \), \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{W}} {\mathrm{H}} \), and \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{Z}} {\mathrm{H}} \) as “rare” backgrounds. The rare backgrounds typically yield a minor background contribution to each of the ten analysis channels and include such processes as \({\mathrm{t}} {\mathrm{W}} \) and \({\mathrm{t}} {\mathrm{Z}} \) production, the production of \({\mathrm {SS}}\) \({\mathrm{W}} \) boson pairs, triboson, and \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} \) production.

We validate the modeling of the \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{W}} ({\mathrm{W}})\), \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{Z}} \), \({\mathrm{W}} {\mathrm{Z}} \), and \({\mathrm{Z}} {\mathrm{Z}} \) backgrounds in dedicated control regions (CRs) whose definitions are detailed in Sect. 7.3.

7.1 Estimation of the “misidentified leptons” background

The background from misidentified leptons and \({\uptau } _\mathrm {h} \) is estimated using the misidentification probability (MP) method [23]. The method is based on selecting a sample of events satisfying all selection criteria of the SR, detailed in Sect. 5, except that the electrons, muons, and \({\uptau } _\mathrm {h} \) used to construct the signal regions are required to pass relaxed selections instead of the nominal ones. We refer to this sample of events as the application region (AR) of the MP method. Events in which all leptons and \({\uptau } _\mathrm {h} \) satisfy the nominal selections are vetoed, to avoid overlap with the SR.

An estimate of the background from misidentified leptons and \({\uptau } _\mathrm {h} \) in the SR is obtained by applying suitably chosen weights to the events selected in the AR. The weights, denoted by the symbol w, are given by the expression:

$$\begin{aligned} w = (-1)^{n+1} \, \prod _{i=1}^{n} \, \frac{f_{i}}{1 - f_{i}}, \end{aligned}$$
(1)

where the product extends over all electrons, muons, and \({\uptau } _\mathrm {h} \) that pass the relaxed, but fail the nominal selection criteria, and n refers to the total number of such leptons and \({\uptau } _\mathrm {h} \). The symbol \(f_{i}\) denotes the probability for an electron, muon, or \({\uptau } _\mathrm {h} \) passing the relaxed selection to also satisfy the nominal one. The contributions of irreducible backgrounds to the AR are subtracted based on the MC expectation of such contributions. The \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) and \({\mathrm{t}} {\mathrm{H}} \) signal yields in the AR are found to be negligible.
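A minimal Python sketch of the weight in Eq. (1) is given below. The function name and the input convention (a list holding the \(f_{i}\) of every object that passes the relaxed but fails the nominal selection) are assumptions made for illustration. The alternating sign can be read as a correction for the double-counting of events in which more than one object is misidentified.

```python
def mp_weight(f_values):
    """Weight of Eq. (1) for one application-region event.

    f_values: probabilities f_i of the leptons and tau_h that pass the
    relaxed but fail the nominal selection (n = len(f_values) >= 1).
    """
    n = len(f_values)
    if n == 0:
        raise ValueError("AR events contain at least one failing object")
    weight = (-1.0) ** (n + 1)
    for f in f_values:
        if not 0.0 <= f < 1.0:
            raise ValueError("f_i must lie in [0, 1)")
        weight *= f / (1.0 - f)
    return weight
```

An event with a single failing object receives a positive weight, whereas an event with two failing objects receives a negative one.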

The probabilities \(f_{i}\) for leptons are measured in multijet events, separately for electrons and muons, and are binned in \(p_{\mathrm {T}}\) and \(\eta \) of the lepton candidate. The measurement is based on selecting events containing exactly one electron or muon that passes the relaxed selection and at least one jet separated from the lepton by \(\varDelta R > 0.7\). Selected events are then subdivided into “pass” and “fail” samples, depending on whether the lepton candidate passes the nominal selection or not. The fail sample is dominated by the contribution of multijet events. The contributions of other processes, predominantly arising from \({\mathrm{W}} \)+jets, DY, diboson, and \({{\mathrm{t}} {}{{\overline{{{\mathrm{t}}}}}}} \)+jets production, are subtracted based on MC estimates of these contributions. The number of multijet events in the pass sample is obtained by an ML fit to the distribution of the observable:

$$\begin{aligned} m_{\mathrm {T}} ^{\text {fix}}= \sqrt{2 \, p_{\mathrm {T}} ^{\text {fix}} \, p_{\mathrm {T}} ^\text {miss} \, \left( 1 - \cos \varDelta \phi \right) }, \end{aligned}$$
(2)

where \(p_{\mathrm {T}} ^{\text {fix}}\) is a constant value set to \(35\,\text {GeV} \), and the symbol \(\varDelta \phi \) refers to the angle in the transverse plane between the lepton momentum and the \({\vec p}_{\mathrm {T}}^{\text {miss}} \) vector. \(p_{\mathrm {T}} ^{\text {fix}}\) is used instead of the lepton \(p_{\mathrm {T}}\) to reduce the correlation between \(m_{\mathrm {T}} ^{\text {fix}}\) and the lepton \(p_{\mathrm {T}}\). The ML fit is similar to the one used in the measurement of the \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) and \({\mathrm{t}} {\mathrm{H}} \) signal rates, described in Sect. 9. The distribution of \({\mathrm{W}} \)+jets, DY, diboson, \({{\mathrm{t}} {}{{\overline{{{\mathrm{t}}}}}}} \)+jets, and rare backgrounds in the observable \(m_{\mathrm {T}} ^{\text {fix}}\) is modeled using the MC simulation, whereas the distribution of multijet events in the pass sample is obtained from data in the fail region, from which the \({\mathrm{W}} \)+jets, DY, diboson, and \({{\mathrm{t}} {}{{\overline{{{\mathrm{t}}}}}}} \)+jets contributions are subtracted based on their MC estimate. The observable \(m_{\mathrm {T}} ^{\text {fix}}\) exploits the fact that the \(p_{\mathrm {T}} ^\text {miss} \) reconstructed in multijet events is mainly caused by resolution effects and is typically small, resulting in a falling distribution of \(m_{\mathrm {T}} ^{\text {fix}}\), whereas \({\mathrm{W}} \)+jets and \({{\mathrm{t}} {}{{\overline{{{\mathrm{t}}}}}}} \)+jets events exhibit a broad maximum around \(m_{{\mathrm{W}}} \approx 80\,\text {GeV} \). Compared to the usual transverse mass, the observable \(m_{\mathrm {T}} ^{\text {fix}}\) has the advantage of not depending on the \(p_{\mathrm {T}}\) of the lepton, and is therefore better suited for the purpose of measuring the probabilities \(f_{i}\) in bins of lepton \(p_{\mathrm {T}}\). 
For illustration, the distributions of \(m_{\mathrm {T}} ^{\text {fix}}\) in the pass and fail samples are shown in Fig. 5 for events containing an electron of \(25< p_{\mathrm {T}} < 35\,\text {GeV} \) in the ECAL barrel. The contributions from \({\mathrm{W}} \)+jets, DY, and diboson production are assumed to scale by a common factor with respect to their MC expectation in the fit; we refer to their sum as “electroweak” (EWK) background. Finally, denoting the number of multijet events in the pass and fail samples by the symbols \(N_{\text {pass}}\) and \(N_{\text {fail}}\), the probabilities \(f_{i}\) are given by \(f_{i} = N_{\text {pass}} / (N_{\text {pass}} + N_{\text {fail}})\).
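The two quantities defined above, the observable of Eq. (2) and the ratio that defines \(f_{i}\), can be evaluated with the following Python sketch. The function names are hypothetical, and the multijet yields passed to the second function are assumed to be the fitted or background-subtracted numbers for one bin in lepton \(p_{\mathrm {T}}\) and \(\eta \).

```python
import math

PT_FIX_GEV = 35.0  # constant used in place of the lepton pT, as stated in the text


def mt_fix(pt_miss, delta_phi, pt_fix=PT_FIX_GEV):
    """Eq. (2): transverse mass computed with a fixed lepton pT (GeV)."""
    return math.sqrt(2.0 * pt_fix * pt_miss * (1.0 - math.cos(delta_phi)))


def misid_probability(n_pass, n_fail):
    """f = N_pass / (N_pass + N_fail) for multijet events in one (pT, eta) bin."""
    total = n_pass + n_fail
    if total <= 0:
        raise ValueError("bin contains no multijet events")
    return n_pass / total
```

Since only the missing transverse momentum and the angle enter, the value of the observable does not change when the lepton \(p_{\mathrm {T}}\) bin is varied.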

Fig. 5
figure 5

Distributions of \(m_{\mathrm {T}} ^{\text {fix}}\) for events containing an electron candidate of \(25< p_{\mathrm {T}} < 35\,\text {GeV} \) in the ECAL barrel, which (left) passes the nominal selection and (right) passes the relaxed, but fails the nominal selection. The “electroweak” (EWK) background refers to the sum of \({\mathrm{W}} \)+jets, DY, and diboson production. The “rare” backgrounds are defined in the text. The data in the fail sample agrees with the sum of multijet, EWK, \({{\mathrm{t}} {}{{\overline{{{\mathrm{t}}}}}}} \)+jets, and rare backgrounds by construction, as the number of multijet events in the fail sample is computed by subtracting the sum of EWK, \({{\mathrm{t}} {}{{\overline{{{\mathrm{t}}}}}}} \)+jets, and rare background contributions from the data. The misidentification probabilities are derived separately for each era: this figure shows, as an example, the results obtained with the 2017 data set. The uncertainty band represents the total uncertainty after the fit has been performed

The \(f_{i}\) for \({\uptau } _\mathrm {h} \) are determined as a function of \(p_{\mathrm {T}}\) and \(\eta \) of the \({\uptau } _\mathrm {h}\) candidate in a region enriched in \({{\mathrm{t}} {}{{\overline{{{\mathrm{t}}}}}}} \)+jets events containing a reconstructed opposite-sign electron-muon pair and at least two loose b-tagged jets in addition to the \({\uptau } _\mathrm {h}\) candidate. Contributions of genuine \({\uptau } _\mathrm {h} \) are modeled using the MC simulation and subtracted.

The event samples used to measure the \(f_{i}\) are referred to as measurement regions (MRs) of the MP method. Potential biases in the estimate of the background from misidentified leptons and \({\uptau } _\mathrm {h} \), arising from differences between AR and MR in the \(p_{\mathrm {T}}\) spectrum of the lepton and \({\uptau } _\mathrm {h} \) candidates and in the mixture of nonprompt leptons and hadrons that are misidentified as prompt leptons, are mitigated as detailed in Ref. [80]. A closure test performed using simulated \({{\mathrm{t}} {}{{\overline{{{\mathrm{t}}}}}}} \)+jets and multijet events reveals a residual difference between the probabilities \(f_{i}\) for electrons in \({{\mathrm{t}} {}{{\overline{{{\mathrm{t}}}}}}} \)+jets and those in multijet events. The test is illustrated in Fig. 6, which compares the distributions of \(p_{\mathrm {T}}\) of nonprompt electrons in simulated \({{\mathrm{t}} {}{{\overline{{{\mathrm{t}}}}}}} \)+jets events for three cases: nonprompt electrons passing the nominal selection criteria (“nominal”); nonprompt electrons passing the relaxed, but failing the nominal selection criteria, weighted by probabilities \(f_{i}\) determined in simulated \({{\mathrm{t}} {}{{\overline{{{\mathrm{t}}}}}}} \)+jets events (“relaxed, \(f_{i}\) from \({{\mathrm{t}} {}{{\overline{{{\mathrm{t}}}}}}} \)+jets”); and nonprompt electrons passing the relaxed, but failing the nominal selection criteria, weighted by probabilities \(f_{i}\) determined in simulated multijet events (“relaxed, \(f_{i}\) from multijet”). The electron and muon \(p_{\mathrm {T}}\) distributions obtained in the first and second cases are in agreement, demonstrating the performance of the MP method. 
The ratio of the distributions obtained in the second and third cases is fitted by a linear function in \(p_{\mathrm {T}}\) of the lepton and is applied as a multiplicative correction to the \(f_{i}\) measured in data, which accounts for the different flavor composition of jets between AR and MR. For the lepton and \({\uptau } _\mathrm {h} \) selections used in this analysis, the probabilities \(f_{i}\) range from 0.04 to 0.13, 0.02 to 0.20, and 0.10 to 0.50 for electrons, muons, and \({\uptau } _\mathrm {h} \), respectively.

Fig. 6
figure 6

Transverse momentum distributions of nonprompt (left) electrons and (right) muons in simulated \({{\mathrm{t}} {}{{\overline{{{\mathrm{t}}}}}}} \)+jets events, for the three cases “nominal”, “relaxed, \(f_{i}\) from \({{\mathrm{t}} {}{{\overline{{{\mathrm{t}}}}}}} \)+jets”, and “relaxed, \(f_{i}\) from multijet” discussed in text. The figure illustrates that a nonclosure correction needs to be applied to the probabilities \(f_{i}\) measured for electrons in data, while no such correction is needed for muons

The probabilities \(f_{i}\) for electrons and muons obtained as described above are validated in a CR dominated by semileptonic \({{\mathrm{t}} {}{{\overline{{{\mathrm{t}}}}}}} \)+jets events. The events are selected by requiring the presence of two \({\mathrm {SS}}\) leptons and exactly three jets, exactly one of which passes the tight \({{\mathrm{b}}} \) tagging criteria. The three jets are interpreted as originating from the hadronic decay of one of the top quarks, while the other top quark decays semi-leptonically. One of the two reconstructed leptons is assumed to arise from the misidentification of a \({{\mathrm{b}}} \) hadron originating from the semi-leptonically decaying top quark. A kinematic fit using the constraints from kinematic relations between the top quark decay products is employed to increase the purity of semileptonic \({{\mathrm{t}} {}{{\overline{{{\mathrm{t}}}}}}} \)+jets events that are correctly reconstructed in this CR. The level of compatibility of selected events with the aforementioned experimental signature is quantified using a \(\chi ^{2}\) criterion; events with a high value of \(\chi ^{2}\), corresponding to a poor-quality fit, are discarded. Good agreement is observed between semileptonic \({{\mathrm{t}} {}{{\overline{{{\mathrm{t}}}}}}} \)+jets events where both leptons pass the nominal selection and semileptonic \({{\mathrm{t}} {}{{\overline{{{\mathrm{t}}}}}}} \)+jets events where both leptons pass the relaxed selection, but one or both leptons fail the nominal selection, provided that the weights given by Eq. (1) are applied to the latter events by using the probabilities \(f_{i}\) measured in multijet events and corrected (for electrons) as described in the previous paragraph.

The MP method is applied in all channels except for \(2\ell {\mathrm {SS}}+ 1{\uptau } _\mathrm {h} \) and \(3\ell + 1{\uptau } _\mathrm {h} \), where a modified version of the method is used, in which only the selections for the leptons are relaxed in the AR, while the \({\uptau } _\mathrm {h} \) is required to satisfy the nominal selection. Correspondingly, only the leptons are considered when computing the weights w, given by Eq. (1), that are applied to events in the AR of the \(2\ell {\mathrm {SS}}+ 1{\uptau } _\mathrm {h} \) and \(3\ell + 1{\uptau } _\mathrm {h} \) channels. Background contributions where the reconstructed leptons are genuine prompt leptons and the reconstructed \({\uptau } _\mathrm {h} \) is due to the misidentification of a quark or gluon jet are modeled using the MC simulation. Weights are applied to these simulated events to correct for differences in the \({\uptau } _\mathrm {h} \) misidentification rates between data and simulation. Using a modified version of the MP method in the \(2\ell {\mathrm {SS}}+ 1{\uptau } _\mathrm {h} \) and \(3\ell + 1{\uptau } _\mathrm {h} \) channels permits the retention as signal of those \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) and \({\mathrm{t}} {\mathrm{H}} \) signal events in which the reconstructed \({\uptau } _\mathrm {h} \) is not a genuine hadronic \({\uptau } \) decay, but arises instead from the misidentification of a quark or gluon jet. The fraction of \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) and \({\mathrm{t}} {\mathrm{H}} \) signal events retained as signal amounts to approximately 30% of the total \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) and \({\mathrm{t}} {\mathrm{H}} \) signal yield in the \(2\ell {\mathrm {SS}}+ 1{\uptau } _\mathrm {h} \) and \(3\ell + 1{\uptau } _\mathrm {h} \) channels.

7.2 Estimation of the “flips” background

The flips background, relevant for events containing either one or two reconstructed electrons in the \(2\ell {\mathrm {SS}}+ 0{\uptau } _\mathrm {h} \) and \(2\ell {\mathrm {SS}}+ 1{\uptau } _\mathrm {h} \) channels, is estimated using a procedure similar to the MP method. A sample of events passing all selection criteria of the SR, except that both leptons are required to be of \({\mathrm {OS}}\) instead of \({\mathrm {SS}}\), is selected and assigned appropriately chosen weights. In the \(2\ell {\mathrm {SS}}+ 0{\uptau } _\mathrm {h} \) channel, the weight is given by the sum of the probabilities for the charge of either lepton to be mismeasured, whereas in the \(2\ell {\mathrm {SS}}+ 1{\uptau } _\mathrm {h} \) channel, only the lepton that has the same charge as the \({\uptau } _\mathrm {h} \) is considered, since only those events in which the charge of this lepton is mismeasured satisfy the condition \(\sum \nolimits _{\ell ,{\uptau } _\mathrm {h}} q = \pm 1\) that is applied in the SR of this channel.
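The event weights used for the flips estimate in the two channels can be sketched in Python as follows. The function names and the input convention, in which each lepton is a (charge, mismeasurement probability) pair, are hypothetical. For muons the probability would be set to zero, since their charge mismeasurement is negligible.

```python
def flip_weight_2lss_0tau(p_flip_1, p_flip_2):
    """Weight of an OS event in the 2lSS + 0tau_h channel: the sum of the
    charge-mismeasurement probabilities of the two leptons."""
    return p_flip_1 + p_flip_2


def flip_weight_2lss_1tau(leptons, tau_charge):
    """Weight of an OS event in the 2lSS + 1tau_h channel.

    leptons: list of two (charge, p_flip) tuples with opposite charges.
    Only the lepton whose charge equals that of the tau_h contributes.
    """
    charges = [q for q, _ in leptons]
    if sorted(charges) != [-1, 1]:
        raise ValueError("expected one positive and one negative lepton")
    return sum(p for q, p in leptons if q == tau_charge)
```

Flipping the lepton that carries the same charge as the \({\uptau } _\mathrm {h} \) produces an SS lepton pair whose charge is opposite to that of the \({\uptau } _\mathrm {h} \), so that the charge sum equals \(\pm 1\).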

The probability for the charge of electrons to be mismeasured, referred to as the electron charge misidentification rate, is determined using \({\mathrm{Z}}/{\upgamma }{}{} ^{*} \rightarrow {\mathrm{e}} {\mathrm{e}} \) events. The events are selected by requiring the presence of an electron pair of invariant mass \(m_{{\mathrm{e}} {\mathrm{e}}}\) within the range \(60< m_{{\mathrm{e}} {\mathrm{e}}} < 120\,\text {GeV} \). No requirement is imposed on the charge of the electron pair. Contributions to the selected event sample arising from processes other than DY production of electron pairs are determined by performing an ML fit to the \(m_{{\mathrm{e}} {\mathrm{e}}}\) distribution. Referring to the number of \({\mathrm{Z}}/{\upgamma }{}{} ^{*} \rightarrow {\mathrm{e}} {\mathrm{e}} \) events containing reconstructed \({\mathrm {SS}}\) and \({\mathrm {OS}}\) electron pairs, respectively, by the symbols \(N_{{\mathrm {SS}}}\) and \(N_{{\mathrm {OS}}}\), the electron charge misidentification rate is given by the ratio \(N_{{\mathrm {SS}}}/(N_{{\mathrm {OS}}} + N_{{\mathrm {SS}}})\). The ratio is measured as a function of electron \(p_{\mathrm {T}}\) and \(\eta \) and varies between \(5.1 \times 10^{-5}\) for electrons of low \(p_{\mathrm {T}}\) in the ECAL barrel and \(1.6 \times 10^{-3}\) for electrons of high \(p_{\mathrm {T}}\) in the ECAL endcap. For illustration, the \(m_{{\mathrm{e}} {\mathrm{e}}}\) distributions for SS and OS electron pairs are shown in Fig. 7 for events in which both electrons are reconstructed in the ECAL barrel and have \(p_{\mathrm {T}}\) within the range \(25< p_{\mathrm {T}} < 50\,\text {GeV} \).

Fig. 7
figure 7

Distributions of \(m_{{\mathrm{e}} {\mathrm{e}}}\) for (left) SS and (right) OS electron pairs in \({\mathrm{Z}}/{\upgamma }{}{} ^{*} \rightarrow {\mathrm{e}} {\mathrm{e}} \) candidate events in which both electrons are in the ECAL barrel and have transverse momenta within the range \(25< p_{\mathrm {T}} < 50\,\text {GeV} \), for data recorded in 2018, compared to the expectation. Uncertainties shown are statistical only. A similar level of agreement is present in all the other momentum ranges and data-taking periods

7.3 Control regions for irreducible backgrounds

The accuracy of the simulation-based modeling of the main irreducible backgrounds, arising from \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{W}} ({\mathrm{W}})\), \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{Z}} \), \({\mathrm{W}} {\mathrm{Z}} \), and \({\mathrm{Z}} {\mathrm{Z}} \) production, is validated in three CRs. The first CR is based on the SR for the \(3\ell + 0{\uptau } _\mathrm {h} \) channel and targets the \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{Z}} \) and \({\mathrm{W}} {\mathrm{Z}} \) backgrounds. We refer to this CR as the \(3\ell \)-CR. The selection criteria applied in the \(3\ell \)-CR differ from those applied in the SR of the \(3\ell + 0{\uptau } _\mathrm {h} \) channel in that: no \({\mathrm{Z}} \) boson veto is applied in the \(3\ell \)-CR; the presence of at least one SFOS lepton pair of invariant mass \(m_{\ell \ell }\) with \(|m_{\ell \ell } - m_{{\mathrm{Z}}} | < 10\,\text {GeV} \) is demanded instead; the requirement on the multiplicity of jets is relaxed to demanding the presence of at least one jet; and no requirement on the presence of \({{\mathrm{b}}} \)-tagged jets is applied. The contributions arising from \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{Z}} \) and from \({\mathrm{W}} {\mathrm{Z}} \) production are separated by binning the events selected in the \(3\ell \)-CR in the flavor of the three leptons (\({\mathrm{e}} {\mathrm{e}} {\mathrm{e}} \), \({\mathrm{e}} {\mathrm{e}} {\upmu {}{}} \), \({\mathrm{e}} {\upmu {}{}} {\upmu {}{}} \), \({\upmu {}{}} {\upmu {}{}} {\upmu {}{}} \)) and in the multiplicity of jets and of \({{\mathrm{b}}} \)-tagged jets. The second CR targets the \({\mathrm{Z}} {\mathrm{Z}} \) background. We refer to it as the \(4\ell \)-CR, since it is based on the SR for the \(4\ell + 0{\uptau } _\mathrm {h} \) channel. 
Compared to the latter, the event selection criteria applied in the \(4\ell \)-CR are modified by applying no \({\mathrm{Z}} \) veto, instead requiring the presence of at least one SFOS lepton pair of invariant mass \(m_{\ell \ell }\) with \(|m_{\ell \ell } - m_{{\mathrm{Z}}} | < 10\,\text {GeV} \), and applying no requirements on the multiplicity of jets and of \({{\mathrm{b}}} \)-tagged jets. To separate the \({\mathrm{Z}} {\mathrm{Z}} \) background from other backgrounds, predominantly arising from \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{Z}} \) production, the events selected in the \(4\ell \)-CR are binned in the multiplicity of SFOS lepton pairs of invariant mass \(|m_{\ell \ell } - m_{{\mathrm{Z}}} | < 10\,\text {GeV} \) and in the number of jets passing tight \({{\mathrm{b}}} \) tagging criteria. The third CR targets the \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{W}} ({\mathrm{W}})\) background and is identical to the SR of the \(2\ell {\mathrm {SS}}+ 0{\uptau } _\mathrm {h} \) channel, except that the output node of the ANN that has the highest activation value is required to be the output node corresponding to the \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{W}} \) background.

The numbers of events observed in the \(3\ell \)- and \(4\ell \)-CRs and in the CR for the \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{W}} ({\mathrm{W}})\) background are given in Table 6. The contributions arising from the misidentified leptons and flips backgrounds are estimated using the methods described in Sects. 7.1 and 7.2, respectively. The uncertainties include both statistical and systematic sources, added in quadrature. The systematic uncertainties that are relevant for the CRs are similar to the ones applied to the SR. The latter are detailed in Sect. 8.

Table 6 Number of events selected in the \(3\ell \)- and \(4\ell \)-CRs and in the CR for the \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{W}} ({\mathrm{W}})\) background, compared to the event yields expected from different types of background and from the \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) and \({\mathrm{t}} {\mathrm{H}} \) signals, after the fit to data is performed as described in Sect. 9. Uncertainties shown include all systematic components. The symbol “–” indicates that the corresponding background does not apply

Figure 12, discussed in Sect. 9, shows the distributions of events selected in the \(3\ell \)- and \(4\ell \)-CRs in the binning scheme employed to separate the \({\mathrm{W}} {\mathrm{Z}} \) and \({\mathrm{Z}} {\mathrm{Z}} \) backgrounds from the \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{Z}} \) backgrounds. The events selected in the \(3\ell \)-CR are first subdivided by lepton flavor and then by the multiplicity of jets and \({{\mathrm{b}}} \)-tagged jets. For each lepton flavor, 12 bins are used, defined as follows (in order of increasing bin number): 0 jets passing the tight \({{\mathrm{b}}} \) tagging criteria with 1, 2, 3, or \(\ge \)4 jets in total; 1 jet passing the tight \({{\mathrm{b}}} \) tagging criteria with 2, 3, 4, or \(\ge \)5 jets in total; \(\ge \)2 jets passing the tight \({{\mathrm{b}}} \) tagging criteria with 2, 3, 4, or \(\ge \)5 jets in total. In the \(4\ell \)-CR, 4 bins are used in total, defined as (again in order of increasing bin number): 2 SFOS lepton pairs of invariant mass \(|m_{\ell \ell } - m_{{\mathrm{Z}}} | < 10\,\text {GeV} \); 1 such SFOS lepton pair with 0, 1, or \(\ge \)2 jets passing the tight \({{\mathrm{b}}} \) tagging criteria.
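The binning scheme described above can be sketched as a pair of lookup functions. This is an illustrative reading of the text, not the analysis code: the function names are hypothetical, bin numbers are 1-based within one lepton-flavor block, and combinations that are not listed in the text (for example, one tight \({{\mathrm{b}}} \)-tagged jet with only one jet in total) return `None`.

```python
from typing import Optional


def bin_3l_cr(n_jets: int, n_tight_b: int) -> Optional[int]:
    """Bin number (1..12) within one lepton-flavor block of the 3l-CR."""
    if n_tight_b < 0:
        return None
    if n_tight_b == 0:
        first_njet, offset = 1, 0   # 1, 2, 3, >=4 jets
    elif n_tight_b == 1:
        first_njet, offset = 2, 4   # 2, 3, 4, >=5 jets
    else:
        first_njet, offset = 2, 8   # 2, 3, 4, >=5 jets
    if n_jets < first_njet:
        return None  # combination not listed in the text
    # The last bin of each group is inclusive in the jet multiplicity.
    return offset + min(n_jets - first_njet, 3) + 1


def bin_4l_cr(n_sfos_z_pairs: int, n_tight_b: int) -> Optional[int]:
    """Bin number (1..4) in the 4l-CR."""
    if n_sfos_z_pairs >= 2:
        return 1
    if n_sfos_z_pairs == 1 and n_tight_b >= 0:
        return 2 + min(n_tight_b, 2)  # 0, 1, >=2 tight b-tagged jets
    return None


print(bin_3l_cr(n_jets=3, n_tight_b=1), bin_4l_cr(n_sfos_z_pairs=1, n_tight_b=0))
```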

The data in the \(3\ell \)- and \(4\ell \)-CRs and in the CR for the \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{W}} ({\mathrm{W}})\) background are in agreement with the background estimates within the quoted uncertainties.

8 Systematic uncertainties

The event rates and the distributions of the discriminating observables used for signal extraction may be altered by several experiment- or theory-related effects, referred to as systematic uncertainties. Experimental sources comprise the uncertainties in auxiliary measurements, performed to validate and, if necessary, correct the modeling of the data by the MC simulation, and the uncertainties in the data-driven estimates of the misidentified leptons and flips backgrounds. The latter are largely unaffected by potential inaccuracies of the MC simulation. Theoretical uncertainties mainly arise from missing higher-order corrections to the perturbative expansions employed for the computation of cross sections and from uncertainties in the PDFs.

The efficiencies of triggers based on the presence of one, two, or three electrons or muons are measured as a function of the lepton multiplicity with an uncertainty ranging from 1 to 2%, using samples of \({{\mathrm{t}} {}{{\overline{{{\mathrm{t}}}}}}} \)+jets and diboson events that have been recorded using triggers based on \(p_{\mathrm {T}} ^\text {miss} \).

The efficiencies for electrons and muons to pass the offline reconstruction and identification criteria are measured as a function of the lepton \(p_{\mathrm {T}}\) and \(\eta \) by applying the “tag-and-probe” method detailed in Ref. [71] to \({\mathrm{Z}}/{\upgamma }{}{} ^{*} \rightarrow {\mathrm{e}} {\mathrm{e}} \) and \({\mathrm{Z}}/{\upgamma }{}{} ^{*} \rightarrow {\upmu {}{}} {\upmu {}{}} \) events. Additionally, we cross-check these efficiencies in a CR enriched in \({{\mathrm{t}} {}{{\overline{{{\mathrm{t}}}}}}} \)+jets events to account for differences in event topology between DY events and the events in the SR of this analysis, which may cause a change in the efficiencies for electrons and muons to pass isolation requirements. Events in the \({{\mathrm{t}} {}{{\overline{{{\mathrm{t}}}}}}} \)+jets CR are selected by requiring the presence of an OS \({\mathrm{e}} \)+\({\upmu {}{}} \) pair and at least two jets. Nonprompt-lepton backgrounds in the CR are subtracted using a sideband region of SS \({\mathrm{e}} \)+\({\upmu {}{}} \) events. The difference between the efficiency measured in the \({{\mathrm{t}} {}{{\overline{{{\mathrm{t}}}}}}} \)+jets CR and the one measured in DY events is included as a systematic uncertainty, amounting to 1–2%. The \({\uptau } _\mathrm {h} \) identification efficiency and energy scale are measured with respective uncertainties of 5 and 1.2% using \({\mathrm{Z}}/{\upgamma }{}{} ^{*} \rightarrow {\uptau } {\uptau } \) events [74].

The energy scale of jets is measured with an uncertainty amounting to a few percent, depending on the jet \(p_{\mathrm {T}}\) and \(\eta \), using the \(p_{\mathrm {T}}\)-balance method, which is applied to \({\mathrm{Z}}/{\upgamma }{}{} ^{*} \rightarrow {\mathrm{e}} {\mathrm{e}} \), \({\mathrm{Z}}/{\upgamma }{}{} ^{*} \rightarrow {\upmu {}{}} {\upmu {}{}} \), \({\upgamma }{}{} \)+jets, dijet, and multijet events [72]. The resulting effect on signal and background expectations is evaluated by varying the energies of jets in simulated events within their uncertainties, recalculating all kinematic observables, and reapplying the event selection criteria. The effect of uncertainties in the jet energy resolution is evaluated in a similar way, but is smaller than the effect of the uncertainties in the jet energy scale.
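The propagation procedure described above (vary the jet energies, recompute the observables, reapply the selection) can be illustrated with a toy example. This is only a sketch under simplifying assumptions: a flat relative uncertainty is used instead of one that depends on jet \(p_{\mathrm {T}}\) and \(\eta \), the only observable is the jet multiplicity, and the function name, thresholds, and events are hypothetical.

```python
from typing import Sequence


def n_selected(events: Sequence[Sequence[float]], scale: float,
               min_pt: float = 25.0, min_jets: int = 2) -> int:
    """Count events with at least `min_jets` jets above `min_pt` after scaling jet pT."""
    count = 0
    for jet_pts in events:
        # Recompute the jet multiplicity with the shifted jet energies.
        n_pass = sum(1 for pt in jet_pts if pt * scale > min_pt)
        if n_pass >= min_jets:
            count += 1
    return count


# Hypothetical jet pT values (GeV) for four simulated events.
events = [[40.0, 26.0], [30.0, 24.5], [60.0, 35.0, 22.0], [25.5, 25.2]]
rel_unc = 0.03  # assumed flat 3% jet energy scale uncertainty

nominal = n_selected(events, 1.0)
up = n_selected(events, 1.0 + rel_unc)
down = n_selected(events, 1.0 - rel_unc)
print(nominal, up, down)
```

The differences between the shifted and nominal yields then quantify the effect of the jet energy scale on the expected event rate; in a full analysis the same shifts would also be propagated to every kinematic observable that enters the discriminants.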

The \({{\mathrm{b}}} \) tagging efficiency is measured with an uncertainty of a few percent in \({{\mathrm{t}} {}{{\overline{{{\mathrm{t}}}}}}} \)+jets and multijet events as a function of jet \(p_{\mathrm {T}}\) and \(\eta \). The heavy-flavor content of the multijet events is enriched by requiring the presence of a muon in the event. The mistag rates for light-quark and gluon jets are measured in multijet events, yielding uncertainties of 5–10% for the loose and 20–30% for the tight \({{\mathrm{b}}} \) tagging criteria, depending on \(p_{\mathrm {T}}\) and \(\eta \) [73].

The integrated luminosities of the 2016, 2017, and 2018 data-taking periods are individually known with uncertainties in the 2.3–2.5% range [39,40,41], while the total Run 2 (2016–2018) integrated luminosity has an uncertainty of 1.8%, the improvement in precision reflecting the (uncorrelated) time evolution of some systematic effects.

The uncertainties related to the number of PU interactions are evaluated by varying the number of inelastic \({{\mathrm{p}}_{\mathrm{}}^{\mathrm{}}} {{\mathrm{p}}_{\mathrm{}}^{\mathrm{}}} \) interactions that are superimposed on simulated events by 4.6% [98]. The resulting effect on the \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) and \({\mathrm{t}} {\mathrm{H}} \) signal yields and on the yields of background contributions modeled using the MC simulation amounts to less than 1%.

The effect of theory-related uncertainties on the event yields and on the distributions of the BDT and ANN classifier outputs that are used for the signal extraction is assessed for the \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) and \({\mathrm{t}} {\mathrm{H}} \) signals, as well as for the main irreducible backgrounds that arise from \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{W}} \), \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{W}} {\mathrm{W}} \), and \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{Z}} \) production. The uncertainties in the production cross sections amount to \(^{+6.8}_{-9.9}\) and \(^{+5.1}_{-7.3}\%\) for the \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) and \({\mathrm{t}} {\mathrm{H}} \) signals, and to \(^{+13.5}_{-12.2}\), \(^{+8.6}_{-11.3}\), and \(^{+11.7}_{-10.2}\%\) for the \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{W}} \), \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{W}} {\mathrm{W}} \), and \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{Z}} \) backgrounds, respectively. These uncertainties are taken from Ref. [62] and consist of the sum in quadrature of three sources: missing higher-order corrections in the perturbative expansion, different choices of PDFs, and uncertainties in the value of the strong coupling constant \(\alpha _\mathrm {S} \). The uncertainties in the cross sections are relevant for the purpose of quoting the measured production rates with respect to their SM expectations. In addition, the uncertainty in the \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) and \({\mathrm{t}} {\mathrm{H}} \) production cross sections is relevant for setting limits on the coupling of the Higgs boson to the top quark. 
The effect of missing higher-order corrections on the distributions of the discriminating observables is estimated by varying the renormalization and factorization scales up and down by a factor of two with respect to their nominal value, following the recommendations of Refs. [99,100,101], avoiding cases in which the two variations are done in opposite directions. The effect of uncertainties in the PDFs on these distributions is evaluated following the recommendations given in Ref. [102]. The uncertainties in the branching fractions of the Higgs boson decay modes \({\mathrm{H}} \rightarrow {\mathrm{W}} {\mathrm{W}} \), \({\mathrm{H}} \rightarrow {\uptau } {\uptau } \), and \({\mathrm{H}} \rightarrow {\mathrm{Z}} {\mathrm{Z}} \) are taken from Ref. [62] and amount to 1.5, 1.7, and 1.5%, respectively.
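The set of scale variations described above can be enumerated explicitly. The sketch below assumes one common reading of the prescription, in which each of the renormalization and factorization scales is multiplied by 0.5, 1, or 2, and the nominal point and the two opposite-direction combinations are dropped; the function name is hypothetical.

```python
from itertools import product

FACTORS = (0.5, 1.0, 2.0)  # multiplicative factors applied to the nominal scales


def scale_variations():
    """Return the (mu_R, mu_F) factor pairs kept by the prescription."""
    kept = []
    for mu_r, mu_f in product(FACTORS, repeat=2):
        if mu_r == 1.0 and mu_f == 1.0:
            continue  # nominal choice, not a variation
        if mu_r * mu_f == 1.0:
            continue  # opposite directions: (0.5, 2) or (2, 0.5)
        kept.append((mu_r, mu_f))
    return kept


for mu_r, mu_f in scale_variations():
    print(f"mu_R x {mu_r}, mu_F x {mu_f}")
```

Under this reading six variations remain, and the spread of the resulting distributions around the nominal one would be used to assess the effect of missing higher-order corrections.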

Table 7 Summary of the sources of systematic and statistical uncertainties and their impact on the measurement of the \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) and \({\mathrm{t}} {\mathrm{H}} \) signal rates, and the measured value of the unconstrained nuisance parameters. The quantity \(\varDelta \mu _{x}/\mu _{x}\) corresponds to the change in uncertainty when fixing the nuisance parameters associated with that uncertainty in the fit. Under the label “MC and sideband statistical uncertainty” are the uncertainties associated with the limited number of simulated MC events and the amount of data events in the application region of the MP method

In the \(1\ell + 1{\uptau } _\mathrm {h} \) and \(0\ell + 2{\uptau } _\mathrm {h} \) channels, the \({{\mathrm{t}} {}{{\overline{{{\mathrm{t}}}}}}} \)+jets and DY production may contribute as irreducible backgrounds and are modeled using the MC simulation. The \({{\mathrm{t}} {}{{\overline{{{\mathrm{t}}}}}}} \)+jets and DY production cross sections are known to an uncertainty of 5 [65] and 4% [103], respectively. An additional uncertainty in the modeling of the top quark \(p_{\mathrm {T}}\) distribution of \({{\mathrm{t}} {}{{\overline{{{\mathrm{t}}}}}}} \)+jets events is considered, defined as the difference between the nominal powheg sample and that sample reweighted to improve the quality of the top quark \(p_{\mathrm {T}}\) modeling, as described in Sect. 3. The modeling of the multiplicity of jets and of \({{\mathrm{b}}} \)-tagged jets in simulated DY events is improved by comparing these multiplicities between MC simulation and data using \({\mathrm{Z}}/{\upgamma }{}{} ^{*} \rightarrow {\mathrm{e}} {\mathrm{e}} \) and \({\mathrm{Z}}/{\upgamma }{}{} ^{*} \rightarrow {\upmu {}{}} {\upmu {}{}} \) events. The average ratio of data and MC simulation in the \({\mathrm{Z}}/{\upgamma }{}{} ^{*} \rightarrow {\mathrm{e}} {\mathrm{e}} \) and \({\mathrm{Z}}/{\upgamma }{}{} ^{*} \rightarrow {\upmu {}{}} {\upmu {}{}} \) event samples is taken as a correction, while the difference between the ratios measured in \({\mathrm{Z}}/{\upgamma }{}{} ^{*} \rightarrow {\mathrm{e}} {\mathrm{e}} \) and \({\mathrm{Z}}/{\upgamma }{}{} ^{*} \rightarrow {\upmu {}{}} {\upmu {}{}} \) events is taken as the systematic uncertainty and added in quadrature to the statistical uncertainties in these ratios. 
The \({\mathrm{Z}}/{\upgamma }{}{} ^{*} \rightarrow {\mathrm{e}} {\mathrm{e}} \) and \({\mathrm{Z}}/{\upgamma }{}{} ^{*} \rightarrow {\upmu {}{}} {\upmu {}{}} \) event samples used to determine this correction have little overlap with the SRs of the \(1\ell + 1{\uptau } _\mathrm {h} \) and \(0\ell + 2{\uptau } _\mathrm {h} \) channels, since most of the DY background in these channels arises from \({\mathrm{Z}}/{\upgamma }{}{} ^{*} \rightarrow {\uptau } {\uptau } \) events.
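The correction for the jet and \({{\mathrm{b}}} \)-tagged jet multiplicities in DY events, described above, amounts to a simple calculation per multiplicity bin. The following sketch is illustrative only: the function name and input values are hypothetical, and the statistical uncertainty of the unweighted average is assumed to be half the quadrature sum of the two individual statistical uncertainties, which the text does not specify.

```python
import math


def dy_multiplicity_correction(r_ee: float, err_ee: float,
                               r_mm: float, err_mm: float):
    """Return (correction, total uncertainty) for one multiplicity bin.

    r_ee, r_mm: data/MC ratios measured in Z -> ee and Z -> mumu events.
    err_ee, err_mm: their statistical uncertainties.
    """
    correction = 0.5 * (r_ee + r_mm)          # average data/MC ratio
    syst = abs(r_ee - r_mm)                   # ee vs mumu difference
    stat = 0.5 * math.hypot(err_ee, err_mm)   # assumed: error on the unweighted mean
    return correction, math.hypot(syst, stat)


# Hypothetical ratios, for illustration only.
corr, unc = dy_multiplicity_correction(1.10, 0.03, 1.04, 0.04)
print(f"correction = {corr:.3f} +- {unc:.3f}")
```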

Other background processes, notably the conversions and rare backgrounds, are modeled using the MC simulation; the uncertainty in their event yields is conservatively taken to be 50%. This choice accounts for the extrapolation from the inclusive phase space to the phase space relevant for this analysis, in particular to events with a high multiplicity of jets and \({{\mathrm{b}}} \)-tagged jets, as required to pass the event selection criteria detailed in Sect. 5. The inclusive cross sections for most of these background processes have been measured with uncertainties amounting to significantly less than 50% by previous analyses of the LHC data.

The extrapolation of the \({\mathrm{W}} {\mathrm{Z}} \) and \({\mathrm{Z}} {\mathrm{Z}} \) background rates from the \(3\ell \)- and \(4\ell \)-CRs to the SR depends on the heavy-flavor content of \({\mathrm{W}} {\mathrm{Z}} \) and \({\mathrm{Z}} {\mathrm{Z}} \) background events. According to the MC simulation, most of the \({{\mathrm{b}}} \) jets reconstructed in \({\mathrm{W}} {\mathrm{Z}} \) and \({\mathrm{Z}} {\mathrm{Z}} \) background events arise from the misidentification of light-quark or gluon jets rather than from charm or bottom quarks. We assign an uncertainty of 40% to the modeling of the heavy-flavor content in \({\mathrm{W}} {\mathrm{Z}} \) and \({\mathrm{Z}} {\mathrm{Z}} \) background events, accounting for the differences in the jet multiplicity distribution between data and simulation in the \(3\ell \)-CR. The misidentification of light-quark or gluon jets as \({{\mathrm{b}}} \) jets is covered by a separate systematic uncertainty.

The uncertainties in the rate and in the distribution of the discriminating observables for the background from misidentified leptons and \({\uptau } _\mathrm {h} \) stem from statistical uncertainties in the events selected in the MR and AR as well as from systematic uncertainties related to the subtraction of the prompt-lepton contributions from the data selected in the MR and AR of the MP method. The effect of these uncertainties on the analysis is evaluated by applying independent variations of the probabilities \(f_{i}\) for electrons and muons in different bins of lepton-candidate \(p_{\mathrm {T}}\) and \(\eta \) and determining the resulting change in the yield and distribution of the misidentified leptons background estimate. We introduce an additional uncertainty in the nonclosure correction to the \(f_{i}\) for electrons and muons, accounting for differences between the probabilities \(f_{i}\) in \({{\mathrm{t}} {}{{\overline{{{\mathrm{t}}}}}}} \)+jets and multijet events shown in Fig. 6. The size of this uncertainty is equal to the magnitude of the correction. In the case of \({\uptau } _\mathrm {h} \), the misidentification rates \(f_{i}\) measured in each bin in \(\eta \) and reconstructed \({\uptau } _\mathrm {h} \) decay mode are fitted by a linear function in \(p_{\mathrm {T}}\) of the \({\uptau } _\mathrm {h} \) candidate and the uncertainties in the slope and offset of this fit are propagated to the final result. The uncertainty in the rate of the misidentified leptons background is, in general, higher for channels with \({\uptau } _\mathrm {h} \). The uncertainty varies between 10% in the \(2\ell {\mathrm {SS}}+ 0{\uptau } _\mathrm {h} \) channel and 60% in the \(2\ell + 2{\uptau } _\mathrm {h} \) channel. The resulting uncertainty in the distribution of the discriminating observables is of moderate size. 
Additional nonclosure uncertainties account for small differences between the misidentified leptons background estimate obtained by computing the probabilities \(f_{i}\) for simulated events and applying the weights w given by Eq. (1) to simulated events selected in the AR, and the background estimates obtained by modeling the background from misidentified leptons and \({\uptau } _\mathrm {h} \) in the SR using the MC simulation directly.

The uncertainty in the flips background in the \(2\ell {\mathrm {SS}}+ 0{\uptau } _\mathrm {h} \) and \(2\ell {\mathrm {SS}}+ 1{\uptau } _\mathrm {h} \) channels is evaluated in a similar way: it amounts to 30% in each channel.

The effects of systematic uncertainties representing the same source are treated as fully correlated between all ten analysis channels. Theoretical uncertainties are furthermore treated as fully correlated among all data-taking periods, whereas the uncertainties arising from experimental sources are treated as uncorrelated between the data recorded in each of the years 2016, 2017, and 2018. The latter treatment is justified by the fact that the uncertainties related to the auxiliary measurements that are performed to validate, and if necessary correct, the modeling of the data by the MC simulation, are mainly of statistical origin and hence independent for measurements that are performed independently for each of the three data-taking periods because of the changes in the detector conditions from one period to another.

The impact of the systematic and statistical uncertainties on the measurement of the \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) and \({\mathrm{t}} {\mathrm{H}} \) signal rates is summarized in Table 7. The largest impacts are due to: the statistical uncertainty of observed data; the uncertainty in the efficiency to reconstruct and identify \({\uptau } _\mathrm {h}\); the uncertainties related to the estimation of the misidentified leptons and flips backgrounds; and the theoretical uncertainties, which affect the yield and the distribution of the discriminating observables for the \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) and \({\mathrm{t}} {\mathrm{H}} \) signals as well as for the main irreducible backgrounds, arising from \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{W}} \), \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{W}} {\mathrm{W}} \), \({\mathrm{t}} {\mathrm{W}} \), \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{Z}} \), and \({\mathrm{t}} {\mathrm{Z}} \) production.

Fig. 8
figure 8

Distributions of the activation value of the ANN output node with the highest activation value for events selected in the \(2\ell {\mathrm {SS}}+ 0{\uptau } _\mathrm {h} \) channel and classified as \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) signal (upper left), \({\mathrm{t}} {\mathrm{H}} \) signal (upper right), \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{W}} \) background (lower left), and other backgrounds (lower right). The distributions expected for the \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) and \({\mathrm{t}} {\mathrm{H}} \) signals and for background processes are shown for the values of the parameters of interest and of the nuisance parameters obtained from the ML fit. The best fit value of the \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) and \({\mathrm{t}} {\mathrm{H}} \) production rates amounts to \({\hat{\mu }}_{{\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}}} = 0.92\) and \({\hat{\mu }}_{{\mathrm{t}} {\mathrm{H}}} = 5.7\) times the rates expected in the SM

Fig. 9
figure 9

Distributions of the activation value of the ANN output node with the highest activation value for events selected in the \(3\ell + 0{\uptau } _\mathrm {h} \) channel and classified as \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) signal (upper left), \({\mathrm{t}} {\mathrm{H}} \) signal (upper right), and background (lower left), and for events selected in the \(2\ell {\mathrm {SS}}+ 1{\uptau } _\mathrm {h} \) channel (lower right). In the case of the \(2\ell {\mathrm {SS}}+ 1{\uptau } _\mathrm {h} \) channel, the activation values of the ANN output nodes for \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) signal, \({\mathrm{t}} {\mathrm{H}} \) signal, and background are shown together in a single histogram, concatenating histogram bins as appropriate and enumerating the bins by a monotonically increasing number. The distributions expected for the \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) and \({\mathrm{t}} {\mathrm{H}} \) signals and for background processes are shown for the values of the parameters of interest and of the nuisance parameters obtained from the ML fit. The best fit value of the \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) and \({\mathrm{t}} {\mathrm{H}} \) production rates amounts to \({\hat{\mu }}_{{\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}}} = 0.92\) and \({\hat{\mu }}_{{\mathrm{t}} {\mathrm{H}}} = 5.7\) times the rates expected in the SM

Fig. 10
figure 10

Distributions of the BDT output for events selected in the \(1\ell + 1{\uptau } _\mathrm {h} \) (upper left), \(0\ell + 2{\uptau } _\mathrm {h} \) (upper right), and \(2\ell {\mathrm {OS}}+ 1{\uptau } _\mathrm {h} \) (lower) channels. The distributions expected for the \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) and \({\mathrm{t}} {\mathrm{H}} \) signals and for background processes are shown for the values of the parameters of interest and of the nuisance parameters obtained from the ML fit. The best fit value of the \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) and \({\mathrm{t}} {\mathrm{H}} \) production rates amounts to \({\hat{\mu }}_{{\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}}} = 0.92\) and \({\hat{\mu }}_{{\mathrm{t}} {\mathrm{H}}} = 5.7\) times the rates expected in the SM

Fig. 11
figure 11

Distributions of the BDT output used for the signal extraction in the \(1\ell + 2{\uptau } _\mathrm {h} \) (upper left), \(4\ell + 0{\uptau } _\mathrm {h} \) (upper right), \(3\ell + 1{\uptau } _\mathrm {h} \) (lower left), and \(2\ell + 2{\uptau } _\mathrm {h} \) (lower right) channels. The distributions expected for the \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) and \({\mathrm{t}} {\mathrm{H}} \) signals and for background processes are shown for the values of the parameters of interest and of the nuisance parameters obtained from the ML fit. The best fit value of the \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) and \({\mathrm{t}} {\mathrm{H}} \) production rates amounts to \({\hat{\mu }}_{{\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}}} = 0.92\) and \({\hat{\mu }}_{{\mathrm{t}} {\mathrm{H}}} = 5.7\) times the rates expected in the SM

Fig. 12
figure 12

Distributions of discriminating observables in the \(3\ell + 0{\uptau } _\mathrm {h} \) (left) and \(4\ell + 0{\uptau } _\mathrm {h} \) (right) control region. The distributions expected for the \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) and \({\mathrm{t}} {\mathrm{H}} \) signals and for background processes are shown for the values of the parameters of interest and of the nuisance parameters obtained from the ML fit. The best fit value of the \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) and \({\mathrm{t}} {\mathrm{H}} \) production rates amounts to \({\hat{\mu }}_{{\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}}} = 0.92\) and \({\hat{\mu }}_{{\mathrm{t}} {\mathrm{H}}} = 5.7\) times the rates expected in the SM

8.1 Additional checks

As a cross-check, and to highlight the enhancement in sensitivity provided by machine-learning techniques, a complementary measurement of the \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) signal rate is performed using a set of alternative observables in the ML fit. We refer to this cross-check as the control analysis, as distinguished from the analysis previously discussed, which we refer to as the main analysis. The control analysis (CA) is restricted to the \(2\ell {\mathrm {SS}}+ 0{\uptau } _\mathrm {h} \), \(3\ell + 0{\uptau } _\mathrm {h} \), \(2\ell {\mathrm {SS}}+ 1{\uptau } _\mathrm {h} \), and \(4\ell + 0{\uptau } _\mathrm {h} \) channels. The production rate of the \({\mathrm{t}} {\mathrm{H}} \) signal is fixed to its SM expectation in the CA. In the \(2\ell {\mathrm {SS}}+ 0{\uptau } _\mathrm {h} \) channel, the invariant mass of the lepton pair is used as the discriminating observable. The event selection criteria applied in the CA in this channel are modified to the condition \(N_{\mathrm {j}} \ge 4\) and the events are analyzed in subcategories based on lepton flavor, the charge-sum of the leptons (\(+2\) or \(-2\)), and the multiplicity of jets. In the \(3\ell + 0{\uptau } _\mathrm {h} \) channel, the invariant mass of the three-lepton system is used as the discriminating observable and the events are analyzed in subcategories based on the multiplicity of jets and on the charge-sum of the leptons (\(+1\) or \(-1\)). A discriminant based on the matrix-element method [35, 36] is used as the discriminating observable in the \(2\ell {\mathrm {SS}}+ 1{\uptau } _\mathrm {h} \) channel and the events are analyzed in two subcategories based on the multiplicity of jets, defined by the conditions \(N_{\mathrm {j}} = 3\) and \(N_{\mathrm {j}} \ge 4\), and referred to as the “missing-jet” and “no-missing-jet” subcategories, respectively. 
The computation of the discriminant exploits the fact that the differential cross sections for the \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) signal, as well as for the dominant background processes in the \(2\ell {\mathrm {SS}}+ 1{\uptau } _\mathrm {h} \) channel, are well known; this permits the computation of the probabilities for a given event to be either signal or background, given the measured values of kinematic observables in the event and taking into account the experimental resolution of the detector. The probabilities are computed for the \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) signal hypothesis and for three types of background hypotheses: \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{Z}} \) events in which the \({\mathrm{Z}} \) boson decays into a pair of \({\uptau } \) leptons; \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{Z}} \) events in which the \({\mathrm{Z}} \) boson decays into a pair of electrons or muons and one lepton is misidentified as \({\uptau } _\mathrm {h} \); and \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} \rightarrow {{\mathrm{b}}} \ell {\upnu {}{}} \, {{\overline{{{{\mathrm{b}}}}}}} {\uptau } {\upnu {}{}} \) events with one additional nonprompt lepton originating from a \({{\mathrm{b}}} \) hadron decay. Details on the computation of these probabilities are given in Ref. [23]. The ratio of the probability for a given event to be \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) signal to the sum of the probabilities for the event to be one of the three backgrounds constitutes, according to the Neyman–Pearson lemma [104], an optimal observable for the purpose of separating the \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) signal from backgrounds and is taken as the discriminant used for the signal extraction. 
In the \(4\ell + 0{\uptau } _\mathrm {h} \) channel, the invariant mass of the four-lepton system, \(m_{4\ell }\), is used as the discriminating observable.
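The matrix-element discriminant used in the \(2\ell {\mathrm {SS}}+ 1{\uptau } _\mathrm {h} \) channel of the CA, as described above, is the ratio of the signal probability to the sum of the three background probabilities. A minimal sketch of that ratio follows; the function name and the probability values are hypothetical, and the computation of the per-event probabilities themselves (Ref. [23]) is not reproduced here.

```python
from typing import Sequence


def mem_discriminant(p_signal: float, p_backgrounds: Sequence[float]) -> float:
    """Ratio of the signal probability to the summed background probabilities."""
    denominator = sum(p_backgrounds)
    if p_signal < 0 or any(p < 0 for p in p_backgrounds):
        raise ValueError("probabilities must be non-negative")
    if denominator <= 0:
        raise ValueError("at least one background probability must be positive")
    return p_signal / denominator


# Hypothetical per-event probabilities, in arbitrary units:
# ttH signal; ttZ (Z -> tautau); ttZ (Z -> ee/mumu, one lepton misidentified); ttbar + nonprompt lepton.
ratio = mem_discriminant(2.0e-3, [1.0e-3, 5.0e-4, 2.5e-3])
print(f"discriminant = {ratio:.2f}")
```

Larger values of the ratio indicate events that are more compatible with the signal hypothesis than with the three background hypotheses.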

9 Statistical analysis and results

The production rates of the \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) and \({\mathrm{t}} {\mathrm{H}} \) signals are determined through a binned simultaneous ML fit to a total of 105 distributions: the outputs of the BDTs in each of the seven channels \(1\ell + 1{\uptau } _\mathrm {h} \), \(0\ell + 2{\uptau } _\mathrm {h} \), \(2\ell {\mathrm {OS}}+ 1{\uptau } _\mathrm {h} \), \(1\ell + 2{\uptau } _\mathrm {h} \), \(4\ell + 0{\uptau } _\mathrm {h} \), \(3\ell + 1{\uptau } _\mathrm {h} \), and \(2\ell + 2{\uptau } _\mathrm {h} \); the distributions of the 10 output nodes of the ANNs in the \(2\ell {\mathrm {SS}}+ 0{\uptau } _\mathrm {h} \), \(3\ell + 0{\uptau } _\mathrm {h} \), and \(2\ell {\mathrm {SS}}+ 1{\uptau } _\mathrm {h} \) channels in the categories described in Fig. 3; and the distributions of the observables that discriminate the \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{Z}} \) background from each of the \({\mathrm{W}} {\mathrm{Z}} \) and \({\mathrm{Z}} {\mathrm{Z}} \) backgrounds in the \(3\ell \)- and \(4\ell \)-CRs, respectively; separately for the three data-taking periods considered in the analysis. The \(2\ell {\mathrm {SS}}+ 0{\uptau } _\mathrm {h} \) (\(3\ell + 0{\uptau } _\mathrm {h} \)) channel contributes a total of 12 (11) distributions per data-taking period to the ML fit, reflecting the subdivision of these channels into event categories based on lepton flavor and on the multiplicity of \({{\mathrm{b}}} \)-tagged jets.
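The quoted total of 105 distributions can be cross-checked with a short tally. The counts of 7, 12, and 11 are taken from the text above, and the two CRs contribute one distribution each. The contribution of the \(2\ell {\mathrm {SS}}+ 1{\uptau } _\mathrm {h} \) channel is not stated explicitly; the sketch assumes three distributions per data-taking period (one per ANN output node), which is the value that reproduces the quoted total.

```python
# Distributions entering the fit per data-taking period.
per_period = {
    "BDT channels": 7,
    "2lSS+0tau_h ANN categories": 12,
    "3l+0tau_h ANN categories": 11,
    "2lSS+1tau_h ANN output nodes": 3,  # assumption, not stated in the text
    "3l-CR and 4l-CR": 2,
}
n_periods = 3  # 2016, 2017, 2018

total = n_periods * sum(per_period.values())
print(total)
```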

The production rates of the \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) and \({\mathrm{t}} {\mathrm{H}} \) signals constitute the parameters of interest (POI) in the fit. We denote by the symbols \(\mu _{{\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}}}\) and \(\mu _{{\mathrm{t}} {\mathrm{H}}}\) the ratio of these production rates to their SM expectation and use the notation \(\varvec{\mu }\) to refer to the set of both POIs.

Table 8 Number of events selected in each of the ten analysis channels compared to the event yields expected from the \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) and \({\mathrm{t}} {\mathrm{H}} \) signals and from background processes. The expected event yields are computed for the values of nuisance parameters and of the POI obtained from the ML fit. The best fit values of the POI amount to \({\hat{\mu }}_{{\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}}} = 0.92\) and \({\hat{\mu }}_{{\mathrm{t}} {\mathrm{H}}} = 5.7\). Quoted uncertainties represent the sum of statistical and systematic components. The symbol “–” indicates that the corresponding expected contribution is smaller than 0.1 events

The likelihood function is denoted by the symbol \({\mathcal {L}}\) and is given by the expression:

$$\begin{aligned} {\mathcal {L}}\left( \text {data} \, \vert \, \varvec{\mu }, \varvec{\theta }\right) = \prod _{i} \, {\mathcal {P}}\left( n_{i} \vert \, \varvec{\mu }, \varvec{\theta }\right) \, \prod _{k} \, \text {p}\left( {\tilde{\theta }}_{k} \vert \theta _{k}\right) , \end{aligned}$$
(3)

where the index i refers to individual bins of the 105 distributions of the discriminating observables that are included in the fit, and the factor \({\mathcal {P}}\left( n_{i} \vert \, \varvec{\mu }, \varvec{\theta }\right) \) represents the probability to observe \(n_{i}\) events in a given bin i, where \(\nu _{i}(\varvec{\mu }, \varvec{\theta })\) events are expected from the sum of signal and background contributions in that bin. The number of expected events is a linear function of the two POIs \(\mu _{{\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}}}\) and \(\mu _{{\mathrm{t}} {\mathrm{H}}}\):

$$\begin{aligned} \nu _{i}(\varvec{\mu }, \varvec{\theta }) = \mu _{{\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}}} \nu _{i}^{{\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}}}(\varvec{\theta }) + \mu _{{\mathrm{t}} {\mathrm{H}}} \nu _{i}^{{\mathrm{t}} {\mathrm{H}}}(\varvec{\theta }) + \nu _{i}^{\mathrm {B}}(\varvec{\theta }), \end{aligned}$$
(4)

where the symbols \(\nu _{i}^{{\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}}}\), \(\nu _{i}^{{\mathrm{t}} {\mathrm{H}}}\), and \(\nu _{i}^{\mathrm {B}}\) denote, respectively, the SM expectation for the \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) and \({\mathrm{t}} {\mathrm{H}} \) signal contributions and the aggregate of contributions expected from background processes in bin i. We use the notation \(\nu _{i}(\varvec{\mu }, \varvec{\theta })\) to indicate that the number of events expected from signal and background processes in each bin i depends on a set of parameters, denoted by the symbol \(\varvec{\theta }\), that represent the systematic uncertainties detailed in Sect. 8 and are referred to as nuisance parameters. Via the dependency of the \(\nu _{i}(\varvec{\mu }, \varvec{\theta })\) on \(\varvec{\theta }\), the nuisance parameters accommodate variations of the event yields as well as of the distributions of the discriminating observables during the fit. The probability \({\mathcal {P}}\left( n_{i} \vert \, \varvec{\mu }, \varvec{\theta }\right) \) is given by the Poisson distribution:

$$\begin{aligned} {\mathcal {P}}\left( n_{i} \vert \, \varvec{\mu }, \varvec{\theta }\right) = \frac{\left( \nu _{i}(\varvec{\mu }, \varvec{\theta })\right) ^{n_{i}}}{n_{i}!} \, \exp \left( -\nu _{i}(\varvec{\mu }, \varvec{\theta })\right) . \end{aligned}$$
(5)
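
As an illustration of Eqs. (4) and (5), the following minimal Python sketch evaluates the negative logarithm of the product of Poisson terms for a handful of bins. It is not the fit code used in the analysis: the nuisance parameters are held fixed, the constraint terms of Eq. (3) are omitted, and all bin contents as well as the function names `expected_yield` and `poisson_nll` are invented for this example.

```python
import math

def expected_yield(mu_tth, mu_th, nu_tth, nu_th, nu_bkg):
    """Eq. (4): expected events per bin, linear in the two POIs."""
    return [mu_tth * s1 + mu_th * s2 + b
            for s1, s2, b in zip(nu_tth, nu_th, nu_bkg)]

def poisson_nll(n_obs, nu):
    """Negative log of the product over bins of the Poisson terms in Eq. (5)."""
    return sum(v - n * math.log(v) + math.lgamma(n + 1)
               for n, v in zip(n_obs, nu))

# Toy inputs (hypothetical): SM signal templates, background, observed counts.
nu_tth = [1.0, 3.0, 6.0]
nu_th = [0.2, 0.5, 1.0]
nu_bkg = [50.0, 20.0, 5.0]
n_obs = [52, 23, 12]

nll_sm = poisson_nll(n_obs, expected_yield(1.0, 1.0, nu_tth, nu_th, nu_bkg))
nll_bkg_only = poisson_nll(n_obs, expected_yield(0.0, 0.0, nu_tth, nu_th, nu_bkg))
print(f"NLL at SM signal strengths: {nll_sm:.3f}")
print(f"NLL for background only:    {nll_bkg_only:.3f}")
```

With these toy counts the SM hypothesis gives the smaller negative log-likelihood, because the last bin contains more events than the background alone would predict. A full fit would minimize this quantity, plus the constraint terms, over both POIs and all nuisance parameters.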

Individual elements of the set of nuisance parameters \(\varvec{\theta }\) are denoted by the symbol \(\theta _{k}\), where each \(\theta _{k}\) represents a specific source of systematic uncertainty. The function \(\mathrm {p}({\tilde{\theta }}_{k} \vert \theta _{k})\) represents the probability to observe a value \({\tilde{\theta }}_{k}\) in an auxiliary measurement of the nuisance parameter, given that its true value is \(\theta _{k}\). Systematic uncertainties that affect only the normalization, but not the shape of the distribution of the discriminating observables, are represented by a Gamma probability density function if they are statistical in origin, e.g. if they correspond to the number of events observed in a CR, and otherwise by a log-normal probability density function; systematic uncertainties that also affect the shape of distributions of the discriminating observables are incorporated into the ML fit via the technique detailed in Ref. [105] and represented by a Gaussian probability density function.

The rates of the \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{W}} \) and \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{Z}} \) backgrounds are separately left unconstrained in the fit. The rate of the small \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{W}} {\mathrm{W}} \) background is constrained to scale by the same factor with respect to its SM expectation as the rate of the \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{W}} \) background.

Statistical fluctuations in the background predictions arise because of a limited number of events in the MC simulation as well as in the ARs that are used to estimate the misidentified leptons and flips backgrounds from data. These fluctuations are incorporated into the likelihood function via the approach described in Ref. [106].

Further details concerning the treatment of systematic uncertainties and concerning the choice of the functions \(\mathrm {p}({\tilde{\theta }}_{k} \vert \theta _{k})\) are given in Refs. [105, 107, 108].

A complication in the signal extraction arises from the fact that a deviation in the top quark Yukawa coupling \(y_{{\mathrm{t}}}\) with respect to the SM expectation \(m_{{\mathrm{t}}}/v\) would change the distribution of kinematic observables for the \({\mathrm{t}} {\mathrm{H}} \) signal and alter the proportion between the \({\mathrm{t}} {\mathrm{H}} \) and \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) signal rates. We address this complication by first determining the production rates for the \({\mathrm{t}} {\mathrm{H}} \) and \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) signals, assuming that the distributions of kinematic observables for the \({\mathrm{t}} {\mathrm{H}} \) signal conform to the distributions expected in the SM; we then determine the Yukawa coupling \(y_{{\mathrm{t}}}\) of the Higgs boson to the top quark, accounting for modifications in the interference effects for the \({\mathrm{t}} {\mathrm{H}} \) signal. These studies assume a Higgs boson mass of 125 \(\,\text {GeV}\).

Assuming the distributions of the discriminating observables for the \({\mathrm{t}} {\mathrm{H}} \) and \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) signals agree with their SM expectation, the production rate for the \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) signal is measured to be \(\mu _{{\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}}} = 0.92 \pm 0.19 \,\text {(stat)} ^{+0.17}_{-0.13}\,\text {(syst)} \) times the SM expectation, equivalent to a cross section for \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) production of \(466 \pm 96\,\text {(stat)} ^{+70}_{-56}\,\text {(syst)} \,\text {fb} \), and that of the \({\mathrm{t}} {\mathrm{H}} \) signal is measured to be \(\mu _{{\mathrm{t}} {\mathrm{H}}} = 5.7 \pm 2.7\,\text {(stat)} \pm 3.0\,\text {(syst)} \) times the SM expectation, equivalent to a cross section for \({\mathrm{t}} {\mathrm{H}} \) production of \(510 \pm 200\,\text {(stat)} \pm 220\,\text {(syst)} \,\text {fb} \). The corresponding observed (expected) significance of the \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) signal amounts to 4.7 (5.2) standard deviations, assuming the \({\mathrm{t}} {\mathrm{H}} \) process to have the SM production rate, and that of the \({\mathrm{t}} {\mathrm{H}} \) signal to 1.4 (0.3) standard deviations, also assuming the \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) process to have the SM production rate. We have estimated the agreement between the data and our statistical model by using a goodness-of-fit test based on the saturated model, obtaining a p-value of 0.097, which shows no indication of a significant difference between the data and the assumed model.
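
The arithmetic behind the numbers quoted above can be sketched as follows. The snippet combines the statistical and systematic components in quadrature, which assumes that they are uncorrelated, and infers the SM cross sections implied by the quoted central values as the ratio of measured cross section to signal strength. These implied values are derived here for illustration only and are not quoted in this section; the helper name `quadrature` is likewise an assumption of the example.

```python
import math

def quadrature(*parts):
    """Combine uncertainty components assumed to be uncorrelated."""
    return math.sqrt(sum(p * p for p in parts))

# Central values quoted in the text.
mu_tth, mu_th = 0.92, 5.7
xsec_tth_fb, xsec_th_fb = 466.0, 510.0

# Total uncertainties on the signal strengths (stat and syst in quadrature).
tot_up_tth = quadrature(0.19, 0.17)
tot_down_tth = quadrature(0.19, 0.13)
tot_th = quadrature(2.7, 3.0)

# SM cross sections implied by sigma = mu * sigma_SM (inferred, not quoted).
sm_xsec_tth_fb = xsec_tth_fb / mu_tth
sm_xsec_th_fb = xsec_th_fb / mu_th

print(f"mu_ttH = {mu_tth} +{tot_up_tth:.2f} -{tot_down_tth:.2f}")
print(f"mu_tH  = {mu_th} +/- {tot_th:.1f}")
print(f"implied SM cross sections: {sm_xsec_tth_fb:.0f} fb (ttH), {sm_xsec_th_fb:.0f} fb (tH)")
```

Under these assumptions the total uncertainty on the \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) signal strength is roughly +0.25 and -0.23, and that on the \({\mathrm{t}} {\mathrm{H}} \) signal strength roughly 4.0. Note that the quoted cross section uncertainties are not obtained by simply scaling the signal strength uncertainties with the implied SM cross section, so only central values are converted here.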

Fig. 13

Distribution of the decimal logarithm of the ratio between the expected \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} +{\mathrm{t}} {\mathrm{H}} \) signal and the expected sum of background contributions in each bin of the 105 distributions that are included in the ML fit used for the signal extraction. The distributions expected for signal and background processes are computed for \({\hat{\mu }}_{{\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}}} = 0.92\), \({\hat{\mu }}_{{\mathrm{t}} {\mathrm{H}}} = 5.7\), and the values of nuisance parameters obtained from the ML fit

Fig. 14

Production rate \({\hat{\mu }}_{{\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}}}\) of the \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) signal (left) and \({\hat{\mu }}_{{\mathrm{t}} {\mathrm{H}}}\) of the \({\mathrm{t}} {\mathrm{H}} \) signal (right), in units of their rate of production expected in the SM, measured in each of the ten channels individually and for the combination of all channels. The central value of the signal strength in the \(2\ell + 2{\uptau } _\mathrm {h} \) channel is constrained to be greater than zero

Fig. 15

Two-dimensional contours of the likelihood function \({\mathcal {L}}\), given by Eq. (3), as a function of the production rates of the \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) and \({\mathrm{t}} {\mathrm{H}} \) signals (\(\mu _{{\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}}}\) and \(\mu _{{\mathrm{t}} {\mathrm{H}}}\)) and of the \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{Z}} \) and \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{W}} \) backgrounds (\(\mu _{{\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{Z}}}\) and \(\mu _{{\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{W}}}\)). The two production rates that are not shown on either the x or the y axis are profiled such that the function \({\mathcal {L}}\) attains its maximum at each point in the x-y plane

The distributions that are included in the ML fit are shown in Figs. 8, 9, 10, 11 and 12. In the \(2\ell {\mathrm {SS}}+ 0{\uptau } _\mathrm {h} \) and \(3\ell + 0{\uptau } _\mathrm {h} \) channels, we show the distributions of the activation values of ANN output nodes in the different subcategories based on lepton flavor and on the multiplicity of \({{\mathrm{b}}} \)-tagged jets in a single histogram, concatenating histogram bins as appropriate, and enumerate the bins by a monotonically increasing number. The distributions expected for the \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) and \({\mathrm{t}} {\mathrm{H}} \) signals, as well as the expected background contributions, are shown for the values of the POIs and of the nuisance parameters obtained from the ML fit. The uncertainty bands shown in the figures represent the total uncertainty in the sum of signal and background contributions that remains after having determined the values of the nuisance parameters through the ML fit. These bands are computed by randomly sampling from the covariance matrix of the nuisance parameters as determined by the ML fit and adding the statistical uncertainties in the background predictions in quadrature. The data are in agreement with the sum of contributions estimated by the ML fit for the \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) and \({\mathrm{t}} {\mathrm{H}} \) signals and for the background processes. The corresponding event yields are given in Table 8. In the \(2\ell {\mathrm {SS}}+ 0{\uptau } _\mathrm {h} \), \(3\ell + 0{\uptau } _\mathrm {h} \), and \(2\ell {\mathrm {SS}}+ 1{\uptau } _\mathrm {h} \) channels, the sums of event yields in all ANN output node categories are given in the table.
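
The construction of the uncertainty bands can be sketched in a few lines of Python. The example below draws nuisance parameter values from a multivariate Gaussian defined by a covariance matrix, propagates each draw to the bin yields, and adds a per-bin statistical uncertainty in quadrature. It is a toy under stated assumptions: the linear response of the yields to the nuisance parameters, the two-parameter covariance matrix, and all numbers and function names (`cholesky`, `sample_theta`, `band`) are hypothetical and not taken from the analysis.

```python
import math
import random

def cholesky(cov):
    """Lower-triangular factor L with L L^T = cov (cov must be positive definite)."""
    n = len(cov)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(cov[i][i] - s)
            else:
                low[i][j] = (cov[i][j] - s) / low[j][j]
    return low

def sample_theta(low, rng):
    """One correlated Gaussian draw of the nuisance parameters."""
    z = [rng.gauss(0.0, 1.0) for _ in low]
    return [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(len(low))]

def band(nominal, response, cov, stat_unc, n_toys=20000, seed=1):
    """Per-bin total uncertainty: spread over toys plus stat term in quadrature."""
    rng = random.Random(seed)
    low = cholesky(cov)
    n_bins = len(nominal)
    sums = [0.0] * n_bins
    squares = [0.0] * n_bins
    for _ in range(n_toys):
        theta = sample_theta(low, rng)
        for i in range(n_bins):
            # Assumed linear response of the yield to each nuisance parameter.
            shift = sum(r * t for r, t in zip(response[i], theta))
            y = nominal[i] * (1.0 + shift)
            sums[i] += y
            squares[i] += y * y
    total = []
    for i in range(n_bins):
        mean = sums[i] / n_toys
        var = max(squares[i] / n_toys - mean * mean, 0.0)
        total.append(math.sqrt(var + stat_unc[i] ** 2))
    return total

# Toy inputs (hypothetical): two bins, two correlated nuisance parameters.
nominal = [100.0, 40.0]
response = [[0.05, 0.02], [0.03, 0.10]]
cov = [[1.0, 0.3], [0.3, 1.0]]
stat_unc = [2.0, 1.5]
print(band(nominal, response, cov, stat_unc))
```

Sampling from the full covariance matrix, rather than varying one nuisance parameter at a time, keeps the correlations among the parameters in the resulting band.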

The event yields of background processes obtained from the ML fit agree reasonably well with their expected production rate, given the uncertainties. In particular, the production rates of the \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{Z}} \) and \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{W}} \) backgrounds are determined to be \(\mu _{{\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{Z}}} = 1.03 \pm 0.14\,\text {(stat+syst)}\) and \(\mu _{{\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{W}}} = 1.43 \pm 0.21\,\text {(stat+syst)}\) times their SM expectation, as obtained from the MC simulation.

The evidence for the presence of the \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) and \({\mathrm{t}} {\mathrm{H}} \) signals in the data is illustrated in Fig. 13, in which each bin of the distributions that are included in the ML fit is classified according to the expected ratio of the number of \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} +{\mathrm{t}} {\mathrm{H}} \) signal (S) over background (B) events in that bin. A significant excess of events with respect to the background expectation is visible in the bins with the highest expected S/B ratio.
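
The classification of bins by their expected S/B ratio can be sketched as below. Each bin of the fitted distributions is assigned to a bin of \(\log _{10}(S/B)\), and the expected signal, expected background, and observed counts are accumulated there. The bin contents, the bin edges, and the function name `sb_histogram` are invented for illustration and do not correspond to the analysis inputs.

```python
import math

def sb_histogram(signal, background, data, edges):
    """Accumulate yields of the input bins in bins of log10(S/B)."""
    n = len(edges) - 1
    hist = [{"signal": 0.0, "background": 0.0, "data": 0} for _ in range(n)]
    for s, b, d in zip(signal, background, data):
        if s <= 0.0 or b <= 0.0:
            continue  # log10(S/B) is undefined for such bins
        x = math.log10(s / b)
        for j in range(n):
            if edges[j] <= x < edges[j + 1]:
                hist[j]["signal"] += s
                hist[j]["background"] += b
                hist[j]["data"] += d
                break
    return hist

# Toy inputs (hypothetical): three fit bins with very different purity.
signal = [0.5, 5.0, 20.0]
background = [500.0, 50.0, 10.0]
data = [498, 57, 33]
edges = [-4.0, -2.0, -0.5, 1.0]

hist = sb_histogram(signal, background, data, edges)
for lo, hi, h in zip(edges[:-1], edges[1:], hist):
    excess = h["data"] - h["background"]
    print(f"log10(S/B) in [{lo}, {hi}): data - background = {excess:.1f}")
```

In this toy, the excess of data over background is concentrated in the highest \(\log _{10}(S/B)\) bin, which is the pattern expected when a signal is present.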

Fig. 16

Probability for \({\mathrm{t}} {\mathrm{H}} \) signal events produced by the \({\mathrm{t}} {\mathrm{H}} {{\mathrm{q}}} \) (left) and \({\mathrm{t}} {\mathrm{H}} {\mathrm{W}} \) (right) production process to pass the event selection criteria for the \(2\ell {\mathrm {SS}}+ 0{\uptau } _\mathrm {h} \), \(3\ell + 0{\uptau } _\mathrm {h} \), and \(2\ell {\mathrm {SS}}+ 1{\uptau } _\mathrm {h} \) channels in each of the Higgs boson decay modes as a function of the ratio \(\kappa _{{\mathrm{t}}}/\kappa _{{\mathrm{V}}}\) of the Higgs boson couplings to the top quark and to the \({\mathrm{W}} \) boson

Fig. 17

Dependence of the likelihood function \({\mathcal {L}}\) in Eq. (3) on \(\kappa _{{\mathrm{t}}}\), profiling over \(\kappa _{{\mathrm{V}}}\) (left), and on \(\kappa _{{\mathrm{t}}}\) and \(\kappa _{{\mathrm{V}}}\) (right)

The \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) signal rates measured in the ten individual channels are shown in Fig. 14, obtained by performing a likelihood fit in which signal rates are parametrized with independent parameters, one for each channel. The measurement of the \({\mathrm{t}} {\mathrm{H}} \) production rate is only shown in the \(2\ell {\mathrm {SS}}+ 0{\uptau } _\mathrm {h} \), \(3\ell + 0{\uptau } _\mathrm {h} \), and \(2\ell {\mathrm {SS}}+ 1{\uptau } _\mathrm {h} \) channels, which employ a multiclass ANN to separate the \({\mathrm{t}} {\mathrm{H}} \) from the \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) signal. The sensitivity of the other channels to the \({\mathrm{t}} {\mathrm{H}} \) signal is small. The \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) and \({\mathrm{t}} {\mathrm{H}} \) production rates obtained from the simultaneous fit of all channels are also shown in the figure. The signal rates measured in individual channels are compatible with each other and with the \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) and \({\mathrm{t}} {\mathrm{H}} \) production rates obtained from the simultaneous fit of all channels. The largest deviation from the SM expectation is observed in the \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) production rate in the \(2\ell + 2{\uptau } _\mathrm {h} \) channel, where the best fit value of the \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) signal rate is negative, reflecting the deficit of observed events compared to the background expectation in this channel, as shown in Fig. 11. The value and uncertainty shown in Fig. 14 are obtained after requiring the \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) production rates in this channel to be positive. 
The value measured in the \(2\ell + 2{\uptau } _\mathrm {h} \) channel is compatible with the SM expectation at the level of 1.94 standard deviations when constraining the signal strength in that channel to be larger than zero. The sensitivity of individual channels can be inferred from the size of the uncertainty band in the measured signal strengths. The channel providing the highest sensitivity is the \(2\ell {\mathrm {SS}}+ 0{\uptau } _\mathrm {h} \) channel, which is the channel providing the largest signal yield, followed by the \(3\ell + 0{\uptau } _\mathrm {h} \) and \(2\ell {\mathrm {SS}}+ 1{\uptau } _\mathrm {h} \) channels.

Figure 15 shows the correlations between the measured \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) and \({\mathrm{t}} {\mathrm{H}} \) signal rates and those between the signal rates and the production rates of the \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{Z}} \) and \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{W}} \) backgrounds. All correlations are of moderate size, demonstrating the performance achieved by the multiclass ANN in distinguishing between the \({\mathrm{t}} {\mathrm{H}} \) and \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) signals as well as in separating the \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) and \({\mathrm{t}} {\mathrm{H}} \) signals from the \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{Z}} \) and \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{W}} \) backgrounds.

In the CA described in Sect. 8.1, the measured production rate for the \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) signal is \({\hat{\mu }}_{{\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}}} = 0.5 \pm 0.3\,\text {(stat+syst)}\), \({\hat{\mu }}_{{\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}}} = 1.3 \pm 0.5\,\text {(stat+syst)}\), \({\hat{\mu }}_{{\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}}} = 0.9 \pm 0.4\,\text {(stat+syst)}\), and \({\hat{\mu }}_{{\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}}} = 1.5 \pm 1.5\,\text {(stat+syst)}\) times the SM expectation, in the \(2\ell {\mathrm {SS}}+ 0{\uptau } _\mathrm {h} \), \(3\ell + 0{\uptau } _\mathrm {h} \), \(2\ell {\mathrm {SS}}+ 1{\uptau } _\mathrm {h} \), and \(4\ell + 0{\uptau } _\mathrm {h} \) channels, respectively, while \({\hat{\mu }}_{{\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}}} = 0.91 \pm 0.21\,\text {(stat)} \pm 0.18\,\text {(syst)} \) is obtained for the simultaneous ML fit of all four channels. The \(3\ell \)- and \(4\ell \)-CRs are included in each of these ML fits. The corresponding observed (expected) significance of the \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) signal in the CA amounts to 3.8 (4.0) standard deviations.

We now drop the assumption that the distributions of kinematic observables for the \({\mathrm{t}} {\mathrm{H}} \) signal conform to the distributions expected in the SM and determine the Yukawa coupling \(y_{{\mathrm{t}}}\) of the Higgs boson to the top quark. We parametrize the production rates \({\hat{\mu }}_{{\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}}}\) and \({\hat{\mu }}_{{\mathrm{t}} {\mathrm{H}}}\) of the \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) and \({\mathrm{t}} {\mathrm{H}} \) signals as a function of the ratio of the top quark Yukawa coupling \(y_{{\mathrm{t}}}\) to its SM expectation \(m_{{\mathrm{t}}}/v\). We refer to this ratio as the coupling modifier and denote it by the symbol \(\kappa _{{\mathrm{t}}}\). The effect of the interference, described in Sect. 1, between the diagrams in Fig. 2 on the distributions of kinematic observables is parametrized as a function of \(\kappa _{{\mathrm{t}}}\) and fully taken into account, adjusting the event yield for the \({\mathrm{t}} {\mathrm{H}} \) signal as well as the distributions of the outputs of the BDTs and ANNs for each value of \(\kappa _{{\mathrm{t}}}\). The changes in the kinematical properties of the event affect the probability for \({\mathrm{t}} {\mathrm{H}} \) signal events to pass the event selection criteria. The effect is illustrated in Fig. 16, which shows the variation of the product of acceptance and efficiency for the \({\mathrm{t}} {\mathrm{H}} {{\mathrm{q}}} \) and \({\mathrm{t}} {\mathrm{H}} {\mathrm{W}} \) signal contributions in each decay mode of the Higgs boson as a function of the ratio \(\kappa _{{\mathrm{t}}}/\kappa _{{\mathrm{V}}}\), where \(\kappa _{{\mathrm{V}}}\) denotes the coupling of the Higgs boson to the \({\mathrm{W}} \) boson with respect to the SM expectation for this coupling. 
The coupling of the Higgs boson to the \({\mathrm{Z}} \) boson with respect to its SM expectation is assumed to scale by the same value \(\kappa _{{\mathrm{V}}}\). Variations of the coupling modifier \(\kappa _{{\mathrm{V}}}\) from the SM expectation \(\kappa _{{\mathrm{V}}}= 1\) affect the interference between the diagrams in Fig. 2 as well as the branching fractions of the Higgs boson decay modes \({\mathrm{H}} \rightarrow {\mathrm{W}} {\mathrm{W}} \) and \({\mathrm{H}} \rightarrow {\mathrm{Z}} {\mathrm{Z}} \). We compute the compatibility of the data with different values of \(\kappa _{{\mathrm{t}}}\) and \(\kappa _{{\mathrm{V}}}\), as shown in Fig. 17. We obtain a 95% confidence level (\(\text {CL}\)) region on \(\kappa _{{\mathrm{t}}}\) consisting of the union of the two intervals \(-0.9< \kappa _{{\mathrm{t}}}< -0.7\) and \(0.7< \kappa _{{\mathrm{t}}}< 1.1\). At 95% \(\text {CL}\), both the inverted top coupling scenario and the SM expectation \(\kappa _{{\mathrm{t}}}= 1\) are in agreement with the data.
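
The qualitative behavior of the signal rates as a function of the coupling modifiers can be sketched as follows. At leading order the \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) amplitude is proportional to \(y_{{\mathrm{t}}}\), so its rate scales as \(\kappa _{{\mathrm{t}}}^{2}\), while the \({\mathrm{t}} {\mathrm{H}} \) rate is a quadratic form in \(\kappa _{{\mathrm{t}}}\) and \(\kappa _{{\mathrm{V}}}\) with a negative interference term. The coefficients `a`, `b`, and `c` below are placeholders chosen only so that the rate equals the SM value at \(\kappa _{{\mathrm{t}}}= \kappa _{{\mathrm{V}}}= 1\); they are not taken from this section, and the sketch ignores the changes in branching fractions, acceptance, and kinematic distributions that the analysis takes into account.

```python
def mu_tth(kappa_t):
    """ttH rate relative to the SM, assuming leading-order scaling with kappa_t^2."""
    return kappa_t ** 2

def mu_th(kappa_t, kappa_v, a=2.6, b=3.6, c=-5.2):
    """tH rate relative to the SM as a quadratic form in the coupling modifiers.

    a, b, c are illustrative placeholders with a + b + c = 1 and c < 0
    (destructive interference when kappa_t and kappa_v have the same sign).
    """
    return a * kappa_t ** 2 + b * kappa_v ** 2 + c * kappa_t * kappa_v

for kt in (-1.0, 0.0, 1.0):
    print(f"kappa_t = {kt:+.1f}: mu_ttH = {mu_tth(kt):.2f}, "
          f"mu_tH = {mu_th(kt, 1.0):.2f}")
```

With these placeholder coefficients, flipping the sign of \(\kappa _{{\mathrm{t}}}\) leaves the \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) rate unchanged but increases the \({\mathrm{t}} {\mathrm{H}} \) rate by about an order of magnitude. This is why the \({\mathrm{t}} {\mathrm{H}} \) process carries the sensitivity to the sign of the coupling.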

10 Summary

The rate for Higgs boson production in association with either one or two top quarks has been measured in events containing multiple electrons, muons, and hadronically decaying tau leptons, using data recorded by the CMS experiment in \({\mathrm{p}} {\mathrm{p}} \) collisions at \(\sqrt{s} = 13\,\text {TeV} \) in 2016, 2017, and 2018. The analyzed data correspond to an integrated luminosity of 137\(\,\text {fb}^{-1}\). Ten different experimental signatures are considered in the analysis, differing by the multiplicity of electrons, muons, and hadronically decaying tau leptons, and targeting events in which the Higgs boson decays via \({\mathrm{H}} \rightarrow {\mathrm{W}} {\mathrm{W}} \), \({\mathrm{H}} \rightarrow {\uptau } {\uptau } \), or \({\mathrm{H}} \rightarrow {\mathrm{Z}} {\mathrm{Z}} \), whereas the top quark(s) decay either semi-leptonically or hadronically. The measured production rates for the \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) and \({\mathrm{t}} {\mathrm{H}} \) signals amount to \(0.92 \pm 0.19 \,\text {(stat)} ^{+0.17}_{-0.13}\,\text {(syst)} \) and \(5.7 \pm 2.7\,\text {(stat)} \pm 3.0\,\text {(syst)} \) times their respective standard model (SM) expectations. The corresponding observed (expected) significance amounts to 4.7 (5.2) standard deviations for \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \), and to 1.4 (0.3) for \({\mathrm{t}} {\mathrm{H}} \) production. Assuming that the Higgs boson coupling to the tau lepton is equal in strength to the value expected in the SM, the coupling \(y_{{\mathrm{t}}}\) of the Higgs boson to the top quark divided by its SM expectation, \(\kappa _{{\mathrm{t}}}=y_{{\mathrm{t}}}/y_{{\mathrm{t}}}^{{\mathrm {SM}}}\), is constrained to be within \(-0.9< \kappa _{{\mathrm{t}}}< -0.7\) or \(0.7< \kappa _{{\mathrm{t}}}< 1.1\), at 95% confidence level. 
This result is the most sensitive measurement of the \({\mathrm{t}} {{\overline{{{\mathrm{t}}}}}} {\mathrm{H}} \) production rate to date.