Search for a heavy pseudoscalar Higgs boson decaying into a 125 GeV Higgs boson and a Z boson in final states with two tau and two light leptons at $\sqrt{s} =$ 13 TeV

A search is performed for a pseudoscalar Higgs boson, A, decaying into a 125 GeV Higgs boson h and a Z boson. The h boson is specifically targeted in its decay into a pair of tau leptons, while the Z boson decays into a pair of electrons or muons. A data sample of proton-proton collisions collected by the CMS experiment at the LHC at $\sqrt{s} =$ 13 TeV is used, corresponding to an integrated luminosity of 35.9 fb$^{-1}$. No excess above the standard model background expectations is observed in data. A model-independent upper limit is set on the product of the gluon fusion production cross section for the A boson and the branching fraction to Zh $\to$ $\ell\ell\tau\tau$. The observed upper limit at 95% confidence level ranges from 27 to 5 fb for A boson masses from 220 to 400 GeV, respectively. The results are used to constrain the extended Higgs sector parameters for two benchmark scenarios of the minimal supersymmetric standard model.


Introduction
In the standard model (SM) [1][2][3], the Brout-Englert-Higgs mechanism [4][5][6][7][8][9] is responsible for the electroweak symmetry breaking, and it predicts the existence of the Higgs boson. A Higgs boson with a mass around 125 GeV was discovered by the ATLAS and CMS Collaborations in 2012 [10][11][12]. The best measurement of the Higgs boson mass to date, 125.26 ± 0.21 GeV, comes from a partial Run 2 data set analysis by the CMS Collaboration [13]; the result is consistent with the earlier Run 1 combined measurement by the ATLAS and CMS Collaborations [14] and the recent results by the ATLAS Collaboration [15]. The couplings of the observed boson have been studied extensively, and are found to be compatible with the SM expectation [16,17].
The observation of the Higgs boson has not only given closure to the search for the particles described by the SM, but also constrains theories beyond the SM proposed to explain some of the open questions in particle physics. A class of simple extensions of the SM, two-Higgs-doublet models (2HDMs), predicts the existence of five Higgs bosons [18,19]. Two of these five particles are CP-even Higgs bosons (h and H), and thus either of them could correspond to the observed particle. The properties of the observed state can be used to exclude regions of the parameter space of 2HDMs. Further constraints can be placed by performing searches for the four additional Higgs bosons, namely the scalar H, the CP-odd Higgs boson A, and the two charged Higgs bosons $H^\pm$. Moreover, 2HDMs are a prerequisite for the minimal supersymmetric standard model (MSSM), where the extended Higgs sector at tree level is fully determined by two parameters, conventionally chosen to be the ratio of the vacuum expectation values of the two Higgs doublets ($\tan\beta$) and the mass of the pseudoscalar A ($m_A$).
In the MSSM, accommodating an h boson mass as large as 125 GeV requires the scale of the soft supersymmetry-breaking masses to be larger than 1 TeV. This is a reasonable assumption given the nonobservation of supersymmetric particles at the CERN LHC thus far. In many of the MSSM benchmark scenarios typically studied, the predicted mass of the Higgs boson is lower than 125 GeV in the low $\tan\beta$ region [20]. We study two MSSM benchmark scenarios that can accommodate these constraints in most of the $m_A$-$\tan\beta$ plane: $M_{h,\text{EFT}}^{125}$ [20] and hMSSM [21][22][23][24]. The Higgs sector predictions of the $M_{h,\text{EFT}}^{125}$ scenario are derived from a 2HDM effective field theory framework, with a supersymmetric mass scale that can reach up to $10^{16}$ GeV, in order for the Higgs boson mass to be compatible with 125 GeV in the low $\tan\beta$ region. In the hMSSM scenario, requiring $m_h = 125$ GeV fixes the dominant radiative corrections to the Higgs boson mass, which are then used to determine the masses and couplings of the other Higgs bosons.
The parameter spaces of these benchmark scenarios can be explored by studying processes producing an experimentally accessible signature with a 125 GeV Higgs boson. One such process is the decay of the A boson into a 125 GeV Higgs boson and a Z boson. In the parameter space region with low $\tan\beta$ values, this decay has a substantial branching fraction. For $\tan\beta \lesssim 5$ the A boson is produced mainly in gluon fusion (gg → A), but for higher $\tan\beta$ values the associated production with b quarks (bbA) becomes dominant. The Feynman diagrams for both production processes are shown in Fig. 1.
This paper reports on a search for the pseudoscalar A boson decaying into a 125 GeV Higgs boson h and a Z boson in proton-proton (pp) collisions at $\sqrt{s} = 13$ TeV. The search is based on a data set collected in 2016 by the CMS experiment, corresponding to an integrated luminosity of 35.9 fb$^{-1}$. The analysis is primarily sensitive to the assumed gluon fusion production of the A boson, but the associated production with b quarks is included in the interpretation of the results. The studied signal mass range begins at 220 GeV because the A boson must be more massive than the Z and h bosons combined.

In this search, the Higgs boson is sought in its decay into a pair of tau leptons. Four possible ττ decay channels of the Higgs boson are considered: $e\tau_h$, $\mu\tau_h$, $\tau_h\tau_h$, and $e\mu$, where $\tau_h$ denotes a tau lepton decaying hadronically. Throughout the paper, neutrinos are omitted from the notation of the final states. These four decay channels are combined with the Z boson decays into two light leptons, i.e., Z → $\ell^+\ell^-$ ($\ell$ = e, µ), resulting in eight distinct final states of the A boson decay. To account for the missing transverse momentum that results from the neutrinos in the final states, we use the SVFIT algorithm [29] to reconstruct the four-vector of the Higgs boson while constraining its mass to 125 GeV. Compared to the previous result presented by the CMS Collaboration [26], this novel approach significantly increases the sensitivity of the search.

The CMS detector
The central feature of the CMS apparatus is a superconducting solenoid of 6 m internal diameter, providing a magnetic field of 3.8 T. Within the solenoid volume are a silicon pixel and strip tracker, a lead tungstate crystal electromagnetic calorimeter (ECAL), and a brass and scintillator hadron calorimeter, each composed of a barrel and two endcap sections. Forward calorimeters extend the pseudorapidity (η) coverage provided by the barrel and endcap detectors. Muons are detected in gas-ionization chambers embedded in the steel flux-return yoke outside the solenoid. Events of interest are selected using a two-tiered trigger system [30]. The first level (L1), composed of custom hardware processors, uses information from the calorimeters and muon detectors to select events at a rate of around 100 kHz within a time interval of less than 4 µs. The second level, known as the high-level trigger (HLT), consists of a farm of processors running a version of the full event reconstruction software optimized for fast processing, and reduces the event rate to around 1 kHz before data storage. A more detailed description of the CMS detector, together with a definition of the coordinate system used and the relevant kinematic variables, can be found in Ref. [31].

Simulated samples and models
Simulated signal events with a pseudoscalar Higgs boson A produced in gluon fusion (gg → A), decaying into a 125 GeV Higgs boson and a Z boson, and finally into two tau leptons and two light leptons (electrons or muons), are generated at leading order (LO) using MADGRAPH5 aMC@NLO v2.4.2 [32]. The considered A boson mass points lie within 220-400 GeV, as in this mass range the A → Zh decay becomes predominant. The samples are based on the $m_h^{\text{mod}+}$ scenario [33], assuming a low value of $\tan\beta$ (∼2). The generated width of the A boson is small compared to the instrumental resolution for all masses. Additional signal events are simulated for a 300 GeV A boson produced in association with b quarks (bbA) and are used only to study the selection efficiency, necessary for setting model-dependent limits, as explained in Section 8.
The background processes consist of all SM processes with nonnegligible yield in the studied final states, including the Higgs boson production through processes predicted in the SM (e.g. Zh, Wh, tth). The background processes with a Higgs boson decaying into two tau leptons, produced in association with a W or Z boson (Wh or Zh), are generated at next-to-LO (NLO) in perturbative quantum chromodynamics (QCD) with the POWHEG 2.0 [34][35][36][37][38] generator extended with the MiNLO procedure [39]. The contribution from Higgs boson events produced via gluon fusion or vector boson fusion and decaying into two tau leptons is negligible. The transverse momentum (p T ) distribution of the Higgs boson in the POWHEG simulations is tuned to match closely the next-to-NLO (NNLO) plus next-to-next-to-leading-logarithmic prediction in the HRES 2.3 generator [40,41]. The production cross sections and branching fractions for the SM Higgs boson production and their corresponding uncertainties are taken from Refs. [42][43][44].
The background samples for tth, tt, WZ, and qq → ZZ, as well as Wh → WWW, Wh → WZZ, Zh → ZWW, Zh → ZZZ, and gg → h → ZZ processes, are generated at NLO with POWHEG 2.0. The gg → ZZ process is generated at LO with MCFM [45]. Samples for the qq → ZZ and gg → ZZ processes include all SM events with two Z bosons in the final states except the ones from the gg → h → ZZ process. The MADGRAPH5 aMC@NLO 2.2.2 or 2.3.3 generator is used for triboson, Z + jets, ttW, and ttZ production, with the jet matching and merging scheme applied either at NLO with the FxFx algorithm [46] or at LO with the MLM algorithm [47]. The generators are interfaced with PYTHIA 8.212 [48] to model the parton showering and fragmentation, as well as the decay of the tau leptons. The PYTHIA parameters affecting the description of the underlying event are set to the CUETP8M1 tune [49]. The set of parton distribution functions (PDFs) used in the simulation is NNPDF3.0 [50].
The generated events are processed through a simulation of the CMS detector based on GEANT4 [51], and are reconstructed with the same algorithms that are used for data. The simulated samples include additional pp interactions per bunch crossing, referred to as in-time pileup. The effect of inelastic collisions happening in the preceding and subsequent bunch crossings (out-of-time pileup) is also considered. The effect of pileup is taken into account by generating concurrent minimum bias collision events. The simulated events are weighted such that the distribution of the number of pileup interactions matches with that observed in data. The pileup distribution is estimated from the measured instantaneous luminosity for each bunch crossing, resulting in an average of approximately 23 interactions per bunch crossing.
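The pileup reweighting described above amounts to taking the per-bin ratio of the normalized pileup-multiplicity distributions in data and simulation, and applying that ratio as an event weight. A minimal illustrative sketch in Python (the histogram inputs and function name are hypothetical; this is not CMS software):

```python
def pileup_weights(n_pu_data_hist, n_pu_mc_hist):
    """Per-bin weights that make the simulated distribution of the number
    of pileup interactions match the one observed in data.

    Inputs are unnormalized event counts per number of interactions;
    bins with no simulated events receive a weight of zero."""
    data_total = float(sum(n_pu_data_hist))
    mc_total = float(sum(n_pu_mc_hist))
    return [(d / data_total) / (m / mc_total) if m > 0 else 0.0
            for d, m in zip(n_pu_data_hist, n_pu_mc_hist)]
```

Applying the weight of an event's pileup bin to every simulated event reproduces, by construction, the pileup distribution measured from the instantaneous luminosity.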
To produce model-dependent interpretations of the results described in Section 8, we utilize production cross section and branching fraction calculations for the pseudoscalar A boson in the $M_{h,\text{EFT}}^{125}$ and hMSSM scenarios. In the $M_{h,\text{EFT}}^{125}$ scenario, Higgs boson masses, mixing parameters, and effective Yukawa couplings were calculated with a yet unpublished version of FEYNHIGGS based on version 2.14.3 [20,52-56].
Inclusive bbA production cross sections at NNLO accuracy in QCD in the five-flavor scheme are calculated with SUSHI, based on BBH@NLO [72]. The results are combined with the bbA cross section calculated at NLO in QCD in the four-flavor scheme [73,74] using the Santander matching scheme [75] for the hMSSM scenario, while matched predictions [76][77][78][79] are used for the $M_{h,\text{EFT}}^{125}$ scenario.

Event reconstruction
Both observed and simulated events are reconstructed using the particle-flow (PF) algorithm [83]. The particle-flow algorithm aims to reconstruct and identify each individual particle in an event, with an optimized combination of information from the various elements of the CMS detector. In this process the reconstructed PF objects include photons, electrons, muons, neutral hadrons, and charged hadrons.
Higher-level objects are reconstructed from combinations of the PF objects. For example, jets are reconstructed with an anti-$k_T$ clustering algorithm implemented in the FASTJET library [84,85]. The reconstruction is based on the clustering of PF objects with a distance parameter of 0.4. Charged PF objects are required to be associated with the primary vertex of the interaction. The reconstructed vertex with the largest value of summed physics-object $p_T^2$ is taken to be the primary pp interaction vertex. The physics objects are the jets, clustered using the jet finding algorithm [84,86] with the tracks assigned to the vertex as inputs, and the associated missing transverse momentum, taken as the negative vector sum of the $p_T$ of those jets. Jet energy corrections are derived from simulation studies so that the average measured response of jets becomes identical to that of particle-level jets. In situ measurements of the momentum balance in dijet, photon+jet, Z + jet, and multijet events are used to determine any residual differences between the jet energy scale in data and in simulation, and appropriate corrections are applied [87].
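The primary-vertex choice described above reduces to an argmax over the summed squared transverse momenta of the physics objects assigned to each vertex. A minimal sketch (the list-of-lists input format is a simplification for illustration):

```python
def choose_primary_vertex(vertices):
    """Return the index of the primary pp interaction vertex, chosen as
    the vertex with the largest sum of squared physics-object pT.

    'vertices' is a list of candidate vertices, each given as a list of
    the pT values (in GeV) of the physics objects assigned to it."""
    return max(range(len(vertices)),
               key=lambda i: sum(pt ** 2 for pt in vertices[i]))
```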
While neutrinos cannot be detected directly, they contribute to the missing transverse momentum. The missing transverse momentum vector p miss T is computed as the negative vector sum of the transverse momenta of all the PF objects in an event [88]. The p miss T is modified to account for corrections to the energy scale of the reconstructed jets in the event.
Electrons are identified by a multivariate analysis (MVA) discriminant that requires as input several quantities describing the track quality, the shapes of the energy deposits in the ECAL, and the compatibility of the measurements from the tracker and the ECAL [89]. Muon identification relies on the number of hits in the inner tracker and the muon systems, and on the quality of the reconstructed tracks [90]. Electrons and muons selected in this analysis are required to be consistent with originating from the primary vertex.
A lepton isolation discriminant $I^\ell$ is defined to reject nonprompt or misidentified leptons ($\ell$ = e, µ):
$$ I^\ell = \frac{\sum_{\text{charged}} p_T + \max\left(0, \sum_{\text{neutral}} p_T - \frac{1}{2}\sum_{\text{charged,PU}} p_T\right)}{p_T^\ell}, $$
where $p_T^\ell$ stands for the $p_T$ of the lepton. The variable $\sum_{\text{charged}} p_T$ is the scalar sum of the transverse momenta of the charged particles originating from the primary vertex and located in a cone of size $\Delta R = \sqrt{(\Delta\eta)^2 + (\Delta\phi)^2} = 0.3$ (0.4) centered on the electron (muon) direction, where φ is the azimuthal angle in radians. The sum $\sum_{\text{neutral}} p_T$ represents a similar quantity for neutral particles. The scalar sum of the transverse momenta of charged hadrons originating from pileup vertices in the cone, $\sum_{\text{charged,PU}} p_T$, is used to estimate the contribution of photons and neutral hadrons originating from pileup vertices. The factor of 1/2 corresponds approximately to the ratio of neutral- to charged-hadron production in the hadronization process of inelastic pp collisions. The isolation requirements based on $I^\ell$ are described in the following section.
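The isolation discriminant defined above can be sketched directly from the three per-cone sums it uses. The following Python fragment is illustrative only (the function and argument names are hypothetical, and real inputs come from the PF candidates in the cone):

```python
def lepton_isolation(pt_lepton, sum_charged, sum_neutral, sum_charged_pu):
    """Relative lepton isolation with a pileup correction:

        I = [sum_charged + max(0, sum_neutral - 0.5 * sum_charged_pu)] / pt_lepton

    where the 0.5 factor approximates the neutral- to charged-hadron
    production ratio in inelastic pp collisions. All inputs in GeV."""
    neutral_corrected = max(0.0, sum_neutral - 0.5 * sum_charged_pu)
    return (sum_charged + neutral_corrected) / pt_lepton
```

Smaller values of `I` correspond to more isolated leptons, e.g. the $I^{\mu} < 0.25$ requirement used for the Z boson muons.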
The combined secondary vertex algorithm [91] is used to identify jets that are likely to have originated from a bottom quark ("b-tagged jets"). In this algorithm, the secondary vertices associated with the jet and the track-based lifetime information are given as inputs to an MVA discriminant designed for b jet identification. Differences in the b tagging efficiency between data and simulation are taken into account by applying a set of p T -dependent correction factors to the simulated events [91]. The identification efficiency for genuine b jets in this analysis is approximately 63%, whereas the misidentification probability for c (light-flavor or gluon) jets is approximately 12 (0.9)% for jet p T > 20 GeV.
Anti-k T jets seed the hadron-plus-strips algorithm [92, 93] which is used to reconstruct τ h candidates. A hadronic decay of a tau lepton can result in one or more charged hadrons, and additional π 0 particles. These π 0 s are reconstructed by clustering electromagnetic deposits in the ECAL into "strips" in the η-φ plane. The strips are elongated in the φ direction to contain the calorimeter signatures of converted photons from neutral pion decays. The algorithm reconstructs τ h candidates based on the number of tracks and the number of strips representing the number of charged hadrons ("prongs") and the number of π 0 s present in the decay. The τ h candidates used in this analysis are reconstructed in three decay modes: 1-prong, 1-prong+π 0 s, and 3-prong.
To suppress objects (jets and leptons) misidentified as $\tau_h$ candidates, an MVA discriminant [93] including calorimetric information, isolation sums, and lifetime information is used. A misidentification rate for quark- and gluon-initiated jets of less than 1% is achieved within a $p_T$ range typical of a $\tau_h$ candidate originating from an h boson. At the same time, an efficiency for selecting $\tau_h$ candidates of ≈70% can be achieved for $\tau_h$ candidates passing the decay mode reconstruction discussed above. To further suppress electrons and muons misidentified as $\tau_h$ candidates, dedicated criteria based on the consistency between the measurements in the tracker, the calorimeters, and the muon detectors are applied [92,93]. The $\tau_h$ energy scale is measured from Z → ττ events and the correction is propagated to the simulation for each decay mode. A "tag-and-probe" measurement [93] in Z → $\ell\ell$ events, where one of the leptons is misidentified as a $\tau_h$ candidate, is used to correct the energy scale of electrons and muons misidentified as $\tau_h$ candidates in simulation.
The reconstructed mass of the A boson candidate can be used to discriminate between signal- and background-like events. Multiple reconstruction methods are considered and described below. The resulting mass distributions for the signal process ($m_A = 300$ GeV) are shown in Fig. 2. The shapes of the background distributions do not depend on the mass reconstruction method as strongly as the shape of the signal distribution. The simplest reconstructed mass, $m_{\tau\tau}^{\text{vis}}$, uses only the visible decay products and combines the reconstructed Z → $\ell\ell$ four-vector with the h → ττ four-vector based only on visible tau decay products. The resulting mass resolution for $m_{\tau\tau}^{\text{vis}}$ is approximately 15% for an A boson with a mass of 300 GeV in all final states.
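The visible mass is the invariant mass of the summed four-vectors of the two Z boson leptons and the visible tau decay products. A minimal sketch of that computation from (pt, η, φ, mass) inputs, assuming the usual collider parametrization (illustrative, not the CMS reconstruction code):

```python
import math

def four_vector_sum_mass(particles):
    """Invariant mass of a set of particles given as (pt, eta, phi, mass)
    tuples, using px = pt cos(phi), py = pt sin(phi), pz = pt sinh(eta),
    E = sqrt((pt cosh(eta))^2 + m^2). Units: GeV."""
    E = px = py = pz = 0.0
    for pt, eta, phi, m in particles:
        px += pt * math.cos(phi)
        py += pt * math.sin(phi)
        pz += pt * math.sinh(eta)
        E += math.sqrt((pt * math.cosh(eta)) ** 2 + m ** 2)
    # clamp against tiny negative values from floating-point rounding
    return math.sqrt(max(0.0, E ** 2 - px ** 2 - py ** 2 - pz ** 2))
```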
The mass resolution of the reconstructed A boson candidate can be significantly improved by accounting for the neutrinos associated with the leptonic and hadronic tau decays. We use the SVFIT algorithm [29] to estimate the mass of the Higgs boson, denoted as $m_{\tau\tau}^{\text{fit}}$. The SVFIT algorithm combines the $p_T^{\text{miss}}$ with the four-vectors of both τ candidates (electrons, muons, or $\tau_h$), resulting in an improved estimate of the four-vector of the h boson that is then used to obtain a more accurate estimate of the A boson candidate mass $m_{\tau\tau}^{\text{fit}}$. The mass resolution of $m_{\tau\tau}^{\text{fit}}$ is 10% for an A boson with a mass of 300 GeV.
To further improve the mass resolution, the measured mass of the Higgs boson (125 GeV) can be given as an input to the SVFIT algorithm. This yields a constrained estimate of the four-vector of the h boson, which results in an even more precise estimate of the A boson candidate mass, denoted as $m_{\tau\tau}^{\text{c}}$. The resulting mass resolution of $m_{\tau\tau}^{\text{c}}$ is as good as 3% at 300 GeV, which improves the expected 95% confidence level (CL) model-independent limits by approximately 40% compared to using the visible mass of the A boson candidate, $m_{\tau\tau}^{\text{vis}}$, as the discriminating variable. Thus, we use $m_{\tau\tau}^{\text{c}}$ as the discriminating variable between the signal and the background processes for the final results.

Event selection
Events are selected online using dilepton or single-lepton triggers targeting leptonic decays of the Z bosons. The trigger and offline selection requirements for the Z boson decay modes are presented in Table 1. Each lepton selected by the trigger is required to be geometrically matched to a corresponding lepton selected in the analysis. The light leptons in an event are required to be separated from each other by ∆R > 0.3, while the τ h candidates must be separated from each other and from any other lepton by ∆R > 0.5. The resulting event samples are made mutually exclusive by discarding events that have additional identified and isolated electrons or muons. Small differences in trigger selection efficiencies are observed between data and simulation, and are accounted for by applying corrections to the simulated events.
The nontriggering electrons and muons are required to have $p_T > 10$ GeV, whereas $\tau_h$ candidates are required to have $p_T > 20$ GeV. The |η| constraints from the detector geometry are $|\eta_e| < 2.5$, $|\eta_\mu| < 2.4$, and $|\eta_{\tau_h}| < 2.3$ for electrons, muons, and $\tau_h$ candidates, respectively. The |η| boundaries are the same for both triggering and nontriggering electrons and muons.

Table 1: Trigger and offline selection requirements for the different Z boson decay modes. The events are selected using either dilepton triggers with lower-$p_T$ thresholds or single-lepton triggers with higher-$p_T$ thresholds. The subscripts 1 and 2 indicate the higher- and lower-$p_T$ leptons associated with the Z boson, respectively.
The Z boson is reconstructed from a pair of opposite-charge, same-flavor light leptons that fulfills $60 < m_{\ell\ell} < 120$ GeV. In the case of multiple Z boson candidates, we choose the one with the mass closest to the Z boson mass. Loose identification and isolation selection criteria are applied to the leptons associated with the Z boson to maintain a high signal acceptance. The leptons forming the Z boson candidate are required to pass the lepton identification, which has an efficiency of 90 (>99)% for electrons (muons). The muons must pass an isolation requirement of $I^\mu < 0.25$, while a loose isolation requirement is already included in the electron identification selection.
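The Z candidate choice above can be sketched as a filter on the dilepton mass window followed by a minimization of the distance to the Z boson mass. A minimal illustrative fragment (the dictionary input format is a simplification; opposite-charge, same-flavor pairing is assumed to have been applied upstream):

```python
MZ = 91.1876  # Z boson mass in GeV (PDG value)

def best_z_candidate(lepton_pairs):
    """From opposite-charge, same-flavor dilepton candidates, keep those
    with 60 < m_ll < 120 GeV and return the one whose mass is closest
    to the Z boson mass. Returns None if no pair is in the window."""
    in_window = [p for p in lepton_pairs if 60.0 < p["mass"] < 120.0]
    if not in_window:
        return None
    return min(in_window, key=lambda p: abs(p["mass"] - MZ))
```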
The leptons associated with the h boson decay are required to have opposite charge. In the case of the $e\tau_h$, $\mu\tau_h$, and $e\mu$ decay channels, tighter selection criteria are applied to the light leptons to decrease the background contributions from Z + jets and other reducible backgrounds. The specific signal selections detailed in Table 2, including those chosen for the $\tau_h$ candidates, were optimized to obtain the best signal sensitivity. The isolation requirements are $I^{e(\mu)} < 0.15$ for electrons (muons) associated with a tau lepton decay. Electrons from tau lepton decays need to pass the electron identification, which has an efficiency of 80%. The $\tau_h$ candidates associated with the Higgs boson must satisfy the $\tau_h$ identification and isolation requirements, which have an efficiency of 70%.
The large h boson mass leads to relatively high-$p_T$ decay products compared to the lower $p_T$ of jets misidentified as leptons from the Z + jets background process. This background process is suppressed by selecting events based on the scalar $p_T$ sum of the visible decay products of the Higgs boson, $L_T^h$. In the $\ell\ell + \tau_h\tau_h$ final states, which have a larger ratio of reducible to irreducible backgrounds, events with $L_T^h > 60$ GeV are selected. The signal events contain no b jets (gg → A), or only b jets with a relatively soft $p_T$ distribution (bbA). We suppress the contributions from background processes, especially tt and ttZ, by discarding all events with one or more b-tagged jets with $p_T > 20$ GeV ("b jet veto") without significantly reducing the signal selection efficiency. The total acceptance for the gg → A (bbA) signal events with $m_A = 300$ GeV is 3.9 (3.0)%. The fraction of gg → A signal events lost due to the b jet veto is negligible, while for the bbA process approximately 17% of events are removed with this selection.
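The two cuts described in this paragraph can be expressed as a short boolean selection. A minimal sketch (function and argument names are hypothetical; thresholds are the ones quoted above):

```python
def passes_selection(h_decay_pts, jets, has_two_tauh):
    """Apply the L_T^h cut and the b jet veto described above.

    h_decay_pts: pT values (GeV) of the visible h boson decay products.
    jets: (pt, is_btagged) tuples for the reconstructed jets.
    has_two_tauh: True for the ll + tauh tauh final states."""
    # L_T^h > 60 GeV required only in the ll + tauh tauh final states
    if has_two_tauh and sum(h_decay_pts) <= 60.0:
        return False
    # b jet veto: no b-tagged jet with pT > 20 GeV
    if any(btag and pt > 20.0 for pt, btag in jets):
        return False
    return True
```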
The sensitivity of the analysis is improved by reducing the number of background events using additional information regarding the Higgs boson candidate. The constrained Higgs boson candidate four-vector, as estimated with the SVFIT algorithm, is used to reconstruct the A boson mass, as described in Section 4. By removing the mass constraint from the SVFIT algorithm, the most likely mass of the Higgs boson candidate, $m_{\tau\tau}^{\text{fit}}$, provides significant discrimination between the reducible backgrounds, which have a broad distribution because of their nonresonant nature, and the signal processes, which have a resonance at 125 GeV. Moreover, the dominant irreducible background from ZZ → 4$\ell$ (qq → ZZ and gg → ZZ) is suppressed, because for this background the $m_{\tau\tau}^{\text{fit}}$ distribution is concentrated near the Z boson mass, in contrast to the signal. The signal sensitivity is increased by an additional 20% by requiring $m_{\tau\tau}^{\text{fit}}$ to be within 90-180 GeV.

Background estimation
The irreducible backgrounds (ZZ → 4$\ell$, ttZ, WWZ, WZZ, ZZZ) and the production of the 125 GeV Higgs boson via the processes predicted by the SM are estimated from simulation. They are scaled by their theoretical cross sections calculated at the highest order available, and the processes producing the 125 GeV Higgs boson are also scaled by their most accurate branching fractions [42].
The reducible backgrounds, which have at least one jet misidentified as an electron, muon, or τ h candidate, are estimated from data. In this analysis the dominant reducible contributions come from the tt, Z + jets, and WZ + jets processes which produce jets misidentified as τ candidates. The estimation of the reducible background contribution is performed with a so-called "fake rate method" which is based on measuring the misidentification rates, i.e., probabilities to misidentify a jet as a lepton. Events with τ candidates failing the signal region identification and isolation criteria are used along with the misidentification rates to estimate the contribution from the reducible background in the signal region.
In total, three different event samples are used to estimate the contribution from the reducible background processes. First, the misidentification rates are estimated in event samples independent of the signal region, called the "measurement region". To assess the extent to which the measured misidentification rates describe the jets misidentified as leptons in the signal region, closure tests comparing the observed and the estimated reducible background yields are performed in yet another region ("validation region"). The validation region is required to be independent of the signal and measurement regions. The closure tests are used to derive systematic uncertainties to account for possible differences between the true and the estimated reducible background yields in the signal region. Finally, the misidentification rates are applied in an "application region", formed by events that fail the identification and isolation criteria required in the signal region.
In this analysis we use a sample of Z + jet events to estimate the misidentification rates. The estimation relies on reconstructing an opposite-charge, same-flavor lepton pair compatible with a Z boson, and requiring one additional loosely defined lepton (electron, muon, or $\tau_h$ candidate). The requirements on the leptons associated with the Z boson are the same as defined in Section 5, but they must fulfill a more stringent dilepton mass requirement, $81.2 < m_{\ell\ell} < 120$ GeV. After reconstructing the Z → $\ell\ell$ decay, the jet-to-lepton misidentification rate is estimated by applying the lepton identification algorithm to the additional loosely defined lepton in the event. The misidentification rates are measured in different bins of lepton $p_T$, and are further split between reconstructed decay modes for the $\tau_h$ candidates, and for muons and electrons in bins of lepton η, based on the barrel and endcap regions. The events where the τ candidates arise from genuine tau leptons, electrons, or muons and not jets, primarily from the WZ process, are estimated from simulation and subtracted from data so that the misidentification rates are measured for genuine hadronic jets only. The obtained misidentification rates for electrons (muons) are <5 (10)% in the barrel and endcap regions for lepton $p_T > 10$ GeV, whereas for $\tau_h$ candidates the misidentification rates vary between 15 and 30% for $\tau_h$ candidate $p_T > 20$ GeV, depending on the decay mode.
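The misidentification rate in each bin is simply the fraction of loosely defined probes that also pass the tight identification. A minimal sketch in Python (the probe tuples and binning are hypothetical; the simulation-based subtraction of genuine leptons is omitted for brevity):

```python
import bisect

def misid_rates(probes, pt_bin_edges):
    """Jet-to-lepton misidentification rate per pT bin: the fraction of
    loose probe objects that also pass the tight identification.

    probes: (pt, passes_tight) tuples for the loosely defined leptons.
    pt_bin_edges: ascending lower bin edges; probes below the first
    edge are discarded."""
    n_pass = [0] * len(pt_bin_edges)
    n_all = [0] * len(pt_bin_edges)
    for pt, passes_tight in probes:
        i = bisect.bisect_right(pt_bin_edges, pt) - 1
        if i < 0:
            continue  # below the lowest pT threshold
        n_all[i] += 1
        n_pass[i] += int(passes_tight)
    return [p / a if a else 0.0 for p, a in zip(n_pass, n_all)]
```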
The measured misidentification rates are validated in another region that consists of events with a Z boson candidate and two additional loosely defined leptons. To ensure that the validation region is not contaminated with signal events or irreducible background contributions, the two additional leptons are required to have the same charge. Modest differences between the observed and predicted reducible background yields are observed. These differences are accounted for by assigning a systematic uncertainty of 40% in the yield, which is conservative enough to cover the observed nonclosure. This uncertainty is uncorrelated between the Higgs boson decay channels, resulting in four uncertainties tied to the $\ell\ell + e\tau_h$, $\ell\ell + \mu\tau_h$, $\ell\ell + \tau_h\tau_h$, and $\ell\ell + e\mu$ channels. Further studies confirmed that the final results of this analysis are not sensitive to the exact magnitude of this systematic uncertainty.
To estimate the reducible background contribution in the signal region, we apply a weight on data events where either one or both of the τ candidates associated to the Higgs boson fail the identification and isolation criteria. These data events form the application region.
Events with exactly one object failing the identification and isolation criteria receive a weight $f/(1-f)$, where $f$ is the misidentification rate for the particular lepton type. As such, this weight includes the contribution from the WZ + jets process, where we expect one genuine lepton and one jet misidentified as a lepton in addition to the Z boson candidate. The tt and Z + jets processes are also accounted for by this weight, since either of the two jets can pass the identification and isolation criteria even if neither of them is a genuine lepton. As a result, the weight introduces double counting of events from the tt and Z + jets processes.
To remove the double-counted events from the tt and Z + jets processes, we assign a weight with a negative sign, $-f_1 f_2/[(1-f_1)(1-f_2)]$, to events with both objects failing the identification and isolation criteria, where $f_1$ and $f_2$ are the misidentification rates of the two objects. This subtraction, however, increases the statistical uncertainties in the estimated yield of the reducible background.
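The weighting scheme of the two preceding paragraphs can be condensed into one function: a positive weight for events with exactly one failing object and a negative weight for events with two, so that the double counting cancels on average. A minimal sketch (function and argument names are hypothetical):

```python
def reducible_weight(fail1, fail2, f1, f2):
    """Event weight in the application region of the fake rate method.

    fail1, fail2: whether each tau candidate fails the tight
    identification and isolation criteria.
    f1, f2: the jet-to-lepton misidentification rates for the two
    candidates (each strictly less than 1)."""
    if fail1 and fail2:
        # negative weight removes the double-counted tt / Z+jets events
        return -(f1 / (1.0 - f1)) * (f2 / (1.0 - f2))
    if fail1:
        return f1 / (1.0 - f1)
    if fail2:
        return f2 / (1.0 - f2)
    return 0.0  # both pass: the event belongs to the signal region
```

Summing these weights over the application-region data yields the estimated reducible background yield in the signal region.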
The statistical uncertainties can be controlled by taking the shape of the $m_{\tau\tau}^{\text{c}}$ distribution of the reducible background contribution from data in another region with negligible signal and irreducible background contributions. This region is defined similarly to the signal region but with same-sign τ candidates passing relaxed identification and isolation criteria, yielding a higher number of events available for the shape estimation. This results in a smoother shape of the $m_{\tau\tau}^{\text{c}}$ distribution, which is normalized to the estimated yield of the reducible background contribution in the signal region.
An alternative approach to estimating the reducible background contribution was studied to reduce the statistical uncertainties and to cross-check the results obtained with the nominal method. Instead of using the same-sign data events for the shape of the $m_{\tau\tau}^{\text{c}}$ distribution, the statistical uncertainties can be reduced considerably by assigning a nonzero weight only to events with both candidates failing the selection criteria, i.e., by estimating only the contribution from the tt and Z + jets processes with the misidentification rate method. The contribution from events with a single object failing the identification and isolation criteria is predicted from simulation, removing the double counting present in the nominal method. As a result, this alternative approach uses a weight with a positive sign, $+f_1 f_2/[(1-f_1)(1-f_2)]$. Since the statistical uncertainties are smaller, the shape of the $m_{\tau\tau}^{\text{c}}$ distribution is taken from the same events that provide the estimated yield of the reducible background. The cross-check shows that the two methods yield consistent expected 95% CL model-independent limits.
To cross-check the measured misidentification rates, an additional measurement is performed using a sample of Z + 2 jets events. In this cross-check, the measurement region partially overlaps with the aforementioned validation region, as in both cases the two lepton candidates are required to have the same charge. The amount of overlap between the measurement and validation regions depends on the lepton type and the decay channel of the Higgs boson. The rates are measured in bins of lepton p T , and are separated by the reconstructed decay mode of the τ h candidates. Unlike in the nominal measurement, the misidentification rates for electrons and muons are not split into bins of lepton η. The measured misidentification rates result in a reducible background yield and shape that are compatible with the reducible background estimation obtained with the nominal misidentification rate measurement used in this analysis.
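The sign conventions of the nominal method described above can be sketched with the standard misidentification-rate ("fake factor") arithmetic. This is a minimal illustration, not the CMS implementation; the rate values and function names are invented for the example.

```python
# Illustrative sketch of the misidentification-rate weighting described above.
# Hypothetical rates f1, f2 are the probabilities for a misidentified object
# to pass the identification and isolation criteria.

def fake_factor(f):
    """Transfer factor extrapolating from the fail region to the pass region."""
    return f / (1.0 - f)

def reducible_weight(fail1, fail2, f1, f2):
    """Per-event weight for the reducible-background estimate.

    Events with exactly one candidate failing enter with a positive weight;
    events with both candidates failing enter with a negative weight, removing
    the double-counted tt and Z + jets contribution.
    """
    if fail1 and fail2:
        return -fake_factor(f1) * fake_factor(f2)  # negative-sign weight
    if fail1:
        return fake_factor(f1)
    if fail2:
        return fake_factor(f2)
    return 0.0  # both candidates pass: signal-region event, not part of the estimate

# Example with assumed misidentification rates of 20% and 10%:
w_single = reducible_weight(True, False, 0.2, 0.1)  # 0.25
w_double = reducible_weight(True, True, 0.2, 0.1)   # negative, ~ -0.028
```

The alternative method discussed above corresponds to keeping only the double-fail term, with a positive sign, and taking the single-fail contribution from simulation instead.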

Systematic uncertainties
All systematic uncertainties considered in the analysis are summarized in Table 3. Different sources of uncertainty are treated as uncorrelated with each other, and each source is assumed to be correlated between different processes and final states, unless otherwise mentioned below.
The overall uncertainty in the τ h identification and isolation efficiency for genuine τ h leptons is 5% [93], measured with a tag-and-probe method in Z → ττ events. An uncertainty of 1.2% in the visible energy scale of genuine τ h candidates affects both the distributions and the yields of the signals and backgrounds. It is uncorrelated across the 1-prong, 1-prong+π 0 s, and 3-prong decay modes.
The uncertainties in the electron and muon identification and isolation efficiencies lead to a normalization uncertainty of 2% for either electrons or muons. The uncertainty in the trigger efficiency results in a normalization uncertainty of 2% for both electron and muon triggers. In all channels, the effect of the uncertainty in the electron and muon energy scales is negligible.
The normalization uncertainty related to vetoing events with a b-tagged jet is 4.5% for the background processes with heavy-flavor jets (from charm or bottom quarks), i.e., tt, ttZ, and ttW. All other processes, including the signal process, are dominated by light-flavor or gluon jets and their normalization uncertainty is 0.15%.
The normalization uncertainties related to the choice of PDFs, and the renormalization and factorization (RF) scales, affecting the acceptance of the dominant background processes, are estimated from simulation separately for each process. The uncertainty from the RF scales is determined by varying one scale at a time by factors of 0.5 and 2.0, and calculating the change in process acceptance. Combining the RF scale uncertainties with the PDF set uncertainty [94] for the qq → ZZ process leads to an uncertainty of 4.8%. The inclusive uncertainty for Zh production related to the PDFs amounts to 1.6%, whereas the uncertainty for the variation of the RF scales is 3.8% [42]. For the subleading h boson processes Wh, gg → h → ZZ, and tth the inclusive uncertainties related to the PDFs amount to 1.9, 3.2, and 3.6% and the uncertainties for the variation of the RF scales are 0.7, 3.9, and 7.5%, respectively [42].
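The scale-variation procedure described above (varying each scale by factors of 0.5 and 2.0 and taking the resulting change in acceptance) can be sketched numerically. The acceptance values below are placeholders for illustration, not numbers from the paper.

```python
# Sketch of the renormalization/factorization (RF) scale uncertainty estimate:
# each scale is varied by factors of 0.5 and 2.0, one at a time, and the
# largest relative change in the process acceptance is taken as the uncertainty.

def rf_scale_uncertainty(nominal_acc, varied_accs):
    """Return the maximum relative acceptance change over the scale variations."""
    return max(abs(a - nominal_acc) / nominal_acc for a in varied_accs)

nominal = 0.250                             # hypothetical nominal acceptance
variations = [0.243, 0.259, 0.246, 0.256]   # muR*0.5, muR*2, muF*0.5, muF*2 (assumed)
unc = rf_scale_uncertainty(nominal, variations)  # 0.036, i.e. 3.6% relative
```

In practice such an envelope would be combined (e.g. in quadrature) with the PDF set uncertainty, as is done for the qq → ZZ process above.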
For the gg → ZZ process, there is a 10% uncertainty in the NNLO cross section estimate used in the analysis, which covers the PDF and RF scale uncertainties as well as the uncertainty in the strong coupling constant. An additional 10% uncertainty is included to account for the assumptions used to estimate the NNLO cross section [95]. The uncertainties in the cross sections of the rare ttZ, ttW, and triboson processes amount to 25% [96].
The last theory uncertainty applied in this analysis is that in the calculation of the SM h → ττ branching fraction. This uncertainty of 2% [42] is applied to the gg → A and bbA signal samples as well as to all backgrounds that include the h → ττ process.
Normalization uncertainties in the misidentification rates, arising from the subtraction of the prompt lepton contribution estimated from simulation, are taken into account and propagated to the yield of the reducible background mass distributions. The shape of the m^c_ττ distribution of the reducible background is estimated from data in a region where the τ candidates have the same charge and pass relaxed isolation conditions; the statistical uncertainties in the misidentification rates therefore do not affect the shape of the m^c_ττ distribution. As discussed in Section 6, an additional uncertainty is applied based on the results of the closure tests comparing the observed and estimated reducible background yields. This uncertainty in the yield is taken to be 40%, and is considered uncorrelated across the ℓℓ + eτ h , ℓℓ + µτ h , ℓℓ + τ h τ h , and ℓℓ + eµ channels.

Results
We use the reconstructed pseudoscalar Higgs boson mass, m^c_ττ, as the discriminating variable between the signal and the background processes. The results are based on a simultaneous binned likelihood fit of the reconstructed mass distributions in the eight final states; each final state is fit as a separate distribution, and the distributions are combined only for visualization purposes. Nuisance parameters, representing the systematic uncertainties, are profiled in the fit. Although the studied signal mass range is 220-400 GeV, the distribution of the reconstructed mass m^c_ττ covers the mass range 200-600 GeV, as the additional information on the background distributions is used to constrain the corresponding parameters in the simultaneous fit. When displaying the results, background processes are grouped as follows: "h(125 GeV)" includes all processes with the SM Higgs boson (including gg → h → ZZ → 4ℓ); "ZZ → 4ℓ" includes events from the qq → ZZ and gg → ZZ processes; "Other" includes events from triboson, ttZ, and ttW production; and "Reducible" includes the reducible background contribution.
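The structure of such a fit (Poisson terms per mass bin, a signal-strength parameter, and profiled nuisance parameters) can be illustrated with a deliberately simplified toy. All yields are invented, there is a single Gaussian-constrained background-normalization nuisance, and the continuous minimization of the real fit is replaced by a coarse grid scan; this is a sketch of the method, not the CMS statistical machinery.

```python
# Toy binned maximum-likelihood fit: Poisson term per mass bin, signal
# strength mu, and one Gaussian-constrained nuisance theta scaling the
# background normalization. All numbers are invented for illustration.
import math

obs = [12, 9, 4, 2]           # observed events per mass bin (hypothetical)
bkg = [10.0, 8.0, 5.0, 2.0]   # expected background per bin
sig = [0.5, 2.0, 3.0, 1.0]    # expected signal per bin at mu = 1

def nll(mu, theta, sigma_b=0.10):
    """Negative log-likelihood: Poisson per bin plus a constraint on theta."""
    total = 0.5 * (theta / sigma_b) ** 2      # Gaussian nuisance constraint
    for n, b, s in zip(obs, bkg, sig):
        lam = b * (1.0 + theta) + mu * s      # expected yield in this bin
        total += lam - n * math.log(lam)      # Poisson term (constants dropped)
    return total

def profiled_nll(mu):
    """Profile the nuisance on a coarse grid (stand-in for a true minimizer)."""
    return min(nll(mu, t / 100.0) for t in range(-30, 31))

# Best-fit signal strength from a grid scan over mu in [0, 2]:
best_mu = min((m / 20.0 for m in range(0, 41)), key=profiled_nll)
```

In the real analysis the same likelihood structure spans all eight final states simultaneously, with many nuisance parameters shared between them according to the correlation scheme of Table 3.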
The m^c_ττ distributions are shown in Fig. 3 for each of the four h boson decay channels, with the Z → ee and Z → µµ channels added together, and in Fig. 4 for all eight final states together. The distributions are shown after a background-only fit to data and include both statistical and systematic uncertainties. No excess above the SM background expectations is observed in data. The predicted signal and background yields, as well as the numbers of observed events, are given in Table 4 for each of the four Zh channels.
Upper limits at 95% CL [99,100] are set in multiple scenarios. An asymptotic approximation of the modified frequentist CL s method [99][100][101][102] is used when calculating the 95% CL upper limits. For the model-dependent limits, the signal yield is scaled as follows: the estimated gg → A yield at each m A -tan β point is augmented by a contribution associated to bbA, according to the selection efficiency ratio in the signal region, ε_bbA/ε_gg→A = 0.76, and the ratio σ_bbA/σ_gg→A, which depends on m A and tan β. The signal region selection efficiency ratio was estimated for a single mass point (m A = 300 GeV), and additional studies confirmed that it is nearly flat over the studied mass range (220-400 GeV). This scaling includes the estimated bbA contribution, which is necessary when setting model-dependent limits in the region of parameter space where the bbA cross section becomes nonnegligible compared to the gg → A cross section. For reference, at m A = 300 GeV and tan β = 4 in the hMSSM scenario, σ_bbA/σ_gg→A = 0.22, a nonnegligible contribution.
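The yield scaling described above reduces to a one-line formula; the sketch below uses the efficiency ratio of 0.76 and the hMSSM example point quoted in the text, while the function name is introduced only for this illustration.

```python
# Sketch of the signal-yield scaling used for the model-dependent limits:
# the gg -> A yield is augmented by the bbA contribution via the selection
# efficiency ratio and the model-dependent cross section ratio.
EPS_RATIO = 0.76  # eps_bbA / eps_ggA in the signal region (m_A = 300 GeV)

def scaled_signal_yield(n_ggA, xsec_ratio):
    """Total A -> Zh signal yield including the bbA production mode."""
    return n_ggA * (1.0 + EPS_RATIO * xsec_ratio)

# hMSSM example quoted above: m_A = 300 GeV, tan(beta) = 4,
# sigma_bbA / sigma_ggA = 0.22, so bbA adds roughly 17% to the yield.
factor = scaled_signal_yield(1.0, 0.22)  # ~1.167
```

At large tan β, where σ_bbA/σ_gg→A grows, this correction becomes correspondingly more important.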
The results in the M^125_h,EFT and hMSSM scenarios are shown in Fig. 6. The observed limits exclude slightly higher tan β values in the M^125_h,EFT scenario than in the hMSSM scenario: for example, at m A = 300 GeV, tan β values below 4.0 and 3.7 are excluded at 95% CL in the M^125_h,EFT and hMSSM scenarios, respectively. In the hMSSM scenario, this search constrains the parameter space region with low tan β values for 220 < m A < 350 GeV, and supports the results of previous indirect and direct searches.

Table 4: Background and signal expectations, together with the numbers of observed events, for the signal region distributions after a background-only fit. The expected contribution from the A → Zh signal process is given for a pseudoscalar Higgs boson with m A = 300 GeV and a product of the cross section and branching fraction of 20 fb. The background uncertainty accounts for all sources of background uncertainty, systematic as well as statistical, after the simultaneous fit.

Acknowledgments
We congratulate our colleagues in the CERN accelerator departments for the excellent performance of the LHC and thank the technical and administrative staffs at CERN and at other CMS institutes for their contributions to the success of the CMS effort. In addition, we gratefully acknowledge the computing centers and personnel of the Worldwide LHC Computing Grid for delivering so effectively the computing infrastructure essential to our analyses. Finally, we acknowledge the enduring support for the construction and operation of the LHC and the CMS detector provided by the following funding agencies: BMBWF and FWF