Search for a light pseudoscalar Higgs boson in the boosted $\mu\mu\tau\tau$ final state in proton-proton collisions at $\sqrt{s}=$ 13 TeV

A search for a light pseudoscalar Higgs boson (a) decaying from the 125 GeV (or a heavier) scalar Higgs boson (H) is performed using the 2016 LHC proton-proton collision data at $\sqrt{s}=$ 13 TeV, corresponding to an integrated luminosity of 35.9 fb$^{-1}$, collected by the CMS experiment. The analysis considers gluon fusion and vector boson fusion production of the H, followed by the decay H $\to$ aa $\to$ $\mu\mu\tau\tau$, and considers pseudoscalar masses in the range 3.6 $\lt m_\mathrm{a}\lt$ 21 GeV. Because of the large mass difference between the H and the a bosons and the small masses of the a boson decay products, both the $\mu\mu$ and the $\tau\tau$ pairs have high Lorentz boost and are collimated. The $\tau\tau$ reconstruction efficiency is increased by modifying the standard technique for hadronic $\tau$ lepton decay reconstruction to account for a nearby muon. No significant signal is observed. Model-independent limits are set at 95% confidence level, as a function of $m_\mathrm{a}$, on the branching fraction ($\mathcal{B}$) for H $\to$ aa $\to$ $\mu\mu\tau\tau$, down to 1.5 (2.0)$\times10^{-4}$ for $m_\mathrm{H}=$ 125 (300) GeV. Model-dependent limits on $\mathcal{B}$(H $\to$ aa) are set within the context of two Higgs doublets plus singlet models, with the most stringent results obtained for Type-III models. These results extend current LHC searches for heavier a bosons that decay to resolved lepton pairs and provide the first such bounds for an H boson with a mass above 125 GeV.

The analysis presented in this paper considers µµττ final states arising from H → aa → µµττ, where SM-like production of the H boson via the dominant gluon fusion (ggF) and vector boson fusion (VBF) modes are both included [46]. This analysis focuses on the pseudoscalar boson mass range 3.6-21 GeV, complementary to searches, such as Ref. [40], that focus on heavier pseudoscalar masses. For light masses, the large Lorentz boost of the a boson causes its decay products to overlap. In the µµ channel, the standard CMS muon identification has sensitivity to the topology of boosted muon pairs similar to that for an isolated, nonboosted muon pair. To reconstruct the collimated τ lepton pair, we have developed a boosted τ lepton pair reconstruction technique to target the specific decay where one τ lepton decays to a muon and neutrinos, τ µ , while the other decays to one or more hadrons and a neutrino, τ h , thus: a → τ µ τ h . This technique improves upon the standard CMS τ lepton reconstruction that is optimized for isolated, nonboosted τ leptons. The µµτ µ τ h channel has greater detection efficiency than final states with b quarks, which are difficult to reconstruct at low momentum and significant boost, and has a larger branching fraction than most models with four-muon final states. The effectiveness of this improved technique also makes possible for the first time the search for the decays of a heavier Higgs boson to aa in the µµττ final state at low m a , with m H = 300 GeV used as a demonstration. Such an H boson generically has a large branching fraction to any kinematically accessible pair of lighter bosons [28,47]; the light bosons are highly boosted and the resulting final-state leptons are similarly collimated. The search is performed using an unbinned parameterized maximum likelihood fit of signal and background contributions to the two-dimensional (2D) distribution of the µµ invariant mass m(µµ) and the 4-body visible mass m(µµτ µ τ h ). This paper is organized as follows. A brief description of the CMS detector is given in Section 2. Section 3 summarizes the data and simulated samples used. Section 4 describes the object identification algorithms, including the modified τ µ τ h reconstruction technique, while Section 5 focusses on the event selection. The background and signal models of the 2D unbinned fit are described in Section 6 and the treatment of systematic uncertainties are subsequently discussed in Section 7. The model-independent results, as well as interpretation in the context of several 2HDM+S types, are presented in Section 8. The paper is summarized in Section 9.

The CMS detector
The central feature of the CMS apparatus is a superconducting solenoid of 6 m internal diameter, providing a magnetic field of 3.8 T. Within the solenoid volume are a silicon pixel and strip tracker, a lead tungstate crystal electromagnetic calorimeter (ECAL), and a brass and scintillator hadron calorimeter, each composed of a barrel and two endcap sections. Forward calorimeters extend the pseudorapidity (η) coverage provided by the barrel and endcap detectors. Muons are measured in gas-ionization detectors embedded in the steel flux-return yoke outside the solenoid. Events of interest are selected using a two-tiered trigger system [48]. The first level (L1), composed of custom hardware processors, uses information from the calorimeters and muon detectors to select events at a rate of around 100 kHz within a time interval of less than 4 µs. The second level, known as the high-level trigger (HLT), consists of a farm of processors running a version of the full event reconstruction software optimized for fast processing, and reduces the event rate to around 1 kHz before data storage. A more detailed description of the CMS detector, together with a definition of the coordinate system used and the relevant kinematic variables, can be found in Ref. [49].

Data and simulated samples
This search uses a sample of pp collisions at the LHC, collected in 2016 at √ s = 13 TeV, corresponding to an integrated luminosity of 35.9 fb −1 .
The acceptance and reconstruction efficiency of the H → aa → µµττ processes are evaluated using simulated events. These signal processes are generated with MADGRAPH5 aMC@NLO version 2.2.2 [50] at next-to-leading order (NLO). The PYTHIA 8.205 program [51] is used for parton showering, hadronization, and the underlying event is simulated with the CUETP8M1 tune [52]. The NNPDF3.0 [53] set of parton distribution functions is used. Samples are generated for 3.6 < m a < 21 GeV for the SM-like H boson with m H = 125 GeV, and for 5 < m a < 21 GeV for a heavier H boson with m H = 300 GeV. The ggF Higgs production process is simulated for each sample with the obtained signal yields scaled to the sum of the ggF and VBF cross sections. The inclusion of the VBF process increases the expected signal yield by 8 (19)% for m H = 125 (300) GeV. An acceptance correction arising from a small difference in the analysis acceptance for ggF and VBF events of 0.5-3.0% is applied to these scaled acceptances as a function of Higgs and pseudoscalar boson masses, with an uncertainty of 0.5%. This correction primarily arises from the differences in transverse momentum p T spectrum of the generated H and a bosons. These differences have a negligible effect on the shapes of the reconstructed pseudoscalar mass distributions that are used to discriminate signal from background.
For all processes, the detector response is simulated using a detailed description of the CMS detector, based on the GEANT4 package [54], and the event reconstruction is performed with the same algorithms used for data. The simulated samples include additional interactions per bunch crossing (pileup) and are weighted so that the multiplicity distribution matches the measured one, with an average of about 23 interactions per bunch crossing.

Event reconstruction
Using the information from all CMS subdetectors, a particle-flow (PF) technique is employed to identify and reconstruct the individual particles emerging from each collision [55]. The particles are classified into mutually exclusive categories: charged and neutral hadrons, photons, muons, and electrons. Jets and τ h candidates are identified algorithmically using the PFreconstructed particles as inputs. The missing transverse momentum vector p miss T is defined as the projection onto the plane perpendicular to the beam axis of the negative vector sum of the momenta of all reconstructed PF objects in an event. Its magnitude is referred to as p miss T . The primary pp interaction vertex is defined as the reconstructed vertex with the largest value of summed physics-object p 2 T . The physics objects considered in the vertex determination are the objects returned by a jet finding algorithm [56,57] applied to all charged tracks associated with the vertex, plus the corresponding associated p miss T . Finally, additional identification criteria are applied to the reconstructed muons, electrons, photons, τ h candidates, jets, and p miss T to reduce the frequency of misidentified objects. This section details the reconstruction and identification of muons, jets, and τ h candidates.

Muons
Muons are reconstructed within |η(µ)| < 2.4 [58]. The reconstruction combines the information from both the tracker and the muon spectrometer. The muons are selected from among the reconstructed muon track candidates by applying minimal requirements on the track components in the muon system and taking into account matching with small energy deposits in the calorimeters. For each muon track, the distance of closest approach to the primary vertex in the transverse plane is required to be less than 0.2 cm. The distance of closest approach to the primary vertex along the beamline, d z , must be less than 0.5 cm.
The isolation of individual muons is defined relative to their transverse momentum p T (µ) by summing over the p T of charged hadrons and neutral particles within a cone around the muon direction at the interaction vertex with radius ∆R = √ (∆η) 2 + (∆φ) 2 < 0.4 (where φ is the azimuthal angle in radians) : (1) Here, ∑ p charged T is the scalar p T sum of charged hadrons originating from the primary vertex. The ∑ p neutral T and ∑ p γ T are the scalar p T sums for neutral hadrons and photons, respectively. The neutral contribution to the isolation from pileup interactions, p PU T , is estimated as 0.5 ∑ i p PU,i T , where i runs over the charged hadrons originating from pileup vertices and the factor 0.5 corrects for the ratio of charged to neutral particle contributions in the isolation cone. Muons are considered isolated if I µ < 0.25.

Jets
Jets are reconstructed using PF objects. The anti-k T jet clustering algorithm [56,57] with a distance parameter of 0.4 is used. The standard method for jet energy corrections [59] is applied. In order to reject jets coming from pileup collisions, a multivariate (MVA) jet identification algorithm [60] is applied. This algorithm takes advantage of differences in the shapes of energy deposits in a jet cone between pileup jets and jets originating from a quark or gluon. The combined secondary vertices (CSV) b tagging algorithm [61] is used to identify jets originating from b hadrons [62]. The efficiency for tagging b jets is ≈63%, while the misidentification probability for charm (light-quark or gluon) jets is ≈12 (1)%.

Hadronic τ lepton decays
Hadronically decaying τ leptons are reconstructed and identified within |η(τ h )| < 2.3 using the hadron-plus-strips (HPS) algorithm [63], which targets the main decay modes by selecting PF objects with one charged hadron and up to two neutral pions, or with three charged hadrons. The HPS algorithm is seeded by the jets described in Section 4.2. The τ h candidates are reconstructed based on the number of tracks and on the number of ECAL strips with an energy deposit in the η-φ plane.
This analysis uses a specialized τ µ τ h reconstruction algorithm, which uses the same HPS method as the above, with a modified jet seed. This method is designed to reconstruct boosted τ µ τ h objects, for which the τ lepton decaying leptonically to a muon overlaps with the hadronic decay products of the other τ lepton. One τ lepton is required to decay to a muon because this mode has a high reconstruction efficiency and a low misidentification probability. As in Ref.
[39], a joint reconstruction of the τ h candidate and a nearby muon is performed. Jets that seed the τ h reconstruction are first modified to remove muons with p T > 3 GeV passing minimal identification requirements from their jet constituents. The τ h candidates reconstructed using these modified jets are required to have p T > 10 GeV, where the reconstructed p T (τ h ) corresponds to the visible portion of the τ lepton decay. To reject τ h candidates that arise from constituents not originating from the primary vertex, the τ h candidates must have d z < 0.5 cm. To reduce background contribution from jets arising from b quarks, the jet seeds to the τ h reconstruction must additionally fail the CSV jet tagging algorithm. The τ h reconstruction algorithm has some efficiency to select τ leptons that decay to electrons, τ e . The fraction of reconstructed τ h candidates that are τ e decays is estimated from simulation to be 18-22%, predominantly reconstructed in the one-prong decay mode with no additional neutral hadrons. No distinction is made between τ e and τ h candidates and this paper refers to the contribution of both decay categories as τ h candidates.
The full τ µ τ h identification procedure includes the modified HPS algorithm described above, along with a requirement on the τ h candidate isolation. The isolation of a τ h candidate is computed using an MVA discriminant [63]. The discriminant is computed using PF candidates, with the overlapping muon excluded, in the region around the τ h candidate defined by ∆R < 0.8. The τ h candidates are required to pass a selection on the MVA discriminant output as a function of p T (τ h ) to yield an approximately constant efficiency of ≈80%.

Charged lepton efficiency
The combined efficiencies of the reconstruction, identification, and isolation requirements for muons are measured in several bins of p T (µ) and |η(µ)| using a "tag-and-probe" technique [64] applied to an inclusive sample of muon pairs from Z boson and J/ψ meson events [58]. These efficiencies are measured in data and simulation. The data to simulation efficiency ratios are used as scale factors to correct the simulated event yields. For τ h candidates, two scale factors are similarly measured using a Z → τ µ τ h sample [63] to be 0.60 ± 0.11 (0.97 ± 0.05) for 10 < p T (τ h ) < 20 GeV (p T (τ h ) > 20 GeV), which are found to be independent of |η(τ h )|. For 10 < p T (τ h ) < 20 GeV, the Z → τ µ τ h data sample contains significant W+jets background, making the scale factor difficult to estimate with as high a precision as for p T (τ h ) > 20 GeV.

Event selection
Collision events are selected by a trigger that requires the presence of an isolated muon with p T > 24 GeV [48]. Trigger efficiencies are measured in data and simulation using the tagand-probe technique. The event is required to have two isolated opposite-sign muons with ∆R < 1. The leading muon which is matched to the muon that triggered the event must have p T > 26 GeV. The second muon must have p T > 3 GeV. These muons constitute a µµ pair from one of the pseudoscalar candidates.
The second pseudoscalar is selected via its decay to an isolated τ µ τ h pair. The τ µ τ h selection requires one identified muon with p T > 3 GeV, with no isolation selection imposed, and one τ h candidate with p T > 10 GeV, reconstructed as described in Section 4.3. The reconstructed muon corresponds to the visible portion of the τ µ decay. The two τ lepton candidates are required to lie within ∆R(τ µ , τ h ) < 0.8.
The modified τ µ τ h reconstruction and identification algorithm increases the signal efficiency throughout the full range of Higgs boson and pseudoscalar hypotheses considered, as shown in Fig. 1. The efficiency of the τ µ τ h reconstruction and identification is measured by requiring the presence of a muon passing the identification requirements and a τ h candidate passing either the standard τ h HPS reconstruction or the τ µ τ h HPS reconstruction, as well as the MVA isolation discriminant. The increase in efficiency arises incrementally both from the modification The events are required to have two reconstructed muons passing identification and isolation criteria. The efficiency is measured by additionally requiring a third muon passing identification requirements and a τ h candidate reconstructed using either the standard HPS algorithm or the τ µ τ h HPS algorithm and passing isolation requirements.
of the jets which seed the τ µ τ h reconstruction and the exclusion of the muon energy from the MVA isolation discriminant. Because of the increase in Lorentz boost, the jet seed modification is the primary cause of increased efficiency at low m a where the pseudoscalar decay products are most overlapping, with ∆R(τ µ , τ h ) < 0.4. At larger separation, 0.4 < ∆R(τ µ , τ h ) < 0.8, the change in the MVA discriminant becomes the only source of efficiency increase. The reduced efficiency at low pseudoscalar mass is due to the high Lorentz boost in which the muon is nearly collinear with a charged hadron from the τ h candidate. At low Lorentz boost, the muon and τ h candidate have a large separation. In this case, the efficiency is reduced from the requirement of the boosted topology, especially at m H = 125 GeV. The efficiency for the higher H boson mass is less affected by an increase in pseudoscalar mass because the reduction in Lorentz boost is generally not significant enough to separate the τ leptons from a pseudoscalar decay beyond the selection requirement of ∆R(τ µ , τ h ) < 0.8.

Signal and background modeling
The main source of background in this search is Drell-Yan µµ production in association with at least one jet that is misidentified as the τ µ τ h candidate. This background, reduced by the τ µ τ h reconstruction, features the prominent µµ resonances with masses between 3.6 and 21 GeV: ψ(2S) (3.69 GeV), Υ(1S) (9.46 GeV), Υ(2S) (10.0 GeV), and Υ(3S) (10.4 GeV) [65]. In the m(µµ) distribution, the known resonance peaks appear on top of the Drell-Yan continuum. In the m(µµτ µ τ h ) distribution, the µµ + jet background appears as an exponentially falling distribution with a threshold around 40-60 GeV because of the p T thresholds of the three reconstructed muons and one τ h candidate. The signal is characterized by a narrow m(µµ) resonance from a pseudoscalar decay and a broader m(µµτ µ τ h ) distribution because of the invisible decay products of one of the pseudoscalar Higgs bosons. As described below, the search strategy consists of an unbinned fit of m(µµ) vs. m(µµτ µ τ h ), using analytical models for the signal and background shapes in each dimension. The background shape model for the Drell-Yan continuum, the meson resonances mentioned above, and additionally the J/ψ resonance (3.10 GeV [65]) are constrained via a data control region enriched in µµ+jet events. Although the J/ψ resonance falls outside the kinematically allowed search window for a ττ resonance, it is modeled in the fit to provide a better background description near the ψ(2S) meson.
The analysis uses a simultaneous unbinned fit of three mutually exclusive regions to model the background and search for a signal. The "control region" requires the presence of two muons and no identified τ µ τ h candidate. The next two regions additionally require a reconstructed τ µ τ h candidate and are defined by passing or failing the τ h MVA isolation requirement, labeled as "signal region" and "sideband", respectively. A schematic depiction of the three regions is shown in Fig. 2. Two additional regions are also shown and are used to validate the background estimation method described below.  Figure 2: Schematic of the fit regions in the analysis. Events with two isolated muons and no τ µ τ h candidates constitute the control region (blue). Events that have a τ µ τ h candidate are further divided based on the isolation of the τ h candidate with isolated τ µ τ h candidates forming the signal region (green) and the remaining τ µ τ h candidates forming the sideband (red). Additionally, the µµ candidates that fail the muon isolation selection form two analogous regions for the validation of the background fit model (gray).
The choice of m(µµ) and m(µµτ µ τ h ) as observables for distinguishing the H → aa signal from the SM background processes is found to be most performant over the largest range of Higgs boson and pseudoscalar mass hypotheses. The signal is modeled as a 2D function given by the product of a Voigt function for m(µµ) and a split normal distribution for m(µµτ µ τ h ). For the signal processes, there is minimal correlation between the m(µµ) and m(µµτ µ τ h ) distributions. The parameters of the model are determined from fits to the signal simulation. Each generated distribution, with a specified Higgs boson and pseudoscalar mass, is fit with the described 2D function. For each parameter, a polynomial function is used to interpolate between the generated masses: a first-order polynomial for the mean value of the m(µµ) and m(µµτ µ τ h ), a second-order polynomial for each width parameter, and the product of a firstorder polynomial and two error functions for the signal normalization. The search is performed for pseudoscalar masses between 3.6 and 21 GeV.
The 2D fit of m(µµ) vs. m(µµτ µ τ h ) is performed in data to model the SM background processes and extract any significant signal process contribution in three ranges of the m(µµ) spectrum: 2.5 < m(µµ) < 8.5 GeV, 6 < m(µµ) < 14 GeV, and 11 < m(µµ) < 25 GeV. For a given m a , a single m(µµ) range is used, with the transition between the m(µµ) ranges occurring at m a = 8 and 11.5 GeV. There is some overlap in the fit ranges to allow the lower or upper portion of the signal model to be fully contained in the given fit range. The background proba-bility density function (PDF) used for the m(µµ) spectrum is the sum of an exponential together with two, three, or zero Voigt distributions to model the SM resonances for the three respective ranges. An additional exponential function is necessary to model the rising continuum background near the J/ψ resonance in the lowest m(µµ) range. The m(µµτ µ τ h ) background distribution is modeled with the product of an error function and the sum of two exponential distributions. The second exponential provides the fit with additional flexibility to allow the fit to favor an extended tail if necessary. The fit range is 0 < m(µµτ µ τ h ) < 1200 GeV in all three m(µµ) ranges. The m(µµ) and m(µµτ µ τ h ) functions are multiplied together to produce a 2D PDF. Because m(µµτ µ τ h ) is loosely correlated with m(µµ) in the background distribution, the parameters of the m(µµτ µ τ h ) background model in a given m(µµ) range are allowed to vary independently of the other ranges, allowing a correlation between m(µµ) and m(µµτ µ τ h ).
The normalization of the background model in the signal region is estimated from the sideband using a "tight-to-loose" method. This method uses a Z(µµ) + jet sample to estimate the efficiency for a jet that has passed all the τ h reconstruction requirements (including the muon removal step) of Section 4.3, except the MVA isolation requirement, to additionally pass the MVA isolation requirement. The region contains events collected with a single muon trigger with the requirement of two isolated opposite-sign muons and a jet that has been misidentified as a τ µ τ h object with a muon within ∆R(τ µ , τ h ) < 0.8, without the requirement on the MVA isolation. The µµ pair must have invariant mass 81 < m(µµ) < 101 GeV. The tight-to-loose ratio, f , is defined as the ratio of the number of τ h candidates that pass the MVA isolation requirement in addition to the other identification requirements (the "tight" condition) to the number of τ h candidates that pass the other identification requirements, but with a relaxed requirement on the isolation (the "loose" condition). The calculation of f is performed separately for each hadronic decay mode of the τ lepton and is binned in p T (τ h ). This region is dominated by Drell-Yan events containing jets. Residual contributions from diboson processes, as estimated from simulation, are subtracted from the data. The associated jets are the objects most likely to pass the τ h reconstruction criteria. This tight-to-loose ratio is measured to be 10-40%, increasing at lower p T (τ h ). In general, the decay mode with three charged tracks has a lower tight-to-loose ratio than those with a single charged track.
The sideband is then reweighted using the tight-to-loose method to estimate the contribution in the signal region. The weights are applied on an event-by-event basis as a function of p T (τ h ). The tight-to-loose method is verified in a validation region independent of the analysis region by inverting the isolation requirement on the muon in the µµ pair that did not trigger the event. These regions correspond to the gray boxes in Fig. 2. The expected and observed yields in this validation region are compatible within 15%, and an uncertainty is derived from this value.
The parameters of the µµ resonances-mean (µ), width (Γ), and resolution (σ)-and the relative normalizations-N i /N j where i and j are a pair of background resonances-between the J/ψ and ψ(2S) resonances and between the Υ(1S) and each of the Υ(2S) and Υ(3S) resonances are constrained via a simultaneous fit among all three regions. The parameters of the resonances are compatible, and thus the same, among the three regions, while their relative normalizations are only the same in the sideband and control region with the signal region relative normalizations related to the sideband via a linear transformation. The slope and constant values of this linear transformation are determined from a fit to the sideband and the tight-to-loose estimation of the background in the signal region. An uncertainty is assigned for this linear constraint in the signal region. This uncertainty is derived in a validation region and a corresponding validation sideband in which the muon of the µµ pair which did not trigger the event has an inverted isolation requirement and is measured to be 5-20% depending on the resonance. The parameters of the µµ continuum (λ i µµ ), the m(µµτ µ τ h ) continuum (λ i µµτ τ ), the m(µµτ µ τ h ) error function shift (erf a ) and scale (erf b ), and the relative normalizations of the µµ resonances to the µµ continuum (N Υ(1S) /N J/ψ and N J/ψ /N continuum ) are constrained in the signal region to the sideband via the tight-to-loose method. All remaining parameters are free to vary independently of each other and share no constraint between regions. Table 1 summarizes these constraints.
Tight-to-loose Constrained (two regions) Tight-to-loose Constrained (two regions) N Υ(1S) /N J/ψ Tight-to-loose Free Free N J/ψ /N continuum Tight-to-loose Free Free The background model and observed data in the control region are shown in Fig. 3. Projections on the m(µµ) and m(µµτ µ τ h ) axes of the 2D background model and observed data with sample signal distributions for each fit range are shown in Figs. 4 and 5 for the sideband and signal region, respectively. The signal distribution is scaled assuming an SM Higgs boson production cross section [46] and B(H → aa → µµττ) = 5 × 10 −4 . A small level of signal contamination is expected in the sideband and is included in the fit. For the signal processes, there is minimal correlation between the m(µµ) and the m(µµτ µ τ h ) distributions.

Systematic uncertainties
Uncertainties in the signal process modeling contribute both to the total expected signal yield and the individual signal fit parameters. Despite the small spatial separation between the τ µ and τ h candidates, the τ µ τ h reconstruction procedure, which relies on the excellent muon discrimination of the CMS detector, allows the uncertainties in the τ h efficiency and energy scale modeling to be treated independently from those for the τ µ candidates. Systematic uncertainties in the efficiency measurements from the tag-and-probe technique contribute an uncertainty in the total signal yield of 0.5% for the muon trigger efficiency and 1.0-1.4% for each reconstructed muon. The uncertainty in the muon momentum scale is 0.2-5.0%; most muons have p T < 100 GeV and thus an uncertainty of 0.2% [58]. For the τ h reconstruction, there is an uncertainty in the τ h identification efficiency of 5-18%, varying with p T (τ h ), and an uncertainty in the τ h energy scale of 1.2-3.0% [63], varying with the number of charged and neutral hadrons in the τ h decay.
The uncertainty in the luminosity normalization of simulated signal samples is 2.5% [66]. Uncertainty from pileup effects arises from the uncertainty of 4.6% [67] in the total inelastic cross section of pp interactions resulting in a 1% uncertainty in the signal yields. The efficiency correction for the rejection of jets tagged as originating from b quarks contributes an uncertainty of up to 3% in the signal yield.
As described in Section 3, a correction to the simulated ggF signal samples to account for small differences in acceptance for the ggF and VBF H boson production modes contributes a 0.5% uncertainty in the signal yield. Theoretical uncertainties in the H boson production cross section are calculated by varying renormalization (µ R ) and factorization (µ F ) scales independently up and down by a factor of two with respect to the default values with the condition that 0.5 ≤ µ R /µ F ≤ 2. The resulting uncertainties, combined with those from Ref.
[46], contribute less than 1% to the overall signal yield uncertainty.
For the background model, the tight-to-loose method contributes a 15% uncertainty in the total expected yield in the signal region. This uncertainty arises from the application of the tight-toloose ratio to the validation sideband to obtain a prediction for the model shapes in the validation region. The additional uncertainty in the relative normalizations of the low-mass meson resonances arises from differences in the tight-to-loose method predictions of the signal region distributions when derived from the sideband, as discussed in Section 6. This uncertainty is measured to be 5-20% for ψ(2S) and each Υ resonance, which yields up to a 3% uncertainty near these resonances in the final result.

Results
The observed distribution of data in the signal region is shown in Figs. 5 and 6. No significant excess of events is observed above the expected SM background. A modified frequentist approach based on the CL criterion [68,69] is used for upper limit calculations [65] using the LHC test statistic [70]. Systematic uncertainties are represented as nuisance parameters assuming a log-normal PDF in the likelihood fit for uncertainties in the expected yields and a Gaussian PDF of uncertainties in the signal and background model parameters.
Model-independent upper limits at 95% CL are set on σ H B(H → aa → µµττ)/σ SM and are presented in Fig. 7. Here, σ SM is the SM Higgs boson (or, for m H = 300 GeV, σ SM is the SMlike Higgs boson) production cross section including ggF and VBF production modes [46]. Broadly, the sensitivity of this exclusion decreases at low values of m a because of reconstruction inefficiencies as the decay products of the ττ pair overlap. In addition, for m H = 125 GeV, as m a increases, the Lorentz boost decreases causing the products to be well separated, failing the requirement of ∆R(τ µ , τ h ) < 0.8. The two peaking structures around m a = 10 GeV are from the Υ resonances where the Υ(1S) resonance is resolvable but the Υ(2S) and Υ(3S) merge because the rejection power of the boosted τ µ τ h selection sufficiently reduces the number of events in and around these peaks. A third peaking structure is not as apparent but is also present at the ψ(2S) resonance. Comparison with an earlier √ s = 13 TeV result from the CMS Collaboration [40] targeting resolved ττ decay products is possible for SM Higgs boson decays with 15 < m a < 21 GeV. In this case, the two approaches have similar sensitivity. Upper limits on σ H B(H → aa)/σ SM for the 2HDM+S for each Type-I to -IV as a function of tan β and m a are shown in Figs. 8 and 9. The assumed model branching fractions for pseudoscalar decays to µµ and ττ are taken from Ref. [71], and the branching fraction B(aa → µµττ) depends strongly on the 2HDM+S type [7]. The branching fractions are calculated in tan β increments of 0.5 above tan β = 1 and increments of 0.1 below, and a linear interpolation is applied between the calculated points in Fig. 9. For the Type-I and -II models, we primarily probe the 2m τ < m a < 2m b range, with the Type-I upper limits approximately independent of tan β. In the Type-I model, the most stringent limit of 5% is set for m a ≈ 4.5 GeV. In the Type-III model, this analysis has exclusion power over the full pseudoscalar mass range probed, especially at large tan β. For the Type-II and -III models with m a below the bb threshold, upper limits on B(H → aa) are stronger than the 0.47 inferred from combined measurements of SM Higgs couplings [9] for tan β 0.8-0.9, becoming as strong as 10% for tan β 1.5. In the Type-III models, strong upper limits are set for all pseudoscalar boson masses tested when tan β 1.5. The Type-IV model, however, can only be effectively probed in the low-tan β region. For a given m a , the ratio of decay rates to µµ and ττ, respectively B(a → µµ) and  [71] and are approximately independent of tan β.

Summary
A search for Higgs boson (H) decays to a pair of light pseudoscalar bosons (a) is presented, including the first such LHC results for an H with mass above 125 GeV. The light pseudoscalars decay to µµ and ττ with substantial overlap between the leptons because of the Lorentz boost. This difficult topology motivates the development of a dedicated τ µ τ h reconstruction method to increase the acceptance. Data collected by the CMS Collaboration at √ s = 13 TeV, corresponding to an integrated luminosity of 35.9 fb −1 , are examined and no significant excess over standard model (SM) processes is observed. This analysis obtains model-independent upper limits at 95% confidence level on the branching fraction (B) of a SM-like Higgs boson (H), decaying to a pair of pseudoscalar bosons (a) in the µµττ final state, σ H B(H → aa → µµττ)/σ SM , as well as model-specific upper limits on σ H B(H → aa)/σ SM for Type-I, -II, -III, and -IV two Higgs doublets plus singlet models. In the Type-I model, the upper limit on the allowed branching fraction is approximately independent of tan β, with the most stringent limit of 5% set for m a ≈ 4.5 GeV. For the Type-II and -III models with m a below the bb threshold, upper limits on B(H → aa) are stronger than the 0.47 inferred from combined measurements of SM Higgs couplings for tan β 0.8-0.9, becoming as strong as 10% for tan β 1.5. In the Type-III models, the predicted branching fraction to leptons increases with tan β, leading to strong upper limits for all pseudoscalar boson masses tested when tan β 1.5. In contrast, the strongest upper limits for Type-IV models are set when tan β < 1. These results significantly extend upper limits obtained by earlier searches by the CMS and ATLAS Collaborations, such as those obtained by CMS with 8 TeV data [39], and are complementary to present searches (e.g. Ref. [40]) at higher m a that lead to resolved µµ and ττ final states.  [2] ATLAS and CMS Collaborations, "Measurements of the Higgs boson production and decay rates and constraints on its couplings from a combined ATLAS and CMS analysis of the LHC pp collision data at √ s = 7 and 8 TeV", JHEP 08 (2016) 045, doi:10.1007/JHEP08(2016)045, arXiv:1606.02266.      [48] CMS Collaboration, "The CMS trigger system", JINST 12 (2017) P01020, doi:10.1088/1748-0221/12/01/P01020, arXiv:1609.02366.