Evidence for Higgs boson decay to a pair of muons

Evidence for Higgs boson decay to a pair of muons is presented. This result combines searches in four exclusive categories targeting the production of the Higgs boson via gluon fusion, via vector boson fusion, in association with a vector boson, and in association with a top quark-antiquark pair. The analysis is performed using proton-proton collision data at s\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$ \sqrt{s} $$\end{document} = 13 TeV, corresponding to an integrated luminosity of 137 fb−1, recorded by the CMS experiment at the CERN LHC. An excess of events over the back- ground expectation is observed in data with a significance of 3.0 standard deviations, where the expectation for the standard model (SM) Higgs boson with mass of 125.38 GeV is 2.5. The combination of this result with that from data recorded at s\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$ \sqrt{s} $$\end{document} = 7 and 8 TeV, corresponding to integrated luminosities of 5.1 and 19.7 fb−1, respectively, increases both the expected and observed significances by 1%. The measured signal strength, relative to the SM prediction, is 1.19−0.39+0.40stat−0.14+0.15syst\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$ {1.19}_{-0.39}^{+0.40}{\left(\mathrm{stat}\right)}_{-0.14}^{+0.15}\left(\mathrm{syst}\right) $$\end{document}. This result constitutes the first evidence for the decay of the Higgs boson to second generation fermions and is the most precise measurement of the Higgs boson coupling to muons reported to date.


Introduction
Since the discovery of the Higgs boson at the CERN LHC in 2012 [1][2][3], various measurements of its interactions with standard model (SM) particles have been performed. The interactions of the Higgs boson with the electroweak gauge bosons and charged fermions belonging to the third generation of the SM have been observed, with coupling strengths consistent with the SM predictions [4][5][6][7][8][9][10][11][12][13][14][15][16][17]. The Yukawa couplings of the Higgs boson to fermions of the first and second generation, however, have yet to be established experimentally. The SM predicts that the strengths of the couplings of the Higgs boson to fermions are proportional to the fermion masses [18][19][20][21]. Consequently, the branching fractions of the Higgs boson to fermions of the first and second generation are expected to be small, and their measurement at hadron colliders is challenging. The expected branching fraction for the decay of the Higgs boson with mass of 125 GeV to a pair of muons is B(H → µ + µ − ) = 2.18 × 10 −4 [22]. The study of H → µ + µ − decays is of particular importance since it is the most experimentally sensitive probe of the Higgs boson couplings to second-generation fermions at the LHC.

Event reconstruction
The particle-flow (PF) algorithm [28] aims to reconstruct and identify each individual particle (PF candidate) in an event, with an optimized combination of information from the various elements of the CMS detector. The energy of photons is obtained from the ECAL measurement. The energy of electrons is determined from a combination of the electron momentum at the primary interaction vertex as determined by the silicon tracker, the energy of the corresponding ECAL cluster, and the energy sum of all bremsstrahlung photons spatially compatible with originating from the electron track. The energy of charged hadrons is determined from a combination of their momentum measured in the silicon tracker and the matching ECAL and HCAL energy deposits, corrected for the response function of the calorimeters to hadronic showers. The energy of neutral hadrons is obtained from the corresponding corrected ECAL and HCAL energies. Finally, the momentum of muons is obtained from the curvature of the corresponding track reconstructed in the silicon tracker as well as in the muon system.
For each event, hadronic jets are clustered from these reconstructed particles using the infrared and collinear-safe anti-k T algorithm [29,30] with a distance parameter of R = 0.4. The jet momentum is determined from the vectorial sum of the momenta of all particles in the jet, and is found from simulation to be, on average, within 5 to 10% of the true transverse momentum over the whole p T spectrum and detector acceptance. Additional pp interactions within the same or nearby bunch crossings (pileup) can contribute additional tracks and calorimetric energy depositions to the jet momentum. To mitigate this effect, charged particles identified as originating from pileup vertices are discarded and an offset correction is applied to subtract the remaining contributions from neutral particles [31]. Jet energy corrections are derived from simulation to bring, on average, the measured response of jets to that of particle-level jets. In situ measurements of the momentum balance in dijet, -3 -JHEP01(2021)148 using either the powheg or MadGraph5_amc@nlo generators. Their production cross sections are corrected with the NNLO/NLO K factors taken from refs. [58,59], and [60]. The gluon-initiated loop-induced ZZ process (gg → ZZ) is simulated with the mcfm v7.0 generator [61] at LO and the corresponding production cross section is corrected to match higher-order QCD predictions, following the strategy detailed in ref. [9]. Minor contributions from triboson processes (WWW, WWZ, WZZ, and ZZZ) are also taken into account and are simulated at NLO in QCD using the MadGraph5_amc@nlo generator. The main backgrounds in the ttH category involve the production of top quarks. The tt background is simulated with NLO precision in QCD using the powheg generator, and its cross section is obtained from the top++ v2.0 [62] prediction that includes NNLO corrections in QCD and resummation of NNLL soft gluon terms. The single top quark processes are simulated at NLO in QCD via either powheg or MadGraph5_amc@nlo and their cross sections are computed, at the same order of precision, using hathor [63]. Finally, contributions from the ttZ, ttW, ttWW, tttt, and tZq processes are also considered and are simulated using the MadGraph5_amc@nlo generator at NLO precision in QCD. For the simulated samples corresponding to the 2016 (2017-2018) data-taking periods, the NNPDF v3.0 (v3.1) NLO (NNLO) parton distribution functions (PDFs) are used [64,65]. For processes simulated at NLO (LO) in QCD with the MadGraph5_amc@nlo generator, events from the ME characterized by different parton multiplicities are merged via the FxFx (MLM) prescription [66,67].
The simulated events at the ME level for both signal and background processes, except for Zjj-EW production, are interfaced with pythia v8.2.2 or higher [68] to simulate the shower and hadronization of partons in the initial and final states, along with the underlying event description. The CUETP8M1 tune [69] is used for simulated samples corresponding to the 2016 data-taking period, while the CP5 tune [70] is used for the 2017 and 2018 simulated data. Simulated VBF signal events are interfaced with pythia but, rather than the standard p T -ordered parton shower, the dipole shower is chosen to model the ISR and FSR [71]. The dipole shower correctly takes into account the structure of the colour flow between incoming and outgoing quark lines, and its predictions are found to be in good agreement with NNLO QCD calculations, as reported in ref. [72]. In contrast, the parton shower (PS), hadronization, and simulation of the underlying event for the Zjj-EW process are performed with the herwig++ (2016 simulation) and herwig 7 (2017 and 2018) programs [73], as they have shown to better match the observed data compared to the p T -ordered pythia predictions in the description of the additional hadronic activity in the rapidity range between the two leading jets [74]. The EE5C [69] and CH3 tunes [75] are used in the herwig++ and herwig 7 simulated samples, respectively.

Event selection
The analysis is performed using √ s = 13 TeV pp collision data collected by the CMS experiment from 2016 to 2018, corresponding to an integrated luminosity of 137 fb −1 . Signal events considered in this analysis are expected to contain two prompt isolated muons, regardless of the targeted Higgs boson production mode. Events are initially selected by the -6 -JHEP01(2021)148 L1 trigger, requiring at least one muon candidate reconstructed in the muon chambers with p T > 22 GeV. Events of interest are selected by the HLT using single muon triggers that have a p T threshold of 27 (24) GeV for data recorded in 2017 (2016,2018).
After passing the trigger selections, each event is required to contain at least two oppositely charged muons with p T > 20 GeV, |η| < 2.4, and passing certain selection requirements on the number of hits in the silicon tracker and in the muon systems, as well as on the quality of the fitted muon track [35]. Each muon is also required to be isolated in order to reject events with nonprompt or misidentified muon candidates. The muon isolation variable, as defined in section 3, is required to be less than 25% of the muon p T . Muons from the Higgs boson decay satisfy these identification and isolation requirements with an average selection efficiency of about 95%. In addition, at least one of the two muons is required to have p T > 29 (26) GeV for data collected in 2017 (2016, 2018), ensuring nearly 100% trigger efficiency.
The sensitivity of this analysis depends primarily on the resolution of the m µµ peak in the signal events. This resolution depends on the precision with which the muon p T is measured, which worsens with increasing muon |η|. The relative p T resolution of muons with p T > 20 GeV passing through the barrel region of the detector (|η| < 0.9) ranges from 1.5 to 2%, whereas the p T resolution of muons passing through the endcaps of the muon system (|η| > 1.2) ranges from 2 to 4%. The muon momentum scale and resolution are calibrated in bins of p T and η using the decay products of known dilepton resonances, following the method described in ref. [76]. In signal events, the Higgs boson decays into a muon pair at the interaction point. Therefore, the precision of the muon p T measurement can be improved by including the interaction point as an additional constraint in the muon track fit. This is implemented via an analytical correction to the muon p T proportional to the product of the muon p 2 T , its charge, and the minimum distance in the transverse plane between the muon track and the beam position. The correction is derived in simulated Z → µµ events and checked in both data and simulation to provide an equivalent result to refitting the muon track with the interaction point constraint. The resulting improvement in the expected m µµ resolution in signal events ranges from 3 to 10%, depending on muon p T , η, and the data-taking period.
In a nonnegligible fraction of signal events, a muon from the Higgs boson decay radiates a photon that carries away a significant fraction of the muon momentum. If not taken into account, this worsens the resolution of the dimuon invariant mass (m µµ ) peak in signal events. Furthermore, if the FSR photon falls in the isolation cone of the corresponding muon candidate, it can significantly increase the value of the isolation sum, thereby creating an inefficiency in selecting signal events. Therefore, a procedure is implemented to identify and recover the contribution of FSR photons similar to that described in ref. [9]. In order to preserve the overall signal acceptance of the dimuon selection described above, the FSR recovery is applied only to muons with p T > 20 GeV and |η| < 2.4. Photons with p T > 2 GeV and |η| < 2.5 that are not associated with reconstructed electrons are considered as FSR photon candidates if they lie inside a cone of R = 0.5 around a muon track. These candidates are then required to be loosely isolated and collinear with the muon such that (Σ i p i T (∆R(γ, i) < 0.3))/p T (γ) < 1.8 and ∆R(µ, γ)/p 2 T (γ) < 0.012, where -7 -

JHEP01(2021)148
p T (γ) is the p T of the FSR photon candidate and the index i refers to the PF candidates other than the muon within a cone of R = 0.3 around the photon. In order to suppress possible contaminations from H → Z(µµ)γ decays, the ratio between the p T of the FSR photon and that of the associated muon is required to be smaller than 0.4. In the case of multiple FSR candidates associated with a muon, the candidate with the smallest value of ∆R(µ, γ)/p 2 T (γ) is chosen. The momentum of the photon is added to that of the muon and its contribution to the muon isolation sum is ignored. The FSR recovery increases the signal efficiency by about 2% and improves the m µµ resolution by about 3%.
In order to maximize the analysis sensitivity, event candidates selected with the requirements described above are separated into independent and nonoverlapping classes based on the features of the final state expected from each production mode. Events with b-tagged jets are assigned to the ttH production category, which is further split into the hadronic and leptonic subclasses by the presence of additional charged leptons (µ or e) in the final state. Dimuon events with one (two) additional charged lepton(s) and no btagged jets are assigned to the WH (ZH) category. Events with neither additional charged leptons nor b-tagged jets belong to the VBF category if a pair of jets is present with large m jj and ∆η jj . The remaining untagged events, which constitute about 96% of the total sample of dimuon candidate events, belong to the ggH-enriched category. In each production category, multivariate techniques are used to enhance the discrimination between the expected signal and background contributions by further dividing events into several subcategories with different signal-to-background ratios. The measured H → µ + µ − signal is then extracted via a simultaneous maximum-likelihood fit across all event categories to observables chosen for each category to maximize the overall measurement precision. In the following sections, each production category is presented in order of decreasing sensitivity.

The VBF production category
A dimuon event passing the baseline selection detailed in section 5 is considered in the VBF production category if it contains two or more jets, with the p T of the leading jet (p T (j 1 )) larger than 35 GeV, the p T of the second-highest p T jet (p T (j 2 )) greater than 25 GeV, and the |η| of both jets less than 4.7. Jets overlapping with either of the two selected muons are discarded. In addition, the two highest p T jets in the event are required to have m jj > 400 GeV and |∆η jj | > 2.5. An event is rejected from the VBF category if it contains one (two) jet(s) inside the silicon tracker fiducial volume (|η| < 2.5) with p T > 25 GeV and identified as a b quark jet by the medium (loose) WP of the DeepCSV btagging algorithm. These requirements suppress the tt and single top quark backgrounds and ensure mutual exclusivity between the VBF and ttH categories. Moreover, events containing an additional muon (electron) with p T > 20 GeV and |η| < 2.4 (2.5) passing the selection criteria described in section 9 are discarded. This requirement ensures no overlap between the analyses targeting VBF and VH production. Selected events are further grouped into two independent classes. Events in which the two muons form an invariant mass between 115 and 135 GeV belong to the signal region (VBF-SR), which is enriched in signal-like events. Events with 110 < m µµ < 115 GeV or 135 < m µµ < 150 GeV belong -8 - to the mass sideband region (VBF-SB), which is used as a control region to estimate the background. The VBF-SR is defined to be 20 GeV wide in order to be sensitive to Higgs boson mass hypotheses in the range of 120-130 GeV. A summary of the selection criteria used to define the VBF-SB and VBF-SR regions is reported in table 1. A deep neural network (DNN) multivariate discriminant is trained to distinguish the expected signal from background events using kinematic input variables that characterize the signal and the main background processes in the VBF-SR. The DNN is implemented using keras [77] with tensorflow [78] as backend. The DNN inputs include six variables associated with the production and decay of the dimuon system, namely the m µµ , the perevent uncertainty in the measured dimuon mass σ(m µµ ), the dimuon transverse momentum (p µµ T ), the dimuon rapidity (y µµ ), and the azimuthal angle (φ CS ) and the cosine of the polar angle (cos θ CS ) computed in the dimuon Collins-Soper rest frame [79]. The DNN also takes as input a set of variables describing the properties of the dijet system, namely the full momentum vector of the two highest p T jets in the event (p T (j 1 ), p T (j 2 ), η(j 1 ), η(j 2 ), φ(j 1 ), and φ(j 2 )), m jj , and ∆η jj . In addition, observables sensitive to angular and p T correlations between muons and jets are also included, namely the minimum ∆η between the dimuon system and each of the two leading jets, the Zeppenfeld variable (z * ) [80] constructed from y µµ and the rapidities of the two jets as
The VBF signal events are expected to have suppressed hadronic activity in the rapidity region between the two leading jets. This feature is exploited by considering "soft trackjets" in the event that are defined by clustering, via the anti-k T algorithm with a distance parameter of 0.4, charged particles from the primary interaction vertex, excluding the two identified muons and those associated with the two VBF jets. The use of soft track-jet observables is a robust and validated method to reconstruct the hadronization products of partons with energy as low as a few GeV [81]. The number of soft track-jets in an event with p T > 5 GeV, as well as the scalar p T sum of all track-jets with p T > 2 GeV, are used as additional input variables. Finally, since jets in signal events are expected to originate from quarks, whereas in the DY process they can also be initiated by gluons, the quark-gluon likelihood [82] of the two leading jets is also used as input to the DNN. The DNN is trained using simulated events from signal (VBF) and background (DY, Zjj-EW, tt, and diboson) processes selected in the VBF-SR. Signal events generated with m H = 125 GeV are used in the DNN training. The last hidden layers of four intermediate networks are combined to form a single binary classifier: two networks exploit the full set of variables described above in order to optimize the separation between the VBF signal and the Zjj-EW or DY background, while the other two optimize the separation between the VBF signal and the total expected background. The first of the two networks discriminating against the total background uses all the inputs except for m µµ , while the second uses only the dimuon mass and its resolution. Every network contains three or four hidden layers, each with a few tens of nodes. All trainings are performed using a four-fold strategy [83], where 50% of the events are used for training, 25% for validation, and 25% for testing. The validation sample is used to optimize the DNN hyper-parameters, while the test sample is used to evaluate the DNN performance and for the expected distributions in the signal extraction fit. The selected training epoch maximizes the expected significance, determined using the Asimov data set [84], defined as the minimum between the significances computed from the training and validation samples.
Events belonging to the VBF-SR are divided into nonoverlapping bins based on the DNN value, independently for each data-taking period. These bins are defined to achieve optimal sensitivity, while minimizing the total number of bins. From this optimization procedure, thirteen bins are obtained in each data-taking period characterized by different bin boundaries. Given the negligible correlation between the m µµ and other input variables, the m µµ variable can be marginalized from the DNN by replacing the m µµ with a fixed value of 125 GeV during the DNN evaluation. The resulting DNN score is not significantly correlated with the m µµ . This mass-decorrelated DNN is used for events in the VBF-SB region and captures the main features of the DNN distribution in the VBF-SR. The signal is extracted from a binned maximum-likelihood fit to the output of the DNN discriminator performed simultaneously over the VBF-SR and VBF-SB regions. Because of significant variations in the detector response to forward jets during different data-taking periods, the fit is performed separately for data collected in 2016, 2017, and 2018. The contributions of the various background processes are estimated from simulation, following the same strategy employed in the measurement of the Zjj-EW cross section with 13 TeV data [74]. This simulation-based strategy yields, in the VBF category, an improvement in sensitivity of about 20% compared to an alternative strategy in which the background determination is entirely based on data. In this alternative analysis, a multivariate classifier is used to divide events into subcategories with different signal purity, and the signal is extracted by fitting the m µµ distribution in each subcategory to parametric functions as in ref. [23]. In such data-driven analyses, the precision of the background estimate strictly depends on -10 -JHEP01(2021)148 the number of observed events in the mass sidebands, thereby limiting the performance in the high purity subcategories that contain a small number of events. In contrast, the approach presented here relies on the precision with which the simulation is able to predict the different background components. The uncertainty in this prediction is validated and constrained using the signal-depleted sideband regions.
Theoretical uncertainties affect both the expected rate and the shape of signal and background histograms (templates) used in the fit. The Higgs boson production cross section for the various modes, and their corresponding uncertainties, are taken from ref. [22]. These include uncertainties in the choice of the PDF, as well as the QCD renormalization (µ R ) and factorization (µ F ) scales. The uncertainty in the prediction of B(H → µ + µ − ) is also considered. For the VBF process, uncertainties in the modelling of the p T (H), p T (Hjj), jet multiplicity, and m jj distributions are considered. Their total uncertainty on the VBF signal prediction is about 2-4%. Similarly, for the ggH process, seven independent additional sources are included to account for the uncertainty in the modelling of the p T (H) distribution, the number of jets in the event, and its contamination in the VBF selected region, as described in ref. [22]. The magnitude of these uncertainties for ggH events in the VBF category varies from about 15 to 25%. The theoretical uncertainties described so far affect also the signal prediction in the ggH, ttH, and VH production categories reported in the next sections. For each background process, template variations are built by changing the values of µ R and µ F by factors of 2 and 0.5 from the default values used in the ME calculation, excluding the combinations for which µ R /µ F = 0.25 or 4, as well as by comparing the nominal distributions with those obtained using the alternative PDFs of the NNPDF set. These theoretical uncertainties are correlated across years and regions (VBF-SR and VBF-SB) but are uncorrelated between processes. The shape uncertainty arising from the PS model is assessed by varying several parameters that control the properties of the ISR and FSR jets produced by pythia. The Zjj-EW and VBF signal simulations are very sensitive to the PS model, as shown in refs. [72,74]. A conservative PS uncertainty is assigned to the Zjj-EW background and VBF signal, defined as the full symmetrized difference between pythia (dipole shower) and herwig (angular-ordered shower) predictions in each DNN bin, which is larger than that obtained by varying the PS ISR and FSR parameters.
Several sources of experimental uncertainty are taken into account for both signal and background processes. These include the uncertainty in the measurement of the integrated luminosity, in the modelling of the pileup conditions during data taking, in the measurement of the muon selection and trigger efficiencies, in the muon momentum scale and resolution, in the efficiency of vetoing b quark jets, and in the jet energy scale and resolution. If not explicitly mentioned, experimental uncertainties are considered correlated across event categories and data-taking periods. Most of the sources of uncertainty affecting the jet energy scale are correlated across processes and years, while those affecting the jet energy resolution are only correlated across processes but not across years. The uncertainty in the measurement of the integrated luminosity is partially correlated across years. The integrated luminosities of the 2016, 2017, and 2018 data-taking periods are individually known with uncertainties in the 2.3-2.5% range [85][86][87], while the total integrated luminosity has an uncertainty of 1.8%. The improvement in precision reflects the (uncorrelated) -11 -JHEP01(2021)148 time evolution of some systematic effects. During the 2016 and 2017 data-taking periods, a gradual shift in the timing of the inputs of the ECAL L1 trigger in the forward endcap region (|η| > 2.4) led to a specific inefficiency. A correction for this effect was determined using an unbiased data sample and is found to be relevant in events with high-p T jets with 2.4 < |η| < 3.0. This correction is about 2 (3)% at m jj = 400 GeV in the 2016 (2017) data-taking period and it increases to about 6 (9)% for m jj > 2 TeV. A systematic uncertainty corresponding to 20% of this correction is considered. Lastly, a significant fraction (about 30-35%) of the DY background populating bins with low DNN score is comprised of events in which either the leading or subleading jet are in the forward region of the detector (|η| > 3.0) and are not matched with a jet at the generator level. These jets originate either from the soft emissions produced by the PS or from pileup interactions. The normalization of this term is left floating in the fit and is directly constrained by the observed data events with low DNN score belonging to the VBF-SR and VBF-SB regions. Because of significant variations in the detector response in the forward region over time, these normalization parameters are considered uncorrelated across years. The normalization of the remaining DY component with at least two matched jets is taken from the simulation and constrained, as for the other background processes, within the systematic uncertainties described above.
The uncertainty arising from the limited size of simulated samples is also taken into account by allowing each bin of the total background template to vary within the corresponding statistical uncertainty using the Barlow-Beeston lite technique [88,89]. These uncertainties are uncorrelated across the bins of the DNN templates used in the fit. Systematic uncertainties are modelled in the fit as nuisance parameters with log-normal or Gaussian external constraints. Figure 1 shows the observed and predicted distributions of the DNN discriminant in the VBF-SR. The background prediction is obtained from a simultaneous signal-plusbackground (S+B) fit performed across the VBF-SR and VBF-SB regions, as well as data-taking periods. The post-fit distributions for the Higgs boson signal produced via ggH (solid red) and VBF (solid black) production with m H = 125.38 GeV are overlaid. The blue histogram indicates, instead, the total signal extracted from the fit. Similarly, figure 2 shows the distributions of the DNN discriminant in the VBF-SB, obtained after performing the same S+B fit. Figure 3 shows the observed and predicted DNN output distributions in the VBF-SB (upper) and VBF-SR (lower) regions for the combination of 2016, 2017, and 2018 data. Since the bin boundaries are optimized separately per data-taking period, the distributions are combined by summing the corresponding observed and predicted number of events in each individual bin. The lower panel shows the ratio between the data and the post-fit background prediction, with the best fit signal contribution indicated by the blue line in the VBF-SR. Finally, table 2 reports, for each bin or group of bins of the DNN output in the VBF-SR, the expected number of VBF and ggH signal events (S), the observed number of events in data, the total background prediction (B) and its uncertainty (∆B), and the S/(S+B) and S/ √ B ratios obtained by summing the post-fit estimates from each of the three data-taking periods.    Table 2. Event yields in each bin or in group of bins defined along the DNN output in the VBF-SR for various processes. The expected signal contribution for m H = 125.38 GeV (S), produced via VBF and ggH modes and assuming SM cross sections and B(H → µ + µ − ), is shown. The background yields (B) and the corresponding uncertainties (∆B) are obtained after performing a combined S+B fit across the VBF-SR and VBF-SB regions and each data-taking period. The observed event yields, S/(S+B) ratios and S/ √ B ratios are also reported.

JHEP01(2021)148
Observable Selection Table 3. Summary of the kinematic selections used to define the ggH production category.

The ggH production category
An event is considered in the ggH category if it contains exactly two muons passing the baseline selection requirements detailed in section 5. Events with additional muons or electrons are rejected to avoid overlap with the VH category. Any jets considered in the event must be spatially separated (∆R > 0.4) from either of the two muons. In order to ensure mutual exclusivity with the VBF category, events containing two or more jets with p T > 25 GeV are only considered if the leading jet has p T < 35 GeV, the invariant mass of the two highest p T jets is smaller than 400 GeV, or the |∆η jj | < 2.5. Lastly, events containing at least two jets with p T > 25 GeV and |η| < 2.5 passing the loose WP of the DeepCSV b-tagging algorithm, or at least one jet passing the medium WP, are rejected, ensuring no overlap between the ggH and ttH categories. A summary of the selection criteria used to define the ggH category is reported in table 3. A multivariate discriminant based on boosted decision trees (BDTs) is employed to discriminate between signal and background events. To account for the evolution in the detector response during data-taking periods, the BDT discriminant is trained separately for the 2016, 2017, and 2018 simulated samples using the tmva package [90], resulting in three independent BDT outputs. The input variables are chosen such that the BDT discriminants are effectively uncorrelated with m µµ . This is required by the chosen analysis strategy, in which events are first divided into independent subcategories based on the BDT output, then a potential signal is extracted from each subcategory by searching for a narrow peak over a smoothly falling background in the m µµ distribution. In this category, given the prior knowledge of the expected DY background shape and the large number of data events in the mass sideband around the peak that can be used to constrain the background, this strategy provides a robust background estimate from data while maximizing the analysis sensitivity.
The BDT discriminants include input variables that describe the production and decay of the dimuon system, namely p µµ T , y µµ , φ CS , and cos θ CS . In addition, the η of each of the two muons and the ratio of each muon's p T to m µµ are also included. In order to increase the signal-to-background separation for events in which the ggH signal is produced in association with jets, the BDT discriminants also take into account the p T and η of the leading jet in the event with p T > 25 GeV and the absolute distance in η and φ between the jet and the muon pair. For events with two or more jets with p T > 25 GeV in the final state, additional inputs are included: the m jj , ∆η jj , and ∆φ jj of the two highest p T jets. The m jj , as well as the other dijet variables, is sensitive to the residual contribution from JHEP01(2021)148 VBF and VH modes, in which the vector boson decays hadronically. Furthermore, the Zeppenfeld variable defined in eq. (6.1) and the angular separation (∆η, ∆φ) between the dimuon system and each of the two leading jets are also included, which target residual VBF signal events in the ggH selected region. Lastly, the total number of jets in the event with p T > 25 GeV and |η| < 4.7 is also used as input to the BDT.
The signal simulation considered in the training of the multivariate discriminators includes the ggH, VBF, VH, and ttH processes. The ggH sample used in the training is generated via powheg since it provides positively weighted events at NLO in QCD. In later stages of the analysis, the prediction from MadGraph5_amc@nlo is used instead since it provides a more accurate description of gluon fusion events accompanied by more than one jet, as detailed in section 4. The background simulation consists of DY, tt, single top quark, diboson, and Zjj-EW processes. Only events with m µµ in the range 115-135 GeV are included in the training. Signal and background events both contain two prompt muons in the final state, and the corresponding dimuon mass resolution (σ µµ /m µµ ) does not discriminate between them. For this reason, σ µµ /m µµ is not added as an input to the BDT. Instead, signal events in the BDT training are assigned a weight inversely proportional to the expected mass resolution, derived from the uncertainties in the p T measurements of the individual muon tracks. This weighting improves the average signal σ µµ /m µµ in the high-score BDT region by assigning increased importance to the high-resolution signal events. Apart from m µµ , the p µµ T is one of the most discriminating observables in the ggH category. Discrepancies between data and simulation in the p µµ T spectrum for the DY background, similar to those reported in ref.
[91], are also observed in this analysis. In order to correctly model the p µµ T spectrum of the DY background during the training of the BDT discriminants, corrections are derived for each data-taking period by reweighting the p µµ T distribution of the DY simulation to reproduce the observation in data for dimuon events with 70 < m µµ < 110 GeV. These corrections are obtained separately for events containing zero, one, and two or more jets with p T > 25 GeV and |η| < 4.7. Figure 4 (upper) shows the BDT score distribution, comparing data to the prediction from simulation in events with 110 < m µµ < 150 GeV, where the outputs of the individual BDTs obtained in each year are combined into a single distribution. The distributions for various signal processes (ggH, VBF, and VH+ttH) are also shown. Five event subcategories are defined based on the output of these BDT discriminants. The subcategory boundaries are determined via an iterative process that aims to maximize the expected sensitivity of this analysis to H → µ + µ − decays of the SM Higgs boson. The expected sensitivity is estimated from S+B fits to the m µµ distribution in simulated events with 110 < m µµ < 150 GeV. In these fits, the Higgs boson signal is modelled using a parametric shape, the double-sided Crystal Ball function (DCB) [92]  The core of the DCB function consists of a Gaussian distribution of meanm and standard deviation σ, while the tails on either side are modelled by a power-law function with parameters α L and n L (low-mass tail), and α R and n R (high-mass tail). The total expected background is modelled with a modified form of the Breit-Wigner function (mBW) [23], where the parameters m Z and Γ Z are fixed to the measured Z boson mass of 91.19 GeV and width 2.49 GeV [93], and the parameters a 1 , a 2 , and a 3 are free to float. A first boundary is selected by optimizing the total expected significance against all possible boundaries defined in quantiles of signal efficiency. This strategy accounts for the slight differences in the BDT shapes among data-taking periods for both signal and background processes. This process is repeated recursively to define additional subcategory boundaries until the further gain in the expected significance is less than 1%. The optimized event categories are labelled as "ggH-cat1", "ggH-cat2", "ggH-cat3", "ggH-cat4", and "ggH-cat5" corresponding to signal efficiency quantiles of 0-30, 30-60, 60-80, 80-95, and >95%, respectively. The grey

JHEP01(2021)148
vertical bands in figure 4 (upper) indicate the small range of variation, among the datataking years, of the BDT boundaries for the optimized event categories described above. A simultaneous binned maximum-likelihood fit to the observed m µµ distributions is performed over the mass range 110-150 GeV to extract the H → µ + µ − signal. A bin size of 50 MeV is chosen for the m µµ distributions, which is about one order of magnitude smaller than the expected resolution of the signal peak. In each event category, simulated signal distributions from the different production modes (ggH, VBF, WH, ZH, and ttH) are modelled independently with DCB functions, and the best fit values of the DCB tail parameters are treated as constants in the final fit to the data. Them and σ parameters of the DCB function represent the peak position and resolution of the Higgs boson resonance, respectively. These are the only signal shape parameters allowed to vary in the fit. Their predicted values from simulation are constrained by Gaussian priors with widths corresponding to the muon momentum scale (up to 0.2%) and resolution uncertainties (up to 10%) in each event category. Figure 4 (lower) shows the total signal model for m H = 125 GeV obtained by summing the contributions from the different production modes in the best and the worst resolution subcategories of the ggH category, ggH-cat4 and ggH-cat1, where HWHM represents the half-width at half maximum of the signal peak. The category with the highest signal purity (ggH-cat5) uses particular kinematic features (p µµ T , ∆η and ∆φ between the dimuon system and jets) to isolate the signal, while ggH-cat4 relies more heavily on the m µµ resolution itself. Therefore, the mass resolution for signal events in ggH-cat4 is expected to be about 2% better than in ggH-cat5.
The theoretical and experimental sources of systematic uncertainties affecting the expected signal rate in each event category are similar to those described in the VBF analysis. Experimental uncertainties in the measurement of the muon selection efficiencies (0.5-1% per event category), jet energy scale (1-4% per event category) and resolution (1-6% per event category), the modelling of the pileup conditions (0.3-0.8% per event category), the integrated luminosity, and the efficiency for vetoing b quark jets (0.1-0.5% per event category) are considered. Theoretical uncertainties in the prediction of the Higgs boson production cross section, decay rate, and acceptance are also included, corresponding to a total uncertainty in the ggH yield ranging from 6-12% depending on the event category. Rate uncertainties are modelled in the signal extraction as nuisance parameters acting on the relative signal yield with log-normal constraints.
The background contribution in each subcategory is modelled with parametric functions. No prior knowledge of the shape parameters of these functions or the yield of the total background is assumed. These parameters are therefore constrained directly by the observed data in the S+B fit. Since the background composition expected from simulation is very similar across subcategories and largely dominated by the DY process, the background shape in m µµ is similar in all event categories. There are, however, variations in the overall slope of the m µµ spectrum across the BDT score categories. The function describing the background in each event category is therefore defined as the product of a "core" shape that is common among all event categories, with parameters correlated across categories, and a Chebyshev polynomial term (shape modifier) specific to each event category that modulates the core shape. This background modelling approach is referred to as the "core-JHEP01(2021)148 pdf method". The core background shape is obtained from an envelope of three distinct functions: the mBW defined in eq. (7.2), a sum of two exponentials, and the product of a nonanalytical shape derived from the fewz v3.1 generator [57] and a third-order Bernstein polynomial. Each of these functions contains three freely floating shape parameters. The nonanalytical shape derived from the fewz generator is obtained by simulating DY events at NNLO precision in QCD and NLO accuracy in EW theory and interpolating the resulting m µµ distribution using a spline function [94,95]. In a given subcategory, each of the three core functions is modulated by either a third-(ggH-cat1 and ggH-cat2) or a second-order polynomial, with parameters uncorrelated across event categories. A discrete profiling method [96] is employed, which treats the choice of the core function used to model the background as a discrete nuisance parameter in the signal extraction.
The following strategy is adopted to estimate the uncertainty in the measured signal due to the choice of parametric function for the background model. In each event category, background-only fits to the data are performed using different types of functions: the mBW, a sum of two exponentials, a sum of two power-law functions, a Bernstein polynomial, the product between the nonanalytical shape described above and a Bernstein polynomial, the product between the "BWZ" function, defined as and a Bernstein polynomial, and the "BWZγ" function [97] BWZγ(m µµ ; a, f, m Z , The BWZγ function is the sum of a Breit-Wigner function and a 1/m 2 µµ term, which are used to model the Z boson and the photon contributions to the m µµ spectrum in DY events, respectively. Both terms are multiplied by an exponential function to approximate the effect of the PDFs. The BWZ function is a Breit-Wigner distribution with an exponential tail. For the functions including Bernstein polynomials, a Fisher test [98] is used to determine the maximum degree of the polynomials to be considered in the fit. The chosen functional forms fit the data with a χ 2 probability larger than 5% in all event categories. Pseudodata sets are generated across all event categories from the post-fit background shapes obtained for each type of function in each subcategory, taking into account the uncertainties in the fit parameters as well as their correlations, and injecting a given number of signal events. Signal-plus-background fits are performed on the pseudodata sets using the core-pdf method. The median difference between the measured and injected signal yields, relative to the post-fit uncertainty in the signal yields, gives an estimate of the bias due to the choice of the background model. The bias measured in each BDT category, as well as from pseudodata sets in which the signal is injected simultaneously in all event categories, is smaller than 20% of the post-fit uncertainty on the signal yield. Including these observed deviations as spurious signals leads to a change in the overall uncertainty in the measured signal rate of less than 1% and is therefore neglected. The core-pdf method employed in this -20 -  Table 4. The total expected number of signal events with m H = 125.38 GeV (S), the ratio of the expected contributions from different production modes to the total signal yield ("Other" represents the sum of VH, ttH, and bbH contributions), the HWHM of the signal peak, the estimated number of background events (B) and the observation in data within ± HWHM, and the S/(S+B) and the S/ √ B ratios within ± HWHM, for each of the optimized ggH event categories.

JHEP01(2021)148
analysis yields an improvement in sensitivity of about 10% with respect to the background functions used in the previous result [23]. It also ensures a negligible bias in the measured signal with significantly fewer total degrees of freedom in the signal extraction fit. Figure 5 shows the m µµ distributions in each of the ggH subcategories, in which the signal is extracted by performing a binned maximum-likelihood fit using a DCB function to model the signal contribution, while the background is estimated with the core-pdf method. Table 4 reports the total number of expected signal events (S), the signal composition in each ggH category, and the HWHM of the expected signal shape. In addition, the estimated number of background events (B), the observation in data, the S/(S+B), and the S/ √ B ratios computed within the HWHM range around the signal peak are listed.

The ttH production category
The ttH process has the smallest cross section among the targeted Higgs boson production modes at the LHC. However, the presence of a top quark-antiquark pair in addition to the Higgs boson helps to reduce the background to a level that is comparable to the expected signal rate. The top quark decays predominantly into a b quark and a W boson [93], therefore a sample of events enriched in ttH production is selected by requiring the presence of at least two jets passing the loose WP of the DeepCSV b-tagging algorithm, or at least one b-tagged jet passing the medium WP. This requirement suppresses background processes in which jets originate mainly from the hadronization of light-flavour quarks, such as DY and diboson production. This selection also ensures mutual exclusivity between the ttH category and the other production categories considered in this analysis.
In order to increase the signal selection efficiency in events with large hadronic activity, as expected for the ttH signal process, the isolation requirement on all muons described in section 5 is relaxed to be less than 40% of the muon p T . In addition, the isolation cone size decreases dynamically with the muon p T (R = 0.2 for p T < 50 GeV, R = 10/p T for 50 < p T < 200 GeV, and R = 0.05 for p T > 200 GeV), following the approach used in ref. [99]. Electron candidates are required to have p T > 20 GeV, |η| < 2.5, and to pass identification requirements imposed on the properties of the ECAL cluster associated with the  Jet multiplicity (p T > 25 GeV, |η| < 4.7) ≥3 ≥2 Low-mass resonance veto m > 12 GeV Jet triplet mass 100 < m jjj < 300 GeV - Table 5. Summary of the kinematic requirements used to define the ttH hadronic and leptonic production categories.
electron track, as well as the consistency between the electron momentum measured by the inner silicon tracker and its ECAL energy deposit. Each electron is also required to be isolated following the same strategy as for muons, and the magnitude of the transverse and longitudinal impact parameters must be smaller than 0.05 and 0.1 cm, respectively. In order to suppress backgrounds containing nonprompt leptons produced in the decay of heavy quarks, muons and electrons are rejected when the jet with p T > 15 GeV that is nearest to the lepton in ∆R separation is b-tagged according to the DeepCSV medium WP. Furthermore, all muons and electrons in the ttH category are required to pass the medium WP of a multivariate lepton identification discriminant specifically designed to reject nonprompt leptons [100], resulting in a selection efficiency of about 95 (92)% per prompt muon (electron).
The ttH signal events may contain additional charged leptons, depending on the decay of the top quarks. Events with one or two additional charged leptons in the final state are grouped in the ttH leptonic category. An event in the ttH leptonic category containing three (four) charged leptons is further required to have the net sum of the lepton electric charges equal to one (zero). In the case of events with more than one pair of oppositely charged muons with 110 < m µµ < 150 GeV, the pair with the largest dimuon p T is chosen as the Higgs boson candidate. The invariant mass of each pair of same-flavour, oppositesign leptons is required to be greater than 12 GeV to suppress backgrounds arising from quarkonium decays. An event is vetoed if it contains a pair of oppositely charged electrons or muons with an invariant mass in the range 81-101 GeV, consistent with the decay of an on-shell Z boson. In contrast, events with exactly two oppositely charged muons with 110 < m µµ < 150 GeV, no identified electrons, and at least one combination of three jets in the final state with invariant mass (m jjj ) between 100 and 300 GeV belong to the ttH hadronic category. Each jet must have p T > 25 GeV and |η| < 4.7. A summary of the selection criteria used to define the ttH hadronic and leptonic categories is reported in table 5.
The dominant background in the ttH hadronic category comes from fully leptonic tt decays, while the main backgrounds in the ttH leptonic category are the ttZ and tt processes. In order to obtain an optimal discrimination between the ttH signal and the expected backgrounds, BDT-based multivariate discriminants are trained in both the -23 -

JHEP01(2021)148
hadronic and leptonic categories. The input variables are chosen to account for both the kinematic properties of the dimuon system and the properties of the top quark decay products, while ensuring that the BDT outputs remain uncorrelated with m µµ . A common set of observables is used as input to the two BDT discriminants. These include variables that characterize the production and decay of the Higgs boson candidate, namely the p µµ T , y µµ , φ CS , and cos θ CS . In addition, the η of each of the two muons and the ratio of each muon's p T to m µµ are also considered. To account for the large hadronic activity in ttH signal events, the p T and η of the three leading jets, the maximum DeepCSV value of jets not overlapping with charged leptons (∆R( , j) > 0.4), the number of jets, and the scalar (vectorial) p T sum H T (| H miss T |) of all identified leptons and jets (p T > 25 GeV, |η| < 2.5) are included. The p miss T is also considered along with the ∆ζ variable [101], which is defined as the projection of the p miss T on the bisector of the dimuon system in the transverse plane. Signal events are weighted during the BDT training with the inverse of the per-event mass resolution, following the same approach used in the ggH categories.
In the ttH leptonic category, several additional variables are used in the BDT discriminant that target the kinematic properties of a leptonic top quark decay. These include the azimuthal separation between the Higgs boson candidate and the highest p T additional charged lepton ( t ), the invariant mass formed by t and the jet with the highest DeepCSV score, the transverse mass formed by t and p miss T in the event, and the flavour of t . In the ttH hadronic category, the resolved hadronic top tagger (RHTT), which combines a kinematic fit and a BDT-based multivariate discriminant, is used to identify top quark decays to three resolved jets following a similar approach to the one reported in ref. [102]. The jet triplet with 100 < m jjj < 300 GeV and the highest RHTT score is selected as a hadronic top quark candidate. The corresponding RHTT score is used as input to the BDT discriminant. Furthermore, the p T of the top quark candidate and the p T balance of the top quark and the muon pair are also considered. Figure 6 shows the output of the BDT discriminant in the ttH hadronic (upper) and leptonic (lower) categories. The high BDT score region of the ttH hadronic category is enriched in events with large jet multiplicity, where the tt and DY background predictions rely on a significant number of jets from the PS and are known to not entirely reproduce the data [103]. The signal prediction, however, relies largely on jets derived from the ME calculation. Since the background prediction is extracted from the data, the observed differences between data and background simulation do not affect the fit result. Based on the BDT output, events in the ttH leptonic category are further divided into two subcategories, labelled as "ttHlep-cat1" and "ttHlep-cat2", corresponding to signal efficiency quantiles of 0-52 and >52%, respectively. Similarly, events in the ttH hadronic category are divided into three subcategories labelled "ttHhad-cat1", "ttHhad-cat2", and "ttHhad-cat3", corresponding to signal efficiency quantiles of 0-70, 70-86, and >86%, respectively. The BDT score boundaries of these event categories, indicated in figure 6 by black dashed vertical lines, are optimized following the same strategy adopted for events in the ggH category. In the optimization, exponential functions are used to model the background in both the ttH hadronic and leptonic subcategories.   Figure 7 shows the m µµ distributions in the ttH hadronic and leptonic event categories. The signal is extracted by performing a binned maximum-likelihood fit to these m µµ distributions (bin size of 50 MeV), where signal is modelled using the DCB function and the background is modelled using a second-order Bernstein polynomial (Bern(2)) in ttHhad-cat1 and ttHhad-cat2, a sum of two exponentials (S-Exp) in ttHhad-cat3, and a single exponential (Exp) in the ttH leptonic event categories. Table 6 reports the expected signal composition of each ttH subcategory, along with the HWHM of the expected signal shape. In addition, the estimated number of background events, the observation in data, and the S/(S+B) and S/ √ B ratios within the HWHM of the signal shape are shown.
The systematic uncertainties considered account for possible mismodelling of the signal shape and rate. Uncertainties in the calibration of the muon momentum scale and resolution are propagated to the shape of the signal m µµ distribution, yielding variations of up to 0.1% in the peak position and up to 10% in width. Experimental uncertainties from the measurement of the electron and muon selection efficiencies (0.5-1.5% per event category), muon momentum scale and resolution (0.1-0.8% per event category), jet energy scale and resolution (2-6% per event category), efficiency of identifying b quark jets (1-3% per event category), integrated luminosity, and modelling of the pileup conditions (0.2-1% per event category) affect the predicted signal rate. Furthermore, theoretical uncertainties in the prediction of the Higgs boson production cross sections, decay rate, and acceptance are also included, as already described for the ggH, VBF, and VH analyses. Rate uncertainties are included in the signal extraction as nuisance parameters acting on the relative signal yield with log-normal constraints.   Table 6. The total expected number of signal events with m H = 125.38 GeV (S), the ratio of the expected contributions from different production modes to the total signal yield ("Other" represents the sum of tH, VBF, and bbH contributions), the HWHM of the signal peak, the functional form used for the background modelling, the estimated number of background events (B) and the observed number of events within ± HWHM, and the S/(S+B) and S/ √ B ratios computed within the HWHM of the signal peak, for each of the optimized event categories defined along the ttH hadronic and leptonic BDT outputs.
In order to estimate the potential bias arising from the choice of the parametric function used to model the background, alternative functions able to fit the data with a χ 2 p-value larger than 5% are considered. These include Bernstein polynomials, sum of exponentials, and sum of power laws. In each event category, background-only fits to the data are performed with each function listed above. From each of these fits, pseudodata sets are generated taking into account the uncertainties in the fit parameters and their correlations, and injecting a certain number of signal events. A S+B fit is then performed on these pseudodata sets using, in each category, the parametric functions listed above. The corresponding bias is observed to be smaller than 20% of the post-fit uncertainty on the signal yield and is therefore neglected in the signal extraction. The chosen functions maximize the expected sensitivity to the 125 GeV Higgs boson.

The VH production category
Events considered in the VH category contain at least two muons passing the selection requirements listed in section 5. In order to ensure no overlap with the ttH category, events containing at least two b-tagged jets with p T > 25 GeV and |η| < 2.5 passing the loose WP of the DeepCSV b-tagging algorithm, or at least one jet passing the medium WP, are discarded. Events are also required to have at least one additional charged lepton (electron or muon), which is expected from the leptonic decay of the W or Z boson. The additional muons (electrons) must have p T > 20 GeV, |η| < 2.4 (2.5), and pass certain isolation and identification requirements with an average efficiency of 95 (90)%. Furthermore, all muons and electrons in this category are required to pass the medium WP of a multivariate discriminant developed in ref. [100] to identify and suppress nonprompt leptons, with a selection efficiency of about 95 (92)% per prompt muon (electron).
Events containing exactly one additional charged lepton belong to the WH category, which targets signal events where the Higgs boson is produced in association with a leptonically decaying W boson. If the additional lepton is a muon, the two pairs of oppositely

JHEP01(2021)148
Observable WH leptonic ZH leptonic µµµ µµe 4µ 2µ2e Number of loose (medium) b-tagged jets Low-mass resonance veto m > 12 GeV charged muons are required to have m µµ > 12 GeV to suppress background events from quarkonium decays. Moreover, neither of the two oppositely charged muon pairs can have an invariant mass consistent with m Z within 10 GeV. Finally, at least one of these two muon pairs must have m µµ in the range 110-150 GeV. If both m µµ pairs satisfy this criterion, the pair with the highest p µµ T is considered as the Higgs boson candidate. If the additional lepton is an electron, the only requirement imposed is that 110 < m µµ < 150 GeV.
The ZH category targets signal events where the Higgs boson is produced in association with a Z boson that decays to a pair of electrons or muons. Events in the ZH category are therefore required to contain four charged leptons, with a combined lepton number and electric charge of zero. As in the WH category, the invariant mass of each pair of same-flavour, opposite-sign leptons is required to be greater than 12 GeV. An event is rejected if it does not contain exactly one pair of same-flavour, opposite-sign leptons with invariant mass compatible with the Z boson within 10 (20) GeV for muon (electron) pairs. In addition, each event must contain one oppositely charged muon pair satisfying 110 < m µµ < 150 GeV. For events with four muons, the muon pair with m µµ closer to m Z is chosen as the Z boson candidate, while the other muon pair is selected as the Higgs boson candidate. A summary of the selection criteria applied in the WH and ZH production categories is reported in table 7.
Two BDT discriminants are trained to discriminate between signal and background events in the WH and ZH categories. The input variables are selected such that the BDT outputs are not significantly correlated with the m µµ of the Higgs boson candidate. This is required by the chosen analysis strategy, which is analogous to that adopted for the signal extraction in the ggH category. The impact of the m µµ resolution, which evolves as a function of muon p T and η, is taken into account during the BDT training by applying weights to the simulated signal events that are inversely proportional to the per-event mass resolution, estimated from the uncertainty in the measured m µµ following the same strategy described in section 7 and 8.

JHEP01(2021)148
The BDT discriminant used in the WH category takes as inputs several variables that exploit the kinematic features of the three charged leptons in the event, as well as the p miss T . These variables include the full kinematic information, apart from the invariant mass, of the dimuon system corresponding to the Higgs boson candidate. In addition, the ∆φ and ∆η separations between the additional lepton ( W ) and the Higgs boson candidate, between W and both muons from the Higgs boson candidate, and between W and H miss T are considered. The H miss T is defined as the negative vector p T sum of all jets in the event with p T > 30 GeV and |η| < 4.7. Finally, the transverse mass of the combined W and H miss T system, the flavour of W , and the p T of W are added as inputs to the BDT. The particular kinematic properties in signal events of the W and H miss T enable a large suppression of the residual DY background. The BDT discriminant trained in the ZH category considers several input observables constructed from the lepton pair associated with the Z boson decay ( Z ) and the muon pair considered as the Higgs boson candidate (µµ H ). These include the p T and η of both Z and Higgs boson candidates, the ∆φ (∆R) between the muons (charged leptons) of the µµ H ( Z ) system, m Z , ∆η(µµ H , Z ), and the cosine of the polar angle between the µµ H and Z candidates. The flavour of the lepton pair associated with the Z boson decay is also included as an input variable. Figure 8 shows the output of the BDT classifiers in the WH (upper) and ZH (lower) categories. Based on these outputs, events in the WH category are further divided into three subcategories termed "WH-cat1", "WH-cat2", and "WH-cat3" corresponding to signal efficiency quantiles of 0-22, 22-70, >70%, respectively. Similarly, events in the ZH category are divided into two subcategories, labelled "ZH-cat1" and "ZH-cat2" corresponding to signal efficiency quantiles of 0-52 and >52%, respectively. The boundaries of these subcategories, defined in terms of the BDT discriminant and indicated in figure 8 by black dashed vertical lines, are chosen via the same optimization strategy adopted in the ggH and ttH categories. In the VH category, the BWZ function is used to estimate the total background instead of mBW. Figure 9 shows the m µµ distributions in the WH and ZH event categories. The signal is extracted via a binned maximum-likelihood fit in each event category, where the signal is modelled with a DCB function and the background is modelled with the BWZγ function in WH-cat1, as defined in eq. (7.4) and the BWZ function in the remaining subcategories, as defined in eq. (7.3). Table 8 reports the signal composition in the WH and ZH subcategories, along with the HWHM of the expected signal shape. In addition, the estimated number of background events, the S/(S+B) and S/ √ B ratios, and the observation in data within the HWHM of the signal peak are also listed.
The systematic uncertainties considered in this category account for possible mismodelling in the signal shape and rate. The shape of the reconstructed Higgs boson resonance, modelled using the DCB function defined in eq. (7.1), is affected by the uncertainty in the muon momentum scale and resolution. Uncertainties in the calibration of these values are propagated to the shape of the m µµ distribution, yielding variations of up to 0.2% in the peak position and up to 10% in the width. Experimental systematic uncertainties from the measurement of the electron and muon selection efficiencies (1-3% per event category), jet energy scale and resolution (0.5-2% per event category), the efficiency of vetoing b quark    Table 8. The total expected number of signal events with m H = 125.38 GeV (S), the ratio of the expected contributions from different production modes to the total signal yield, the HWHM of the signal peak, the functional form used for the background modelling, the estimated number of background events (S) and the observed number of events within ± HWHM, and the S/(S+B) and the S/ √ B ratios computed within the HWHM of the signal peak for each of the optimized event categories defined along the WH and ZH BDT outputs.
jets (1-3% per event category), the integrated luminosity, and the pileup model (0.5-2% per event category) affect the predicted signal rate. Furthermore, theoretical uncertainties in the prediction of the Higgs boson production cross section, decay rate, and acceptance are also considered. Rate uncertainties are taken into account in the signal extraction as nuisance parameters acting on the relative signal yield with log-normal constraints.
The potential bias due to the choice of the parametric function used to model the background is estimated using the same procedure employed in the ttH analysis, detailed in section 8. The set of parametric functional forms considered in the bias studies includes BWZ, BWZγ, sum of exponentials, Bernstein polynomials, and sum of power laws. The

JHEP01(2021)148
chosen parametrization maximizes the expected sensitivity without introducing a significant bias in the measured signal yield. The corresponding bias is found to be smaller than 20% and is therefore neglected in the signal extraction. The chosen functions maximize the expected sensitivity to the 125 GeV Higgs boson.

Results
A simultaneous fit is performed across all event categories, with a single overall signal strength modifier (µ) free to float in the fit. The signal strength modifier is defined as the ratio between the observed Higgs boson rate in the H → µ + µ − decay channel and the SM expectation, µ = (σB(H → µ + µ − )) obs /(σB(H → µ + µ − )) SM . The relative contributions from the different Higgs boson production modes are fixed to the SM prediction within uncertainties. Confidence intervals on the signal strength are estimated using a profile likelihood ratio test statistic [84], in which systematic uncertainties are modelled as nuisance parameters following a modified frequentist approach [104]. The profile likelihood ratio is defined as whereμ represents the value of the signal strength that maximizes the likelihood L for the data, whileθ andθ µ denote the best fit estimate for the nuisance parameters and the estimate for a given fixed value of µ, respectively. Theoretical uncertainties affecting the signal prediction are correlated among all the event categories included in the fit. Similarly, experimental uncertainties in the measurement of the integrated luminosity in each year, jet energy scale and resolution, b quark jet identification, modelling of the pileup conditions, and selection efficiencies of muons and electrons are also correlated across categories. Because of the different analysis strategy employed in the VBF category, the acceptance uncertainties from the muon energy scale and resolution are correlated only among the ggH, WH, ZH, and ttH categories. Furthermore, their effect on the position and width of the signal peak are assumed to be uncorrelated across event categories. The local p-value quantifies the probability for the background to produce a fluctuation larger than the apparent signal observed in the search region. Figure 10 (upper) shows the observed local p-value for the combined fit, and for each individual production category, as a function of m H in a 5 GeV window around the expected Higgs boson mass. The solid markers indicate the mass points for which the observed p-values are computed. Figure 10 (lower) shows the expected p-values computed for the combined fit, and for each production category, on an Asimov data set [84] generated from the background expectation obtained from the S+B fit with a m H = 125.38 GeV signal injected. The observed p-values as a function m H are compatible, within the statistical variation, with the expectation for the Higgs boson with m H = 125.38 GeV. In the ggH, VH, and ttH categories, in order to evaluate p-values for masses different from 125 GeV, signal models are derived using alternative H → µ + µ − signal samples generated with m H fixed to 120 and 130 GeV. Signal shape parameters and the expected rate for each production mode in each event category are then interpolated using a spline function within 120 < m H < 130 GeV, providing a signal    model for any mass value in the m H = 125 ± 5 GeV range. A different strategy is employed in the VBF category since m µµ is a DNN input variable. As described in section 6, the DNN output can be decorrelated from the m µµ information by fixing its value to 125 GeV. Therefore, a potential signal with mass m different from 125 GeV can be extracted by fitting the data with an alternative set of signal and background templates, obtained by shifting the mass value used as input to the DNN evaluation by ∆m = 125 GeV − m and adjusting the expected signal yields by the corresponding differences in the production cross section and decay rate. Variations in the acceptance per DNN bin as a function of ∆m are found to be negligible in the mass range of interest. This procedure is also applied to the data, yielding for each tested mass hypothesis a different observed DNN distribution to fit. Throughout the explored mass range, 120 < m H < 130 GeV, the VBF category has the highest expected sensitivity to H → µ + µ − decays, followed by the ggH, ttH, and VH categories, respectively. The observed (expected for µ = 1) significance at m H = 125.38 GeV of the incompatibility with the background-only hypothesis is 3.0 (2.5) standard deviations. The 95% CL upper limit (UL) on the signal strength, computed with the asymptotic CL s criterion [84,105,106], is also derived from the combined fit performed across all event categories. The observed (expected for µ = 0) UL on µ at 95% CL for m H = 125.38 GeV is 1.9 (0.8). Discrete fluctuations in the observed p-value for the VBF category and the combined fit arise from event migrations in data between neighbouring bins when reevaluating the VBF category DNN for different mass hypotheses, following the procedure described above. Size of simulated samples +0.07 −0.06 Table 9.  Figure 11 (upper) reports a summary of the best fit values for the signal strength and the corresponding 68% CL intervals obtained from a profile likelihood scan in each production category. The best fit signal strengths in each production category are consistent with the combined fit result as well as the SM expectation. A likelihood scan is performed in which the four main Higgs boson production mechanisms are associated to either fermion (ggH and ttH) or vector boson (VBF and VH) couplings. Two signal strength modifiers, denoted as µ ggH,tt H and µ VBF,VH , are varied independently as unconstrained parameters in the fit. Figure 11    for the nuisance parameters and signal strength are propagated to the m µµ distribution. This distribution is not used for any of the measurements presented in this paper, but only to visualize the fit result. Figure 12 (upper) shows the observed and predicted weighted m µµ distributions for events in the VBF-SB and VBF-SR regions, combining 2016, 2017, and 2018 data. The lower panel shows the residuals between the data and the post-fit background prediction, along with the post-fit uncertainty obtained from the background-only fit. The best fit signal contribution with m H = 125.38 GeV is indicated by the blue line. An excess is observed in the weighted data distribution that is consistent with the expected resonant mass distribution for the signal with m H near 125 GeV and compatible with the excess observed at high DNN score in figure 3. The signal and background distributions are then interpolated with a spline function in order to obtain a continuous spectrum that can be summed with the parametric fit results in the ggH, WH, ZH, and ttH categories. The result is combined with that obtained from data recorded at centre-of-mass energies of 7 and 8 TeV. The 7+8 TeV search described in ref. [97] has been updated using for the Higgs boson production cross sections and branching fractions the values reported in ref. [22]. Systematic uncertainties in the inclusive signal production cross sections and  B(H → µ + µ − ) are correlated across the 7, 8, and 13 TeV analyses. Experimental uncertainties affecting the measured properties of the various physics objects (muons, electrons, jets, and b quark jets), the measurement of the integrated luminosity, and the modelling of the pileup conditions are assumed to be uncorrelated between the 7+8 and 13 TeV analyses. Table 10 reports the observed and expected significances over the background-only expectation at m H = 125.38 GeV and the 95% CL ULs on µ in each production category, as well as for the 13 TeV and the 7+8+13 TeV combined fits. The combination improves, relative to the 13 TeV-only result, both the expected and the observed significance at m H = 125.38 GeV by about 1%. Figure 13 shows the observed (solid black) and the expected (dashed black) local p-values derived from the 7+8+13 TeV combined fit as a function of m H in a 5 GeV window around the expected Higgs boson mass. The expected p-value is computed on an Asimov data set generated from the background expectation obtained from the S+B fit with a m H = 125.38 GeV signal injected. As in figure 10, the solid markers indicate the mass points for which the observed p-values are computed. The best fit signal strength, and the corresponding 68% CL interval, obtained from the 7+8+13 TeV combination for the Higgs boson with mass of 125.38 GeV is 1.19 +0. 40 −0.39 (stat) +0.15 −0.14 (syst). The results presented in this paper are the most precise measurement of the H → µ + µ − decay rate reported to date, and provide the best constraint of the coupling between the Higgs boson and the muon. The signal strength measured in the H → µ + µ − analysis cannot be translated directly into a measurement of the Higgs boson coupling to muons -36 -  because it is also sensitive to the interactions between the Higgs boson and several SM particles involved in the production processes considered, primarily the top quark and vector boson couplings. These Higgs boson couplings to other particles are constrained by combining the result of this analysis with those presented in ref.

JHEP01(2021)148
[10], based on pp collision data recorded by the CMS experiment at √ s = 13 TeV in 2016 corresponding to an integrated luminosity of 35.9 fb −1 . Under the assumption that there are no new particles contributing to the Higgs boson total width, Higgs boson production and decay rates in each category are expressed in terms of coupling modifiers within the κ-framework [107]. Six free coupling parameters are introduced in the likelihood function (κ W , κ Z , κ t , κ τ , κ b , and κ µ ) and are extracted from a simultaneous fit across all event categories. In the combined fit, the event categories of the √ s = 13 TeV H → µ + µ − analysis described in [10] in the κ-framework. The best fit value for κ µ is 1.07 and the corresponding observed 68% CL interval is 0.85 < κ µ < 1.29. Right: the best fit estimates for the reduced coupling modifiers extracted for fermions and weak bosons from the resolved κframework compared to their corresponding prediction from the SM. The error bars represent 68% CL intervals for the measured parameters. In the lower panel, the ratios of the measured coupling modifiers values to their SM predictions are shown.
this paper supersede those considered in ref.
[10]. Figure 14 (upper) shows the observed profile likelihood ratio as a function of κ µ for m H = 125.38 GeV. The best fit value for κ µ (κ µ = 1.07), as well as those for the other couplings, are compatible with the SM prediction. The corresponding 68 and 95% CL intervals for the κ µ parameter are 0.85 < κ µ < 1.29 and 0.59 < κ µ < 1.50, respectively. Note that the observed (expected) significances reported in table 10 and figure 10 are computed assuming SM production cross sections and decay rates, constrained within the corresponding theoretical uncertainties. In the result presented in figure 14 (left), the freely floating coupling modifiers are allowed to simultaneously modify both Higgs boson production cross sections and decay rates within the constraint of keeping the total Higgs boson width fixed to the SM value.
In the SM, the Yukawa coupling between the Higgs boson and the fermions (λ F ) is proportional to the fermion mass (m F ), while the coupling to weak bosons (g V ) is proportional to the square of the vector boson masses (m V ). The results from the κ-framework fit can therefore be translated in terms of reduced coupling strength modifiers, defined as y V = √ κ V m V /ν for weak bosons and y F = κ F m F /ν for fermions, where ν is the vacuum expectation value of the Higgs field of 246.22 GeV [93]. Figure 14 (lower) shows the best fit estimates for the six reduced coupling strength modifiers as a function of particle mass, where lepton, vector boson, and quark masses are taken from ref. [93]. The compatibility between the measured coupling strength modifiers and their SM expectation is derived from the −2 ∆ ln(L) separation between the best fit and an alternative one, performed by fixing the six coupling modifiers to the SM prediction (κ W = κ Z = κ t = κ τ = κ b = κ µ = 1), yielding a p-value of 44%.

Summary
Evidence for Higgs boson decay to a pair of muons is presented. This result combines searches in four exclusive categories targeting the production of the Higgs boson via gluon fusion, via vector boson fusion, in association with a vector boson, and in association with a top quark-antiquark pair. The analysis is performed using proton-proton collision data at √ s = 13 TeV, corresponding to an integrated luminosity of 137 fb −1 , recorded by the CMS experiment at the CERN LHC. An excess of events over the background expectation is observed in data with a significance of 3.0 standard deviations, where the expectation for the standard model (SM) Higgs boson with mass of 125.38 GeV is 2.5. The combination of this result with that from data recorded at √ s = 7 and 8 TeV, corresponding to integrated luminosities of 5.1 and 19.7 fb −1 , respectively, increases both the expected and observed significances by 1%. The measured signal strength, relative to the SM prediction, is 1.19 +0. 40 −0.39 (stat) +0.15 −0.14 (syst). This result constitutes the first evidence for the decay of the Higgs boson to second generation fermions and is the most precise measurement of the Higgs boson coupling to muons reported to date.

Acknowledgments
We congratulate our colleagues in the CERN accelerator departments for the excellent performance of the LHC and thank the technical and administrative staffs at CERN and at other CMS institutes for their contributions to the success of the CMS effort. In addition, we gratefully acknowledge the computing centres and personnel of the Worldwide LHC Computing Grid for delivering so effectively the computing infrastructure essential to our analyses. Finally, we acknowledge the enduring support for the construction and operation of the LHC and the CMS detector provided by the following funding agencies: BMBWF and FWF (  -41 -