Search for the lepton flavor violating decay τ → 3μ in proton-proton collisions at s\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$ \sqrt{\mathrm{s}} $$\end{document} = 13 TeV

Results are reported from a search for the lepton flavor violating decay τ → 3μ in proton-proton collisions at s\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$ \sqrt{\mathrm{s}} $$\end{document} = 13 TeV. The data sample corresponds to an integrated luminosity of 33.2 fb−1 recorded by the CMS experiment at the LHC in 2016. The search exploits τ leptons produced in both W boson and heavy-flavor hadron decays. No significant excess above the expected background is observed. An upper limit on the branching fraction ℬ(τ → 3μ) of 8.0 × 10−8 at 90% confidence level is obtained, with an expected upper limit of 6.9 × 10−8.


Introduction
In the standard model (SM) with massless neutrinos, the three lepton flavor numbers are exactly conserved. The observation of neutrino oscillations not only proves that lepton flavor is not conserved in the neutral sector, but also provides a mechanism, through neutrino loops, for lepton flavor violating (LFV) decays of charged leptons such as τ → 3µ, albeit with extraordinarily small branching fractions [1][2][3]. However, a number of SM extensions predict a much larger τ → 3µ branching fraction, including values as high as 10 −10 -10 −8 [4][5][6], accessible to current and near-future experiments. The BaBar collaboration set a limit of B(τ → 3µ) < 5.3 × 10 −8 at 90% confidence level (CL) [7]. The present best limit of <2.1 × 10 −8 at 90% CL was obtained by the Belle experiment [8]. Searches at the CERN LHC are approaching this sensitivity with 90% CL upper limits of 4.6×10 −8 from LHCb [9] and 38 × 10 −8 from ATLAS [10].
The LHCb and ATLAS results targeted τ production from heavy-flavor hadron decays and W boson decays, respectively. While many more τ leptons are produced from -1 -

JHEP01(2021)163
heavy-flavor hadron decays, the τ leptons from W decays tend to have larger transverse momentum (p T ) and are typically isolated from hadronic activity, providing an experimental signature with much less background. In this paper, we present results from the CMS experiment of the first search for the LFV decay τ → 3µ from a combination of the two independent channels (production in W boson and heavy-flavor hadron decays). Using both channels, for which CMS has comparable sensitivity, provides the best opportunity for a discovery or the lowest upper limit on the branching fraction. The data were collected at the LHC in 2016 from proton-proton (pp) collisions at a center-of-mass energy of 13 TeV, and correspond to an integrated luminosity of 33.2 fb −1 . Inclusion of charge-conjugate states is implied throughout this paper.

The CMS experiment
The central feature of the CMS apparatus is a superconducting solenoid of 6 m internal diameter, providing a magnetic field of 3.8 T. Within the solenoid volume are a silicon pixel and strip tracker, a lead tungstate crystal electromagnetic calorimeter, and a brass and scintillator hadron calorimeter, each composed of a barrel and two endcap sections. Muons are measured in gas-ionization detectors embedded in the steel flux-return yoke outside the solenoid. Additional forward calorimetry complements the coverage provided by the barrel and endcap detectors. A more detailed description of the CMS detector, together with a definition of the coordinate system used and the relevant kinematic variables, can be found in ref. [11].
Events of interest are selected using a two-tiered trigger system [12]. The first level (L1), composed of custom hardware processors, uses information from the calorimeters and muon detectors to select events at a rate of around 100 kHz within a fixed time interval of less than 4 µs. The second level, known as the high-level trigger, consists of a farm of processors running a version of the full event reconstruction software optimized for fast processing, and reduces the event rate to around 1 kHz before data storage.
The particle-flow algorithm [13] aims to reconstruct and identify each individual particle in an event, with an optimized combination of information from the various elements of the CMS detector. In particular, muons are identified by matching tracks in the silicon tracker with tracks in the muon detector and verifying the energy deposited in the calorimeters is consistent with that expected for muons. The muon momentum is obtained from the curvature observed in the silicon tracker and the relative p T resolution for muons with p T < 100 GeV is 1% in the barrel and 3% in the endcaps [14].
Simulated event samples are used to validate the analysis, measure acceptance and efficiency, and estimate systematic uncertainties. For the analysis of τ leptons from W boson decays, events were simulated using MadGraph5_amc@nlo 2.5.2 [15,16] at leading order, assuming a two-Higgs-doublet model that allows for flavor changing neutral currents and LFV processes, interfaced with pythia for parton shower and hadronization descriptions. The W production and decay, as well as the τ decay, are handled by MadGraph5_amc@nlo. The p T distribution of the W boson is reweighted to match that obtained from a SM next-to-leading-order W → ν sample produced with Mad-

JHEP01(2021)163
Graph5_amc@nlo and interfaced to pythia for parton showers and hadronization. For the analysis of τ leptons from heavy-flavor hadron decays, events were simulated using pythia 8.226 [17] with the CUETP8M1 tune [18] interfaced with evtgen 1.6.0 [19] for particle decays, with the τ decay kinematics determined by phase space, rather than a particular model. All events are passed through the CMS detector simulation based on Geant4 [20]. The multiple pp collisions that occur within the same or nearby bunch crossings (pileup) are modeled by including additional minimum bias events generated with pythia with a distribution that matches the one observed in data. Simulated events are reconstructed with the same algorithms as used for data, including emulation of the triggers.

Data selection
The triggers used by this analysis evolved during the data collection period, primarily to cope with increases in the instantaneous luminosity. Most of the data were collected with an L1 trigger requirement of either three muons, two muons with at least one muon having p T > 10 GeV, or two muons with both muons having an absolute pseudorapidity of |η| < 1.6. The dimuon L1 triggers also required the two muons to have an absolute pseudorapidity difference |∆η| < 1.8. The high-level trigger required three reconstructed charged particles (tracks), of which two must be identified as muons with p T > 3 GeV and the other must have p T > 1.2 GeV. The three tracks are fitted to a common vertex and kept if the normalized χ 2 of the fit is less than 8; the vertex location is at least 2 times its uncertainty from the beamline; the p T of the combination (p 3µ T ) is greater than 8 GeV; the invariant mass (assuming a muon mass for all tracks) is in the range 1.60-2.02 GeV; and the cosine of the angle in the transverse plane between the three-track momentum vector and the vector from the beamline to the vertex is greater than 0.9. During the first half of 2016, errors in the L1 triggers used by this analysis resulted in a significant loss of efficiency for muons with |η| > 1.24. While this trigger misconfiguration is not modeled by the simulation, it is accounted for by the analysis.
Offline, all combinations of three muons in the event with a combined charge of ±1 are considered and a fit to a common vertex is attempted to make a τ candidate. The muons are required to match the ones used in the trigger, and the trigger-level selection criteria are reapplied. If either of the oppositely charged dimuon combinations from the τ candidate has an invariant mass within 20 MeV of the mass of the ω(783) or φ(1020) resonances, the candidate is rejected. Events with at least one τ candidate with an invariant mass between 1.6 and 2.0 GeV are kept for analysis by two different algorithms, one optimized for production of τ leptons in W boson decays and the other optimized for production of τ leptons in heavy-flavor hadron decays.

Selecting τ candidates
For the W boson analysis, τ candidates must pass the selection criteria described in section 3, as well as an additional veto that suppresses background arising from dimuon decays -3 -

JHEP01(2021)163
of hadronic resonances. The veto considers all pairs of oppositely charged muons with one muon from the τ candidate and one muon not associated with the τ candidate. If any of the muon pairs form a good vertex (vertex fit χ 2 probability above 5%) and have an invariant mass within twice the larger of the detector resolution or natural width of a known resonance with a dimuon decay, the τ candidate is vetoed. The checked resonances are η, ω(783), ρ (770), φ(1020), J/ψ, ψ(2S), Υ(1S), Υ(2S), Υ(3S), and Z.
The reconstructed pp interaction vertex with the largest value of summed physicsobject p 2 T is taken to be the primary interaction vertex. The physics objects are the jets, clustered using the anti-k T algorithm [21,22] with the tracks assigned to candidate vertices as inputs, and the associated missing transverse momentum, taken as the negative vector sum of the p T of those jets.
To better separate signal from background, a boosted decision tree (BDT) is trained [23] using simulated signal events and background from data events in the mass sideband region (trimuon invariant mass in the range of 1.60-1.74 or 1.82-2.00 GeV). The signal sample used for training the BDT is a combination of several samples, each with a different τ lepton mass (covering the mass range 1.6-2 GeV), to avoid training on the true τ mass. Data in the mass sidebands contain combinatorial background, as well as decays, primarily of heavyflavor hadrons, where one or more hadrons are misreconstructed as muons. Simulated data were used to verify that background from charm hadron decays does not produce a peak in the trimuon mass.
The BDT uses 18 variables. The variables include a measure of the muon quality for each muon, the difference in longitudinal impact parameters with respect to the primary vertex for each pair of muons, the p T and η of the τ candidate, the χ 2 of the trimuon vertex fit, the distance in the transverse plane between the trimuon vertex and the beamline divided by the uncertainty in that distance, and the angle in the transverse plane between the trimuon momentum vector and the vector between the beamline and the trimuon vertex. The remaining variables include additional information about the event. The absolute isolation of the τ candidate [24] is the sum of the transverse momenta of the charged particles (charged isolation) and photons (neutral isolation) reconstructed using the particle-flow algorithm, with ∆R = (∆η) 2 + (∆φ) 2 < 0.5, where ∆η and ∆φ are the differences in pseudorapidity and azimuthal angle, respectively, between the directions of the particle and the τ candidate. The charged isolation only includes tracks that pass within 0.2 cm of the primary vertex in the longitudinal direction and are not one of the τ candidate constituents. The neutral isolation is corrected for pileup following the prescription in ref. [24]. The variable used in the BDT is the relative isolation, defined as the absolute isolation divided by the p T of the τ candidate.
Assuming that the only missing particle in the event is the neutrino from the W → τν decay, the neutrino p T can be determined from the negative vector sum of the transverse momenta of all other particles in the event, a quantity referred to as p miss T . A multivariate regression that uses additional information from the event [25] is applied to p miss T to reduce effects from pileup, improving the p miss T resolution by 30%. The W boson p T is defined as the sum of p miss T and p 3µ T . Furthermore, using the known mass of the W boson, the longitudinal momentum of the neutrino can be determined, up to a two-fold ambiguity.

JHEP01(2021)163
The remaining BDT variables use this information and are: both neutrino longitudinal momentum solutions, W boson p T , p miss T , the angle ∆φ in the transverse plane between p miss T and p 3µ T , and the transverse W mass 2p 3µ T p miss T (1 − cos ∆φ). The BDT is trained and tested on independent samples with no evidence of overtraining or bias. The most important variables are found to be the τ candidate relative isolation, transverse W mass, and p 3µ T .

Analysis strategy
The relationship between the τ → 3µ branching fraction and the number of signal events can be written as: is the W boson production cross section, B(W → τν) is the branching fraction of W decay to τν, A 3µ(W) is the acceptance, and 3µ(W) is the combined reconstruction, selection, and trigger efficiency for the three muons. The product of σ(pp → W + X) and and the world-average value of the ratio B(W → τν)/B(W → µν) [27]. Other sources of τ leptons, such as from Z boson or D meson decays, are neglected as either the low production cross section or BDT selection efficiency reduces their contribution to no more than a few percent of that from W boson production. Simulated samples are used to estimate the relative production of τ leptons from different sources and to determine the acceptance and efficiency of the signal. To account for differences between data and simulation, several multiplicative corrections are applied on an event-by-event basis to the simulated events. Each of the three muons has a weight associated with it, which is the product of three corrections related to the efficiency of reconstructing the track in the tracker, the efficiency of identifying the reconstructed track as a muon, and the efficiency for the trigger system to find the muon given that it was reconstructed and identified by the offline algorithm. An additional correction is applied to account for the L1 trigger misconfiguration described in section 3. The average weight from the combination of these corrections is 0.88. The difference from unity comes primarily from the trigger efficiency. The weighted events are used to determine the signal efficiency, and the uncertainties from the corrections are included as systematic uncertainties.
Since the τ invariant mass resolution is a strong function of the τ pseudorapidity, the data sample is divided into two mutually exclusive categories, barrel and endcap, corresponding to trimuon |η| < 1.6 (with an average mass resolution of 16 MeV) and |η| ≥ 1.6 (with an average mass resolution of 27 MeV), respectively. Events with a BDT score larger than a given threshold are selected and used for the final analysis. Simulated signal and sideband data events are used to set the BDT score thresholds for the barrel and endcap regions that give the most stringent expected exclusion limits. Figure 1 shows the trimuon invariant mass distributions for events passing each category, along with a background-only fit (described in section 6) and the contribution expected for a signal with B(τ → 3µ) = 10 −7 .

Systematic uncertainties
The largest systematic uncertainty is from the corrections that are used in extracting the signal efficiency. This is dominated by the L1 trigger inefficiency correction, which predominantly affects the endcap region, and is correlated between the barrel and endcap categories. The other simulation correction uncertainties are uncorrelated between the two categories. The second largest systematic uncertainty arises from the limited size of the simulated samples and is uncorrelated between the two categories. The remaining uncertainties come from the integrated luminosity [28], the W boson production cross section, and the W boson branching fractions, all of which are correlated between the barrel and endcap categories. The systematic uncertainties are summarized in table 1.

Search for τ → 3µ in heavy-flavor hadron decays
The measurement of the τ → 3µ branching fraction for τ leptons produced in charm and bottom decays is complicated by uncertainties in the production of heavy-flavor hadrons. These uncertainties are reduced by utilizing the decay D + s → φπ + → µ + µ − π + to normalize the signal yield.
Simulated samples are used to estimate the relative production of τ leptons from different sources and to determine the acceptance and efficiency of the signal and normalization modes. Four samples are used to extract the acceptance and efficiency. The first is a sample of D + s → τ + ν decays. The second and third samples contain the inclusive B + → τ + X and B 0 → τ + X decays, respectively. The fourth sample contains events. For all samples, the heavy-flavor decays are simulated with evtgen 1.6.0, with the τ → 3µ decay occurring via phase space. In the first and fourth samples, the D + s mesons can be produced by hadronization or from b hadron decays. The acceptance A is the fraction of events in which all tracks of the τ or D + s decay have |η| < 2.4, the muons have p > 2.5 GeV, and the pion (if present) has p T > 1 GeV. The efficiency is the product of the reconstruction and selection efficiency reco and the trigger efficiency trig .

Selecting τ candidates
For the heavy-flavor analysis, τ candidates must pass the selection criteria described in section 3 and the lowest-p T track must have p T > 2 GeV. The trimuon sample is divided into a signal region (invariant mass of 1.75-1.80 GeV) and a sideband (background) region (invariant mass of 1.60-1.75 or 1.80-2.00 GeV). The normalization channel D + s → φπ + → µ + µ − π + uses the same selection criteria with a few exceptions. Only two muons are required and they must be oppositely charged with an invariant mass between 1 and 1.04 GeV. The track associated with the pion must have p T > 2 GeV and form a vertex with the two muons with a normalized χ 2 less than 5. The three-track invariant mass must be in the range 1.68-2.02 GeV, with the signal region defined as 1.93-2.01 GeV and the sideband region as 1.70-1.80 GeV. If there is more than one τ or D + s candidate in an event, the one with the smallest vertex fit χ 2 is selected. Once a candidate is found, its trajectory is extrapolated to the beamline and the primary vertex is selected as the reconstructed pp collision vertex that is closest to the extrapolated point.
To improve the signal-to-background ratio for the τ → 3µ sample, a BDT is trained using simulated signal events (including τ leptons produced from both charm and bottom decays) and background events from the data sideband region. The training utilizes 10 variables: the smallest muon momentum, three distinct muon quality criteria (each using the "worst" value of the three muon candidates), the χ 2 of the trimuon vertex fit, the angle between the trimuon momentum vector and the vector connecting the primary and trimuon vertices, the distance between the trimuon vertex and the primary vertex divided by the uncertainty in that distance, the smallest transverse impact parameter of the muons with respect to the primary vertex, and two isolation variables. The first isolation variable JHEP01 (2021)163 is the smallest distance of closest approach to the trimuon vertex of all other tracks in the event with p T > 1 GeV. The second isolation variable sums the p T of all tracks with p T > 1 GeV, ∆R < 0.3 with respect to the muon candidate, and with a distance of closest approach with respect to the muon candidate below 1 mm, and divides this sum by the muon candidate p T . The largest value of the isolation parameter among the three muons is used by the BDT.
The BDT is trained and tested on independent samples with no evidence of overtraining. The BDT output was also verified to be independent of the trimuon invariant mass. A BDT for the normalization mode is similarly trained using the same 10 variables (modified to account for one less muon). The efficiency as a function of the BDT requirement is measured with both actual and simulated data for the normalization mode. The largest discrepancy, 5%, is taken as a systematic uncertainty associated with modeling the BDT efficiency.
To improve the sensitivity of the analysis, the τ candidates are separated into six categories depending on the BDT score and the trimuon invariant mass resolution (the ratio of the mass uncertainty σ m , calculated from propagating the track parameter uncertainties, to the invariant mass m). There are three mass resolution bins: σ m /m ≤ 0.7%, 0.7% < σ m /m < 1.0%, and σ m /m ≥ 1.0%, with average mass resolutions of 12, 19, and 25 MeV, and labeled A, B, and C, respectively. The first and last bins roughly correspond to barrel and endcap events, respectively. Each mass resolution bin is then divided into three bins based on the BDT score. The highest two BDT bins in each mass resolution bin are used in the search, with the highest signal-to-background bin given the label "1" and the other "2". Thus, the six categories are labeled A1, A2, B1, B2, C1, and C2. The values of the two BDT bin boundaries in each mass resolution bin are determined independently by simultaneously scanning both values to find the result that gives the best expected upper limit on B(τ → 3µ). The trimuon invariant mass distribution for each category is shown in figure 2, along with a background-only fit (described in section 6) and the contribution expected for a signal with B(τ → 3µ) = 10 −7 .

Signal yield normalization
Results from simulation indicate that the τ leptons in the data sample overwhelmingly come from three disjoint sources: prompt D meson decays (the D meson is not from a b hadron decay), b hadron decays (directly from b hadron decays), and nonprompt D meson decays (the D is from a b hadron decay), with contributions of 65, 25, and 10%, respectively. More than 95% of the τ leptons produced from charm meson decays are from D + s meson decays, with the remainder from D + meson decays. Approximately 75% of the signal is expected to come from the L1 dimuon trigger, and can be directly calibrated using D + s → φπ + → µ + µ − π + events since they pass the same trigger. The remaining 25% of the expected signal is obtained exclusively from the L1 trimuon trigger. As detailed in section 6, the final results are obtained from a fit that uses both the expected number of background events and the relationship between B(τ → 3µ) and the expected number of signal events. While this relationship can be obtained from an equation similar to eq. (4.1), the heavy-flavor production cross sections have large uncertainties. To mitigate this, and correct for effects like the L1 trigger misconfiguration during the first half of 2016, we extract the expected signal yields using methods based on control samples in data to calibrate the production of τ leptons.

Yield of events from dimuon L1 triggers
The expected number of τ → 3µ signal events from D + s meson decays that pass the dimuon L1 triggers, denoted N sig(D) , is related to B(τ → 3µ) by: 1) where N norm is the measured D + s → φπ + → µ + µ − π + yield, A, reco , and trig are the detector acceptance, selection efficiency, and trigger efficiency for the two channels, respectively, and the branching fractions are B (D + s Figure 3 (left) shows the µµπ invariant mass distribution with fits to the D + and D + s peaks using Crystal Ball functions [29] for the signal and an exponential function for the background, from which N norm can be extracted from the peak on the right. Note that N sig (D) includes contributions from directly produced D + s mesons and D + s mesons from b hadron decays. To evaluate the degree to which the normalization mode mimics the signal mode, the ratio of the D + s → φπ + → µ + µ − π + yield to the number of signal sideband events is measured for seven different run periods. Assuming these seven values are measuring the same quantity, we use the scale-factor method [27] to derive a systematic uncertainty of 10%.

JHEP01(2021)163
The expected number of τ → 3µ signal events from decays of the form B → τ + X coming from the dimuon L1 triggers, denoted N sig(B) , is related to B(τ → 3µ) by: The fraction f can be calculated as f = σ(pp → B)B(B → D + s + X)/σ(pp → D + s ). Since the D + s mesons produced from b hadron decays will tend to decay farther from the pp collision vertex than directly produced D + s mesons, we use the proper decay length distribution to measure f . The proper decay length is LM/p where L is the distance between the primary vertex and the µµπ vertex, M is the µµπ invariant mass, and p is the µµπ momentum. Figure 3 (right) shows the proper decay length distribution for D + s mesons in which the background has been subtracted using the invariant mass sidebands. The proper decay length distribution shapes for D + s mesons directly produced (open histogram) and from B decays (shaded histogram) are obtained from simulation. The data distribution is fit to a linear sum of these two simulation shapes, yielding a measured value of f = 0.267 ± 0.015. The value from simulation of 0.240 ± 0.001 is used in the analysis and the difference between the two values is included as a systematic uncertainty.
The small contributions from D + → τ + X and B 0 s → τ + X decays are added by scaling the D + s → τ + ν and B → τ + X predictions by 0.04 and 0.12, respectively, as determined from simulation. A systematic uncertainty equal to the total contribution in each case is assessed. The much smaller contribution (∼0.1%) from direct b baryon decays is not included.
Uncertainties in the ratios of event selection acceptances (A 3µ(D) /A µµπ and A 3µ(B) /A µµπ ) are estimated by changing the parton distribution function sets in the corresponding simulated events. Although the acceptances change by up to 7%, the ratios remain constant within O(1%), consistent with the statistical uncertainty associated with the size of the simulated samples. In the ratio reco 3µ / reco µµπ , the muon reconstruction efficiency does not cancel exactly since the numerator refers to events with three muons and the denominator to events with only two. We derive data-to-simulation corrections for the muon reconstruction efficiency in bins of muon p T and η using the tag-and-probe method [30] applied to J/ψ → µµ data events. These additional corrections are then applied to signal events. The systematic uncertainty in the correction is estimated to be 1.5%.

Yield of events exclusively from trimuon L1 triggers
As described in section 3, the data are collected using both dimuon and trimuon triggers. The data collected using trimuon triggers cannot be directly normalized to D + s → φπ + → -11 -

JHEP01(2021)163
Source of uncertainty Uncertainty (%) Yield (%) D + s normalization 10 10 Number of events from L1 trimuon trigger 12 3 Acceptance ratio A 3µ /A µµπ 1 1 Muon reconstruction efficiency 1 1 BDT requirement efficiency 5 5 Total 16 Table 2. The sources of systematic uncertainties in the heavy-flavor analysis affecting signal modeling and their impact on the expected signal event yield. The columns labeled Uncertainty and Yield give the relative uncertainty associated with the source, and the resulting effect on the yield, respectively.
µ + µ − π + , as this decay only contains two muons. The simulation predicts that the fraction of signal events triggered exclusively through the L1 trimuon trigger is 33% of the events passing the L1 dimuon triggers. When measured from events in the sideband region, this ratio is found to be 35% using data collected after the initial trigger problems were fixed, a 6% difference. The data-to-simulation correction for the dimuon trigger, measured in D + s → φπ + → µ + µ − π + events, is 0.90 for the same data-taking period. We scale up the dimuon-triggered predicted yields for this data-taking period by the simulation value of 33% and assign a systematic uncertainty of 12% to account for the observed 6 and 10% differences. For the initial data-taking periods, the expected yield is scaled by the ratio of the trimuon trigger rates in the early and late periods, with the same 12% uncertainty.

Systematic uncertainties
The systematic uncertainties associated with the expected signal event yield, as described previously, are summarized in table 2. In addition, systematic uncertainties related to the signal and background shapes are evaluated. The signal invariant mass shape uncertainties are estimated by comparing data and simulation results for the fitted value of the mean and resolution in D + s → φπ + → µ + µ − π + decays. The mean value is found to be 0.07% higher in simulation and therefore the mass in the signal simulation is shifted by −0.07% with a systematic uncertainty of 0.07%. The resolution is found to be 2% smaller in simulation and thus the signal simulation resolution is increased by 2%, with a systematic uncertainty of 2.5%, consistent with the statistical precision of the measurement. The uncertainty in the background shape is obtained by varying the functional form from the default exponential to a third-order polynomial and a power-law function. This is found to contribute an uncertainty of less than 1%.

Results
The branching fraction B(τ → 3µ) is extracted from a simultaneous unbinned maximum likelihood fit to the trimuon invariant mass distribution  in the two categories of the W boson analysis and the six categories of the heavy-flavor analysis.
For the W boson analysis, the signal model is a Gaussian function with fixed mean and width, as determined from fitting the simulated events in the appropriate category. For the heavy-flavor selection, the signal model is a Gaussian plus Crystal Ball function [29] with fixed mean and width, as determined from fitting the simulated events in the appropriate category and modified as described in section 5.3. In all cases, the background model is an exponential function with parameters and normalization determined by the fit.
As can be seen in the trimuon invariant mass plots of figures 1 and 2, no evidence for a τ → 3µ signal is found. Upper limits on B(τ → 3µ) are determined from a fully frequentist method [31] based on modified profile likelihood test statistics and the CL s criterion [32,33]. Systematic uncertainties are incorporated in the analysis via nuisance parameters. Uncertainties are assumed to be uncorrelated between the two channels. A log-normal probability density function is assumed for the nuisance parameters affecting the corrected signal yields. Events from data and simulation that pass the selection criteria of both analyses are removed from the heavy-flavor analysis in the combined fit.

Summary
The results of a search for the lepton flavor violating decay τ → 3µ, using proton-proton collisions with a center-of-mass energy of 13 TeV at the LHC, are presented. The search uses data collected by CMS in 2016, corresponding to an integrated luminosity of 33.2 fb −1 , and, for the first time, combines the result of two analyses: one targeting τ leptons produced in W boson decays and the other using τ leptons from heavy-flavor hadron decays. No signal is observed, and the branching fraction B(τ → 3µ) is determined to be less than 8.0 × 10 −8 at 90% confidence level, with an expected upper limit of 6.9 × 10 −8 . While the limit obtained in this measurement is still a factor of four away from the current most restrictive one from the Belle experiment [8], we have achieved similar sensitivity to that by BaBar [7] and LHCb [9].

JHEP01(2021)163
Computing Grid for delivering so effectively the computing infrastructure essential to our analyses. Finally, we acknowledge the enduring support for the construction and operation of the LHC and the CMS detector provided by the following funding agencies: BMBWF and FWF ( Open Access. This article is distributed under the terms of the Creative Commons Attribution License (CC-BY 4.0), which permits any use, distribution and reproduction in any medium, provided the original author(s) and source are credited.