Search for lepton flavour violating decays of the Higgs boson to $\mu\tau$ and e$\tau$ in proton-proton collisions at $\sqrt{s}=$ 13 TeV

A search for lepton flavour violating decays of the Higgs boson in the $\mu\tau$ and e$\tau$ decay modes is presented. The search is based on a data set corresponding to an integrated luminosity of 35.9 fb$^{-1}$ of proton-proton collisions collected with the CMS detector in 2016, at a centre-of-mass energy of 13 TeV. No significant excess over the standard model expectation is observed. The observed (expected) upper limits on the lepton flavour violating branching fractions of the Higgs boson are $\mathcal{B}$(H$\to\mu\tau$)<0.25% (0.25%) and $\mathcal{B}$(H$\to$e$\tau$)<0.61% (0.37%), at 95% confidence level. These results are used to derive upper limits on the off-diagonal $\mu\tau$ and e$\tau$ Yukawa couplings $\sqrt{|{Y_{\mu\tau}}|^{2}+|{Y_{\tau\mu}}|^{2}}<1.43 \times 10^{-3}$ and $\sqrt{|{Y_{\mathrm{e}\tau}}|^{2}+|{Y_{\tau\mathrm{e}}}|^{2}}<2.26 \times 10^{-3}$ at 95% confidence level. The limits on the lepton flavour violating branching fractions of the Higgs boson and on the associated Yukawa couplings are the most stringent to date.


Introduction
The discovery of the Higgs boson (H) at the CERN LHC [1][2][3] has stimulated further precision measurements of the properties of the new particle.A combined study of the 7 and 8 TeV data sets collected by the CMS and ATLAS collaborations shows consistency between the measured couplings of the Higgs boson and the standard model (SM) predictions [4].However, the constraint on the branching fraction to non-SM decay modes derived from these measurements, B(non-SM) < 34% at 95% confidence level (CL), still allows for a significant contribution from exotic decays [4].
The CMS experiment published the first direct search for H → µτ [44], followed by searches for H → eτ and H → eµ decays [45], using proton-proton (pp) collision data corresponding to an integrated luminosity of 19.7 fb −1 at a centre-of-mass energy of 8 TeV.A small excess of data with respect to the SM background-only hypothesis at m H = 125 GeV was observed in the H → µτ channel, with a significance of 2.4 standard deviations (σ), and the best fit for the branching fraction was found to be B(H → µτ) = (0.84 +0.39  −0.37 )%.A constraint was set on the observed (expected) branching fraction B(H → µτ) < 1.51% (0.75%) at 95% CL.No excess of events over the estimated background was observed in the H → eτ or H → eµ channels, and observed (expected) upper limits on the branching fractions B(H → eτ) < 0.69% (0.75%) and B(H → eµ) < 0.035% (0.048%) at 95% CL were set.The ATLAS Collaboration reported searches for H → eτ and H → µτ using pp collision data at a centre-of-mass energy of 8 TeV, finding no significant excess of events over the background expectation, and set observed (expected) limits of B(H → µτ) < 1.43% (1.01%) and B(H → eτ) < 1.04% (1.21%) at 95% CL [46,47].
The search described in this paper is performed in four decay channels, H → µτ h , H → µτ e , H → eτ h , H → eτ µ , where τ h , τ e , and τ µ correspond to the hadronic, electronic, and muonic decay channels of τ leptons, respectively.The decay channels H → eτ e and H → µτ µ , are not considered because of the large background contribution from Z boson decays.The expected final state signatures are very similar to those for the SM H → ττ decays, studied by CMS [48][49][50] and ATLAS [51], but with some significant kinematic differences.The electron (muon) in the LFV H → e(µ)τ decay is produced promptly, and tends to have a larger momentum than in the SM H → τ e(µ) τ h decay.The search reported in this paper improves upon the sensitivity of the earlier CMS searches [44,45] by using a boosted decision trees (BDT) discriminator to distinguish signal from background events.A separate analysis, similar in strategy to the previous CMS publications, is performed as cross check.The results of both strategies are reported in this paper.This paper is organized as follows.After a description of the CMS detector (Section 2) and of the collision data and simulated samples used in the analyses (Section 3), the event reconstruc-tion is described in Section 4. The event selection is described separately for the two Higgs boson decay modes H → eτ and H → µτ in Section 5.The backgrounds, which are common to all channels but with different rates in each, are described in Section 6.The systematic uncertainties are described in Section 7 and the results are then presented in Section 8.

The CMS detector
The central feature of the CMS apparatus is a superconducting solenoid of 6 m internal diameter, providing a magnetic field of 3.8 T. Within the solenoid volume are a silicon pixel and strip tracker, a lead tungstate crystal electromagnetic calorimeter (ECAL), and a brass and scintillator hadron calorimeter (HCAL), each composed of a barrel and two endcap sections.Forward calorimeters extend the pseudorapidity (η) coverage provided by the barrel and endcap detectors.Muons are detected in gas-ionization chambers embedded in the steel flux-return yoke outside the solenoid.The two-level CMS trigger system selects events of interest for permanent storage [52].The first trigger level, composed of custom hardware processors, uses information from the calorimeters and muon detectors to select events at a rate of around 100 kHz within a time interval of less than 4 µs.The software algorithms of the high-level trigger, executed on a farm of commercial processors, reduce the event rate to about 1 kHz using information from all detector subsystems.A detailed description of the CMS detector, together with a definition of the coordinate system used and the relevant kinematic variables, can be found in Ref. [53].

Collision data and simulated events
The analyses presented here use samples of pp collisions collected in 2016 by the CMS experiment at the LHC at a centre-of-mass energy of √ s = 13 TeV, corresponding to an integrated luminosity of 35.9 fb −1 .Isolated single muon triggers are used to collect the data samples in the H → µτ search.Triggers requiring a single isolated electron, or a combination of an electron and a muon, are used in the H → eτ h and H → eτ µ channels, respectively.Simulated samples of signal and background events are produced with several event generators.The Higgs bosons are produced in pp collisions predominantly by gluon fusion (ggH) [54], but also by vector boson fusion (VBF) [55], and in association with a W or Z boson [56].The ggH and VBF Higgs boson samples are generated with POWHEG 2.0 [57][58][59][60][61][62] while the MINLO HVJ [63] extension of POWHEG 2.0 is used for the WH and ZH simulated samples.The MG5 aMC@NLO [64] generator is used for Z + jets and W + jets processes.They are simulated at leading order (LO) with the MLM jet matching and merging [65].Diboson production is simulated at next-to-LO (NLO) using MG5 aMC@NLO generator with the FxFx jet matching and merging [66], whereas POWHEG 2.0 and 1.0 are used for tt and single top quark production, respectively.The POWHEG and MADGRAPH generators are interfaced with PYTHIA 8.212 [67] for parton showering, fragmentation, and decays.The PYTHIA parameters for the underlying event description are set to the CUETP8M1 tune [68].Due to the high instantaneous luminosities attained during data taking, many events have multiple pp interactions per bunch crossing (pileup).The effect is taken into account in simulated samples, by generating concurrent minimum bias events.All simulated samples are weighted to match the pileup distribution observed in data, that has an average of approximately 27 interactions per bunch crossing.The CMS detector response is modelled using GEANT4 [69].

Event reconstruction
The global event reconstruction is performed using a particle-flow (PF) algorithm, which reconstructs and identifies each individual particle with an optimized combination of all subdetector information [70].In this process, the identification of the particle type (photon, electron, muon, charged or neutral hadron) plays an important role in the determination of the particle direction and energy.The primary pp vertex of the event is identified as the reconstructed vertex with the largest value of summed physics-object p 2 T , where p T is the transverse momentum.The physics objects are returned by a jet finding algorithm [71,72] applied to all charged tracks associated with the vertex, plus the corresponding associated missing transverse momentum.
A muon is identified as a track in the silicon detectors, consistent with the primary pp vertex and with either a track or several hits in the muon system, associated with an energy deposit in the calorimeters compatible with the expectations for a muon [70,73].Identification is based on the number of spacial points measured in the tracker and in the muon system, the track quality and its consistency with the event vertex location.The energy is obtained from the corresponding track momentum.
An electron is identified as a charged particle track from the primary pp vertex in combination with one or more ECAL energy clusters.These clusters correspond to the track extrapolation to the ECAL and to possible bremsstrahlung photons emitted when interacting with the material of the tracker [74].Electron candidates are accepted in the range |η| < 2.5, with the exception of the region 1.44 < |η| < 1.57where service infrastructure for the detector is located.They are identified using a multivariate (MVA) discriminator that combines observables sensitive to the amount of bremsstrahlung along the electron trajectory, the geometrical and momentum matching between the electron trajectory and associated clusters as well as various shower shape observables in the calorimeters.Electrons from photon conversions are removed.The energy of electrons is determined from a combination of the track momentum at the primary vertex, the corresponding ECAL cluster energy, and the energy sum of all bremsstrahlung photons attached to the track.
Hadronically decaying τ leptons are reconstructed and identified using the hadrons-plus-strips (HPS) algorithm [75,76].The reconstruction starts from a jet and searches for the products of the main τ lepton decay modes: one charged hadron and up to two neutral pions, or three charged hadrons.To improve the reconstruction efficiency in the case of conversion of the photons from neutral-pion decay, the algorithm considers the PF photons and electrons from a strip along the azimuthal direction φ.The charges of all the PF objects from tau lepton decay, except for the electrons from neutral pions, are summed to reconstruct the tau lepton charge.An MVA discriminator, based on the information of the reconstructed tau lepton and of the charged particles in a cone around it, is used to reduce the rate for quark-and gluon-initiated jets identified as τ candidates.The working point used in the analysis has an efficiency of about 60% for a genuine τ h , with approximately a 0.5% misidentification rate for quark and gluon jets [76].Additionally, muons and electrons misidentified as tau leptons are rejected using a dedicated set of selection criteria based on the consistency between the measurements in the tracker, calorimeters, and muon detectors.The specific identification criteria depend on the final state studied and on the background composition.The tau leptons that decay to muons and electrons are reconstructed as prompt muons and electrons as described above.
Charged hadrons are identified as charged particle tracks from the primary pp vertex neither reconstructed as electrons nor as muons nor as τ leptons.Neutral hadrons are identified as HCAL energy clusters not assigned to any charged hadron, or as ECAL and HCAL energy excesses with respect to the expected charged-hadron energy deposit.All the PF candidates are clustered into hadronic jets using the infrared and collinear safe anti-k T algorithm [71], implemented in the FASTJET package [77], with a distance parameter of 0.4.The jet momentum is determined as the vector sum of all particle momenta in this jet, and is found in the simulation to be on average within 10% of the true momentum over the whole p T spectrum and detector acceptance.An offset correction is applied to jet energies to take into account the contribution from pileup [78].Jet energy corrections are derived from the simulation, and are confirmed with in situ measurements of the energy balance of dijet, multijet, photon + jet, and Z + jet events [79].The variable ∆R = √ (∆η) 2 + (∆φ) 2 is used to measure the separation between reconstructed objects in the detector.Any jet within ∆R = 0.4 of the identified leptons is removed.
Jets misidentified as electrons, muons, or tau leptons are suppressed by imposing isolation requirements.The muon (electron) isolation is measured relative to its p T ( = e, µ), by summing over the p T of PF particles in a cone with ∆R = 0.4 (0.3) around the lepton: where p charged T , p neutral T , and p γ T indicate the p T of a charged particle, a neutral particle, and a photon within the cone, respectively.The neutral contribution to isolation from pileup, p PU T ( ), is estimated from the area of the jet and the average energy density of the event [80,81] for the electron or from the sum of transverse momenta of charged hadrons not originating from the primary vertex scaled by a factor of 0.5 for the muons.The charged contribution to isolation from pileup is rejected requiring the tracks to originate from the primary vertex.
All the reconstructed particles in the event are used to estimate the missing transverse momentum, p miss T , which is defined as the negative of the vector p T sum of all identified PF objects in the event [82].Its magnitude is referred to as p miss T .The transverse mass M T ( ) is a variable formed from the lepton momentum and the missing transverse momentum vectors: ), where ∆φ −p miss T is the angle in the transverse plane between the lepton and the missing transverse momentum.It is used to discriminate the Higgs boson signal candidates from the W + jets background.The collinear mass, M col , provides an estimate of m H using the observed decay products of the Higgs boson candidate.It is reconstructed using the collinear approximation based on the observation that, since m H m τ , the τ lepton decay products are highly Lorentz boosted in the direction of the τ candidate [83].The neutrino momenta can be approximated to have the same direction as the other visible decay products of the τ ( τ vis ) and the component of the p miss T in the direction of the visible τ lepton decay products is used to estimate the transverse component of the neutrino momentum (p ν, est T ).The collinear mass can then be derived from the visible mass of the τ-µ or τ-e system (M vis ) as M col = M vis /

√
x vis τ , where x vis τ is the fraction of energy carried by the visible decay products of the τ (x vis τ = p τ vis T /(p τ vis T + p ν, est T )), and M vis is the invariant mass of the visible decay products.

Event selection
The signal contains a prompt isolated lepton, µ or e, along with an oppositely charged isolated lepton of different flavour (τ µ , τ e or τ h ).In each decay mode a loose selection of this signature is defined first.The events are then divided into categories within each sample according to the number of jets in the event.This is designed to enhance the contribution of different Higgs boson production mechanisms.The jets are required to have p T > 30 GeV and |η| < 4.7.
The 0-jet category enhances the ggH contribution, while the 1-jet category enhances ggH production with initial-state radiation.The 2-jet ggH category has a further requirement that the invariant mass of the two jets M jj < 550 GeV while the 2-jet VBF category with the requirement M jj ≥ 550 GeV enhances the VBF contribution.The threshold on M jj has been optimized to give the best expected exclusion limits.The definition of the categories is the same in all the channels except in the H → eτ channels where the M jj threshold is 500 GeV, which optimizes the expected limits for this channel.
After the loose selection, a binned likelihood is used to fit the distribution of a BDT discriminator for the signal and the background contributions.This is referred to as the BDT fit analysis.As a cross-check an analysis using a tighter set of selection criteria is also presented.In this case, selection requirements are placed on the kinematic variables and a fit is performed to the M col distribution.This is referred to as the M col fit analysis.Requirements on additional kinematic variables such as M T ( ) are chosen to obtain the most stringent expected limits.The lepton p T has been excluded from this optimization to avoid biasing the selection toward energetic leptons that sculpt the background M col distribution to mimic the signal peak.This effect would reduce the shape discrimination power of the signal extraction procedure.

H → µτ h
The loose selection begins by requiring an isolated µ and an isolated τ h of opposite charge and separated by ∆R > 0.3.The muon candidate is required to have p µ T > 26 GeV, |η µ | < 2.4 and I µ rel < 0.15.The hadronic tau candidate is required to have p τ h T > 30 GeV and |η τ h | < 2.3.The isolation requirement for the τ h candidates is included in the MVA used for the HPS identification algorithm described in Section 4. Events with additional e, µ or τ h candidates are vetoed.Events with at least one jet identified by the combined secondary vertex b-tagging algorithm [84] as arising from a b quark, are also vetoed in order to suppress the tt background.The tighter selection used for the M col fit analysis further requires M T (τ h ) < 105 GeV in the 0-, 1-and 2-jet ggH categories, and M T (τ h ) < 85 GeV in the 2-jet VBF category.The selections are summarized in Table 1.
A BDT is trained after the loose selection combining all categories.The signal training sample used is a mixture of simulated ggH and VBF events, weighted according to their respective SM production cross sections.The background training sample is a set of collision events with misidentified leptons, as this is the dominant background in this channel.The leptons are required to satisfy the same kinematic selection of the signal sample, be like-sign and not isolated in order to select an orthogonal data set to the signal sample, and yet have the same kinematic properties.The input variables to the BDT are: ).The neutrino in the τ lepton decay leads to the presence of significant missing momentum motivating the inclusion of the p miss T variables.The neutrino is also approximately collinear with the visible τ decay products while the two leptons tend to be azimuthally opposite leading to the inclusion of the ∆φ variables.The BDT input variables are shown for signal and background in Fig. 1.

H → µτ e
The loose selection begins by requiring an isolated µ and an isolated e of opposite charge and separated by ∆R > 0.
A BDT is trained after the loose selection, combining all categories.The background is a mixed sample of tt and Z → ( = e, µ, τ) events weighted by their production cross-sections.The tt background is the dominant background in this channel for the 2-jet category and also very significant in the 1-jet category.It has many kinematic characteristics in common with the other backgrounds, such as diboson and single top.The Z → background is the dominant background in 0-and 1-jet category.The input variables to the BDT are: <0.15 [radians] ->2.5 >1.0 --

H → eτ h
The loose selection begins by requiring an isolated e and an isolated τ h candidate of opposite charge, separated by ∆R > 0. No veto is made on the number of b-tagged jets as the tt contribution is small.The additional selection used for the M col fit analysis further requires that M T (τ h ) < 60 GeV.The selections are summarized in Table 2.A BDT is trained after the loose selection.The same training samples as for the H → µτ h channel are used, except with an electron rather than a muon.The input variables to the BDT are also the same except for the addition of the visible mass, M vis , and the removal of p miss T .The relative composition of the backgrounds in the H → eτ h channel is different from the H → µτ h channel, in particular the Z → ee + jets background is larger in comparison to the Z → µµ + jets, which leads to this change of variables.

H → eτ µ
The loose selection begins by requiring an isolated e and an isolated µ candidate with opposite charge, separated by ∆R > 0.
on the axis ζ bisecting the directions of the electron, p T e , and of the muon, p T µ .This selection criterion is highly efficient in rejecting background as the p miss T is oriented in the direction of the visible τ decay products in signal events.The selection criteria are summarized in Table 2.
A BDT is trained after the loose selection.It uses the same input variables as for the H → µτ e channel with the addition of the visible mass, M vis , and the removal of M T (e).The background used for the training is a sample of simulated tt events.

Background estimation
The main background processes are Z → ττ, in which the µ or e arises from a τ decay, and W+jets and QCD multijet production where one or more of the jets are misidentified as leptons.Other backgrounds come from processes in which the lepton pair is produced from the weak decays of quarks and vector bosons.These include tt pairs, Higgs boson production (H → ττ, WW), WW, WZ, and ZZ.There are also smaller contributions from Wγ ( * ) + jets processes, single top quark production, and Z → ( = e, µ).All the backgrounds are estimated from simulated samples with the exception of the misidentified-lepton backgrounds that are estimated from data with either fully data-driven or semi data-driven methods.These techniques are described in detail below.The background estimate is validated with control regions designed to have enhanced contributions from the dominant backgrounds.
The Z → background is estimated from simulation.A reweighting is applied to correct the generator-level Z p T and m distributions in LO MG5 aMC@NLO samples to reduce the shape discrepancy between collision data and simulation.The reweighting factors are extracted from a Z → µµ control region and are applied to both Z → µµ and Z → ee simulated samples in bins of Z p T and m .Additional corrections for µ → τ h and e → τ h misidentification rates are applied when the reconstructed τ h candidate is matched to a muon or an electron, respectively, at the generator level.These corrections are measured in Z → events and depend on the lepton η.The tt + jets background is particularly important in the eµ final state.A correction based on the generated p T of the top quark and antiquark is applied to the simulation to match the p T distribution observed in a tt sample from collision data.The background estimation for this contribution is validated in a tt enriched control sample.It is defined by requiring the loose selection for these channels but with the additional requirement that at least one of the jets is b-tagged.Figure 3 (left) shows the data compared to the background estimation for this control sample in the H → µτ e channel.The same samples are used in the H → eτ µ channel and show similar agreement.
The Higgs boson production contributes a small but non-negligible background.It arises predominantly from H → ττ but also from H → WW decays and peaks at lower values of M col than the signal, because of additional neutrinos in the decays.The event selection described in Section 5 uses a BDT discriminator that combines M col with a set of other kinematic variables.The Higgs boson background also peaks below the signal in the distribution of the BDT discriminator output.
Jets misidentified as leptons are a source of background arising from two sources, W + jets and QCD multijet events.In W + jets background events, one lepton candidate is a real lepton from the W boson decay and the other is a jet misidentified as a lepton.In QCD multijet events, both lepton candidates are misidentified jets.In each of the four channels for this analysis (µτ h , eτ h , µτ e , eτ µ ), the misidentified-lepton background has been estimated using purely datadriven methods.In the µτ e and eτ µ channels it is also estimated using a technique, called semi data-driven, partially based on control samples in data and partially on simulation.It has been used previously in the SM H → ττ analysis [50].The misidentified W + jets background is estimated from simulation and the QCD background with data.The two techniques give consistent results; the semi data-driven technique is chosen for the leptonically decaying tau channels as the fully data-driven technique is limited by the reduced size of the sample.

Fully data-driven technique
The misidentified lepton background is estimated from collision data samples.The misidentification rates are evaluated with independent Z + jets data sets and then applied to a control region, orthogonal to the signal region, to estimate the misidentified background in the signal region.This control region is obtained by relaxing the signal selection requirements, typically isolation, and excluding events passing the final selection.The probabilities with which jets are misidentified as e ( f e ), µ ( f µ ), or τ h ( f τ ), are estimated using events with a Z boson candidate plus one jet that can be misidentified as a lepton.The Z boson candidate is formed from two muons with p T > 26 GeV, |η| < 2.4, and I rel < 0.15 (0.25) for the jet → τ h , µ (jet → e) misidentification rate.The muons are required to have opposite charge and their invariant mass (M µµ ) must satisfy 70 < M µµ < 110 GeV.The contribution from diboson events, where the third lepton candidate corresponds to a genuine lepton, is subtracted using simulation.Two Z + jets samples are defined: the signal-like one, in which the jet satisfies the same lepton selection criteria used in the H → eτ or H → µτ selections, and the background-enriched Z + jets sample with relaxed lepton identification on the jet but excluding events selected in the signal-like sample.The requirements for the third lepton candidate vary depending on the lepton flavour.The two samples are used to estimate f e , f µ and f τ which are obtained as where N i (Z + jets signal-like) is the number of events with a third lepton candidate that passes the signal-like sample selection, N i (Z + jets background-enriched) is the number of events in the background-enriched sample and i = e, µ or τ.The lepton selection criteria for the signal are given in Table 1 and 2. The background-enriched lepton selection used to estimate the misidentified µ and e contribution requires an isolation of 0.15 < I rel < 0.25 and 0.1 < I rel < 0.5, respectively.In both cases the misidentification rate is computed as a function of the lepton p T .The lepton selection for the τ h background-enriched sample requires that the tau candidates are identified using a loose HPS working point but are not identified by the tight working point used for the signal selection.The loose and the tight working points have an efficiency of 75% and 60% for genuine τ h candidates, respectively.The misidentification rates show a p T dependence that varies with the τ decay mode and |η|.The misidentification rates are thus obtained as a function of p T for the different decay modes and |η| regions (|η| < 1.5 or |η| > 1.5).
The final misidentified lepton background in the signal region for the two analyses (BDT and M col fit) is obtained from background-enriched signal-like samples (LFV background-enriched, type i), where the lepton i (i = e, µ or τ) passes the identification and isolation criteria used for the Z + jets background-enriched sample but not those defining the Z + jets signal-like sample, but otherwise uses the same selection as the signal.To estimate the misidentified lepton background in the signal sample, each event in this LFV background-enriched sample of type i is weighted by a factor f i /(1 − f i ), depending on the lepton p T for electrons and muons or on p T , η, and decay mode for the τ lepton candidates.Both background yield and shape distributions are thus estimated.Double-counted events with two misidentified leptons are subtracted.For example, events with a misidentified µ (e) and a misidentified τ h are subtracted in the H → µτ h (H → eτ h ) channel using a weight (where = µ or e) applied to the events of a LFV background-enriched sample defined requiring both leptons to pass the identification and isolation criteria used for the Z + jets background-enriched sample but not those defining the Z + jets signal-like sample.
The background estimation is validated in a like-sign sample applying the misidentification rate f i to events selected inverting the charge requirement of the lepton pair in both the background-enriched and the signal-like samples.It is performed after the loose selection described in Section 5. Figure 3 (central) shows the data compared to the background estimation in the like-sign control region for the H → µτ h channel.The like-sign selection enhances the misidentified lepton background and there is good agreement in the control sample.The background estimation can also be validated in a W boson enriched control sample.This data sample is obtained by applying the signal sample requirements and M T ( ) > 60 GeV ( = e or µ) and M T (τ h ) > 80 GeV. Figure 3 (right) shows the data compared to the background estimation in the W enriched sample for the H → µτ h channel.The same samples are used in the H → eτ h channel with similar agreement.

Semi data-driven technique
The W + jets background contribution to the misidentified-lepton background is estimated with simulated samples.The QCD multijet contribution is estimated with like-sign collision data events that pass the signal requirement.The expected yield from non-QCD processes is subtracted using simulation.The resulting sample is then rescaled to account for the differences between the composition in the like-and opposite-sign samples.The scaling factors are extracted from QCD multijet enriched control samples, composed of events with the lepton candidates satisfying inverted isolation requirements as illustrated in Ref. [50].This technique is chosen for the leptonically decaying tau channels as the size of the samples allows a more precise background description.

Systematic uncertainties
The systematic uncertainties affect the normalization and the shape of the distributions of the different processes, and arise from either experimental or theoretical sources.They are summarized in Table 3.The uncertainties in the lepton (e, µ, τ h ) selection including the trigger, identification, and isolation efficiencies are estimated using tag-and-probe measurements in collision data sets of Z bosons decaying to ee, µµ, τ µ τ h [73][74][75][76]86].The b tagging efficiency in the simulation is adjusted to match the efficiency measured in data.The uncertainty in this measurement is taken as the systematic uncertainty.The uncertainties on the Z → ee, Z → µµ, Z → ττ, WW, ZZ, Wγ, tt, and single top production background contributions arise predominantly from the uncertainties in the measured cross sections of these processes.The uncertainties in the estimate of the misidentified-lepton backgrounds (µ → τ h , e → τ h , jet → τ h , µ, e) are extracted from the validation tests in control samples, described in Section 6.
Shape and normalization uncertainties arising from the uncertainty in the jet energy scale are computed by propagating the effect of altering each source of jet energy scale uncertainty by one standard deviation to the fit templates of each process.This takes into account differences in yield and shape.The uncertainties on the e, µ, τ h energy scale are propagated to the M col and BDT distributions.For τ h , the energy scale uncertainty is treated independently for each reconstructed hadronic decay mode of the τ lepton.The systematic uncertainties in the energy resolutions of lepton candidates have negligible effect.The energy scale of muons (electrons) misidentified as hadronically decaying tau candidates (µ, e → τ h energy scale) is considered independently from true hadronic tau leptons.There is also an uncertainty in the unclustered energy scale.The unclustered energy comes from jets having p T < 10 GeV and PF candidates Renorm./fact.scales (ggH) [85] 3.9% Renorm./fact.scales (VBF and VH) [85] 0.4% PDF + α s (ggH) [85] 3.2% PDF + α s (VBF and VH) [85] 2.1% Renorm./fact.acceptance (ggH) −3.0% -+2.0%Renorm./fact.acceptance (VBF and VH) −0.3% -+1.0%PDF + α s acceptance (ggH) −1.5% -+0.5% PDF + α s acceptance (VBF and VH) −1.5% -+1.0% Integrated luminosity 2.5% not within jets.It is propagated to p miss T .The unclustered energy scale is considered independently for charged particles, photons, neutral hadrons, and very forward particles which are not contained in jets.The effect of varying the energy of each particle by its uncertainty leads to changes in both shape of the distribution and yield.The four different systematic uncertainties are uncorrelated.
The uncertainties in the Higgs boson production cross sections due to the factorization and the renormalization scales, as well as the parton distribution functions (PDF) and the strong coupling constant (α s ), result in changes in normalization and they are taken from Ref. [85].They also affect the acceptance and lead to the migration of events between the categories.They are listed as acceptance uncertainties in Table 3 and depend on the production process, Higgs boson decay channel, and category.For the ggH production this variation on the acceptance varies from −3% (anticorrelated between the categories) to 2% (correlated) for the factorization and the renormalization scales, and from −1.5% to 0.5% for PDF and α s .For the VBF and associated production (VH) the ranges go from −0.3% to 1.0% for the factorization and the renormalization scales, and from −1.5% to 1.0% for PDF and α s .
The bin-by-bin uncertainties account for the statistical uncertainties in every bin of the template distributions of every process.They are uncorrelated between bins, processes, and categories.The uncertainty of 2.5% on the integrated luminosity [87] affects all processes with the normalization taken directly from simulation.Shape uncertainties related to the pileup have been considered by varying the weights applied to simulation.The weight variation is obtained by a 5% change of the total inelastic cross section used to estimate the number of pileup events in data.The new values are then used to compute the weights for the simulation samples and these are applied, event by event, to produce alternate collinear mass and BDT distributions used as shape uncertainties in the fit.Other minimum bias event modelling and simulation uncertainties are estimated to be much smaller than those on the rate and are therefore neglected.

Results
After applying the selection criteria, a maximum likelihood fit is performed to derive the expected and observed limits.Each systematic uncertainty is used as a nuisance parameter in the fit.The fits are performed simultaneously in all channels and categories.A profile likelihood ratio is used as test statistic.The upper limits on the signal branching fraction are calculated with the asymptotic formula, using the CL s criterion [88][89][90].
The BDT discriminator distributions of signal and background for each category are shown in Fig. 4 and 7 in the H → µτ and H → eτ channels respectively.Figures 5 and 8 show the corresponding M col distributions used as cross-check.All the distributions are shown after they have been adjusted by the fit.No excess over the background expectation is observed.The observed and median expected 95% CL upper limits, and best fit branching fractions, for B(H → µτ) and B(H → eτ), assuming m H =125 GeV, are given for each category in Tables 4-7.The limits are also summarized graphically in Figs. 6 and 9.
No evidence is found for either the H → µτ or H → eτ processes in this search.The observed exclusion limits are a significant improvement over the 8 TeV results.The new results exclude the branching fraction that corresponded to the best fit for the 2.4 σ excess observed in the 8 TeV H → µτ channel results at 95% CL, in both the M col fit and BDT fit analysis.Table 8 shows a summary of the new 95% CL upper limits.The BDT fit analysis is more sensitive than the M col fit analysis, with expected limits reduced by about a factor of two.In both cases the results are Obs./exp.The background is normalized to the best fit values from the signal plus background fit while the simulated signal corresponds to B(H → µτ) = 5%.The bottom panel in each plot shows the fractional difference between the observed data and the fitted background.The left column of plots corresponds to the H → µτ h categories, from 0-jets (first row) to 2-jets VBF (fourth row).The right one to their H → µτ e counterparts.

Events/bin
Obs./exp.Obs./exp.Table 8: Summary of the observed and expected upper limits at the 95% CL and the best fit branching fractions in percent for the H → µτ and H → eτ processes, for the main analysis (BDT fit) and the cross check (M col fit) method.

Summary
The search for lepton flavour violating decays of the Higgs boson in the µτ and eτ channels, with the 2016 data collected by the CMS detector, is presented in this paper.The data set analysed corresponds to an integrated luminosity of 35.9 fb −1 of proton-proton collision data recorded at √ s = 13 TeV.The results are extracted by a fit to the output of a boosted decision trees discriminator trained to distinguish the signal from backgrounds.The results are crosschecked with an alternate analysis that fits the collinear mass distribution after applying selection criteria on kinematic variables.No evidence is found for lepton flavour violating Higgs boson decays.The observed (expected) limits on the branching fraction of the Higgs boson to µτ and to eτ are less than 0.25% (0.25%) and 0.61% (0.37%), respectively, at 95% confidence level.These limits constitute a significant improvement over the previously obtained limits by CMS and ATLAS using 8 TeV proton-proton collision data corresponding to an integrated luminosity of about 20 fb −1 .Upper limits on the off-diagonal µτ and eτ Yukawa couplings are derived from these constraints,  The expected (red dashed line) and observed (black solid line) limits are derived from the limit on B(H → µτ) and B(H → eτ) from the present analysis.The flavour-diagonal Yukawa couplings are approximated by their SM values.The green (yellow) band indicates the range that is expected to contain 68% (95%) of all observed limit excursions from the expected limit.The shaded regions are derived constraints from null searches for τ → 3µ or τ → 3e (dark green) [41,92,93] and τ → µγ or τ → eγ (lighter green) [41,93].The green hashed region is derived by the CMS direct search presented in this paper.The blue solid lines are the CMS limits from [44] (left) and [45](right).The purple diagonal line is the theoretical naturalness limit |Y ij Y ji | ≤ m i m j /v 2 [41].

Figure 1 :
Figure1: Distributions of the input variables to the BDT for the H → µτ h channel.The background from SM Higgs boson production is small and not visible in the plots.

Figure 2 :
Figure 2: Distributions of the input variables to the BDT for the H → µτ e channel.
4. The e candidate is required to have p e T > 24 GeV, |η e | < 2.1, and I e rel < 0.1.The µ candidate is required to have p µ T > 10 GeV, |η µ | < 2.4, and I µ rel < 0.15.Events with additional e, µ or τ h candidates, or with at least one b-tagged jet are vetoed.The tighter selection used in the M col fit analysis further requires ∆φ(e, p miss T ) < 1.0 and M T (e) > 60 GeV.The large tt background is further reduced by requiring p ζ − 0.85 p vis ζ > −60 GeV.This topological selection is based on the projections

Figure 3 :
Figure 3: M col distribution in tt enriched (left), like-sign lepton (central), and W + jets enriched (right) control samples defined in the text.The distributions include both statistical and systematic uncertainties.

Figure 4 :
Figure4: Distribution of the BDT discriminator for the H → µτ process in the BDT fit analysis, in the individual channels and categories compared to the signal and background estimation.The background is normalized to the best fit values from the signal plus background fit while the simulated signal corresponds to B(H → µτ) = 5%.The bottom panel in each plot shows the fractional difference between the observed data and the fitted background.The left column of plots corresponds to the H → µτ h categories, from 0-jets (first row) to 2-jets VBF (fourth row).The right one to their H → µτ e counterparts.

Figure 5 :
Figure5: Distribution of the collinear mass M col for the H → µτ process in M col fit analysis, in different channels and categories compared to the signal and background estimation.The background is normalized to the best fit values from the signal plus background fit while the overlaid simulated signal corresponds to B(H → µτ) = 5%.The bottom panel in each plot shows the ratio between the observed data and the fitted background.The left column of plots corresponds to the H → µτ h categories, from 0-jets (first row) to 2-jets VBF (fourth row).The right one to their H → µτ e counterparts.

Figure 6 :
Figure 6: Observed and expected 95% CL upper limits on the B(H → µτ) for each individual category and combined.Left: BDT fit analysis.Right: M col fit analysis.

Figure 7 :
Figure7: Distribution of the BDT discriminator for the H → eτ process for the BDT fit analysis, in different channels and categories compared to the signal and background estimation.The background is normalized to the best fit values from the signal plus background fit while the simulated signal corresponds to B(H → eτ) = 5%.The bottom panel in each plot shows the ratio between the observed data and the fitted background.The left column of plots corresponds to the H → eτ h categories, from 0-jets (first row) to 2-jets VBF (fourth row).The right one to their H → eτ µ counterparts.

Figure 8 :
Figure8: Distribution of the collinear mass M col for the H → eτ process in the M col fit analysis, in different channels and categories compared to the signal and background estimation.The background is normalized to the best fit values from the signal plus background fit while the simulated signal corresponds to B(H → eτ) = 5%.The lower panel in each plot shows the ratio between the observed data and the fitted background.The left column of plots correspond to the H → eτ h categories, from 0-jets (first row) to 2 jets VBF (fourth row).The right one to their H → eτ µ counterparts.

Figure 9 :
Figure 9: Observed and expected 95% CL upper limits on the B(H → eτ) for each individual category and combined.Left: BDT fit analysis.Right: M col fit analysis.

Figure 10 :
Figure 10: Constraints on the flavour violating Yukawa couplings, |Y µτ |, |Y τµ | (left) and |Y eτ |, |Y τe | (right), from the BDT result.The expected (red dashed line) and observed (black solid line) limits are derived from the limit on B(H → µτ) and B(H → eτ) from the present analysis.The flavour-diagonal Yukawa couplings are approximated by their SM values.The green (yellow) band indicates the range that is expected to contain 68% (95%) of all observed limit excursions from the expected limit.The shaded regions are derived constraints from null searches for τ → 3µ or τ → 3e (dark green)[41,92,93] and τ → µγ or τ → eγ (lighter green)[41,93].The green hashed region is derived by the CMS direct search presented in this paper.The blue solid lines are the CMS limits from[44] (left) and[45](right).The purple diagonal line is the theoretical naturalness limit |Y ij Y ji | ≤ m i m j /v 2[41].

Federal
Science Policy Office; the Fonds pour la Formation à la Recherche dans l'Industrie et dans l'Agriculture (FRIA-Belgium); the Agentschap voor Innovatie door Wetenschap en Technologie (IWT-Belgium); the Ministry of Education, Youth and Sports (MEYS) of the Czech Republic; the Council of Science and Industrial Research, India; the HOMING PLUS programme of the Foundation for Polish Science, cofinanced from European Union, Regional Development Fund, the Mobility Plus programme of the Ministry of Science and Higher Education, the National Science Center (Poland), contracts Harmonia 2014/14/M/ST2/00428, Opus 2014/13/B/ST2/02543, 2014/15/B/ST2/03998, and 2015/19/B/ST2/02861, Sonata-bis 2012/07/E/ST2/01406; the National Priorities Research Program by Qatar National Research Fund; the Programa Severo Ochoa del Principado de Asturias; the Thalis and Aristeia programmes cofinanced by EU-ESF and the Greek NSRF; the Rachadapisek Sompot Fund for Postdoctoral Fellowship, Chulalongkorn University and the Chulalongkorn Academic into Its 2nd Century Project Advancement Project (Thailand); the Welch Foundation, contract C-1845; and the Weston Havens Foundation (USA).

Table 1 :
Event selection criteria for the kinematic variables for the H → µτ channels.
5. The e candidate is required to have p e T > 26 GeV, |η e | < 2.1, and I e rel < 0.1.The τ h candidate is required to have p τ h T > 30 GeV and |η τ h | < 2.3.Events with additional e, µ or τ h candidates are vetoed.

Table 2 :
Event selection criteria for the kinematic variables for the H → eτ channels.

Table 3 :
[85]ematic uncertainties in the expected event yields.All uncertainties are treated as correlated between the categories, except those that have two values separated by the ⊕ sign.In this case, the first value is the correlated uncertainty and the second value is the uncorrelated uncertainty for each individual category.Theoretical uncertainties on VBF Higgs boson production[85]are also applied to VH production.Uncertainties on acceptance lead to migration of events between the categories, and can be correlated or anticorrelated between categories.Ranges of uncertainties for the Higgs boson production indicate the variation in size, from negative (anticorrelated) to positive (correlated).

Table 6 :
Expected and observed upper limits at 95% CL and best fit branching fractions in percent for each individual jet category, and combined, in the H → eτ process obtained with the BDT fit analysis.