Flavor Violating Higgs Decays

We study a class of nonstandard interactions of the newly discovered 125 GeV Higgs-like resonance that are especially interesting probes of new physics: flavor violating Higgs couplings to leptons and quarks. These interactions can arise in many frameworks of new physics at the electroweak scale, such as two Higgs doublet models, extra dimensions, or models of compositeness. We rederive constraints on flavor violating Higgs couplings using data on rare decays, electric and magnetic dipole moments, and meson oscillations. We confirm that flavor violating Higgs boson decays to leptons can be sizeable, with, e.g., h → τµ and h → τe branching ratios of order 10% perfectly allowed by low energy constraints. We estimate the current LHC limits on h → τµ and h → τe decays by recasting existing searches for the SM Higgs in the ττ channel and find that these bounds are already stronger than those from rare τ decays. We also show that these limits can be improved significantly with dedicated searches, and we outline a possible search strategy. Flavor violating Higgs decays therefore present an opportunity for discovery of new physics which in some cases may be easier to access experimentally than flavor conserving deviations from the Standard Model Higgs framework.


I. INTRODUCTION
Both ATLAS and CMS have recently announced the discovery of a Higgs-like resonance with a mass of m_h ≈ 125 GeV [1][2][3][4], further supported by combined Tevatron data [5].
An interesting question is whether the properties of this resonance are consistent with the long history . In this paper, we refine the indirect bounds on the FV couplings. Most importantly, we discuss in detail possible search strategies for FV Higgs decays at the LHC and derive for the first time limits from LHC data.
As pointed out in the previous literature, and confirmed by the present analysis, the indirect constraints on many FV Higgs decays are rather weak. In particular, the branching ratios for h → τµ and h → τe can reach up to 10% [24]. In fact, for h → τµ and h → τe the LHC is already now, without targeted searches, placing limits that are comparable to or even stronger than those from rare τ decays. As we shall see later, recasting an h → ττ analysis with 4.7 fb⁻¹ of 7 TeV ATLAS data [27] gives a bound of around 10% on the branching fraction of the Higgs into τµ or τe. We will also demonstrate that dedicated searches can be much more sensitive.
These decays could thus give a striking signature of new physics at the LHC, and we strongly encourage our experimental colleagues to include them in their searches. Another experimentally interesting set of decay channels are flavor conserving decays to the first two generations, e.g., h → µ⁺µ⁻, on which we will comment further below. We emphasize that large deviations from the SM do not require very exotic flavor structures. A branching ratio for h → τµ comparable to the one for h → ττ, or an h → µ⁺µ⁻ branching ratio a few times larger than in the SM, can arise in many models of flavor (for instance in models with continuous and/or discrete flavor symmetries [28], or in Randall-Sundrum models [29]) as long as there is new physics at the electroweak scale and not just the SM. The lepton flavor violating decay h → τµ has been studied in [11], where it was found that the branching ratio for this decay can be up to 10% in certain Two Higgs Doublet Models (2HDMs).
In fact, there may already be experimental hints that the Higgs couplings to fermions may not be SM-like. For instance, the BaBar collaboration recently announced a 3.4σ indication of flavor universality violation in b → cτ ν transitions [30], which can be explained for instance by an extended Higgs sector with nontrivial flavor structure [31].
The paper is organized as follows. In Sec. II we introduce the theoretical framework we will use to parameterize the flavor violating decays of the Higgs. In Sec. III we derive bounds on flavor violating Higgs couplings to leptons and translate these bounds into limits on the Higgs decay branching fractions to the various flavor violating final states. In Sec. IV we do the same for flavor violating couplings to quarks. We shall see that decays of the Higgs to τµ and to τe with sizeable branching fractions are allowed, and that flavor violating couplings of the Higgs to top quarks are also only weakly constrained. Motivated by this we turn to the LHC in Sec. V and estimate the current bounds on Higgs decays to τµ and τe using data from an existing h → ττ search. We also discuss a strategy for a dedicated h → τµ search and comment on differences with the SM h → ττ searches. We will see that the LHC can make significant further progress in probing the Higgs' flavor violating parameter space with existing data. We conclude in Sec. VI. In the appendices, we give more details on the calculation of constraints from low-energy observables.

II. THE FRAMEWORK
After electroweak symmetry breaking (EWSB) the fermionic mass terms and the couplings of the Higgs boson to fermion pairs in the mass basis are in general

$$\mathcal{L}_Y = -m_i\,\bar f_L^i f_R^i - Y_{ij}\,(\bar f_L^i f_R^j)\,h + \text{h.c.} + \cdots, \qquad (1)$$

where the ellipses denote nonrenormalizable couplings involving more than one Higgs field operator. In our notation, f_L = q_L, ℓ_L are the SU(2)_L doublets, f_R = u_R, d_R, ℓ_R, ν_R the weak singlets, and the indices run over generations and fermion flavors (quarks and leptons), with summation implicitly understood. In the SM the Higgs couplings are diagonal, Y_ij = (m_i/v)δ_ij, but in general NP models the structure of the Y_ij can be very different. Note that we use the normalization v = 246 GeV here. The goal of the paper is to set bounds on Y_ij and identify interesting channels for Higgs decays at the LHC. Throughout we will assume that the Higgs is the only additional degree of freedom with mass O(100 GeV) and that the Y_ij are the only source of flavor violation. These assumptions are not necessarily valid in general, but will be a good approximation in many important classes of new physics frameworks. Let us now show how Y_ij ≠ (m_i/v)δ_ij can arise in two qualitatively different categories of NP models.
a. A single Higgs theory. Let us first explore the possibility that the Higgs is the only field that causes EWSB (see also [10,15,19,23,32-34]). For simplicity let us also assume that at energies below ∼ 200 GeV the spectrum consists solely of the SM particles: three generations of quarks and leptons, the SM gauge bosons, and the Higgs at 125 GeV. Additional heavy fields (e.g. scalar or fermionic partners which address the hierarchy problem) can be integrated out, so that we can work in an effective field theory (EFT), the effective Standard Model. In addition to the SM Lagrangian there are then also higher dimensional terms due to the heavy degrees of freedom that were integrated out:

$$\mathcal{L}_Y = -\lambda_{ij}\,(\bar f_L^i f_R^j)\,H - \frac{\lambda'_{ij}}{\Lambda^2}\,(\bar f_L^i f_R^j)\,H\,(H^\dagger H) + \text{h.c.} + \cdots$$

Here we have written out explicitly only the terms that modify the Yukawa interactions.
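We can truncate the expansion after the terms of dimension 6, since these already suffice to completely decouple the values of the fermion masses from the values of the fermion couplings to the Higgs boson. Additional dimension-6 operators involving derivatives, such as $(\bar f_L^i \gamma^\mu f_L^j)(H^\dagger i D_\mu H)$, can also be present. After EWSB and diagonalization of the mass matrices, one obtains the Yukawa Lagrangian in Eq. (1), with

$$\sqrt{2}\,m = V_L\left[\lambda + \frac{v^2}{2\Lambda^2}\,\lambda'\right]V_R^\dagger\,v, \qquad \sqrt{2}\,Y = V_L\left[\lambda + \frac{3v^2}{2\Lambda^2}\,\lambda'\right]V_R^\dagger,$$

where the unitary matrices V_L, V_R are those which diagonalize the mass matrix, and v = 246 GeV. In the mass basis we can write

$$Y_{ij} = \frac{m_i}{v}\,\delta_{ij} + \frac{v^2}{\sqrt{2}\,\Lambda^2}\,\hat\lambda_{ij}, \qquad \hat\lambda \equiv V_L\,\lambda'\,V_R^\dagger .$$

Barring fine-tuned cancellations between the two contributions to the fermion masses, the off-diagonal couplings are then expected to satisfy, e.g., $|Y_{\mu\tau}Y_{\tau\mu}| \lesssim m_\mu m_\tau/v^2$, with similar conditions for the other off-diagonal elements. Even though we will keep this condition in the back of our minds, we will not restrict the parameter space to fulfill it.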
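b. Two Higgs doublet models. If more than one Higgs doublet contributes to EWSB, the Yukawa couplings in the fermion mass basis take the form

$$\mathcal{L}_Y = -m_i\,\bar f_L^i f_R^i - Y^a_{ij}\,(\bar f_L^i f_R^j)\,h_a + \text{h.c.},$$

where the index a runs over all the scalars (with Y^a_ij imaginary for pseudoscalars), and m_i receives contributions from both vevs. In addition there is also a scalar potential which mixes the two Higgses. Diagonalizing the Higgs mass matrix then also changes Y^a_ij, but removes the Higgs mixing. For our purposes it is simplest to work in the Higgs mass basis.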
All the results for a single Higgs are then trivially modified, replacing our final expressions below by a sum over several Higgses. For a large mass gap, where only one Higgs is light, the contributions from the heavier Higgs are power suppressed, unless its flavor violating Yukawa couplings are parametrically larger than those of the light Higgs. The contributions from the heavy Higgs correspond to the higher dimensional operators discussed in the previous paragraph. This example can be trivially generalized to models with many Higgs doublets.
We next derive constraints on flavor violating Higgs couplings and work out the allowed branching fractions for flavor violating Higgs decays. In placing the bounds we will neglect the FV contributions of the remaining states in the full theory. Our bounds thus apply barring cancellations with these other terms.

III. LEPTONIC FLAVOR VIOLATING HIGGS DECAYS
The FV decays h → eµ, eτ, µτ arise at tree level from the assumed flavor violating Yukawa interactions, Eq. (1), where the relevant terms are explicitly

$$\mathcal{L}_Y \supset -Y_{e\mu}\,\bar e_L\mu_R\,h - Y_{\mu e}\,\bar\mu_L e_R\,h - Y_{e\tau}\,\bar e_L\tau_R\,h - Y_{\tau e}\,\bar\tau_L e_R\,h - Y_{\mu\tau}\,\bar\mu_L\tau_R\,h - Y_{\tau\mu}\,\bar\tau_L\mu_R\,h + \text{h.c.}$$

The bounds on the FV Yukawa couplings are collected in Table I, where for simplicity of presentation the flavor diagonal muon and tau Yukawa couplings were set equal to their respective SM values, Y_µµ^SM = m_µ/v and Y_ττ^SM = m_τ/v. Similar bounds on FV Higgs couplings to quarks are collected in Table II. Similar constraints on flavor violating Higgs decays have also been presented recently in [24]. While our results agree qualitatively with previous ones, small numerical differences are expected because we avoid some of the approximations made by previous authors. We also consider some constraining processes not discussed before.

² A spurion which transforms as a triplet can also contribute to Majorana masses for neutrinos.
We first give more details on how the bounds in Tables I and II were obtained and then move on to predictions for the allowed sizes of the FV Higgs decays.
A. Constraints from τ → µγ, τ → eγ and µ → eγ

The effective Lagrangian for the τ → µγ decay is given by

$$\mathcal{L}_{\rm eff} = c_L\,Q_{L\gamma} + c_R\,Q_{R\gamma} + \text{h.c.},$$

where the dim-5 electromagnetic penguin operators are

$$Q_{L\gamma,R\gamma} = \frac{e}{8\pi^2}\,m_\tau\,\big(\bar\mu\,\sigma^{\alpha\beta}P_{L,R}\,\tau\big)\,F_{\alpha\beta},$$

with α, β the Lorentz indices and F_αβ the electromagnetic field strength tensor. The Wilson coefficients c_L and c_R receive contributions from the two 1-loop diagrams shown in Fig. 1 (with the first one dominant), and a comparable contribution from Barr-Zee type 2-loop diagrams, see Fig. 12 in Appendix A. The complete one loop and two loop expressions are given in Appendix A.

Table I (columns: Channel, Coupling, Bound). For the muon magnetic dipole moment we show the value of the couplings required to explain the observed ∆a_µ (if this is used only as an upper bound one has Re(Y_µτ Y_τµ) < 0.065 at 95% CL).
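In the approximation Y_µµ ≪ Y_ττ, only the first of the one-loop diagrams in Fig. 1 is relevant (in addition to the 2-loop diagrams). Using also m_µ ≪ m_τ ≪ m_h and assuming Y_µµ, Y_ττ to be real, the magnitudes of the one-loop Wilson coefficients c_L and c_R simplify to (this agrees with [24])

$$|c_L^{\rm 1loop}| \simeq \frac{|Y_{\tau\tau}Y_{\tau\mu}|}{12\,m_h^2}\left(3\log\frac{m_h^2}{m_\tau^2} - 4\right), \qquad |c_R^{\rm 1loop}| \simeq \frac{|Y_{\tau\tau}Y_{\mu\tau}|}{12\,m_h^2}\left(3\log\frac{m_h^2}{m_\tau^2} - 4\right).$$

The 2-loop contributions are numerically comparable to or larger than the 1-loop ones; in evaluating them we use the SM value of the top Yukawa coupling, Y_tt = m_t/v. In terms of the Wilson coefficients c_L and c_R, the rate for τ → µγ is

$$\Gamma(\tau\to\mu\gamma) = \frac{\alpha\,m_\tau^5}{64\pi^4}\left(|c_L|^2 + |c_R|^2\right).$$

Using a Higgs mass m_h = 125 GeV and assuming Y_ττ = m_τ/v, Y_tt = m_t/v, we can then translate the experimental bound BR(τ → µγ) < 4.4 × 10⁻⁸ [37] into the constraint $\sqrt{|Y_{\tau\mu}|^2 + |Y_{\mu\tau}|^2} < 1.6\times 10^{-2}$ (see Table I). The bound is relaxed if Y_ττ and/or Y_tt are smaller than their SM values.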
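As a rough numerical illustration, the following Python sketch evaluates BR(τ → µγ) from the dipole Wilson coefficients using Γ(τ → µγ) = α m_τ⁵(|c_L|² + |c_R|²)/(64π⁴), and inverts the experimental limit BR < 4.4 × 10⁻⁸ keeping only the 1-loop leading-log coefficient |c| ≃ Y_ττ |Y_FV| (3 log(m_h²/m_τ²) − 4)/(12 m_h²). Both formulas are restated here as assumptions of the sketch; the particle data inputs (masses, τ lifetime) are standard values chosen by us, and the 2-loop Barr-Zee terms of Appendix A are deliberately left out.

```python
import math

# Inputs assumed for this sketch (standard particle-data values)
ALPHA = 1 / 137.036              # fine-structure constant
M_TAU = 1.77686                  # tau mass [GeV]
M_H = 125.0                      # Higgs mass [GeV]
V_EW = 246.0                     # electroweak vev [GeV]
HBAR = 6.582119569e-25           # hbar [GeV s]
TAU_LIFETIME = 290.3e-15         # tau lifetime [s]
GAMMA_TAU = HBAR / TAU_LIFETIME  # total tau width [GeV]


def br_tau_mu_gamma(c_l: float, c_r: float) -> float:
    """BR(tau -> mu gamma) for dipole Wilson coefficients c_L, c_R in GeV^-2."""
    width = ALPHA * M_TAU**5 / (64 * math.pi**4) * (abs(c_l)**2 + abs(c_r)**2)
    return width / GAMMA_TAU


def one_loop_coefficient(y_tautau: float = M_TAU / V_EW) -> float:
    """|c| per unit flavor violating Yukawa from the leading 1-loop diagram."""
    loop = 3 * math.log(M_H**2 / M_TAU**2) - 4
    return y_tautau * loop / (12 * M_H**2)


def one_loop_only_bound(br_limit: float = 4.4e-8) -> float:
    """Bound on sqrt(|Y_taumu|^2 + |Y_mutau|^2) using only the 1-loop term."""
    k = one_loop_coefficient()
    # BR = (BR at unit coupling) * (|Y_taumu|^2 + |Y_mutau|^2), so invert:
    br_per_unit = br_tau_mu_gamma(k, 0.0)
    return math.sqrt(br_limit / br_per_unit)


if __name__ == "__main__":
    print(f"1-loop-only bound: {one_loop_only_bound():.3f}")
```

The 1-loop term alone gives a bound of about 0.08, several times weaker than the 1.6 × 10⁻² quoted in the text, which is in line with the 2-loop contributions being numerically important.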
The decay µ → eγ can also be used to place a bound on the combination Y_µτ Y_τe using the corresponding 1-loop Wilson coefficient (in agreement with [24]).

The decay τ → 3µ can be generated through tree level Higgs exchange, see the diagram in Fig. 2 (left). However, this diagram is suppressed not only by the flavor violating Yukawa couplings Y_τµ and Y_µτ, but also by the flavor-conserving coupling Y_µµ. It is thus subleading compared to the higher order contributions: the 1-loop diagrams of the form shown in Fig. 1 and 2-loop diagrams like the ones shown in Fig. 12 (for τ → µγ). These generate τ → 3µ if the outgoing gauge boson is off-shell and "decays" to a muon pair. This general topology is shown in the right part of Fig. 2.
Integrating out the Higgs, the heavy gauge bosons and the top quark, these contributions match onto an effective Lagrangian. The full effective Lagrangian is similar to the one in Eq. (A13) for µ → e conversion, but with quarks replaced by muons. Since a full evaluation of the 2-loop contributions is beyond the scope of this work, we will estimate the τ → 3µ rate by including only the dimension-5 electromagnetic dipole contributions of the form given in Eq. (12). For c_L and c_R, we use the same expressions as for τ → µγ, see Sec. III A and Appendix A. We evaluate these expressions at q² = 0. We have checked that the neglected contributions are numerically smaller than the dipole terms at one loop. At two loops, to the best of our knowledge, a full evaluation of all potentially relevant diagrams is not available.
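The corresponding expression for the flavor violating partial width of the τ is

$$\Gamma(\tau\to 3\mu) \simeq \frac{\alpha^2\,m_\tau^5}{192\pi^5}\left(|c_L|^2 + |c_R|^2\right)\left(\log\frac{m_\tau^2}{m_\mu^2} - \frac{11}{4}\right),$$

where we have neglected terms additionally suppressed by the muon mass. The Wilson coefficients c_L and c_R are given approximately by Eqs. (13) and (14), with the 2-loop contribution dominating over the 1-loop one.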
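Because only the dipole operators are kept, the τ → 3µ width is tied to the τ → µγ width by a fixed factor. The sketch below assumes the dipole-dominance formula Γ(τ → 3µ) ≃ α² m_τ⁵ (|c_L|² + |c_R|²)(log(m_τ²/m_µ²) − 11/4)/(192π⁵) together with Γ(τ → µγ) = α m_τ⁵ (|c_L|² + |c_R|²)/(64π⁴), and evaluates their ratio, in which the Wilson coefficients cancel. The numerical inputs are standard values chosen for this sketch.

```python
import math

ALPHA = 1 / 137.036
M_TAU = 1.77686    # tau mass [GeV]
M_MU = 0.1056584   # muon mass [GeV]


def width_mu_gamma(c_l: float, c_r: float) -> float:
    """Gamma(tau -> mu gamma) from dipole coefficients (GeV^-2), in GeV."""
    return ALPHA * M_TAU**5 / (64 * math.pi**4) * (abs(c_l)**2 + abs(c_r)**2)


def width_3mu_dipole(c_l: float, c_r: float) -> float:
    """Gamma(tau -> 3 mu) keeping only the dipole operators; m_mu kept only in the log."""
    log_term = math.log(M_TAU**2 / M_MU**2) - 11 / 4
    return ALPHA**2 * M_TAU**5 / (192 * math.pi**5) * (abs(c_l)**2 + abs(c_r)**2) * log_term


def tau3mu_over_taumugamma() -> float:
    """Ratio of the two widths; c_L and c_R drop out."""
    return width_3mu_dipole(1.0, 0.0) / width_mu_gamma(1.0, 0.0)


if __name__ == "__main__":
    print(f"Gamma(tau->3mu)/Gamma(tau->mu gamma) = {tau3mu_over_taumugamma():.2e}")
```

The ratio comes out at about 2.2 × 10⁻³, i.e. in this approximation the τ → 3µ rate is a small, fixed fraction of the τ → µγ rate.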

C. Constraints from muonium-antimuonium oscillations
A µ⁺e⁻ bound state (called muonium, M) can oscillate into an e⁺µ⁻ bound state (antimuonium, M̄) through the diagram in Fig. 3. The time-integrated M → M̄ conversion probability is constrained by the MACS experiment at PSI [41] to be below

$$P(M\to\bar M) < \frac{8.3\times 10^{-11}}{S_B},$$

where the correction factor S_B ≤ 1 accounts for the splitting of muonium states in the magnetic field of the detector. It depends on the Lorentz structure of the conversion operator and varies between S_B = 0.35 for (S ± P) × (S ± P) operators and S_B = 0.9 for P × P operators [41]. Conservatively, we use the smallest value S_B = 0.35 throughout. Since we will find that M-M̄ oscillation constraints are much weaker than those from µ → eγ and µ → e conversion, this approximation suffices for illustrative purposes.

Figure 3: Diagram leading to muonium-antimuonium oscillations.
The theoretical prediction for the M → M̄ conversion rate is governed by the mixing matrix element ⟨M̄|H_eff|M⟩ (see, e.g., [42]), evaluated between muonium spin states labeled by ↑_X and ↓_X, the spin orientations of particle X. We can work in the nonrelativistic limit here. For a contact interaction, the spatial wave function of muonium, ψ(r) = exp(−r/a_M)/[π a_M³]^{1/2}, only needs to be evaluated at the origin. (Here r is the electron-antimuon distance and a_M = (m_e + m_µ)/(m_e m_µ α) is the muonium Bohr radius.) The resulting mass splitting ∆M between the two mass eigenstates of the mixed M-M̄ system is computed as in [42], and for ∆M ≪ Γ_µ, with Γ_µ the muon decay width, the time-integrated conversion probability is

$$P(M\to\bar M) \simeq \frac{(\Delta M)^2}{2\,\Gamma_\mu^2}.$$

The bound from the MACS experiment [41] then translates into |Y_µe + Y*_eµ| < 0.079.
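The ingredients of the muonium estimate can be checked numerically. This Python sketch computes the muonium Bohr radius a_M = (m_e + m_µ)/(m_e m_µ α), the ground-state density at the origin |ψ(0)|² = 1/(π a_M³), and converts between the mass splitting ∆M and the time-integrated conversion probability using the standard mixing result P = x²/[2(1 + x²)] with x = ∆M/Γ_µ (which reduces to (∆M)²/(2Γ_µ²) for x ≪ 1). The muon lifetime and the MACS input 8.3 × 10⁻¹¹/S_B with S_B = 0.35 are inputs assumed for this sketch; the matrix element relating ∆M to Y_µe + Y*_eµ is not reproduced here.

```python
import math

# Inputs assumed for this sketch
ALPHA = 1 / 137.036
M_E = 0.000510999              # electron mass [GeV]
M_MU = 0.1056584               # muon mass [GeV]
HBAR = 6.582119569e-25         # hbar [GeV s]
MUON_LIFETIME = 2.1969811e-6   # [s]
GAMMA_MU = HBAR / MUON_LIFETIME  # muon width [GeV]


def bohr_radius() -> float:
    """Muonium Bohr radius a_M = (m_e + m_mu)/(m_e m_mu alpha), in GeV^-1."""
    return (M_E + M_MU) / (M_E * M_MU * ALPHA)


def psi0_squared() -> float:
    """|psi(0)|^2 = 1/(pi a_M^3) for the 1S state, in GeV^3."""
    return 1.0 / (math.pi * bohr_radius()**3)


def conversion_probability(delta_m: float) -> float:
    """Time-integrated P(M -> Mbar) = x^2 / (2 (1 + x^2)) with x = Delta M / Gamma_mu."""
    x = delta_m / GAMMA_MU
    return x**2 / (2 * (1 + x**2))


def max_delta_m(p_limit: float) -> float:
    """Largest Delta M [GeV] with P(M -> Mbar) below p_limit (inverse of the above)."""
    # From P = x^2 / (2 (1 + x^2)):  x^2 = 2P / (1 - 2P)
    return GAMMA_MU * math.sqrt(2 * p_limit / (1 - 2 * p_limit))


if __name__ == "__main__":
    p_max = 8.3e-11 / 0.35  # assumed MACS input with S_B = 0.35
    print(f"a_M = {bohr_radius():.3e} GeV^-1")
    print(f"Delta M < {max_delta_m(p_max):.2e} GeV")
```

The Bohr radius comes out at about 2.7 × 10⁵ GeV⁻¹; the tiny allowed ∆M then has to be matched to the Higgs-exchange matrix element to obtain the coupling bound.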

D. Constraints from magnetic dipole moments
The CP conserving and CP violating parts of the diagram in Fig. 4 generate magnetic and electric dipole moments of the muon, respectively. Since the experimental value of the magnetic dipole moment, g_µ − 2, is above the SM prediction at more than 3σ, the preferred value for the flavor violating Higgs couplings will also be nonzero. The FV contribution to (g − 2)_µ due to the τ-Higgs loop in Fig. 4 is, neglecting terms suppressed by m_µ/m_τ or m_τ²/m_h²,

$$a_\mu^{\rm FV} \simeq \frac{{\rm Re}(Y_{\mu\tau}Y_{\tau\mu})}{8\pi^2}\,\frac{m_\mu m_\tau}{m_h^2}\left(\log\frac{m_h^2}{m_\tau^2} - \frac{3}{2}\right),$$

in agreement with [24]. The discrepancy between the measured value of a_µ and the one predicted by the Standard Model [38,43] could thus be explained if there are FV Higgs interactions of the size (for the definition of the Yukawa couplings see Eq. (1)). This explanation of ∆a_µ requires Y_µτ ∼ Y_τµ to be a factor of a few bigger than the SM value of the diagonal Yukawa, m_τ/v, and is in tension with limits from τ → µγ. It is in further tension with the LHC limit extracted in Sec. V of this paper.
The measured ∆a_µ could in principle also be explained by an enhanced flavor conserving coupling of the muon to the Higgs if Y_µµ ∼ 0.15 ∼ 280 m_µ/v. However, in this case h → µµ decays would be enhanced to a level that is already ruled out by the searches at the LHC: From the search for the MSSM neutral Higgs boson one obtains a bound σ(gg → h → µµ)

The CP violating part of the diagram in Fig. 4 induces an electric dipole moment of the muon, given by (neglecting the terms suppressed by m_µ/m_τ or m_τ²/m_h²)

$$d_\mu \simeq -\frac{e\,m_\tau}{16\pi^2 m_h^2}\,{\rm Im}(Y_{\mu\tau}Y_{\tau\mu})\left(\log\frac{m_h^2}{m_\tau^2} - \frac{3}{2}\right),$$

in agreement with [24]. The experimental constraint −10 × 10⁻²⁰ e cm < d_µ < 8 × 10⁻²⁰ e cm [37] translates into the rather weak limit −0.8 ≲ Im(Y_µτ Y_τµ) ≲ 1.0.

The 2-loop contributions to µ → e conversion in nuclei are larger than the one loop ones because they are not suppressed by the small Y_µµ coupling but only by Y_tt or the weak gauge coupling, but they are still slightly less than an order of magnitude smaller than the tree level contribution. Here, we always assume the diagonal Yukawa couplings to have their SM values. With this assumption, the tree level term is very sensitive to the strangeness content of the nucleon.
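The τ-Higgs loop fixes both muon dipole moments in terms of a single coefficient. The following sketch assumes the leading-log forms a_µ ≃ Re(Y_µτ Y_τµ) m_µ m_τ (log(m_h²/m_τ²) − 3/2)/(8π² m_h²) and d_µ ≃ −e m_τ Im(Y_µτ Y_τµ)(log(m_h²/m_τ²) − 3/2)/(16π² m_h²). The overall sign of d_µ is a convention we chose so that the experimental range −10 × 10⁻²⁰ e cm < d_µ < 8 × 10⁻²⁰ e cm maps onto the interval for Im(Y_µτ Y_τµ) quoted in the text; the mass inputs are standard values assumed here.

```python
import math

M_MU = 0.1056584                 # muon mass [GeV]
M_TAU = 1.77686                  # tau mass [GeV]
M_H = 125.0                      # Higgs mass [GeV]
GEV_INV_TO_CM = 1.973269804e-14  # hbar*c [GeV cm], converts GeV^-1 to cm


def dipole_coefficient() -> float:
    """K = m_tau/(16 pi^2 m_h^2) (ln(m_h^2/m_tau^2) - 3/2), in GeV^-1 (per unit charge e)."""
    return M_TAU / (16 * math.pi**2 * M_H**2) * (math.log(M_H**2 / M_TAU**2) - 1.5)


def delta_a_mu(re_yy: float) -> float:
    """FV contribution to a_mu for a given Re(Y_mutau Y_taumu): a_mu = 2 m_mu K Re."""
    return 2 * M_MU * dipole_coefficient() * re_yy


def d_mu_ecm(im_yy: float) -> float:
    """Muon EDM in e cm for a given Im(Y_mutau Y_taumu) (sign convention assumed)."""
    return -dipole_coefficient() * im_yy * GEV_INV_TO_CM


def im_range_from_edm(d_min: float = -10e-20, d_max: float = 8e-20):
    """Allowed interval of Im(Y_mutau Y_taumu) given d_min < d_mu < d_max (in e cm)."""
    k_cm = dipole_coefficient() * GEV_INV_TO_CM
    # d_mu = -k_cm * Im, so the interval is reflected
    return (-d_max / k_cm, -d_min / k_cm)


if __name__ == "__main__":
    lo, hi = im_range_from_edm()
    print(f"{lo:.2f} < Im(Y_mutau Y_taumu) < {hi:.2f}")
    print(f"a_mu per unit Re(Y_mutau Y_taumu): {delta_a_mu(1.0):.3e}")
```

With these inputs im_range_from_edm() returns approximately (−0.80, 1.00), matching the EDM limit quoted above, and delta_a_mu(1.0) ≈ 1.07 × 10⁻⁶ is the coefficient that multiplies Re(Y_µτ Y_τµ) in a_µ.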
The bounds on the Yukawa couplings Y eµ and Y µe from µ → e conversion in nuclei are listed in Table I.
One could potentially also obtain interesting limits on |Y eτ | and |Y τ e | from µ → e conversion in nuclei, even though this requires diagrams proportional to two FV Yukawa couplings, because the other constraints on these couplings are weak.
The corresponding products of FV couplings, such as Y_eτ Y_τµ, are constrained by µ → e conversion through 1-loop diagrams similar to the ones shown in Fig. 5, but with a τ running in the loop (see Eq. (A17)). In the simplest case, Y_eτ = Y_τe, Y_µτ = Y_τµ, with all Yukawa couplings real, the constraint is Y_eτ Y_µτ ≲ 10⁻⁶. This is almost, but not quite, competitive with the bound following from τ → eγ and τ → µγ decays, see Table I.

While a detailed comparison to the data points given in [45] is beyond the scope of this work, we have estimated that flavor-violating couplings $\sqrt{|Y_{e\ell}|^2 + |Y_{\ell e}|^2} \gtrsim$ few × 10⁻¹ are excluded by LEP.
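H. Allowed branching ratios for lepton flavor violating Higgs decays

In Fig. 6 we show the constraints on the flavor violating leptonic Yukawa couplings together with the resulting branching ratios

$$\mathrm{BR}(h\to\ell^\alpha\ell^\beta) = \frac{\Gamma(h\to\ell^\alpha\ell^\beta)}{\Gamma(h\to\ell^\alpha\ell^\beta) + \Gamma_{\rm SM}},$$

where α, β = e, µ, τ, α ≠ β, and ℓ^αℓ^β denotes the sum of the ℓ^{α+}ℓ^{β−} and ℓ^{α−}ℓ^{β+} final states. The decay width Γ(h → ℓ^α ℓ^β), in turn, is

$$\Gamma(h\to\ell^\alpha\ell^\beta) = \frac{m_h}{8\pi}\left(|Y_{\ell^\alpha\ell^\beta}|^2 + |Y_{\ell^\beta\ell^\alpha}|^2\right),$$

and the SM Higgs width is Γ_SM = 4.1 MeV for a 125 GeV Higgs boson [46]. In the panels of Fig. 6 we assume that at most one non-standard decay mode of the Higgs is significant compared to the SM decay width.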
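The branching ratio is straightforward to evaluate. This sketch uses BR = Γ/(Γ + Γ_SM) with Γ(h → ℓ^αℓ^β) = m_h(|Y_ℓαℓβ|² + |Y_ℓβℓα|²)/(8π), summed over both charge assignments, and Γ_SM = 4.1 MeV, assuming that h → ℓ^αℓ^β is the only non-standard decay mode. It also inverts the relation to find the coupling √(|Y_ℓαℓβ|² + |Y_ℓβℓα|²) needed for a target branching ratio; the function names are our own.

```python
import math

M_H = 125.0        # Higgs mass [GeV]
GAMMA_SM = 4.1e-3  # SM Higgs width for m_h = 125 GeV [GeV]
M_TAU = 1.77686    # tau mass [GeV] (assumed input)
V_EW = 246.0       # electroweak vev [GeV]


def width_lfv(y_ab: complex, y_ba: complex) -> float:
    """Gamma(h -> l^a l^b), summed over both charge assignments, in GeV."""
    return M_H / (8 * math.pi) * (abs(y_ab)**2 + abs(y_ba)**2)


def br_lfv(y_ab: complex, y_ba: complex) -> float:
    """BR(h -> l^a l^b), assuming this is the only non-standard decay mode."""
    g = width_lfv(y_ab, y_ba)
    return g / (g + GAMMA_SM)


def coupling_for_br(br: float) -> float:
    """sqrt(|Y_ab|^2 + |Y_ba|^2) that yields the requested branching ratio."""
    gamma = br / (1 - br) * GAMMA_SM
    return math.sqrt(8 * math.pi * gamma / M_H)


if __name__ == "__main__":
    y = M_TAU / V_EW  # sqrt(|Y_taumu|^2 + |Y_mutau|^2) = m_tau / v
    print(f"BR(h -> tau mu) = {br_lfv(y, 0.0):.3f}")
    print(f"coupling for BR = 10%: {coupling_for_br(0.10):.2e}")
```

For √(|Y_τµ|² + |Y_µτ|²) = m_τ/v the branching ratio is about 6%, close to the SM h → ττ branching ratio, and a 10% branching ratio requires √(|Y_τµ|² + |Y_µτ|²) ≈ 9.6 × 10⁻³.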
From Fig. 6 we see that given current bounds from τ → µγ and τ → eγ, branching fractions for h → τµ or h → τe in the neighborhood of 10% are allowed. This is well within the reach of the LHC, as we shall show in Sec. V. The allowed sizes of these two decay widths are comparable to the sizes of decay widths into nonstandard decay channels (such as the invisible decay width) that are allowed by global fits [47]. If there is no significant negative contribution to Higgs production through gluon fusion, one has BR(h → invisible) ≲ 20%, while allowing for arbitrarily large modifications of gluon and photon couplings to the Higgs leads to the constraint BR(h → invisible) ≲ 65% [47]. These two bounds apply without change also to BR(h → τµ), BR(h → τe) and BR(h → eµ).
In contrast to decays involving a τ lepton, the branching ratio for h → eµ is extremely well constrained by µ → eγ, µ → 3e and µ → e conversion bounds, and is required to be below BR(h → eµ) ≲ 2 × 10⁻⁸, well beyond the reach of the LHC.

IV. HADRONIC FLAVOR VIOLATING DECAYS OF THE HIGGS
We next consider flavor violating decays of the Higgs to quarks. We first discuss two-body decays to light quarks, h → b̄d, b̄s, s̄d, c̄u, and then turn to FV three-body decays mediated by an off-shell top quark.

Flavor violating Higgs couplings to light quarks induce meson-antimeson mixing through tree-level Higgs exchange. Integrating out the Higgs yields the effective weak Hamiltonian, which for B_d − B̄_d mixing is

$$\mathcal{H}_{\rm eff} = C_2^{bd}\,(\bar b_R d_L)^2 + \tilde C_2^{bd}\,(\bar b_L d_R)^2 + C_4^{bd}\,(\bar b_R d_L)(\bar b_L d_R).$$

Here we use the same notation for the Wilson coefficients as in [48] and display only nonzero contributions, which are

$$C_2^{bd} = -\frac{(Y^*_{db})^2}{2m_h^2}, \qquad \tilde C_2^{bd} = -\frac{(Y_{bd})^2}{2m_h^2}, \qquad C_4^{bd} = -\frac{Y_{bd}\,Y^*_{db}}{m_h^2}.$$

The results for B_s − B̄_s, K⁰ − K̄⁰ and D⁰ − D̄⁰ mixing are obtained in the same way with the obvious quark flavor replacements. We can now translate the bounds on the above Wilson coefficients obtained in [48] into constraints on the combinations of flavor violating Higgs couplings as summarized in Table II. We see that all Yukawa couplings involving only u, d, s, c, or b quarks have to be tiny. The weakest constraints are those in the b-s sector, where flavor violating Yukawa couplings ≲ 10⁻³ are still allowed. This would correspond to BR(h → b̄s) ∼ 2 × 10⁻³, which is still far too small to be observed at the LHC because of the large QCD backgrounds.

B. Higgs decays through off-shell top and top decays to Higgs
Among the flavor violating Higgs couplings to quarks, the most promising place for new physics to hide are processes involving top quarks, such as the 3-body decay h → (t* → Wb)q. Here, q denotes either a charm quark or an up quark. The corresponding FV Yukawa couplings contribute at one loop to D − D̄ mixing through diagrams of the form of Fig. 7 (b).
The corresponding Wilson coefficients in the effective Hamiltonian (28) are given by 1-loop functions of x_tH ≡ m_t²/m_h². Note that now also the operators Q_1^{uc}, Q_5^{uc} and Q̃_1^{uc} (in the notation of [48]) have non-zero Wilson coefficients. By requiring that each individual operator is consistent with its D − D̄ mixing constraint, we derive the limits shown in the last part of Table II. The constraints are much weaker than those on FV Higgs couplings involving only light quarks.
Strong constraints on Y_qt and Y_tq are also obtained from the non-observation of anomalous single top production. The flavor violating chromomagnetic operators are generated through loop diagrams similar to Fig. 1, but with leptons replaced by quarks and the photon replaced by a gluon. Here g_s is the strong coupling constant, λ^a are the Gell-Mann matrices, G^a_µν is the gluon field strength tensor, and κ_tqg,L, κ_tqg,R are dimensionless effective coupling constants which depend on Y_qt and Y_tq through the loop function F given in Eq. (A3). The expression for κ_tqg,R is obtained from the one for κ_tqg,L by replacing Y*_tq → Y_qt and Y_tt → Y*_tt in F. Note that in (35) we have assumed an EFT description with an on-shell gluon. Since m_h ∼ m_t this is only approximate, but we have checked that varying q² ∈ [0, m_t²] changes the bounds on Y_tq, Y_qt only by ∼ 10%. We have also made the approximation m_q → 0, which is an even better approximation. Limits on κ_tqg,L, κ_tqg,R have been derived by the CDF and DØ collaborations [49,51] and most recently by ATLAS [52]. In the notation of [52], we have $|\kappa_{tgf}|/\Lambda \equiv \sqrt{|\kappa_{tqg,L}|^2 + |\kappa_{tqg,R}|^2}/(\sqrt{2}\,m_h)$. From these limits we obtain constraints on Y_tq and Y_qt (where q denotes a charm or up quark).

Figure 8: The light yellow region shows a recent limit on t → hc (or hu) from an LHC multi-lepton search [50].
We now translate these bounds into constraints on the h → (t* → Wb)q decay width, which is given by (setting m_b = m_q = 0) where V_tb ≃ 1 is a CKM matrix element. The branching ratio for h → t*c can be as large as O(10⁻³), and the one for h → t*u can be a few × 10⁻⁴, as shown in Fig. 8.
If the decay h → (t* → Wb)c is non-negligible, so is the related non-standard top quark decay mode t → hc, the rate for which is given by (neglecting the charm mass)

$$\Gamma(t\to hc) = \frac{m_t}{32\pi}\left(|Y_{tc}|^2 + |Y_{ct}|^2\right)\left(1 - \frac{m_h^2}{m_t^2}\right)^2.$$

Branching ratios for t → hc of several tens of per cent are perfectly viable and can be searched for, e.g., in the multi-lepton or t → b b̄ c channels. In fact, the strongest constraints on Higgs couplings to tc already come from a CMS multi-lepton search, which was recast in [50] to search for t → hc, giving a bound of 2.7% on the branching fraction of a top into a Higgs and a charm or up quark. This yields a limit of $\sqrt{|Y_{ti}|^2 + |Y_{it}|^2} < 0.34$ for i = u or c (see Fig. 8).
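The t → hq width can be inverted to turn a branching-ratio limit into a coupling bound. The sketch below uses Γ(t → hq) = m_t (|Y_tq|² + |Y_qt|²)(1 − m_h²/m_t²)²/(32π) for a massless light quark and assumes m_t = 173 GeV and an SM top width of 1.35 GeV; both values are inputs of this sketch, not numbers taken from the text or from [50].

```python
import math

M_T = 173.0        # top mass [GeV] (assumed input)
M_H = 125.0        # Higgs mass [GeV]
GAMMA_T_SM = 1.35  # SM top width [GeV] (assumed input)


def width_t_hq(y_tq: complex, y_qt: complex) -> float:
    """Gamma(t -> h q) for a massless light quark q, in GeV."""
    phase_space = (1 - M_H**2 / M_T**2)**2
    return M_T / (32 * math.pi) * (abs(y_tq)**2 + abs(y_qt)**2) * phase_space


def coupling_bound(br_limit: float) -> float:
    """Largest sqrt(|Y_tq|^2 + |Y_qt|^2) compatible with BR(t -> h q) < br_limit."""
    # BR = Gamma_hq / (Gamma_SM + Gamma_hq)  =>  Gamma_hq = BR / (1 - BR) * Gamma_SM
    gamma_max = br_limit / (1 - br_limit) * GAMMA_T_SM
    unit = width_t_hq(1.0, 0.0)  # width per unit |Y_tq|^2 + |Y_qt|^2
    return math.sqrt(gamma_max / unit)


if __name__ == "__main__":
    print(f"sqrt(|Y_tq|^2 + |Y_qt|^2) < {coupling_bound(0.027):.2f}")
```

With these inputs coupling_bound(0.027) ≈ 0.31, slightly below the 0.34 quoted above; the exact number depends on the values assumed for m_t and Γ_t.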
We have also calculated the branching ratios for the loop-induced processes t → qγ, t → qg and t → qZ (q = u, c), which are in principle sensitive to |Y qt | and |Y tq |, but have found that even for |Y qt |, |Y tq | ∼ O(1) the current experimental bounds are satisfied [53].
In the above we have assumed that the weak phases of Y_ut and Y_tu are negligibly small.
Otherwise an unacceptably large contribution to the neutron EDM is generated at 1-loop level, with top and Higgs running in the loop. Using Eq. (25) with the replacement m_τ → m_t and the corresponding replacement of the Yukawa couplings, we obtain a bound on Im(Y_ut Y_tu) which is much more stringent than the bounds on the absolute values of the same FV Yukawa couplings. In contrast, the bounds from charm running in the loop, |Im(Y_uc Y_cu)| < 1.6 × 10⁻⁷, and from d-quark EDMs generated by the b-quark and s-quark running in the loop, |Im(Y_db Y_bd)| < 6.4 × 10⁻⁸ and |Im(Y_ds Y_sd)| < 1.2 × 10⁻⁶, respectively, are less stringent than the bounds from meson mixing, Table II.

V. SEARCHING FOR FLAVOR VIOLATING HIGGS DECAYS AT THE LHC
We next discuss possible search strategies for flavor violating Higgs decays at the LHC, focusing on the h → τµ and h → τe decays. As shown in Fig. 6, these are among the least constrained of the couplings discussed in this paper, with a potential to modify the Higgs branching fractions significantly. They are sensitive to new particles with flavor violating couplings or to secondary mechanisms of electroweak symmetry breaking such as additional Higgs doublets, and are thus good probes of new physics. Furthermore, they are also interesting final states as far as the potential for searches at the LHC is concerned.
The decay h → τ µ is quite similar to the standard model h → τ τ decay with one of the tau leptons decaying to a muon. This implies that existing SM Higgs searches, with only small or no modifications at all, can already be used to place bounds on the flavor violating decay. We thus first extract limits on h → τ µ and h → τ e decays from an existing h → τ τ search in ATLAS. We then discuss how modifications to the τ τ search can lead to significantly improved sensitivity to flavor violating Higgs decays.
A. Extracting a bound on Higgs decays to τµ and τe

We use the existing ATLAS search for h → ττ in the fully leptonic channel [55] to place bounds on the h → τµ and h → τe branching fractions. The reason we use fully leptonic events is that we can simulate the detector response to them more accurately than for events involving hadronic taus. It should, however, be noted that in the SM h → ττ search in ATLAS, semi-hadronic events are about as sensitive as fully leptonic ones [55], and in CMS the semi-hadronic mode provides even stronger limits [27]. The analysis in [55] uses the collinear approximation to reconstruct the ττ invariant mass, i.e. it is assumed that the neutrinos and the charged lepton emitted in tau decay are collinear. This approach is less optimized for h → ττ than the maximum-likelihood method employed by CMS [56], but it is more model independent, so that a substantial fraction of h → τµ or h → τe decays would pass the cuts. For simplicity we only use the ATLAS cuts optimized for Higgs production in vector boson fusion (VBF), since this channel provides the best sensitivity [55].
To derive limits we have generated 50,000 pp → 2j + (h → τ µ) Monte Carlo events using MadGraph 5 v1.4.6 [57] for parton level event generation, Pythia 6.4 for parton showering and hadronization, and PGS [58] as a fast detector simulation. Combining the ATLAS lepton triggers and off-line cuts from [55], we select opposite sign dilepton events satisfying any of the following requirements: a muon pair with p T > 15 GeV for the leading muon and p T > 10 GeV for the subleading one, an electron pair with both p T > 15 GeV, or an electron and a muon with p T above 15 and 10 GeV, respectively. Electrons (muons) are accepted only if their pseudorapidity is |η| < 2.47 (2.5). We require the invariant mass of the lepton pair to be 30 GeV < m ll < 100 GeV for eµ pairs, or 30 GeV < m ll < 75 GeV for same flavor pairs. The missing p T is required to be above 20 (40) GeV for eµ events (ee or µµ events).
The azimuthal separation between the two leptons is required to be 0.5 < ∆φ ll < 2.5.
Additional cuts are placed with the goal of enriching the event sample in VBF events: at least two jets are required, with p_T above 40 GeV for the leading jet and above 25 GeV for the subleading jet, a rapidity difference between the two leading jets of |∆η| > 3, and a dijet invariant mass m_jj > 350 GeV. We veto events with an additional jet with p_T > 25 GeV and |η| < 2.4 in the pseudorapidity region between the two leading jets.

Figure 9: Background rates and h → τµ, h → ττ signal rates in the ATLAS search for fully leptonic h → ττ decays, optimized for Higgs production in vector boson fusion. The backgrounds expected by ATLAS [55] are shown in yellow, with grey bands for the systematic uncertainty. Our estimates for the τµ signal at $\sqrt{Y_{\tau\mu}^2 + Y_{\mu\tau}^2} = m_\tau/v$ (red) and the SM h → ττ signal (black), which we include for reference, are scaled by a factor 5 for illustrative purposes only.
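The event selection described above can be summarized as a pair of predicate functions. This is a hypothetical sketch: the Event, Lepton and Jet classes are illustrative containers (not part of any ATLAS, MadGraph or PGS interface), the dilepton and dijet invariant masses are taken as precomputed inputs, and pseudorapidity is used in place of rapidity for the jet separation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Lepton:
    flavor: str   # "e" or "mu"
    charge: int   # +1 or -1
    pt: float     # GeV
    eta: float


@dataclass
class Jet:
    pt: float     # GeV
    eta: float


@dataclass
class Event:
    leptons: List[Lepton]  # the two selected leptons
    jets: List[Jet]        # sorted by descending pT
    met: float             # missing pT [GeV]
    m_ll: float            # dilepton invariant mass [GeV]
    dphi_ll: float         # azimuthal separation of the two leptons
    m_jj: float            # invariant mass of the two leading jets [GeV]


ETA_MAX = {"e": 2.47, "mu": 2.5}


def passes_dilepton(ev: Event) -> bool:
    """Trigger-like pT thresholds, opposite sign, acceptance, m_ll, MET and dphi cuts."""
    if len(ev.leptons) != 2:
        return False
    l1, l2 = sorted(ev.leptons, key=lambda lep: lep.pt, reverse=True)
    if l1.charge * l2.charge >= 0:  # require opposite sign
        return False
    if any(abs(lep.eta) >= ETA_MAX[lep.flavor] for lep in (l1, l2)):
        return False
    flavors = {l1.flavor, l2.flavor}
    if flavors == {"mu"}:
        pt_ok = l1.pt > 15 and l2.pt > 10
    elif flavors == {"e"}:
        pt_ok = l1.pt > 15 and l2.pt > 15
    else:  # e mu: electron above 15 GeV, muon above 10 GeV
        e = l1 if l1.flavor == "e" else l2
        mu = l2 if e is l1 else l1
        pt_ok = e.pt > 15 and mu.pt > 10
    if not pt_ok:
        return False
    same_flavor = len(flavors) == 1
    m_ll_max = 75 if same_flavor else 100
    met_min = 40 if same_flavor else 20
    return 30 < ev.m_ll < m_ll_max and ev.met > met_min and 0.5 < ev.dphi_ll < 2.5


def passes_vbf(ev: Event) -> bool:
    """Two tagging jets plus a central jet veto."""
    if len(ev.jets) < 2:
        return False
    j1, j2 = ev.jets[0], ev.jets[1]
    if j1.pt <= 40 or j2.pt <= 25:
        return False
    if abs(j1.eta - j2.eta) <= 3 or ev.m_jj <= 350:
        return False
    lo, hi = sorted((j1.eta, j2.eta))
    # veto any extra jet with pT > 25 GeV and |eta| < 2.4 between the tagging jets
    return not any(j.pt > 25 and abs(j.eta) < 2.4 and lo < j.eta < hi for j in ev.jets[2:])


def selected(ev: Event) -> bool:
    return passes_dilepton(ev) and passes_vbf(ev)
```

The x_{1,2} requirement from the collinear mass reconstruction is applied separately, after the invariant mass has been computed.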
The reconstructed invariant mass is calculated using the collinear approximation, in which all invisible particles are assumed to be collinear with either of the two leptons. The fractions of the parent τ's momenta carried by the charged leptons are denoted by x_1 and x_2. To be able to compare with ATLAS data from the h → ττ search, we compute x_1 and x_2 assuming two neutrinos in the final state, even though h → τµ yields only one. x_1 and x_2 are then obtained as the solutions of the transverse momentum equation p_miss,T = (1/x_1 − 1) p_1,T + (1/x_2 − 1) p_2,T, where p_1,2,T are the transverse momenta of the charged leptons. Following [55], we require 0.1 < x_1,2 < 1, which removes less than a per cent of h → ττ events, but nearly 60% of our h → τµ events. Thus, relaxing this cut would enhance the sensitivity to h → τµ decays as long as it does not introduce large backgrounds. Nonetheless, we are still able to use the current search for h → ττ to produce an interesting bound on BR(h → τµ). In Fig. 9 we show the background distribution for the collinear mass along with the expected shape of an LFV h → τµ signal (scaled by a factor five for illustrative purposes only), and we compare to the observed data. The background expectation is taken from [55].
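In the collinear approximation the visible fractions follow from a 2 × 2 linear system. The sketch below solves p_miss,T = (1/x_1 − 1) p_1,T + (1/x_2 − 1) p_2,T, written in terms of the charged-lepton momenta, by Cramer's rule and returns the standard collinear mass m_coll = m_ℓℓ/√(x_1 x_2). The four-vector layout (E, p_x, p_y, p_z) and the function names are our own choices for illustration.

```python
import math
from typing import Tuple

Vec4 = Tuple[float, float, float, float]  # (E, px, py, pz) in GeV


def inv_mass(a: Vec4, b: Vec4) -> float:
    """Invariant mass of the sum of two four-vectors."""
    e, px, py, pz = (a[i] + b[i] for i in range(4))
    return math.sqrt(max(e * e - px * px - py * py - pz * pz, 0.0))


def collinear_fractions(l1: Vec4, l2: Vec4, met: Tuple[float, float]) -> Tuple[float, float]:
    """Solve met = (1/x1 - 1) p1T + (1/x2 - 1) p2T for the visible fractions x1, x2."""
    det = l1[1] * l2[2] - l1[2] * l2[1]
    if abs(det) < 1e-9:
        raise ValueError("leptons are (anti)parallel in the transverse plane")
    a1 = (met[0] * l2[2] - met[1] * l2[1]) / det  # a_i = 1/x_i - 1
    a2 = (l1[1] * met[1] - l1[2] * met[0]) / det
    if a1 <= -1 or a2 <= -1:
        raise ValueError("unphysical solution (x <= 0)")
    return 1.0 / (1.0 + a1), 1.0 / (1.0 + a2)


def passes_x_cut(x1: float, x2: float) -> bool:
    """Requirement 0.1 < x_{1,2} < 1 used in the analysis."""
    return 0.1 < x1 < 1 and 0.1 < x2 < 1


def collinear_mass(l1: Vec4, l2: Vec4, met: Tuple[float, float]) -> float:
    """m_coll = m_ll / sqrt(x1 x2)."""
    x1, x2 = collinear_fractions(l1, l2, met)
    return inv_mass(l1, l2) / math.sqrt(x1 * x2)
```

Because a prompt muon from h → τµ carries the full momentum of its side of the decay, the two-neutrino hypothesis often returns x values outside (0.1, 1), which is why the x cut removes so many h → τµ events.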
The backgrounds and the data in Fig. 9 include events for all three combinations of lepton flavor (even though our τ µ signal does not induce ee events) because only this information is available from ATLAS. For validation purposes, we have also simulated SM h → τ τ events, and comparing the rate and shape to Ref. [55] we find agreement to within 20%.
The τµ signal is predominantly concentrated in the 120-160 GeV bin, so that the expected and observed limits on the flavor violating Yukawa couplings can be derived from a simple single-bin analysis. If we denote the number of expected background events by B = 4.7, the number of expected signal events for a given set of Yukawa couplings by S, and the number of observed events by O = 2, the expected (observed) one-sided 95% C.L. frequentist limit on S is defined by the requirement that the probability to observe ≤ B (≤ O) events is 5%. The relevant probability distribution of the data here is a Poisson distribution with mean B + S. We can also include the systematic uncertainty in the 120-160 GeV bin, ∆_sys = ±0.99, in a conservative way by instead using a Poisson distribution with mean B + S − ∆_sys. Assuming the Higgs is produced with the Standard Model rates, this procedure leads to the bound on BR(h → τµ) and the analogous bound on BR(h → τe) shown in Table III (see also Figure 6).
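The single-bin limit can be reproduced with a Poisson cumulative distribution and a bisection in S. This sketch implements the prescription in the text: the limit on S is where the probability of observing ≤ n events drops to 5%, with n = O = 2 for the observed limit and, in our reading, n = 4 (B = 4.7 rounded down) for the expected one, optionally using the conservative mean B + S − ∆_sys. Converting S into a branching ratio additionally requires the signal efficiency and production rate, which are not reproduced here.

```python
import math


def poisson_cdf(n: int, mu: float) -> float:
    """P(N <= n) for a Poisson distribution with mean mu."""
    term = math.exp(-mu)
    total = term
    for k in range(1, n + 1):
        term *= mu / k
        total += term
    return total


def signal_limit(n_obs: int, background: float, delta_sys: float = 0.0, cl: float = 0.95) -> float:
    """Smallest S with P(N <= n_obs | B + S - delta_sys) = 1 - cl, found by bisection."""
    target = 1.0 - cl
    lo, hi = 0.0, 100.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        mean = max(background + mid - delta_sys, 1e-12)
        if poisson_cdf(n_obs, mean) > target:
            lo = mid  # observing so few events is still too likely: S can be larger
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    print(f"observed: S < {signal_limit(2, 4.7):.2f}")
    print(f"observed, with systematics: S < {signal_limit(2, 4.7, delta_sys=0.99):.2f}")
    print(f"expected: S < {signal_limit(4, 4.7):.2f}")
```

With B = 4.7 and O = 2 this gives S ≲ 1.6 signal events without and S ≲ 2.6 with the systematic shift, while the expected limit is S ≲ 4.5; the observed limit is stronger because fewer events were seen than expected.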

B. Comparison of h → τ µ to h → τ τ
We now discuss the experimental differences and similarities between h → ττ and h → τµ decays to determine an optimized search strategy for the latter. We focus here on h → τ_had τ_µ, where τ_µ denotes a τ that decays into a muon and two neutrinos and τ_had denotes a τ decaying hadronically. This channel is actively searched for, both at ATLAS [55] and at CMS [27], and is the most sensitive channel in the CMS h → ττ search. (In ATLAS, fully leptonic τ events provide similar sensitivity to semi-hadronic ones.) It will also be the channel for which we devise a dedicated search in the next subsection.
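There are a few notable differences between the h → τ_had τ_µ and h → τ_had µ decay channels:

• Rates. The h → τ_had τ_µ signal requires one of the two τ leptons to decay to a muon, BR(τ → µνν̄) ≈ 17%, with a combinatorial factor of 2 because either τ can be the one that decays leptonically, whereas in h → τ_had µ the muon is produced directly. For (Y²_τµ + Y²_µτ)^{1/2} ∼ Y_ττ the signal for h → τ_had µ is thus a factor of ∼ 1/(2 × 0.17) ≈ 2.9 larger.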
• Lepton Flavor. The flavor violating decays can lead to different rates for muons and electrons in the final state, whereas τ τ decays lead to equal µ and e rates. Thus, if the various lepton flavor combinations were studied separately in the h → τ τ analyses, stronger bounds on flavor violating decays could be inferred.
• Kinematics and Efficiencies. In the Higgs rest frame, the muon in h → τ had τ µ decays carries an average energy ∼ m h /6, while in h → τ had µ decays it carries ∼ m h /2. Furthermore, in h → τ had µ events the missing energy is roughly aligned with the hadronic τ. As a result, the two channels can have different efficiencies for the same cuts. For example, in the VBF analysis described below (mimicking [27]) the efficiency for h → τ had τ µ is a factor of ∼ 1.8 lower than for h → τ had µ events, mostly because many of the muons in the h → τ had τ µ sample fail the p T > 17 GeV requirement.
• Mass reconstruction. The LHC collaborations use highly optimized procedures for reconstructing the τ had τ µ invariant mass. ATLAS uses the Missing Mass Calculator (MMC) from [56], while CMS uses an in-house maximum likelihood analysis [27].
These procedures use p miss,T and the 3-momenta of the muon and the τ jet as input and estimate the neutrino momenta by assuming typical τ decay kinematics. For h → τ τ events, the MMC procedure returns an invariant mass with high efficiency (∼ 97%) and gives a Higgs mass resolution of ∼ 20%. If the event is not from h → τ τ but instead from h → τ had µ, then i) the efficiency will be significantly lower since the kinematics can be completely inconsistent with a τ τ event, and ii) the reconstructed Higgs mass will be significantly higher as the MMC will assume that the hard muon is accompanied by two roughly collinear and hard neutrinos. This illustrates that a mass reconstruction procedure designed for the specific final state under consideration is mandatory to obtain the best possible sensitivity.
• Backgrounds. The backgrounds for h → τ had τ µ and h → τ had µ events are similar, but because of the different invariant mass reconstruction techniques, the reconstructed background spectra will typically be harder for a h → τ had τ µ analysis, which assumes three neutrinos in the final state, than for a h → τ had µ analysis which assumes only one. This implies, for instance, that the peak from the Z → τ τ background will appear at a τ had τ µ invariant mass around 90 GeV in a search for h → τ had τ µ , but well below (and thus further away from the signal peak) in a dedicated h → τ had µ analysis.
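The rate and kinematics estimates in the bullets above follow from simple arithmetic, sketched below. Both inputs are our own assumptions rather than numbers quoted in this paper: BR(τ → µνν̄) ≈ 0.174, and, for the muon energy, the standard energy-fraction spectrum dN/dx = (5 − 9x² + 4x³)/3, with x = E_µ/E_τ, for an unpolarized, ultra-relativistic τ decaying to massless leptons. Under these assumptions the enhancement factor comes out near 2.9, and the mean muon energy in h → τ had τ µ is about 0.35 × m h /2 ≈ m h /5.7, consistent with the rough ∼ m h /6 quoted above.

```python
BR_TAU_TO_MU = 0.174  # approximate BR(tau -> mu nu nu); our input, not from the text


def rate_ratio(br_tau_to_mu: float = BR_TAU_TO_MU) -> float:
    """N(h -> tau_had mu) / N(h -> tau_had tau_mu) for equal total widths,
    i.e. (Y_taumu^2 + Y_mutau^2)^(1/2) ~ Y_tautau. BR(tau -> had) multiplies
    both channels and cancels; the factor 2 counts which of the two taus in
    h -> tau tau is the one that decays to the muon."""
    return 1.0 / (2.0 * br_tau_to_mu)


def mean_muon_energy_fraction(n_steps: int = 100_000) -> float:
    """<x>, x = E_mu / E_tau, for an unpolarized ultra-relativistic tau, using
    dN/dx = (5 - 9x^2 + 4x^3)/3 on [0, 1] integrated with the midpoint rule."""
    h = 1.0 / n_steps
    total = 0.0
    for k in range(n_steps):
        x = (k + 0.5) * h
        total += x * (5.0 - 9.0 * x ** 2 + 4.0 * x ** 3) / 3.0 * h
    return total


M_H = 125.0  # GeV
print(f"rate enhancement ~ {rate_ratio():.1f}")
print(f"<E_mu> in h -> tau_had tau_mu: ~ {mean_muon_energy_fraction() * M_H / 2:.0f} GeV")
print(f"E_mu in h -> tau_had mu (Higgs rest frame): ~ {M_H / 2:.1f} GeV")
```

The τ mass is neglected throughout, which is adequate here because m_τ ≪ m_h.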
These considerations show that the LHC is potentially more sensitive to flavor violating h → τ had µ decays than to the SM h → τ τ channel. We now discuss a possible strategy for a tailored h → τ had µ analysis.

C. A dedicated h → τ µ analysis
We now investigate the potential of a dedicated h → τ had µ analysis which closely follows the CMS search for h → τ had τ µ [27].⁵ The most important difference from that analysis is the algorithm for reconstructing the τ µ invariant mass. In particular, since the τ had µ final state contains only one neutrino (from the hadronic τ decay), this mass reconstruction can in principle always be done exactly (i.e. the neutrino momentum can be determined) up to a two-fold ambiguity.
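Because the τ had µ final state contains a single neutrino, its longitudinal momentum can be solved for once its transverse momentum is identified with p miss,T and the system of τ jet plus neutrino is required to have mass m_τ. The sketch below is our illustration of such a reconstruction, not necessarily the exact procedure behind Fig. 10. It assumes a massless neutrino and the on-shell τ constraint, and it clamps a negative discriminant (which detector resolution can produce) to zero; the class and function names are hypothetical. Squaring the constraint gives a quadratic in p ν,z, whose two roots are the two-fold ambiguity mentioned above.

```python
import math
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

M_TAU = 1.777  # tau mass in GeV


@dataclass(frozen=True)
class FourVector:
    e: float
    px: float
    py: float
    pz: float

    def __add__(self, other: "FourVector") -> "FourVector":
        return FourVector(self.e + other.e, self.px + other.px,
                          self.py + other.py, self.pz + other.pz)

    def mass(self) -> float:
        m2 = self.e ** 2 - self.px ** 2 - self.py ** 2 - self.pz ** 2
        return math.sqrt(max(m2, 0.0))


def neutrino_pz_solutions(vis: FourVector, met_x: float, met_y: float,
                          m_tau: float = M_TAU) -> Tuple[float, float]:
    """Both roots p_z of (p_vis + p_nu)^2 = m_tau^2 for a massless neutrino
    whose transverse momentum is set equal to the missing p_T."""
    m_vis2 = max(vis.e ** 2 - vis.px ** 2 - vis.py ** 2 - vis.pz ** 2, 0.0)
    met2 = met_x ** 2 + met_y ** 2
    # Constraint: E_vis * E_nu = A + pz_vis * pz_nu; squaring yields a quadratic.
    A = 0.5 * (m_tau ** 2 - m_vis2) + vis.px * met_x + vis.py * met_y
    a = vis.e ** 2 - vis.pz ** 2       # equals m_vis^2 + pT_vis^2 > 0
    disc = A ** 2 - a * met2
    root = math.sqrt(max(disc, 0.0))   # clamp: mismeasurement can make disc < 0
    return ((A * vis.pz - vis.e * root) / a,
            (A * vis.pz + vis.e * root) / a)


def m_tau_mu(muon: FourVector, vis: FourVector, met_x: float, met_y: float,
             pick: Callable[[Sequence[float]], float] = min) -> float:
    """Invariant mass of muon + tau jet + reconstructed neutrino."""
    pz = pick(neutrino_pz_solutions(vis, met_x, met_y))
    nu = FourVector(math.sqrt(met_x ** 2 + met_y ** 2 + pz ** 2), met_x, met_y, pz)
    return (muon + vis + nu).mass()
```

By default `m_tau_mu` picks the smaller of the two roots, mirroring the choice described for Fig. 10; passing `pick=max` gives the alternative.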
An important background for h → τ µ is Z + jets, where either the Z decays into τ⁺τ⁻ and one of the τ's decays further into a muon, or the Z decays into µ⁺µ⁻ and one of the jets fakes a τ. Another important background is W + jets, followed by W → µν µ and a jet faking a τ. We neglect the small t t̄ background, where a final state τ can come from a W decay or be faked by a jet, and a muon can originate from a W decay or from a leptonic τ decay. We also do not consider backgrounds from QCD multijet production because making reliable predictions for these events requires full detector simulations. Based on the CMS h → τ µ τ had search [27], we expect them to be about as large as the W + jets background in the invariant mass region around 125 GeV. Table 2. We also normalize the h → τ µ signal using the same scaling factor as for the SM h → τ had τ µ events.
In the analysis we require exactly one muon with p T > 17 GeV and |η| < 2.1 in the final state and exactly one jet tagged as a hadronic τ decay with p T > 20 GeV and |η| < 2.3.
The muon and the τ are required to have opposite charge. In [27], it was found that the best signal-to-background ratio is achieved in the events where the Higgs boson was produced through vector boson fusion (VBF), and we confirm this in our own simulations.
To enrich the data sample in VBF events, we consider only events with a pair of jets j 1 , j 2 satisfying |∆η| > 4.0, η 1 η 2 < 0, m jj > 400 GeV and no other jets with p T > 30 GeV in the pseudorapidity region between j 1 and j 2 . Here, ∆η = η 1 − η 2 is the pseudorapidity difference between the two jets and m jj is the invariant mass of the jet pair. Non-τ jets are included in the analysis as long as their p T is above 30 GeV and their pseudorapidity is |η| < 4.7. In the CMS analysis [27], the transverse mass of the muon and the missing energy is restricted to be below 40 GeV in order to suppress the W + jets background. This works because in W + jets events the muon and the neutrino from W → µν µ tend to be more back-to-back than in h → τ µ τ had , where both τ's contribute to the missing energy. In h → τ had µ, however, the muon and the missing energy also tend to be back-to-back, so that the m T (µ, p miss,T ) cut also removes a large fraction of signal events.
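The object selection and VBF cuts described above can be collected into a single predicate. The sketch below is a simplified illustration with hypothetical data structures, not the CMS implementation. It assumes that the tagging jets j 1 , j 2 are the two highest-p T non-τ jets, treats jets as massless when computing m jj , and makes the m T (µ, p miss,T ) < 40 GeV cut optional so that both selections shown in Fig. 10 can be mimicked.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Particle:
    """Hypothetical reconstructed object: p_T (GeV), eta, phi, electric charge."""
    pt: float
    eta: float
    phi: float
    charge: int = 0


@dataclass
class Event:
    muons: List[Particle]
    taus: List[Particle]   # jets tagged as hadronic tau decays
    jets: List[Particle]   # all other jets
    met: float             # magnitude of the missing p_T (GeV)
    met_phi: float


def transverse_mass(pt1: float, phi1: float, pt2: float, phi2: float) -> float:
    """m_T of two massless transverse vectors."""
    return math.sqrt(max(2.0 * pt1 * pt2 * (1.0 - math.cos(phi1 - phi2)), 0.0))


def dijet_mass(j1: Particle, j2: Particle) -> float:
    """Invariant mass of two jets treated as massless (an approximation)."""
    return math.sqrt(2.0 * j1.pt * j2.pt
                     * (math.cosh(j1.eta - j2.eta) - math.cos(j1.phi - j2.phi)))


def passes_selection(ev: Event, apply_mt_cut: bool = True) -> bool:
    muons = [m for m in ev.muons if m.pt > 17.0 and abs(m.eta) < 2.1]
    taus = [t for t in ev.taus if t.pt > 20.0 and abs(t.eta) < 2.3]
    if len(muons) != 1 or len(taus) != 1:
        return False
    mu, tau = muons[0], taus[0]
    if mu.charge * tau.charge >= 0:  # require opposite charges
        return False

    jets = sorted((j for j in ev.jets if j.pt > 30.0 and abs(j.eta) < 4.7),
                  key=lambda j: j.pt, reverse=True)
    if len(jets) < 2:
        return False
    j1, j2 = jets[0], jets[1]  # assumption: the two leading jets are the VBF tags
    if abs(j1.eta - j2.eta) <= 4.0 or j1.eta * j2.eta >= 0.0:
        return False
    if dijet_mass(j1, j2) <= 400.0:
        return False
    eta_lo, eta_hi = sorted((j1.eta, j2.eta))
    if any(eta_lo < j.eta < eta_hi for j in jets[2:]):  # central jet veto
        return False

    if apply_mt_cut and transverse_mass(mu.pt, mu.phi, ev.met, ev.met_phi) >= 40.0:
        return False
    return True


# Illustrative signal-like event: missing p_T along the tau jet, opposite the muon.
example = Event(
    muons=[Particle(40.0, 0.5, 0.0, charge=-1)],
    taus=[Particle(40.0, -0.3, math.pi, charge=+1)],
    jets=[Particle(60.0, 2.5, 1.0), Particle(50.0, -2.5, -1.0)],
    met=20.0, met_phi=math.pi,
)
print(passes_selection(example, apply_mt_cut=False), passes_selection(example))  # prints: True False
```

In the example event the missing momentum points along the τ jet, back-to-back with the muon, as expected for h → τ had µ. The event passes the VBF selection but has m T ≈ 57 GeV and is therefore removed by the transverse mass cut, which is the loss of signal discussed above.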
In light of this, we show in Fig. 10 the expected signal and background rates for h → τ µ as a function of the µ-τ invariant mass m τ µ, both with and without the transverse mass cut. In computing m τ µ for each event, we have used energy and momentum conservation to compute the z-component of the neutrino momentum p ν,z . There are two solutions to these equations, and we arbitrarily pick the smaller of the two. (We have checked that choosing the larger value for p ν,z yields a very similar plot. This is related to the fact that $m_\tau \ll m_h$, so that the τ's decay products are almost collinear.) As shown in the right panel of Fig. 10, dropping the transverse mass cut increases the W + jets background, but has the benefit of retaining more signal. The transverse mass distributions for signal and background are shown in Fig. 11. Assuming the W + jets and QCD backgrounds can be kept under control, relaxing this cut may be worthwhile.
Figure 11: The transverse mass distribution of the muon-missing energy system for the backgrounds and for the τ had µ signal.
In summary, Fig. 10 shows that for flavor violating Yukawa couplings well within the range allowed by low energy precision measurements, a spectacular signal can be expected in a dedicated search at the LHC. Such a search would cut deeply into the allowed parameter space of the flavor violating Higgs couplings to τ µ.

VI. CONCLUSIONS
The LHC experiments have recently discovered a Higgs-like resonance with a mass around 125 GeV. In this paper we have examined the constraints on potential flavor violating couplings of this resonance, assuming it is indeed a scalar boson. We have refined the indirect constraints on the flavor violating Yukawa couplings $Y_{ij}$ using results from rare decay searches, magnetic and electric dipole moment measurements, and the LHC. All constraints are summarized in Tables I and II and
keeping only the leading terms (so that only the first terms in (A1), (A2) contribute), the above expressions simplify to (13) if the diagonal Yukawa couplings are real. The simplified expressions for τ → eγ and µ → eγ (with a muon running in the loop) are obtained from (13) with trivial modifications, while the simplified expression for µ → eγ with a τ running in the loop is given in Eq. (16).
2. Two-loop expressions for τ → µγ, τ → eγ and µ → eγ
At two loops there are numerically important diagrams with a top quark or a W boson running in the loop, attached to the Higgs. Here we translate the results of [36] into our notation and adapt them to the case of τ → µγ. The diagrams with top and photon in the loops (see Fig. 12 top left) contribute as while the W-photon two-loop contribution is
The above operators are generated by integrating out the Higgs in the diagrams shown in Here, $Q_q$ is the charge of quark q, and $g^{(p)}_{RV}$, $g^{q}_{RV}$ is given by Eq. (A16) with the replacement $Y \to Y^\dagger$. The loop function in (A16) is where we have defined $\Delta \equiv z\, m_h^2 - xz\, m_j^2 - yz\, m_i^2 + (x + y)\, m_f^2 - xy\, q^2$ and $z = 1 - x - y$. Note that we subtract the value of the one-loop vertex correction at $q^2 = 0$, which gets absorbed into the wave function and mass renormalizations.
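The form of ∆ follows from the standard Feynman parametrization of the vertex loop. As a sketch, assuming that the Higgs propagator carries the Feynman parameter z, the two internal fermion propagators (mass m_f) carry x and y, and the external fermions are on shell with $p_j^2 = m_j^2$, $p_i^2 = m_i^2$ and $q = p_j - p_i$ (this momentum routing is our assumption), one finds:

```latex
\frac{1}{ABC} = 2\int_0^1 dx\,dy\,dz\;\frac{\delta(1-x-y-z)}{\left[xA + yB + zC\right]^3},
\qquad A = (k+p_j)^2 - m_f^2,\quad B = (k+p_i)^2 - m_f^2,\quad C = k^2 - m_h^2,

xA + yB + zC = \ell^2 - \Delta, \qquad \ell = k + x\,p_j + y\,p_i,

\Delta = (x\,p_j + y\,p_i)^2 - x\,p_j^2 - y\,p_i^2 + (x+y)\,m_f^2 + z\,m_h^2
       = z\,m_h^2 - xz\,m_j^2 - yz\,m_i^2 + (x+y)\,m_f^2 - xy\,q^2,
```

where the last step uses $2\,p_j \cdot p_i = m_j^2 + m_i^2 - q^2$ and $z = 1 - x - y$.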
The coefficients $c_{L,R}$ also receive two-loop contributions with either a top quark or a W boson running in the loop. If we evaluate these contributions at $q^2 = 0$, so that the expressions are exactly the same as in the previous subsection on µ → eγ decay, we find that both one-loop and two-loop contributions are subdominant (with the two-loop contribution the larger of the two).
They are roughly an order of magnitude smaller than the scalar contributions, so that the above approximations suffice for our purposes.
The scalar Wilson coefficients, generated from the first diagram in Fig. 5, are while the remaining Wilson coefficients are zero, $g^q_{LP} = g^q_{RP} = g^q_{LA} = g^q_{RA} = g^q_{LT} = g^q_{RT} = 0$. When computing the µ → e conversion rate in nuclei, care must be taken to account for the nuclear matrix elements $\langle N|\bar q q|N\rangle$, $\langle N|\bar q \gamma^\mu q|N\rangle$ and $\langle N|F_{\mu\nu}|N\rangle$ and for the overlap of the initial muon wave function and the final state electron wave function. We follow [65] and obtain Γ(µ → e conversion) = − e 16π 2 c L D +g The nucleon matrix elements $f^{(q,p)} \equiv \langle p|m_q \bar q q|p\rangle / m_p$ are calculated according to [66], but using an updated value for the nucleon sigma term $\Sigma_{\pi N} = 55$ MeV, with the same values for neutrons. In the above expressions, $m_q$ denotes a quark mass, $m_p$ is the proton mass, and $m_n$ is the neutron mass. The coefficients $D$, $V^{(p)}$, $S^{(p)}$, and $S^{(n)}$ are overlap integrals of the muon, electron and nuclear wave functions. They are tabulated for various target materials in [65]. The best limits are obtained from bounds on µ → e conversion on gold, $\Gamma(\mu \to e)_{\rm Au}/\Gamma_{\rm capture}^{\rm Au} < 7 \times 10^{-13}$ (90% CL) [68], for which in units of