1 Introduction

Precision measurements of low-energy flavor observables are complementary to the effort led by the LHC at the high-energy frontier to search for physics beyond the Standard Model (SM). Flavor processes are fundamental in this quest as they are sensitive to new-physics scales well beyond the reach of direct searches, providing information about the flavor structure beyond the SM. In particular, one of the most prominent classes of low-energy observables consists of decays with Lepton Flavor Violation (LFV), which are strictly forbidden by accidental symmetries in the SM. Therefore, these are clean probes of New Physics (see Ref. [1] for a recent review).

The LHC is currently leading the effort in the search for new-physics particles at the high-energy frontier. Besides performing direct searches for particles that might be light enough to be produced on-shell, the LHC can also be used to look for indirect effects from new particles that are too heavy to be produced in proton collisions [2]. The latter strategy is based on the study of the high-energy tails of kinematic distributions, and it has proven extremely useful to constrain flavor transitions through LHC data on the Drell–Yan processes \(pp\rightarrow \ell _i \ell _j\) and \(pp\rightarrow \ell _i \nu \) at high-\(p_T\), see Refs. [3, 4] and references therein. By exploiting the five quark flavors that are available in the proton, several Effective Field Theory (EFT) studies have been performed in the literature, deriving LHC bounds on specific semileptonic \(d=6\) operators that were compared to low-energy constraints [5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20]. Although LHC measurements are less accurate in most cases, the energy enhancement of the partonic cross sections induced by the effective operators can lead to LHC bounds that outperform the ones derived from low-energy processes for specific operators and quark/lepton flavors, as shown, for instance, for semileptonic LFV transitions in Ref. [19] and for charm-meson decays in Refs. [19, 20].

In this letter, we will demonstrate that Drell–Yan data can be used to set fully model-independent upper limits on LFV meson decays. In other words, we will derive upper limits on the branching fractions for these processes without making any assumptions regarding the choice of effective operators. We will only assume that the EFT approach is valid to describe the LHC data [21]. It will be fundamental for our derivation that Drell–Yan processes are inclusive, probing operators with all possible Lorentz structures and all possible combinations of the five quark flavors at tree level. This is in contrast to exclusive meson decays which are sensitive to specific quark-flavor transitions and to operators with selected Lorentz structures, depending on the quantum numbers of the mesons in the initial and final states, as depicted in Fig. 1.

We will illustrate the above procedure for the decays \(B_{(s)}\rightarrow \ell _i\ell _j\) and \(B_{(s)}\rightarrow M \ell _i\ell _j\), with \(i\ne j\), where M denotes a pseudoscalar or vector meson [22]. We will show that current LHC data is already sufficient to provide competitive bounds for specific channels compared to the direct limits from flavor experiments, which will be further improved with the high-luminosity phase of the LHC. We will also use LHC data to constrain decays that have not yet been searched for experimentally, such as \(D^0\rightarrow e\tau \) in the charm sector and various semileptonic decays of \(B_{(s)}\)-mesons. These results can provide a valuable guideline for low-energy experimental searches, indicating the range of branching fractions that are not yet constrained by LHC data.

Fig. 1 Illustration of the complementarity between di-lepton production at the LHC (left panel) and low-energy meson decays (right panel) to constrain semileptonic LFV interactions

The main caveat of our derivation is the validity of the EFT to describe LHC data, which must be verified in each case by comparing a posteriori the energy of LHC events (E) with the EFT scales (\(\Lambda \)) that can be probed for effective coefficients (\({\mathcal {C}}\)), within the range given by perturbativity [21]. If the experimental sensitivity is poor, it is only possible to probe large values of \({\mathcal {C}}/{\Lambda }^2\), which would either imply that \({\mathcal {C}}\) takes large values, or that the EFT cutoff does not satisfy the consistency condition \(E\ll \Lambda \), cf. Sect. 5. Another important aspect is the operator mixing induced by the Renormalization Group Evolution (RGE) of electroweak and Yukawa interactions, which spoils the tree-level correspondence displayed in Fig. 1. For the processes that we consider, we will show that these effects are suppressed by loop and CKM factors, thus being negligible. However, that is not the case in general, as we will briefly show for quarkonia decays at the end of Sect. 5.
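The a posteriori consistency check described above can be sketched numerically. The snippet below is only an illustration under stated assumptions: the function names, the perturbativity ceiling `c_max` (set here to \(4\pi \)) and the separation factor `margin` are hypothetical choices of this sketch, not values taken from the analysis.

```python
import math

def largest_consistent_cutoff(limit_c_over_lambda2, c_max=4.0 * math.pi):
    """Largest cutoff (GeV) compatible with a bound on |C|/Lambda^2 (GeV^-2).

    c_max is an ASSUMED perturbativity ceiling on |C|, not a value from the text.
    """
    if limit_c_over_lambda2 <= 0.0:
        raise ValueError("the bound on |C|/Lambda^2 must be positive")
    return math.sqrt(c_max / limit_c_over_lambda2)

def eft_is_consistent(event_energy, limit_c_over_lambda2,
                      margin=2.0, c_max=4.0 * math.pi):
    """True if the cutoff can exceed the event energy by the assumed `margin`."""
    cutoff = largest_consistent_cutoff(limit_c_over_lambda2, c_max)
    return cutoff >= margin * event_energy
```

A weak experimental bound (large `limit_c_over_lambda2`) lowers the largest admissible cutoff, which is how poor sensitivity translates into a failure of the condition \(E\ll \Lambda \).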

The remainder of this letter is organized as follows: In Sect. 2, we define our effective Lagrangian for LFV processes. In Sect. 3, we discuss the LFV bounds that can be derived from LHC data. In Sect. 4, we demonstrate that the latter can be used to constrain meson decays and we discuss the impact of one-loop operator mixing on our results. Our numerical results are presented in Sect. 5 and we briefly conclude in Sect. 6.

Table 1 Hermitian (left) and non-Hermitian (right) dimension \(d=6\) semileptonic operators in the SM EFT. Quark and lepton doublets are denoted by q and l, respectively, and the weak singlets read u, d and e. \(SU(2)_L\) indices are denoted by ab, with \(\varepsilon _{12}=-\varepsilon _{21}=+1\), and \(SU(3)_c\) indices are omitted. Flavor indices are also omitted in this Table

2 EFT approach for LFV

We start by defining our framework. Since we are interested in New Physics effects arising well above the electroweak scale, we consider an effective Lagrangian invariant under the \(SU(3)_c \times SU(2)_L \times U(1)_Y\) gauge symmetry, namely the SM EFT [23, 24], with operators up to dimension six,

$$\begin{aligned} {\mathcal {L}}_\textrm{eff} = \sum _a \dfrac{{\mathcal {C}}_a}{\Lambda ^2} {\mathcal {O}}_a\,, \end{aligned}$$
(1)

where \(\Lambda \) denotes the EFT cutoff. The effective coefficients are generically denoted by \({\mathcal {C}}_a\) and the effective operators \({\mathcal {O}}_a\) can be of several types. The most relevant ones for both low- and high-energy LFV searches are the semileptonic ones (\(\psi ^4\)), which are collected in Table 1. We adopt the same notation of Refs. [25,26,27] and denote flavor indices by Latin symbols.Footnote 1 For Hermitian operators, we further impose that the Wilson coefficient \([{\mathcal {C}}]_{ijkl}\) must have \(i \le j\) to avoid redundancy in the operator basis from the relation \([{\mathcal {C}}]_{ijkl}=[{\mathcal {C}}]_{jilk}^*\) [3]. For definiteness, we opt to work in the basis where down-quark Yukawas are diagonal, i.e. the quark doublets correspond to \(q=(V^\dagger u_L,\,d_L)^T\) where V is the Cabibbo-Kobayashi-Maskawa (CKM) matrix.

Besides semileptonic operators, the dipole (\(\psi ^2 X H\)) and Higgs-current operators (\(\psi ^2 D H^2\)) can also violate lepton flavor, but these are already tightly constrained by purely leptonic processes. Moreover, such operators would induce suppressed contributions to the tails of \(pp\rightarrow \ell _i\ell _j\) with respect to the semileptonic ones, since they do not feature the same scaling with the collision energy [3] and, therefore, will be neglected in the following.

3 LFV in \(pp\rightarrow \ell _i\ell _j\)

The \(q_k {\bar{q}}_l \rightarrow \ell _i^- \ell _j^+\) partonic cross section (with \(q=u,d\)) can be written as [19]

$$\begin{aligned} {\hat{\sigma }} (q_k {\bar{q}}_l \rightarrow \ell _i^- \ell _j^+) = \dfrac{\hat{s}}{144 \pi \Lambda ^4} \sum _{AB} {\mathcal {C}}_A^{(q)*} \,\Omega _{AB} \,{\mathcal {C}}_B^{(q)}, \end{aligned}$$
(2)

where \(\hat{s}\) is the squared partonic center-of-mass energy, and \({\mathcal {C}}^{(u)}\) and \({\mathcal {C}}^{(d)}\) are vectors of effective coefficients,

$$\begin{aligned}{} & {} \!\!\!\mathcal {\vec {C}}^{\,(u)} = \left[ {\mathcal {C}}_{lq}^{\prime \,(1-3)}\,,\;{\mathcal {C}}_{lu}\,,\;{\mathcal {C}}_{eq}^\prime \,,\;{\mathcal {C}}_{eu}\,,\;{\mathcal {C}}_{lequ}^{\prime \,(1)} \,,\;{\mathcal {C}}_{lequ}^{\prime \,(1)\,\dagger }\,,\;{\mathcal {C}}_{lequ}^{\prime \,(3)} \,,\;{\mathcal {C}}_{lequ}^{\prime \,(3)\,\dagger } \right] \,,\nonumber \\{} & {} \!\!\!\mathcal {\vec {C}}^{\,(d)} = \left[ {\mathcal {C}}_{lq}^{(1+3)}\,,\;{\mathcal {C}}_{ld}\,,\;{\mathcal {C}}_{eq}\,,\;{\mathcal {C}}_{ed}\,,\;{\mathcal {C}}_{ledq}\,,\;{\mathcal {C}}_{ledq}^\dagger \,,\;0\,,\;0\right] \,, \end{aligned}$$
(3)

where we use the shorthand notation \({\mathcal {C}}_{lq}^{(1\pm 3)}={\mathcal {C}}_{lq}^{(1)}\pm {\mathcal {C}}_{lq}^{(3)}\), and flavor indices are not explicitly written, but should be understood as \({\mathcal {C}} \equiv {[}{\mathcal {C}}{]}_{ijkl}\) and \({\mathcal {C}}^\dagger \equiv {[}{\mathcal {C}}{]}_{jilk}^*\).Footnote 2 The primed coefficients in \(\mathcal {\vec {C}}^{(u)}\) are defined as follows,

$$\begin{aligned}{} & {} \!\!\!{\mathcal {C}}_{lq}^{\prime \,(1-3)} = V^\dagger {\mathcal {C}}_{lq}^{(1-3)} V\,, \end{aligned}$$
(4)
$$\begin{aligned}{} & {} \!\!\!{\mathcal {C}}_{eq}^{\prime } = V^\dagger {\mathcal {C}}_{eq} V\,, \end{aligned}$$
(5)
$$\begin{aligned}{} & {} \!\!\!{\mathcal {C}}_{lequ}^{\prime \,(1)} = V^\dagger {\mathcal {C}}_{lequ}^{\,(1)}\,, \end{aligned}$$
(6)
$$\begin{aligned}{} & {} \!\!\!{\mathcal {C}}_{lequ}^{\prime \,(3)} = V^\dagger {\mathcal {C}}_{lequ}^{\,(3)}\,, \end{aligned}$$
(7)

where the CKM matrix V acts on quark-flavor indices. Similar expressions hold for the conjugated coefficients (\({\mathcal {C}}^\dagger \)). The above redefinition involving the CKM matrix corresponds to the up-quark aligned basis, which will simplify the discussion below, since these coefficients contribute to a single quark-level transition. Lastly, the \(8\times 8\) real matrix \(\Omega \) takes a diagonal form \(\Omega _{AB}=\Omega _A \, \delta _{AB}\) for the full partonic cross section,

$$\begin{aligned} \Omega = \textrm{diag} \big (1,\;1,\;1,\;1,\;3/4,\;3/4,\;4,\;4 \big ). \end{aligned}$$
(8)

The interference terms are either suppressed by the fermion masses, being negligible at the LHC, or they vanish after integration over \(\hat{t}\in (-\hat{s},0)\) [3]. Small corrections due to the lepton masses or to the experimental cuts can also be incorporated, having a minor impact on the following discussion. Note, also, that the \(\psi ^2 X H\) and \(\psi ^2 D H^2\) operators are not included in Eq. (2) since their contributions are further suppressed by \(\smash {v/\sqrt{\hat{s}}}\) and \(v^2/\hat{s}\), respectively [3].
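As a minimal sketch of Eq. (2) with the diagonal \(\Omega \) of Eq. (8), the function below evaluates the partonic cross section for a given vector of eight coefficients, ordered as in Eq. (3). The function name and the unit-conversion constant are additions of this sketch; mass corrections and experimental cuts are ignored.

```python
import math

# Diagonal entries of Omega from Eq. (8): four vector, two scalar, two tensor slots.
OMEGA = (1.0, 1.0, 1.0, 1.0, 0.75, 0.75, 4.0, 4.0)

# (hbar c)^2 in pb GeV^2, used to convert GeV^-2 into picobarn.
GEV2_TO_PB = 0.3894e9

def partonic_xsec(coeffs, s_hat, cutoff):
    """Eq. (2) with diagonal Omega.

    coeffs : eight (possibly complex) Wilson coefficients, ordered as in Eq. (3)
    s_hat  : squared partonic center-of-mass energy in GeV^2
    cutoff : EFT cutoff Lambda in GeV
    Returns the cross section in GeV^-2.
    """
    if len(coeffs) != len(OMEGA):
        raise ValueError("expected 8 Wilson coefficients")
    weighted = sum(w * abs(c) ** 2 for w, c in zip(OMEGA, coeffs))
    return s_hat / (144.0 * math.pi * cutoff ** 4) * weighted

# Usage: a single vector coefficient equal to one, at sqrt(s_hat) = Lambda = 1 TeV.
sigma_pb = GEV2_TO_PB * partonic_xsec([1, 0, 0, 0, 0, 0, 0, 0], 1.0e6, 1.0e3)
```

The linear growth with \(\hat{s}\) is the energy enhancement that makes the high-mass tails sensitive to these operators.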

LHC data treatment The most recent LHC searches for heavy resonances in the \(pp\rightarrow e\mu ,e\tau ,\mu \tau \) channels Footnote 3 have been made by CMS with \(140~\textrm{fb}^{-1}\) [29]. The dilepton production cross section at the LHC is obtained via the convolution of the partonic cross section in Eq. (2) with the luminosity functions \({\mathcal {L}}_{q_k {{\bar{q}}}_l}(\hat{s})\),

$$\begin{aligned} \sigma (pp\rightarrow \ell _i^-\ell _j^+) = \sum _{k,l} \int \dfrac{\textrm{d}\hat{s}}{s} {\mathcal {L}}_{q_k {{\bar{q}}}_l}\, {\hat{\sigma }}(q_k {{\bar{q}}}_l\rightarrow \ell _i^- \ell _j^+) , \end{aligned}$$
(9)

with

$$\begin{aligned} {\mathcal {L}}_{ q_k {\bar{q}}_l}(\hat{s}) \equiv \int _{\hat{s}/s}^1 \dfrac{\textrm{d}x}{x}\Bigg [ f_{{q}_k} (x,\mu _F) f_{{\bar{q}}_l} \Bigg (\frac{\hat{s}}{s x},\mu _F\Bigg ){+}({q}_k \leftrightarrow {{\bar{q}}}_l)\Bigg ],\nonumber \\ \end{aligned}$$
(10)

where \(f_{{q}_k}\) and \(f_{{\bar{q}}_l}\) denote the Parton Distribution Functions (PDFs) of \(q_k\) and \({\bar{q}}_l\), and \(\mu _F\) stands for the factorization scale.
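The luminosity function of Eq. (10) can be evaluated numerically as sketched below. This is an illustration only: the PDFs are passed as plain callables at a fixed factorization scale, and the two toy shapes are hypothetical placeholders, not the PDF4LHC15 set used in the analysis.

```python
import math

def parton_luminosity(f_q, f_qbar, tau, n_steps=2000):
    """Eq. (10) for tau = s_hat / s, using the midpoint rule in u = ln x.

    f_q, f_qbar : callables x -> PDF value at a fixed factorization scale
    """
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie in (0, 1)")
    du = -math.log(tau) / n_steps            # dx / x = du, with x from tau to 1
    total = 0.0
    for i in range(n_steps):
        x = tau * math.exp((i + 0.5) * du)   # midpoint of the i-th slice in ln x
        total += f_q(x) * f_qbar(tau / x) + f_qbar(x) * f_q(tau / x)
    return total * du

# Hypothetical toy shapes, for illustration only (not fitted PDFs).
def toy_quark(x):
    return x ** -0.5 * (1.0 - x) ** 3

def toy_antiquark(x):
    return 0.2 * x ** -1.1 * (1.0 - x) ** 7

lumi = parton_luminosity(toy_quark, toy_antiquark, tau=(1000.0 / 13000.0) ** 2)
```

For constant PDFs equal to one, the integral reduces to \(2\ln (s/\hat{s})\), which provides a simple check of the implementation.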

We consider the constraints on the SM EFT Lagrangian provided in the HighPT package [3, 4], which have been obtained through the recast of the CMS searches [29] using the PDF4LHC15\(\_\)nnlo\(\_\)mc PDF set [30]. After minimizing the \(\chi ^2\)-distribution describing the invariant-mass tails, we can write the resulting LHC constraints at 2\(\sigma \) in the following general form,

$$\begin{aligned} \sum _{n\,\in \,\textrm{bins}}\alpha _n\,(\mathcal {\vec {C}}^{\,\dagger } A_n \mathcal {\vec {C}}\,-1)^2 \le 1, \end{aligned}$$
(11)

where the summation spans over all experimental dilepton invariant-mass bins, \(\alpha _n\) are real coefficients and \(A_n\) are Hermitian matrices determined for the \(n{\textrm{th}}\) bin. The vector \(\mathcal {\vec {C}}\) comprises all SM EFT Wilson coefficients that violate lepton flavor, defined at the scale \(\mu _\textrm{high} \approx 1~\textrm{TeV}\). To simplify our discussion, we will focus on the limits extracted from a single mass bin at large invariant mass. In this case, Eq. (11) can be solved and brought to a quadratic form,

$$\begin{aligned} \mathcal {\vec {C}}^{\,\dagger } B \mathcal {\vec {C}} \le 1, \end{aligned}$$
(12)

where B is a block diagonal matrix, with a non-zero positive-definite block corresponding to the coefficients entering the partonic processes in Eq. (2). The advantage of working with a single invariant-mass bin is that the determination of bounds for low-energy LFV decays reduces to a tractable eigenvalue problem that can be solved analytically, as will be shown in Sect. 4. By choosing a single interval at large invariant-mass, we will find constraints that are only slightly weaker than the ones obtained by using data from the whole invariant-mass spectrum, cf. Sect. 5.
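The reduction from Eq. (11) to Eq. (12) for a single bin can be made explicit. With one term in the sum, \(\alpha \,(x-1)^2\le 1\) for \(x=\mathcal {\vec {C}}^{\,\dagger } A\, \mathcal {\vec {C}}\) gives \(x\le 1+\alpha ^{-1/2}\), so that \(B=A/(1+\alpha ^{-1/2})\). The sketch below assumes that only this upper edge is kept and that \(A\) is positive semi-definite; the function names are introduced here for illustration.

```python
import math

def single_bin_scale(alpha):
    """Factor relating B to A for one bin: B = A / (1 + alpha**-0.5)."""
    if alpha <= 0.0:
        raise ValueError("alpha must be positive")
    return 1.0 / (1.0 + 1.0 / math.sqrt(alpha))

def single_bin_matrix(alpha, a_matrix):
    """Return B as a nested list, given the single-bin matrix A."""
    scale = single_bin_scale(alpha)
    return [[scale * entry for entry in row] for row in a_matrix]
```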

Before presenting our approach to indirectly constrain low-energy LFV observables, we note that the effective coefficients \(\big \lbrace {\mathcal {C}}_{eu},\,{\mathcal {C}}_{lu},\,{\mathcal {C}}^{\prime \,(1)}_{lequ},\,{\mathcal {C}}^{\prime \,(3)}_{lequ},\,{\mathcal {C}}^{\prime \,(1-3)}_{lq}\big \rbrace \) with third-generation quark indices do not appear in Eq. (12), since they would correspond to partonic processes with a top quark, which are not constrained by Drell–Yan processes at tree level. Therefore, the corresponding entries of the matrix B vanish for these operators by construction. Note also that the non-zero block in B is itself block diagonal in the basis defined in Eq. (3), with the only non-diagonal part corresponding to \({\mathcal {C}}_{eq}\). Indeed, \({\mathcal {C}}_{eq}\) is the only effective coefficient in our basis that contributes to both \(\smash {u_k{\bar{u}}_l \rightarrow \ell _i^- \ell _j^+}\) and \(\smash {d_k{\bar{d}}_l \rightarrow \ell _i^- \ell _j^+}\), with contributions modulated by the CKM-matrix elements and by the quark PDFs.

4 Indirect probes of low-energy LFV decays

In this Section, we demonstrate that LHC data can be used to set model-independent bounds on low-energy LFV processes. This procedure will then be applied to the most relevant LFV observables based on the \(b\rightarrow d\ell _i\ell _j\) and \(b\rightarrow s\ell _i\ell _j\) transitions. Other decay modes will be briefly discussed in Sect. 5.

4.1 Analytical derivation

Let us consider a given low-energy meson decay related to the quark-level transition \(q_k\rightarrow q_l \ell _i \ell _j\), with \(i \ne j\), and let us denote its branching fraction by \({\mathcal {O}}_{\textrm{low}}\). This observable is a quadratic form in the Low-Energy EFT (LEFT) coefficients, defined at the relevant low-energy scale \(\mu =\mu _\textrm{low}\). We then have to account for the RGE effects induced by QCD and QED from \(\mu _\textrm{low}\) to \(\mu =\mu _\textrm{ew}\) [31, 32], for the tree-level matching to the SM EFT Lagrangian at the electroweak scale [33], and for the gauge and Yukawa RGE effects above the electroweak scale [25,26,27]. Once this is done, it is possible to write the observable \({\mathcal {O}}_{\textrm{low}}\) as

$$\begin{aligned} {\mathcal {O}}_\textrm{low} = \vec {{\mathcal {C}}}^{\,\dagger } M \vec {{\mathcal {C}}}, \end{aligned}$$
(13)

where the SM EFT Wilson coefficients \(\vec {{\mathcal {C}}}\) are expressed at the scale \(\mu _\textrm{high} \approx 1\) TeV, and M is a Hermitian matrix. The RGE effects induced by QCD are numerically significant since they change the magnitude of scalar and tensor contributions to \({\mathcal {O}}_\textrm{low}\). Although smaller, the RGE effects induced by the electroweak and Yukawa interactions cannot be neglected, since they can induce mixing between the operators appearing in Eq. (2) and those that do not contribute to these processes at tree level. In other words, the tree-level correspondence between low-energy meson decays and Drell–Yan processes in Fig. 1 can be spoiled by RGE effects. These unwanted loop contributions have to be bounded by other means when they are numerically significant, e.g. by using other LHC processes or electroweak/flavor observables, or simply by imposing a naive perturbativity bound, as we choose to do in the following.Footnote 4

The problem that we want to solve can be summarized as the determination of

$$\begin{aligned} {\mathcal {O}}_\textrm{max}= \max _{\vec {{\mathcal {C}}}} \Big \lbrace \vec {{\mathcal {C}}}^{\,\dagger } M \vec {{\mathcal {C}}};~\; \text {with}~\; \vec {{\mathcal {C}}}^{\,\dagger } B \vec {{\mathcal {C}}}\le 1 \Big \rbrace , \end{aligned}$$
(14)

where we recall that the matrix B is defined in Eq. (12). To distinguish the operators appearing in Eq. (2) from the ones appearing only through RGE effects, we define the projector P onto the Wilson coefficients entering Eq. (2), i.e. the matrix \(1-P\) is the projector onto the kernel of B. With this definition, we can decompose \(\smash {\vec {{\mathcal {C}}}= P \,\vec {{\mathcal {C}}} + (1-P)\,\vec {{\mathcal {C}}}}\) and identify the spurious operators as being those proportional to \((1-P) \,\vec {{\mathcal {C}}}\). The problem defined in Eq. (14) can thus be decomposed as

(15)

where the first term is fully constrained by Drell–Yan data,

$$\begin{aligned} {\mathcal {O}}_\textrm{DY} \equiv \max _{\vec {{\mathcal {C}}}} \Big \lbrace \vec {{\mathcal {C}}}^{\,\dagger } M_P \vec {{\mathcal {C}}};~\; \text {with}~\; \vec {{\mathcal {C}}}^{\,\dagger } B \vec {{\mathcal {C}}}\le 1 \Big \rbrace , \end{aligned}$$
(16)

where \(M_P \equiv P^\dagger M P\) and \(B=P^\dagger B P\) by construction. The second term in Eq. (15) must be bounded by other means since it depends on coefficients that do not contribute to \(pp\rightarrow \ell _i \ell _j\). In our case, we use a naive perturbativity constraint,

(17)

which will depend on the EFT cutoff \(\Lambda \). The norm \(||\vec {{\mathcal {C}}}||\) is defined as the maximum norm, i.e. the largest entry of \(\vec {{\mathcal {C}}}\) in absolute value. The problem is now reduced to finding \({\mathcal {O}}_\textrm{DY}\) and the second term of Eq. (15) separately, which are well-defined numbers that are necessarily bounded for each process.
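The projector P and the splitting of \(\vec {{\mathcal {C}}}\) into Drell–Yan constrained and spurious parts can be sketched with NumPy as follows. This is a minimal illustration, assuming that B is supplied as a Hermitian array; the function names and the tolerance used to identify the kernel are choices of this sketch.

```python
import numpy as np

def support_projector(B, tol=1e-12):
    """Projector P onto the directions constrained by B; 1 - P projects onto ker(B)."""
    vals, vecs = np.linalg.eigh(B)
    # Keep eigenvectors whose eigenvalue is non-zero within the tolerance.
    keep = vecs[:, vals > tol * max(vals.max(), 1.0)]
    return keep @ keep.conj().T

def split_coefficients(c, P):
    """Return (P c, (1 - P) c): the constrained and the spurious components."""
    c_dy = P @ c
    return c_dy, c - c_dy

def projected_observable(M, P):
    """M_P = P^dagger M P, the part of M constrained by Drell-Yan data."""
    return P.conj().T @ M @ P
```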

To determine \({\mathcal {O}}_{\textrm{DY}}\), we first diagonalize the LHC matrix as \(B=U^\dagger \hat{B} U\), where \(\hat{B}\) is a diagonal matrix, and U is a unitary matrix. It is convenient to define \(\vec {{\zeta }}\equiv (V{\smash {\hat{B}^{1/2}}} U)\, \vec {{\mathcal {C}}}\), so that the LHC constraint can be simply written as \(\smash {\vec {{\zeta }}^\dagger \vec {{\zeta }} \le 1}\), and the low-energy observable takes the canonical quadratic form \(\vec {{\zeta }}^\dagger \hat{N}_P \vec {{\zeta }}\) in terms of \(\vec {{\zeta }}\), where \(\hat{N}_P\) is diagonal. Here, the matrix \(\hat{N}_P\) results from diagonalizing the Hermitian matrix \(N_P\equiv {\smash {\hat{B}^{-1/2}}}U\, M_P\, U^\dagger {\smash {\hat{B}^{-1/2}}}\) via unitary rotations \(N_P=V^\dagger \hat{N}_P V\). Therefore, the optimization problem in Eq. (16) is now reduced to

$$\begin{aligned} {\mathcal {O}}_\textrm{DY} = \max _{\vec {\zeta }} \Big \lbrace \vec {\zeta }^{\,\dagger } \hat{N}_P \vec {\zeta };~\; \text {with}~\; \vec {\zeta }^{\,\dagger } \vec {\zeta }\le 1 \Big \rbrace . \end{aligned}$$
(18)

In other words, we have to maximize the quadratic form defined by \(\hat{N}_P\) over vectors \(\vec {\zeta }\) of at most unit norm. For simplicity, we assume that the diagonal entries of \(\hat{N}_P\) are ranked in descending order. The maximal value of the quadratic form will be reached along the smallest semi-axis of the corresponding ellipsoid, which is given by the largest eigenvalue of \(\hat{N}_P\), i.e., \({{\mathcal {O}}_{\textrm{DY}} = \textrm{max}_i\lbrace {(}\hat{N}_P{)}_{ii}\rbrace ={(}\hat{N}_P{)}_{11}}\), since the eigenvalues of \(\hat{N}_P\) are ordered. The effective coefficients that maximize \({\mathcal {O}}_{\textrm{DY}}\) are then obtained from the corresponding eigenvector,

$$\begin{aligned} {\mathcal {C}}^\textrm{max}_i = \big (U^\dagger \hat{B}^{-1/2}V^\dagger \big )_{i1}, \end{aligned}$$
(19)

which provides a useful cross-check of the EFT validity, as will be discussed in Sect. 5.
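The chain of diagonalizations leading to Eqs. (18) and (19) can be sketched numerically with NumPy. The sketch assumes that \(M_P\) and B have already been restricted to the subspace where B is positive definite; the function name and the toy matrices are illustrative only.

```python
import numpy as np

def max_quadratic_form(M_P, B):
    """Maximize c^dagger M_P c subject to c^dagger B c <= 1.

    B must be Hermitian and positive definite on the chosen subspace.
    Returns (O_DY, c_max) as in Eqs. (18) and (19).
    """
    b_vals, b_vecs = np.linalg.eigh(B)        # B = W diag(b) W^dagger, i.e. U = W^dagger
    if np.any(b_vals <= 0.0):
        raise ValueError("B must be positive definite on the chosen subspace")
    whiten = b_vecs / np.sqrt(b_vals)         # U^dagger B_hat^(-1/2): column j scaled by b_j^(-1/2)
    N_P = whiten.conj().T @ M_P @ whiten      # B_hat^(-1/2) U M_P U^dagger B_hat^(-1/2)
    n_vals, n_vecs = np.linalg.eigh(N_P)      # eigenvalues in ascending order
    c_max = whiten @ n_vecs[:, -1]            # U^dagger B_hat^(-1/2) V^dagger applied to the top direction
    return n_vals[-1], c_max

# Toy example with hypothetical 2x2 matrices.
B_toy = np.array([[2.0, 1.0], [1.0, 2.0]])
M_toy = np.array([[1.0, 0.0], [0.0, 0.0]])
o_dy, c_max = max_quadratic_form(M_toy, B_toy)
```

The returned vector saturates the LHC constraint, \(\vec {{\mathcal {C}}}^{\,\dagger } B\, \vec {{\mathcal {C}}}=1\), so its entries can be compared directly with the perturbativity range when checking the EFT validity.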

To summarize the above discussion, we have shown that a low-energy observable \({\mathcal {O}}_\textrm{low}\) can be expressed in a quadratic form in terms of the SM EFT effective coefficients that describe Drell–Yan processes, see Eq. (13). When using a single invariant-mass bin in the \(\chi ^2\)-function describing LHC data, we can express the LHC constraints as a quadratic form as well, cf. Eq. (12). Since RGE effects can introduce non-trivial mixing between operators, the quadratic form describing \({\mathcal {O}}_\textrm{low}\) can be split into a quadratic form that is constrained by Drell–Yan processes (\({\mathcal {O}}_{\textrm{DY}}\)) and another one that is not, cf. Eq. (15). The latter term can be bounded, e.g. by perturbativity or by considering other observables, if it is numerically significant, whereas the maximal value of \({\mathcal {O}}_{\textrm{DY}}\) can be obtained by solving the corresponding eigenvalue problem, which has a unique solution. We reiterate that the analytical method described above produces slightly weaker upper limits on the LFV meson decays than the ones obtained by considering binned data from the whole invariant-mass spectrum studied experimentally at the LHC, as we will show in Sect. 5. Therefore, this derivation is not only a demonstration that the maximization problem is well defined, but it is also a useful cross-check for the numerical optimization considering all experimental bins.

4.2 Illustration: LFV B-meson decays

In this Section, we apply the above approach to LFV \(B_{(s)}\)-meson decays based on the \(b\rightarrow d\ell _i \ell _j\) and \(b\rightarrow s\ell _i \ell _j\) transitions [22]. Several searches for these processes have been performed at the B-factories [34,35,36] and LHCb [37, 38] in the past years. We consider these decays as an illustration of our approach since they violate both quark and lepton flavors. For this reason, contributions from spurious operators in the second term of Eq. (15) can be safely neglected since they are suppressed not only by loop factors but also by CKM matrix elements. However, we note that our method can also be applied to other processes such as quarkonia decays where RGE effects are not entirely negligible, as we will briefly discuss in Sect. 5 (see also Ref. [39]).

EFT description The low-energy effective Lagrangian needed to describe the \(b\rightarrow q \ell _i^- \ell _j^+\) processes (with \(q=s,d\)) reads

$$\begin{aligned} {\mathcal {L}}_\textrm{eff}&\supset \dfrac{1}{v^2} \sum _{X,Y} \Big [C_{V_{XY}}^{q\,ij} \big ( {\bar{\ell }}_i \gamma _\mu P_X\ell _j\big )\big ({\bar{q}} \gamma ^\mu P_Y b\big ) \nonumber \\&+C_{S_{XY}}^{q\,ij} \big ( {\bar{\ell }}_i P_X\ell _j\big )\big ({\bar{q}} P_Y b\big )\Big ]+\mathrm {h.c.}\,, \end{aligned}$$
(20)

where \(v=(\sqrt{2} G_F)^{-1/2}\) denotes the SM vacuum expectation value and we assume that \(i<j\), as before. Notice that tensor operators are not written in this equation because they are forbidden for the \(b\rightarrow q \ell _i\ell _j\) transitions, at dimension \(d=6\), by the \(SU(2)_L\times U(1)_Y\) gauge symmetry [28]; cf. Table 1. Only the scalar operators are renormalized by QCD, which amounts to \(\smash {C_{S_{XY}}^{ij} (m_b) \simeq 1.46\times C_{S_{XY}}^{ij} (m_Z)}\) [40].

The tree-level matching of Eq. (20) to the SM EFT at the scale \(\mu =\mu _\textrm{ew}\) reads

$$\begin{aligned} C_{V_{LL}}^{q\,ij}&= \dfrac{v^2}{\Lambda ^2}\big [{\mathcal {C}}_{lq}^{(1+3)}\big ]_{ijq3}\,,&C_{S_{LL}}^{q\,ij}&= 0\,, \end{aligned}$$
(21)
$$\begin{aligned} C_{V_{LR}}^{q\,ij}&= \dfrac{v^2}{\Lambda ^2}\big [{\mathcal {C}}_{ld}\big ]_{ijq3}\,,&C_{S_{LR}}^{q\,ij}&= \dfrac{v^2}{\Lambda ^2}\big [{\mathcal {C}}_{ledq}\big ]_{ji3q}^*\,, \end{aligned}$$
(22)
$$\begin{aligned} C_{V_{RL}}^{q\,ij}&= \dfrac{v^2}{\Lambda ^2}\big [{\mathcal {C}}_{eq}\big ]_{ijq3}\,,&C_{S_{RL}}^{q\,ij}&= \dfrac{v^2}{\Lambda ^2}\big [{\mathcal {C}}_{ledq}\big ]_{ijq3}\,, \end{aligned}$$
(23)
$$\begin{aligned} C_{V_{RR}}^{q\,ij}&= \dfrac{v^2}{\Lambda ^2}\big [{\mathcal {C}}_{ed}\big ]_{ijq3}\,,&C_{S_{RR}}^{q\,ij}&= 0\,, \end{aligned}$$
(24)

where the Higgs current operators (\(\psi ^2 D H^2\)) do not appear, since we consider processes that violate both lepton and quark flavors. The SM EFT operators appearing in the right-hand side of the above equations are affected by RGE from \(\mu _\textrm{ew}\) up to \(\Lambda \approx 1~\textrm{TeV}\) [25,26,27]. QCD running will only change the magnitude of scalar coefficients, with \({\mathcal {C}}_{ledq}(\mu _\textrm{ew})\simeq 1.19\times {\mathcal {C}}_{ledq}(1~\textrm{TeV})\). Instead, the electroweak and, most importantly, the Yukawa running effects will introduce nontrivial mixing between different types of operators [41,42,43,44]. For instance, keeping only the top-quark Yukawa \(y_t\) contributions to the anomalous dimensions of \({\mathcal {C}}_{lq}^{(1)}\) [26],

$$\begin{aligned} \big [\dot{{\mathcal {C}}}_{lq}^{(1)}\big ]_{ijkl} {\mathop {=}\limits ^{\text {yuk}}} \big [Y_u^\dagger Y_u\big ]_{kl} \big [{\mathcal {C}}_{Hl}^{(1)}\big ]_{ij} -\big [Y_u^\dagger \big ]_{kv}\big [Y_u\big ]_{wl} \big [{\mathcal {C}}_{lu}\big ]_{ijvw} \nonumber \\ \qquad +\dfrac{1}{2}\big [Y_u^\dagger Y_u \big ]_{kv} \, \big [{{\mathcal {C}}}_{lq}^{(1)}\big ]_{ijvl}+\dfrac{1}{2}\big [Y_u^\dagger Y_u \big ]_{vl} \,\big [{{\mathcal {C}}}_{lq}^{(1)}\big ]_{ijkv} ,\nonumber \\ \end{aligned}$$
(25)

Fig. 2 Example of the Yukawa-induced RGE mixing of \(\psi ^2 D H^2\) operators (left) and \(\psi ^4\) operators (middle and right) into the operator \({\mathcal {O}}_{lq}^{(1)}\), which contributes to the \(b\rightarrow d \ell _i\ell _j\) and \(b\rightarrow s \ell _i\ell _j\) transitions. The dots represent the insertion of \(d=6\) operators, cf. Eq. (25)

where \(\dot{{\mathcal {C}}} \equiv 16 \pi ^2 \frac{\textrm{d} {\mathcal {C}}}{\textrm{d} \log \mu }\), the up-type Yukawa matrix is defined by \(Y_u= \textrm{diag}(y_u,y_c,y_t)\cdot V \simeq \textrm{diag}(0,0,y_t) \cdot V\), and \({\mathcal {O}}_{Hl}^{(1)}\) and \({\mathcal {O}}_{Hl}^{(3)}\) are the Higgs-current operators defined as follows [25]

$$\begin{aligned} \big [{\mathcal {O}}_{Hl}^{(1)}\big ]_{ij}= & {} \big (H^\dagger \overleftrightarrow {D}_\mu H\big )\big ( {\bar{l}}_i \gamma ^\mu l_j\big )\,,\nonumber \\ \big [{\mathcal {O}}_{Hl}^{(3)}\big ]_{ij}= & {} \big (H^\dagger \overleftrightarrow {D}_\mu ^I H\big )\big ( {\bar{l}}_i\tau ^I \gamma ^\mu l_j\big )\,. \end{aligned}$$
(26)

By choosing in the left-hand side of Eq. (25) the Wilson coefficient that contributes to \(b\rightarrow d\ell _i\ell _j\) at tree level, we find

$$\begin{aligned} \big [\dot{{\mathcal {C}}}_{lq}^{(1)}\big ]_{ij13}{} & {} {\mathop {=}\limits ^{\text {yuk}}} y_t^2\, V_{td}^*V_{tb}\big [{\mathcal {C}}_{Hl}^{(1)}\big ]_{ij} - y_t^2\,V_{td}^*V_{tb} \big [{\mathcal {C}}_{lu}\big ]_{ij33} \nonumber \\{} & {} \quad +\dfrac{y_t^2}{2} V_{td}^*V_{tb}\big [{\mathcal {C}}_{lq}^{(1)}\big ]_{ij33} + \cdots \,, \end{aligned}$$
(27)

where the ellipsis denotes terms suppressed by small Yukawa couplings and/or CKM matrix elements. From the above equation, we see for instance that \(\smash {{\mathcal {C}}_{Hl}^{(1)}}\) and \({\mathcal {C}}_{lu}\) mix into \(\smash {{\mathcal {C}}_{lq}^{(1)}}\) through the top-quark Yukawa running, as illustrated in the first two diagrams of Fig. 2. These are examples of operators that are not constrained by \(\smash {pp \rightarrow \ell _i \ell _j}\) tails at tree level, and are thus part of \((1-P)\,\vec {{\mathcal {C}}}\) in the derivation of Eq. (15). Nonetheless, in the specific case that we consider, these contributions will be small due to the loop and CKM suppression, which makes them largely subdominant.Footnote 5
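The size of this suppression can be estimated in the leading-logarithmic approximation from the first term of Eq. (27). The sketch below is a rough illustration: the default inputs (\(y_t\), \(|V_{td}|\), \(|V_{tb}|\) and \(\mu _\textrm{ew}\simeq m_Z\)) are approximate values assumed here, not numbers taken from the analysis, and only magnitudes are tracked.

```python
import math

def yukawa_mixing_size(y_t=0.94, v_td=8.6e-3, v_tb=1.0,
                       mu_ew=91.19, mu_high=1000.0):
    """|delta C_lq^(1)_{ij13}| induced per unit C_Hl^(1)_{ij}, at leading log.

    All default inputs are approximate, illustrative values (scales in GeV).
    """
    log = math.log(mu_high / mu_ew)
    return y_t ** 2 * v_td * v_tb * log / (16.0 * math.pi ** 2)

mixing = yukawa_mixing_size()
```

With these inputs the induced coefficient is of order \(10^{-4}\) times the spurious one, which illustrates why such contributions are subdominant for transitions that violate quark flavor.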

In the following, we provide the expressions for LFV \(B_{(s)}\)-meson branching fractions in terms of the low-energy effective Lagrangian (20) and describe the hadronic inputs that we consider in our analysis. These expressions will be combined with the RGE effects described above, within a leading logarithmic approximation, to write the observables in terms of the SM EFT coefficients at the scale \(\Lambda \) that is relevant for the LHC analysis.
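The tree-level matching of Eqs. (21) to (24), combined with the QCD rescaling factors quoted above, can be sketched as follows. The sketch keeps only the QCD running of the scalar coefficients and neglects electroweak and Yukawa effects; it also assumes \(\mu _\textrm{ew}\simeq m_Z\), so that the two quoted factors can be multiplied. The function and argument names are introduced here for illustration.

```python
V_HIGGS = 246.22          # GeV, v = (sqrt(2) G_F)^(-1/2)
QCD_MB_OVER_MZ = 1.46     # C_S(m_b) / C_S(m_Z), as quoted in the text
QCD_EW_OVER_TEV = 1.19    # C_ledq(mu_ew) / C_ledq(1 TeV), as quoted in the text

def left_at_mb(c_lq_1p3, c_ld, c_eq, c_ed, c_ledq_ijq3, c_ledq_ji3q, cutoff=1000.0):
    """LEFT coefficients of Eq. (20) at mu = m_b from SM EFT inputs at 1 TeV.

    The QCD factors are those quoted for Lambda ~ 1 TeV; electroweak and
    Yukawa running (hence operator mixing) is neglected in this sketch.
    """
    scale = (V_HIGGS / cutoff) ** 2
    qcd = QCD_MB_OVER_MZ * QCD_EW_OVER_TEV
    return {
        "V_LL": scale * c_lq_1p3,
        "V_LR": scale * c_ld,
        "V_RL": scale * c_eq,
        "V_RR": scale * c_ed,
        "S_LL": 0.0,
        "S_LR": qcd * scale * complex(c_ledq_ji3q).conjugate(),
        "S_RL": qcd * scale * c_ledq_ijq3,
        "S_RR": 0.0,
    }
```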

\({\textbf {B}}_{(s)}\rightarrow \ell _i \ell _j\) The branching fractions of the leptonic \(B_{(s)}\)-meson decays read

$$\begin{aligned} {\mathcal {B}}(P \rightarrow \ell _i^- \ell _j^+)= & {} \tau _{P} \, \dfrac{f_P^2 M m_{\ell }^2}{ 128 \pi v^4} \bigg (1-\dfrac{m_{\ell }^2}{M^2}\bigg )^2 \nonumber \\{} & {} \quad \times \bigg \lbrace \bigg |C_{VA}^{q\,ij}- \dfrac{M^2\,C_{SP}^{q\,ij}}{m_{\ell }(m_b+m_q)}\bigg |^2\nonumber \\{} & {} \quad + \bigg |C_{AA}^{q\,ij}+ \dfrac{M^2\,C_{PP}^{q\,ij}}{m_{\ell }(m_b+m_q)}\bigg |^2\bigg \rbrace \,, \end{aligned}$$
(28)

where P denotes the \(B_{(s)}\) meson, with mass M and lifetime \(\tau _P\), and we have assumed \(m_\ell \equiv m_{\ell _j} \gg m_{\ell _i}\). The effective coefficients are defined in terms of Eq. (20) as follows,

$$\begin{aligned} C_{VA}^{q\,ij}&= {C_{V_{RR}}^{q\,ij}-C_{V_{RL}}^{q\,ij} + (L\leftrightarrow R)} \end{aligned}$$
(29)
$$\begin{aligned} C_{AA}^{q\,ij}&= {C_{V_{RR}}^{q\,ij}-C_{V_{RL}}^{q\,ij} - (L\leftrightarrow R)} \end{aligned}$$
(30)

which should be taken at \(\mu =m_b\), and similarly for the (pseudo)scalar operators through the trivial replacements \(V \leftrightarrow S\) and \(A \leftrightarrow P\). The \(B_{(s)}\)-meson decay constant is denoted by \(f_P\), which parameterizes the hadronic matrix element \(\langle 0 | {\bar{q}} \gamma ^\mu \gamma _5 b | P (p) \rangle = i f_P \,p^\mu \), where p denotes the P-meson four-momentum. In our analysis, we consider the latest average of lattice determinations of the decay constants made by the Flavor Lattice Averaging Group (FLAG), which gives \(f_B = 190.0(1.3)\) MeV and \(f_{B_s} = 230.3(1.3)\) MeV [45].
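Eq. (28) can be evaluated numerically as sketched below. The decay constant is the FLAG value quoted above, while the remaining inputs (meson mass, lifetime, lepton and quark masses) are approximate values assumed for this illustration, as are the function and dictionary names.

```python
import math

HBAR_GEV_S = 6.582119569e-25   # GeV s, converts a lifetime into GeV^-1
V_HIGGS = 246.22               # GeV

def leptonic_branching(c_va, c_aa, c_sp, c_pp, *,
                       f_p, mass, lifetime_s, m_lep, m_b, m_q):
    """Eq. (28) for P -> l_i^- l_j^+ with m_lep = m_{l_j} >> m_{l_i} (masses in GeV)."""
    tau = lifetime_s / HBAR_GEV_S
    prefactor = (tau * f_p ** 2 * mass * m_lep ** 2
                 / (128.0 * math.pi * V_HIGGS ** 4)
                 * (1.0 - m_lep ** 2 / mass ** 2) ** 2)
    enhancement = mass ** 2 / (m_lep * (m_b + m_q))   # relative weight of (pseudo)scalars
    return prefactor * (abs(c_va - enhancement * c_sp) ** 2
                        + abs(c_aa + enhancement * c_pp) ** 2)

# Illustrative inputs for B_s -> mu tau; only f_p is taken from the text.
BS_MU_TAU = dict(f_p=0.2303, mass=5.36692, lifetime_s=1.52e-12,
                 m_lep=1.77686, m_b=4.18, m_q=0.093)

br = leptonic_branching(1.0e-3, 0.0, 0.0, 0.0, **BS_MU_TAU)
```

The factor \(M^2/[m_\ell (m_b+m_q)]\) shows why scalar coefficients are enhanced relative to vector ones in these decays.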

\({\textbf {B}}_{(s)}\rightarrow M \ell _i \ell _j\) For the \(B_{(s)}\rightarrow M \ell _i \ell _j\) decays, where M denotes a light pseudoscalar or vector meson, the branching fractions depend on form factors which parameterize the relevant hadronic matrix elements. In this letter, we use the general expressions provided in Ref. [22] and we update the numerical values with the most recent determinations of the form factors, namely:

  • For the decays into a pseudoscalar meson, there are only two relevant form factors for our analysis, namely the scalar \((f_0)\) and vector \((f_+)\), which have been computed for the relevant transitions by means of numerical simulations of QCD on the lattice at high-\(q^2\) values and extrapolated to the whole kinematical range by using a suitable parameterization [45]. For the \(B\rightarrow \pi \) transition, we consider the combined fit to lattice QCD and experimental data made by FLAG [45]. For \(B_s\rightarrow K\), we use the FLAG average of lattice QCD form factors [45], and for \(B\rightarrow K\) we use the recent combination of lattice QCD results [46, 47] made in Ref. [48] (see also Ref. [49]).

  • For the decays into a vector meson such as \(M=K^*,\phi \), there is less information available from lattice QCD for the relevant form factors, namely \(A_0\), \(A_1\), \(A_2\) and V. For these decays, we use the Light-Cone Sum-Rules results from Ref. [50] (see also Refs. [51, 52]).

Table 2 Upper limits on the \(B_{(s)}\)-meson branching fractions based on the transitions \(b\rightarrow d\tau \mu \) and \(b\rightarrow s\tau \mu \) (top tier), and \(b\rightarrow d\tau e\) and \(b\rightarrow s\tau e\) (bottom tier), obtained by using our approach with current LHC data (140 \(\textrm{fb}^{-1}\)) at 95\(\%\) CL, as well as our projections for HL-LHC (3 \(\textrm{ab}^{-1}\)). The last column corresponds to the direct experimental limits obtained by BaBar [34], Belle [35, 36, 53] and LHCb [37, 38]. The highlighted cells correspond to the most stringent limits with current (bold) or future (italic) data

5 Numerical results

In this section, we apply the method outlined in Sect. 4 to the LFV decays of \(B_{(s)}\)-mesons detailed above. We will focus only on the processes leading to \(\tau e\) or \(\tau \mu \) pairs, since LHC bounds cannot be competitive with the very stringent low-energy limits on the decays with \(\mu e\) in the final state, as already shown in Ref. [19]. We will present our final results by solving the optimization problem numerically, considering the whole invariant-mass spectrum provided in Ref. [29]. These results will later be compared to the ones from the analytical approach introduced in Sect. 4, in which a single invariant-mass bin at high energies is considered.

In Table 2, we present our upper limits for the decays based on the transitions \(b\rightarrow d \mu \tau \) and \(b\rightarrow s \mu \tau \) (top tier), and \(b\rightarrow d e \tau \) and \(b\rightarrow s e\tau \) (bottom tier) obtained from a reinterpretation [3, 4] of the latest CMS search with 140 \(\textrm{fb}^{-1}\) [29]. These limits are compared in the same table to the direct experimental limits from BaBar [34], Belle [35, 36, 53] and LHCb [37, 38]. For simplicity, we sum over the decay modes with opposite-charge leptons, e.g., \({\mathcal {B}}(B\rightarrow \mu ^\pm \tau ^\mp )\equiv {\mathcal {B}}(B\rightarrow \mu ^+ \tau ^-)+{\mathcal {B}}(B\rightarrow \mu ^- \tau ^+)\) and similarly for the other channels (see footnote 6). We also present in the same table the projections for the High-Luminosity LHC (HL-LHC) phase with \(3~\textrm{ab}^{-1}\), obtained by assuming that uncertainties are statistically dominated.
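As a rough guide to the size of the improvement that a luminosity increase can bring, the sketch below rescales a limit under a simple assumption: the search is background dominated with purely statistical uncertainties, so that the limit on a signal rate improves as \(1/\sqrt{L}\). Since the LFV signal does not interfere with the SM, both the Drell–Yan rate and the branching fractions scale as \(|{\mathcal {C}}|^2/\Lambda ^4\), and the same factor would apply to the branching-fraction limits. The scaling law and the function name `rescale_limit` are illustrative assumptions, not the procedure of Refs. [3, 4]; bins that are nearly background free would improve faster than this estimate.

```python
import math

def rescale_limit(current_limit, lumi_now_fb=140.0, lumi_future_fb=3000.0):
    """Naive luminosity rescaling of an upper limit on a signal rate.

    Assumption (not taken from the paper): background-dominated search with
    purely statistical uncertainties, so the limit scales as 1/sqrt(L).
    """
    if lumi_now_fb <= 0 or lumi_future_fb <= 0:
        raise ValueError("luminosities must be positive")
    return current_limit * math.sqrt(lumi_now_fb / lumi_future_fb)

# Improvement factor between 140 fb^-1 and 3 ab^-1 under this assumption
factor = rescale_limit(1.0)
print(f"limit(HL-LHC) / limit(current) ~ {factor:.3f}")
```

Under this assumption the limits would tighten by roughly a factor of five, which only indicates the expected order of magnitude of the gain.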

From the comparison in Table 2, we conclude that flavor experiments perform better than our indirect LHC bounds for most transitions, as one would expect. However, there are exceptions such as the decays \(B\rightarrow \pi e\tau \) and \(B\rightarrow \pi \mu \tau \), for which both constraints are comparable with current data, while the projections for the HL-LHC phase are more constraining than the existing limits from the B-factories. Another interesting example comes from decay modes that have not yet been searched for at low energies, such as the semileptonic channels \(B_s\rightarrow K_S\ell _i \ell _j\), \(B^0\rightarrow \rho \ell _i\ell _j\) and \(B_s\rightarrow \phi \ell _i\ell _j\). In this case, our results are the only bounds that are currently available and they may constitute a target for future experimental searches at low energies, as they indicate the range of the branching ratios that are not yet excluded by high-\(p_T\) data.

From Table 2, we also find that our constraints on semileptonic decays such as \(B\rightarrow \pi \tau \ell \) and \(B\rightarrow \rho \tau \ell \), with \(\ell =e,\mu \), are one order of magnitude stronger than the ones on the leptonic channel \(B_d\rightarrow \tau \ell \). To understand this difference, one should note that purely leptonic decays are very sensitive to (pseudo)scalar operators since they can induce contributions that are larger by a factor of \(m_{B_s}^2/m_\tau ^2\) than the vector ones, cf. Eq. (28). This is not the case for semileptonic decays, nor for the high-\(p_T\) processes, which receive comparable contributions from the different Lorentz structures. This is illustrated in Fig. 3 for the decays based on the transition \(b\rightarrow d \mu \tau \), by considering the scalar effective coefficient \({\mathcal {C}}_{ledq}\) and the vector \(\smash {{\mathcal {C}}_{lq}^{(1)}}\) with fixed flavor indices.

Fig. 3

The contour lines for \({\mathcal {B}}(B^0\rightarrow \mu ^\pm \tau ^\mp )\) (left), \({\mathcal {B}}(B^+\rightarrow \pi ^+ \mu ^\pm \tau ^\mp )\) (center) and \({\mathcal {B}}(B^+\rightarrow \rho ^+ \mu ^\pm \tau ^\mp )\) (right) are depicted in the plane \(\smash {\big [{\mathcal {C}}_{ledq}\big ]_{3231}}/\Lambda ^2\) vs. \(\smash {\big [{\mathcal {C}}_{lq}^{(1)}\big ]_{2313}}/\Lambda ^2\) by the blue lines, with the other effective coefficients set to zero. The 95\(\%\) CL constraints derived from current LHC data (140 fb\(^{-1}\)) and their projection to HL-LHC (3 ab\(^{-1}\)) are depicted by the dark- and light-orange regions, respectively

Lastly, we comment on the differences between the numerical maximization of \({\mathcal {O}}_\textrm{low}\) taking into account the whole dilepton invariant-mass spectrum for the LHC constraints [3, 4], as reported in Table 2, and the simplified problem obtained in Eq. (14) by considering a single high invariant-mass bin. To this end, we computed the matrix B defined in Eq. (12) for the intervals \(m_{e\tau } \in (1550,3000)\) GeV and \(m_{\mu \tau }\in (1400,3000)\) GeV for the \(pp\rightarrow e\tau \) and \(pp\rightarrow \mu \tau \) channels, respectively, and we performed the analytical derivation from Sect. 4. With this approach, we are able to derive bounds that are slightly weaker than the full limits collected in Table 2, with deviations of at most \(\approx 40\%\). Therefore, this comparison corroborates the validity of the analytical derivation of Sect. 4, which is a valuable cross-check of the numerical optimization.
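The structure of the single-bin problem can be illustrated with a two-coefficient toy example. Assuming that the low-energy observable and the LHC constraint are both quadratic forms in real coefficients, \(c^T A\, c\) and \(c^T B\, c\le 1\), with A positive semidefinite and B positive definite, the largest allowed value of the observable is the largest generalized eigenvalue \(\lambda \) of \(A v=\lambda B v\). The sketch below solves the \(2\times 2\) case in closed form; the matrices and the function name are invented for illustration and are not taken from the analysis.

```python
import math

def max_quadratic_form_2x2(A, B):
    """Largest value of c^T A c subject to c^T B c <= 1.

    A and B are symmetric 2x2 matrices given as nested tuples or lists.
    Assumes A is positive semidefinite (a rate) and B is positive definite.
    The answer is the largest root of det(A - lam * B) = 0.
    """
    (a11, a12), (_, a22) = A
    (b11, b12), (_, b22) = B
    p = b11 * b22 - b12 ** 2                      # det(B)
    if b11 <= 0 or p <= 0:
        raise ValueError("B must be positive definite")
    q = a11 * b22 + a22 * b11 - 2.0 * a12 * b12   # coefficient of -lam
    r = a11 * a22 - a12 ** 2                      # det(A)
    disc = max(q * q - 4.0 * p * r, 0.0)          # guard against rounding
    return (q + math.sqrt(disc)) / (2.0 * p)

# Toy numbers only: one direction is tightly constrained by B, the other is not
A_toy = ((2.0, 0.0), (0.0, 1.0))
B_toy = ((1.0, 0.0), (0.0, 4.0))
print(max_quadratic_form_2x2(A_toy, B_toy))
```

For more than two coefficients the same problem would be handled by a numerical generalized symmetric eigenvalue solver rather than a closed-form root.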

On the EFT validity Before discussing the application of the method to other processes, we briefly comment on the validity of the EFT description of LHC data [21]. For the EFT to be valid, the EFT cutoff (\(\Lambda \)) must be much larger than the energy scale (E) of the relevant LHC events. Since we are constraining combinations of the type \({\mathcal {C}}/\Lambda ^2\), it is clear that for a search of given sensitivity, our limits will only apply for \({\mathcal {C}}\) above a certain value. The maximal EFT cutoff \(\Lambda _\textrm{max}\) for which our results are applicable can be estimated by requiring that the \(2\rightarrow 2\) amplitude does not exceed the \(4 \pi \) perturbativity bound [2]. Assuming that the effective operators are generated at tree level, \(\Lambda _\textrm{max}\) is given by

$$\begin{aligned} \Lambda _\textrm{max} = \dfrac{\sqrt{4\pi } }{\sqrt{{\mathcal {C}}_\textrm{max}/\Lambda ^2}}, \end{aligned}$$
(31)

which must satisfy \(\Lambda < \Lambda _\textrm{max}\), where \({\mathcal {C}}_\textrm{max}\) denotes the largest effective coefficient that maximizes the observable in Eq. (13) (see also Eq. (19)). For the constraints derived in Table 2, we find that the maximal cutoff \(\Lambda _\textrm{max}\) is sufficiently above E for all processes. For instance, we obtain that

$$\begin{aligned} \Lambda _\textrm{max}&\approx 16~\textrm{TeV}\,,{} & {} (B\rightarrow \pi \mu \tau )\,, \end{aligned}$$
(32)
$$\begin{aligned} \Lambda _\textrm{max}&\approx 13~\textrm{TeV}\,,{} & {} (B\rightarrow \pi e \tau )\,, \end{aligned}$$
(33)
$$\begin{aligned} \Lambda _\textrm{max}&\approx 12~\textrm{TeV}\,,{} & {} (B\rightarrow K \mu \tau )\,, \end{aligned}$$
(34)
$$\begin{aligned} \Lambda _\textrm{max}&\approx 8~\textrm{TeV}\,,{} & {} (B\rightarrow K e \tau )\,, \end{aligned}$$
(35)

with similar values for the other processes depending on the same low-energy transitions. The largest effective coefficient for the leptonic \(P\rightarrow \ell \tau \) decays and for the semileptonic \(P\rightarrow P^\prime \ell \tau \) is the scalar \(\smash {{\mathcal {C}}_{ledq}}\), with appropriate flavor indices, whereas the semileptonic \(P\rightarrow V \ell \tau \) processes are maximized by vector operators, in agreement with the findings in Fig. 3.
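To make the relation in Eq. (31) concrete, the following sketch evaluates \(\Lambda _\textrm{max}\) for a given value of \({\mathcal {C}}_\textrm{max}/\Lambda ^2\) and inverts it to read off the coefficient size implied by the cutoffs quoted in Eqs. (32)–(35). The function names are hypothetical, and the printed coefficients are simply derived from the rounded cutoffs above, so they carry the same rounding.

```python
import math

def lambda_max_tev(c_over_lambda2):
    """Eq. (31): cutoff in TeV at which C reaches 4*pi, for C/Lambda^2 in TeV^-2."""
    if c_over_lambda2 <= 0:
        raise ValueError("C/Lambda^2 must be positive")
    return math.sqrt(4.0 * math.pi / c_over_lambda2)

def c_over_lambda2_from_cutoff(lambda_max):
    """Inverse of Eq. (31): C_max/Lambda^2 in TeV^-2 for a cutoff in TeV."""
    if lambda_max <= 0:
        raise ValueError("cutoff must be positive")
    return 4.0 * math.pi / lambda_max ** 2

# Rounded cutoffs quoted in Eqs. (32)-(35), in TeV
quoted_cutoffs = {
    "B -> pi mu tau": 16.0,
    "B -> pi e tau": 13.0,
    "B -> K mu tau": 12.0,
    "B -> K e tau": 8.0,
}
for channel, cutoff in quoted_cutoffs.items():
    value = c_over_lambda2_from_cutoff(cutoff)
    print(f"{channel}: C_max/Lambda^2 ~ {value:.3f} TeV^-2")
```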

Finally, in order to determine which energy bins are the most relevant for our bounds, we plot in the right panel of Fig. 4 our upper limits on \({\mathcal {B}}(B\rightarrow \pi \mu \tau )\) as a function of a fixed energy scale \(\Lambda _\textrm{cut}\) above which events with higher di-lepton invariant masses are neglected. The last bins are indeed the most important ones, as expected from the energy enhancement of the cross section. However, we see that our limits remain similar even if a smaller value of \(\Lambda _\textrm{cut}\approx 1\) TeV is taken. In the left panel of the same figure, we compare the current and projected constraints on the representative effective coefficient \({[{\mathcal {C}}_{ledq}]_{3231}}\) with the requirement that \(\Lambda _\textrm{cut}<\Lambda _\textrm{max}\) [cf. Eq. (31)], which is satisfied for all values of \(\Lambda _\textrm{cut}\) in our analysis.

Fig. 4

Left panel: upper limits on \({[{\mathcal {C}}_{ledq}]_{3231}}/\Lambda ^2\) as a function of the energy scale \(\Lambda _\textrm{cut}\) above which the events are neglected, by using current LHC data (blue line) and our projection for the HL-LHC (orange line). The shaded gray region corresponds to \(\Lambda _\textrm{cut}> \Lambda _\textrm{max}\), cf. Eq. (31). Right panel: upper limits on \({\mathcal {B}}(B^+ \rightarrow \pi ^+ \mu ^\pm \tau ^\mp )\) as a function of the energy scale \(\Lambda _\textrm{cut}\) by considering all SM EFT operators. The current experimental limit from BaBar is depicted by the gray dashed line [34]

Implications to other processes The same method described above can be used to constrain other types of LFV meson decays. For instance, D-meson decays based on the \(c\rightarrow u\ell _i\ell _j\) transition can be induced by an EFT Lagrangian similar to Eq. (20), but with different flavor indices, and with the inclusion of tensor operators that are allowed by \(SU(2)_L\times U(1)_Y\) gauge invariance in this case. The discussion of the RGE-induced operators that are not constrained by Drell–Yan processes is very similar to the case of B-meson decays, with contributions that are entirely negligible due to the CKM and loop suppressions. We find that our indirect limit for the \(D^0 \rightarrow \mu e\) decay is not useful, as it is about two orders of magnitude weaker than the direct experimental limit, namely \({\mathcal {B}}(D^0\rightarrow e^\pm \mu ^\mp )^\textrm{exp}<1.6\times 10^{-8}\) [56]. However, we note that there are no experimental limits available for the analogous decay \(D^0\rightarrow e \tau \), which has a very narrow phase space. In this case, we obtain the following indirect limit with our approach,

$$\begin{aligned} {\mathcal {B}}(D^0\rightarrow e^\pm \tau ^\mp ) \le 2\times 10^{-7}\,,\quad (95\%~\textrm{CL})\,, \end{aligned}$$
(36)

where we used \(f_D=212.0(0.7)\) MeV for the D-meson decay constant [45]. For this process, we find that the maximal cutoff defined in Eq. (31) is given by \(\Lambda _\textrm{max} \approx 18~\textrm{TeV}\). Note, in particular, that the corresponding decays for the \(\tau \mu \) transition do not exist, since \(m_{\tau }-m_\mu<m_{D^0}<m_{\tau }+m_\mu \) implies that both \(D^0\rightarrow \tau \mu \) and \(\tau \rightarrow D^0\mu \) are kinematically forbidden.
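The kinematic statements above can be checked directly from the particle masses. The sketch below uses approximate PDG mass values, which are inputs assumed here rather than numbers quoted in the text, and a hypothetical helper `is_allowed` that only tests whether the parent is heavier than the sum of the daughters.

```python
# Approximate PDG masses in MeV (assumed inputs, not quoted in the text)
M_TAU = 1776.86
M_MU = 105.658
M_E = 0.511
M_D0 = 1864.84

def is_allowed(parent_mass, daughter_masses):
    """A decay is kinematically open only if the parent outweighs the daughters."""
    return parent_mass > sum(daughter_masses)

checks = {
    "D0 -> e tau": is_allowed(M_D0, [M_E, M_TAU]),
    "D0 -> mu tau": is_allowed(M_D0, [M_MU, M_TAU]),
    "tau -> D0 mu": is_allowed(M_TAU, [M_D0, M_MU]),
}
for decay, allowed in checks.items():
    print(f"{decay}: {'allowed' if allowed else 'forbidden'}")

# Energy release in D0 -> e tau, which quantifies the narrow phase space
print(f"Q(D0 -> e tau) ~ {M_D0 - M_E - M_TAU:.1f} MeV")
```

With these inputs only \(D^0\rightarrow e\tau \) is open, with an energy release below 100 MeV, in line with the narrow phase space noted above.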

In principle, the same approach could also be used to constrain LFV decays of quarkonia [57,58,59,60]. However, important differences arise in this case since quarkonia are states without open flavor. By using our LHC bounds at \(95\%\) CL and by focusing at first on the semileptonic operators from Table 1 that contribute at tree level to these decays, we obtain the very stringent limits from LHC data,

$$\begin{aligned} {\mathcal {B}}(\Upsilon \rightarrow \mu ^\pm \tau ^\mp )_\textrm{tree}\le & {} 3 \times 10^{-9} \,,\nonumber \\ {\mathcal {B}}(\Upsilon \rightarrow e^\pm \tau ^\mp )_\textrm{tree}\le & {} 8 \times 10^{-9} \,,\nonumber \\ {\mathcal {B}}(\Upsilon \rightarrow e^\pm \mu ^\mp )_\textrm{tree}\le & {} 2 \times 10^{-9} \,, \end{aligned}$$
(37)

where we use the decay constant \(f_\Upsilon =649(31)~\textrm{MeV}\) computed in Ref. [61]. These limits correspond to \({\mathcal {O}}_\textrm{DY}\) in Eq. (14), and they are particularly stringent due to the large \(\Upsilon \) width, which stems from the fact that, unlike B-mesons, quarkonia can decay through electromagnetic and strong interactions [19]. These values are orders of magnitude more constraining than the searches performed at the B-factories, which set upper limits in the \(10^{-7}\)–\(10^{-6}\) range [62]. However, these results are misleading as they only refer to the tree-level contributions from semileptonic operators in Eq. (15) (i.e., \({\mathcal {O}}_\textrm{DY}\)). We stress that there are additional contributions that are not constrained by Drell–Yan processes (i.e., the remaining terms in Eq. (15)). The first contribution of this kind arises from the Higgs-current operators \(\smash {{\mathcal {O}}_{Hl}^{(1)}}\) and \(\smash {{\mathcal {O}}_{Hl}^{(3)}}\) which induce quarkonia decays at tree level, but which are not efficiently constrained by high-\(p_T\) Drell–Yan data since they are not energy enhanced [3, 4]. The second effect is induced by the RGE mixing of operators that are not constrained by Drell–Yan data into the ones needed at low energies, which is not CKM suppressed for this transition [39]. For instance, keeping only the \({\mathcal {C}}_{lu}\) effective coefficient, with couplings to the top-quark at \(\mu \approx 1~\textrm{TeV}\), and neglecting the other contributions, we obtain

$$\begin{aligned} {\mathcal {B}}(\Upsilon \rightarrow \mu ^\pm \tau ^\mp )_\textrm{loop} \simeq 1.5 \times 10^{-9} \,\dfrac{\big | \big [{\mathcal {C}}_{lu}\big ]_{2333}/(4\pi )\big |^2}{(\Lambda /~1~\textrm{TeV})^4}+\dots \nonumber \\ \end{aligned}$$
(38)

where we have considered both gauge and Yukawa running effects, within a leading logarithmic approximation, and \(\Lambda \) is fixed to \(1~\textrm{TeV}\) in the logarithm. By setting this effective coefficient to the naive perturbativity limit \({\mathcal {C}}\approx 4\pi \), we find that the loop contribution becomes comparable to the tree-level ones in Eq. (37). Therefore, one should consider additional constraints on the terms that are not constrained by Drell–Yan data in order to provide a meaningful bound on the full observable.

In order to constrain the tree-level contributions to \({\mathcal {B}}(\Upsilon \rightarrow \mu \tau )\) induced by \(\smash {{\mathcal {C}}_{Hl}^{(1,3)}}\), we can consider Z-boson LFV decays, which should satisfy the LHC limit \({\mathcal {B}}(Z\rightarrow \mu ^\pm \tau ^\mp )<6.5 \times 10^{-6}\) at 95\(\%\) CL [63,64,65]. To estimate the impact of this constraint, we first consider \(\smash {{\mathcal {C}}_{Hl}^{(1,3)}}\) and set the other coefficients to be zero, relating the two decay modes at tree level,

$$\begin{aligned} \Gamma (\Upsilon \rightarrow \mu ^\pm \tau ^\mp )_\textrm{tree} {\mathop {\simeq }\limits ^{{\mathcal {C}}_{Hl}^{(1,3)}\ne 0}} g_{Vb}^2\dfrac{f_\Upsilon ^2 m_\Upsilon ^3 }{v^2 m_Z^3 }\, \Gamma (Z\rightarrow \mu ^\pm \tau ^\mp ),\nonumber \\ \end{aligned}$$
(39)

where \(g_{Vb}=-1/2+2/3\, \sin ^2 \theta _W\), \(\theta _W\) denotes the Weinberg angle and, for simplicity, we have neglected the terms suppressed by \(m_\tau ^2/m_\Upsilon ^2\) and \(m_\Upsilon ^2/m_Z^2\). The above expression then gives

$$\begin{aligned} \hspace{-0.8em}{\mathcal {B}}(\Upsilon \rightarrow \mu ^\pm \tau ^\mp )_\textrm{tree} {\mathop {\simeq }\limits ^{{\mathcal {C}}_{Hl}^{(1,3)}\ne 0}} {2.8\times 10^{-10}}\, \dfrac{{\mathcal {B}}(Z\rightarrow \mu ^\pm \tau ^\mp ) }{(6.5 \times 10^{-6})},\nonumber \\ \end{aligned}$$
(40)

which confirms that these contributions to \(\Upsilon \) decays are negligible when Z-pole constraints are taken into account. The Z-pole observables can also be used to constrain the semileptonic operators with the top quark, since they contribute to \(Z\rightarrow \ell _i\ell _j\) at one loop [39, 41, 66,67,68]. For instance, by only keeping the \([{\mathcal {C}}_{lu}]_{2333}\) coefficient, we find that the Z-pole observables allow us to derive the upper bound,

$$\begin{aligned} \dfrac{|[{\mathcal {C}}_{lu}]_{2333}|}{\Lambda ^2} \lesssim {1.5}~\textrm{TeV}^{-2}, \end{aligned}$$
(41)

where we have set \(\Lambda =1~\textrm{TeV}\) in the logarithm. This bound is indeed more constraining than the perturbativity one used in Eq. (38) (see footnote 7). This rough exercise indicates that it is possible to constrain the terms in Eq. (15) that are not probed by Drell–Yan data for \(\Upsilon \) decays by combining Drell–Yan data with the Z-pole observables. However, it is clear that a precise assessment of the combined constraints on these decay modes would require a dedicated analysis beyond tree level, which accounts for all relevant operators and the potentially large one-loop RGE effects.
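The numbers in this exercise can be reproduced with a few lines of arithmetic. The sketch below evaluates Eq. (39) to recover the prefactor of Eq. (40), and then inserts the Z-pole bound of Eq. (41) into the loop estimate of Eq. (38). The decay constant and the \(Z\rightarrow \mu \tau \) limit are taken from the text; the masses, total widths, Higgs vacuum expectation value and \(\sin ^2\theta _W\) are approximate PDG-like values assumed here, and the function names are hypothetical.

```python
import math

# Inputs quoted in the text
F_UPSILON = 0.649          # GeV, decay constant from Ref. [61]
BR_Z_LIMIT = 6.5e-6        # LHC limit on B(Z -> mu tau) at 95% CL

# Assumed approximate PDG-like inputs (not quoted in the text)
M_UPSILON = 9.4603         # GeV
M_Z = 91.1876              # GeV
V_HIGGS = 246.22           # GeV
SIN2_THETA_W = 0.2312
GAMMA_Z = 2.4952           # GeV, total Z width
GAMMA_UPSILON = 54.02e-6   # GeV, total Upsilon(1S) width

def width_ratio():
    """Gamma(Upsilon -> mu tau) / Gamma(Z -> mu tau) from Eq. (39)."""
    g_vb = -0.5 + (2.0 / 3.0) * SIN2_THETA_W
    return g_vb ** 2 * F_UPSILON ** 2 * M_UPSILON ** 3 / (V_HIGGS ** 2 * M_Z ** 3)

def br_upsilon_tree(br_z):
    """Branching fraction induced by C_Hl^(1,3) alone, for a given B(Z -> mu tau)."""
    return width_ratio() * br_z * GAMMA_Z / GAMMA_UPSILON

def br_upsilon_loop(c_lu, lambda_tev=1.0):
    """Leading-log estimate of Eq. (38), keeping only [C_lu]_2333."""
    return 1.5e-9 * abs(c_lu / (4.0 * math.pi)) ** 2 / lambda_tev ** 4

print(f"tree, Z-pole limited:   {br_upsilon_tree(BR_Z_LIMIT):.2e}")
print(f"loop, C = 4*pi:         {br_upsilon_loop(4.0 * math.pi):.2e}")
print(f"loop, Eq. (41) bound:   {br_upsilon_loop(1.5):.2e}")
```

With these inputs the tree-level value comes out close to the \(2.8\times 10^{-10}\) of Eq. (40), and the loop estimate drops by about two orders of magnitude once the bound of Eq. (41) replaces the perturbativity limit.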

In conclusion, Drell–Yan processes are very efficient at constraining the semileptonic operators contributing at tree level to \(\Upsilon \rightarrow \ell _i\ell _j\), but there are additional contributions to these decays at tree and loop level which are not negligible and that therefore need to be precisely assessed. Similar conclusions apply to LFV decays of the \(\phi \) and \(J/\psi \) mesons. The way out consists of a combined fit of the LHC constraints with the experimental limits on the processes \(Z\rightarrow \ell _i \ell _j\) and \(\ell _i \rightarrow \ell _j \ell _k \ell _k\) with \(i>j\ge k\), which would allow us to apply the same procedure described in Sect. 4 to these processes, as suggested by the above exercise with selected operators. Such an extended analysis lies beyond the scope of the present letter.

6 Summary and outlook

In this letter, we revisited the constraints on LFV meson decays by using LHC data on the Drell–Yan processes \(pp\rightarrow \ell _i\ell _j\) (with \(i\ne j\)) at high-\(p_T\). By relying on the EFT approach, we have shown that it is possible to derive upper bounds on such processes in a model-independent way by marginalizing over the complete set of SM EFT operators compatible with such LHC bounds. In particular, this optimization problem can be reduced to an eigenvalue problem that can be solved analytically, provided a single bin at large di-lepton invariant-mass is considered for the LHC data.

As an illustration of this approach, we have derived indirect limits on \(B_{(s)}\)-meson LFV decays based on the transitions \(b\rightarrow d \tau \ell \) and \(b\rightarrow s \tau \ell \), with \(\ell =e,\mu \). We find that our upper limits on the decays \(B\rightarrow \pi e \tau \) and \(B\rightarrow \pi \mu \tau \) are already competitive with the current limits from the B-factories and that the expected sensitivity at the HL-LHC will supersede them. Furthermore, we have derived constraints of the order of \(10^{-4}\) on other LFV decays that have not yet been searched for experimentally, such as \(B\rightarrow \rho \tau \ell \), \(B_s\rightarrow K \tau \ell \) and \(B_s\rightarrow \phi \tau \ell \), among others. Lastly, using the same approach, we have shown that the branching fraction of the charmed-meson decay \(D^0\rightarrow e \tau \), for which there is no direct experimental search yet, must be smaller than around \(10^{-7}\).

The main caveat of our analysis is the validity of the EFT description of LHC data. We have estimated the maximal EFT cutoff \(\Lambda _\textrm{max}\) that can be probed in a perturbative scenario by analyzing the effective coefficients that maximize the low-energy observable. We find that our bounds can probe values of \(\Lambda \) up to \(\Lambda _\textrm{max} \approx 10~\textrm{TeV}\) for B-decays and \(\Lambda _\textrm{max} \approx 20~\textrm{TeV}\) for D-meson decays. Our limits are therefore applicable to EFT scenarios with an EFT cutoff \(\Lambda \) that satisfies \(E\ll \Lambda < \Lambda _\textrm{max}\), where E denotes the energy of LHC events and \(\Lambda _\textrm{max}\) is defined in Eq. (31). For scenarios with light mediators, our constraints have to be reassessed by accounting for the propagation of the new degrees of freedom at the LHC, see e.g. Refs. [3, 4].

Finally, we stress that our method can also be applied to other LFV processes such as the decays of \(\phi \), \(J/\psi \) and \(\Upsilon \). Since LHC constraints on the operators that contribute at tree level to these processes are very stringent, one should carefully assess the RGE-induced contributions of the operators that are not constrained by Drell–Yan processes at tree level. We have argued that it is possible to extend our analysis to also constrain these contributions, e.g. by combining the Drell–Yan constraints at high-\(p_T\) with the Z-pole observables. We believe that our study offers a very convincing illustration of the complementarity of high-\(p_T\) and low-energy searches for LFV through the SM EFT, and we hope that our results will invite experimental collaborations to further improve the sensitivity on both Drell–Yan processes and LFV meson decays.