Hint of Lepton Flavour Non-Universality in $B$ Meson Decays

The LHCb collaboration has recently presented their result on R_K = BR(B+ ->K+ mu+ mu-)/ BR(B+ ->K+ e+ e-) for the dilepton invariant mass bin m_{ll}^2 = 1-6 GeV^2 (l = mu, e). The measurement shows an intriguing 2.6 sigma deviation from the Standard Model (SM) prediction. In view of this, we study model independent New Physics (NP) explanations of R_K consistent with other measurements involving b ->s l l transition, relaxing the assumption of lepton universality. We perform a Bayesian statistical fit to the NP Wilson Coefficients and compare the Bayes Factors of the different hypotheses in order to quantify their goodness-of-fit. We show that the data slightly favours NP in the muon sector over NP in the electron sector.


Introduction
While we all hope to see direct evidence of new particles in the upcoming 14 TeV run of the LHC, indirect searches for New Physics (NP) through precision measurements are also extremely important, in particular because of their high sensitivity to Ultra Violet (UV) physics. In fact, the 8 TeV run of the LHC has already seen interesting indirect hints of NP in some of the B meson decay modes. The quoted 3.7 σ deviation observed by the LHCb collaboration last summer [1,2] in one of the angular observables (P 5 ) in the decay B → K * µ + µ − for one of the dilepton invariant mass bins (4.30 < q 2 = m 2 µ µ < 8.68 GeV 2 ) inspired many theorists to come up with NP explanations [3][4][5][6][7][8]. Interestingly enough, very recently the LHCb collaboration has observed a 2.6σ deviation in another b → s + − mode; in the quantity called R K which is the ratio of the two branching fractions B (B + → K + µ + µ − ) and B (B + → K + e + e − ) in the dileptonic invariant mass bin m 2 = 1 − 6 GeV 2 ( = µ, e) [9]. Note that the branching ratios B (B + → K + µ + µ − ) and B (B + → K + e + e − ) are individually predicted with very large hadronic uncertainties (∼ 30%) in the SM [10]. However, their ratio is a theoretically very clean observable and predicted to be R SM K = 1 if lepton masses are ignored [11]. Inclusion of the lepton mass effects changes the prediction only by a tiny amount making it R SM K = 1.0003 ± 0.0001 [10]. This is contrary to the (P 5 ) anomaly where considerable debate exists surrounding the issue of theoretical uncertainty due to (unknown) power corrections to the factorization framework [12][13][14]. Hence, there is a possibility that the observed deviation might be (partly) resolved once these corrections are better understood. On the other hand, the observable R K is a ratio of two branching fractions which are, by definition, equal to each other if the identity of the final state lepton is not considered. This makes R K well protected from the hadronic uncertainties and the protection is maximal in the SM because the short distance physics governing the b → s + − transition is independent of the final state lepton identity. In fact, this feature remains true even in NP models if the model respects lepton universality. Therefore, the R K measurement is perhaps pointing towards new short distance physics which is not lepton universal. This motivated us to study possible NP explanations of the R K measurement (and their consistency with other observables involving the b → s + − transition) in a model independent way, relaxing the assumption of lepton universality. To this end, we perform a statistical fit to the NP Wilson Coefficients (WC) employing Bayesian inference. In order to quantify and compare the goodness-of-fit of the different hypotheses we also compute their relative Bayes Factors (BFs).
The paper is organized as follows. In the next section we set up our notation and discuss all the experimental data used in our analysis. We present our results in section 3 and conclude with some final remarks in section 4.

Effective Field Theory approach
We base our analysis on the following |∆B| = |∆S| = 1 effective Hamiltonian, where O i are the SU(3) C × U(1) Q invariant dimension-six operators responsible for the flavour changing b → s + − transition. The superscript denotes the lepton identity in the final state ( = e, µ). In our notation the short-distance contribution to the WC's is divided into the SM and the NP ones in the following wayĈ i = C SM i + C i . In our analysis we consider the subset of operators which are directly responsible for the b → s + − decay, namely and the set of chirality flipped operators O i obtained by interchanging the chiral projectors (P L ↔ P R ) in the quark current of O i . The full list of dimension six operators has more entries and includes scalar, pseudo-scalar and tensor structures. However, it was shown that scalar, pseudo-scalar and tensor operators cannot give sufficiently large deviations from the SM in the observable R K once constraints from the other b → s + − processes are taken into account [15,16]. Therefore, we will neglect these operators in our analysis. Furthermore, we do not consider NP contribution to the photonic dipole operator O 7 as it is lepton flavour blind by construction. In what follows we assume that all the WCs are evaluated at the low scale µ = m b with the corresponding SM contributions given in table 8.
Apart from R K we also consider data for other processes which proceed via b → s + − transition e.g., the branching ratios of the fully leptonic decays B s → µ + µ − and B s → e + e − , the inclusive decays B → X s µ + µ − and B → X s e + e − as well as the branching ratio for the semileptonic decay B + → K + µ + µ − . The SM predictions for these branching fractions and their current experimental values are summarized in table 1. [1,6] 1.75 +0.60 −0.29 × 10 −7 [17] (1.21 ± 0.09 ± 0.07) × 10 −7 [1,6] (1.59 ± 0.11) × 10 −6 [19] 0 [1,6] (1. 6] 1.0003 ± 0.0001 [10] 0.745 +0.090 −0.074 ± 0.036 [9]  Note that we have not used the branching fraction for the decay B + → K + e + e − from [9] because of its correlation with the R K measurement (however, in appendix A we will show our results including B(B + → K + e + e − ) in our fit and assuming it to be not correlated with R K ). Instead, we have used the value of B(B + → K + µ + µ − ) from a LHCb measurement in [18]. Moreover, we have used the data only in the low-q 2 bin 1 − 6 GeV 2 (for R K this is the only data available) as the theoretical predictions are more reliable in this phase space region [24,25]. In addition to that, for the process B + → K + µ + µ − there is another issue which is still to be sorted out; a resonance structure in the dilepton invariant mass distribution was observed around m 2 µ µ = 17.3 GeV 2 by the LHCb collaboration last year [26]. This makes the theoretical prediction more vulnerable to hadronic uncertainties in the high-q 2 region. In view of these, we take a conservative attitude and do not consider the high-q 2 data at all in our analysis. For the B → K * µ + µ − mode we use the results from [3], see section 3.3 for more details.

Results
In this section we present our results for the various NP scenarios considered in this paper. As mentioned earlier, we follow a Bayesian statistical approach to quantify our results.
The details of our procedure is explained in the Appendix C.

Single Wilson Coefficient
To start with, we consider only one real NP WC at a time. We will show our results for the WCs both in the standard basis (vector and axial-vector operators for the lepton current) as well as in the chiral basis for the lepton currents. The inclusion of the second set is motivated by the possibility of having the NP WCs generated in an SU(2) L invariant way. This possibility was also highlighted in [16].
Our results are summarized in table 2. The 68% confidence level (C.L.) regions of the WC's are shown in the second column, the best fit values are shown in the third column while the last column shows the BF for each hypothesis taking the same for the C µ 9 -only hypothesis as the reference value. More precisely, the BF for the hypothesis with NP in the WC C i is defined as, where L is the likelihood function and P 0 is our choice of prior for the WC C i , which we assume to be a flat distribution in the range [-10, 10].

Hypothesis
Fit Best fit BF C µ   Table 2. The 68% C.L. ranges for the WC's when only one WC is considered at a time. The data on B → K * µ + µ − have not been used at this stage. In Figure 1 we show the posterior probabilities for the C µ 9 only and C e 9 only hypotheses as examples.
We now discuss some general features of our results shown in table 2. Clearly, the hypothesis with non-zero C µ 9 is the best one as it offers the largest BF. The C µ 10 NP scenario also does quite well. As C µ 9 does not contribute to the decay B s → µ + µ − (whose experimental central value is now slightly lower than the SM prediction), its BF is reduced to some extent. The extremely low BF for the C µ 10 case is due to a tension between R K and and B (B s → µ + µ − ). It can be seen from the expressions of these two observables (see Appendices B.1 and B.4 ) that the experimental value for R K prefers C µ 10 > 0 while the measured branching ratio for B s → µ + µ − prefers the opposite. The hypotheses C µ( ) 9 = C µ( ) 10 (which correspond to the operator directions (sγ α P L b)(¯ γ α P R ) and (sγ α P R b)(¯ γ α P R )) are also strongly disfavoured because they generate R K 1 which is in tension with experiment. The other chiral operator (sγ α P L b)(¯ γ α P L ) (our hypothesis C µ 9 = −C µ 10 ) turns out to be the best among the four chiral operators. This case was also considered in Ref. [16] in the context of their model independent analysis as well as a specific leptoquark model. In their analysis they quote C µ 9 = −C µ 10 ≈ −0.5 as a benchmark point which is in fact consistent with our 68% CL range in table 2.
As far as NP in the electron sector is concerned, all the hypotheses give similar BFs (at most a factor 4 between the best and the worst). This can be understood by the fact that the measurements with electrons in final states have a larger experimental error than those with muons in the final state. Note also that there are a few cases with multiple solutions (because the rates are quadratically dependent on the WC's), some of them with very large NP contributions (see for example C e 10 ). Including other observables (for example those in the decay B → K * + − ) will certainly modify this picture.
The scenario C e 10 = −C e 9 ≈ 0.5 was also considered in the Ref. [16]. Although, their estimate C e 10 ≈ 0.5 is compatible with the 68% CL region from our fit, its BF is the worst among the hypotheses with NP in the electron sector.
Finally, we show our results for the two lepton universal cases C µ 9 =C e 9 and C µ 10 =C e 10 (see the last but one row in table 2). It can be seen that the BF for both these two cases are rather low compared to most of the non-universal NP scenarios in particular, the C µ 9 -only and C µ 10 -only hypotheses. Before we conclude this section we would like to make a few comments about the effect of including the B(B + → K + e + e − ) in our analysis. It affects mainly the scenarios with NP in the electron sector only. Since, in all these cases no NP in the muon sector is considered the B(B + → K + µ + µ − ) does not change from its SM prediction. This means that the B(B + → K + e + e − ) has to be increased from its SM prediction in order to be consistent with the measurement of R K . In fact, the 68% C.L. ranges for the WCs in these scenarios indeed predict the B(B + → K + e + e − ) larger than the SM prediction. This means that including the experimental value for B(B + → K + e + e − ) (whose central value is close to the SM central value) from [9] will make the case of NP in the electron sector worse to some extent. As an example, the BF for the C e 9 only case reduces to 0.05 : 1 as compared to 0.13 : 1 in table 2. We refer our readers to appendix A where we repeat all our analyses including the B(B + → K + e + e − ) and assuming it to be uncorrelated with R K .

Combination of two Wilson Coefficients at a time
In this section we allow the possibility of having NP in two WCs simultaneously and study the consequences. Here we do not consider all the possible combinations of the WCs, rather, we take the few best cases from table 2 and consider their combinations. Our results are summarized in table 3. The 68% range of a parameter is obtained after marginalizing over the other parameter.
The results of our analysis show that, in general, there is no particular gain in considering NP effects in two WCs at the same time, this is more marked in the case of the chiral operators.
Among the hypothesis of NP in two WCs, the bigger BF is obtained for the pair {C e 9 , C µ 9 }. The 2-dimensional posterior distribution for this scenario is shown in the left panel of figure 2. Similarly, the posterior probability distribution for the NP scenario with both C µ 9 and C µ 9 is shown is the right panel of figure 2. It can be seen that the data is consistent with no NP in C µ 9 . The posterior probability for the hypothesis with both C µ 9 and C e 9 turned on is shown in figure 3. In the left panel we show both the allowed regions, while in the right panel we show a zoomed in version of the region close to the SM point C µ 9 = C e 9 = 0. Figure 3 clearly shows that the data prefers NP in the muon sector over NP in the electron sector. Moreover, NP with lepton flavour universality (shown by the the dashed purple line) is disfavoured by more than 95% C.L.

Including the data on
The latest LHCb measurement of the decay distribution in B → K * µ + µ − has seen interesting deviations in several observables from their SM predictions [1,2]. These deviations were most pronounced in two of the so-called optimized observables (where the hadronic uncertainties are expected to cancel to a good extent) P 5 and P 2 in the low-q 2 region [27]. Soon after the LHCb result was published, it was shown in [3] that a good fit to all the Hypothesis Fit Best fit BF C µ    Figure 2. Two dimensional posterior probability distributions for the two hypotheses: 1. C µ 9 and C µ 10 (left panel), 2. C µ 9 and C µ 9 (right panel). The red and green contours are 68% and 95% C.L. regions respectively. (NB. With respect to our version 1, the axis labels on contour plots have been exchanged for frame labels. But in all cases, the horizontal axis corresponds to C µ 9 .) data can be obtained by having a negative contribution C µ 9 ≈ −1 to only one WCĈ µ 9 . A similar conclusion was also reached by two other groups later [4,28]. The Ref [4] found a need for a NP contribution (similar to C µ 9 in magnitude but with opposite sign) also to the chirally flipped operatorĈ µ 9 . As discussed in section 3.2 this is now in tension with  Table 4. Results when the data on B → K * µ + µ − are also included as discussed in section 3.3. The same conventions as in table 2 and 3 are used.
the measurement of R K (assuming the presence of NP only in the muon sector).
Note that the issue of hadronic uncertainties, in particular the role of long-distance cc loops still remains unclear, see [14] and the references therein for a recent discussion. Thus, the jury is still out on whether NP has already been seen in these measurements. Despite this uncomfortable situation, several NP interpretations of the LHCb data have been proposed [5][6][7] and it would be interesting to see whether the R K measurement can be reconciled with the B → K * µ + µ − data. With this motivation, in this section we combine the result of [3] with our analysis.
We follow an approximate procedure (see Appendix C for more details) which allows us to use their result where NP only inĈ µ 9 andĈ µ 7 was considered, see their eq. 4. Note that, although the analysis in [3] was performed assuming lepton universality their results can, to a very good approximation, be taken as valid only for the muonic WC's. This allows us to safely use their result with NP inĈ µ 9 and any operator in the electron sector. As we do not consider NP in the dipole operator in this paper,Ĉ µ 7 is set to its SM value (which is consistent with the 68% C.L. region of [3]). The result of our fit is reported in table 4. In the first row we show the result when only C µ 9 is turned on while in the following rows we allow NP in the electron sector also in addition to C µ 9 . We notice that the range for C µ 9 preferred by the analysis of [3] is confirmed even with the inclusion of our observables. Allowing the possibility of NP in the electron sector makes things slightly better (increases the BF by roughly a factor of 3), except the scenario of the last row of the table where the BF remains almost the same. The posterior probability distribution for the C µ 9 -C e 9 hypothesis is shown in figure 4. Similar to figure 3, in the left panel both the allowed regions are shown while in the right panel only the region close to the SM point is shown. Again, the preference for a lepton flavour non-universal NP is very clear. The data is consistent with no NP in the electron sector but demands a rather large NP in the muon sector. A comparison of the figure in the right panel with that in figure 3 will also show the constraining power of the B → K * µ + µ − data. We warn the readers that the BFs in the table 4 should not be compared with those in the previous section but should only be compared within table 4. This is because the analysis in this section uses different set of information than those in the previous section. Also, we are combining our analysis with that in [3] in a very simplified way. We report the BFs just to show that the data does not show a strong preference for any particular kind of NP in the electron sector; many of them do equally well.

Summary and conclusions
The flavour changing process b → s + − is responsible for many rare decays such as B s → + − , B → X s + − , B + → K + + − , B → K * + − . These decays, being extremely rare in the SM, are very powerful probes of new physics. Large experimental effort is underway in order to measure, as precisely as possible, more and more observables in these decays. The LHCb collaboration already made a breakthrough last year by measuring most of the observables in the full three angle distribution of the decay B → K * µ + µ − . Interesting deviations from the SM predictions were also seen in a few so-called "formfactor independent" or optimized observables. Although several NP explanations of these deviations were put forward, firm confirmation of new short distance physics is not yet possible due to the ever-present hadronic uncertainties which are not at all satisfactorily understood. The 2.6σ deviation in R K , although not yet statistically significant, is worthy of attention because the ratio R K is essentially free of hadronic uncertainties in the SM. Moreover, any lepton flavour blind new short distance physics would predict R K ≈ R SM K and hence, the confirmation of this deviation would clearly point towards lepton flavour non-universal NP. This has motivated us to study in detail the various NP explanations of the new measurement of R K keeping in mind also the other available measurements involving b → s + − transition. To this end, we have performed a Bayesian statistical analysis of the various NP scenarios. We have also quantified the goodness-of-fit of these NP hypotheses by computing and comparing their Bayesian Evidences.
To start with, we have performed the fit without including the data on B → K * µ + µ − . Our results for the case with single (two) WC(s) at a time are summarized in table 2 (3). Among the single WC (in the standard basis) hypotheses in the muon sector, only C µ 9 and C µ 10 offer a good fit. Among the chiral operators, only one SU(2) L invariant direction (C µ 9 = −C µ 10 ) is shown to perform well and all the others are strongly disfavoured. However, this hierarchy is absent in the electron sector; all the NP operators offer similar performance, although worse than the scenarios with C µ 9 or C µ 10 . We further show that the lepton flavour universal NP scenarios for example, C µ 9 = C e 9 or C µ 10 = C e 10 have rather low BFs and hence they are disfavored. However it should be noted that large NP effects in the electron sector are not excluded and in the future, with more precise measurements, the situation could change.
When two NP WCs are turned on simultaneously, the situation does not particularly improve. We have also shown a few key posterior distributions in figures 2 and 3 which show that for the C µ 9 -C µ 9 case the data is consistent with C µ 9 = 0 for the C µ 9 -C e 9 hypothesis the lepton flavour universal case C µ 9 = C e 9 is disfavoured by more than 95% C.L. We continued in section 3.3 to include also the data on B → K * µ + µ − . We employed a simplified procedure (explained in appendix C) to combine part of the results from [3] with our observables. We observed that the allowed range of C µ 9 obtained in [3] remains consistent with the R K data. Our analysis also showed that the data is consistent with no NP in the electron sector. The lepton-universal scenario remains disfavoured by 95% C.L. even after the data on B → K * µ + µ − is included.
We would like to emphasize here the need for the experimental collaborations to provide data separately for the muon and electron final states. A global analysis including the final states with tau leptons would be very useful once more data are available. In fact, the possibility of large enhancements in many of the b → sτ + τ − modes were already discussed in the context of like-sign dimuon asymmetry seen in Tevatron, see for example [29] and the references therein.
A further issue which merits a mention is the inclusion of high-q 2 data in the global analysis. For reasons discussed in the text (section 2), we have taken a conservative attitude in this paper to ignore all the high-q 2 data. Note that, in this phase space region the use of light-cone sum rules is not appropriate. However, lattice calculations of the relevant form factors are making significant progress here [30][31][32][33] and the importance of the high-q 2 data will increase in the future. A Results including the data on B(B + → K + e + e − ) In this appendix we repeat our analysis including the data on B(B + → K + e + e − ) from [9]. The results in the tables 5, 6 and 7 should be compared with the results in 2, 3 and 4 respectively in the main text.

Hypothesis
Fit Best Fit BF C µ

B Details of the analysis
The branching ratio for the decay B + → K + + − in the low-q 2 region can be written as [10,[34][35][36], In order to obtain these numbers we have used the form factor parametrization given in [36]. The values of the form factor parameters f + BK , f T BK , b + 1 and b T 1 are given in table 8. As far as the uncertainties in these parameters are concerned, for b + 1 and b T 1 we have taken asymmetric gaussian priors in the fit. For the other two parameters f + BK and f T BK , we have fixed them to their respective central values and the associated uncertainties are taken into account by rescaling the experimental error σ B in B (B + → K + µ + µ − ) [1,6]  . We made this (conservative) approximation in order to reduce the computational time in the evaluation of the posterior distributions for the WC's and avoid some numerical instabilities. We have followed the same prescription also for the branching ratio of B + → K + e + e − . The theoretical uncertainties coming from f + BK and f T BK in the two observables B(B + → K + µ + µ − ) and B(B + → K + e + e − ) have been assumed to be independent for simplicity.

B.2 R K
The expression for R K can be derived using the result given in the previous subsection.
Here we give a simplified formula which can be useful for analytic understanding of our results. Observing the fact that C SM 7 C SM 9,10 ≈ 0.07 and also that the NP inĈ 7 is severely constrained by the measured branching ratio of B → X s γ, the expression of R K can be approximated by, This simplified expression clearly shows that the theoretical error in R K is very small even in the presence of NP. Note that in our numerical fit we have used the full expression of R K with all the form factor dependent terms.
The branching ratio for the inclusive decay B → X s + − in the low-q 2 region can be written as [37], The branching ratio for the leptonic decays B s → µ + µ − and B s → e + e − can be written as, Here we have used the SM value of B(B s → µ + µ − ) as our input parameter instead of the B s meson decay constant f Bs and the CKM matrix elements. We took theoretical error on this into account by taking a gaussian prior for B(B s → µ + µ − ) SM and marginalizing over it, using values from [21]:

C Statistical procedure
In this appendix we briefly describe the statistical procedure than has been followed in this work. Our aim is to construct the posterior probability distributions (p.d.f.) for a set of WC's that we denote collectively by C, for example C = {C µ 9 , C e 9 , ....}. The theoretical prediction for a given observable will be denoted by O and it will depend on C as well as a set of input parameters x for example, CKM matrix elements, form factors, masses of the particles etc. The inputs are either experimentally measured quantities or theoretically calculated parameters with some uncertainties. In order to take into account these uncertainties we have to attach a probability distribution f x i (x i ) to each input x i .
The measured observables will be assumed to be distributed according to Normal distribution with a mean value O and standard deviation σ O . The generalization to distributions other than Normal distribution is straightforward.
If one has a set of N observables O i (i = 1, ..N ), following the Bayesian approach 1 , the p.d.f. of the WC's can be written as, Let us now split the observables into two sets, O + and O * where the second (first) set refers to the observables (without) involving the decay B → K * µ + µ − . In a similar way, we split the set of inputs into three sets x = {x + , x * , x c } where x + (x * ) are input parameters entering the set of observables O + (O * ) exclusively and x c are the inputs which are common to both the sets of observables. The expression for the p.d.f. now reduces to Now we would like to find the conditions such that the two sets of observables "factorize".
In that case, we will be able to use the results of [3] instead of redoing the full analysis. The factorization is possible if the common set of inputs x c has small uncertainty compared to the other sources of uncertainties. If that is true then one can write to a very good approximation f x c k (x c k ) = δ(x c k − x c k ) and the p.d.f. reduces to where ρ 0 (C) is the result of a Bayesian analysis considering only the global analysis of the decay B → K * µ + µ − . Using the result of [3] in their section 3.2, we assume that with C µ 7 = −0.018 , σ C µ 7 = 0.018 , C µ 9 = −1.6 , σ C µ 9 = 0.3 .

(C.5)
These values are inferred from eq. (4) of [3]. We finally remark that although ref. [3] considered the observable B (B → X s + − ) under the assumption of lepton universality, we interpret that as B (B → X s µ + µ − ) and remove the experimental data on B (B → X s µ + µ − ) (our table 1) from our fit for consistency. We have also not used the data on B(B s → µ + µ − ) since that was also already included in the fit of Ref. [3].