Lepton-flavour non-universality of $\bar{B}\to D^*\ell \bar\nu$ angular distributions in and beyond the Standard Model

We analyze in detail the angular distributions in $\bar{B}\to D^*\ell \bar\nu$ decays, with a focus on lepton-flavour non-universality. We investigate the minimal number of angular observables that fully describes current and upcoming datasets, and explore their sensitivity to physics beyond the Standard Model (BSM) in the most general weak effective theory. We apply our findings to the current datasets, extract the non-redundant set of angular observables from the data, and compare to precise SM predictions that include lepton-flavour universality violating mass effects. Our analysis shows that the current presentation of the experimental data is not ideal and prohibits the extraction of the full set of relevant BSM parameters, since the number of independent angular observables that can be inferred from data is limited to only four. We uncover a $\sim4\sigma$ tension between data and predictions that is hidden in the redundant presentation of the Belle 2018 data on $\bar{B}\to D^*\ell \bar\nu$ decays. This tension specifically involves observables that probe $e-\mu$ lepton-flavour universality. However, we find inconsistencies in these data, which renders results based on it suspicious. Nevertheless, we discuss which generic BSM scenarios could explain the tension, in the case that the inconsistencies do not affect the data materially. Our findings highlight that $e-\mu$ non-universality in the SM, introduced by the finite muon mass, is already significant in a subset of angular observables with respect to the experimental precision.

Improved form-factor determinations: There has been significant progress in the theoretical determination of hadronicB → D ( * ) form factors, both from lattice QCD computations [6][7][8][9] and from light-cone sum rules [10]. These determinations allow for precise predictions of the complete set of form factors inB → D * ν in the whole phase space [11,12]. These predictions are using the heavy-quark expansion and account for contributions up to and including O 1/m 2 c . They are a prerequisite for a general BSM analysis of these modes. Impending progress in experimental and theoretical precision: Both the experimental and the theoretical precision are expected to improve significantly: the ongoing Belle II and LHCb upgrade experiments are bound to deliverB → D ( * ) ν results based on multiples of the current datasets [13][14][15], and updated lattice QCD results for severalB → D * form factors beyond zero recoil are upcoming [16][17][18], see also the discussions in Refs. [19,20]. This renders the discussion of presently negligible effects important for the full phenomenological exploitation of the upcoming experimental and theoretical results.
The discussions resulting from the first two items significantly improve our understanding of these modes, and their sensitivity to the adopted form-factor parametrization. Recent phenomenological analyses have also shown that the V cb puzzle is significantly reduced, albeit not yet fully resolved [11,12,[21][22][23][24][25][26][27][28]. We pose the following questions that affect existing and future angular analyses ofB → D * ν data: 1. What is the amount of LFU violation in the SM induced by the muon mass? Is the muon mass still negligible given the achieved experimental and theoretical precision?
2. What amount of information can be extracted from the available single-differential distributions in comparison to a fully-differential angular analysis ofB → D * ν? Is it possible to increase the sensitivity to BSM physics with available data by modifying the analysis strategy?
3. What are the limits on BSM physics from existingB → D * ν data? Which effective operators could resolve a potential tension with the SM and what would be their implications on so far unmeasured observables?
In order to answer these questions, we proceed as follows: We begin by describing the general properties of thē B → D * ν angular distribution and the BSM physics reach of the angular observables arising from this distribution in Section II. In Section III we prepare a full angular analysis on the basis of the Belle data published in Ref. [3]. In doing so, we identify two obstacles to the full use of these data. In Section IV we carry out a fit of the full angular distribution to the Belle data, and discuss the compatibility with SM predictions. In light of an observed tension, we further discuss possible BSM interpretations of our results. We conclude in Section V.

II. FULL ANGULAR DISTRIBUTION AND ITS BSM REACH
The four-fold differential distribution ofB → D * (→ Dπ) ν decays constitutes a powerful tool for assessing SM as well as BSM physics. It is given as Assuming a purely P-wave Dπ final state, this distribution is fully described by twelve angular observables J ( ) i and their respective angular coefficient functions f i . The dependence of the functions f i on the three angles cos θ , cos θ D and χ, given in Eq. (A1) in Appendix A, is lepton-flavour universal and completely determined by conservation of angular momentum.
The angular observables J ( ) i depend on the momentum transfer q 2 , or equivalently the hadronic recoil w. Their calculation involves the lepton-flavour-universal hadronic form factors, as well as the short-distance coefficients of the low-energy effective theory. The latter encode short-distance SM effects (which are again lepton-flavour universal) as well as potential BSM effects (which are in general non-universal). These dependencies are listed in Table I. Additional sources of lepton-flavour non-universality are known kinematic phase-space effects ∼ m / q 2 , which are most pronounced for = τ . Under the assumption that the short-distance behaviour corresponds to the SM expectation, the angular observables J The complete dependence of the angular distribution on BSM contributions in terms of the BSM couplings has been given for the first time in Ref. [29], see also Ref. [30], with previous partial results throughout the literature [31][32][33][34][35][36][37]. We use the conventions/notation provided in Appendix A. The sensitivity to various BSM couplings and lepton-mass effects have been studied in detail [38] based on helicity amplitudes.
Here we would like to address properties that are not mentioned previously, or that are particularly important for our work. An important observation in charged-current semileptonic decays is that to extremely good approximation no CP-conserving scattering phases appear in the J . Here the notation . . . denotes integration over the full range of the dileptoninvariant mass as defined in Eq. (A2).
The experimental determination of the fully differential rate is rather involved. Many analyses therefore present only results for the partially or fully integrated rate, typically CP-averaged. Doing so simplifies the experimental analysis, but the sensitivity to some of the angular observables is lost, which can render the determination of some parameters of interest impossible. The two recent Belle analyses for instance [2,3] provide binned CP-averaged measurements of the four single-differential distributions where Γ ( ) denotes the CP-averaged decay rate. In particular, in Ref. [3] the authors separate the data by the light lepton flavours = e and = µ. The three CP-averaged single-angular distributions depend on only five out of the twelve angular observables defined in Eq. (1). Out of these five observables, the CP-averaged S ( ) 9 vanishes independently of the BSM scenario, as discussed above Eq. (2), and is thus not relevant for our analysis. This leaves the D * -longitudinal polarization fraction F differ by lepton-mass suppressed terms, only. In a generic BSM scenario, the two observables can further differ due to contributions from pseudoscalar and tensor operators, see Table I. For more details see Appendix A.
The presentation of the data in terms of single-differential distributions implies that all angular observables are integrated over the full q 2 range. By binning in q 2 , the data will provide more information about the BSM couplings through the q 2 shape of the angular observables. In particular, the binned angular observables yield access to more and independent bilinear combinations of the BSM couplings than the q 2 -integrated ones do. Hence, binning the angular observables will constitute a powerful tool to discriminate between BSM scenarios, as discussed in more detail below.
The CP asymmetries of the single-differential rates Eqs. (3)-(5) vanish independently of the BSM scenario. This can be used to validate the experimental analyses. The CP asymmetry of the χ-dependent rate in Eq. (6) is fully described by the angular observable A ( ) 9 . A measurement of this CP asymmetry could be accomplished with existing datasets and would provide important information about potential CP-violating BSM effects.
1 Such CP-conserving phases are strongly suppressed inB → D * ν and can arise, e.g. at the level of dimension eight in the low-energy EFT or due to radiative QED corrections.
The dependence of angular observables on combinations of Wilson coefficients. An entry of denotes the presence of this combination. An entry of m n denotes the presence of this term, but with kinematic lepton-mass suppression ∝ (m / q 2 ) n (n = 1, 2). The "num(·)" indicates that only the dependence of the numerator of this observable is given. The V a i have been introduced in Ref. [29].

A. Parametrization of BSM Physics
BSM physics inB → D * ν decays has been investigated, usually based on the assumption of three light lefthanded neutrino flavours below the electroweak scale. The corresponding most general low-energy effective theory at dimension six [39] can be written as [40] Here the operators are constructed out of SM fermion fields and read They account for lepton-flavour violation (LFV) by = .
The observables inB → D * ν depend only on four combinations of Wilson coefficients: together with C T , whereas the combination C S = C S R + C S L enters only inB → D ν. Since the neutrino flavour is not detectable, it must be summed over in every observable. We determine the minimal number of parameters and their ranges necessary to parametrize these BSM coefficients for different cases. Starting from the lepton-flavour conserving case, Eq. (7) contains five complex parameters C i ≡ C i per charged-lepton species . In the context of BSM analyses ofB → D * ν, the fact that matrix elements of the scalarcb currents vanish implies that one can maximally determine four linear combinations out of the five Wilson coefficients. These four complex coefficients can be parametrized by seven real parameters, since an overall phase is unobservable, i.e. all observables are invariant under a joint phase rotation C i → exp(iφ )C i . For instance, one of the complex coefficients can be chosen real and positive, which leaves four real and three imaginary parts or four absolute values and three relative phases as free parameters. The Lagrangian Eq. (7) is conveniently normalized to G F V cb to ensure that in the SM C V L = 1 at tree-level. In general, these factors cannot be separated from the BSM Wilson coefficients since only their products enter observables. Hence, they do not count as additional parameters. The set of seven real parameters is therefore the maximal information we can hope to extract fromB → D * ν decays for a given without LFV.
All CP-averaged observables depend on these seven parameters through the combinations These combinations, however, are invariant under the discrete symmetry transformation Im can therefore still be determined from CP-averaged observables, albeit only up to an overall sign. One is free to choose one of these signs freely in the fit, since the second solution can always be obtained by inverting the signs of the imaginary parts.
In the limit of a massless lepton, the two classes of Wilson coefficients C A,V and C P,T decouple in the observables, since their interference is m suppressed, as shown in Table I. As we will see below, this applies only to electrons, since in precision analyses the muon mass cannot be neglected anymore. This implies a separate symmetry for each class, C V,A → exp(iφ )C V,A and C P,T → exp(iϕ )C P,T . Therefore another phase cannot be determined from anȳ B → D * −ν observable in this limit. In fact, it can be eliminated altogether from the parametrization, leaving maximally six parameters to be determined fromB → D * ν for massless charged leptons. In this case also the discrete symmetry for the imaginary parts holds separately for each class, allowing to choose another sign freely. Hence, the most general parametrization of CP-averagedB → D * eν data within the weak effective theory and when neglecting LFV requires only six parameters, four of which can be chosen positive. Taking into account lepton-mass effects requires a seventh parameter, and only two of these parameters can be chosen positive.
Note that in the counting above we have assumed the couplings for the different lepton flavours to be completely independent, allowing in particular for independent phase rotations. Such an assumption does not hold in all BSM scenarios; in particular it does not hold in the Standard Model Effective Field Theory (SMEFT) at mass dimension six. In the matching of Eq. (7) to the SMEFT, the coefficient C V R is lepton-flavour universal, a property inherited from the SM gauge group [41,42]. This universality couples the different sectors and consequently the phase rotations cannot be performed independently anymore. This gives rise to an additional measurable phase in this scenario, and therefore necessitates a new corresponding parameter. For instance, for the common and convenient choice of a real and positive C V L , the coefficients C V R cannot be trivially identified with each other. Instead they fulfill and similarly for = τ . The relative phase between the two Wilson coefficients C e V L and C µ V L appears explicitly, while it can be absorbed everywhere else. This implies that although two real parameters are removed (one of the complex C V R coefficients), one is added (the relative phase), and hence the overall number of parameters is reduced only by one.
Generalizing the above observations in the presence of lepton-flavour-violating interactions, = , is straightforward insofar as the contributions with different neutrino flavours do not interfere. Hence all expressions in Eqs. (10)-(11) remain valid with the generalizations The symmetry considerations hold for each neutrino flavour separately. Naively the number of parameters simply triples compared to the lepton-flavour-conserving case above. The situation is nevertheless significantly different from the lepton-flavour conserving case, for which the number of parameters is smaller than the number of combinations resolved   TABLE II. Amount of BSM physics information that can be extracted in different scenarios, see also text. Here S and A denote the measurement of the CP average and the CP asymmetry of the respective differential rate. The first and second number corresponds to the number of parameters that can be extracted without and with mass suppression, respectively.
of Wilson coefficients appearing in the description of the decay. This implies (non-linear) relations between these combinations in the lepton-flavour conserving case, for instance, With the generalizations in Eq. (13), the number of BSM parameters is larger than the number of combinations of Wilson coefficients. Hence, the latter determine the maximal number of parameters (parameter combinations) that can be extracted. This implies that relations such as Eq. (14) do not hold anymore in the presence of lepton-flavour violation and can be used instead to test for LFV in charged-current decays without the need to identify the neutrino flavour experimentally.
In the presence of light right-handed neutrinos, similar considerations as for the LFV case apply, since also here more BSM parameters are introduced and the corresponding contributions do not interfere. The generalization to light right-handed neutrinos is therefore analogous to Eq. (13) and similar comments apply for the determination of the corresponding parameters.
We now turn to the determination of the discussed parameters from the differential distributions. Each fully q 2 -integrated angular observable provides one linear combination of the combinations of Wilson coefficients only, as indicated in Table I. The measurement of their q 2 dependence allows further to separate different BSM contributions to the same angular observable, if their q 2 dependence [38] is different. For instance, the q 2 -differential rate allows to determine all four absolute values of the BSM parameters. The question is what amount of experimental information is necessary to determine the maximal amount of parameters in a given scenario. Table II shows the situation in a few scenarios for different sets of experimental measurements.
A few general comments are in order: • It is necessary to consider the CP-conjugated modes separately if the sign ambiguity for the imaginary parts is to be resolved. Since the lepton charge tags the B meson flavour, this is not difficult to achieve experimentally.
• The interference between the two classes of BSM coefficients C A,V and C P,T is always lepton-mass suppressed, see Table I. Hence its determination requires high statistical power, as expected from the upcoming datasets at Belle 2 and the LHC experiments.
• While for = µ there is some sensitivity to additional combinations of Wilson coefficients, these combinations are still strongly suppressed. The corresponding parameters will therefore be determined comparatively poorly. Generally the best chance to determine them is to consider rather low values of q 2 , given the suppression by powers of m / q 2 , both for the angular observables and the q 2 -differential rate. Probing different bins in q 2 can also improve the sensitivity to other BSM coefficients. Tensor interactions for instance can be probed particularly well at low q 2 in d Γ T /dq 2 ∼ 3S 1s − S 2s , since the SM contributions vanish for q 2 → 0, while the tensor contributions remain finite [40], see also Ref. [43].
Considering some of the scenarios in more detail, we make the following observations: • It is impossible to determine the full set of physical BSM parameters for m → 0 (e.g. = e) from the CPaveraged single-differential rates alone, even disregarding ambiguities in the signs of imaginary parts. The reason is that in this case only A ( ) FB is sensitive to the relative phases between the coefficients. Since there are two observable relative phases (one between C A and C V , one between C P and C T ), they cannot both be determined from this single observable.
• Assuming the flavour-conserving case, the extraction of all seven parameters is possible for finite m from the CP-averaged single-differential rates, modulo discrete ambiguities. However, one relative phase can only be obtained from lepton-mass-suppressed contributions, even though in more sophisticated measurements it would be accessible without lepton-mass suppression.
• Beyond the lepton-flavour-conserving case, it becomes clearer how much more information is contained in a fully q 2 -differential measurement. Strictly speaking, such a measurement is not necessary when assuming leptonflavour conservation. However, also in this case there are additional crosschecks possible and additional formfactor information can be extracted together with the BSM parameters.
These observations apply fully to the recent Belle measurements [3].
Considering the determination of the full BSM information in the lepton-flavour-conserving case as an important intermediate goal, there are several ways this could be achieved with existing data, extending the experimental analyses only slightly: 6c entering this observable. Given that the cos θ -differential distribution (4) has been measured in 10 bins in Refs. [2,3], but contains only two angular observables, this seems feasible by reducing the number of cos θ bins and providing the observables in two or three q 2 bins instead. This would give access to all BSM parameters, leaving only two signs of imaginary parts undetermined.
2. Measuring dΓ/dχ separately for the two lepton charges. This would give access to A ( ) 9 , and thereby to Im(C A C * V ). This in turn would determine also Re(C A C * V ) up to a sign, and thereby allow to access Re(C P C * T ) from A ( ) FB up to a two-fold ambiguity. Each of the solutions would still have a two-fold sign ambiguity for the corresponding imaginary part. Together with the first option, this measurement would resolve the sign ambiguity in Im(C A C * V ), leaving only the one in Im(C P C * T ) (should this parameter combination be found to be different from zero).

III. AVAILABLE EXPERIMENTAL DATA
SemileptonicB → D ( * ) ν decays have been of key interest for many years, see Ref. [44] for a list of analyses over the last ∼ 25 years. However, until recently, almost all experimental analyses have been tied to a specific form-factor parametrization, specifically the so-called CLN parametrization [45]. This parametrization involves assumptions that are not adequate anymore for precision analyses. Applying instead the underlying formalism of a heavy-quark expansion more consistently [25] and extending it to include 1/m 2 c contributions [11,12], allows for a consistent description of the available experimental data and form factor results. However, since experimental analyses presented in most cases only parametrization-specific results, a model-independent reanalysis under different theory assumptions of the underlying experimental data is impossible. 2 Unfortunately, this problem persists in the most recent BaBar analysis [46], which includes a second form factor parametrization, but still does not allow for an independent analysis of the data. Furthermore, in many cases electron and muon data have been averaged without presenting separate results, rendering them of limited use for the analysis of LFU. A notable exception among these past studies is the 2010 untagged Belle analysis [47], which presented lepton-specific differential rates separately for longitudinal and transverse D * polarizations, but lacked the necessary correlations.
More recently, the Belle analysis ofB → D ν [1] presented for the first time lepton-specific differential rates including their full correlations, which made possible precision studies with arbitrary form-factor parametrizations for the first time, initiating an intense ongoing discussion regarding the best way to analyze these and similar data. Similar comments apply to the preliminaryB → D * ν data with hadronic tag in Ref. [2], which were however again lepton-flavour averaged and are presently reanalyzed, and the 2018 untagged analysis [3], superseding the results of Ref. [47], which we discuss in detail in the following.

A. Belle's 2018 untagged analysis
The dataset for the angular distribution provided by Belle [3] is the first analysis that separates the electron mode from the muon mode in both the bin contents and the statistical covariance matrix, and also the systematic covariance matrix can be reconstructed for both lepton species separately. 3 Unfortunately the correlations between the electron and muon modes are not given explicitly. Yet Belle has used these data for a high-precision LFU test that compares the branching fractions to electrons and muons integrated over the entire phase space. They found the ratio to be in agreement with lepton flavour universality, R e/µ = 1.01 ± 0.01(stat.) ± 0.03(sys.). In our study we aim to extend the study of LFU to the angular observables using the same Belle data. For this purpose we need to construct a combined correlation matrix for the full dataset, including correlations between electrons and muons.
Before going into these details, however, we comment on an issue present in the statistical correlation matrix. Belle provides the number of (background-subtracted) events before unfolding in bins of the four aforementioned single-differential distributions. These are the distribution in i.e. the same signal candidates have been histogrammed in four different ways in the four single-differential distributions. These relations imply that for both electrons and muons only 37 of the measured bins are independent, since the content of 3 bins can be calculated as the total yield minus the yields of the other 9 bins of the corresponding distributions. This in turn implies that the corresponding statistical correlation matrices have to be singular; each of the 40 × 40 matrices should exhibit three vanishing eigenvalues. This is, however, not the case: the determinant of both matrices is rather large and all eigenvalues of both statistical correlation matrices are O(1). It remains unclear why the statistical correlation matrices do not reflect the linear dependence of the 3 bins, which should by construction be a result of the description of 10 bins per single-differential distribution used by the Belle collaboration. Note that the issue of the linearly dependent bins affects the determination of V cb from these data: 4 if the sum over each set of 10 bins is identical, no information is added to the determination of the total rate by having the four binnings. However, if the correlations are such that these sums become effectively independent, the total rate is more precisely determined by considering all four binnings than by considering only a single one, leading to an underestimation of the uncertainty of the total rate (and hence V cb ). The effect is not large with the given data, but it is non-vanishing: the determination of the total rate is a couple of per mil better than from each individual distribution. It is important to note that this small numerical impact is not an indication that a correct extraction of the statistical correlation matrices will lead to small corrections in the analysis. Since there is an unknown problem in the extraction of the statistical correlation matrices, there is no way of knowing what the effect of its resolution will be. Given this numerical smallness within our analysis, however, we will work below with 40 × 40 matrices. In LF-specific fits with a 37 × 37 matrix the result varies very slightly, depending on the choice of the three discarded bins, and any specific choice would be arbitrary. We have checked that our numerical results below remain essentially unaffected. The issue with the statistical correlation matrices must be kept in mind when interpreting any results obtained from the data from Ref. [3].
In the remainder of this section, we describe the construction of a combined electron-muon 80×80 covariance matrix based on Ref. [3], with only one mild additional assumption. According to Ref. [3], the only source of systematic uncertainties that is different for = e and = µ is the procedure of lepton identification (Lepton ID). Given the statistical independence of electron and muon samples, this implies the following form for the total covariance matrix: .
The lepton-ID systematic uncertainties are provided individually for both lepton flavours, but also for the "LFcombined" which enter the systematic correlation matrix given explicitly in the article. We therefore have Together with the information that the Lepton-ID systematic uncertainties are 100% positively correlated throughout all bins [48], Cov sys,lep-ID-comb where σ i are systematic uncertainties of the ith bin taken from tables XI-XIV [3], the "LF combination" can thus be undone for the systematic correlations. We compute the LF-specific systematic covariances (Cov sys, ) from the "LF-combined" ones (Cov sys,LF-comb ) of [3] consequently as LF-specific analyses can be performed with these LF-specific 40 × 40 statistical and systematic correlation matrices at hand. The only assumption we make for the construction of the full 80 × 80 covariance matrix is that the lepton-ID uncertainties for electrons and muons are uncorrelated: This is plausible (as confirmed by Belle collaboration members [49]), given they concern different detector parts, but not fully guaranteed. We consider this assumption to be at a comparable level to the assertion in Ref. [3] that the lepton ID constitutes the only non-universal contribution to the systematic uncertainty. Note that this is an approximation that might not hold well enough to analyze LFU. In that case the systematic uncertainty given in [3] for the LFU ratio R e/µ would be underestimated, as would be our e − µ covariance. However, we perform below an extremely conservative check that our observation of a tension with the SM does not depend on this assumption.

IV. FITS TOB → D * (e, µ)ν DATA AND DISCUSSION
We analyze the data from the Belle analysis [3] in detail, based on the general analysis in Section II and the covariance matrix derived in Section III.

A. Angular analysis and comparison with the SM
In the first step our fit is completely model-independent: we use the observation made in Section II that the three single-differential CP-averaged angular distributions can be fully described by only four angular observables retaining all information. Further, we parametrize the 10 bins of the w-distribution again in full generality as the total decay rate and nine independent bins of the normalized w-differential rate: Here w max = 1.5 to comply with the choice in [3], which excludes a tiny part of the low-q 2 phase space. From this parametrization we calculate the bin contents N obs i, by integrating over the relevant angle intervals where necessary, and folding these predictions with the corresponding response matrices and efficiencies provided by the Belle collaboration for each lepton flavour separately, as described in [3]. We thus arrive at a description of the 40 bins per lepton flavour given in [3] in terms of only 10 + 4 = 14 observables in Eqs. (22)- (23). We emphasize that our fit parameters appear up to the common normalization factor linearly, assuring a unique minimum and no distortion of their distributions from a multivariate gaussian shape.
The conversion of number of events to decay rate involves the following numerical input: N BB = (772 ± 11) · 10 6 , B(D * + → D 0 π + ) = (67.7 ± 0.5) %, with N BB from [50], f 00 and from the B 0 lifetime from [44] (see also the discussion on f 00 in [51]), and the latest values of the branching fractions from [52]. Note that the value for B(D 0 → K − π + ) was updated w.r.t the value used in Ref. [3,53], which slightly impacts the determination of V cb . The corresponding uncertainties cancel in all ratios and hence affect only the total decay rate, for which they are included in the systematic uncertainties provided by the Belle collaboration [3]. We further introduce the averages and differences of LF-specific observables for later convenience in the study of LFU violation where X ( ) stands for any of the considered observables. We perform two types of fits with our approach to test the stability of the results: 1. a simple χ 2 fit, 2. a fit using pseudo-Monte Carlo techniques, following the procedure described in Ref. [53], both using the full 80 × 80 covariance matrix. In addition, we have applied a correction to the systematic correlations for d'Agostini bias [54], following the procedure described in Ref. [40]. We find the results of the two fits to be virtually identical. In Ref. [53] the authors observe that in their joint fit of V cb and form-factor parameters the two procedures produce markedly different results. They conclude that this difference is due to the large correlations present in the experimental data and that the usage of the pseudo-Monte Carlo technique is mandatory for phenomenological analyses. Our findings are in stark contrast to this conclusion and indicate instead that large correlations alone are not the cause for this difference. Our interpretation is that the observed difference is related to the form-factor parameters entering non-linearly in the fit of Ref. [53], while our angular observables and x ( ) i parameters enter bilinearly. It is worth emphasizing in this context that • our fit results are extremely well described by Gaussian distributions; and that • the correlations between our fit parameters are much smaller than the ones present in the 80 × 80 matrix describing the bin contents.
As a consequence, we do not distinguish between the results from the two fit procedures in the following. The fit results for our parameters as defined in Eqs. (22)-(23) are listed in Table III and shown in Figure 1. At the best-fit point we find χ 2 = 48.9 for 80 − 2 × 14 = 52 degrees of freedom (dof), indicating a good fit. 5 This suggests that the assumption of a pure P -wave Dπ final state is well justified.
In both Table III and Figure 1 we juxtapose the fit results with their corresponding SM predictions. The latter depend on theB → D * form factors. Here, we use the form-factor determinations from Refs. [11,12]. All SM predictions are obtained using the EOS software [55]. The EOS code for the computation ofB → D * ν observables has been independently checked. We also predict the ratio R e/µ in the SM and obtain: 0.1161 ± 0.0020 0.1164 ± 0.0020 0.1184 ± 0.0027 0.1208 ± 0.0029   [11], together with their values obtained from our fit to the Belle data [3]. For the prediction of the total rate, we leave the value of |V cb | unspecified.
which does not include possible structure-dependent QED corrections.
We emphasize that the predictions [11,12] of theB → D * form factors are conservative in that the corresponding uncertainties include higher-order contributions in the heavy-quark expansion. Furthermore they rely only on theory input from various sources, i.e. no experimental input has been used for their determination. Note that |V cb | cancels in the predictions for the normalized bins x ( ) i as well as in the angular observables; only the total decay rate is proportional to |V cb | 2 . Moreover, theoretical uncertainties of the normalization of the leading hadronic B → D * form factor cancel in the normalized observables. However, we do not include structure-dependent electromagnetic corrections to the angular distribution. Given the expected precision of the experimental data and the impact of muon-mass effects as discussed in this work, we expect that including these effects will become mandatory soon.
Before comparing to our numerical SM predictions, we test the qualitative expectation of approximate leptonflavour-universality, i.e. ∆X ≡ 0, which does not require a specific form-factor parametrization. We find that most  Table III. quantities are well compatible with lepton-flavour universality, with the exception of A ( ) FB , which shows a deviation from exact universality at the 3.9σ level, to be discussed below. This strong violation is not readily observable in the 80 bins provided by the Belle collaboration, but becomes obvious in the results of the fit of the non-redundant set of angular observables to the underlying angular distributions, see Figure 1. The violation is further hidden by the fact that the lepton-flavour averaged data are compatible with the SM expectation.
In the comparison of our SM predictions with the fit results we find: 1. As expected, the precision for most normalized quantities is better than that for the total rate, typically at the level of a few percent. This is true for both the SM predictions and the fit results.
2. Overall we find very good agreement of the fit results with our SM predictions, as can be seen in Figure 1, especially when considering the individual lepton species. There are a few smaller differences of roughly 1σ, only A (µ) FB shows a tension above the 2σ level. 3. The differences of the lepton-flavour-specific observables, ∆X, are predicted with very small absolute uncertainties due to the muon-mass suppression. Their predictions have similar relative uncertainties as the ones for the angular observables themselves. Their absolute values are also very small, with ∆X/ΣX = O( ) in most cases. This can be readily understood, since these observables receive only corrections of O(m 2 µ ) in the SM. The only sizable central values are those of ∆A FB and ∆ F L , which are slightly enhanced by numerical factors. Most importantly, we find that the latter shifts are still small, but already comparable to the corresponding experimental uncertainties, see Table III. This implies that the muon mass cannot be neglected anymore in precision analyses. 4. The pattern of the shifts in ∆x i is surprising at first sight, since |∆x i |/Σx i is almost constant over the whole range of w (or q 2 ), while we argued that the effect scales like (m µ / q 2 ) 2 . This can be understood from the FB )/2 (bottom right). Contours correspond to 68%, 95% 99.7%, and 99.99% probability, respectively. The ragged outermost contours are artefacts due to lack of samples so far in the periphery of the best-fit point. The SM predictions based on the form factors obtained in Refs. [11,12] are shown as blue crosses. The SM uncertainties are found to be much smaller than 10 −2 and hence negligible, with the exception of the last panel. The uncertainty in the ∆AFB-ΣAFB plane is shown as a (highly degenerate) ellipse at the 68% probability level.
normalization to the total rate. The shifts in ∆(∆Γ i )/Σ(∆Γ i ) scale as expected, from significantly less than 1 at w ∼ 1 (high q 2 ) to −5 in the bin with maximal w (lowest q 2 ). The shift in the total rate is about −3 , so normalizing yields shifts in ∆x i /Σx i to the range [−3 , 3 ].
5. For LFU observables we still find mostly excellent agreement between experiment and our SM predictions.
However, the aforementioned difference between the measurements of A FB becomes more significant, given the smaller absolute uncertainty in ∆A FB and the fact that the relatively large SM prediction carries the opposite sign from the one determined in the fit. This quantity differs therefore by approximately 4σ from its SM prediction. In Figure 2 we show the pair-wise 2-dimensional best-fit regions of ∆A FB with ∆F L , ∆ F L , ∆S 3 , and ΣA FB . The discrepancy with the predictions reaches the 4 σ level, compatible with similar levels seen for the 1-dimensional discrepancy for ∆A FB in Table III. These observations mildly depend on the covariance matrix used in the fit. As stated above, we consider our construction of the 80 × 80 covariance matrix reliable to the extent that the data in Ref. [3] are correct. To make absolutely sure that our assumption regarding the e − µ correlations is not the reason for the observed discrepancy, we adopt the following alternative procedure: We determine the A (e) FB and A (µ) FB with separate statistical and systematic uncertainties in two separate fits to the lepton-specific data, using the corresponding 40 × 40 covariance matrices for which we do not have to rely on our assumption. We then minimize the discrepancy with respect to our (strongly correlated) SM predictions by assuming a diagonal 2 × 2 statistical correlation matrix for A (e) FB and A (µ) FB , but allowing for an arbitrary correlation ρ ∈ [−1, 1] between the systematic uncertainties.
We find that the minimal tension with respect to the SM for the combined A FB occurs for maximal anticorrelation (ρ = −1), which is not a realistic value. The correlation determined in the fit to the 80 × 80 covariance matrix is actually very small. Adopting nevertheless this most conservative choice of ρ = −1 still leads to a tension of 3.6σ. We emphasize again that this result is not changed by employing the pseudo-Monte Carlo approach with Cholesky decomposition for the fit as done in [53], nor by the d'Agostini effect (the plots shown in Figure 1 include the corresponding shifts). Therefore, even adopting this maximally conservative procedure, our results amount to evidence for µ-e-non-universality beyond the SM in charged-current b → c ν transitions. However, our finding hinges on the approximate validity of the data and specifically the correlation matrices given in Ref. [3].
We also perform a full SM fit to the 2 × 14 observables in Table III, including their correlations given in ancillary files attached to the arXiv preprint of this article. Starting from a fit of form-factor parameters from theory input, only [11,12], the inclusion of the experimental information on these 28 observables increases the minimal χ 2 by 68.5, while only |V cb | is introduced as an additional parameter in the fit. This does indicate a bad fit, with a p value of 2 × 10 −5 , or a tension at the 4.3σ level. The discrepancy remains driven by a ∼ 4σ tension in A FB . The experimental and theoretical correlations with other observables play a minor role, see also Figure 2. We note in passing that S-P wave interference cannot affect the numerator of A FB , and can only decrease the magnitude of A FB by a coherent contribution to the denominator [56].
We refrain from providing the value of |V cb | from either lepton mode, which would be compatible with the values obtained from the lepton-flavour average in Refs. [3,53] and continue to exhibit a substantial tension with respect to the inclusive determination |V cb | B→Xc = (42.00 ± 0.64) · 10 −3 [57]. Given the incompatibility of the data with the SM prediction, we consider it misleading to use it to extract |V cb |.
To summarize, we find in our fits a discrepancy between data and the SM of ∼ 4σ. This result is stable with respect to the treatment of the d'Agostini bias, the type of fit we are performing (χ 2 fit vs. pseudo-Monte Carlo techniques), and importantly also the precise treatment of the correlations of the systematic uncertainties between electrons and muons. We reiterate, however, the concerns discussed in Section III A: the statistical correlation matrices given in [3] do not seem to be correct, since they are not singular as they should be, given the performed redistribution of events to obtain the different single-differential rates. Bearing this caveat in mind, we still investigate in the following the possibility that the observed discrepancy is an effect of BSM physics.

B. Possible BSM interpretation
We consider the possibility that the observed discrepancy is due to BSM physics. To that aim, we investigate the Lagrangian Eq. (7) in the limit of lepton-flavour conservation = . From our general analysis in Section II we have seen that A ( ) FB is special in that it is determined to O(m µ ) only by interference contributions ∼ Re(C i C * j ), and is the only observable in the single-differential distributions to which interference terms contribute in the massless limit. Given the size of the observed effect, ∆A FB /ΣA FB ∼ O(10%), a muon-mass suppressed contribution does not seem likely as its source. This suggests that in order to accommodate ∆A FB , the first options to consider are BSM contributions to right-handed vector operators, to both pseudoscalar and tensor operators, or to left-handed vector operators. Notably, the first two options correspond to second-order BSM contributions: for the interference between pseudoscalar and tensor operators this is obvious. For the right-handed vector operator the interference term For the BSM contributions to the left-handed vector operator only, the discussion is more involved. The interference terms Re( wherein the 1 stands for the SM contribution. However, if C V L were the only BSM contribution it would cancel in all normalized observables. This is not true for the contribution from right-handed vector operators, the real parts of which, however, enter linearly in |C A,V | 2 . Given the compatibility of all other observables with the SM, this scenario would therefore require the main contribution to either have a sizable imaginary part, or specific cancellations with other BSM contributions, in order not to upset this agreement.
Taking here the Belle data at face value, we perform fits analogous to the ones described above, including different sets of BSM contributions. Note that we keep our description qualitative, since numerical statements are likely to be upset by an eventual correction of the Belle dataset [3]. For the same reason we do not perform a combined fit with other b → c ν modes, which would of course be required to confirm the viability of potential BSM scenarios that resolve the tension in this dataset.
We find that either contributions from right-handed vector operators, or from both pseudoscalar and tensor operators are necessary to accommodate the observed ∆A FB , confirming our previous considerations. In order to describe the dataset well with real BSM Wilson coefficients, only, LFUV contributions to both the rightand left-handed vector operators are required.
The three minimal BSM scenarios that fit the present BelleB → D * ν data [3] can be summarized as follows: 1. C V R = 0: This scenario does require a sizable imaginary part (as anticipated above) and LFU violation. The latter fact is interesting, since it might point to BSM physics beyond SMEFT [41]. The imaginary part of are sizable. We strongly encourage an experimental measurement of these observables.
2. C V R = 0 and C V L = 1: This scenario can obviously describe the data well, given that in principle already C V R = 0 suffices. However, to our surprise it is also compatible with an LFU BSM contribution to C V R , which is required in a SMEFT scenario. Enforcing this flavour-universal C V R , i.e., C e V R = C µ V R , results in significantly different absolute values and a sizable phase difference between C e V L and C µ V L . Sizable A ( ) 8,9 are also likely in this case, although not strictly necessary. It is possible to have all BSM coefficients real, and hence A ( ) 8,9 = 0, but only with a phase between the left-handed coefficients φ L = π. This corresponds to a BSM contribution of about twice the SM one and is therefore highly fine-tuned.
3. C P = 0 and C T = 0: Also this scenario provides a good fit to the data, both for complex and real-valued Wilson coefficients. The fact that both C T and C P are required means that this scenario can be tested by measuring S While we do not attempt to include additional datasets as explained above and therefore cannot quantitatively test specific BSM scenarios, we still observe a few general features of a possible BSM explanation in the context of the B anomalies, especially in b → cτν transitions: 1. While moderate shifts in one or several Wilson coefficients are required to fit the present Belle data [3], the total rates are not strongly affected. Hence it is not possible to explain the discrepancy in R(D * ) with these shifts, i.e. additional new contributions in b → cτν coefficients are required to explain the deviations of LFU ratios involving = τ from SM predictions.
2. If the observations made here based on the Belle data persist after future updates or corrections, they would have strong implications for scenarios addressing the B anomalies: Scenarios that only shift C V L would be ruled out, which are currently favoured as simultaneous explanations of the b → cτν and b → s + − anomalies.
3. Based on the picture provided by the observables, one would naively expect a hierarchy ∆ µ > ∆ e . In light of the more substantial deviations in b → cτν, this could be extended to ∆ τ > ∆ µ , which is quite natural in scenarios addressing both B anomalies. However, we find that ∆ µ > ∆ e is far from being established in our fits at the level of the Wilson coefficients.
There will therefore be far-reaching consequences for the field of particle physics, should this discrepancy be confirmed.

V. CONCLUSIONS
In this article we pave the way for precision analyses of b → c ν processes beyond the assumption of e − µ universality. This endeavour is important for the determination of V cb in the Standard Model, a complete understanding of the weak effective theory (WET) beyond the SM (BSM), and also to gain new insights into the persistent b → cτν anomaly. We focus on the angular distribution inB → D * ν with light leptons = e, µ and highlight strategies for improved experimental analyses.
We discuss the complete set of CP-even and CP-odd angular observables that arise from the fully-differential angular distribution ofB → D * (→ Dπ) ν. In particular we discuss the influence of a finite mass of the charged lepton on these observables in and beyond the SM. We consider in detail the specific case of four single-differential CP-averaged rates that have been experimentally analyzed in Refs. [2,3]. We find that only four flavour-specific angular observables per lepton flavour are sufficient to describe the three single-differential CP-averaged angular distributions including arbitrary BSM contributions: the lepton-forward-backward asymmetry A 3 . However, we find that it is principally not possible to extract the full information on the BSM contributions to the WET Wilson coefficients for the electron mode when using only the single-differential CP-averaged rates. For the muon mode, part of that information enters only muonmass suppressed, although it can be extracted without that suppression when considering a different presentation of the data. We further emphasize the existence non-linear relations between the Wilson coefficients that allow to test for lepton-flavour violation (LFV) and right-handed neutrinos.
The most precise lepton-flavour-specific analysis to date [3] presents the three CP-averaged single-differential angular distributions for electron and muon flavours separately. Since they depend on only four angular observables per lepton flavour, the chosen number of kinematic bins is much larger than necessary. We show that this redundant presentation accidentally hides tensions between SM predictions and data. We encounter an issue with the statistical correlation matrices that can only be clarified by the Belle collaboration. We describe our approach to the combination of statistical and systematic correlations for the electron and muon datasets and extract the non-redundant lepton-flavour specific CP-averaged angular observables from the Belle data. For most of the angular observables we find good agreement with our up-to-date SM predictions, except for A FB in which the correlations of form factors lead to a strong cancellation of uncertainties, reaching the 4 σ level. We perform numerous checks that this tension is not a result of our specific treatment of the data. In particular, even when allowing for arbitrary systematic correlations between the electron and muon data, we find that this tension does not drop below 3.6 σ. This constitutes evidence for lepton-flavour universality violation.
We continue by investigating in a qualitative manner the most economic BSM scenarios that can potentially explain the observed tensions. To this end, we assume lepton-flavour conservation, but allow for lepton-flavour non-universality in the WET description. We find that either right-handed vector operators or both pseudoscalar and tensor operators are necessary to accomodate the observed tension. If only right-handed vector operators are present, large imaginary parts in the Wilson coefficients are necessary. As a consequence, the CP-odd angular observables A ( ) 8,9 would be expected to deviate sizably from their SM predictions. A solution with purely real-valued Wilson coefficients appears only as a highly fine-tuned solution in a combined scenario with left-and right-handed vector operators. For the combination of pseudoscalar and tensor operators, we do not find the necessity of sizable imaginary parts. In this case, S are expected to show significant differences relative to their SM predictions. None of these three scenarios coincides with the preferred explanation of the b → cτν anomaly.
Given the far-reaching consequences of our findings, we consider it essential that the Belle collaboration reviews -and if need-be corrects -the published dataset from Ref. [3]. Without such scrutiny, we cannot determine the impact of the identified issues on results inferred from the data. We strongly recommend that future measurements separate between the two light-lepton flavours in a transparent way. This is also important for the comparison with existing and upcoming LHCb analyses, which focus on the muon mode, only.