1 Introduction

Exclusive \({\bar{B}}\rightarrow D^{(*)} \ell {{\bar{\nu }}}\) decays have become precision probes of the semileptonic parton-level transitions \(b\rightarrow c \ell {{\bar{\nu }}}\). As such, they provide an excellent means for the determination of the corresponding Cabibbo–Kobayashi–Maskawa (CKM) matrix element \(|V_{cb}|\) of the Standard Model (SM). The combination of good experimental and theoretical control also renders them sensitive probes of beyond-the-SM (BSM) physics that potentially modifies both the normalization and the angular distribution of these modes. In the SM, the lepton-flavour universal (LFU) nature of the underlying \(W^\pm \)-boson exchange allows for precision predictions of LFU ratios that are almost free of hadronic uncertainties. Measurements of the three different lepton modes \(\ell =e, \mu , \tau \) then allow one to test SM paradigms such as CKM unitarity and LFU. Improved LFU tests are especially important in light of the recent indications for LFU violation in the so-called B anomalies, concerning \(b\rightarrow c\tau {{\bar{\nu }}}\) and \(b\rightarrow s\ell ^+\ell ^-\) (\(\ell =e,\mu \)) transitions. Further motivation for precision analyses of \({\bar{B}}\rightarrow D^*\ell {{\bar{\nu }}}\) decays is provided by the persistent \(V_{cb}\) puzzle, i.e. a tension between the inclusive and exclusive determinations of this CKM element; see Refs. [1,2,3,4,5,6,7,8,9,10] for recent discussions.

This work is triggered by three recent developments:

  • Availability of experimental data: Starting with the 2015 analysis of \({{\bar{B}}}\rightarrow D\ell {{\bar{\nu }}}\) decays by Belle [11], experimental collaborations made their data on \(b\rightarrow c\ell {{\bar{\nu }}}\) transitions available in a model-independent way [11,12,13,14,15], thereby making phenomenological analyses possible that vary the form-factor parametrizations and BSM scenarios. In particular, a recent Belle analysis [13] presents for the first time four single-differential distributions of \({{\bar{B}}}\rightarrow D^*\ell {{\bar{\nu }}}\) decays for both \(\ell =e,\mu \) including their full correlation matrices.

  • Improved form-factor determinations: There has been significant progress in the theoretical determination of hadronic \({{\bar{B}}}\rightarrow D^{(*)}\) form factors, both from lattice QCD computations [16,17,18,19] and from light-cone sum rules [20]. These determinations allow for precise predictions of the complete set of form factors in \({{\bar{B}}}\rightarrow D^*\ell {{\bar{\nu }}}\) in the whole phase space [7, 8]. These predictions use the heavy-quark expansion and account for contributions up to and including \({\mathcal {O}}\left( 1/m_c^2\right) \) based on Ref. [21]. They are a prerequisite for a general BSM analysis of these modes.

  • Impending progress in experimental and theoretical precision: Both the experimental and the theoretical precision are expected to improve significantly: the ongoing Belle II and LHCb upgrade experiments are bound to deliver \({\bar{B}}\rightarrow D^{(*)}\ell {{\bar{\nu }}}\) results based on multiples of the current datasets [22,23,24], and updated lattice QCD results for several \({\bar{B}}\rightarrow D^*\) form factors beyond zero recoil are upcoming [25,26,27], see also the discussions in Refs. [28, 29]. This renders the discussion of presently negligible effects important for the full phenomenological exploitation of the upcoming experimental and theoretical results.

The discussions resulting from the first two items significantly improve our understanding of these modes, and their sensitivity to the adopted form-factor parametrization. Some recent phenomenological analyses have also shown that the \(V_{cb}\) puzzle can be significantly reduced, albeit not yet fully resolved [1,2,3,4,5,6,7,8,9,10]. We pose the following questions that affect existing and future angular analyses of \({{\bar{B}}}\rightarrow D^*\ell {{\bar{\nu }}}\) data:

  1. What is the amount of LFU violation in the SM induced by the muon mass? Is the muon mass still negligible given the achieved experimental and theoretical precision?

  2. What amount of information can be extracted from the available single-differential distributions in comparison to a fully-differential angular analysis of \({{\bar{B}}}\rightarrow D^*\ell {{\bar{\nu }}}\)? Is it possible to increase the sensitivity to BSM physics with available data by modifying the analysis strategy?

  3. What are the limits on BSM physics from existing \({{\bar{B}}}\rightarrow D^*\ell {{\bar{\nu }}}\) data? Which effective operators could resolve a potential tension with the SM and what would be their implications on so far unmeasured observables?

In order to answer these questions, we proceed as follows: We begin by describing the general properties of the \({\bar{B}}\rightarrow D^* \ell {{\bar{\nu }}}\) angular distribution and the BSM physics reach of the angular observables arising from this distribution in Sect. 2. In Sect. 3 we prepare a full angular analysis on the basis of the Belle data published in Ref. [13]. In doing so, we identify two obstacles to the full use of these data. In Sect. 4 we carry out a fit of the full angular distribution to the Belle data, and discuss the compatibility with SM predictions. In light of an observed tension, we further discuss possible BSM interpretations of our results. We conclude in Sect. 5.

2 Full angular distribution and its BSM reach

The four-fold differential distribution of \({{\bar{B}}}\rightarrow D^{*} (\rightarrow D\pi ) \ell {{\bar{\nu }}}\) decays constitutes a powerful tool for assessing SM as well as BSM physics. It is given as

$$\begin{aligned}&\frac{d^4 \Gamma ^{(\ell )}}{d q^2\, d\!\cos {\theta _\ell }\, d\!\cos {\theta _{D}}\, d\chi }\nonumber \\&\quad = \frac{3}{8 \pi } \sum _i J_i^{(\ell )}(q^2) f_i(\cos {\theta _\ell },\, \cos {\theta _{D}},\, \chi ). \end{aligned}$$
(1)

Assuming a purely P-wave \(D\pi \) final state, this distribution is fully described by 12 angular observables \(J_i^{(\ell )}\) and their respective angular coefficient functions \(f_i\). The dependence of the functions \(f_i\) on the three angles \(\cos {\theta _\ell }\), \(\cos {\theta _{D}}\) and \(\chi \), given in Eq. (A1) in Appendix A, is lepton-flavour universal and completely determined by conservation of angular momentum.

The angular observables \(J_i^{(\ell )}\) depend on the momentum transfer \(q^2\), or equivalently the hadronic recoil w. Their calculation involves the lepton-flavour-universal hadronic form factors, as well as the short-distance coefficients of the low-energy effective theory. The latter encode short-distance SM effects (which are again lepton-flavour universal) as well as potential BSM effects (which are in general non-universal). These dependencies are listed in Table 1. Additional sources of lepton-flavour non-universality are known kinematic phase-space effects \(\sim m_\ell /\sqrt{q^2}\), which are most pronounced for \(\ell = \tau \). Under the assumption that the short-distance behaviour corresponds to the SM expectation, the angular observables \(J_i^{(\ell )}(q^2)\) can be used to extract information on the hadronic form factors up to an overall normalization. When lifting this assumption in BSM scenarios, disentangling the BSM short-distance coefficients from the form factors requires additional information, making theory input for the \(q^2\)-dependence of the form factors and especially their ratios indispensable. Below we discuss the necessary amount of experimental information on the angular observables \(J_i^{(\ell )}\) for a reliable determination of BSM contributions. Details on the definitions of the angular observables are given in Appendix A.
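The equivalence between \(q^2\) and the hadronic recoil w mentioned above can be made concrete with a short numerical sketch. It uses the standard kinematic relation \(w = (m_B^2 + m_{D^*}^2 - q^2)/(2 m_B m_{D^*})\); the meson masses below are rounded, illustrative values in GeV and are an assumption of this sketch, not inputs quoted in the text.

```python
# Illustrative conversion between q^2 and the hadronic recoil w.
# The masses are rounded placeholder values in GeV (assumption of this sketch).
M_B = 5.280      # approximate B-meson mass
M_DSTAR = 2.010  # approximate D*-meson mass


def w_from_q2(q2, m_b=M_B, m_dstar=M_DSTAR):
    """Hadronic recoil w for a given momentum transfer q^2 (GeV^2)."""
    return (m_b**2 + m_dstar**2 - q2) / (2.0 * m_b * m_dstar)


def q2_from_w(w, m_b=M_B, m_dstar=M_DSTAR):
    """Inverse relation: momentum transfer q^2 (GeV^2) for a given recoil w."""
    return m_b**2 + m_dstar**2 - 2.0 * w * m_b * m_dstar


# Zero recoil (w = 1) corresponds to the maximal q^2 = (m_B - m_D*)^2.
Q2_MAX = (M_B - M_DSTAR) ** 2
# Maximal recoil, evaluated here at q^2 = 0, i.e. in the massless-lepton limit.
W_MAX = w_from_q2(0.0)

print("w at q2_max:", w_from_q2(Q2_MAX))
print("w at q2 = 0:", W_MAX)
```

For a massive lepton the lower end of the \(q^2\) range is \(m_\ell ^2\) rather than zero, which shifts the maximal recoil slightly.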

The complete dependence of the angular distribution on BSM contributions in terms of the BSM couplings has been given for the first time in Ref. [30], see also Refs. [31, 32], with further partial results throughout the literature [33,34,35,36,37,38,39,40]. We use the conventions/notation provided in Appendix A. The sensitivity to various BSM couplings and lepton-mass effects has been studied in detail in Ref. [41] based on helicity amplitudes.

Here we would like to address properties that have not been mentioned previously, or that are particularly important for our work. An important observation in charged-current semileptonic decays is that to an extremely good approximation no CP-conserving scattering phases appear in the \(J_i^{(\ell )}\) (see Footnote 1). This simplifies their properties under CP conjugation, rendering them simply even (for \(i\in \{1c,1s,2c,2s,3,4,5,6c,6s\}\)) or odd (for \(i\in \{7,8,9\}\)). As a consequence, the numerators in the combinations

$$\begin{aligned} \left\langle S_i^{(\ell )}\right\rangle \equiv \frac{\left\langle J_i^{(\ell )}\right\rangle + \left\langle {\bar{J}}_i^{(\ell )}\right\rangle }{\Gamma ^{(\ell )} + {\bar{\Gamma }}^{(\ell )}},\quad \left\langle A_i^{(\ell )}\right\rangle \equiv \frac{\left\langle J_i^{(\ell )}\right\rangle - \left\langle {\bar{J}}_i^{(\ell )}\right\rangle }{\Gamma ^{(\ell )} + {\bar{\Gamma }}^{(\ell )}}, \end{aligned}$$
(2)

either vanish or are given by \(2\langle J_i^{(\ell )}\rangle \). Here the notation \(\left\langle \ldots \right\rangle \) denotes integration over the full range of the dilepton-invariant mass as defined in Eq. (A2).

The experimental determination of the fully differential rate is rather involved. Many analyses therefore present only results for the partially or fully integrated rate, typically CP-averaged. Doing so simplifies the experimental analysis, but the sensitivity to some of the angular observables is lost, which can render the determination of some parameters of interest impossible. For instance, the two recent Belle analyses [12, 13] provide binned CP-averaged measurements of the four single-differential distributions

$$\begin{aligned} \frac{d {\widehat{\Gamma }}^{(\ell )}}{dw}&\equiv \frac{1}{2} \frac{d(\Gamma ^{(\ell )} + {\bar{\Gamma }}^{(\ell )})}{dw}, \end{aligned}$$
(3)
$$\begin{aligned} \frac{1}{{\widehat{\Gamma }}^{(\ell )}} \frac{d {\widehat{\Gamma }}^{(\ell )}}{d\!\cos {\theta _\ell }}&= \frac{1}{2} + \left\langle A_\text {FB}^{(\ell )}\right\rangle \cos {\theta _\ell }\nonumber \\&\qquad + \frac{1}{4} \left( 1 - 3 \left\langle \widetilde{F}^{(\ell )}_L\right\rangle \right) \frac{3 \cos ^2 {\theta _\ell }- 1}{2}, \end{aligned}$$
(4)
$$\begin{aligned} \frac{1}{{\widehat{\Gamma }}^{(\ell )}} \frac{d {\widehat{\Gamma }}^{(\ell )}}{d\!\cos {\theta _{D}}}&= \frac{3}{4} \left( 1 - \left\langle F_L^{(\ell )}\right\rangle \right) \sin ^2\!{\theta _{D}}+ \frac{3}{2} \left\langle F_L^{(\ell )}\right\rangle \cos ^2\!{\theta _{D}}, \end{aligned}$$
(5)
$$\begin{aligned} \frac{1}{{\widehat{\Gamma }}^{(\ell )}} \frac{d {\widehat{\Gamma }}^{(\ell )}}{d\chi }&= \frac{1}{2\pi } + \frac{2}{3\pi } \left\langle S_3^{(\ell )}\right\rangle \cos 2\chi + \frac{2}{3\pi } \left\langle S_9^{(\ell )}\right\rangle \sin 2\chi , \end{aligned}$$
(6)

where \({\widehat{\Gamma }}^{(\ell )}\) denotes the CP-averaged decay rate. The three CP-averaged single-angular distributions depend on only five out of the 12 angular observables defined in Eq. (1). Out of these five observables, the CP-averaged \(\left\langle S_9^{(\ell )}\right\rangle \) vanishes independently of the BSM scenario, as discussed above Eq. (2), and is thus not relevant for our analysis. This leaves the \(D^*\)-longitudinal polarization fraction \(\left\langle F^{(\ell )}_L\right\rangle \), the lepton forward-backward asymmetry \(\left\langle A_\text {FB}^{(\ell )}\right\rangle \), and two further angular observables \(\left\langle \widetilde{F}^{(\ell )}_L\right\rangle \) and \(\left\langle S_3^{(\ell )}\right\rangle \) as independent observables in the distributions. Within the SM, \(\left\langle F^{(\ell )}_L\right\rangle \) and \(\left\langle \widetilde{F}^{(\ell )}_L\right\rangle \) differ only by lepton-mass-suppressed terms. In a generic BSM scenario, the two observables can further differ due to contributions from pseudoscalar and tensor operators, see Table 1. For more details see Appendix A.
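As a cross-check of the normalized single-angular distributions in Eqs. (4)–(6), the following minimal sketch evaluates them numerically. It verifies that each integrates to one and that the forward-backward asymmetry of the \(\cos \theta _\ell \) distribution reproduces \(\left\langle A_\text {FB}^{(\ell )}\right\rangle \). The function names and the numerical values of the observables are arbitrary placeholders chosen for illustration; they are not measured or predicted values.

```python
import math


def dgamma_dcos_theta_l(c, a_fb, fl_tilde):
    """Normalized cos(theta_l) distribution, Eq. (4)."""
    return 0.5 + a_fb * c + 0.25 * (1.0 - 3.0 * fl_tilde) * (3.0 * c * c - 1.0) / 2.0


def dgamma_dcos_theta_d(c, f_l):
    """Normalized cos(theta_D) distribution, Eq. (5)."""
    sin2 = 1.0 - c * c
    return 0.75 * (1.0 - f_l) * sin2 + 1.5 * f_l * c * c


def dgamma_dchi(chi, s3, s9=0.0):
    """Normalized chi distribution, Eq. (6); s9 vanishes in the CP average."""
    pref = 2.0 / (3.0 * math.pi)
    return 1.0 / (2.0 * math.pi) + pref * s3 * math.cos(2.0 * chi) + pref * s9 * math.sin(2.0 * chi)


def integrate(f, a, b, n=2000):
    """Composite midpoint rule on [a, b] with n sub-intervals."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))


# Placeholder values for the integrated observables (illustration only).
A_FB, FL_TILDE, F_L, S3 = 0.2, 0.5, 0.5, -0.1

norm_l = integrate(lambda c: dgamma_dcos_theta_l(c, A_FB, FL_TILDE), -1.0, 1.0)
norm_d = integrate(lambda c: dgamma_dcos_theta_d(c, F_L), -1.0, 1.0)
norm_chi = integrate(lambda x: dgamma_dchi(x, S3), 0.0, 2.0 * math.pi)

# Forward minus backward fraction of the cos(theta_l) distribution.
forward = integrate(lambda c: dgamma_dcos_theta_l(c, A_FB, FL_TILDE), 0.0, 1.0)
backward = integrate(lambda c: dgamma_dcos_theta_l(c, A_FB, FL_TILDE), -1.0, 0.0)
fb_asymmetry = forward - backward

print(norm_l, norm_d, norm_chi, fb_asymmetry)
```

The term proportional to \(3\cos ^2\theta _\ell - 1\) integrates to zero over each hemisphere separately, which is why only \(\left\langle A_\text {FB}^{(\ell )}\right\rangle \) survives in the forward-backward difference.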

The presentation of the data in terms of single-differential distributions implies that all angular observables are integrated over the full \(q^2\) range. By binning in \(q^2\), the data will provide more information about the BSM couplings through the \(q^2\) shape of the angular observables. In particular, the binned angular observables yield access to more and independent bilinear combinations of the BSM couplings than the \(q^2\)-integrated ones do. Hence, binning the angular observables will constitute a powerful tool to discriminate between BSM scenarios, as discussed in more detail below.

The CP asymmetries of the single-differential rates in Eqs. (3)–(5) vanish independently of the BSM scenario. This can be used to validate the experimental analyses. The CP asymmetry of the \(\chi \)-dependent rate in Eq. (6) is fully described by the angular observable \(A_9^{(\ell )}\). A measurement of this CP asymmetry could be accomplished with existing datasets and would provide important information about potential CP-violating BSM effects.

2.1 Parametrization of BSM physics

BSM physics in \({\bar{B}}\rightarrow D^* \ell {{\bar{\nu }}}\) decays has been investigated, usually based on the assumption of three light left-handed neutrino flavours below the electroweak scale. The corresponding most general low-energy effective theory at dimension six [42] can be written as [43]

$$\begin{aligned} {\mathcal {L}}(b\rightarrow c\ell {{\bar{\nu }}}) = \frac{4 G_F}{\sqrt{2}}V_{cb} \; \sum _i \sum _{\ell '} C_i^{\ell \ell ^\prime } {\mathcal {O}}_i^{\ell \ell ^\prime } + \text {h.c.} \end{aligned}$$
(7)

Here the operators are constructed out of SM fermion fields and read

$$\begin{aligned} \begin{aligned} {\mathcal {O}}_{V_L}^{\ell \ell ^\prime }&=({\bar{c}}\gamma ^\mu P_L b)({\bar{\ell }}\gamma _\mu P_L \nu _{\ell ^\prime }), \quad&{\mathcal {O}}_{S_L}^{\ell \ell ^\prime }&= ({\bar{c}} P_L b)({\bar{\ell }} P_L \nu _{\ell ^\prime }), \\ {\mathcal {O}}_T^{\ell \ell ^\prime }&= ({\bar{c}}\sigma ^{\mu \nu } P_L b)({\bar{\ell }}\sigma _{\mu \nu } P_L \nu _{\ell ^\prime }), \\ {\mathcal {O}}_{V_R}^{\ell \ell ^\prime }&= ({\bar{c}}\gamma ^\mu P_R b)({\bar{\ell }}\gamma _\mu P_L \nu _{\ell ^\prime }),&{\mathcal {O}}_{S_R}^{\ell \ell ^\prime }&= ({\bar{c}} P_R b)({\bar{\ell }} P_L \nu _{\ell ^\prime }).\nonumber \end{aligned}\\ \end{aligned}$$
(8)

They account for lepton-flavour violation (LFV) by \(\ell \ne \ell '\).

The observables in \({\bar{B}}\rightarrow D^* \ell {{\bar{\nu }}}\) depend only on four combinations of Wilson coefficients:

$$\begin{aligned} C_V^{\ell \ell ^\prime }&= C_{V_R}^{\ell \ell ^\prime } + C_{V_L}^{\ell \ell ^\prime } ,&C_A^{\ell \ell ^\prime }&= C_{V_R}^{\ell \ell ^\prime } - C_{V_L}^{\ell \ell ^\prime } ,\nonumber \\ C_P^{\ell \ell ^\prime }&= C_{S_R}^{\ell \ell ^\prime } - C_{S_L}^{\ell \ell ^\prime } ,&\end{aligned}$$
(9)

together with \(C_T^{\ell \ell ^\prime }\), whereas the combination \(C_S^{\ell \ell ^\prime } = C_{S_R}^{\ell \ell ^\prime } + C_{S_L}^{\ell \ell ^\prime }\) enters only in \({\bar{B}}\rightarrow D \ell {{\bar{\nu }}}\). Since the neutrino flavour \(\ell '\) is not detectable, it must be summed over in every observable.
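The change of basis in Eq. (9) can be sketched in a few lines. The helper names below are hypothetical and chosen only for this illustration; the SM point uses \(C_{V_L} = 1\) with all other coefficients set to zero.

```python
def d_star_combinations(c_vl, c_vr, c_sl, c_sr, c_t):
    """Combinations of Wilson coefficients entering B -> D* l nu, Eq. (9)."""
    return {
        "C_V": c_vr + c_vl,
        "C_A": c_vr - c_vl,
        "C_P": c_sr - c_sl,
        "C_T": c_t,
    }


def scalar_combination(c_sl, c_sr):
    """Combination C_S, which enters only B -> D l nu."""
    return c_sr + c_sl


# SM point: only C_VL is non-zero.
sm = d_star_combinations(c_vl=1.0, c_vr=0.0, c_sl=0.0, c_sr=0.0, c_t=0.0)

# Shifting C_SL and C_SR by the same (placeholder) amount changes C_S but not
# C_P, so such a shift is invisible in B -> D* l nu observables.
shift = 0.3 + 0.1j
shifted = d_star_combinations(1.0, 0.0, shift, shift, 0.0)

print(sm)
print(shifted["C_P"], scalar_combination(shift, shift))
```

This makes explicit why only four of the five coefficients per lepton flavour can be constrained by this mode.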

Table 1 The dependence of angular observables on combinations of Wilson coefficients. An entry of \(\checkmark \) denotes the presence of this combination. An entry of \(m^n\) denotes the presence of this term, but with kinematic lepton-mass suppression \(\propto (m_\ell /\sqrt{q^2})^n\) (\(n=1,2\)). The “num(\(\cdot \))” indicates that only the dependence of the numerator of this observable is given. The \(V_i^a\) have been introduced in Ref. [30]

We determine the minimal number of parameters and their ranges necessary to parametrize these BSM coefficients for different cases. Starting from the lepton-flavour conserving case, Eq. (7) contains five complex parameters \(C_i^\ell \equiv C_i^{\ell \ell }\) per charged-lepton species \(\ell \). In the context of BSM analyses of \({\bar{B}} \rightarrow D^*\ell {{\bar{\nu }}}\), the fact that matrix elements of the scalar \({\bar{c}}b\) currents vanish implies that one can maximally determine four linear combinations out of the five Wilson coefficients. These four complex coefficients can be parametrized by seven real parameters, since an overall phase is unobservable, i.e. all observables are invariant under a joint phase rotation \(C_i^\ell \rightarrow \exp (i\phi ^\ell )C_i^\ell \). For instance, one of the complex coefficients can be chosen real and positive, which leaves four real and three imaginary parts or four absolute values and three relative phases as free parameters. The Lagrangian Eq. (7) is conveniently normalized to \(G_F\, V_{cb}\) to ensure that in the SM \(C_{V_L} = 1\) at tree-level. In general, these factors cannot be separated from the BSM Wilson coefficients since only their products enter observables. Hence, they do not count as additional parameters. The set of seven real parameters is therefore the maximal information we can hope to extract from \({{\bar{B}}}\rightarrow D^*\ell {{\bar{\nu }}}\) decays for a given \(\ell \) without LFV.

All CP-averaged observables depend on these seven parameters through the combinations

$$\begin{aligned} \begin{aligned} |C_i^\ell |^2&={{\,\mathrm{Re}\,}}^2(C_i^\ell ) + {{\,\mathrm{Im}\,}}^2(C_i^\ell ),\\ {{\,\mathrm{Re}\,}}(C_i^\ell C_j^{\ell *})&= {{\,\mathrm{Re}\,}}(C_i^\ell ){{\,\mathrm{Re}\,}}(C_j^\ell ) + {{\,\mathrm{Im}\,}}(C_i^\ell ){{\,\mathrm{Im}\,}}(C_j^\ell ). \end{aligned} \end{aligned}$$
(10)

These combinations, however, are invariant under the discrete symmetry transformation \({{\,\mathrm{Im}\,}}(C_i^\ell ) \rightarrow -{{\,\mathrm{Im}\,}}(C_i^\ell ) \;\forall \; i\). The combinations

$$\begin{aligned} {{\,\mathrm{Im}\,}}(C_i^\ell C_j^{\ell *}) = {{\,\mathrm{Im}\,}}(C_i^\ell ){{\,\mathrm{Re}\,}}(C_j^\ell ) - {{\,\mathrm{Re}\,}}(C_i^\ell ){{\,\mathrm{Im}\,}}(C_j^\ell ), \end{aligned}$$
(11)

can therefore still be determined from CP-averaged observables, albeit only up to an overall sign. One of these signs can be chosen freely in the fit, since the second solution can always be obtained by inverting the signs of all imaginary parts.
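The symmetry properties of the bilinear combinations discussed above can be checked numerically. The sketch below uses arbitrary placeholder values for two coefficients and verifies that a joint phase rotation leaves all bilinears unchanged, while flipping the signs of all imaginary parts (complex conjugation) leaves \(|C_i^\ell |^2\) and \({{\,\mathrm{Re}\,}}(C_i^\ell C_j^{\ell *})\) unchanged and flips the sign of \({{\,\mathrm{Im}\,}}(C_i^\ell C_j^{\ell *})\).

```python
import cmath


def bilinears(ci, cj):
    """Return (|C_i|^2, |C_j|^2, Re(C_i C_j^*), Im(C_i C_j^*))."""
    prod = ci * cj.conjugate()
    return (abs(ci) ** 2, abs(cj) ** 2, prod.real, prod.imag)


# Placeholder coefficients (illustration only, not fit results).
c_i = 1.0 + 0.0j
c_j = 0.10 + 0.05j

reference = bilinears(c_i, c_j)

# Joint phase rotation C -> exp(i phi) C: no bilinear changes.
phase = cmath.exp(0.7j)
rotated = bilinears(phase * c_i, phase * c_j)

# Im(C) -> -Im(C) for all coefficients: only Im(C_i C_j^*) changes sign.
conjugated = bilinears(c_i.conjugate(), c_j.conjugate())

print(reference)
print(rotated)
print(conjugated)
```

Resolving the remaining sign therefore requires observables that are odd under CP, i.e. a separate treatment of the CP-conjugated modes.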

In the limit of a massless lepton, the two classes of Wilson coefficients \(C_{A,V}^\ell \) and \(C_{P,T}^\ell \) decouple in the observables, since their interference is \(m_\ell \) suppressed, as shown in Table 1. As we will see below, this applies only to electrons, since in precision analyses the muon mass can no longer be neglected. The decoupling implies a separate symmetry for each class, \(C_{V,A}^\ell \rightarrow \exp (i\phi ^\ell )C_{V,A}^\ell \) and \(C_{P,T}^\ell \rightarrow \exp (i\varphi ^{\ell })C_{P,T}^\ell \). Therefore another phase cannot be determined from any \({\bar{B}}\rightarrow D^*\ell ^-{{\bar{\nu }}}\) observable in this limit. In fact, it can be eliminated altogether from the parametrization, leaving maximally six parameters to be determined from \({{\bar{B}}}\rightarrow D^*\ell {{\bar{\nu }}}\) for massless charged leptons. In this case the discrete symmetry for the imaginary parts also holds separately for each class, allowing another sign to be chosen freely. Hence, the most general parametrization of CP-averaged \({{\bar{B}}}\rightarrow D^* e{{\bar{\nu }}}\) data within the weak effective theory and when neglecting LFV requires only six parameters, four of which can be chosen positive. Taking into account lepton-mass effects requires a seventh parameter, and only two of these parameters can be chosen positive.

Note that in the counting above we have assumed the couplings for the different lepton flavours to be completely independent, allowing in particular for independent phase rotations. Such an assumption does not hold in all BSM scenarios; in particular it does not hold in the Standard Model Effective Field Theory (SMEFT) at mass dimension six. In the matching of Eq. (7) to the SMEFT, the coefficient \(C_{V_R}^{\ell \ell '}\) is lepton-flavour universal, a property inherited from the SM gauge group [44, 45]. This universality couples the different sectors and consequently the phase rotations cannot be performed independently anymore. This gives rise to an additional measurable phase in this scenario, and therefore necessitates a new corresponding parameter. For instance, for the common and convenient choice of a real and positive \(C_{V_L}^\ell \), the coefficients \(C_{V_R}^{\ell }\) cannot be trivially identified with each other. Instead they fulfill

$$\begin{aligned} C_{V_R}^e&= \exp (i\phi _L) \, C_{V_R}^\mu ,&\phi _L&= \phi _{V_L}^e-\phi _{V_L}^\mu , \end{aligned}$$
(12)

and similarly for \(\ell =\tau \). The relative phase between the two Wilson coefficients \(C_{V_L}^e\) and \(C_{V_L}^\mu \) appears explicitly, while it can be absorbed everywhere else. This implies that although two real parameters are removed (one of the complex \(C_{V_R}^\ell \) coefficients), one is added (the relative phase), and hence the overall number of parameters is reduced only by one.

Generalizing the above observations to the presence of lepton-flavour-violating interactions, \(\ell \ne \ell '\), is straightforward insofar as the contributions with different neutrino flavours do not interfere. Hence all expressions in Eqs. (10)–(11) remain valid with the generalizations

$$\begin{aligned} \left| C_i^\ell \right| ^2&\rightarrow \sum _{\ell '} \left| C_i^{\ell \ell '}\right| ^2 ,\nonumber \\ {{\,\mathrm{Re}\,}}\left( C_i^\ell C_j^{\ell *}\right)&\rightarrow \sum _{\ell '} {{\,\mathrm{Re}\,}}\left( C_i^{\ell \ell '} C_j^{\ell \ell '*}\right) ,\nonumber \\ {{\,\mathrm{Im}\,}}\left( C_i^\ell C_j^{\ell *}\right)&\rightarrow \sum _{\ell '} {{\,\mathrm{Im}\,}}\left( C_i^{\ell \ell '} C_j^{\ell \ell '*}\right) . \end{aligned}$$
(13)

The symmetry considerations hold for each neutrino flavour separately. Naively the number of parameters simply triples compared to the lepton-flavour-conserving case above. The situation is nevertheless significantly different from the lepton-flavour conserving case, for which the number of parameters is smaller than the number of combinations of Wilson coefficients appearing in the description of the decay. This implies (non-linear) relations between these combinations in the lepton-flavour conserving case, for instance,

$$\begin{aligned} {{\,\mathrm{Im}\,}}^2(C_i^\ell C_j^{\ell *}) = |C_i^\ell |^2 |C_j^{\ell }|^2-{{\,\mathrm{Re}\,}}^2(C_i^\ell C_j^{\ell *}). \end{aligned}$$
(14)

With the generalizations in Eq. (13), the number of BSM parameters is larger than the number of combinations of Wilson coefficients. Hence, the latter determine the maximal number of parameters (parameter combinations) that can be extracted. This implies that relations such as Eq. (14) do not hold anymore in the presence of lepton-flavour violation and can be used instead to test for LFV in charged-current decays without the need to identify the neutrino flavour experimentally.
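The statement that Eq. (14) fails once the bilinears are summed over the neutrino flavour, as in Eq. (13), can be illustrated numerically. The sketch below defines a residual that vanishes identically for a single neutrino flavour and becomes positive, by the Cauchy–Schwarz inequality, when several non-aligned flavour contributions are summed. All coefficient values are arbitrary placeholders, and the function names are hypothetical.

```python
def summed_bilinears(ci_list, cj_list):
    """Bilinears summed over the neutrino flavour l', following Eq. (13)."""
    abs_i = sum(abs(c) ** 2 for c in ci_list)
    abs_j = sum(abs(c) ** 2 for c in cj_list)
    prod = sum(a * b.conjugate() for a, b in zip(ci_list, cj_list))
    return abs_i, abs_j, prod.real, prod.imag


def eq14_residual(ci_list, cj_list):
    """|C_i|^2 |C_j|^2 - Re^2 - Im^2; zero if Eq. (14) holds."""
    abs_i, abs_j, re, im = summed_bilinears(ci_list, cj_list)
    return abs_i * abs_j - re**2 - im**2


# Lepton-flavour-conserving case: a single neutrino flavour contributes.
residual_lfc = eq14_residual([1.0 + 0.0j], [0.10 + 0.05j])

# LFV case: a second neutrino flavour with placeholder couplings is added.
residual_lfv = eq14_residual([1.0 + 0.0j, 0.2 + 0.1j],
                             [0.10 + 0.05j, -0.3 + 0.2j])

print(residual_lfc, residual_lfv)
```

In an actual analysis the residual would be built from fitted combinations with uncertainties, so a significant positive value, rather than any non-zero value, would be the relevant signal.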

In the presence of light right-handed neutrinos, similar considerations as for the LFV case apply, since also here more BSM parameters are introduced and the corresponding contributions do not interfere. The generalization to light right-handed neutrinos is therefore analogous to Eq. (13) and similar comments apply for the determination of the corresponding parameters.

2.2 BSM reach in \({\bar{B}}\rightarrow D^*\ell {{\bar{\nu }}}\)

We now turn to the determination of the discussed parameters from the differential distributions. Each fully \(q^2\)-integrated angular observable provides only one linear combination of the combinations of Wilson coefficients, as indicated in Table 1. The measurement of their \(q^2\) dependence further allows one to separate different BSM contributions to the same angular observable, if their \(q^2\) dependence [41] is different. For instance, the \(q^2\)-differential rate allows one to determine all four absolute values of the BSM parameters. The question is how much experimental information is necessary to determine the maximal number of parameters in a given scenario. Table 2 shows the situation in a few scenarios for different sets of experimental measurements.

Table 2 Amount of BSM physics information that can be extracted in different scenarios, see also text. Here S and A denote the measurement of the CP average and the CP asymmetry of the respective differential rate. The first and second numbers correspond to the number of parameters that can be extracted without and with mass suppression, respectively

A few general comments are in order:

  • It is necessary to consider the CP-conjugated modes separately if the sign ambiguity for the imaginary parts is to be resolved. Since the lepton charge tags the B meson flavour, this is not difficult to achieve experimentally.

  • The interference between the two classes of BSM coefficients \(C_{A,V}^\ell \) and \(C_{P,T}^\ell \) is always lepton-mass suppressed, see Table 1. Hence its determination requires high statistical power, similar to the one required for the identification of muon-mass suppressed contributions in the SM, discussed below. Such statistical power is expected from the upcoming datasets at Belle II and the LHC experiments.

  • While for \(\ell =\mu \) there is some sensitivity to additional combinations of Wilson coefficients, these combinations are still strongly suppressed. The corresponding parameters will therefore be determined comparatively poorly. Generally the best chance to determine them is to consider rather low values of \(q^2\), given the suppression by powers of \(m_\ell /\sqrt{q^2}\), both for the angular observables and the \(q^2\)-differential rate. Probing different bins in \(q^2\) can also improve the sensitivity to other BSM coefficients. Tensor interactions for instance can be probed particularly well at low \(q^2\) in \(d{{\widehat{\Gamma }}}_T/dq^2\sim 3S_{1s}-S_{2s}\), since the SM contributions vanish for \(q^2\rightarrow 0\), while the tensor contributions remain finite [43], see also Ref. [46].

Considering some of the scenarios in more detail, we make the following observations:

  • It is impossible to determine the full set of physical BSM parameters for \(m_\ell \rightarrow 0\) (e.g. \(\ell =e\)) from the CP-averaged single-differential rates alone, even disregarding ambiguities in the signs of imaginary parts. The reason is that in this case only \(\left\langle A_\text {FB}^{(\ell )}\right\rangle \) is sensitive to the relative phases between the coefficients. Since there are two observable relative phases (one between \(C_A^\ell \) and \(C_V^\ell \), one between \(C_P^\ell \) and \(C_T^\ell \)), they cannot both be determined from this single observable.

  • Assuming the flavour-conserving case, the extraction of all seven parameters is possible for finite \(m_\ell \) from the CP-averaged single-differential rates, modulo discrete ambiguities. However, one relative phase can only be obtained from lepton-mass-suppressed contributions, even though in more sophisticated measurements it would be accessible without lepton-mass suppression.

  • Beyond the lepton-flavour-conserving case, it becomes clearer how much more information is contained in a fully \(q^2\)-differential measurement. Strictly speaking, such a measurement is not necessary when assuming lepton-flavour conservation. However, in this case too, additional cross-checks are possible and additional form-factor information can be extracted together with the BSM parameters.

These observations apply fully to the recent Belle measurements [13].

Considering the determination of the full BSM information in the lepton-flavour-conserving case as an important intermediate goal, there are several ways this could be achieved with existing data, extending the experimental analyses only slightly:

  1. Measuring \(A_\text {FB}^{(\ell )}\) in at least two \(q^2\) bins. This disentangles \(S_{6s}^{(\ell )}\) from \(S_{6c}^{(\ell )}\) entering this observable. Given that the \(\cos \theta _\ell \)-differential distribution (4) has been measured in 10 bins in Refs. [12, 13], but contains only two angular observables, this seems feasible by reducing the number of \(\cos \theta _\ell \) bins and providing the observables in two or three \(q^2\) bins instead. This would give access to all BSM parameters, leaving only two signs of imaginary parts undetermined.

  2. Measuring \(d\Gamma /d\chi \) separately for the two lepton charges. This would give access to \(A_9^{(\ell )}\), and thereby to \({{\,\mathrm{Im}\,}}(C_A^\ell C_V^{\ell *})\). This in turn would also determine \({{\,\mathrm{Re}\,}}(C_A^\ell C_V^{\ell *})\) up to a sign, and thereby allow access to \({{\,\mathrm{Re}\,}}(C_P^\ell C_T^{\ell *})\) from \(\left\langle A_\text {FB}^{(\ell )}\right\rangle \) up to a two-fold ambiguity. Each of the solutions would still have a two-fold sign ambiguity for the corresponding imaginary part. Together with the first option, this measurement would resolve the sign ambiguity in \({{\,\mathrm{Im}\,}}(C_A^\ell C_V^{\ell *})\), leaving only the one in \({{\,\mathrm{Im}\,}}(C_P^\ell C_T^{\ell *})\) (should this parameter combination be found to be different from zero).

  3. Assessing \(S_5^{(\ell )}\), \(A_7^{(\ell )}\) and/or \(A_8^{(\ell )}\). The measurement of each of these requires a different binning scheme, since these observables do not enter the single-differential rates. The latter two further require tagging by the lepton charge. Of particular interest is \(A_7^{(\ell )}\), since contributions linear in BSM parameters are additionally lepton-mass suppressed, rendering the quadratic BSM contributions potentially dominant. A similar statement holds for \(S_{6c}^{(\ell )}\).

3 Available experimental data

Semileptonic \({\bar{B}}\rightarrow D^{(*)}\ell {{\bar{\nu }}}\) decays have been of key interest for many years, see Ref. [47] for a list of analyses over the last \(\sim 25\) years. However, until recently, almost all experimental analyses have been tied to a specific form-factor parametrization, specifically the so-called CLN parametrization [48]. The latter is based on the combination of a dispersive approach for these modes developed in Refs. [49,50,51,52,53,54,55,56,57,58,59] and the heavy-quark expansion (see Refs. [60, 61] for reviews and references therein) up to \({\mathcal {O}}(1/m_{b,c})\) [62,63,64] and \({\mathcal {O}}(\alpha _s)\) [64]. This parametrization involves assumptions that are no longer adequate for analyses of data at the present and upcoming experimental precision. Applying instead the underlying formalism of a heavy-quark expansion more consistently [5] and extending it to include \(1/m_c^2\) contributions [7, 8, 21] allows for a consistent description of the available experimental data and form-factor results. However, since experimental analyses presented in most cases only parametrization-specific results, a model-independent reanalysis of the underlying experimental data under different theory assumptions is impossible (see Footnote 2). Unfortunately, this problem persists in the most recent BaBar analysis [65], which includes a second form-factor parametrization, but still does not allow for an independent analysis of the data. Furthermore, in many cases electron and muon data have been averaged without presenting separate results, rendering them of limited use for the analysis of LFU. A notable exception among these past studies is the 2010 untagged Belle analysis [66], which presented lepton-specific differential rates separately for longitudinal and transverse \(D^*\) polarizations, but lacked the necessary correlations.

More recently, the Belle analysis of \({\bar{B}}\rightarrow D\ell {{\bar{\nu }}}\) [11] presented for the first time lepton-specific differential rates including their full correlations. This made precision studies with arbitrary form-factor parametrizations possible and initiated an intense ongoing discussion regarding the best way to analyze these and similar data. Similar comments apply to the preliminary \({\bar{B}}\rightarrow D^*\ell {{\bar{\nu }}}\) data with hadronic tag in Ref. [12], which were however again lepton-flavour averaged and are presently being reanalyzed, and to the 2018 untagged analysis [13], superseding the results of Ref. [66], which we discuss in detail in the following.

It is worth pointing out that the background subtraction in all existing \({{\bar{B}}}\rightarrow D^{(*)}\ell {{\bar{\nu }}}\) analyses proceeds under the assumption of the SM. However, some of the background contributions are affected by the same BSM physics as the signal mode, notably \({{\bar{B}}}\rightarrow D^{**}\ell {{\bar{\nu }}}\). Given that these modes constitute less than \(8\%\) of the unsubtracted spectrum [13], we consider it safe to neglect these contributions in this analysis. This point will need to be reconsidered once a BSM contribution is observed with a comparable precision. Efforts to address this issue include the works presented in Refs. [67, 68].

3.1 Belle’s 2018 untagged analysis

The dataset for the angular distribution provided by Belle [13] is the first that separates the electron mode from the muon mode in both the bin contents and the statistical covariance matrix; the systematic covariance matrix can also be reconstructed for both lepton species separately (Footnote 3).

Unfortunately, the correlations between the electron and muon modes are not given explicitly. Yet Belle has used these data for a high-precision LFU test that compares the branching fractions to electrons and muons integrated over the entire phase space. They found the ratio to be in agreement with lepton-flavour universality, \(R_{e/\mu } = 1.01 \pm 0.01\text{(stat.) } \pm 0.03\text{(sys.) }\). Here we aim to extend the test of LFU to the angular observables using the same Belle data. For this purpose we need to construct a combined correlation matrix for the full dataset, including correlations between electrons and muons.

Before going into these details, however, we comment on an issue present in the statistical correlation matrix. Belle provides the number of (background-subtracted) events before unfolding in bins of the four aforementioned single-differential distributions (Footnote 4). These are the distribution in

$$\begin{aligned} w&= \frac{m_B^2 + m_{D^*}^2 - q^2 }{2 m_B m_{D^*}} , \end{aligned}$$
(15)

and the three angular distributions Eqs. (4)–(6). There are 10 equidistant bins for each distribution, resulting in 40 bins per lepton flavour (LF). The events in the 10 bins for each of the four distributions sum up to the same number for each lepton flavour:

$$\begin{aligned} \sum _{i= 1}^{10} N^\text {obs}_{i,\ell }= & {} \sum _{i=11}^{20} N^\text {obs}_{i,\ell } = \sum _{i=21}^{30} N^\text {obs}_{i,\ell } = \sum _{i=31}^{40} N^\text {obs}_{i,\ell }\nonumber \\= & {} \left\{ \begin{array}{cc} 90743.4 &{} \ell = e \\ 89087.0 &{} \ell = \mu \end{array} \right. , \end{aligned}$$
(16)

i.e. the same signal candidates have been histogrammed in four different ways in the four single-differential distributions. These relations imply that for both electrons and muons only 37 of the measured bins are independent, since the content of 3 bins can be calculated as the total yield minus the yields of the other 9 bins of the corresponding distributions. This in turn implies that the corresponding statistical correlation matrices have to be singular; each of the \(40\times 40\) matrices should exhibit three vanishing eigenvalues. This is, however, not the case: the determinant of both matrices is rather large and all eigenvalues of both statistical correlation matrices are \({\mathcal {O}}(1)\). It remains unclear why the statistical correlation matrices do not reflect the linear dependence of the 3 bins, which should by construction be a result of the description of 10 bins per single-differential distribution used by the Belle collaboration. Note that the issue of the linearly dependent bins affects the determination of \(V_{cb}\) from these data (Footnote 5): if the sum over each set of 10 bins is identical, no information is added to the determination of the total rate by having the four binnings. However, if the correlations are such that these sums become effectively independent, the total rate is more precisely determined by considering all four binnings than by considering only a single one, leading to an underestimation of the uncertainty of the total rate (and hence \(V_{cb}\)). The effect is not large with the given data, but it is non-vanishing: the determination of the total rate is a couple of per mil better than from each individual distribution. It is important to note that this small numerical impact is not an indication that a correct extraction of the statistical correlation matrices will lead to small corrections in the analysis.
Since there is an unknown problem in the extraction of the statistical correlation matrices, there is no way of knowing what the effect of its resolution will be. Given this numerical smallness within our analysis, however, we will work below with \(40 \times 40\) matrices. In LF-specific fits with a \(37\times 37\) matrix the result varies very slightly, depending on the choice of the three discarded bins, and any specific choice would be arbitrary. We have checked that our numerical results below remain essentially unaffected. The issue with the statistical correlation matrices must be kept in mind when interpreting any results obtained from the data from Ref. [13].
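The expected singularity can be illustrated with a toy example. The following minimal sketch is not the procedure of Ref. [13]: it assigns hypothetical events at random to one bin in each of the four distributions and assumes Poisson-distributed yields, for which the covariance of two bin contents is estimated by the number of events they share. All function names and toy inputs are illustrative assumptions.

```python
import random

N_DIST, N_BINS = 4, 10  # four single-differential distributions, 10 bins each


def toy_stat_covariance(n_events, seed=1):
    """Toy 40x40 statistical covariance built from one shared event sample.

    Assumption: Poisson yields, so Cov(N_i, N_j) is estimated by the
    number of events that populate both bin i and bin j.
    """
    rng = random.Random(seed)
    dim = N_DIST * N_BINS
    cov = [[0] * dim for _ in range(dim)]
    for _ in range(n_events):
        # each event lands in exactly one bin of every distribution
        hits = [d * N_BINS + rng.randrange(N_BINS) for d in range(N_DIST)]
        for i in hits:
            for j in hits:
                cov[i][j] += 1
    return cov


def null_vector_residuals(cov):
    """Largest |(Cov v)_i| for v = (sum over distribution 0) - (sum over distribution d)."""
    dim = len(cov)
    residuals = []
    for d in range(1, N_DIST):
        v = [0] * dim
        for b in range(N_BINS):
            v[b] = 1
            v[d * N_BINS + b] = -1
        residuals.append(
            max(abs(sum(row[k] * v[k] for k in range(dim))) for row in cov)
        )
    return residuals


toy_cov = toy_stat_covariance(1000)
residuals = null_vector_residuals(toy_cov)
print(residuals)
```

In this toy setup the three residuals vanish exactly, so the matrix has three independent null vectors. This is the behaviour one would expect by construction, and it is the property that the published \(40\times 40\) matrices do not exhibit.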

In the remainder of this section, we describe the construction of a combined electron-muon \(80\times 80\) covariance matrix based on Ref. [13], with only one mild additional assumption. According to Ref. [13], the only source of systematic uncertainties that is different for \(\ell =e\) and \(\ell =\mu \) is the procedure of lepton identification (Lepton ID). Given the statistical independence of electron and muon samples, this implies the following form for the total covariance matrix:

$$\begin{aligned} \text {Cov}^\text {total}_{80\times 80}&= \text {Cov}^\text {stat}_{80\times 80}+\text {Cov}^\text {sys}_{80\times 80} \end{aligned}$$
(17)
$$\begin{aligned}&= \begin{pmatrix} \text {Cov}^{\text {stat},e}_{40\times 40} &{} 0_{40\times 40} \\ 0_{40\times 40} &{} \text {Cov}^{\text {stat},\mu }_{40\times 40} \end{pmatrix}\nonumber \\&\quad + \begin{pmatrix} \text {Cov}^{\text {sys,uni}}_{40\times 40} &{} \text {Cov}^{\text {sys,uni}}_{40\times 40} \\ \text {Cov}^{\text {sys,uni}}_{40\times 40} &{} \text {Cov}^{\text {sys,uni}}_{40\times 40} \end{pmatrix} + \text {Cov}^\text {sys,lep-ID}_{80\times 80}. \end{aligned}$$
(18)

The lepton-ID systematic uncertainties are provided individually for both lepton flavours, and also for the “LF-combined” case; the latter enter the systematic correlation matrix given explicitly in the article. We therefore have

$$\begin{aligned} \text {Cov}^\text {sys,uni}_{40\times 40}&= \text {Cov}^\text {sys,LF-comb}_{40\times 40} - \text {Cov}^\text {sys,lep-ID-comb}_{40\times 40}. \end{aligned}$$
(19)

Together with the information that the lepton-ID systematic uncertainties are 100% positively correlated throughout all bins [69], \(\text {Cov}^{\text {sys,lep-ID-comb}}_{ij} = \sigma _i^{\text {lep-ID-comb}} \sigma _j^{\text {lep-ID-comb}}\), where \(\sigma _i\) are systematic uncertainties of the ith bin taken from tables XI–XIV [13], the “LF combination” can thus be undone for the systematic correlations. We consequently compute the LF-specific systematic covariances (\(\text {Cov}^{\text {sys},\ell }\)) from the “LF-combined” ones (\(\text {Cov}^\text {sys,LF-comb}\)) of [13] as

$$\begin{aligned} \text {Cov}^{\text {sys},\ell }_{ij}&= \text {Cov}^\text {sys,LF-comb}_{ij} - \sigma _i\sigma _j |^{\text {lep-ID-comb}} + \sigma _i\sigma _j |^{\text {lep-ID,}\ell } ,\nonumber \\&\quad i,j = 1,\ldots , 40. \end{aligned}$$
(20)

LF-specific analyses can be performed with these LF-specific \(40 \times 40\) statistical and systematic correlation matrices at hand. The only assumption we make for the construction of the full \(80\times 80\) covariance matrix is that the lepton-ID uncertainties for electrons and muons are uncorrelated:

$$\begin{aligned} \text {Cov}^\text {sys,lep-ID}_{80\times 80} = \begin{pmatrix} \text {Cov}^{\text {sys,lep-ID},e}_{40\times 40} &{}\quad 0_{40\times 40} \\ 0_{40\times 40} &{}\quad \text {Cov}^{\text {sys,lep-ID},\mu }_{40\times 40} \end{pmatrix}. \end{aligned}$$
(21)

This is plausible, given they concern different detector parts, but not fully guaranteed [70]. We consider this assumption to be at a comparable level to the assertion in Ref. [13] that the lepton ID constitutes the only non-universal contribution to the systematic uncertainty. Note that this is an approximation that might not hold well enough to analyze LFU. In that case the systematic uncertainty given in [13] for the LFU ratio \(R_{e/\mu }\) would be underestimated, as would be our \(e-\mu \) covariance (Footnote 6). However, we perform below an extremely conservative check that our observation of a tension with the SM does not depend on this assumption.
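The assembly of the combined covariance matrix in Eqs. (17)–(21) can be sketched as follows. This is a minimal illustration with plain nested lists; the function names and the two-bin toy inputs are hypothetical and stand in for the \(40\times 40\) matrices and lepton-ID uncertainties of Ref. [13].

```python
def outer(sigma):
    """Fully correlated covariance sigma_i * sigma_j."""
    return [[si * sj for sj in sigma] for si in sigma]


def add(a, b, sign=1):
    """Element-wise a + sign * b for two equally sized square matrices."""
    return [[x + sign * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def total_covariance(stat_e, stat_mu, sys_comb, sig_comb, sig_e, sig_mu):
    """Combined electron-muon covariance, cf. Eqs. (17)-(21)."""
    n = len(stat_e)
    # Eq. (19): lepton-universal systematics = LF-combined minus combined lepton ID
    sys_uni = add(sys_comb, outer(sig_comb), sign=-1)
    # diagonal blocks: statistical + universal + flavour-specific lepton ID, cf. Eq. (20)
    block_e = add(add(stat_e, sys_uni), outer(sig_e))
    block_mu = add(add(stat_mu, sys_uni), outer(sig_mu))
    # off-diagonal blocks: universal systematics only, cf. Eqs. (18) and (21)
    top = [block_e[i] + sys_uni[i] for i in range(n)]
    bottom = [sys_uni[i] + block_mu[i] for i in range(n)]
    return top + bottom


# hypothetical two-bin inputs, chosen only to make the arithmetic easy to follow
toy_total = total_covariance(
    stat_e=[[4, 1], [1, 9]],
    stat_mu=[[5, 0], [0, 6]],
    sys_comb=[[2, 1], [1, 3]],
    sig_comb=[1, 1],
    sig_e=[1, 2],
    sig_mu=[2, 1],
)
for row in toy_total:
    print(row)
```

The off-diagonal blocks contain only the universal systematic part, which reflects the assumption of uncorrelated lepton-ID uncertainties for electrons and muons.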

4 Fits to \({\bar{B}}\rightarrow D^* (e, \mu ){{\bar{\nu }}}\) data and discussion

We analyze the data from the Belle analysis [13] in detail, based on the general analysis in Sect. 2 and the covariance matrix derived in Sect. 3.

4.1 Angular analysis and comparison with the SM

In the first step our fit is completely model-independent: we use the observation made in Sect. 2 that the three single-differential CP-averaged angular distributions can be fully described by only four angular observables

$$\begin{aligned} \left\langle A_\text {FB}^{(\ell )}\right\rangle , \quad \left\langle F_L^{(\ell )}\right\rangle , \quad \left\langle \widetilde{F}_L^{(\ell )}\right\rangle , \quad \left\langle S_3^{(\ell )}\right\rangle , \end{aligned}$$
(22)

retaining all information. Further, we parametrize the 10 bins of the w-distribution again in full generality as the total decay rate and nine independent bins of the normalized w-differential rate:

$$\begin{aligned} {\widehat{\Gamma }}^{(\ell )},\quad x_i^{(\ell )}&\equiv \frac{1}{{\widehat{\Gamma }}^{(\ell )}} \int _{w_{i-1}}^{w_i} \!\! dw \, \frac{d {\widehat{\Gamma }}^{(\ell )}(w)}{dw}, \nonumber \\ w_i&= 1 + i \frac{(w_\text {max}-1)}{10} \quad (i = 2, \ldots 10). \end{aligned}$$
(23)

Here we set \(w_\text {max} = 1.5\) to comply with the choice in [13], which excludes a tiny part of the low-\(q^2\) phase space. From this parametrization we calculate the bin contents \(N_{i,\ell }^\text {obs}\) by integrating over the relevant angle intervals where necessary, and folding these predictions with the corresponding response matrices and efficiencies provided by the Belle collaboration for each lepton flavour separately, as described in [13]. We thus arrive at a description of the 40 bins per lepton flavour given in [13] in terms of only \(10 + 4 = 14\) observables in Eqs. (22)–(23). We emphasize that, up to the common normalization factor, our fit parameters appear linearly, ensuring a unique minimum and no distortion of their distributions from a multivariate Gaussian shape.
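The binning in \(w\) and the statement about the excluded low-\(q^2\) region can be checked with a short sketch. The meson masses below are approximate values inserted as assumptions for illustration; they are not quoted in the text, and the function names are hypothetical.

```python
M_B = 5.27966      # GeV, approximate B0 mass (assumed input)
M_DSTAR = 2.01026  # GeV, approximate D*+ mass (assumed input)
W_MAX = 1.5        # upper edge of the w range, as chosen in [13]
N_BINS = 10


def w_of_q2(q2):
    """Recoil variable w as a function of q^2 (in GeV^2), Eq. (15)."""
    return (M_B**2 + M_DSTAR**2 - q2) / (2.0 * M_B * M_DSTAR)


def w_edges():
    """Equidistant bin edges w_i = 1 + i (w_max - 1) / 10 for i = 0, ..., 10."""
    return [1.0 + i * (W_MAX - 1.0) / N_BINS for i in range(N_BINS + 1)]


edges = w_edges()
w_at_q2_zero = w_of_q2(0.0)                  # endpoint for a massless lepton
w_at_q2_max = w_of_q2((M_B - M_DSTAR) ** 2)  # zero recoil
print(edges)
print(w_at_q2_zero, w_at_q2_max)
```

With these inputs the kinematic endpoint lies slightly above \(w = 1.5\), so the last bin edge cuts away only a small region at the lowest \(q^2\).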

The conversion of the number of events to a decay rate involves the following numerical input:

$$\begin{aligned} \begin{aligned} N_{B{\bar{B}}}&= (772 \pm 11) \cdot 10^6 , \\ {\mathcal {B}}(D^{*+} \rightarrow D^0 \pi ^+)&= (67.7 \pm 0.5)\, \% , \\ f_{00}&= 0.486 \pm 0.006 , \\ {\mathcal {B}}(D^0 \rightarrow K^- \pi ^+)&= (3.950 \pm 0.031)\, \% ,\\ \tau _{B^0}&= (1.519 \pm 0.004) \cdot 10^{-12} \,\text {s} , \end{aligned} \end{aligned}$$
(24)

with \(N_{B{\bar{B}}}\) from [71], \(f_{00}\) and the \(B^0\) lifetime from [47] (see also the discussion on \(f_{00}\) in [72]), and the latest values of the branching fractions from [73]. Note that the value for \({\mathcal {B}}(D^0 \rightarrow K^- \pi ^+)\) was updated with respect to the value used in Refs. [13, 74], which slightly impacts the determination of \(V_{cb}\). The corresponding uncertainties cancel in all ratios and hence affect only the total decay rate, for which they are included in the systematic uncertainties provided by the Belle collaboration [13].

We further introduce the averages and differences of LF-specific observables

$$\begin{aligned} \Sigma X&\equiv \frac{X^{(\mu )} + X^{(e)}}{2},&\Delta X&\equiv X^{(\mu )} - X^{(e)}, \end{aligned}$$
(25)

for later convenience in the study of LFU violation, where \(X^{(\ell )}\) stands for any of the considered observables.
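The combinations in Eq. (25) and the standard linear propagation of their uncertainties can be sketched as follows. The function name and the numerical inputs are hypothetical and are not taken from Table 3; the correlation `rho` between the muon and electron results is an input that has to be supplied from the fit.

```python
import math


def sum_and_difference(x_mu, x_e, sig_mu, sig_e, rho=0.0):
    """Return (Sigma X, sigma_Sigma, Delta X, sigma_Delta) following Eq. (25).

    rho is the correlation coefficient between the muon and electron results.
    """
    cov = rho * sig_mu * sig_e
    sigma_x = 0.5 * (x_mu + x_e)
    delta_x = x_mu - x_e
    # linear error propagation for the average and the difference
    sig_sigma = 0.5 * math.sqrt(sig_mu**2 + sig_e**2 + 2.0 * cov)
    sig_delta = math.sqrt(sig_mu**2 + sig_e**2 - 2.0 * cov)
    return sigma_x, sig_sigma, delta_x, sig_delta


# hypothetical inputs for illustration only
avg, sig_avg, diff, sig_diff = sum_and_difference(0.30, 0.20, 0.03, 0.04)
print(avg, sig_avg, diff, sig_diff)
```

A positive correlation reduces the uncertainty of \(\Delta X\) and increases that of \(\Sigma X\), which is why the treatment of the electron-muon correlations matters most for the LFU differences.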

Table 3 The SM predictions of observables for \(\ell = e\) and \(\ell = \mu \), using the form-factor results [7], together with their values obtained from our fit to the Belle data [13]. For the prediction of the total rate, we leave the value of \(|V_{cb}|\) unspecified
Fig. 1 The measured \(x_i^{(\ell )}\) from Belle versus the SM predictions for \(\ell = e\) [left], \(\ell = \mu \) [middle] and the lepton-flavour averaged \(\Sigma x_i\) [right]; the differences \(\Delta x_i\) are also shown [lower]. The numbers are collected in Table 3

We perform two types of fits with our approach to test the stability of the results:

  1.

    a simple \(\chi ^2\) fit,

  2.

    a fit using pseudo-Monte Carlo techniques, following the procedure described in Ref. [74],

both using the full \(80\times 80\) covariance matrix. In addition, we have applied a correction to the systematic correlations for d’Agostini bias [75], following the procedure described in Ref. [43].

We find the results of the two fits to be virtually identical. In Ref. [74] the authors observe that in their joint fit of \(V_{cb}\) and form-factor parameters the two procedures produce markedly different results. They conclude that this difference is due to the large correlations present in the experimental data and that the usage of the pseudo-Monte Carlo technique is mandatory for phenomenological analyses. Our findings are in stark contrast to this conclusion and indicate instead that large correlations alone are not the cause for this difference. Our interpretation is that the observed difference is related to the form-factor parameters entering non-linearly in the fit of Ref. [74], while our angular observables and \(x_i^{(\ell )}\) parameters enter bilinearly. It is worth emphasizing in this context that

  • our fit results are extremely well described by Gaussian distributions; and that

  • the correlations between our fit parameters are much smaller than the ones present in the \(80\times 80\) matrix describing the bin contents.

As a consequence, we do not distinguish between the results from the two fit procedures in the following.

The fit results for our parameters as defined in Eqs. (22)–(23) are listed in Table 3 and shown in Fig. 1. At the best-fit point we find \(\chi ^2 = 48.9\) for \(80 - 2\times 14 = 52\) degrees of freedom (dof), indicating a good fit (Footnote 7). This suggests that the assumption of a pure P-wave \(D\pi \) final state is well justified.
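The quality of the fit quoted above can be cross-checked with a few lines of code. The sketch below uses the closed form of the \(\chi ^2\) survival function that holds for an even number of degrees of freedom; the function name is hypothetical, and the check assumes Gaussian uncertainties.

```python
import math


def chi2_p_value_even_dof(chi2, dof):
    """Survival function of the chi^2 distribution for an even number of dof.

    For dof = 2m the closed form is exp(-x) * sum_{k<m} x^k / k! with x = chi2 / 2.
    """
    if dof <= 0 or dof % 2:
        raise ValueError("closed form requires a positive even number of dof")
    x = 0.5 * chi2
    term, total = 1.0, 1.0
    for k in range(1, dof // 2):
        term *= x / k
        total += term
    return math.exp(-x) * total


p_fit = chi2_p_value_even_dof(48.9, 52)
print(f"p value = {p_fit:.2f}")
```

For \(\chi ^2 = 48.9\) and 52 dof this gives a p value of about 0.6, in line with the statement that the fit is good.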

In both Table 3 and Fig. 1 we juxtapose the fit results with their corresponding SM predictions. The latter depend on the \({\bar{B}}\rightarrow D^*\) form factors. Here, we use the form-factor determinations from Refs. [7, 8]. All SM predictions are obtained using the EOS software [76]. The EOS code for the computation of \({\bar{B}}\rightarrow D^*\ell {{\bar{\nu }}}\) observables has been independently checked. We also predict the ratio \(R_{e/\mu }\) in the SM and obtain:

$$\begin{aligned} R_{e/\mu } = 1.0026 \pm 0.0001, \end{aligned}$$
(26)

which does not include possible structure-dependent QED corrections.

We emphasize that the predictions [7, 8] of the \({\bar{B}}\rightarrow D^*\) form factors are conservative in that the corresponding uncertainties include higher-order contributions in the heavy-quark expansion. They rely only on theory input from various sources, i.e. no experimental input has been used for their determination. Note that \(|V_{cb}|\) cancels in the predictions for the normalized bins \(x_i^{(\ell )}\) as well as in the angular observables; only the total decay rate is proportional to \(|V_{cb}|^2\). Moreover, theoretical uncertainties of the normalization of the leading hadronic \(B\rightarrow D^*\) form factor cancel in the normalized observables. However, we do not include structure-dependent electromagnetic corrections to the angular distribution. Given the expected precision of the experimental data and the impact of muon-mass effects as discussed in this work, we expect that including these effects will become mandatory soon.

Before comparing to our numerical SM predictions, we test the qualitative expectation of approximate lepton-flavour-universality, i.e. \(\Delta X\equiv 0\), which does not require a specific form-factor parametrization. We find that most quantities are well compatible with lepton-flavour universality, with the exception of \(\left\langle A_\text {FB}^{(\ell )}\right\rangle \), which shows a deviation from exact universality at the \(3.9\sigma \) level, to be discussed below. This strong violation is not readily observable in the 80 bins provided by the Belle collaboration, but becomes visible once interpreted in terms of angular observables of the underlying angular distributions, see Fig. 1. The violation is further hidden by the fact that the lepton-flavour averaged data are compatible with the SM expectation.

In the comparison of our SM predictions with the fit results we find:

  1.

    As expected, the precision for most normalized quantities is better than that for the total rate, typically at the level of a few percent. This is true for both the SM predictions and the fit results.

  2.

    Overall we find very good agreement of the fit results with our SM predictions, as can be seen in Fig. 1, especially when considering the individual lepton species. There are a few smaller differences of roughly \(1\sigma \); only \(\langle A_\text {FB}^{(\mu )}\rangle \) shows a tension above the \(2\sigma \) level.

  3.

    The differences of the lepton-flavour-specific observables, \(\Delta X\), are predicted with very small absolute uncertainties due to the muon-mass suppression. Their predictions have similar relative uncertainties as the ones for the angular observables themselves. Their absolute values are also very small, with \(\Delta X/\Sigma X={\mathcal {O}}(\permille )\) in most cases. This can be readily understood, since these observables receive only corrections of \({\mathcal {O}}(m_\mu ^2)\) in the SM. The only sizable central values are those of \(\Delta A_\text {FB}\) and \(\Delta \widetilde{F}_L\), which are slightly enhanced by numerical factors. Most importantly, we find that the latter shifts are still small, but already comparable to the corresponding experimental uncertainties, see Table 3. This implies that the muon mass cannot be neglected anymore in precision analyses.

  4.

    The pattern of the shifts in \(\Delta x_i\) is surprising at first sight, since \(|\Delta x_i|/\Sigma x_i\) is almost constant over the whole range of w (or \(q^2\)), while we argued that the effect scales like \((m_\mu /\sqrt{q^2})^2\). This can be understood from the normalization to the total rate. The shifts in \(\Delta (\Delta \Gamma _i)/\Sigma (\Delta \Gamma _i)\) scale as expected, from significantly less than \(1\permille \) at \(w\sim 1\) (high \(q^2\)) to \(-5\permille \) in the bin with maximal w (lowest \(q^2\)). The shift in the total rate is about \(-3\permille \), so normalizing yields shifts in \(\Delta x_i/\Sigma x_i\) to the range \([-3\permille , 3\permille ]\).

  5.

    For LFU observables we still find mostly excellent agreement between experiment and our SM predictions. However, the aforementioned difference between the measurements of \(A_\text {FB}^{(\mu )}\) and \(A_\text {FB}^{(e)}\) becomes more significant, given the smaller absolute uncertainty in \(\Delta A_\text {FB}\) and the fact that the relatively large SM prediction carries the opposite sign from the one determined in the fit. This quantity differs therefore by approximately \(4\sigma \) from its SM prediction. In Fig. 2 we show the pair-wise 2-dimensional best-fit regions of \(\Delta A_\text {FB}\) with \(\Delta F_L\), \(\Delta \widetilde{F}_L\), \(\Delta S_3\), and \(\Sigma A_\text {FB}\). The discrepancy with the predictions reaches the \(4\,\sigma \) level, compatible with similar levels seen for the 1-dimensional discrepancy for \(\Delta A_\text {FB}\) in Table 3.

Fig. 2 Fit to the Belle data in the planes of \(\Delta A_\text {FB}= A_\text {FB}^{(\mu )} - A_\text {FB}^{(e)}\) vs. \(\Delta F_L=F_L^{(\mu )} - F_L^{(e)}\) (top left), \(\Delta A_\text {FB}\) vs. \(\Delta \widetilde{F}_L=\widetilde{F}_L^{(\mu )} - \widetilde{F}_L^{(e)}\) (top right), \(\Delta A_\text {FB}\) vs. \(\Delta S_3 = S_3^{(\mu )} - S_3^{(e)}\) (bottom left), and \(\Delta A_\text {FB}\) vs. \(\Sigma A_\text {FB}= (A_\text {FB}^{(\mu )} + A_\text {FB}^{(e)}) / 2\) (bottom right). Contours correspond to \(68\%\), \(95\%\), \(99.7\%\), and \(99.99\%\) probability, respectively. The ragged outermost contours are artefacts due to the limited number of samples far from the best-fit point. The SM predictions based on the form factors obtained in Refs. [7, 8] are shown as blue crosses. The SM uncertainties are found to be much smaller than \(10^{-2}\) and hence negligible, with the exception of the last panel. The uncertainty in the \(\Delta A_\text {FB}\)–\(\Sigma A_\text {FB}\) plane is shown as a (highly degenerate) ellipse at the \(68\%\) probability level

These observations mildly depend on the covariance matrix used in the fit. As stated above, we consider our construction of the \(80\times 80\) covariance matrix reliable to the extent that the data in Ref. [13] are correct. To make absolutely sure that our assumption regarding the \(e-\mu \) correlations is not the reason for the observed discrepancy, we adopt the following alternative procedure: We determine the \(A_\text {FB}^{(e)}\) and \(A_\text {FB}^{(\mu )}\) with separate statistical and systematic uncertainties in two separate fits to the lepton-specific data, using the corresponding \(40\times 40\) covariance matrices for which we do not have to rely on our assumption. We then minimize the discrepancy with respect to our (strongly correlated) SM predictions by assuming a diagonal \(2\times 2\) statistical correlation matrix for \(A_\text {FB}^{(e)}\) and \(A_\text {FB}^{(\mu )}\), but allowing for an arbitrary correlation \(\rho \in [-1, 1]\) between the systematic uncertainties.
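The scan over the systematic correlation described above can be sketched as follows. This is a simplified illustration under stated assumptions: the uncertainties of the SM predictions are neglected, the statistical part is taken as uncorrelated, and the deviations and uncertainties are hypothetical numbers rather than the values of Table 3. The function names are likewise illustrative.

```python
def chi2_2d(d, stat, syst, rho):
    """chi^2 of the deviation d = (d_e, d_mu) for systematic correlation rho."""
    c11 = stat[0] ** 2 + syst[0] ** 2
    c22 = stat[1] ** 2 + syst[1] ** 2
    c12 = rho * syst[0] * syst[1]  # statistical part is taken as uncorrelated
    det = c11 * c22 - c12 ** 2
    # d^T C^{-1} d written out for a symmetric 2x2 covariance matrix
    return (c22 * d[0] ** 2 - 2.0 * c12 * d[0] * d[1] + c11 * d[1] ** 2) / det


def scan_rho(d, stat, syst, steps=200):
    """Return (rho, chi2) with the smallest chi2 on a grid over [-1, 1]."""
    grid = [-1.0 + 2.0 * k / steps for k in range(steps + 1)]
    return min(((rho, chi2_2d(d, stat, syst, rho)) for rho in grid),
               key=lambda pair: pair[1])


# hypothetical deviations from the SM and uncertainties, for illustration only
rho_min, chi2_min = scan_rho(d=(-0.01, 0.04), stat=(0.02, 0.02), syst=(0.01, 0.01))
print(rho_min, chi2_min)
```

For these toy inputs, with deviations of opposite sign and statistically dominated uncertainties, the smallest \(\chi ^2\) is found at the boundary \(\rho = -1\). Where the minimum lies in general depends on the inputs.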

We find that the minimal tension with respect to the SM for the combined \(A_\text {FB}^{(e)}\), \(A_\text {FB}^{(\mu )}\) occurs for maximal anti-correlation (\(\rho = -1\)), which is not a realistic value. The correlation determined in the fit to the \(80\times 80\) covariance matrix is actually very small. Adopting nevertheless this most conservative choice of \(\rho = -1\) still leads to a tension of \(3.6\sigma \). We emphasize again that this result is not changed by employing the pseudo-Monte Carlo approach with Cholesky decomposition for the fit as done in [74], nor by the d’Agostini effect (the plots shown in Fig. 1 include the corresponding shifts). Therefore, even adopting this maximally conservative procedure, our results amount to evidence for \(\mu \)-e-non-universality beyond the SM in charged-current \(b\rightarrow c\ell \nu \) transitions. However, our finding hinges on the approximate validity of the data and specifically the correlation matrices given in Ref. [13].

We also perform a full SM fit to the \(2\times 14\) observables in Table 3, including their correlations given in ancillary files attached to the arXiv preprint of this article. Starting from a fit of form-factor parameters to theory input only [7, 8], the inclusion of the experimental information on these 28 observables increases the minimal \(\chi ^2\) by 68.5, while only \(|V_{cb}|\) is introduced as an additional parameter in the fit. This indicates a bad fit, with a p value of \(2\times 10^{-5}\), or a tension at the \(4.3\sigma \) level. The discrepancy remains driven by a \(\sim 4\sigma \) tension in \(\left\langle A_\text {FB}^{(\mu )}\right\rangle \) and a \(\sim 2\sigma \) tension in \(\left\langle A_\text {FB}^{(e)}\right\rangle \). The experimental and theoretical correlations with other observables play a minor role, see also Fig. 2. We note in passing that S-P wave interference cannot affect the numerator of \(A_\text {FB}\), and can only decrease the magnitude of \(A_\text {FB}\) by a coherent contribution to the denominator [77].
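The correspondence between the p value and the significance quoted for the SM fit can be reproduced with the Python standard library. The sketch assumes the two-sided Gaussian convention, which is not stated explicitly in the text; the function names are hypothetical.

```python
from statistics import NormalDist


def two_sided_sigma(p_value):
    """Number of Gaussian standard deviations for a two-sided p value."""
    return NormalDist().inv_cdf(1.0 - 0.5 * p_value)


def two_sided_p_value(n_sigma):
    """Two-sided p value for a deviation of n_sigma standard deviations."""
    return 2.0 * (1.0 - NormalDist().cdf(n_sigma))


z = two_sided_sigma(2e-5)
print(f"{z:.1f} sigma")
```

Under this convention a p value of \(2\times 10^{-5}\) corresponds to roughly \(4.3\sigma \), consistent with the numbers given for the SM fit.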

We refrain from providing the value of \(|V_{cb}|\) from either lepton mode, which would be compatible with the values obtained from the lepton-flavour average in Refs. [13, 74] and continue to exhibit a substantial tension with respect to the inclusive determination \(|V_{cb}|_{B\rightarrow X_c} = (42.00 \pm 0.64) \cdot 10^{-3}\) [78]. Given the incompatibility of the data with the SM prediction, we consider it misleading to use it to extract \(|V_{cb}|\).

To summarize, we find in our fits a discrepancy between data and the SM of \(\sim 4\sigma \). This result is stable with respect to the treatment of the d’Agostini bias, the type of fit we are performing (\(\chi ^2\) fit vs. pseudo-Monte Carlo techniques), and importantly also the precise treatment of the correlations of the systematic uncertainties between electrons and muons. We reiterate, however, the concerns discussed in Sect. 3.1: the statistical correlation matrices given in [13] do not seem to be correct, since they are not singular as they should be, given the performed redistribution of events to obtain the different single-differential rates. Bearing this caveat in mind, we still investigate in the following the possibility that the observed discrepancy is an effect of BSM physics.

4.2 Possible BSM interpretation

We consider the possibility that the observed discrepancy is due to BSM physics. To that aim, we investigate the Lagrangian Eq. (7) in the limit of lepton-flavour conservation \(\ell =\ell '\). From our general analysis in Sect. 2 we have seen that \(\left\langle A_\text {FB}^{(\ell )}\right\rangle \) is special in that it is determined to \({\mathcal {O}}(m_\mu )\) only by interference contributions \(\sim {{\,\mathrm{Re}\,}}(C_i^\ell C_j^{\ell *})\), and is the only observable in the single-differential distributions to which interference terms contribute in the massless limit. Given the size of the observed effect, \(\Delta A_\text {FB}/ \Sigma A_\text {FB}\sim {\mathcal {O}}(10\%)\), a muon-mass suppressed contribution does not seem likely as its source. This suggests that in order to accommodate \(\Delta A_\text {FB}\), the first options to consider are BSM contributions to right-handed vector operators, to both pseudoscalar and tensor operators, or to left-handed vector operators. Notably, the first two options correspond to second-order BSM contributions: for the interference between pseudoscalar and tensor operators this is obvious. For the right-handed vector operator the interference term \({{\,\mathrm{Re}\,}}(C_V^\ell C_A^{\ell *}) = |C_{V_R}^\ell |^2-|C_{V_L}^\ell |^2\) is manifestly second order in the \(C_{V_R}^\ell \). For the BSM contributions to the left-handed vector operator only, the discussion is more involved. The interference terms \({{\,\mathrm{Re}\,}}(C_V^\ell C_A^{\ell *}) = |C_{V_R}^\ell |^2-|C_{V_L}^\ell |^2\) contain in principle a linear contribution in \(|C_{V_L}^\ell |^2 = |1 + \Delta C_{V_L}^{\ell ,\text {BSM}}|^2\), wherein the 1 stands for the SM contribution. However, if \(C_{V_L}^\ell \) were the only BSM contribution it would cancel in all normalized observables. This is not true for the contribution from right-handed vector operators, the real parts of which, however, enter linearly in \(|C_{A,V}^\ell |^2\). 
Given the compatibility of all other observables with the SM, this scenario would therefore require the main contribution to either have a sizable imaginary part, or specific cancellations with other BSM contributions, in order not to upset this agreement.

Taking here the Belle data at face value, we perform fits analogous to the one described above for the SM, i.e., varying the full set of form factor parameters, in this case together with different sets of BSM contributions. Note that we keep our description qualitative, since numerical statements are likely to be upset by an eventual correction of the Belle dataset [13]. For the same reason we do not perform a combined fit with other \(b\rightarrow c\ell {{\bar{\nu }}}\) modes, which would of course be required to confirm the viability of potential BSM scenarios that resolve the tension in this dataset.

We find that either contributions from right-handed vector operators, or from both pseudoscalar and tensor operators, are necessary to accommodate the observed \(\Delta A_\text {FB}\), confirming our previous considerations. In order to describe the dataset well with real BSM Wilson coefficients only, LFU-violating contributions to both the right- and left-handed vector operators are required.

The three minimal BSM scenarios that fit the present Belle \(\bar{B}\rightarrow D^*\ell {{\bar{\nu }}}\) data [13] can be summarized as follows:

  1.

    \(C_{V_R}^\ell \ne 0\): This scenario does require a sizable imaginary part (as anticipated above) and LFU violation. The latter fact is interesting, since it might point to BSM physics beyond SMEFT [44]. The imaginary part of \(C_{V_R}^\ell \) implies that \(\left\langle A_8^{(\ell )}\right\rangle \) and \(\left\langle A_9^{(\ell )}\right\rangle \) are sizable. We strongly encourage an experimental measurement of these observables.

  2.

    \(C_{V_R}^\ell \ne 0\) and \(C_{V_L}^\ell \ne 1\): This scenario can obviously describe the data well, given that in principle already \(C_{V_R}^\ell \ne 0\) suffices. However, to our surprise it is also compatible with an LFU BSM contribution to \(C_{V_R}^\ell \), which is required in a SMEFT scenario. Enforcing this flavour-universal \(C_{V_R}^\ell \), i.e., \(C_{V_R}^e = C_{V_R}^\mu \), results in significantly different absolute values and a sizable phase difference between \(C_{V_L}^e\) and \(C_{V_L}^\mu \). Sizable \(\left\langle A_{8,9}^{(\ell )}\right\rangle \) are also likely in this case, although not strictly necessary. It is possible to have all BSM coefficients real, and hence \(\left\langle A_{8,9}^{(\ell )}\right\rangle = 0\), but only with a phase between the left-handed coefficients \(\phi _L=\pi \). This corresponds to a BSM contribution of about twice the SM one and is therefore highly fine-tuned.

  3.

    \(C_{P}^\ell \ne 0\) and \(C_{T}^\ell \ne 0\): Also this scenario provides a good fit to the data, both for complex and real-valued Wilson coefficients. The fact that both \(C_T^\ell \) and \(C_P^\ell \) are required means that this scenario can be tested by measuring \(\left\langle S_{6c}^{(\ell )}\right\rangle \) and \(\left\langle A_7^{(\ell )}\right\rangle \), at least one of which is expected to show significant differences relative to their SM predictions, which are small for \(\left\langle S_{6c}^{(\ell )}\right\rangle \) and zero for \(\left\langle A_7^{(\ell )}\right\rangle \).

While we do not attempt to include additional datasets as explained above and therefore cannot quantitatively test specific BSM scenarios, we still observe a few general features of a possible BSM explanation in the context of the B anomalies, especially in \(b\rightarrow c\tau {{\bar{\nu }}}\) transitions:

  1.

    While moderate shifts in one or several Wilson coefficients are required to fit the present Belle data [13], the total rates are not strongly affected. Hence it is not possible to explain the discrepancy in \(R(D^{*})\) with these shifts, i.e. additional new contributions in \(b\rightarrow c\tau {{\bar{\nu }}}\) coefficients are required to explain the deviations of LFU ratios involving \(\ell = \tau \) from SM predictions.

  2.

    If the observations made here based on the Belle data persist after future updates or corrections, they would have strong implications for scenarios addressing the B anomalies: Scenarios that only shift \(C_{V_L}^\ell \) would be ruled out, which are currently favoured as simultaneous explanations of the \(b\rightarrow c\tau {{\bar{\nu }}}\) and \(b\rightarrow s\ell ^+\ell ^-\) anomalies.

  3.

    Based on the picture provided by the observables, one would naively expect a hierarchy \(\Delta _\mu > \Delta _e\). In light of the more substantial deviations in \(b\rightarrow c\tau {{\bar{\nu }}}\), this could be extended to \(\Delta _\tau > \Delta _\mu \), which is quite natural in scenarios addressing both B anomalies. However, we find that \(\Delta _\mu > \Delta _e\) is far from being established in our fits at the level of the Wilson coefficients.

Should the observed discrepancy be confirmed, it would therefore have far-reaching consequences for BSM scenarios that address the B anomalies.

5 Conclusions

In this article we pave the way for precision analyses of \(b\rightarrow c\ell {{\bar{\nu }}}\) processes beyond the assumption of \(e-\mu \) universality. This endeavour is important for the determination of \(V_{cb}\) in the Standard Model, for a complete understanding of the weak effective theory (WET) beyond the SM (BSM), and for gaining new insights into the persistent \(b\rightarrow c\tau {{\bar{\nu }}}\) anomaly. We focus on the angular distribution in \({\bar{B}}\rightarrow D^* \ell {{\bar{\nu }}}\) with light leptons \(\ell = e, \mu \) and highlight strategies for improved experimental analyses.

We discuss the complete set of CP-even and CP-odd angular observables that arise from the fully-differential angular distribution of \({\bar{B}}\rightarrow D^* (\rightarrow D \pi )\, \ell {{\bar{\nu }}}\). In particular we discuss the influence of a finite mass of the charged lepton on these observables in and beyond the SM. We consider in detail the specific case of four single-differential CP-averaged rates that have been experimentally analyzed in Refs. [12, 13]. We find that only four flavour-specific angular observables per lepton flavour are needed to describe the three single-differential CP-averaged angular distributions including arbitrary BSM contributions: the lepton-forward-backward asymmetry \(A_\text {FB}^{(\ell )}\), the longitudinal \(D^*\)-polarization \(F_L^{(\ell )}\), and two further observables \(\widetilde{F}_L^{(\ell )}\) and \(S_3^{(\ell )}\). However, we find that it is in principle not possible to extract the full information on the BSM contributions to the WET Wilson coefficients for the electron mode when using only the single-differential CP-averaged rates. For the muon mode, part of that information enters only muon-mass suppressed, although it can be extracted without that suppression when considering a different presentation of the data. We further emphasize the existence of non-linear relations between the Wilson coefficients that allow one to test for lepton-flavour violation (LFV) and right-handed neutrinos.
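The statement that four observables fix the three single-differential angular distributions can be illustrated with a minimal numerical sketch. It assumes the commonly used normalized forms \(\frac{1}{2} + A_\text {FB}\cos \theta _\ell + \frac{1}{8}(1-3\widetilde{F}_L)(3\cos ^2\theta _\ell -1)\), \(\frac{3}{4}(1-F_L)\sin ^2\theta _D + \frac{3}{2}F_L\cos ^2\theta _D\) and \(\frac{1}{2\pi } + \frac{2}{3\pi }S_3\cos 2\chi \); the prefactors of \(\widetilde{F}_L\) and \(S_3\) as well as the sign conventions are assumptions here and should be checked against the definitions given earlier in the text. The function name `bin_fractions`, the equidistant binning and the input values are purely illustrative and are not taken from the Belle data.

```python
import math


def bin_fractions(a_fb, f_l, f_l_tilde, s_3, n_bins=10):
    """Normalized bin contents of the three single-differential angular
    distributions, given four angular observables (assumed conventions)."""

    def lepton_antiderivative(c):
        # Integral of 1/2 + a_fb*c + (1/8)*(1 - 3*f_l_tilde)*(3*c^2 - 1)
        return 0.5 * c + 0.5 * a_fb * c**2 + 0.125 * (1.0 - 3.0 * f_l_tilde) * (c**3 - c)

    def dstar_antiderivative(c):
        # Integral of (3/4)*(1 - f_l)*(1 - c^2) + (3/2)*f_l*c^2
        return 0.75 * (1.0 - f_l) * (c - c**3 / 3.0) + 0.5 * f_l * c**3

    def chi_antiderivative(x):
        # Integral of 1/(2*pi) + (2/(3*pi))*s_3*cos(2*x)
        return x / (2.0 * math.pi) + s_3 * math.sin(2.0 * x) / (3.0 * math.pi)

    # Assumption: equidistant bins in cos(theta) on [-1, 1] and in chi on [0, 2*pi].
    cos_edges = [-1.0 + 2.0 * k / n_bins for k in range(n_bins + 1)]
    chi_edges = [2.0 * math.pi * k / n_bins for k in range(n_bins + 1)]

    def integrate(antiderivative, edges):
        # Exact bin integrals from differences of the antiderivative.
        return [antiderivative(hi) - antiderivative(lo)
                for lo, hi in zip(edges[:-1], edges[1:])]

    return {
        "cos_theta_l": integrate(lepton_antiderivative, cos_edges),
        "cos_theta_D": integrate(dstar_antiderivative, cos_edges),
        "chi": integrate(chi_antiderivative, chi_edges),
    }


# Hypothetical input values, chosen only to exercise the function.
bins = bin_fractions(a_fb=0.2, f_l=0.5, f_l_tilde=0.5, s_3=-0.1)
```

In this sketch the 30 bin contents per lepton flavour are determined entirely by the four inputs. Each distribution sums to one, and the difference between the forward and backward halves of the \(\cos \theta _\ell \) bins returns the input value of \(A_\text {FB}\).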

The data is provided by the Belle Collaboration in folded form. However, from a phenomenological point of view it is more efficient to present the background-subtracted and efficiency-corrected data in terms of angular observables, since under the assumption of a P-wave \(D \pi \) final state only four observables are necessary to represent 30 of the provided bins, per lepton flavour. This model-independent presentation also facilitates the phenomenological interpretation of the angular distribution. We encounter an issue with the statistical correlation matrices that can only be clarified by the Belle Collaboration. We describe our approach to the combination of statistical and systematic correlations for the electron and muon datasets and extract the non-redundant lepton-flavour specific CP-averaged angular observables from the Belle data. For most of the angular observables we find good agreement with our up-to-date SM predictions, except for \(A_\text {FB}^{(\mu )}\). The observed tension with the SM predictions is even more pronounced for the observable \(\Delta A_\text {FB}\equiv A_\text {FB}^{(\mu )} - A_\text {FB}^{(e)}\), in which the correlations of form factors lead to a strong cancellation of uncertainties, reaching the \(4\,\sigma \) level. We perform numerous checks to verify that this tension is not a result of our specific treatment of the data. In particular, even when allowing for arbitrary systematic correlations between the electron and muon data, we find that this tension does not drop below \(3.6\,\sigma \). Assuming a limited influence of the issue with the statistical correlation matrices, this constitutes evidence for lepton-flavour universality violation.
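The cancellation of uncertainties in \(\Delta A_\text {FB}\) can be illustrated with standard linear Gaussian error propagation for the difference of two correlated quantities, \(\sigma _\Delta ^2 = \sigma _\mu ^2 + \sigma _e^2 - 2\rho \,\sigma _\mu \sigma _e\). The following is a minimal sketch under that assumption only; the function name `sigma_of_difference` and all numerical inputs are hypothetical and do not reproduce the uncertainties or correlations obtained in our analysis.

```python
import math


def sigma_of_difference(sigma_mu, sigma_e, rho):
    """Uncertainty of (x_mu - x_e) for two quantities with standard
    deviations sigma_mu, sigma_e and correlation coefficient rho."""
    if not -1.0 <= rho <= 1.0:
        raise ValueError("correlation coefficient must lie in [-1, 1]")
    variance = sigma_mu**2 + sigma_e**2 - 2.0 * rho * sigma_mu * sigma_e
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(variance, 0.0))


# Hypothetical inputs: equal uncertainties, varying correlation.
uncorrelated = sigma_of_difference(0.01, 0.01, rho=0.0)
strongly_correlated = sigma_of_difference(0.01, 0.01, rho=0.99)
```

For equal individual uncertainties the uncertainty of the difference scales as \(\sqrt{2(1-\rho )}\), so a correlation close to one suppresses it strongly. This is the mechanism by which a form-factor-dominated prediction for a difference can be much more precise than the predictions for the individual observables.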

We continue by investigating in a qualitative manner the most economical BSM scenarios that can potentially explain the observed tensions. To this end, we assume lepton-flavour conservation, but allow for lepton-flavour non-universality in the WET description. We find that either right-handed vector operators or both pseudoscalar and tensor operators are necessary to accommodate the observed tension. If only right-handed vector operators are present, large imaginary parts in the Wilson coefficients are necessary. As a consequence, the CP-odd angular observables \(A_{8,9}^{(\ell )}\) would be expected to deviate sizably from their SM predictions. A solution with purely real-valued Wilson coefficients appears only as a highly fine-tuned solution in a combined scenario with left- and right-handed vector operators. For the combination of pseudoscalar and tensor operators, we do not find the necessity of sizable imaginary parts. In this case, \(\left\langle S_{6c}^{(\ell )}\right\rangle \) or \(\left\langle A_7^{(\ell )}\right\rangle \) are expected to show significant differences relative to their SM predictions. None of these three scenarios coincides with the preferred explanation of the \(b\rightarrow c \tau {{\bar{\nu }}}\) anomaly.

Given the potentially far-reaching consequences of our findings, we consider it essential that the Belle Collaboration reviews – and if need be corrects – the published dataset from Ref. [13]. Without such scrutiny, it is impossible to determine the impact of the identified issues on the extracted values of \(V_{cb}\) and the form factor parameters, and on tests of LFU. We strongly recommend that future measurements separate the two light-lepton flavours in a transparent way. This is also important for the comparison with existing and upcoming LHCb analyses, which focus on the muon mode only.