Lattice QCD study of inclusive semileptonic decays of heavy mesons

We present an ab initio study of inclusive semileptonic decays of heavy mesons from lattice QCD. Our approach is based on a recently proposed method that allows one to address these decays through the analysis of smeared spectral functions extracted from four-point correlators on the lattice, where the smearing is defined in terms of the phase-space integration relevant to the inclusive decays. We present results obtained from gauge-field ensembles from the JLQCD and ETM collaborations, and discuss their relation to theoretical predictions from the operator-product expansion.


Introduction
The theoretical study of semileptonic decays of B mesons continues to be an important and very active area of research in high-energy physics: this interest is mainly driven by the fact that these decays encode direct information on the modulus of two of the elements of the Cabibbo-Kobayashi-Maskawa (CKM) quark mixing matrix [1,2], namely |V_ub| and |V_cb|, and may be a sensitive probe of new physics beyond the Standard Model (SM). As a matter of fact, many different types of extensions of the SM are expected to affect flavour physics, inducing new flavour-changing interactions, complex phases in the CKM matrix, possible violations of lepton-flavour universality, etc. Even if the mass scales of new particles beyond the SM turned out to be very high, quantum effects of the associated fields could leave detectable imprints on the physics of bottom and charm quarks.
On the experimental side, recent results from B factories reveal some tension with SM predictions, but also exhibit puzzling discrepancies between exclusive and inclusive channels [3][4][5][6]. For theorists, this provides further motivation to improve the understanding of these decays and to refine their predictions. Currently, the most powerful tool to obtain theoretical predictions from the first principles of QCD is numerical simulation in the lattice regularisation of the theory [7]. It is an intrinsically non-perturbative approach that allows one to obtain accurate and systematically improvable predictions for a variety of quantities, including those relevant for decays of heavy mesons; for an up-to-date world review of lattice results relevant to flavour physics, see ref. [8]. It should be emphasized, however, that most lattice calculations focus on exclusive decays: in a nutshell, this is because inclusive processes receive contributions from a potentially very large number of physical states (including states featuring multiple hadrons, which pose their own challenges), and their systematic analysis in numerical calculations is very impractical, if possible at all.
Recently, however, novel approaches have been put forward that allow one to address inclusive decays in lattice QCD. As an example, in ref. [9] it was pointed out that the differential rate for inclusive decays of the type B → X ℓν̄ (where X denotes all hadronic states that are compatible with the semileptonic decay of the bottom quark) could be evaluated by relating the hadronic tensor (where J^µ is the weak current associated with the b-quark decay, p and p_X respectively denote the four-momenta of the B meson and of the X state, while q is the transferred four-momentum) to the forward-scattering matrix element T^{µν}(p, q) [10,11], and by extracting the latter through an analytical continuation of lattice results obtained for this quantity in an unphysical region, where the decay is forbidden by kinematics. In ref. [12], on the other hand, it was proposed to study decay and transition rates into final states with an arbitrary number of hadrons by reconstructing, with an appropriate smoothing protocol, the spectral function associated with a Euclidean four-point function computed in a finite volume on the lattice. A closely related approach was discussed in ref. [13] (see also refs. [14,15]).
Finally, in ref. [16] it was suggested to study inclusive decays on the lattice by computing a suitable "smeared" spectral density ρ(w) of hadron correlators, where the smearing is defined by the integration over the allowed phase-space region. Also in this case, the strategy involves the lattice determination of a class of four-point correlation functions. This technique allows one to bypass the need for analytical continuation, and, at least in principle, paves the way for the determination of the total semileptonic width as well as of the moments of any kinematic distribution associated with general B → X ν decays.
In the present work, we focus on the method proposed in ref. [16], presenting the results of explicit lattice calculations based upon this framework. We discuss results from two different types of ensembles of lattice QCD configurations, and we also compare them with an analytical calculation based on the operator-product expansion (OPE) [17,18] within the framework of an expansion in inverse powers of the heavy-quark mass [11,19,20].
The structure of this article is the following. In section 2, we recapitulate the formulation of the method, extending the presentation in ref. [16] with additional remarks, and commenting on its application to observables of particular interest (including differential distributions and moments). In section 3, we present an explicit implementation of the method in lattice QCD calculations, using two different ensembles of configurations, generated by the JLQCD collaboration and by the ETM collaboration; the final part of the section is devoted to a technical discussion about the extrapolation to the limit in which the smearing parameter σ tends to zero. The following section 4 presents the analytical calculation based on the OPE, and compares its predictions with the results from lattice QCD. Finally, in section 5 we summarize our results and discuss future prospects.
2 Formulation of the method and application to observables

Spectral representation of the inclusive decay rate
Here we review the formalism to calculate the inclusive semileptonic decay rate in lattice QCD [16]. To be specific, we consider the semileptonic decay of a B meson to charmed final states X_c with a pair of massless leptons (ℓν̄) through the flavour-changing current J^µ = c̄ γ^µ (1 − γ_5) b. We start from the differential decay rate

$$\frac{d\Gamma}{dq^2\, dq^0\, dE_\ell} = \frac{G_F^2\, |V_{cb}|^2}{8\pi^3}\, L_{\mu\nu}\, W^{\mu\nu}\,, \qquad (2.1)$$

where G_F is the Fermi constant and |V_cb| is the relevant CKM matrix element. Here we work in the rest frame of the initial B meson, so that

$$p = (m_B, \boldsymbol{0})\,, \qquad q = p_\ell + p_{\bar\nu} = (q^0, \boldsymbol{q})\,, \qquad r = p - q = (\omega, -\boldsymbol{q})\,, \qquad (2.2)$$

where the differential decay rate is a function of the three kinematical variables q², q⁰ and the lepton energy E_ℓ, and is given by the product of the leptonic tensor,

$$L^{\mu\nu} = p_\ell^{\mu}\, p_{\bar\nu}^{\nu} - p_\ell \cdot p_{\bar\nu}\, g^{\mu\nu} + p_\ell^{\nu}\, p_{\bar\nu}^{\mu} - i\, \epsilon^{\mu\alpha\nu\beta}\, p_{\ell,\alpha}\, p_{\bar\nu,\beta}\,, \qquad (2.3)$$

and the hadronic tensor,

$$W^{\mu\nu}(p,q) = \sum_{X_c} (2\pi)^3\, \delta^4(p - q - r)\, \frac{1}{2 m_B}\, \langle \bar B(p)|\, J^{\mu\dagger}\, |X_c(r)\rangle\, \langle X_c(r)|\, J^{\nu}\, |\bar B(p)\rangle\,. \qquad (2.4)$$

The sum over the charmed states X_c(r) actually includes an integral over r.¹

¹ More precisely, the sum over the charmed states X_c(r) should be written as Σ_{X_c} |X_c(r)⟩⟨X_c(r)| → Σ_{X_c} ∫ d³r/(2π)³ · 1/(2E_{X_c}(r)) |X_c(r)⟩⟨X_c(r)| when the standard relativistic normalization for a single-particle state is employed for X_c.
Performing the integral over the lepton energy E_ℓ in its kinematical range, i.e. from (q⁰ − √(q²))/2 to (q⁰ + √(q²))/2, and changing the remaining kinematical variables from (q⁰, q²) to (ω, q²), the total rate can be written as

$$\Gamma = \frac{G_F^2\, |V_{cb}|^2}{24\pi^3} \int_0^{q^2_{\rm max}} dq^2\, \sqrt{q^2}\; \bar X(q^2)\,, \qquad (2.5)$$

where

$$\bar X(q^2) = \int_{\omega_{\rm min}}^{\omega_{\rm max}} d\omega\; X(\omega, q^2)\,. \qquad (2.6)$$

The quantity X(ω, q²) appearing above is a linear combination, with coefficients depending on the kinematical variables, of the different components of the hadronic tensor. Indeed, Lorentz invariance and time-reversal symmetry allow one to decompose W^{µν} into five invariant structure functions W_1, …, W_5 according to eq. (2.7). By introducing a basis for three-dimensional space (eq. (2.8)), built out of the unit vector n = q/|q| and two unit vectors orthogonal to it, and the hadronic quantities Y^{(i)} ≡ Y^{(i)}(ω, q²) obtained by projecting W^{µν} onto this basis (eq. (2.9)), it is easy to see that, in the rest frame of the B meson, the information contained in W^{µν} can be equivalently parametrized in terms of the Y^{(i)}. A convenient representation of X(ω, q²) is then given in eq. (2.10).

At this point, some observations are in order. First, we notice that the parity-violating structure function W_3 (or equivalently Y^{(5)}) does not contribute to the differential decay rate after the integral over E_ℓ has been performed (this will not be the case for the moments considered below). Then, by rewriting eq. (2.4) as

$$W^{\mu\nu}(\omega, \boldsymbol{q}) = \frac{(2\pi)^3}{2 m_B}\, \langle \bar B(\boldsymbol{0})|\, J^{\mu\dagger}(0)\, \delta(\hat H - \omega)\, \delta^3(\hat{\boldsymbol{P}} + \boldsymbol{q})\, J^{\nu}(0)\, |\bar B(\boldsymbol{0})\rangle\,, \qquad (2.11)$$

where Ĥ and P̂ are the QCD Hamiltonian and momentum operators, we explicitly see that in the rest frame of the B meson the different components of the hadronic tensor are functions of ω and q (we already used this information in changing the integration variables from (q⁰, q²) to (ω, q²)). In the following we refer to eq. (2.11) as the spectral representation of the hadronic tensor. Flavour and momentum conservation imply that the hadronic tensor vanishes identically for energies ω < ω_min. From this observation one obtains ω_min = √(m_D² + q²). This means that, by introducing the kernels K^{(l)}(ω, q²) of eq. (2.12), which collect the kinematical factors multiplying the Y^{(l)} and contain the θ-function θ(ω_max − ω) restricting the integration to the physical phase space, the ω integral in eq. (2.5) can be rewritten as

$$\bar X(q^2) = \sum_{l} \int_0^\infty d\omega\; K^{(l)}(\omega, q^2)\, Y^{(l)}(\omega, q^2)\,. \qquad (2.13)$$

We close this subsection by providing an equivalent representation of X(ω, q²) which will be useful later. It is given by

$$X(\omega, q^2) = \sum_{l=0}^{2} (\omega_{\rm max} - \omega)^l\, Z^{(l)}(\omega, q^2)\,, \qquad (2.14)$$

where the Z^{(l)}(ω, q²) are again linear combinations of the Y^{(l)} (eq. (2.15)). By introducing the kernels

$$\Theta^{(l)}(x) = x^l\, \theta(x)\,, \qquad l = 0, 1, 2\,, \qquad (2.16)$$

that are functions of the single variable x = ω_max − ω, we thus have

$$\bar X(q^2) = \sum_{l=0}^{2} \int_0^\infty d\omega\; \Theta^{(l)}(\omega_{\rm max} - \omega)\, Z^{(l)}(\omega, q^2)\,. \qquad (2.17)$$

Decay rate from Euclidean correlators
In order to calculate X̄(q²), as given in eq. (2.13) or eq. (2.17), we need to evaluate the integral over ω of the different components of the hadronic spectral density (2.11) with the kernels K^{(l)}(ω, q²) or Θ^{(l)}(ω_max − ω). To this end, following ref. [9], we first establish the connection between suitably chosen correlation functions that can be calculated on the lattice and W_{µν}. We start by considering the Euclidean four-point correlator C_{µν}(t_snk, t₂, t₁, t_src; q) of eq. (2.18), where φ̃_B(0; t) is a B-meson creation/annihilation operator projected onto zero spatial momentum by integrating over space at a time t. A zero-momentum B meson is thus created at time t_src and annihilated at t_snk. The two currents are inserted in between, at times t₂ and t₁. The charmed hadrons are created at time t₁ with a momentum insertion −q and propagate until they are transformed back into the B-meson state at time t₂. The four-point function C_{µν} is saturated by the B-meson non-local matrix element M_{µν}(t₂ − t₁; q) of eq. (2.19) when the double limit t_src → −∞, t_snk → ∞ is taken. To include a proper normalization, one can analyse the ratio of eq. (2.20), where C(t) is the B-meson two-point function and Z_B is its residue when a large time separation is taken, eq. (2.21). Starting from eq. (2.19) we can establish the connection between M_{µν}(t; q) and the hadronic tensor given in eq. (2.11):

$$M_{\mu\nu}(t; \boldsymbol{q}) = \int_0^\infty d\omega\; W_{\mu\nu}(\omega, \boldsymbol{q})\, e^{-\omega t}\,. \qquad (2.22)$$

The problem of the calculation of X̄(q²) is now reduced to that of trading the integral of W_{µν}(ω, q) with the kernels e^{−tω} for the integral with the kernels Θ^{(l)}(ω_max − ω) (or K^{(l)}(ω, q²)).
The extraction of hadronic spectral densities from Euclidean correlators is a notoriously ill-posed inverse problem. Recently, methods to cope with this problem have been proposed, which directly target integrals of the spectral density against given kernels. In this paper we use the two approaches proposed in refs. [13,21]. The differences between the two methods will be discussed in detail in the following sections; here we concentrate on their common starting point, since the two approaches are in fact closely related.
We start by introducing an arbitrary length scale a; on the lattice this will be identified with the lattice spacing. The correlators M_{µν}(t; q) will be computed at times t = aτ, where τ ≥ 0 is an integer. By introducing the variable x = e^{−aω} (and its inverse mapping ω = −log(x)/a), standard theorems of numerical analysis guarantee that any C^∞ function f(ω) ≡ g(x) in the interval ω ∈ [0, ∞] (corresponding to x ∈ [0, 1]), vanishing at ω = ∞ (x = 0), can be approximated with arbitrary precision in terms of polynomials in x according to

$$f(\omega) = \sum_{\tau=1}^{\infty} g_\tau\, x^\tau = \sum_{\tau=1}^{\infty} g_\tau\, e^{-a\omega\tau}\,. \qquad (2.23)$$

This implies that the integral of the product of W_{µν}(ω, q) with f(ω) can be computed, once the coefficients g_τ are known, by using the linear relation

$$\int_0^\infty d\omega\; W_{\mu\nu}(\omega, \boldsymbol{q})\, f(\omega) = \sum_{\tau=1}^{\infty} g_\tau\, M_{\mu\nu}(a\tau; \boldsymbol{q})\,. \qquad (2.24)$$

This procedure cannot be applied straightforwardly to the calculation of X̄(q²) because the kernels Θ^{(l)}(ω_max − ω) (or K^{(l)}(ω, q²)) are not smooth, i.e. they contain a discontinuity due to the θ-function. In this case a sequence of polynomials can still converge to the kernels in the mean, which would be sufficient for our purposes, but a reasonable approximation would require a very large number of terms. However, the problem can be solved by introducing smeared C^∞ versions of the θ-function, θ_σ, such that the sharp step-function is recovered in the limit in which the smearing parameter σ is sent to zero, lim_{σ→0} θ_σ(x) = θ(x). By considering, as suggested in ref. [16], the corresponding smeared versions of the kernels entering the definition of X̄(q²), that we call Θ_σ^{(l)}, one has

$$\Theta^{(l)}_\sigma(\omega_{\rm max} - \omega) = \sum_{\tau=1}^{\infty} g^{(l)}_\tau(\omega_{\rm max}, \sigma)\, e^{-a\omega\tau}\,, \qquad (2.25)$$

and therefore

$$\sum_{\tau=1}^{\infty} g^{(l)}_\tau(\omega_{\rm max}, \sigma)\, M_{\mu\nu}(a\tau; \boldsymbol{q}) = \int_0^\infty d\omega\; W_{\mu\nu}(\omega, \boldsymbol{q})\, \Theta^{(l)}_\sigma(\omega_{\rm max} - \omega)\,, \qquad (2.26)$$

as well as similar relations in the case of the kernels K^{(l)}(ω, q²). A few observations are now in order. The first concerns a subtle theoretical issue: the smearing procedure, which is algorithmically required to implement the strategy just outlined, is also necessary for theoretical reasons.
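The linear relation between the smeared ω-integral and the correlator time slices can be illustrated with a toy example. The sketch below (illustrative numbers only, not simulation data) builds a finite-volume-like spectral density out of two discrete states, expands a smooth kernel in powers of x = e^{−aω}, and checks that the sum of coefficients times correlator values reproduces the ω-integral:

```python
import numpy as np

a = 0.1                              # "lattice spacing" in arbitrary units
taus = np.arange(1, 21)              # Euclidean times t = a*tau, tau = 1..20

# Toy finite-volume-like spectral density: two discrete states
energies = np.array([0.8, 1.5])
weights = np.array([1.0, 0.4])

# Correlator C(a*tau) = sum_k w_k exp(-E_k a tau)
C = np.array([(weights * np.exp(-energies * a * tau)).sum() for tau in taus])

# A smooth kernel vanishing as omega -> infinity
f = lambda w: np.exp(-0.3 * w)

# Expand f(omega) in the basis x^tau with x = exp(-a*omega)
grid = np.linspace(0.0, 10.0, 2000)
basis = np.exp(-a * np.outer(grid, taus))          # columns: x^tau
g, *_ = np.linalg.lstsq(basis, f(grid), rcond=None)

# Linear relation: sum_tau g_tau C(a*tau) = int dw rho(w) f(w)
approx = g @ C
exact = (weights * f(energies)).sum()
print(approx, exact)
```

With this choice of kernel the expansion is exact (e^{−0.3ω} is itself a power of x), so the sum over time slices reproduces the spectral integral to numerical precision.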
Hadronic spectral densities, and therefore also W µν (ω, q), are elements in the space of distributions and their product with another distribution, such as the θ-function, can only be defined through a regularization procedure (when it exists). The issue is particularly important in the case of lattice simulations because they are necessarily performed on a finite volume. Finite-volume spectral functions, due to the quantization of the energy spectrum, are sums of isolated δ-function singularities and their connection with the corresponding physical quantities requires an ordered double-limit procedure: first the infinite volume limit has to be taken and only after that, if the quantity is non-singular, can one take the σ → 0 limit.
The second observation is related to the fact that the problem we are addressing is particularly hard from the computational point of view. In the limit of very small σ the coefficients g^{(l)}_τ(ω_max, σ) of eq. (2.25) tend to become arbitrarily large in modulus and to oscillate in sign. Since lattice correlators are unavoidably affected by statistical and systematic errors, in these cases the resulting uncertainties on the sums on the left-hand side of eq. (2.26) tend to explode. The two approaches of refs. [13,21] differ in the procedures used to determine the coefficients g^{(l)}_τ(ω_max, σ), once the series is truncated at τ = τ_max, in such a way as to keep both statistical and systematic errors under control.

Kernel approximation
In this subsection we review the methods of refs. [13,21], highlighting the differences in the procedures used to approximate the smearing kernels. To simplify the formulae, we shall consider a generic kernel f(ω), that will then be identified with the kernels Θ^{(l)}(ω_max − ω)/m_B^l or K^{(l)}(ω, q)/m_B^l, and a generic correlator

$$C(t) = \int_0^\infty d\omega\; \rho(\omega)\, e^{-\omega t}\,, \qquad (2.27)$$

to be identified with M_{µν}(t; q), so that ρ(ω) will correspond to W_{µν}(ω, q). In this work we shall not address the systematics associated with the finiteness of the extent of the lattice in the temporal direction; see refs. [13,15] for an extended discussion of this issue and, in general, for more details concerning the algorithm and its applications.
In the method of ref. [13] the coefficients g_τ corresponding to the approximation of f(ω) are determined by minimizing the functional

$$W_\lambda[g] = (1-\lambda)\, A[g] + \lambda\, B[g]\,, \qquad (2.28)$$

where λ ∈ [0, 1] is the so-called "trade-off parameter" (see below) and the functionals A[g] and B[g] are given by

$$A[g] = \int_{E_0}^\infty d\omega\, \Big| f(\omega) - \sum_{\tau=1}^{\tau_{\rm max}} g_\tau\, e^{-a\omega\tau} \Big|^2\,, \qquad B[g] = \frac{1}{C(0)^2} \sum_{\tau,\tau'=1}^{\tau_{\rm max}} g_\tau\, g_{\tau'}\, {\rm Cov}\big[C(a\tau), C(a\tau')\big]\,. \qquad (2.29)$$

Here Cov[C(t), C(t′)] is the statistical covariance of the correlator C(t) and, consequently, the functional B[g] is positive definite. The functional A[g] is also a positive definite quadratic form in the coefficients g_τ. Therefore, the minimum conditions are a linear system of equations to be solved for the coefficients g_τ^λ. These coefficients define the approximation f_λ(ω) of f(ω) and the associated estimator for the integral of ρ(ω) with f(ω),

$$\rho_\lambda[f] = \sum_{\tau=1}^{\tau_{\rm max}} g_\tau^\lambda\, C(a\tau)\,. \qquad (2.30)$$

The functional B[g^λ] is the statistical variance of ρ_λ[f] normalized by the square of the correlator at zero and, therefore, vanishes in the ideal case of infinitely precise input data. On the other hand, A[g^λ] measures the distance between the target kernel f(ω) and its approximation f_λ(ω) in the range² ω ∈ [E_0, ∞]. In fact A[g^λ] is the squared L²-norm in function space of the difference f_λ(ω) − f(ω) and can only vanish in the limit t_max → ∞.
In the absence of errors, the coefficients g_τ^λ that minimize A[g] provide the best polynomial approximation of f(ω) with respect to the L²-norm. This has to be compared with the method of ref. [21], which provides the best polynomial approximation of f(ω) with respect to the L∞-norm (see below). In the presence of errors, the coefficients g_τ^λ that minimize W_λ[g] represent a particular balance between statistical and systematic errors, as dictated by the parameter λ. For small λ the estimator ρ_λ[f] is close to ρ[f] but has a large statistical uncertainty. Conversely, for large λ the estimator ρ_λ[f] has a small statistical error but differs significantly from ρ[f]. When evaluated at the minimum, the functional W_λ[g] is a function of λ only, thus defining W(λ) ≡ W_λ[g^λ]. The prescription suggested in ref. [13] to choose the optimal value of the trade-off parameter defines λ* such that A[g^{λ*}] = B[g^{λ*}]. This can be understood as the condition of "optimal balance" between statistical and systematic errors. The numerical results discussed in subsection 3.2 have been obtained using this method, also monitoring the stability of the results with respect to λ ≤ λ*.
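The minimization just described is a linear-algebra problem, and its trade-off structure can be sketched with toy inputs. In the snippet below the covariance matrix, the smearing width and all numerical values are illustrative assumptions (not the lattice data of this work); the sketch solves the linear system for the minimum of W_λ[g] and checks that increasing λ degrades the kernel reconstruction, i.e. A[g^λ] grows with λ:

```python
import numpy as np

a, tau_max = 0.1, 16
taus = np.arange(1, tau_max + 1)

# Target kernel: a smooth (sigmoid-smeared) step, vanishing at large omega
sigma, omega_max = 0.5, 2.0
f = lambda w: 1.0 / (1.0 + np.exp((w - omega_max) / sigma))

# A[g]: squared L2 distance between f and sum_tau g_tau e^{-a w tau},
# discretized on [E0, 20]
E0 = 0.0
grid = np.linspace(E0, 20.0, 4000)
dw = grid[1] - grid[0]
basis = np.exp(-a * np.outer(grid, taus))
M = dw * basis.T @ basis                  # quadratic part of A[g]
v = dw * basis.T @ f(grid)                # linear part of A[g]

# B[g]: toy statistical covariance of the correlator (illustrative numbers)
cov = np.diag(1e-6 * np.exp(0.2 * taus))

def solve(lam):
    # Minimum of W_lambda[g] = (1-lam) A[g] + lam B[g]: a linear system
    return np.linalg.solve((1 - lam) * M + lam * cov, (1 - lam) * v)

def A(g):
    return dw * np.sum((basis @ g - f(grid)) ** 2)

g_small, g_large = solve(0.1), solve(0.9)
# Larger lambda suppresses the statistical variance at the price of a
# worse reconstruction of the kernel
print(A(g_small), A(g_large))
```

The monotonic growth of A with λ is the quantitative content of the trade-off: scanning λ and locating the balance point is what the prescription above automates.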
In ref. [21], on the other hand, the function f(ω) is approximated using the Chebyshev expansion

$$f(\omega) \simeq \sum_{j=0}^{N} c_j^*\, T_j^*(e^{-a\omega})\,, \qquad (2.31)$$

where T*_j(x) is the (shifted) Chebyshev polynomial of the j-th order, defined for x ∈ [0, 1]. The coefficients c*_j are determined only by the function f(ω),

$$c_j^* = \frac{2 - \delta_{j0}}{\pi} \int_0^\pi d\theta\; f\big(\omega(\theta)\big)\, \cos(j\theta)\,, \qquad e^{-a\omega(\theta)} = \frac{1 + \cos\theta}{2}\,. \qquad (2.32)$$

This yields the best approximation in the sense of the L∞-norm³. The approximation of the ω-integral is then constructed as

$$\int_0^\infty d\omega\; \rho(\omega)\, f(\omega) \simeq \langle \psi_\mu | \psi_\nu \rangle \sum_{j=0}^{N} \tilde c_j^*\, \frac{\langle \psi_\mu |\, T_j^*(e^{-a\hat H})\, | \psi_\nu \rangle}{\langle \psi_\mu | \psi_\nu \rangle}\,, \qquad (2.35)$$

where |ψ_µ⟩ ≡ e^{−Ĥt₀} J_µ |B⟩ is defined such that the state is evolved for some small time t₀ after applying the current insertion: this allows one to avoid any ultraviolet divergence due to contact terms of the two currents. To reflect this change, the kernel f(ω) is multiplied by e^{2ωt₀} to cancel the time evolution, and the c̃*_j denote the Chebyshev coefficients of this modified kernel. The right-hand side of eq. (2.35) can be reconstructed from the matrix elements (2.19) using M_{µν}(t + 2t₀)/M_{µν}(2t₀). An advantage of this construction is that the matrix element appearing on the right-hand side of eq. (2.35), ⟨ψ_µ|T*_j(e^{−aĤ})|ψ_ν⟩/⟨ψ_µ|ψ_ν⟩, is strictly bounded between −1 and +1, by construction of the Chebyshev polynomial. This corresponds to the condition that the eigenvalues of e^{−aĤ} lie between 0 and 1, or equivalently that the spectrum of Ĥ is non-negative. The convergence of the series appearing in eq. (2.35) is then dictated by that of the coefficients c̃*_j. Since these can be easily calculated for arbitrarily large j, the error due to the truncation in (2.35) can be rigorously estimated.

² The parameter E₀ can be adjusted by exploiting the fact that ρ(ω) has support only for ω > ω_min, so that ρ(ω) = 0 for ω < ω_min. The same holds for ρ_λ[f], so that the functional form of f_λ(ω) can be left unconstrained for ω < ω_min. Any E₀ < ω_min is therefore a viable choice in determining the coefficients g_τ^λ, and E₀ can be chosen to improve the numerical stability of the minimization procedure.

³ More precisely, the Chebyshev approximation of a generic function f(y) for y ∈ [−1, 1] is in practice equivalent, although not identical, to the optimal polynomial approximation f(g; y) = Σ_{τ=0}^{N} g_τ y^τ obtained by minimizing the L∞-norm ‖f(y) − f(g; y)‖_∞ = max_{y∈[−1,1]} |f(y) − f(g; y)| with respect to the coefficients g. In fact, the Chebyshev approximation is obtained by minimizing the weighted squared L²-norm ∫_{−1}^{1} dy w(y) |f(y) − f(g; y)|² with w(y) = 1/√(1 − y²). By setting instead w(y) = 1, as done in the case of the method of ref. [13], one gets the Legendre polynomial approximation.
The constraint |⟨ψ_µ|T*_j(e^{−aĤ})|ψ_ν⟩/⟨ψ_µ|ψ_ν⟩| ≤ 1 is not automatically satisfied in the presence of statistical errors. Since the Chebyshev polynomial T*_j(x) is a sign-alternating series of growing powers of x with (exponentially) large coefficients, the constraint is satisfied only after huge cancellations at large j. Therefore, even a small statistical error in the lattice correlator can easily violate it. In the numerical analysis, one should therefore impose the constraint when the Chebyshev matrix elements are determined by a fit; see ref. [21] for details. The higher-order terms are then masked by the statistical uncertainties and become essentially undetermined within ±1, so that they only contribute to the truncation error.
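The mechanics of the Chebyshev construction can be sketched numerically. The snippet below (illustrative target kernel and parameters, not those of the analysis) computes shifted-Chebyshev coefficients of a smooth smeared step in the variable x = e^{−aω} via the standard cosine sums, and verifies both the quality of the truncated series and the fact that each T_j is bounded by 1 on the domain:

```python
import numpy as np

# Shifted Chebyshev expansion on x in [0, 1]: f(x) ~ sum_j c[j] T_j(2x - 1)
def cheb_coeffs(func, N):
    theta = (np.arange(N + 1) + 0.5) * np.pi / (N + 1)   # Chebyshev angles
    x = 0.5 * (np.cos(theta) + 1.0)                      # nodes mapped to [0, 1]
    c = np.array([2.0 / (N + 1) * np.sum(func(x) * np.cos(j * theta))
                  for j in range(N + 1)])
    c[0] *= 0.5
    return c

# Smeared step in the variable x = e^{-a omega} (illustrative parameters)
sigma, x0 = 0.1, 0.6
f = lambda x: 1.0 / (1.0 + np.exp(-(x - x0) / sigma))

N = 40
c = cheb_coeffs(f, N)

# Truncated series; each T_j is bounded by 1 on [0, 1], so the truncation
# error is controlled by the decay of the coefficients |c_j|
xs = np.linspace(0.0, 1.0, 500)
f_N = np.polynomial.chebyshev.chebval(2 * xs - 1.0, c)
err = np.max(np.abs(f_N - f(xs)))
print(err)
```

Because the smeared kernel is analytic, the coefficients decay geometrically and the truncation error can be bounded by the tail of the coefficient series, which is the property exploited in the text.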
In both methods, a good approximation is obtained only when the kernel function is sufficiently smooth. If this is not the case, the truncation error becomes significant, e.g. due to unsuppressed higher-order coefficients c*_j in the case of the Chebyshev approximation. Unfortunately, the kernel functions K^{(l)}(ω, q) or Θ^{(l)}(ω_max − ω) are not smooth, because they contain the Heaviside function θ(ω_max − ω). We therefore introduce smeared versions of the θ-function and then take the limit σ → 0 to recover the unsmeared kernel. This has been done by considering three different smeared θ-functions and by extrapolating the numerical data to the σ → 0 limit. In the following we shall refer to θ^s_σ(x) as the "sigmoid function", to θ^{s1}_σ(x) as the "modified sigmoid function" and to θ^e_σ(x) as the "error function". Any choice of the parameters r_s1 and r_e appearing in the previous formulae corresponds to a legitimate definition of the smearing kernels, which approach the same σ → 0 limit, i.e. the θ-function. By adjusting the values of these parameters one can change the rate of convergence to the θ-function and balance between statistical and systematic errors. In the following we set r_s1 = 2.2 and r_e = 2.0. This (empirical) choice gives statistical errors of the same order of magnitude for the three kernels at fixed σ and similar (although not identical) shapes for θ^s_σ(x) and θ^e_σ(x), while θ^{s1}_σ(x) results in a smoother approximation of the θ-function. A combined analysis of smearing kernels that have rather different shapes at fixed σ is in fact helpful in order to quantify the systematics associated with the σ → 0 extrapolations (see also ref. [15]).
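Two of the smearing choices can be sketched as follows. The definitions below are representative forms of a sigmoid and an error-function smearing with width σ (the precise conventions of the analysis, and the modified sigmoid, follow the text and its parameters r_s1 and r_e); the sketch just checks that both reduce to the sharp step as σ → 0:

```python
import numpy as np
from math import erf

r_e = 2.0  # width rescaling for the error-function kernel, as in the text

def theta_sigmoid(x, sigma):
    # Representative "sigmoid" smearing of the Heaviside function
    return 1.0 / (1.0 + np.exp(-x / sigma))

def theta_erf(x, sigma):
    # Representative "error function" smearing
    return 0.5 * (1.0 + erf(x / (r_e * sigma)))

# Both tend to theta(x) as sigma -> 0: value 1/2 at x = 0, and rapid
# saturation to 0 or 1 away from the step
for s in (0.2, 0.05, 0.01):
    print(s, theta_sigmoid(0.3, s), theta_erf(0.3, s))
```

At fixed σ the two kernels have similar but not identical shapes, which is precisely what makes their combined σ → 0 extrapolation a useful cross-check.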

Decomposition of the total rate
The expression of the total rate in eq. (2.5) can also be used to compute the differential decay rate in q², i.e. dΓ/dq² = (G_F² |V_cb|²/24π³) √(q²) X̄(q²). This can be further decomposed into its contributions from parallel (∥) and perpendicular (⊥) components, where the ⊥ components are defined as those involving the polarization vectors ε*_{(α)} perpendicular to q, while the ∥ ones are the rest. In addition, we also separate the contributions from vector (V) and axial-vector (A) current insertions. Since two currents are inserted, we have VV, AA, as well as VA and AV contributions. Among them, VA and AV do not contribute to the differential rate after integrating over E_ℓ, and thus not to the total decay rate. We therefore analyze four components: VV_∥, VV_⊥, AA_∥, AA_⊥. For the lepton energy moments, the VA and AV insertions can also appear (see below).

Moments
It is also interesting to consider the moments of various kinematical quantities. In particular, two types of moments have been studied experimentally: the hadronic mass moments ⟨(M²_X)^n⟩ and the lepton energy moments ⟨E^n_ℓ⟩, defined in eqs. (2.37) and (2.38) as ratios of the correspondingly weighted integrals of the differential rate to the total rate. The strategy to compute these moments on the lattice is the same as in the method described above. For the hadronic mass moments defined in eq. (2.37), the numerator contains extra powers of M²_X = ω² − q², with which the ω-dependence of X^{(0)}, X^{(1)}, X^{(2)} is modified; otherwise, the basic procedure remains the same. Besides these quantities, which require an integration over the whole q² range, we will also consider moments at fixed values of q², i.e. differential moments, eq. (2.40), and the second central moment, or variance, of the lepton energy distribution, eq. (2.41). In the case of leptonic moments, the E_ℓ integral is modified with respect to (2.5). The integrand in the denominator is the same as in (2.10); if we set the momentum direction n of q along the k-th axis, the two perpendicular basis vectors can be chosen along the i-th and j-th axes, and X(ω, q²) can be re-expressed in terms of the corresponding components of the hadronic tensor (eq. (2.42)), where repeated indices are not summed. The integrand in the numerators of eq. (2.38) and eq. (2.40) depends on the exponent n. For n = 1 (eq. (2.43)), the last term involves W^{ij} and corresponds to the insertion of VA or AV; the other terms are the same as in the n = 0 case, up to a factor q⁰/2. The next orders are more involved: again, the term with W^{ij} survives for VA and AV insertions, while the others come from VV or AA. The contributions in eq. (2.42) can be rearranged in such a way that the ω-integral contributing to the numerator of eq. (2.40) takes a form analogous to the corresponding expressions for the differential decay rate, eq. (2.17) and eq. (2.15), with combinations Z^{(l)}_{n=1}(ω, q²) of the hadronic quantities, but including the sum of four terms, the one corresponding to l = 3 involving the kernel Θ^{(3)}(ω_max − ω).
In this basis the second leptonic moment is given by eq. (2.48), and the first hadronic moment by the analogous expression, where the Z^{(l)}_{n=1}(ω, q²) are again given by linear combinations of the hadronic quantities introduced above.

Numerical implementation in lattice QCD
In this section, we discuss in detail two different implementations of the method in lattice QCD calculations. First, in subsection 3.1 we present an implementation based on configurations generated within the JLQCD collaboration. Then, in subsection 3.2 we discuss an analogous calculation based on an ensemble generated by the ETM collaboration (ETMC).
In both cases, we specify the technical details of the lattice calculations, and discuss the different types of uncertainties affecting the results. Finally, in subsection 3.3, we discuss a few technical aspects related to the extrapolation to the σ → 0 limit.

Lattice implementation with JLQCD configurations
One dataset used to demonstrate the lattice computation of the inclusive semileptonic decay rate is based on an ensemble generated by the JLQCD collaboration; see the supplementary material of [22] for details of the gauge configurations. It employs Möbius domain-wall fermions for both valence and sea quarks. In the sea, 2+1 flavours of light and strange quarks are included. The light quark mass corresponds to a pion mass of around 300 MeV; the strange quark mass is slightly heavier than its physical value. The gauge action is the tree-level O(a²)-improved Symanzik action at β = 4.35. The corresponding lattice spacing is a ≈ 0.055 fm, i.e. an inverse lattice spacing 1/a = 3.610(9) GeV. The lattice volume is 48³ × 96, so that the spatial volume is about L³ = (2.6 fm)³. This ensemble corresponds to "M-ud3-sa" of [22] (see also [23]). The valence quarks are also described by Möbius domain-wall fermions. The charm quark mass is tuned to its physical value (see ref. [23] for details), while the bottom quark mass is set at 1.25⁴ ≈ 2.44 times the charm quark mass. The spectator quark is a strange quark, so the process corresponds to the inclusive semileptonic decay of a B_s meson, albeit with a light B_s-meson mass of 3.45 GeV.
The measurement is carried out on 100 gauge configurations and is replicated four times on each configuration with shifted positions of the initial source. The B_s meson is created by an interpolating pseudoscalar operator, which is spatially smeared by a gauge-invariant operator (1 − (α/N)∆)^N, with a discretized Laplacian ∆ and parameters α = 20 and N = 200. The source points are spread over the source time slice t_src = 0 with Z₂ noise to improve statistics. The initial B_s meson is thus projected onto zero spatial momentum. The B_s meson at the other end is annihilated by another pseudoscalar operator of the same type, placed at the time slice t_snk = 42 using a sequential source from the spectator strange-quark propagator. The bottom quark propagates from there to a time slice t₂, fixed at t₂ = 26, where the first b → c current is inserted with momentum q. The charm quark propagator then connects the time slice t₂ to t₁, where the other b → c current is contracted with momentum insertion −q. The charm quark propagator is computed repeatedly for each choice of the current operator and momentum insertion at t₁. We fix the time separation between t_src and t_snk under the assumption that the ground-state B_s meson dominates the signal between t_src and t₁ and between t₂ and t_snk. This separation is at least 16 in lattice units, which corresponds to 0.9 fm. The saturation is confirmed in [9].
The matrix element (2.19) is then constructed as in eq. (2.20). For the analysis of this ensemble we applied the Chebyshev polynomial approximation, following [16]. The polynomial order is set to N = 15; the results are unchanged within the statistical error for other choices beyond N = 12. The σ → 0 limit is taken for each data point assuming a polynomial in σ, with data points at σa = 0.02, 0.05, 0.10 and 0.20. In all cases, the shift due to the extrapolation is small compared to the statistical error at finite values of σ.
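The σ → 0 extrapolation itself is elementary. With toy numbers (an exactly quadratic σ-dependence, not the actual lattice data), a polynomial fit in σ at the values of σa quoted above recovers the σ = 0 limit:

```python
import numpy as np

sigmas = np.array([0.02, 0.05, 0.10, 0.20])      # values of sigma*a used above

# Toy observable with a quadratic sigma-dependence (illustrative numbers):
# X(sigma) = X0 + c1*sigma + c2*sigma^2, with X0 = 1.0
X = 1.0 + 0.3 * sigmas + 0.8 * sigmas**2

coeffs = np.polyfit(sigmas, X, 2)                # quadratic fit in sigma
X0 = np.polyval(coeffs, 0.0)                     # extrapolated sigma -> 0 value
print(X0)
```

In the real analysis the same fit is performed on noisy data, and the difference between the extrapolated value and the finite-σ points is what is compared against the statistical error.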
The results are shown in Figure 1. The left panel shows X̄ as a function of q², while the integrand producing the numerator of ⟨E_ℓ⟩ is shown in the right panel. The lattice data are obtained at momentum transfers q = (0,0,0), (0,0,1), (0,1,1), (1,1,1), (0,0,2) in units of 2π/(aL). The lattice data with different momentum insertions q are analyzed together to account for the statistical correlations among them. We then fit X̄ as a polynomial in q², including terms up to (q²)².
We observe that the inclusive results for each channel are consistent with the expected ground-state contributions. This means that the excited-state contributions are small, consistent with our expectation from the B → D** ℓν̄ form factors based on heavy-quark effective theory (HQET) [24]. This is also phenomenologically plausible because the mass of the initial bottom quark is smaller than its physical value: heavy-quark symmetry predicts that the wave-function overlap is 1 at zero recoil when the initial and final masses are degenerate.
We also calculate the differential moments. The numerators for the hadronic mass moments ⟨M²_X⟩ and ⟨(M²_X)²⟩ are shown in figure 2, while that for ⟨E_ℓ⟩ is in figure 1 (right panel). The corresponding differential moments, evaluated for each channel at each individual momentum transfer q², are shown in figure 3 and in figure 4.

Lattice implementation with ETMC configurations
The ETMC gauge ensemble used in this work is the one named B55.32, generated by ETMC together with 14 other ensembles with N_f = 2+1+1 dynamical quarks in refs. [25,26] for determining the average up/down, strange and charm quark masses. The Iwasaki action [27] and the Wilson twisted-mass action [28][29][30] were used for gluons and sea quarks, respectively. The bare quark masses can be matched to their physical values using the mass renormalization constants determined in ref. [31]. In order to avoid the mixing of K- and D-meson states in the correlation functions, a non-unitary setup [32] is used in the valence sector: the strange and the charm valence quarks are regularised as Osterwalder-Seiler fermions [33], while the up and down valence quarks have the same action as the sea. Working at maximal twist, such a setup guarantees automatic O(a)-improvement [30,32].
The ensemble B55.32 has a lattice volume L³ × T = 32³ × 64 a⁴ with a lattice spacing a = 0.0815(30) fm and a bare light-quark mass aµ = 0.0055, corresponding to a simulated pion mass m_π = 375(13) MeV [31] with m_π L ≈ 5.0. The number of analyzed gauge configurations, separated by 20 trajectories, is 150. We have carried out our simulations using the values aµ_s = 0.021 and aµ_c = 0.25 for the bare valence strange and charm quark masses, which correspond to renormalised strange and charm quark masses very close to their physical values.
We have calculated the two-point function C(t), defined in eq. (2.21), using the interpolating operator b̄(x)γ_5 s(x) with a simulated b-quark mass equal to twice the physical charm mass, i.e. m_b(MS, 2 GeV) ≃ 2.4 GeV, and a physical strange quark. We set opposite Wilson parameters for the two valence quarks in order to guarantee that cutoff effects on the pseudoscalar mass are O(a^2 µ_f) [30,34,35]. To improve the statistical precision we have made use of the "one-end trick" stochastic method [36,37] and employed 10 spatial stochastic sources at a randomly chosen time-slice per gauge configuration. Moreover, in order to suppress contributions of the excited states in the B_s-meson correlation function, we have used Gaussian-smeared interpolating quark fields [38] both at the source and at the sink. For the values of the smearing parameters we set k_G = 4 and N_G = 30. In addition, we apply APE smearing to the gauge links [39] in the interpolating fields with parameters α_APE = 0.5 and N_APE = 20.
Smearing leads to improved projection onto the lowest-energy eigenstate at smaller Euclidean time separations. As shown by the effective mass aM_eff(t) ≡ log (C(t)/C(t + a)) in fig. 5, the dominance of the ground-state signal starts around t/a ≃ 13 for both the D_s and B_s mesons. By averaging over the plateau regions shown in fig. 5, the ground-state masses are found to be m_Ds = 2.05(8) GeV and m_Bs = 3.08(11) GeV.
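The effective-mass analysis described above can be sketched numerically. The following is a minimal illustration (not the collaborations' analysis code): the correlator, the masses, and the plateau window are synthetic, and the error estimate is a naive uncorrelated average rather than a proper jackknife/bootstrap.

```python
import numpy as np

def effective_mass(C):
    """Effective mass aM_eff(t) = log(C(t)/C(t+a)) from a Euclidean two-point function."""
    C = np.asarray(C)
    return np.log(C[:-1] / C[1:])

def plateau_average(meff, t_min, t_max):
    """Naive plateau estimate: uncorrelated average over the window [t_min, t_max]."""
    window = meff[t_min:t_max + 1]
    return window.mean(), window.std(ddof=1) / np.sqrt(len(window))

# synthetic correlator: ground state aM = 0.85 plus one excited state at aM = 1.6
t = np.arange(32)
C = 1.0 * np.exp(-0.85 * t) + 0.3 * np.exp(-1.6 * t)

meff = effective_mass(C)
M, dM = plateau_average(meff, 13, 25)  # plateau sets in once excited states die off
```

At small t the effective mass lies above the ground-state value because of excited-state contamination; smearing pushes the onset of the plateau to earlier times.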
We have calculated the four-point function C_µν(t_snk, t_2, t_1, t_src; q), given by eq. (2.18), as a function of t_1, the time at which the first weak current is inserted with momentum q, for fixed values of t_2, where the second weak current is contracted with momentum insertion −q, fixing t_src = 0 and t_snk = T/2 = 32a. The momentum q is inserted along one spatial direction, namely q = (0, 0, q), and we have considered eleven values of q ranging from q = 0 up to q = q_max ≃ 0.9 GeV. On the lattice these values are injected through the use of twisted boundary conditions (BC's) [40][41][42] in the spatial directions and anti-periodic BC's in time. The dynamical sea quarks, on the contrary, are simulated with periodic BC's in the spatial directions and anti-periodic ones in time. The twisted BC's for the valence quark fields lift the severe limitations, arising from the use of periodic BC's, on the accessible kinematical regions of momentum-dependent quantities. Furthermore we remark that, as shown in refs. [43,44], for physical quantities which do not involve final-state interactions (like, e.g., meson masses, decay constants and form factors), the use of different BC's for valence and sea quarks produces only finite-size effects that are exponentially small.
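The kinematical flexibility provided by twisted boundary conditions can be illustrated with a short sketch. The numbers below use the lattice parameters quoted above (L/a = 32, a ≃ 0.0815 fm); the function name and the specific target momenta are illustrative assumptions, not the paper's actual twist angles.

```python
import numpy as np

# With a twist angle theta, the quantized lattice momenta are shifted to
# p = (2*pi*n + theta)/L, so arbitrary (non-integer) momenta become accessible.
L = 32                                  # spatial extent in lattice units
a_inv_GeV = 0.1973 / 0.0815             # lattice cutoff a^{-1} in GeV (hbar*c = 0.1973 GeV*fm)

def momentum_GeV(theta, n=0):
    """Spatial momentum (in GeV) for twist angle theta and Fourier mode n."""
    return (2 * np.pi * n + theta) / L * a_inv_GeV

# periodic BCs (theta = 0) only give multiples of 2*pi/L ~ 0.47 GeV on this lattice,
# whereas twisting allows, e.g., eleven evenly spaced values from 0 up to 0.9 GeV
targets = np.linspace(0.0, 0.9, 11)
thetas = targets * L / a_inv_GeV        # twist angles realising the target momenta
```

With periodic boundary conditions alone, only the modes 0, ≈0.47 GeV, ≈0.95 GeV, … would be accessible below 1 GeV.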
For the b → c weak current we use the local vector and axial-vector quark currents, b̄(x)γ_µ c(x) and b̄(x)γ_µ γ_5 c(x). The value of the Wilson r-parameter for the charm quark is chosen to be opposite to that of the b quark, i.e. r_c = −r_b; therefore, in our maximally twisted setup, the vector and axial-vector currents renormalise respectively with the axial and vector renormalization constants, Z_A and Z_V, determined in ref. [31].
We extract the matrix elements M_µν(t_2 − t_1; q) using eq. (2.20). In order to calculate X̄(q^2), as defined in eq. (2.17), we apply the smearing kernel Θ^(l)(ω_max − ω) to the quantities Z^(l)(ω, q^2). These in turn are defined in terms of the quantities Y^(a)(ω, q) in eq. (2.15). To this end we start from the linear combinations of the correlators M_µν(t; q) with the kinematical coefficients of eqs. (2.9); we call these objects Y^(a)(t; q^2). To show the quality of the numerical data, in fig. 6 we plot the correlators Y^(a)(t; q^2) corresponding to |q| ≃ 0.5 GeV. Notice that the correlators Z^(l)(t; q^2) are linear combinations of the Y^(a)(t; q^2)'s, see eq. (2.15). Similar results are obtained for the other momenta considered in this work.
The central values for all the physical quantities extracted from the Y^(a)(t, q) correlators have been obtained by setting t_2 = 22a in eq. (2.20) and by using the data up to t = 18a, which corresponds to t_1 − t_src = 4a. To check the approach to the t_src → −∞ and t_snk → ∞ limits we have repeated the analysis by setting t_2 = {18a, 20a, 22a, 26a, 28a} and by varying the maximum value of t used to reconstruct the smearing kernels. Figure 7 shows the comparison of the correlator Y^(1)(t, q) at |q| ≃ 0.5 GeV for different values of t = t_2 − t_1 and t_2. In the following analysis we chose the value (t_snk − t_2) = 10a, corresponding to t_2 = 22a. Similar results are obtained for the other correlators (Y^(2), Y^(3), Y^(4) and Y^(5)), and, in all cases, we observe that the onset of the t_snk → ∞ limit is reached within the uncertainties already for t_snk − t_2 = 4a.
We now turn to the discussion of the systematics associated with the approximation of the kernels of eq. (2.36) by using the method of ref. [13]. This is an important issue because, on the one hand, the reconstruction of a given kernel can never be exact with a finite number of time-slices and in the presence of errors; on the other hand, one can (and must) quantify the systematic error associated with an approximate reconstruction.
In order to illustrate this point we consider the quantity Z^(0)_σ(q^2) (see eq. (2.17)) for three smooth approximations of the θ-function given in eq. (2.36). The kernels are approximated as described in section 2, see in particular eq. (2.31), with τ_max = 18. The quantity Z^(0)_σ(q^2) is then obtained by applying the coefficients g^λ_τ that represent the approximated kernel at a fixed value of λ to the correlator Z^(0)(t; q^2). Figure 8 shows the comparison of the reconstructed kernels with the target ones for |q| ≃ 0.7 GeV and σ = 0.12 m_Bs at the values λ = λ* determined independently for each kernel. The values of λ* are marked with red points in fig. 9, where we show the dependence of Z^(0)_σ(q^2) on λ. By implementing this strategy, proposed in ref. [15], we have checked that the estimated errors on the different quantities that enter our determinations of the physical observables discussed below properly take into account the systematics associated with the kernel approximation.
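The core of the kernel-approximation step can be illustrated with a drastically simplified sketch: a smooth kernel is represented as a linear combination of the Euclidean time dependences e^(−ωτ), with coefficients fixed by a fit. This is only a caricature (a plain least-squares fit with no error functional and no λ regularisation, unlike the method of ref. [13]), and all numerical values are illustrative.

```python
import numpy as np

# Approximate a smeared step function Theta(omega_max - omega) by
# sum_{tau=1}^{tau_max} g_tau * exp(-omega*tau) on a discretized omega grid.
tau_max = 18
sigma, omega_max = 0.12, 1.0                     # illustrative values in lattice units

omega = np.linspace(0.05, 4.0, 400)              # energy grid covering the spectrum
target = 1.0 / (1.0 + np.exp((omega - omega_max) / sigma))   # sigmoid kernel

basis = np.exp(-np.outer(omega, np.arange(1, tau_max + 1)))  # columns e^{-omega*tau}
g, *_ = np.linalg.lstsq(basis, target, rcond=None)           # least-squares coefficients

reconstructed = basis @ g
max_dev = np.max(np.abs(reconstructed - target))             # quality of reconstruction
```

The same coefficients g_τ, applied to the measured correlator instead of the basis functions, yield the smeared quantity; the deviation between target and reconstructed kernel is what the systematic-error analysis must quantify.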
In fig. 10 we show our results for the total decay rate, with the different points corresponding to different input parameters used in the analysis, as described in the figure's caption. The plot shows clearly that all results are compatible with each other. In order to take into account all the results shown in the figure, we use eq. (28) of ref. [31] to get an estimate of the central value and its standard deviation, corresponding to the filled red dots in the plot, and we quote that value as our final result for the total decay rate. This procedure is repeated for all other observables considered in this work. The variations considered include: C) a threshold changed to A_tr = 1 × 10^-2; D) a threshold changed to A_tr = 5 × 10^-3; E) all parameters equal to default, with the final result given by summing all the single contributions X^(i) together and then extrapolating the sum to σ = 0; F) τ_max changed to τ_max = 15; G) τ_max changed to τ_max = 16; H) τ_max changed to τ_max = 17; I) same as default, with the analysis performed using the bootstrap method; J) final results obtained considering all previous results listed here, with central value and standard deviation calculated using the average procedure given by eq. (28) of ref. [31]. It is important to note that the analysis of all the cases listed above is performed taking the result corresponding to λ = λ*, as discussed in subsection 3.2, the only exception being the cases in which we change the A_tr parameter: in these two cases we take the results corresponding to values of A[g_λ]/A[0] smaller than A_tr.

Extrapolation to σ = 0
The ETMC data are produced at several values of the smearing parameter σ and, for each of the target kernels Θ^(l)(x), with three different smeared versions of the θ-function in eq. (2.36). These are used in a combined σ → 0 extrapolation for each contribution to the differential decay rate and to the leptonic and hadronic moments.
Before presenting the results of the σ → 0 extrapolation an important remark is needed. As discussed in section 2, the limits of zero smearing radius and of infinite volume do not commute: because of the quantized energy spectrum in a finite volume, the σ → 0 extrapolation must be performed only after the infinite-volume limit. Under the reasonable assumption that smeared QCD spectral densities are affected by exponentially suppressed finite-volume effects, and given the exploratory nature of the present work, we shall assume below that finite-volume effects are negligible with respect to our statistical uncertainties. This assumption can only be verified with simulations on larger volumes, a task that we leave for future work on the subject. With this caveat, the σ → 0 extrapolation discussed below has to be considered as a feasibility study which, even though we work at unphysical meson masses and at fixed cutoff, we consider interesting and promising.
In fig. 11 we show the σ → 0 extrapolations of the three contributions Z^(l)_σ(q^2) to the differential decay rate for |q| ≃ 0.5 GeV (plots on the left) and |q| ≃ 0.7 GeV (plots on the right). In all cases studied in this work we have obtained results at 10 different values of σ. In fig. 13 we show the σ → 0 extrapolations of the four different terms that enter the calculation of the leptonic moment L_1(q^2).
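The extrapolation in the smearing radius can be sketched as a simple polynomial fit. The quadratic ansatz and all numbers below are illustrative assumptions, not the functional form actually adopted in the analysis.

```python
import numpy as np

# sigma -> 0 extrapolation: given results X(sigma) at several smearing radii,
# fit a low-order polynomial in sigma and quote the value at sigma = 0.
sigmas = np.linspace(0.02, 0.20, 10)             # ten smearing radii, as in the text
X = 1.30 + 0.15 * sigmas - 0.40 * sigmas**2      # synthetic smeared results

coeffs = np.polyfit(sigmas, X, deg=2)            # quadratic fit in sigma
X_at_zero = np.polyval(coeffs, 0.0)              # extrapolated sigma = 0 value
```

In practice the fit would be done on correlated data with errors, and the spread over different fit ansätze enters the systematic uncertainty.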

Operator-product expansion and comparison with lattice results
As inclusive semileptonic B decays are described by an OPE, observables which are sufficiently inclusive admit a double expansion in α_s and in inverse powers of m_b [10,11,19,20,45], or more precisely of the energy release, which is of the order of m_b − m_c. Schematically, for an observable M we have

M = M_0(a_s) + M_π(a_s) µ_π^2/m_b^2 + M_G(a_s) µ_G^2/m_b^2 + M_D(a_s) ρ_D^3/m_b^3 + M_LS(a_s) ρ_LS^3/m_b^3 + ... ,   (4.1)

where a_s = α_s(µ)/π is the QCD coupling evaluated at a scale µ ∼ m_b and the ellipsis represents higher-order terms in a_s and in 1/m_b. The parameters µ_π^2, µ_G^2, ρ_D^3, ρ_LS^3 are expectation values of dimension-5 and dimension-6 local operators in the physical B meson; for instance

µ_G^2 = ⟨B| b̄_v (i/2) σ_µν G^µν b_v |B⟩ / (2m_B),

where b_v is the b field deprived of its high-frequency modes and G_µν is the gluon-field tensor. In the so-called kinetic scheme [46][47][48], a Wilsonian cutoff µ_k ∼ 1 GeV is introduced to factorise long- and short-distance contributions. Indeed, the OPE disentangles the physics associated with soft scales of order Λ_QCD (described by the above parameters) from that associated with hard scales ∼ m_b, which determine the Wilson coefficients M_i that admit an expansion in α_s. Quite importantly, the power corrections start at O(Λ_QCD^2/m_b^2) and are therefore comparatively suppressed. The kinetic scheme provides a short-distance, renormalon-free definition of m_b and of the OPE parameters by introducing the cutoff µ_k to factor out the infrared contributions from the perturbative calculation.
The smearing provided by the phase-space integration, discussed in section 2, is in general sufficient to guarantee the convergence of the OPE for the quantities introduced in eqs. (2.6) and (2.37)-(2.40), which can then be expressed in the form (4.1). The OPE calculation proceeds therefore as in refs. [10,11,49]. There are however two specific points related to the kinematics chosen in the lattice calculation that need to be mentioned. First, while the hard scale that governs the OPE is generally m b − m c , there are regions of the phase space, e.g. at small recoil |q| ∼ 0, where it is rather m c , possibly implying a slower convergence of the expansion. Second, near the maximum value of q 2 the smearing interval in ω closes up and one cannot expect the OPE to provide reliable results.

Details of the OPE calculation and related uncertainties
From a technical point of view, the OPE provides a double expansion like the one in eq. (4.1) for the hadronic tensor W_µν defined in eq. (2.4), which can be used to compute the total rate, the moments, and any sufficiently inclusive quantity. The coefficients of the expansion involve the Dirac delta δ(r^2 − m_c^2) and its derivatives, which upon integration over the quark (partonic) phase space lead to results valid for sufficiently inclusive observables. It is customary to use the decomposition of W_µν into Lorentz-invariant form factors as in eq. (2.7) and to identify the four-velocities of the B meson and of the b quark, v = p/m_B = p_b/m_b. In this section we will use eq. (2.7), replacing m_B with the b quark mass m_b and employing a hat for quantities that are normalised to m_b. In the case of massless leptons considered in this work, the form factors W_4,5 do not contribute to the decay amplitude.
The lowest order of the expansion for the relevant W_i and the 1/m_b^2 corrections can be found in refs. [10,11], while analytic expressions for the O(α_s) terms are given in refs. [50,51]. The O(1/m_b^3) corrections were first computed in ref. [52]. Higher power corrections have been investigated in ref. [53], but involve a large number of new and poorly known parameters. They appear to be sufficiently suppressed at the physical m_b [54]; we will not consider them, but they represent an important source of theoretical uncertainty in our low-m_b setup. The O(α_s/m_b^2) corrections to the W_i are also known [51,55], while for the total rate we also have O(α_s/m_b^3) corrections [56,57]. Numerical results for the O(α_s^2 β_0) contributions are also available [50], while the complete O(α_s^2) corrections are available only for the total rate and for a few moments [58][59][60][61]. Finally, the O(α_s^3) correction to the total rate has recently been computed in ref. [62].
While these corrections have generally been computed in the V − A case realised in the SM, the decomposition into V V, AA and AV = V A components is potentially useful in our case, and has been made manifest for the O(1/m_b^{2,3}) and O(α_s) corrections, see refs. [11,63,64]. In the calculation of the q^2 spectrum and of the differential moments we will therefore consider only power corrections up to and including O(1/m_b^3) and the O(α_s) perturbative corrections. However, in the calculation of the total width and of the total moments we will restrict ourselves to the SM case and will employ all the known corrections.
Following section 2, we take the three-momentum q to point along the k direction and the i and j directions to be perpendicular to it; the components of the hadronic tensor along these directions are expressed in terms of the invariant form factors W_i of eq. (2.7). In the OPE the decay occurs at the quark level: p_b = p + p_ℓ + p_ν, where p_b and p are the momenta of the initial b quark and of a final hadronic state made of a c quark and n ≥ 0 perturbative gluons. At the leading order in α_s and in 1/m_b, this is a free-quark decay into an on-shell c quark, which implies that the W_i are proportional to δ(p^2 − m_c^2) = δ(û)/m_b^2, where û = (p^2 − m_c^2)/m_b^2. This δ function can be rewritten in terms of the energy of the final c quark, with ρ = m_c^2/m_b^2 and χ̂ the parton-level energy of the final hadronic state in units of m_b, which is related to the total hadronic energy ω by ω = m_b χ̂ + Λ̄, with Λ̄ = M_B − m_b; the invariant hadronic mass M_X^2 is similarly related to the partonic variables. Only the first term of eq. (4.3) contributes to the physical process of interest and can be readily integrated over χ̂. At O(1/m_b^{2,3}) one has to deal with δ′(û), δ′′(û) and δ′′′(û), which upon integration subject to kinematic constraints lead to new singularities. A typical case is provided by the interplay between these derivatives of the δ function and the requirement that q^2 ≥ 0: the singularity at the partonic endpoint of the q^2 spectrum, q̂^2_max = (1 − ρ)^2/4, appears because one reaches the maximum energy exactly on the mass-shell of the charm quark.
We apply exactly the same setup to compare with both JLQCD and ETMC data, adjusting only the heavy-quark masses to the two cases. The unphysically light b quark mass and the OPE parameters are expressed in the kinetic scheme with µ_k = 1 GeV, while the c quark mass is expressed in the MS scheme at 2 GeV. In the case of the JLQCD data we employ m_b(1 GeV) = 2.70(4) GeV, obtained from matching the observed m_Bs with the results of [65,66], together with the corresponding value of m_c reported in table 1. For the OPE parameters that appear in eq. (4.1) we start from the results of the most recent fit to the semileptonic moments [68], which refer to the physical B meson, with a much heavier b quark and without a strange spectator. The difference induced in these parameters by the strange spectator at the physical point has been investigated in [66,69,70], where it was found that spectroscopic and lattice data approximately suggest a 20% upward shift in µ_π^2 and µ_G^2, while heavy-quark sum rules hint at a similar or even stronger SU(3) flavour-symmetry breaking in ρ_D^3. The dependence on the mass of the heavy quark, on the other hand, can be analysed by observing that µ_π^2 and µ_G^2 satisfy a heavy-quark expansion whose 1/m_b corrections involve ρ_ππ^3, ρ_πG^3, ρ_S^3, ρ_A^3, expectation values of non-local operators of which little is known, see ref. [65]. If they were of the same order of magnitude as 0.1-0.2 GeV^3, they could shift µ_π^2 and µ_G^2 by 0.02-0.1 GeV^2 in going from the physical value of m_b to m_b ∼ 2.5 GeV, which amounts to a 5-25% shift. We show the inputs of our calculation in table 1. While the heavy-quark masses are slightly different between the two setups, we adopt the same expectation values in both cases. Their central values take into account the shift related to the strange spectator, while the uncertainties follow from the uncertainty of the fit of ref. [68], the SU(3) symmetry breaking, and the lower b mass.

Table 1. Inputs for our OPE calculation. All parameters are in GeV at the appropriate power and all, except m_c, in the kinetic scheme at µ = 1 GeV. The heavy-quark masses for the ETMC setup are 100% correlated. As a remnant of the semileptonic fit, we include a 50% correlation between µ_π^2 and ρ_D^3. [Only fragments of the table entries survive, among them 2.39 ± 0.08 (the ETMC m_b), m_c(2 GeV) = 1.19 ± 0.04, and 0.301 ± 0.006.]
Besides the parametric uncertainty of the inputs, our results are subject to an uncertainty due to the truncation of the expansion in eq. (4.1) and to possible violations of quark-hadron duality. We estimate the former by varying the OPE parameters, the heavy-quark masses, and α_s in an uncorrelated way and adding the relative uncertainties in quadrature. In particular, we shift m_{b,c} by 6 MeV, µ_{π,G}^2 by 15%, and ρ_{D,LS}^3 by 25%. These variations should mimic the effect of higher-power corrections. Since in the case of the q^2 spectrum and differential moments we restrict ourselves to O(α_s) corrections, we include the related uncertainty in the same way, shifting α_s by 0.15, which corresponds to a 50% uncertainty. In the case of the total width and total moments, higher-order perturbative corrections are known and the perturbative uncertainty can be reduced, as discussed below.
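The quadrature combination described above can be sketched as follows. The observable and its parameter dependence are entirely made up for illustration; only the shift sizes (6 MeV on the masses, 15% on µ², 25% on ρ³, 0.15 on α_s) follow the text.

```python
import numpy as np

def observable(params):
    """Hypothetical OPE-like combination; NOT the actual expressions of the paper."""
    mb, mupi2, rhoD3, alphas = params["mb"], params["mupi2"], params["rhoD3"], params["alphas"]
    return 1.0 - 0.5 * mupi2 / mb**2 - 2.0 * rhoD3 / mb**3 + 0.3 * alphas

central = {"mb": 2.39, "mupi2": 0.45, "rhoD3": 0.17, "alphas": 0.30}
shifts = {"mb": 0.006, "mupi2": 0.15 * 0.45, "rhoD3": 0.25 * 0.17, "alphas": 0.15}

# shift each parameter in turn, recompute, and add the deviations in quadrature
M0 = observable(central)
deviations = []
for name, dp in shifts.items():
    shifted = dict(central)
    shifted[name] += dp
    deviations.append(observable(shifted) - M0)

total_error = np.sqrt(np.sum(np.square(deviations)))
```

With these toy numbers the α_s variation dominates the budget, mirroring the situation for the quantities computed only to O(α_s).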

q 2 spectrum and differential moments
We start our comparison of lattice and OPE results with the q 2 spectrum and the differential moments introduced in eq. (2.39) and in eq. (2.40). Only the O(α s ) perturbative corrections are included in this case. Figure 14 shows the q 2 spectrum in the SM, namely with a V − A current. Despite the large uncertainty of the OPE prediction, about 30% in the JLQCD case and 50% in the ETMC case, the overall agreement is good. The OPE uncertainty is dominated by the power corrections. We also stress that close to the partonic endpoint, corresponding to 1.27 GeV 2 and 0.82 GeV 2 in the two cases, we do not expect the OPE calculation to be reliable, as discussed above. The corresponding hadronic endpoints are 1.35 GeV 2 and 0.75 GeV 2 , respectively.
The uncertainties affecting both calculations can be greatly reduced by considering the differential moments. In particular, the OPE uncertainty becomes smaller because of the cancellations between power corrections to the numerator and to the denominator. To expose the cancellations we expand the ratios in powers of α s and 1/m b . In figure 15 we show the first differential lepton energy moment, L 1 (q 2 ), in the SM, comparing the OPE with ETMC data. As expected, the relative uncertainty of both the OPE calculation and of the lattice data is much smaller than in the bottom panel of figure 14 and we observe good agreement at low and moderate q 2 . Figs. 16 and 17 show the q 2 spectrum in the individual channels. Comparing them with figure 14 we see that in the individual channels the agreement between OPE and lattice results is poorer than in their sum, especially at large q 2 . This is to be expected and (unless discretisation and/or finite-volume effects turn out to have a sizeable impact on the lattice results) is likely to be a manifestation of duality violations. Notice that the OPE central predictions for the AA ⊥ and V V ⊥ channels turn negative at large and moderate q 2 , respectively, and that for q 2 > 0.6 GeV 2 the spectrum is always negative within errors. This unphysical feature suggests that our error estimates are not adequate at large q 2 . The contribution to the V V ⊥ channel, moreover, is particularly small and very sensitive to large power corrections.
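The cancellation mechanism invoked above can be made explicit with a generic expansion of a moment ratio (a schematic sketch, not the paper's specific expressions):

```latex
% N_0, D_0: leading (partonic) terms; \delta_N, \delta_D: relative corrections
\frac{N}{D}
  = \frac{N_0\,\bigl(1+\delta_N\bigr)}{D_0\,\bigl(1+\delta_D\bigr)}
  = \frac{N_0}{D_0}\,\Bigl[\,1 + \delta_N - \delta_D + \mathcal{O}\bigl(\delta^2\bigr)\Bigr].
```

When the power and perturbative corrections to numerator and denominator are strongly correlated, the combination δ_N − δ_D is much smaller than either term separately, which is why the expanded ratio carries a reduced OPE uncertainty.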
Figs. 18 and 19 show L 1 (q 2 ) in the individual channels, for the JLQCD and ETMC cases. In general, we observe good agreement with the lattice data, especially at low q 2 . However, the expansion in powers of α s and 1/m b of the denominator is not justified when the lowest-order contribution to the denominator becomes particularly small or has a zero, like in the V V and V V ⊥ channels. In these cases we also show the unexpanded version of the ratio, whose uncertainty is much larger, but we stress that away from the singularities the expanded form is preferable, and this appears to be confirmed by better agreement with the lattice data. Figure 20 shows the second central moment computed at different values of q 2 in the ETMC case. We do not display the V V ⊥ channel, for which the OPE result would have a very large uncertainty. In the case of L 2c (q 2 ) the OPE does not reproduce the lattice results within uncertainties, except for very small q 2 . It is certainly possible that our method to estimate the OPE uncertainty fails here as a result of multiple cancellations between large contributions to L 2 and L 2 1 which are not necessarily replicated by higher-order contributions. On the other hand, it has not yet been possible to estimate discretisation and finite volume effects on our lattice results, and the additional systematics could affect this particular quantity in a relevant way. For this quantity we do not display the comparison with the JLQCD data, which agree with the OPE but have very large uncertainties.
We also looked at the moments of the hadronic invariant mass. In figure 21 we show the mean hadronic mass M_X^2 as a function of q^2 computed from JLQCD configurations in comparison with the OPE predictions. Again, we do not display the V V ⊥ channel because of the large OPE uncertainty. We observe excellent agreement except at large q^2, but the lattice uncertainty is larger here than in the case of the leptonic moments. Analogous plots for the ETMC calculation are shown in fig. 22. In figure 23 we also show M_X^4 as a function of q^2 with JLQCD data.

Total width and moments
We also perform a comparison between OPE predictions and lattice results for the total semileptonic width and for the global moments introduced in eq. (2.37) and in eq. (2.38). In this case the OPE results are going to be slightly more accurate, as we can take advantage of existing two- and even three-loop calculations [62]. We can also test the relevance of the singularity at q^2_max. The lattice results for the q^2 spectrum can be interpolated by polynomials or piecewise polynomials, leading to the results shown in table 2 and in table 3. As the q^2 spectrum is peaked near q^2_max, see figure 10, the total width is particularly sensitive to that region. In the JLQCD case the limited number of q^2 points makes the extrapolation to the highest q^2 values more uncertain, with clear implications for the estimate of the total width. On the other hand, it is difficult to estimate such an uncertainty, hence table 2 shows only the statistical uncertainty.
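The interpolate-and-integrate step can be sketched numerically. The spectrum below is an invented stand-in, and the piecewise-linear interpolation is a simplification of the (piecewise) polynomial fits used in the analysis.

```python
import numpy as np

# Total-width sketch: interpolate the q^2 spectrum known at a discrete set of
# lattice momenta, then integrate it over the full q^2 range.
q2 = np.linspace(0.0, 0.75, 11)              # sample q^2 points (GeV^2)
dGamma = q2 * (0.75 - q2) + 0.1              # synthetic stand-in for dGamma/dq^2

# interpolate on a fine grid, then integrate with the trapezoidal rule
fine = np.linspace(q2[0], q2[-1], 1001)
spectrum = np.interp(fine, q2, dGamma)
Gamma = float(np.sum(0.5 * (spectrum[1:] + spectrum[:-1]) * np.diff(fine)))
```

The sensitivity of the integral to the region near q^2_max is what makes the JLQCD extrapolation to the highest q^2 values delicate: with few points there, the interpolant is poorly constrained exactly where the spectrum peaks.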
In the OPE the total width receives large and concurring power and perturbative corrections, which result in a ∼ 20-40% uncertainty. This is at variance with what happens in the case of the physical b quark, for which a recent estimate of the total uncertainty is about 2% [68]. Indeed, the convergence of the OPE deteriorates rapidly as m_b decreases towards m_c, even from 2.7 to 2.4 GeV. To illustrate this point we consider the various contributions to the semileptonic width in the ETMC case, where the perturbative contribution includes O(α_s^3) corrections and the non-perturbative contributions include the O(α_s) corrections to the Wilson coefficients. We estimate the perturbative uncertainty by varying the scale of α_s between 1.5 and 3.0 GeV. Notice that more than half of the uncertainty on the width reported in tables 2 and 3 is due to the large uncertainty on the heavy quark masses, in both the JLQCD and ETMC cases.
Concerning the leptonic moments, only the O(α_s^2) corrections have been computed, either numerically for physical values of the heavy-quark masses [60], or analytically in an expansion up to O(r^7) in powers of r = m_c/m_b [58]. Unfortunately, this expansion converges slowly and does not provide reliable results for r ∼ 0.5, which is the value relevant in the ETMC case. We therefore show results computed to O(α_s) and include the O(α_s µ_{π,G}^2/m_b^2) corrections discussed in ref. [55] as well. In the first moment in the ETMC case both power and perturbative corrections are smaller than in the total width, and similarly for the second central moment L_2c = ⟨E_ℓ^2⟩ − ⟨E_ℓ⟩^2. As shown in tables 2 and 3, there is reasonable agreement between the OPE and both the JLQCD and ETMC data in all cases. As a general comment, we stress that the large contributions of ρ_D^3 are related to a kinematically enhanced Wilson coefficient and do not necessarily imply similarly large higher-power corrections.
Finally, in the OPE prediction for the first hadronic mass moment in the JLQCD case we do not include the O(α_s/m_b^2) corrections and consequently enlarge the uncertainty slightly. The OPE prediction for the first hadronic moment is in reasonable agreement with both the JLQCD and ETMC values, see table 2 and table 3.

Determination of the OPE parameters
As different physical quantities have a different dependence on the OPE parameters, it is possible to constrain their values using lattice data. The analytic expressions for the power corrections to the differential q 2 distribution and for the moments, which encode this dependence, are rather lengthy and are provided in an ancillary Mathematica file.
To illustrate this point, let us consider a few examples using simpler approximate formulas, focussing on the differential leptonic moments at moderately low q^2, where the OPE is more reliable. We choose a q^2 value for which we have lattice data, q^2_* = 0.1865 GeV^2. In the ETMC setup, the OPE prediction for L_1(q^2_*) can be approximated by a linear function of the OPE parameters, with all quantities expressed in GeV to the appropriate power. Notice that the lowest-order expression for the differential leptonic moments is universal, namely it does not depend on the channel. We do not consider the V V ⊥ channel because, as discussed above, the expanded form does not provide a good approximation. Analogous approximate expressions hold for the second central moments. Each of these moments has a different dependence on the non-perturbative parameters, and they can be used in a fit to the lattice ETMC results to obtain constraints on those parameters. In fact, using only these six inputs with their theoretical uncertainty does not lead to any improvement on the constraints given in table 1. Considering additional q^2 points enhances the sensitivity to the non-perturbative parameters, but one has to estimate the correlation among the theoretical uncertainties at adjacent q^2 points. One can also include in the fit the data for the q^2 distribution in the different channels, as well as additional moments like the hadronic mass moments. A global fit to lattice data is however beyond the scope of this paper, especially because our estimate of the lattice systematic uncertainty is incomplete. We stress that the limiting factor here is not the statistical uncertainty of the present ETMC calculation, but the theoretical uncertainty we attach to the OPE predictions. In this respect the unphysical case we have considered, with the partonic energy release (of the order of m_b − m_c) about a factor of 2 (JLQCD) or 3 (ETMC) smaller than in reality, is strongly penalising.
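The constraint idea can be sketched with a toy weighted least-squares fit: if each moment depends approximately linearly on the OPE parameters, a set of lattice moments with errors determines those parameters. All coefficients and "data" below are invented; the actual expressions are given in the ancillary Mathematica file.

```python
import numpy as np

# toy model: L_i = a_i + b_i * mupi2 + c_i * rhoD3 for three moments
a = np.array([0.52, 0.55, 0.050])            # made-up lowest-order values
B = np.array([[ 0.10, -0.30],
              [ 0.08, -0.25],
              [-0.05,  0.40]])               # made-up sensitivities (mupi2, rhoD3)

true_params = np.array([0.45, 0.17])         # (mupi2, rhoD3) used to generate data
data = a + B @ true_params                   # noiseless synthetic "lattice" moments
err = np.array([0.01, 0.01, 0.005])          # assigned uncertainties

# weighted linear least squares: minimise sum_i ((data_i - model_i)/err_i)^2
W = np.diag(1.0 / err)
fit, *_ = np.linalg.lstsq(W @ B, W @ (data - a), rcond=None)
```

With real data the theoretical uncertainties and their correlations between adjacent q^2 points would have to enter the weight matrix, which is precisely the limiting factor discussed above.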
At the physical point the OPE enjoys a much better convergence and the prospects for constraining the non-perturbative parameters are better than it appears from this exploratory study.

Computations with a smooth kernel
In sections 2 and 3 we have seen that the reconstruction of the discontinuous kernel is one of the main problems in the calculation of physical quantities. As far as the comparison with the OPE is concerned, however, the kernel does not need to be discontinuous. Indeed, one can compute inclusive (unphysical) quantities in the OPE employing a smooth kernel (σ ≠ 0) and compare them directly with the analogous quantities computed on the lattice. In this way it is possible to check that the level of agreement between the two calculations is not affected by the σ → 0 limit, and to extract information on the non-perturbative parameters of the OPE, as well as on the heavy quark masses, from slightly more precise lattice data. In figure 24 we show the q^2 spectrum in the different channels computed on the lattice using the sigmoid approximation θ^s_σ of eq. (2.36) for θ(ω_max − ω) with σ = 0.12 m_B. In the OPE calculation, where the partonic kinematics holds, we replace θ(1 − η − q^2) by the sigmoid θ^s_σ(1 − η − q^2) using σ = 0.12 m_B. At low q^2 the agreement between OPE and ETMC data is similar to that in figure 17, while at large q^2 there is marginal improvement, as expected because the smearing occurs over a larger ω range. In figure 25 we show the first differential leptonic moment L_1(q^2) in the different channels, excluding V V ⊥ because of the large uncertainties in the OPE calculation. With respect to figure 19 we observe a marked improvement in the agreement between OPE and ETMC data at large q^2 in the AA and V V channels, while in the AA ⊥ channel the agreement is slightly worse. Finally, in figure 26 we show the q^2 spectrum in the different channels computed from the JLQCD configurations using the sigmoid approximation θ^s_σ with σ = 0.1/a. Here the overall agreement between the lattice calculation and the OPE is similar to figure 16, but now the q^2 dependence of the lattice data is closer to the OPE result, obtained using θ^s_σ(1 − η − q^2) with σ = 0.1.
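A sigmoid smearing of the step function, a common choice assumed here for illustration (the paper's precise form is given in eq. (2.36)), can be sketched as follows:

```python
import numpy as np

# Smooth approximation of the Heaviside step: theta_s_sigma(x) -> theta(x)
# as sigma -> 0; applied above with x = omega_max - omega.
def theta_sigmoid(x, sigma):
    return 1.0 / (1.0 + np.exp(-x / sigma))

x = np.array([-0.5, 0.0, 0.5])
sharp = theta_sigmoid(x, 0.01)    # nearly the exact step
smooth = theta_sigmoid(x, 0.12)   # a smearing width like the one used above
```

The smooth kernel spreads the phase-space boundary over a region of width ∼ σ in ω, which is exactly why the comparison with the OPE improves near the endpoint.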

Discussion and future prospects
In this article we have presented the first comprehensive investigation of inclusive semileptonic B-meson decays on the lattice. Using the method of ref. [16] we have computed various inclusive observables with gauge-field ensembles generated by the JLQCD and ETM collaborations for unphysically light values of the b quark mass (about 2.7 GeV and 2.4 GeV, respectively) and m_c close to its physical value. In this exploratory study we have not taken the continuum and infinite-volume limits.
An important feature of the method we have adopted is that it requires the approximation of the energy-integral kernel. The kinematics of the inclusive semileptonic decay involves a discontinuity at the boundary of the phase space, for which an accurate approximation in terms of the Euclidean correlator obtained on the lattice is impractical. The problem can be dealt with using a sequence of smooth kernels, parametrized by a smearing width σ, which converge to the physical phase space in the limit σ → 0. As emphasized in section 2, the σ → 0 limit does not commute with the infinite-volume limit, which has to be taken first. Under the assumption that finite-volume effects are negligible with respect to the statistical errors associated with our lattice results, we have studied the σ → 0 extrapolation in detail and found that it does not induce a significant uncertainty. We have compared the JLQCD results with the contributions of the charmed ground states, estimated from a JLQCD calculation of the B_s → D_s^(*) form factors for the same values of the heavy-quark masses (details are given in appendix A). Due to the proximity of the charm and bottom masses and to the limited phase space available in the decay, the inclusive results are nearly saturated by the ground-state contributions. Although the correlators themselves show the presence of excited states, their contribution to the inclusive rate is relatively small. The ETMC results, obtained at an even lower b quark mass, are also expected to be largely dominated by the ground states.
While the JLQCD and ETMC results cannot be compared directly, as they are obtained at different b-quark masses, they can both be compared with the expectations from the OPE, assuming that discretisation and finite-volume effects are negligible. Where the OPE can be considered reliable, the agreement with both the JLQCD and ETMC results is generally good, while we observe possible indications of quark-hadron duality violation at large q².
The variance of the lepton-energy distribution also shows a clear and unexpected deviation, which could be due to underestimated uncertainties in our OPE calculation or to non-negligible lattice systematics. To the best of our knowledge, this is the first time that the onset of quark-hadron duality is studied on the lattice. For m_b ∼ 2.4 GeV the OPE converges much more slowly than at the physical point, but the normalised moments allow us to perform a relatively clean comparison with the lattice data.
We have found that the calculation of the total width and of other global quantities, such as the moments of the lepton-energy or of the hadronic invariant-mass distribution, depends crucially on the number of q² points that can be computed on the lattice. In the ETMC calculation, the flexibility offered by twisted boundary conditions has allowed us to reach an accuracy of 6% on the total width and 3% on the first leptonic moment. These uncertainties do not yet include several lattice systematics that need to be considered; they are dominated by statistical errors and could be improved with a dedicated effort. This aspect will become important for future phenomenological applications, which should also aim at reaching the physical b mass.
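As a schematic illustration of why the density of q² points matters, the toy quadrature below (the integrand shape and all numbers are assumptions, chosen only to mimic the phase-space suppression near the endpoint) shows how the error on the q²-integrated width shrinks as more momenta become available, e.g. through twisted boundary conditions:

```python
import numpy as np

def trapz(y, x):
    """Trapezoidal rule (avoids the numpy 1.x/2.x trapz/trapezoid rename)."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

# Toy differential q^2 distribution; the (q2max - q2)^{3/2} endpoint behaviour
# is an assumption made purely to mimic phase-space suppression.
q2max = 1.2  # GeV^2, illustrative
dGamma = lambda q2: np.sqrt(q2) * (q2max - q2) ** 1.5

def total_width(n_points):
    """Trapezoidal estimate of the q^2 integral from n_points simulated momenta."""
    q2 = np.linspace(0.0, q2max, n_points)
    return trapz(dGamma(q2), q2)

dense = total_width(4001)   # quasi-exact reference
for n in (4, 8, 16):        # a few Fourier momenta vs. a twisted-boundary-condition scan
    print(n, abs(total_width(n) - dense) / dense)
```

With only a handful of momenta the quadrature error can easily exceed the quoted statistical precision, which is why the freedom to choose q² values densely is so valuable.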
Finally, we have shown that our results can be used to constrain the non-perturbative parameters entering the OPE analysis. We have not attempted a fit to the lattice data in the unphysical setup considered here, as this is penalised by large uncertainties from higher-dimensional operators. At larger b-quark masses these uncertainties will be reduced, and the data obtained at different values of m_b will provide an additional handle on the non-local matrix elements that appear in eq. (4.5).
There are certainly many issues to be improved or investigated in order to obtain results of direct phenomenological relevance. First, we have not yet studied the continuum and infinite-volume limits. Although we have presented a rather detailed discussion of the systematics associated with the reconstruction of the smearing kernels, including the required extrapolation to vanishing smearing radius, this last step is justified only once the approach to the infinite-volume limit has been checked. The continuum and infinite-volume limits can only be taken by performing calculations at different values of the lattice spacing and on different physical volumes, a task that is beyond the exploratory nature of this study and that we postpone to future work.
Second, the calculation has to be performed at the physical b and light-quark masses. Simulations with physical pion masses are nowadays possible and, for instance, a collection of N_f = 2+1+1 ensembles with physical light, strange and charm quark masses has been produced by the ETM collaboration at different values of the lattice spacing and with different physical volumes. Although it is not possible to simulate a physical b quark directly on these ensembles (because of potentially dangerous cutoff effects), the problem can nevertheless be approached by using well-established techniques such as the ETMC ratio method [71], based on ratios of the observable of interest computed at nearby heavy-quark masses. The ratio method has already been applied to determine the mass of the b quark, the leptonic decay constants, the bag parameters of B_(s) mesons and the matrix elements of dimension-four and dimension-five operators appearing in the Heavy Quark Expansion of pseudoscalar and vector meson masses [65,66,72-74]. Its main advantages can be summarised as follows: i) B-physics computations can be carried out in the same relativistic-action setup used for the lighter-quark computations; ii) no extra simulation at the static point is needed, since exact information about that limit is automatically incorporated in the construction of the ratios; iii) the use of ratios greatly helps in reducing discretisation effects. However, there is an important subtlety. In order to apply the ratio method (or any other method based on extrapolations in the b-quark mass) to the inclusive decay rates, one has to cope with the fact that at unphysical (lighter) values of the b-quark mass the phase space available to the decay shrinks. This implies that some of the hadronic channels that are open at the physical value of m_b are totally excluded from the phase-space integral at m_h < m_b.
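A minimal sketch of the ratio-method bookkeeping, under a toy 1/m ansatz for the heavy-mass dependence (all names, the ansatz and all numbers are illustrative assumptions, not actual lattice data): step ratios are constrained to 1 in the static limit, fitted in 1/m, and the chain is then extended towards the physical b mass:

```python
import numpy as np

lam = 1.16                                  # fixed ratio between successive heavy masses
masses = 2.0 * lam ** np.arange(5)          # simulated heavy masses, ~2.0-3.6 GeV
mb = 4.6                                    # target (physical) b mass in GeV, illustrative

# Toy observable with a known static limit O(m -> infinity) = 1
O = lambda m: 1.0 + 0.5 / m + 0.3 / m**2

# Step ratios y_n = O(m_n)/O(m_{n-1}); by construction y -> 1 in the static limit,
# so the exactly known static-limit information is built into the chain.
y = O(masses[1:]) / O(masses[:-1])

# Fit y(m) - 1 = c1/m + c2/m^2, enforcing the exact static constraint y(inf) = 1
A = np.column_stack([1.0 / masses[1:], 1.0 / masses[1:] ** 2])
c = np.linalg.lstsq(A, y - 1.0, rcond=None)[0]

# Extend the chain step by step until the physical b mass is reached or passed
m, val = masses[-1], O(masses[-1])
while m < mb:
    m *= lam
    val *= 1.0 + c[0] / m + c[1] / m**2
print(val, O(m))
```

The point of the construction is that each ratio is close to 1 and smooth in 1/m, so the extrapolation towards the static limit is mild even when the simulated masses sit well below m_b.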
The important point to notice here is that this happens when the integration limits are imposed sharply, i.e. by using the exact Heaviside functions that implement the phase-space constraint. The problem is completely analogous to the ordered double limit required to deal with the finite-volume distortion of the hadronic spectral density. Indeed, we envisage applying the ratio method to the smeared rate Γ_σ before taking the σ → 0 extrapolation: while Γ(m_h) is (at least in principle) a distribution in m_h, Γ_σ(m_h) is certainly a smooth function that can safely be extrapolated to the physical value of m_b. Moreover, we already have simulations with m_h ∼ 0.8 m_b, and it can reasonably be argued that for such large masses the missing (mostly continuum) states scale with m_h.
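In compact form, the strategy just described corresponds to the ordered double limit

```latex
\Gamma(m_b) \;=\; \lim_{\sigma \to 0} \, \lim_{m_h \to m_b} \Gamma_\sigma(m_h)\,,
```

where the inner extrapolation in the heavy-quark mass is performed first, at fixed smearing width σ, on the smooth function Γ_σ(m_h); only afterwards is the smearing removed.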
Although we have compared the lattice results with the OPE, a more direct and effective validation of our method would come from a comparison with experimental data, such as those for the branching ratio and for the electron energy spectrum in inclusive semileptonic decays of the D or D_s mesons [75,76]. Here the challenge is to obtain accurate results at physical light-quark masses, while the charm quark can be simulated directly on present lattices. Besides validating the method without extrapolations in the heavy-quark mass, a calculation of charm decays might shed light on the following two open and phenomenologically relevant questions: i) to what extent is the OPE applicable to charm decays? ii) what is the role played by weak-annihilation (WA) contributions? The first question refers to the onset of quark-hadron duality, and a detailed study of charm decays in connection with their OPE description may yield insight into this conceptual issue. Answering the second question may help us quantify the role played by WA contributions in charmless semileptonic B decays, hence improving the inclusive determination of |V_ub|. If one could reproduce the experimentally measured lepton-energy spectrum of inclusive semileptonic D_s decays, a more ambitious future application would be a direct calculation of B → X_u ℓν.
Finally, one may wonder whether the foreseeable accuracy will be sufficient for a precision determination of |V_cb| and for interesting phenomenology. Indeed, present experimental errors for B → X_c ℓν are 1.4% on the branching ratio and a few per mille on the first few moments of the lepton-energy distribution. The lattice precision is unlikely to get close to that, at least initially. On the other hand, on a relatively short time-scale lattice calculations of inclusive semileptonic decays might be able to enhance the predictive power of the OPE by accessing other quantities that are inaccurate or beyond the reach of current experiments and are highly sensitive to the non-perturbative parameters, allowing us to validate and improve the results of the semileptonic fits on which the OPE predictions are based.

We insert the ground-state contribution into the spectral decomposition and perform the ω integral, which merely picks out the ground state through δ(p⁰ − q⁰ − E_{D^(*)}) = δ(ω − E_{D^(*)}). For X̄ ≡ ∑_{l=0}^{2} X^{(l)}, we obtain the D-meson contribution, which corresponds to the partial decay rate, where w = v · v′ = √(1 + q²/m_D²) = E_D/m_D. The last line is a well-known formula for the B → D ℓν decay rate.
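Since the text appeals to the well-known B → D ℓν rate, we recall it here for the reader's convenience (massless leptons; 𝒢(w) denotes the standard B → D form-factor combination, in a common normalisation convention):

```latex
\frac{d\Gamma(B \to D\,\ell\nu)}{dw}
  \;=\; \frac{G_F^2\,|V_{cb}|^2}{48\pi^3}\,
        (m_B + m_D)^2\, m_D^3\, \left(w^2 - 1\right)^{3/2} |\mathcal{G}(w)|^2 ,
```

with w = v · v′ = E_D/m_D evaluated in the B rest frame.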
The vector meson D* contributes in three channels: AA∥, AA⊥, and VV⊥. Adding the individual contributions together, we obtain, with R₁ ≡ h_V/h_{A1} and R₂ ≡ (h_{A3} + r h_{A2})/h_{A1}, a result that confirms a well-known formula. From this analysis, the contributions of the S-wave ground states, D and D*, to the integrands X̄_{VV∥}, X̄_{VV⊥}, X̄_{AA⊥}, and X̄_{AA∥} can be identified.
The contribution of the VA and AV insertions vanishes for the total decay rate as well as for the hadronic-mass moments, but it is non-zero for the lepton-energy moments. In the SM, the contribution of the AV interference from the ground-state transition B → D* to the first leptonic moment can be written as