Approaches to inclusive semileptonic $B_{(s)}$-meson decays from Lattice QCD

We address the nonperturbative calculation of the inclusive decay rate of semileptonic $B_{(s)}$-meson decays from lattice QCD. Precise Standard-Model predictions are key ingredients in searches for new physics, and this type of computation may eventually provide new insight into the long-standing tension between the inclusive and exclusive determinations of the Cabibbo-Kobayashi-Maskawa (CKM) matrix elements $|V_{cb}|$ and $|V_{ub}|$. We present results from a pilot lattice computation for $B_s \rightarrow X_c\, l \nu_l$, in which the initial $b$ quark, described by the relativistic-heavy-quark (RHQ) formalism on the lattice, and the other valence quarks, discretised with domain-wall fermions, are simulated approximately at their physical quark masses. We compare two different methods for computing the decay rate from lattice data of Euclidean $n$-point functions, namely the Chebyshev and Backus-Gilbert approaches. We further study to what extent the ground-state meson dominates the inclusive decay rate and outline our strategy towards a computation with a more comprehensive systematic error budget.


1 Introduction

The study of the $b$-quark sector of particle physics remains an exciting arena of precision physics, in which intriguing tensions between observations and Standard-Model (SM) predictions have been found [1][2][3][4][5][6]. Scrutinising these findings and better controlling and reducing experimental and theoretical error budgets therefore remains a crucial task. Any such anomaly could be an indicator of new effects: while new particles may be too heavy to be produced at energies achievable by current experimental facilities, quantum effects could leave detectable traces in flavour-physics processes. One of these long-standing tensions involves the measured values of the CKM matrix elements $|V_{cb}|$ and $|V_{ub}|$ between exclusive and inclusive decays. Apart from leptonic decays, these can be determined through the exclusive semileptonic decay of a $B$ into a $D^{(*)}$ (or $\pi$), or through the measurement of the inclusive decay rate, respectively. For example, some of the most recent determinations of $|V_{cb}|$ find $|V_{cb}| = (42.19 \pm 0.78) \times 10^{-3}$ inclusive [7,8] and $|V_{cb}| = (39.36 \pm 0.68) \times 10^{-3}$ exclusive [9][10][11][12][13].
Lattice computations provide crucial nonperturbative input to the exclusive determination and the required techniques in this case are well established (see reviews [13,14]). The existing results for the inclusive decay are based on perturbative QCD. First viable theoretical proposals for how to accomplish the computation of the inclusive decay rate on the lattice have appeared only recently [15]. The idea relies on the extraction of a forward-scattering matrix element through analytic continuation of lattice results obtained in an unphysical kinematical region. In [16] it was then proposed to address decay and transition rates of multi-hadron processes through finite-volume Euclidean four-point functions provided that a method to extract the associated spectral function exists.
In this paper, we present work towards an improved understanding of the calculation of the inclusive decay rate by means of a pilot study of semileptonic decays of B s mesons into charmed particles, namely B s → X c lν l , following [17], where the extraction of the spectral function is bypassed and the decay rate is evaluated directly. Preliminary work has been presented in [18,19]. In particular, we improve and compare two existing methods, namely Chebyshev [17,20,21] and Backus-Gilbert [22,23] reconstructions. Our work uses the relativistic-heavy-quark action (RHQ) [24][25][26] to simulate the bottom-valence quark at its physical mass, while the strange-and charm-valence quarks are treated with a domain-wall fermion action [27][28][29][30], and their masses are tuned to values close to the ones found in nature.
The structure of this paper is as follows: in Sec. 2 we describe the theoretical framework, extending the formalism introduced in [17]. We also address the ground-state limit and its connection with the corresponding exclusive processes. In Sec. 2.3 we describe some details of the lattice implementation. In Sec. 2.4 we report on our analysis strategies; to keep the discussion fluent we refer to App. A, B and C for technical details. Finally, we discuss the details of the simulation in Sec. 3 and present our results in Sec. 4. We summarise our findings and discuss future prospects in Sec. 5.

2 Theoretical framework

The inclusive decay rate
We start by reviewing the formalism to calculate the decay rate of inclusive semileptonic processes [31,32]. Here, we focus on the decay $B_s \to X_c\, l\nu_l$ illustrated in Fig. 1, but the formalism is more generally applicable to other channels such as, e.g., $B \to X l\nu_l$ or $D_{(s)} \to X l\nu_l$. The final state $X_c$ represents all possible charmed-meson final states allowed by flavour, spin and parity quantum numbers. The ground-state contribution to $X_c$ in the vector channel is given by the $D_s$ meson. The leading-order weak Hamiltonian for the $b \to c$ process is given by
$$\mathcal{H}_W = \frac{G_F}{\sqrt{2}}\, V_{cb}\, \big[\bar{c}\,\gamma^\mu(1-\gamma_5)\,b\big]\big[\bar{l}\,\gamma_\mu(1-\gamma_5)\,\nu_l\big] + \mathrm{h.c.}\,,$$
where $G_F$ is the Fermi constant and $V_{cb}$ is the CKM matrix element for the charged-current flavour-changing quark transition. The electroweak quark current for this process is then $J_\mu = \bar{b}_L \gamma_\mu c_L = \bar{b}\gamma_\mu(1-\gamma_5)c$, which we can also write as $J_\mu = V_\mu - A_\mu$ with $V_\mu = \bar{b}\gamma_\mu c$ and $A_\mu = \bar{b}\gamma_\mu\gamma_5 c$.
The differential decay rate for the inclusive process depends on three kinematical variables, i.e. one more than the corresponding exclusive decay due to the freedom in the mass of the outgoing hadrons. Neglecting QED corrections it reads
$$\frac{d\Gamma}{dq^2\, dq_0\, dE_l} = \frac{G_F^2\, |V_{cb}|^2}{8\pi^3}\, L_{\mu\nu}\, W^{\mu\nu}\,.$$
The lepton contribution is given in terms of the leptonic tensor $L^{\mu\nu} = p_l^\mu p_{\nu_l}^\nu + p_l^\nu p_{\nu_l}^\mu - g^{\mu\nu}\, p_l \cdot p_{\nu_l} - i\epsilon^{\mu\alpha\nu\beta} p_{l,\alpha}\, p_{\nu_l,\beta}$, where $p_l$ and $p_{\nu_l}$ are the four-momenta of the lepton and the neutrino, respectively. The hadronic tensor $W^{\mu\nu}$ is defined as
$$W^{\mu\nu}(p_{B_s}, q) = \sum_{X_c} (2\pi)^3\, \delta^{(4)}(p_{B_s} - q - p_{X_c})\, \frac{1}{2M_{B_s}}\, \langle B_s(p_{B_s})|\, J^{\mu\dagger}(0)\, |X_c(p_{X_c})\rangle \langle X_c(p_{X_c})|\, J^\nu(0)\, |B_s(p_{B_s})\rangle\,, \quad (2.4)$$
where in the second line we have inserted the sum $\sum_{X_c} |X_c(p_{X_c})\rangle\langle X_c(p_{X_c})|$ over a complete set of states, which is understood to include an integration over all possible momenta $p_{X_c}$ under a Lorentz-invariant phase-space integral, and $q = p_{B_s} - p_{X_c} = p_l + p_{\nu_l}$ is the momentum transferred between the initial and final hadronic states. Note that we will consider only the case of the $B_s$ meson at rest, i.e. $\mathbf{p}_{B_s} = (0, 0, 0)$, and will henceforth suppress the corresponding momentum label. The hadronic tensor can be decomposed into five scalar structure functions $W_i \equiv W_i(q^2, v \cdot q)$ as
$$W^{\mu\nu} = -g^{\mu\nu}\, W_1 + v^\mu v^\nu\, W_2 - i\epsilon^{\mu\nu\alpha\beta}\, v_\alpha q_\beta\, W_3 + q^\mu q^\nu\, W_4 + (v^\mu q^\nu + v^\nu q^\mu)\, W_5\,, \quad (2.5)$$
where $v = p_{B_s}/M_{B_s} = (1, 0, 0, 0)$ is the velocity of the initial $B_s$ meson at rest, and $q = (q_0, \mathbf{q}) = (M_{B_s} - \omega, -\mathbf{p}_{X_c})$. From now on, we will denote by $\omega = E_{X_c}$ the energy of the final-state hadron. The individual components of the hadronic tensor can be expressed conveniently in terms of the structure functions,
$$W^{00} = -W_1 + W_2 + q_0^2\, W_4 + 2 q_0\, W_5\,, \quad (2.6)$$
$$W^{ij} = \delta^{ij}\, W_1 - i\epsilon^{ij0k}\, q_k\, W_3 + q^i q^j\, W_4\,, \quad (2.7)$$
$$W^{0i} = q_0\, q^i\, W_4 + q^i\, W_5\,, \quad (2.8)$$
where $i, j, k$ refer to the spatial indices 1, 2, 3. We note that, contracting the spatial indices with the three-momentum components $q^i$, we can invert these relations and find expressions for the structure functions in terms of the hadronic tensor and $\mathbf{q}$.
Integrating over the lepton energy $E_l = p_{l,0}$ and assuming $m_l = 0$, we obtain the expression for the decay rate in Eq. (2.9), where the integration over $\omega$ is contained in the quantities $X^{(l)}(\mathbf{q}^2)$ and $\bar X^{(l)}(\mathbf{q}^2)$ defined in Eqs. (2.10) and (2.11). Recalling that the $D_s$ meson is the lightest final state in this inclusive decay process, and imposing four-momentum conservation, we obtain $\omega_{\min} = \sqrt{M_{D_s}^2 + \mathbf{q}^2}$ and $\omega_{\max} = M_{B_s} - \sqrt{\mathbf{q}^2}$ for the integral limits. $X^{(l)}$ and $\bar X^{(l)}$ depend only on $\mathbf{q}^2$ and not on individual components of $\mathbf{q}$, as can be seen after substituting Eqs. (2.6), (2.7) and (2.8) into the expressions (2.11).
Starting from the decomposition of the hadronic tensor into its $VV$, $AA$, $VA$ and $AV$ parity components, the $X^{(l)}$ can also be rewritten in a way that exposes the $V - A$ nature of the charged current, namely as the combination given in Eq. (2.12), and similarly for $\bar X^{(l)}$.

Ground-state limit
In this section we consider a hypothetical world in which only the lowest-mass final state $D_s$ contributes to the inclusive decay, i.e. the sum over final states in Eq. (2.4) is truncated to the $D_s$ ground state. In this limit we can reconstruct the inclusive decay rate from lattice simulations of the exclusive decay, allowing us to compute the ground-state contribution. We will also use results in this limit to devise consistency checks of the inclusive-decay setup. The hadronic form factors $f_+(q^2)$ and $f_-(q^2)$ parametrising the corresponding matrix element of the exclusive decay $B_s \to D_s l\nu_l$ can be computed separately on the lattice using more conventional methods [33][34][35].
In order to compute the inclusive decay rate in this limit, we now establish the relation between the vector form factor $f_+(q^2)$ and $\bar X_{VV} = \sum_{l=0}^{2} \bar X^{(l)}_{VV}$ defined in Eqs. (2.10) and (2.11), using the decomposition in Eq. (2.12). Let us first decompose $\bar X_{VV} = \bar X^{\parallel}_{VV} + \bar X^{\perp}_{VV}$ into longitudinal and transverse components in terms of the projectors $\Pi^{\perp}_{\mu\nu} = g_{\mu\nu} - q_\mu q_\nu/q^2$ and $\Pi^{\parallel}_{\mu\nu} = q_\mu q_\nu/q^2$, which, inverting Eqs. (2.6)-(2.8) and considering $\mathbf{q}^2 \neq 0$, can be expanded as in Eq. (2.16). Inserting the expression Eq. (2.13) into Eq. (2.16), we obtain Eq. (2.17).

Figure 2. Diagram of the four-point correlator. Two propagators used for the contraction are depicted in the picture. The black one, $G_b(x_{\rm src}, x_1)$, is a propagator for the $b$ quark from $x_1$ to $x_{\rm src}$. The green one, $\Sigma_{cbs}(x_1, x_{\rm src})$, is a sequential propagator that propagates the $s$ quark from $x_{\rm src}$ to $x_{\rm snk}$, the $b$ quark from $x_{\rm snk}$ to $x_2$ and the $c$ quark from $x_2$ to $x_1$.
In Sec. 4.4 we will use this relation to devise a cross-check of our method for the computation of the inclusive decay rate by comparing with the exclusive decay to the ground state. Note that, because of the Dirac delta in (2.13), the integral over $\omega$ simply selects the ground-state energy of the $D_s$ meson with a given momentum. This then implies that $\bar X^{(l)} = X^{(l)}$ up to $\delta(\omega - E_{D_s})$. Further details on the ground-state limit can also be found in the Appendix of [36].

Inclusive decays on a Euclidean space-time lattice
We now address the strategy for the computation of the inclusive decay rate on the lattice, which follows [15,17,36]. The key quantity is the hadronic tensor in (2.4), expressed as the matrix element in Eq. (2.19). This matrix element can be extracted from the time dependence of the Euclidean four-point function in Eq. (2.20), where $O^S_{B_s}$ is an interpolating operator with the quantum numbers of the $B_s$ meson and the currents are projected onto three-momentum by a discrete Fourier transform, $\tilde J_\nu(\mathbf{q}, t) = \sum_{\mathbf{x}} e^{-i\mathbf{q}\cdot\mathbf{x}}\, J_\nu(\mathbf{x}, t)$. In this setup the $B_s$ meson is created with zero momentum at the source position $x_{\rm src}$ and annihilated at the sink position $x_{\rm snk}$. In Fig. 2 we show the corresponding quark-flow diagram: the black line, $G_b(x_{\rm src}, x_1)$, is a propagator for the $b$ quark from $x_1$ to $x_{\rm src}$, whereas the green one, $\Sigma_{cbs}(x_1, x_{\rm src})$, is a sequential propagator that propagates the $s$ quark from $x_{\rm src}$ to $x_{\rm snk}$, the $b$ quark from $x_{\rm snk}$ to $x_2$ and the $c$ quark from $x_2$ to $x_1$.
The matrix element in Eq. (2.19) can be extracted in the window $t_{\rm snk} - t_2 \gg 0$, $t_1 - t_{\rm src} \gg 0$ and $t_2 > t_1$, where excited states of the $B_s$ meson have decayed sufficiently. By increasing the overlap of the operator $O^S_{B_s}$ with the ground-state $B_s$, the size of this window can be enlarged. This can be achieved by means of operator smearing, to be detailed later. We use a superscript $S$ in the case of smearing and $L$ in the case of no smearing.
Within the window we expect the four-point function to be saturated by the $B_s$ ground state. Our choice of ratio is given in Eq. (2.23), where we cancel the residual factor $|\langle 0| O^L_{B_s} |B_s\rangle|^2 / 2M_{B_s}$ with its value obtained from fits to the time dependence of, e.g., the $C^{LL}$ two-point function. This leads us to define the key observable $C_{\mu\nu}(\mathbf{q}, t)$, where we have used time-translation invariance, $t = t_2 - t_1$. It is related to the hadronic tensor defined in Eq. (2.19) through a Laplace transform,
$$C_{\mu\nu}(\mathbf{q}, t) = \int_0^\infty d\omega\; W_{\mu\nu}(\mathbf{q}, \omega)\, e^{-\omega t}\,, \quad (2.25)$$
which corresponds to the spectral representation of $C_{\mu\nu}(\mathbf{q}, t)$. By means of Eq. (2.23) we can compute $C_{\mu\nu}$ on the lattice from a combination of meson two- and four-point functions for a finite and discrete set of Euclidean times $t$. The determination of the hadronic tensor by means of inversion of the integral equation Eq. (2.25) therefore constitutes an ill-posed inverse problem, similar to the extraction of hadronic spectral densities from Euclidean correlators: while the reconstruction of $C_{\mu\nu}$ from $W_{\mu\nu}$ is straightforward, the other way around is a very difficult task. Fortunately, in order to compute the inclusive decay rate Eq. (2.9), we do not have to compute the hadronic tensor itself, but only the integrals $\bar X^{(l)}(\mathbf{q}^2)$, in which the hadronic tensor is smeared with the leptonic tensor integrated over the lepton energy, as defined in Eqs. (2.9)-(2.11). In general, we can write
$$\bar X^{(l)}(\mathbf{q}^2) = \int_{\omega_{\min}}^{\omega_{\max}} d\omega\; K^{(l)}_{\mu\nu}(\mathbf{q}, \omega)\, W^{\mu\nu}(\mathbf{q}, \omega)\,,$$
where $K^{(l)}_{\mu\nu}(\mathbf{q}, \omega)$ is a known kinematic factor that depends only on the energy and three-momentum. Introducing a step function $\theta(\omega_{\max} - \omega)$ and extending the limits of integration as $\omega_{\max} \to \infty$ and $\omega_{\min} \to \omega_0$, with $\omega_0 \le \omega_{\min}$, we can rewrite
$$\bar X^{(l)}(\mathbf{q}^2) = \int_{\omega_0}^{\infty} d\omega\; \theta(\omega_{\max} - \omega)\, K^{(l)}_{\mu\nu}(\mathbf{q}, \omega)\, W^{\mu\nu}(\mathbf{q}, \omega)\,.$$
Note that $\omega_0$ can be chosen freely in $0 \le \omega_0 \le \omega_{\min}$, as there are no states below the ground-state energy $\omega_{\min}$, as seen from (2.26). For instance, for $B_s \to X_c l\nu_l$ we expect $\omega_{\min} = M_{D_s}$ for the contribution from the vector channel at vanishing transferred momentum $\mathbf{q}$. We will later exploit this freedom in the choice of $\omega_0$.
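The structure of the Laplace transform in Eq. (2.25) can be illustrated with a minimal numerical sketch: a mock spectral function built from two discrete states yields a correlator that is a sum of decaying exponentials, and a polynomial approximation of the kernel in powers of $e^{-\omega}$ then turns the target quantity into a finite linear combination of correlator values. All energies, overlaps and coefficients below are illustrative placeholders, not taken from our data:

```python
import numpy as np

# mock two-state spectral decomposition: C(t) = sum_n |A_n|^2 e^{-E_n t}
E = np.array([0.8, 1.3])    # energies of ground and first excited state (mock)
A2 = np.array([1.0, 0.4])   # squared overlaps (mock)
t = np.arange(1, 15)        # Euclidean time slices, as available on the lattice
C = (A2[None, :] * np.exp(-np.outer(t, E))).sum(axis=1)

# if the kernel is approximated as K(omega) ~ sum_k c_k e^{-k omega},
# the smeared integral over omega collapses to sum_k c_k C(t = k + const)
c = np.array([0.5, -0.2, 0.1])   # mock expansion coefficients
Xbar = np.sum(c * C[:3])
```

The point of the sketch is that no inversion of Eq. (2.25) is ever needed: once the kernel is expanded in exponentials, only correlator values at integer time separations enter.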
Let us now discuss how to obtain $\bar X^{(l)}$ from lattice data for $C_{\mu\nu}(\mathbf{q}, t)$. First we introduce a smoothing of the kernel $K^{(l)}_{\mu\nu}$, replacing the sharp step function by a smooth approximation $\theta_\sigma(\omega_{\max} - \omega)$ of width $\sigma$. While we eventually have to take the limit $\sigma \to 0$ in order to obtain the physical decay rate, the smoothing is useful to control and understand the systematic effects involved in the strategy to compute the decay rate. Following [17], we now expand the smoothed kernel $K^{(l)}_{\sigma,\mu\nu}(\mathbf{q}, \omega)$ as a polynomial in $e^{-a\omega}$ (we will set $a = 1$ for simplicity) up to some order $N$, i.e.
$$K^{(l)}_{\sigma,\mu\nu}(\mathbf{q}, \omega) \simeq \sum_{k=0}^{N} c^{(l)}_{\mu\nu,k}(\mathbf{q}; \sigma)\, e^{-k\omega}\,, \quad (2.30)$$
with coefficients $c^{(l)}_{\mu\nu,k}(\mathbf{q}; \sigma)$. In this way, the target quantity $\bar X^{(l)}_\sigma$, which now also depends on the smearing parameter $\sigma$, can be computed as in Eq. (2.31). There, the factor $e^{-2\omega t_0}$ has been introduced, and compensated for in $K^{(l)}_{\sigma,\mu\nu}(\mathbf{q}, \omega; t_0) = e^{2\omega t_0} K^{(l)}_{\sigma,\mu\nu}(\mathbf{q}, \omega)$, in order to avoid the equal-time matrix element $t_1 = t_2$, see Eq. (2.20), which contains contributions from the opposite time ordering corresponding to unphysical final states. We will discuss suitable choices of the free parameter $t_0$ together with the discussion of the analysis of the actual simulation data. Inserting now Eq. (2.25), we arrive at the compact expression
$$\bar X^{(l)}_\sigma(\mathbf{q}^2) \simeq \sum_{k=0}^{N} c^{(l)}_{\mu\nu,k}(\mathbf{q}; \sigma)\; C_{\mu\nu}(\mathbf{q}, k + 2t_0)\,, \quad (2.32)$$
which relates $C_{\mu\nu}(\mathbf{q}, t)$, which can be computed on the lattice, to $\bar X^{(l)}_\sigma$. The expression is understood to be an approximation of $\bar X^{(l)}_\sigma(\mathbf{q}^2)$ due to the truncation at a finite value of $N$; we use the same convention for all similar quantities that we address in the following sections. Note that the order $N$ of the polynomial approximation is now directly related to the separation in Euclidean time of the two charged currents in the four-point function in Eq. (2.20). What remains to be done towards the computation of the decay rate for a given value of $\sigma$ is to carry out the phase-space integration in Eq. (2.9).
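The expansion of the smoothed kernel in powers of $e^{-\omega}$ can be sketched numerically. The toy below assumes a sigmoid smoothing of the step function and a schematic kernel shape (the compensating factor $e^{2\omega t_0}$ times a phase-space weight); the coefficients $c_k$ are then determined by naive unregulated least squares. All numerical values are illustrative:

```python
import numpy as np

def theta_sigma(x, sigma):
    # sigmoid smoothing of the Heaviside step; theta_sigma -> theta as sigma -> 0
    return 1.0 / (1.0 + np.exp(-x / sigma))

def kernel(omega, omega_max, l=0, t0=0.5, sigma=0.02):
    # schematic smoothed kernel: compensation factor e^{2 omega t0} times
    # phase-space weight (omega_max - omega)^l times the smoothed step
    return np.exp(2.0 * t0 * omega) * (omega_max - omega) ** l \
        * theta_sigma(omega_max - omega, sigma)

omega = np.linspace(0.0, 3.0, 400)
K = kernel(omega, omega_max=1.5)

# expand in powers of e^{-omega} up to order N (naive least squares, no regulator)
N = 9
basis = np.exp(-np.outer(omega, np.arange(1, N + 1)))  # e^{-k omega}, k = 1..N
coeffs, *_ = np.linalg.lstsq(basis, K, rcond=None)
approx = basis @ coeffs
```

In practice the unregulated coefficients become large and oscillatory as $N$ grows, since the exponential basis is nearly degenerate; this is precisely what motivates the regulated approaches described in Sec. 2.4.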

Data analysis
In the previous section we reduced the problem of computing the inclusive decay rate to that of finding a suitable polynomial approximation of the kernel $K^{(l)}_{\sigma,\mu\nu}(\mathbf{q}, \omega; t_0)$. Here we describe two separate methods that we follow (and later compare in Sec. 4) for determining the expansion coefficients $c^{(l)}_{\mu\nu,k}$. The analysis has to deal with the statistical noise in the data as well as with systematic errors, e.g. those associated with the polynomial approximation. Here we consider data for a single lattice spacing and lattice volume, leaving discretisation and finite-volume errors for future studies.
In principle, $\bar X^{(l)}_\sigma(\mathbf{q}^2)$ as defined in Eq. (2.32) could be computed straightforwardly from lattice data for $C_{\mu\nu}(\mathbf{q}, t)$. For a given order $N$, the coefficients $c^{(l)}_{\mu\nu,k}$ in the power series for the analytically known kernel $K^{(l)}_{\sigma,\mu\nu}(\mathbf{q}, \omega)$ could, for instance, be determined via linear regression, allowing us to construct $\bar X^{(l)}_\sigma(\mathbf{q}^2)$ from the data for $C_{\mu\nu}(\mathbf{q}, t)$. The order of the expansion is limited by the number of time slices in the window where $C_{\mu\nu}(\mathbf{q}, t)$ can be extracted from the lattice data. Unfortunately, the exponential deterioration of the signal-to-noise ratio with increasing Euclidean time separation $t$ makes a meaningful signal for the decay rate difficult to extract. What is needed is some form of regulator that balances the statistical noise against the systematic error due to the truncation. We proceed by outlining two methods that achieve this: one based on Chebyshev polynomials and the other based on a modified Backus-Gilbert method.
For the sake of readability we introduce the notation
$$\bar X^{(l)}_\sigma(\mathbf{q}^2) = \langle \psi_\mu(\mathbf{q})|\; K^{(l)}_{\sigma,\mu\nu}(\mathbf{q}, \hat H; t_0)\; |\psi_\nu(\mathbf{q})\rangle\,,$$
where we made use of Eq. (2.26) and defined $|\psi_\nu(\mathbf{q})\rangle = e^{-\hat H t_0}\, \tilde J_\nu(\mathbf{q}, 0)\, |B_s\rangle / \sqrt{2M_{B_s}}$. Note that the kernel has been promoted to an operator, $K^{(l)}_{\sigma,\mu\nu}(\mathbf{q}, \hat H; t_0)$.

Chebyshev-polynomial approximation
Chebyshev polynomials $T_k(x)$, defined on $-1 \le x \le 1$, provide an optimal approximation of functions under the $L^\infty$ norm. We provide a summary of their basic properties in App. A.
For the case at hand we define shifted Chebyshev polynomials $\tilde T_k(\omega)$ on the interval $\omega \in [\omega_0, \infty)$ through a map of the form $x = A e^{-\omega} + B$, where expressions for the coefficients $A$ and $B$ can be found in Eq. (A.23). The kernel function from the previous section can then be expanded up to order $N$ as
$$K^{(l)}_{\sigma,\mu\nu}(\mathbf{q}, \omega; t_0) \simeq \frac{\tilde c^{(l)}_{\mu\nu,0}}{2} + \sum_{k=1}^{N} \tilde c^{(l)}_{\mu\nu,k}\, \tilde T_k(\omega)\,,$$
where $\tilde T_0(\omega) = 1$ by definition, and with coefficients $\tilde c^{(l)}_{\mu\nu,k}$ given by scalar products of the kernel with the $\tilde T_k$ with respect to the weight function $\Omega_h(x)$ defined in App. A. In this way, the expectation value of the kernel operator becomes a linear combination of the terms $\langle \psi_\mu|\, \tilde T_k(\hat H)\, |\psi_\nu\rangle$. By construction, in particular thanks to the condition of Eq. (A.9), shifted Chebyshev polynomials are bounded, $|\tilde T_k(\omega)| \le 1$. As we will discuss later, this is a crucial ingredient in the data analysis: in order to make use of this property, we divide the terms $\langle \psi_\mu|\, \tilde T_k(\hat H)\, |\psi_\nu\rangle$ by a normalisation factor $\langle \psi_\mu | \psi_\nu \rangle = C_{\mu\nu}(2t_0)$. For a more compact notation we define
$$\langle T^k \rangle_{\mu\nu} = \frac{\langle \psi_\mu|\; \tilde T_k(\hat H)\; |\psi_\nu\rangle}{\langle \psi_\mu | \psi_\nu \rangle}\,,$$
where in this case there is no summation over $\mu, \nu$. We refer to the $\langle T^k \rangle_{\mu\nu}$ as the Chebyshev matrix elements, for which, thanks to the normalisation, $|\langle T^k \rangle_{\mu\nu}| \le 1$. In terms of the Chebyshev expansion, the expression for $\bar X^{(l)}_\sigma$ becomes a linear combination of the Chebyshev matrix elements with known coefficients. The Chebyshev matrix elements can in turn be constructed directly from the lattice data: using the properties of shifted Chebyshev polynomials as detailed in App. A.2, we can directly relate the matrix elements $\langle T^k \rangle_{\mu\nu}$ to the correlators $\bar C_{\mu\nu}$, with the power coefficients $t^{(j)}_k$ defined in (A.15). Overall, the full Chebyshev expansion of the kernel reads as in Eq. (2.54), where we emphasise once more that the analytical expressions for the coefficients $\tilde c_{\mu\nu,j}$ and $t^{(j)}_k$ are known and can be evaluated. Collecting the coefficients into $\bar c^{(l)}_{\mu\nu,k}$, we arrive at the compact expression
$$\bar X^{(l)}_\sigma(\mathbf{q}^2) \simeq \sum_{k=0}^{N} \bar c^{(l)}_{\mu\nu,k}\; \bar C_{\mu\nu}(k)\,.$$
While $\bar c^{(l)}_{\mu\nu,k}$ is known in terms of analytical expressions, $\bar C_{\mu\nu}(k)$ needs to be computed on the lattice using Monte Carlo methods. The resulting statistical error on $\bar C_{\mu\nu}(k)$ can lead to violations of the bound $|\langle T^k \rangle_{\mu\nu}| \le 1$ when solving the linear system in Eq. (2.55). This can, however, be avoided in a Bayesian analysis of the correlator data, imposing the bound in terms of priors.
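The boundedness $|\tilde T_k(\omega)| \le 1$ that underpins the prior construction can be checked numerically. The sketch below assumes the simplest mapping onto the standard Chebyshev variable, $x = 2e^{-(\omega - \omega_0)} - 1$ with $a = 1$, which may differ in detail from the precise $A$, $B$ of Eq. (A.23):

```python
import numpy as np

def cheb(k, x):
    # standard Chebyshev recursion: T_0 = 1, T_1 = x, T_{k+1} = 2x T_k - T_{k-1}
    if k == 0:
        return np.ones_like(x)
    Tm, T = np.ones_like(x), x
    for _ in range(k - 1):
        Tm, T = T, 2.0 * x * T - Tm
    return T

def shifted_cheb(k, omega, omega0=0.0):
    # assumed map: omega in [omega0, inf) -> x = 2 e^{-(omega - omega0)} - 1 in (-1, 1]
    x = 2.0 * np.exp(-(omega - omega0)) - 1.0
    return cheb(k, x)

omega = np.linspace(0.0, 5.0, 200)
for k in range(10):
    # the shifted polynomials remain bounded on the whole half-line
    assert np.max(np.abs(shifted_cheb(k, omega))) <= 1.0 + 1e-12
```

Because the argument $x$ stays inside $[-1, 1]$ for all $\omega \ge \omega_0$, every matrix element of $\tilde T_k(\hat H)$ in a normalised state is automatically bounded by one, which is exactly the property exploited by the Bayesian priors.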
One way to impose the constraint is to use a Gaussian prior on internal parameters $\tau^k_{\mu\nu} \sim \mathcal{N}(0, 1)$ and convert it into a flat prior on the interval $[-1, 1]$. We refer to App. C for a thorough discussion of the fitting procedure that we adopt.
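A standard way to realise such a change of variables, sketched here under the assumption that the flat prior is obtained through the Gaussian cumulative map, uses the error function: if $\tau \sim \mathcal{N}(0, 1)$, then $\mathrm{erf}(\tau/\sqrt{2})$ is uniformly distributed on $[-1, 1]$:

```python
from math import erf, sqrt
import random

def to_flat(tau):
    # erf(tau/sqrt(2)) = 2*Phi(tau) - 1, with Phi the standard-normal CDF,
    # so a N(0,1) variable is mapped to one uniform on [-1, 1]
    return erf(tau / sqrt(2.0))

random.seed(0)
samples = [to_flat(random.gauss(0.0, 1.0)) for _ in range(100000)]
```

The fit can then use the unconstrained Gaussian parameters $\tau^k_{\mu\nu}$ internally, while the physical Chebyshev matrix elements are guaranteed to satisfy the bound by construction.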

Backus-Gilbert
A different approach to determining the polynomial approximation of the kernel is given by a variant of the Backus-Gilbert method [37] proposed in [22,38]. In this work, we consider a more general scenario that allows the use of different polynomial bases, following [39].
Note that, although what we propose is mathematically equivalent to the approach in [39], our formulation may have the advantage of avoiding some of the numerical technicalities that arise in the original version. Indeed, while the latter requires the inversion of an ill-conditioned matrix with the help of arbitrary-precision arithmetic, our approach relies on the inversion of an equivalent diagonal matrix when an orthogonal polynomial basis is chosen, at least as far as the systematics are concerned. We briefly present the idea below and refer to App. B for a more detailed discussion. Note that we adopt a different notation with respect to the original works (we use $F$ instead of $W$ for the final functional, to avoid confusion with the hadronic tensor).
The central idea is to address the reconstruction of the (smeared) kernel $K^{(l)}_{\sigma,\mu\nu}$ through an expansion of the form
$$K^{(l)}_{\sigma,\mu\nu}(\mathbf{q}, \omega; t_0) \simeq \sum_{k} g^{(l)}_{\mu\nu,k}(\mathbf{q}, \sigma; t_0)\, \bar P_k(\omega)\,,$$
where $g^{(l)}_{\mu\nu,k}(\mathbf{q}, \sigma; t_0)$ is a set of coefficients to be determined and $\bar P_k(\omega)$ is a chosen polynomial basis. In order to compute the coefficients, the strategy is to minimise the functional
$$F^{(l)}_{\mu\nu,\lambda}[g] = (1 - \lambda)\, A^{(l)}_{\mu\nu}[g] + \lambda\, B_{\mu\nu}[g]\,,$$
where $A^{(l)}_{\mu\nu}[g]$ is the $L^2$-norm of the difference between the target kernel function and its reconstruction, weighted with a smooth function $\Omega(\omega)$, and $B_{\mu\nu}[g]$ is the variance of the corresponding channel of $\bar X$. The functional $F^{(l)}_{\mu\nu,\lambda}$ thus encodes the information about both systematic and statistical errors, whose interplay is controlled by the parameter $\lambda \in [0, 1)$, which in principle can be chosen by hand. The values of the coefficients $g^{(l)}_{\mu\nu,k}(\lambda)$ for each $\lambda$ are given by the variational principle, i.e. by solving the linear system obtained from the stationarity of $F^{(l)}_{\mu\nu,\lambda}$ with respect to the coefficients.
We can now devise a method to find the optimal $\lambda^*$. Following [38], we simply evaluate the functional $F^{(l)}_{\mu\nu,\lambda}$ at its minimum, which then becomes a function of $\lambda$, and require that $\lambda^*$ maximises it, i.e. that it realises an optimal balance between statistical and systematic errors. This is the prescription we follow, and we take $g^{*(l)}_{\mu\nu,k} = g^{(l)}_{\mu\nu,k}(\lambda^*)$. Following the steps of the Chebyshev approach, we obtain the corresponding reconstruction of the kernel. In particular, considering the domain $[\omega_0, \infty)$, we focus on two choices:

• exponential Backus-Gilbert: $\bar P_k(\omega) = e^{-k\omega}$ and $\Omega(\omega) = 1$ (and we set $g^{(l)}_{\mu\nu,0} = 0$ by hand, as in the original proposal [22]);

• Chebyshev Backus-Gilbert: $\bar P_k(\omega) = \tilde T_k(\omega)$, i.e. the shifted Chebyshev polynomials, with $\Omega(\omega) = 1/(e^{a(\omega - \omega_0)} - 1)$, the weight that enters the definition of the scalar product as in (A.13).
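For a linear ansatz, the minimisation of $F^{(l)}_{\mu\nu,\lambda}$ reduces to solving a linear system in the coefficients $g_k$. The following toy sketch (mock covariance, exponential basis, illustrative target function and quadrature) shows the structure of the regulated solve; it is not our production implementation:

```python
import numpy as np

def bg_coeffs(omega, target, lam, cov, basis):
    # basis: (n_omega, N) matrix of basis functions P_k(omega)
    # cov:   (N, N) mock statistical covariance entering the variance term
    w = omega[1] - omega[0]              # uniform quadrature weight
    A = basis.T @ basis * w              # overlap matrix of the basis functions
    f = basis.T @ target * w             # projections of the target kernel
    M = (1.0 - lam) * A + lam * cov      # regulated system matrix
    return np.linalg.solve(M, (1.0 - lam) * f)

# toy example: reconstruct a smoothed step with an exponential basis
omega = np.linspace(0.0, 3.0, 300)
target = 1.0 / (1.0 + np.exp((omega - 1.5) / 0.1))
N = 8
basis = np.exp(-np.outer(omega, np.arange(1, N + 1)))  # e^{-k omega}, k = 1..N
cov = 1e-6 * np.eye(N)                                  # mock covariance
g = bg_coeffs(omega, target, lam=0.01, cov=cov, basis=basis)
recon = basis @ g
```

Scanning `lam` and evaluating the minimised functional then allows the balance point $\lambda^*$ to be located, as described above.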

Numerical setup
We perform a pilot study using a $24^3 \times 64$ lattice with 2+1-flavour domain-wall fermion (DWF) [40,41] gauge-field ensembles with the Iwasaki gauge action [42], taken from the RBC/UKQCD Collaboration [43], at lattice spacing $a \approx 0.11$ fm and pion mass $M_\pi \approx 330$ MeV. The correlation functions analysed in this paper have been generated with the Grid [44][45][46] and Hadrons [47] software packages. Part of the fits in the analysis have been performed using lsqfit [48,49].
We use the same simulation parameters as RBC/UKQCD uses in its heavy-light meson projects on exclusive semileptonic $B_{(s)}$-meson decays [35][50][51][52]. In particular, the valence strange quark is simulated using DWF, whereas the valence charm quark is simulated using the Möbius DWF action [29,30]. Their masses are tuned such that mesons containing bottom, charm and strange valence quarks have masses close to the physical ones. The bottom quark has been simulated at its physical mass using the Columbia formulation of the relativistic-heavy-quark (RHQ) action [25,26], which is based on the Fermilab heavy-quark action [24]. In particular, this formulation allows one to reduce $b$-quark discretisation effects of order $O((m_0 a)^n)$, $O(pa)$ and $O((pa)(m_0 a)^n)$ by tuning three parameters nonperturbatively, one of them being the bare mass $m_0$.
For the computation we average over 120 statistically independent gauge configurations, and on each configuration the measurements are performed on 8 evenly spaced source time planes. We use $Z_2$ wall sources [53][54][55] to improve the signal. We induce 10 different momenta in the four-point functions in Eq. (2.20) using twisted boundary conditions [56,57] with the same momentum in all three spatial directions. Considering $\mathbf{q} = 2\pi\boldsymbol{\theta}/L$ in lattice units, we have $\boldsymbol{\theta} \equiv (\theta, \theta, \theta)$, where $\theta$ indicates the twist. We choose the twists such that the momenta are linearly spaced in $\mathbf{q}^2$, $\theta_k = 1.90\,\sqrt{k/9}$, where the factor 1.90 is determined by the value of $q^2_{\max} = 1.83$ in lattice units. We also take $\theta = 1.90 \times 1/9$ and $\theta = 1.90 \times 2/9$ to increase the resolution in $\mathbf{q}^2$ for small momenta. We compute two-point functions for both $B_s$ and $D_s$. As discussed in Sec. 2.3, for $B_s$ we consider three cases at zero momentum, $C^{LS}_{B_s}(t, t_{\rm src})$, $C^{SL}_{B_s}(t, t_{\rm src})$ and $C^{SS}_{B_s}(t, t_{\rm src})$, with different smearing combinations, as indicated by the superscripts "L" (local) and "S" (smeared). The smeared-smeared $C^{SS}_{B_s}(t, t_{\rm src})$ is also used to determine the renormalisation constant together with the three-point functions. The sources are smeared gauge-invariantly via Jacobi iteration [58,59], with the same parameters as in RBC/UKQCD's study of exclusive semileptonic decays [52,60,61].
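The construction of the twists can be sketched as follows, under the assumption that the linear spacing in $\mathbf{q}^2$ is realised by $\theta_k = \theta_{\max}\sqrt{k/9}$ (an assumption on the exact formula; the numbers are those of our $L = 24$ setup used for illustration):

```python
import numpy as np

L, theta_max, n = 24, 1.90, 9
# twists theta_k proportional to sqrt(k), so that the induced momenta
# |q|^2 = 3 (2 pi theta / L)^2 are linearly spaced in k
thetas = theta_max * np.sqrt(np.arange(n + 1) / n)
q2 = 3.0 * (2.0 * np.pi * thetas / L) ** 2   # lattice-unit three-momenta squared
```

Since $|\mathbf{q}|^2 \propto \theta^2 \propto k$, the resulting set of momenta is exactly evenly spaced, with the two extra small twists filling in the low-$\mathbf{q}^2$ region.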
The D s correlators are relevant mainly for the analysis of the ground-state limit in Sec. 4.4. We consider again three different combinations of smearing at source and sink and we induce momenta for the c quark with the available twists. We show the speed of light from the fitted masses of the D s for the smallest momenta, comparing with the continuum dispersion relation and the lattice dispersion relation in Fig. 3. The latter shows excellent agreement with the fitted energies.
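The comparison shown in Fig. 3 can be reproduced schematically by evaluating the continuum relation $E = \sqrt{M^2 + \mathbf{p}^2}$ against the free-field lattice dispersion relation $\sinh^2(E/2) = \sinh^2(M/2) + \sum_i \sin^2(p_i/2)$; the mass value below is an illustrative number in lattice units, not our fitted $D_s$ mass:

```python
import numpy as np

def e_cont(m, p):
    # continuum dispersion relation: E = sqrt(m^2 + |p|^2)
    return np.sqrt(m ** 2 + np.dot(p, p))

def e_lat(m, p):
    # free-field lattice dispersion relation:
    # sinh^2(E/2) = sinh^2(m/2) + sum_i sin^2(p_i/2)
    s = np.sinh(m / 2.0) ** 2 + np.sum(np.sin(np.asarray(p) / 2.0) ** 2)
    return 2.0 * np.arcsinh(np.sqrt(s))

m = 1.0                      # illustrative meson mass in lattice units
p = [0.3, 0.3, 0.3]          # illustrative three-momentum in lattice units
```

At fixed momentum the lattice relation lies below the continuum one, and the gap widens with increasing $|\mathbf{p}|$, which is the qualitative pattern visible in Fig. 3.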
We also compute three-point correlators for the $B_s \to D_s l\nu_l$ process. Following the analysis of [35,50,51], we extract its form factors and compare with our inclusive results. The momentum is carried by the charm quark through twisted boundary conditions, $\mathbf{q} = 2\pi\boldsymbol{\theta}/L$. We use a source-sink separation of $t_{\rm snk} - t_{\rm src} = 20$ in lattice units. The corresponding quark-flow diagram is depicted in Fig. 4. We now move to the four-point correlators defined in Eq. (2.20), which are the building blocks of the computation of inclusive processes. We use the same source-sink separation as for the three-point functions, i.e. $t_{\rm snk} - t_{\rm src} = 20$ in lattice units. The current $J^\dagger_\mu$ is fixed at the time slice $t_2 = t_{\rm src} + 14$, such that the time dependence is enclosed in $0 \le t \le 14$ with $t = t_2 - t_1$. For this choice we find ground-state saturation at the points where we insert the currents. In practice, referring to Fig. 2, the contractions are performed between a $b$-quark propagator $G_b(x_1, x_{\rm src})$ and a sequential propagator $\Sigma_{cbs}(x_1, x_{\rm src})$. For the latter, we first propagate the $s$ quark to the point $x_{\rm snk}$, starting from a $Z_2$ wall source at $t_{\rm src}$; we then use it as a sequential source at fixed $t_{\rm snk}$ with zero momentum to propagate the $b$ quark. The $b$ quark is propagated to the point $x_2$, where it is in turn used as a source, with the gamma-matrix structure of the current $J^\dagger_\mu(x_2)$ and the momentum insertion, to propagate the $c$ quark.
As before, the momentum $\mathbf{q}$ induced through twisted boundary conditions is carried by the $c$ quark. Given that we are dealing with $(V - A)$ currents, we consider all possible current-current combinations $V^\dagger V$, $A^\dagger A$, $A^\dagger V$ and $V^\dagger A$. However, in the limit of massless leptons the combinations $A^\dagger_\mu V_\nu$ and $V^\dagger_\mu A_\nu$ do not contribute to the total decay rate. Indeed, these terms are related to the structure function $W_3$ as $W^{AV}_{ij} + W^{VA}_{ij} = i\epsilon_{ij0k}\, q_k\, W_3$, as can be seen by analysing parity in Eq. (2.7), and $W_3$ does not contribute to the total decay rate for $m_l = 0$.
The local vector and axial-vector currents used in our lattice calculation receive a finite renormalisation. We use the mostly nonperturbative prescription of [62], whereby
$$J_\mu = \rho^{bc}_{V/A}\, \sqrt{Z^{bb}_V\, Z^{cc}_V}\; J_\mu^{\rm bare}\,.$$
The subscript "bare" indicates the bare, unrenormalised heavy-light vector or axial-vector current, and $Z^{cc}_V$ is the vector-current renormalisation constant for domain-wall fermions. Due to the approximate chiral symmetry of domain-wall fermions, $Z^{cc}_V = Z^{cc}_A$ up to residual chiral-symmetry-breaking effects. The renormalisation constants $Z^{bb}_V$ and $Z^{cc}_V$ are computed from the conserved charge of the corresponding heavy-light mesons, defined through ratios of the type $C^{SS}_{B_s}(t_{\rm snk}, t_{\rm src}) / C^{SS}_{B_s B_s, 0}(t_{\rm snk}, t, t_{\rm src})$, where both the two- and three-point functions are zero-momentum projected. The results, $Z^{bb}_V = 9.085(50)$ and $Z^{cc}_V = 0.80099(21)$, are reported in Fig. 5. The coefficient $\rho^{bc}_{V/A}$ is expected to be close to unity and can be computed in perturbation theory. Here we set it to its tree-level value, i.e. $\rho^{bc}_{V/A} = 1$. This is sufficient for the qualitative study presented here, where no attempt is made at taking the continuum limit.
For all the three-point and four-point functions we always average over the spatial directions, given that the momentum is the same in all three directions. Note in particular that for the four-point correlators we have to average separately over $J^\dagger_i J_i$ and $J^\dagger_i J_k$ with $i \neq k$, as can be seen from Eq. (2.7).

Results
In this section we present and discuss the main results of our work. We first discuss how well the kernels $K^{(l)}_{\mu\nu,\sigma}$ are approximated by the polynomials, and then discuss the reconstruction via the Chebyshev and Backus-Gilbert methods. Eventually we combine the various analysis steps into a prediction of the inclusive decay rate. Towards the end of this section we compare our results with the ground-state contribution. We emphasise that the work presented here focuses on a qualitative understanding of the methods, aiming at developing reliable techniques that can be used in future work to make phenomenologically relevant predictions.

Polynomial approximation of the kernel
In this section we discuss the key aspects of the polynomial approximation. The two ingredients to optimise the approximation are the choice of the starting point of the approximation ω 0 , and the value of t 0 in (2.54). In particular, we choose t 0 = 1/2 in lattice units, such that the exponential growth of the term e 2ωt 0 in the kernels (2.33)-(2.37) is minimal, and the number of data points we can use is maximised. We study two values of ω 0 , i.e. ω 0 = 0 and ω 0 = 0.9 ω min for each momentum q 2 . Note that this section deals purely with the approximation of the kernel with no connection to the data; for the Backus-Gilbert method this means that we set λ = 0.
In Fig. 6 we highlight some of the key features of our approach, and in Fig. 7 we show the approximation of the different kernels $K^{(l)}_{\mu\nu,\sigma}$ with $l = 0, 1, 2$. The plots are for the smallest and one of the largest $q^2$ values computed, respectively. Here we illustrate the case of $\sigma = 0.02$, which smooths the step function only mildly. Later we will also discuss the case of larger values of $\sigma$.
Some comments are in order. First of all, we point out that with the current data set the polynomial order $N = 9$ is the maximum value available. This depends on the size of the lattice and on the choice of $t_{\rm src}$, $t_2$ and $t_{\rm snk}$ in the four-point correlator. In particular, setting $a = 1$, the available time slices are $2t_0 \le t < t_2 - t_{\rm src}$, which in our case corresponds to $1 \le t < 14$. On top of that, we need to make sure that $t \ll t_2 - t_{\rm src}$, i.e. $t_1 - t_{\rm src} \gg 0$: the choice $N = 9$ corresponds to a separation $t_1 - t_{\rm src} = 4$. Of course, with an improved data set $N$ could be chosen larger, and the differences between the two approaches would reduce further.
We also notice that the kernel with $l = 0$ is the most delicate to treat, as it is the one that shows the sharpest drop to zero at the threshold. Note also that for the case $l = 0$ we plot only $K^{(0)}_{00}$, as all the other kernels are the same up to a constant factor. Secondly, as shown in Fig. 6 (left), the results for Chebyshev and Backus-Gilbert agree very well and the quality of the approximation is comparable.
The quality of the approximation varies with $\omega_0$: as shown in Fig. 6 (right), starting the approximation as close as possible to $\omega_{\min}$ gives the best result, as the nodes of the interpolation (the points where the target function and its polynomial reconstruction meet) are denser in the allowed phase space in energy (the grey shaded area). This is most evident in the case of large $q^2$, as $\omega_{\min}$ is moved further away from 0. This is then the region where we expect larger deviations in the values of $\bar X^{(l)}(\mathbf{q}^2)$ between the two choices of $\omega_0$. Note also that a value slightly below $\omega_{\min}$ (e.g. $0.9\,\omega_{\min}$) safeguards against statistical fluctuations in the $D_s$-meson mass.

Figure 7. Polynomial approximation of the kernels $K^{(l)}_{\mu\nu,\sigma}(\mathbf{q}, \omega; 2t_0)$, for $l = 0$ (first row), $l = 1$ (second row) and $l = 2$ (third row), with $t_0 = 1/2$ and $\sigma = 0.02$. The left column shows the case of the smallest $q^2 = 0.26$ GeV$^2$, whereas the right column shows one of the largest momenta, $q^2 = 4.77$ GeV$^2$. The grey area corresponds to the kinematically allowed range $\omega_{\min} \le \omega \le \omega_{\max}$ for the given $q^2$. The solid lines show the target function; the dashed lines show the approximation with the Chebyshev approach, whereas the dotted ones show the approximation with Backus-Gilbert with an exponential basis and $\lambda = 0$.

Chebyshev polynomials and Backus-Gilbert in practice
We now discuss the quality of the data analysis as outlined in Sec. 2.4. Focusing first on the Chebyshev-polynomial approach, the correlator data are traded for the fitted Chebyshev matrix elements, where the coefficients $\tilde a^{(k)}_j$ are given by the power representation of the Chebyshev polynomials, see App. A. Following (2.58), the kernel can then be written in terms of the fitted Chebyshev matrix elements. An example of the Chebyshev matrix elements obtained from the fits can be seen in Fig. 8, where we compare two different extractions according to the starting point of the approximation ω_0. The plots show the distribution of each order of the Chebyshev matrix elements obtained through the fitting procedure described in App. C: each histogram collects the values obtained for all 1000 bootstrap bins. We show the axial channel A_iA_i, as its signal turns out to be particularly clean. In Fig. 9 we show results for the A_iA_j channel with i ≠ j, which is found to be the noisiest channel. Here, only a few terms can be determined meaningfully by the lattice data; higher-order terms simply follow the flat prior distribution in [−1, 1].
In both cases we observe that a larger number of Chebyshev matrix elements can be determined meaningfully for ω_0 = 0 than for ω_0 = 0.9 ω_min. For example, in the A_iA_i channel the distribution of the former is close to the prior distribution, which is flat between −1 and +1, for N = 9, whereas the latter starts flattening at N ≈ 7. A possible explanation is as follows: as can be seen from (A.27), $\tilde a^{(k)}_j|_{\omega_0=0} = e^{-0.9\,\omega_{\rm min} k}\, \tilde a^{(k)}_j|_{\omega_0=0.9\,\omega_{\rm min}}$. The additional exponential factor largely cancels the ground-state exponential decay of the correlation function in Eq. (4.1). Hence, the polynomial approximation has less structure to describe and higher-order terms become less relevant. Nevertheless, in both cases the χ² values of the fits are acceptable, and the reconstruction of the data as in Eq. (4.1) gives comparable results.
We now move to the Backus-Gilbert case, for which we have so far only considered the limit λ = 0. In this limit the coefficients of the polynomial approximation are determined without reference to the data. We then consider the case λ ≠ 0 and, by visual inspection of Fig. 10, find that the polynomial approximation of the kernel function gets worse. The effect of non-zero λ can be understood as a correction to the optimal coefficients, as outlined in Sec. B.2. In particular, we can rewrite the coefficients as $g^{*(l)}_{\mu\nu,k} = \gamma^{(l)}_{\mu\nu,k} + \epsilon^{(l)}_{\mu\nu,k}$, where $\epsilon^{(l)}_{\mu\nu,k}$ is a correction which takes care of reducing the noise coming from the statistical error.

Figure 8. Histogram of the Chebyshev matrix elements T^k_{A_iA_i} for k = 1, 2, . . . , N with N = 9 for two values ω_0 = 0 (blue) and ω_0 = 0.9 ω_min (orange) at q² = 0.26 GeV². The matrix element T^0_{A_iA_i} = 1 by definition and is therefore not shown. This channel is one of the most precise: we find that in both cases the fitting procedure is able to determine the matrix elements up to order N ≈ 7, after which the distribution of the bootstrap bins remains flat.

The inclusive decay rate
In this section we present the main results of our work. In Fig. 11 we show the results for X(q²) for all the simulated values of q². For each simulation point we show the results of the three studied approaches, i.e. Chebyshev polynomials, exponential Backus-Gilbert and Chebyshev Backus-Gilbert, each for both ω_0 = 0 and ω_0 = 0.9 ω_min. We find that all sets of three points for a given value of ω_0 agree very well. However, sets with different ω_0 start deviating as we increase the value of q². As discussed in the previous section, this can be understood in terms of the polynomial approximation of the kernel: as q² increases, the phase space in ω shrinks, and the two approximations differ more and more. Our data indicate that the approximation improves as ω_0 → ω_min. In order for the approximations at different ω_0 to be comparable, the order of the polynomial needs to be increased for lower ω_0. It is also conceivable that other systematics like finite-volume or cutoff effects play a role here. These effects are beyond the scope of this work but will have to be addressed in future studies.
In the previous section we have seen that the shape of the kernel, and hence the quality of the approximation, varies substantially for different l and q². The degree to which this impacts the combined result X(q²) depends on the magnitude of each contribution, as illustrated in Fig. 12. The plots indicate that the largest contribution originates from the channel with l = 2. The underlying kernel is, at least for smaller values of q², relatively smooth (Fig. 7). We therefore expect less sensitivity to the systematics of the polynomial approximation in this kinematical region, but more care is needed for larger q².

Figure 9. Histogram of the Chebyshev matrix elements T^k_{A_iA_j} with i ≠ j for k = 1, 2, . . . , N with N = 9 for two values ω_0 = 0 (blue) and ω_0 = 0.9 ω_min (orange) at q² = 0.26 GeV². The results for T^k_{A_iA_j} are less well constrained than the ones for A_iA_i shown in Fig. 8. The minimum of the χ² is determined almost entirely by the uniform priors.
We now address the stability with respect to the order of the polynomial N. Starting from the Chebyshev approach, we study the saturation in Fig. 13. We start from the fit with N = 9. The plot shows the result where the first k Chebyshev matrix elements (cf. legend) are taken from the fit, and the remaining N − k are replaced by a flat distribution $-1 \le T^j_{\mu\nu} \le 1$ with j = k + 1, . . . , N. We can see that the signal is dominated by the low orders; for ω_0 = 0 the signal is saturated at around N ≈ 5, whereas for ω_0 = 0.9 ω_min saturation sets in already at N ≈ 3. This is also compatible with the previous discussion of the fits of the Chebyshev matrix elements, cf. Fig. 8 and Fig. 9.
In order to estimate higher-order contributions, which are not constrained by our data, we study how the results change after adding more terms to the Chebyshev expansion on top of the N = 9 available. In this way we obtain an estimate of the approximation up to N = 50, as shown in Fig. 14. Higher-order terms contribute to the final error only mildly: these observations suggest that the results obtained do not suffer from a large systematic error from the polynomial approximation. A more complete study is however required for a reliable estimate of the underlying systematic effects.

Figure 10. Approximation of the kernels K^(l)_{µν,σ}(q, ω; 2t_0), for l = 0 (first row), l = 1 (second row) and l = 2 (third row) with t_0 = 1/2 and σ = 0.02, in the case of Backus-Gilbert with exponential basis and λ ≠ 0. The value of λ has been chosen to be λ* for each plot, which gives equal weight to the statistical and systematic errors.
Concerning the Backus-Gilbert method, we investigate the stability around the chosen value of λ*, obtained with the prescription of Sec. 2.4.2. We focus in particular on the channel $X^{(2)}_{A_iA_i}$, as it is the one responsible for the largest contribution. The plot is shown in Fig. 15. We can see that for small q² the value of X(q²) is stable, which implies that statistical and systematic errors are well balanced. For larger q² the situation is more delicate: this can be understood in terms of the reduced phase space in ω, as shown for example in Fig. 10. A first attempt at mitigating the induced systematic effect could be to identify the region where the two Backus-Gilbert approaches with different bases are consistent, to identify (where possible) a plateau, and to estimate a value inside such a region.

Figure 14. Saturation of the Chebyshev-polynomial approach, where N = 9 is the reference case, and for N = 50 higher-order terms are sampled from a Z_2 distribution.
In the r.h.s. plot of Fig. 15 we see, however, that this is not always the case: there is no clear plateau region in λ. Interestingly, the statistical error of the Chebyshev approach turns out to be more conservative in this case, and compatible with the result one would obtain from Backus-Gilbert. More generally, apart from the absence of a plateau region in some cases, both choices of polynomial basis are consistent with each other and with the Chebyshev-polynomial approach. Coming back to the decay rate, to extract the final result we perform a polynomial fit of degree two on $X^{(l)}(q^2)/(\sqrt{q^2})^{2-l}$. The final result is then obtained by integrating these fits over the physical range in q. Since this is a qualitative study, we do not report any final number; however, the result obtained here is in the right ballpark when compared with the B_s-meson decay rate. Furthermore, all the approaches give compatible results, and the final statistical error is of order 5%.

Figure 15. Scan over λ for q² = 0.26 GeV² (left) and q² = 4.77 GeV² (right) for the Backus-Gilbert method with exponential and Chebyshev basis for $X^{(2)}_{A_iA_i}$ with ω_0 = 0.9 ω_min. The green shaded line is the reference from the Chebyshev approach; the magenta points correspond to the choice of λ*. Note that the points λ = 0 and λ = 1 (vertical grey dashed lines) are not included in this plot.
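The last step, a degree-2 polynomial fit in q² followed by an integration over the physical range, can be illustrated with a small numpy sketch. The data points below are invented stand-ins, not the paper's results, and the integration range simply spans the simulated q² values:

```python
import numpy as np

# hypothetical data points (q^2 in GeV^2, X values in arbitrary units)
q2 = np.array([0.26, 1.0, 2.0, 3.0, 4.0, 4.77])
X = 1.0 + 0.5 * q2 - 0.08 * q2**2        # stand-in for lattice results

# degree-2 polynomial fit, converted to the standard basis, then integrated
p = np.polynomial.Polynomial.fit(q2, X, deg=2).convert()
P = p.integ()                            # antiderivative
rate = P(q2[-1]) - P(q2[0])
print(f"integrated result: {rate:.4f}")
```

In a real analysis the fit would be performed on bootstrap samples so that the error propagates into the integrated result.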
In both cases, δX(q²) can be interpreted as a noisy zero that does not impact the naive calculation but helps with variance reduction. This is represented in Fig. 16, which shows the statistical error on X with and without the correction term. The reduction in statistical error is substantial. Additionally, the magnitude of the correction depends on ω_0, with larger values resulting in a greater increase of |δX(q²)| as q² increases.
To conclude this section we discuss some of the aspects we neglected for the purpose of this study. In particular, all the results presented here have been obtained with kernels smeared by a sigmoid with a fixed σ = 0.02. Eventually, however, one will first have to take the infinite-volume and continuum limits, followed by an extrapolation σ → 0. As an example, we show the σ dependence at finite lattice spacing and volume in Fig. 17. There, one sees that for our setup and statistical precision the dependence on σ is mild. There is an indication that it might be more pronounced for larger q². We argue that the extrapolation in σ is quite delicate and could lead to misleading results. Indeed, increasing values of σ would result in kernel functions quite different from the target ones; on the other hand, differences at small values of σ will not be captured by a polynomial approximation with small N, as small deviations would be noticeable only at higher degrees of approximation.

Figure 16. Effect of the variance reduction on X_naive(q²) from the correction δX(q²) as in Eq. (4.4) for ω_0 = 0.9 ω_min and N = 9. The y axis shows the standard deviation σ_X for X_naive(q²) (empty symbols) and X(q²) = X_naive(q²) + δX(q²) (filled symbols).

The inclusive decay rate in the ground-state limit
We now study the ground-state limit of the inclusive approach as discussed in Sec. 2.2, which provides a cross-check of the inclusive-decay analysis strategies. The four-point function representing the ground state can be constructed with input from lattice data for the exclusive decay B_s → D_s lν_l. In particular, restricting the discussion to the vector channel V V, the ground-state correlator can be constructed from lattice data for the ratio of three-point and two-point functions
$$R_{B_sD_s,\mu}(t; \mathbf{q}) = \sqrt{4 M_{B_s} E_{D_s}\, \frac{C^{SS}_{B_sD_s,\mu}(\mathbf{q}, t_{\rm snk}, t, t_{\rm src})\, C^{SS}_{D_sB_s,\mu}(\mathbf{q}, t_{\rm snk}, t, t_{\rm src})}{C^{SS}_{B_s}(t_{\rm snk}, t_{\rm src})\, C^{SS}_{D_s}(\mathbf{q}, t_{\rm snk}, t_{\rm src})}}\,, \qquad (4.6)$$
which converges to $M_\mu \equiv \langle D_s|V_\mu|B_s\rangle$ for $t \gg t_{\rm src}$ and $t < t_{\rm snk}$. The matrix element can be decomposed into form factors as in Eq. (4.7). Recalling that we assume $\mathbf{p}_{B_s} = 0$, we then extract $f_+(q^2)$ from a constant fit to the combination which converges to $f_+(q^2)$ as $R_{B_sD_s,\mu}(t;\mathbf{q}) \to M_\mu$. We consider only the three smallest momenta to test the approach, as the signal-to-noise ratio deteriorates rapidly with larger q². The result of the inclusive analysis for the channel X_{VV} is reported in Fig. 18. In particular, we compare the expected value (2.18) from the extracted values of f_+(q²) with the inclusive analysis performed using the mock data C^G_{µν} and the real data C_{µν}. Note that for the mock data the normalised correlator corresponds simply to $\bar C^G_{\mu\nu}(t) = e^{-E_{D_s} t}$ by construction.

Figure 18. Ground-state limit. The "exclusive" labels refer to the data built from the three-point correlators as in Eq. (4.5), whereas the "inclusive" label refers to the full inclusive data analysis starting from the four-point correlation functions. The analysis has been performed using the Chebyshev approach.
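The constant (plateau) fit used to extract f_+(q²) from the ratio combination can be sketched as an uncorrelated weighted average; the plateau data below are synthetic, with invented central value and errors:

```python
import numpy as np

rng = np.random.default_rng(1)

def constant_fit(y, err):
    """Weighted average (uncorrelated constant fit) of plateau data."""
    w = 1.0 / np.asarray(err) ** 2
    value = np.sum(w * y) / np.sum(w)
    error = 1.0 / np.sqrt(np.sum(w))
    return value, error

# hypothetical plateau for the ratio combination converging to f_+(q^2)
true_fplus = 0.85
err = np.full(10, 0.01)
data = true_fplus + rng.normal(0.0, 0.01, size=10)

fplus, dfplus = constant_fit(data, err)
print(f"f_+ = {fplus:.4f} +/- {dfplus:.4f}")
```

A correlated fit would replace the diagonal weights with the inverse covariance matrix of the plateau points.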
We find excellent agreement between the results from the conventional analysis of the exclusive decay on the one side, and the one based on ground-state saturation, but using the full analysis chain adopted for the inclusive decay, on the other side. This provides a strong test of the analysis method for inclusive decays discussed in this paper. The results for the full inclusive decay, on the other hand, differ significantly from the exclusive case: while future studies will have to establish to what extent this could be down to systematics like finite-volume or cutoff effects, the magnitude of the effect makes it appear likely that it is to a large part due to contributions from the tower of states contributing to the inclusive decay. In particular, the deviation is expected to be larger for smaller q², as the available phase space in ω is larger and may include more excited states.

Conclusions and outlook
In this work, we have presented a full and flexible setup for studying inclusive semileptonic decays in lattice QCD, focusing in particular on B (s) mesons. We incorporate and compare Chebyshev polynomials and the Backus-Gilbert method, both of which enable efficient and accurate calculations of the total decay rate. In particular, we improved the Chebyshev polynomial technique through the use of a generic set of shifted polynomials in e −ω , and we refined the statistical analysis with a bootstrap method, fully accounting for the bounds [−1, 1]. We also showed how the result depends on the number of Chebyshev matrix elements and presented a possible way to take the limit N → ∞ to address the systematics associated with the polynomial approximation. On the Backus-Gilbert side, we introduced a generalisation of the method of [22] to allow for the use of arbitrary bases of polynomials.
The two methods have been shown to be compatible, and the final results for the decay rate are in agreement. We compared how the two techniques deal with the variance reduction of the final observable: the Chebyshev-polynomial approach relies on trading the data for Chebyshev matrix elements that fully account for the bounds, whereas the Backus-Gilbert method achieves the same goal by modifying the coefficients of the polynomial approximation to reduce the statistical error. We also studied the ground-state limit, which offered a cross-check of the inclusive analysis technique and outlined the effect of excited states in the inclusive decay with respect to the corresponding exclusive process B_s → D_s lν_l.
Overall, our work provides a solid foundation for future studies with these techniques. However, there are still several areas that require further investigation, including systematic errors associated with the polynomial approximation, finite-volume effects, discretisation errors, and the continuum limit. We intend to address these issues in future work, repeating the computations on more ensembles and also addressing similar processes involving D_s mesons, which offer a more controlled environment. Additionally, we plan to explore alternative observables such as hadronic and lepton moments to compare with experimental data, and to gain a deeper understanding of the ground-state limit, which may provide useful insight into the physics contributing to such processes.

A Chebyshev polynomials
We summarise here important properties of the standard Chebyshev polynomials relevant for this work, and in particular the generalisation to the shifted version used extensively in the analysis. We refer to other sources [63] for more details.

A.1 Standard polynomials
The standard Chebyshev polynomials of the first kind are defined as $T_n(x) = \cos(n \arccos x)$ for $x \in [-1, 1]$. They are orthogonal with respect to the scalar product $\langle f, g \rangle = \int_{-1}^{1} \mathrm{d}x\, \Omega(x)\, f(x)\, g(x)$, where $\Omega(x) = 1/\sqrt{1 - x^2}$ is a weight function. Their polynomial expansion in $x^k$ is given by the power representation, where the prime indicates that the first term of the sum is halved.

A.1.1 Expansion in Chebyshev polynomials
Chebyshev polynomials provide the best approximation of a function $f : [-1, 1] \to \mathbb{R}$ to any given order N in terms of the $L_\infty$-norm. In other words, the minimax error, i.e. the maximum difference between the target function and the reconstructed one, is minimised. In particular, for the functions considered in this work it is guaranteed that the Chebyshev approximation converges as $N \to \infty$. The polynomial approximation reads $f(x) \approx \frac{c_0}{2} + \sum_{k=1}^{N} c_k T_k(x)$, where we recall that $T_0(x) = 1$ by definition. The coefficients are given by the projection of the target function f onto the basis of Chebyshev polynomials.
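As a minimal numerical illustration, the sketch below computes the expansion coefficients by projection (via Gauss-Chebyshev quadrature) and evaluates the truncated series with the first term halved, matching the convention above; the test function exp is an arbitrary smooth choice:

```python
import numpy as np

def cheb_coeffs(f, N, M=200):
    """Chebyshev coefficients of f on [-1, 1] via Gauss-Chebyshev quadrature:
    c_k = (2/M) * sum_m f(x_m) cos(k theta_m), with x_m = cos(theta_m)."""
    theta = np.pi * (np.arange(M) + 0.5) / M
    x = np.cos(theta)
    return np.array([2.0 / M * np.sum(np.cos(k * theta) * f(x))
                     for k in range(N + 1)])

def cheb_eval(c, x):
    """Evaluate c_0/2 + sum_{k>=1} c_k T_k(x), using T_k(x) = cos(k arccos x)."""
    k = np.arange(len(c))
    T = np.cos(np.outer(np.arccos(x), k))
    return T @ c - 0.5 * c[0]          # subtracting c_0/2 halves the k=0 term

f = np.exp                              # smooth test function
c = cheb_coeffs(f, N=10)
x = np.linspace(-1.0, 1.0, 101)
err = np.max(np.abs(cheb_eval(c, x) - f(x)))
print(f"max |error| for N = 10: {err:.2e}")
```

For analytic functions such as exp the coefficients decay geometrically, so even N = 10 already gives an approximation accurate to many digits.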

A.2 Shifted Chebyshev polynomials
In general, for the purpose of this work we consider generic functions f(x) defined on an interval [a, b], which we want to approximate with Chebyshev polynomials in $e^{-x}$. To this end we can define shifted polynomials $\tilde T_n(x)$ with x ∈ [a, b], such that their domain matches that of the target function. The relation to the standard polynomials is given by $\tilde T_n(x) = T_n(h(x))$, where h maps the domain of the target function onto [−1, 1]. The orthogonality relation for the shifted polynomials takes the same form, where $\Omega_h(x)$ is the new weight for the shifted $\tilde T_k$, which depends on the map h. To show that this recovers the original integral in Eq. (A.2), we set $x = h^{-1}(y)$ and change variables accordingly. We can also generalise the polynomial expressions and their properties, cf. (A.14). We can expand this sum explicitly and re-sum it in order to isolate the coefficients of $e^{-kx}$. In a similar way we can generalise the power representation: starting from $\tilde p_0 = 1$, we can work out iteratively the general expression for $e^{-nx}$. We can finally collect the numerical coefficients and rewrite everything in terms of the shifted Chebyshev polynomials. The set of coefficients $\{\tilde a_n\}$ can easily be found numerically for each value of n.

A.2.1 Expansion in Chebyshev polynomials with exponential map
We now have all the elements necessary to proceed with the polynomial approximation of a generic function in $e^{-x}$. For the purpose of this work we will restrict ourselves to the case $f : [x_0, \infty) \to \mathbb{R}$. The coefficients of the approximation can be rewritten more explicitly by setting $y = h(x)$ and inverting the map; this fixes the coefficients A and B of the map in terms of $x_0$.

A.2.2 Matrix relations
In this subsection we illustrate some useful properties that arise when setting $x_0 \neq 0$, assuming the domain of the target function to be $[x_0, \infty)$. We explicitly consider $B = 1$ and $A = -2e^{x_0}$ to simplify the treatment, but the following discussion can be generalised trivially. We start by expressing (A.15) in matrix notation. These $(n+1) \times (n+1)$ matrices $\tilde t$, with $(\tilde t)_{ij} = \tilde t^{(i)}_j$, can be written in terms of the matrices A, P and t, where $A_{kk} = A^k = (-2e^{x_0})^k$ is a diagonal matrix, $P_{jk} = \binom{j}{k}$ is the lower-triangular Pascal matrix, and t follows from (A.4). This expression makes it easy to see the effect of $x_0$ through the diagonal matrix A.

B Generalised Backus-Gilbert
In this appendix we reformulate and generalise the modified Backus-Gilbert approach proposed in [22,38,39]. The idea is to provide a more general framework which allows for the use of an arbitrary basis and to explore the properties and numerical advantages of different choices.

B.1 The method
The problem we want to address is the evaluation of a generic observable O of the form of a spectral integral, where K(ω) is a function we will refer to as the kernel and ρ(ω) is the spectral function related to a given correlation function. While typically the range of integration is a = 0 and b = ∞, here we choose to leave it generic to keep the discussion general. The idea is to approximate the kernel by a polynomial in $e^{-\omega}$ up to some degree N, i.e. $K(\omega) \approx \sum_{j=0}^{N} g_j e^{-\omega j}$, such that the target observable can be estimated from the correlator data. For example, a typical problem consists in the extraction of the spectral density of a correlator, in which case one would consider the kernel to be a smeared Dirac delta $K(\omega) = \delta_\sigma(\omega)$ with a finite width σ, for example a Gaussian. The approach consists of weighting two functionals A[g] and B[g] against each other, where the first provides a measure for the systematic effects coming from the polynomial approximation, and the second provides a measure for the variance $\sigma^2_O$ of the observable O. This is equivalent to solving a minimisation problem with constraints. We can then define a new functional $F_\theta[g] = A[g] + \theta^2 B[g]$ and determine the coefficients by the variational principle $\partial F_\theta[g]/\partial g_j = 0$ at different values of θ². The value θ² = 0 corresponds to addressing exclusively the polynomial approximation, as prescribed by the choice of A[g], whereas the limit θ² → ∞ would correspond to dealing purely with the variance minimisation and would result in $g_j = 0$. Note that we can map $\theta^2 = \lambda/(1-\lambda)$ for simplicity, such that λ ∈ [0, 1) and θ² → ∞ for λ → 1. Furthermore, any relative normalisation between the two functionals can be reabsorbed into θ². Depending on the choice of the basis, the coefficients $g_j$ may span several orders of magnitude and numerical instabilities may appear; this can be addressed in practice by using arbitrary-precision arithmetic.
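A discretised toy version of this minimisation can be sketched in a few lines: build the Gram matrix of the basis $e^{-j\omega}$ on a grid (the A-functional), add θ²B for a given covariance, and solve the resulting linear system. The kernel and covariance below are invented placeholders, not the functions used in the paper:

```python
import numpy as np

def bg_coefficients(kernel, N, theta2=0.0, cov=None, a=0.0, b=1.0, M=400):
    """Coefficients g_j of K(w) ~ sum_{j=0}^{N} g_j e^{-j w}, obtained by
    minimising a discretised version of A[g] + theta2 * B[g] on [a, b]."""
    w = np.linspace(a, b, M)
    E = np.exp(-np.outer(w, np.arange(N + 1)))  # basis functions on the grid
    A = E.T @ E                                 # discretised A-functional
    f = E.T @ kernel(w)
    B = cov if cov is not None else np.zeros_like(A)
    return np.linalg.solve(A + theta2 * B, f)

K = lambda w: w * (1.0 - w)     # smooth stand-in for the smeared kernel
g = bg_coefficients(K, N=4)
w = np.linspace(0.0, 1.0, 201)
approx = np.exp(-np.outer(w, np.arange(5))) @ g
print(f"max |K - approx| at theta2 = 0: {np.max(np.abs(approx - K(w))):.2e}")

# a non-zero theta2 with a non-trivial covariance shrinks the coefficients
g_reg = bg_coefficients(K, N=4, theta2=100.0, cov=np.eye(5))
print(f"|g| = {np.linalg.norm(g):.3f}, |g_reg| = {np.linalg.norm(g_reg):.3f}")
```

The shrinking of |g| with growing θ² is the mechanism by which the method trades approximation quality for statistical noise reduction; in a real analysis B would be the covariance matrix of the correlator data rather than the identity.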
We now discuss in detail how to generalise the modified Backus-Gilbert method [22] to a generic basis of functions, starting from the construction of A[g]. Following the original paper, we can generalise the $L_2$-norm of the difference between the target function and the polynomial reconstruction using an arbitrary family of basis functions $P_k(x) = \sum_{j=0}^{k} p^{(k)}_j x^j$ defined on an interval $x \in [p_-, p_+]$. As for the Chebyshev polynomials, we will in general deal with a shifted version of this family of polynomials in $e^{-x}$, defined on a generic interval [a, b]. With respect to the original version, we have now introduced a generic weight Ω(ω); note that we start the approximation at $\tilde P_0(\omega)$ (as long as Ω(ω) can be integrated over [a, b]).
If we consider only the A[g] term, the solution obtained by the variational principle is $g = A^{-1}K$, with $A_{ij} = \int \mathrm{d}\omega\, \Omega(\omega)\, \tilde P_i(\omega)\, \tilde P_j(\omega)$ (B.8) and $K_i = \int \mathrm{d}\omega\, \Omega(\omega)\, \tilde P_i(\omega)\, K(\omega)$ (B.9), where g is the vector of parameters. With this setup, the convenient choice consists in picking a set of (shifted) orthogonal polynomials with Ω being the weight that defines the scalar product. The advantage is immediately clear, as the matrix A becomes diagonal, $A_{ij} = \langle \tilde P_i, \tilde P_j \rangle$ (B.11), and the coefficients are given by the projections $\int \mathrm{d}\omega\, \Omega(\omega)\, \tilde P_i(\omega)\, K(\omega)$ (B.12). Since the matrix A is now diagonal, the inverse required to compute Eq. (B.7) is analytically known. Furthermore, the solution is now equivalent to the projection onto the polynomial basis.
We can now include the B term, i.e. the covariance matrix of the data. Note that in general we now need to consider a linear combination of the correlator at different time slices according to the polynomial basis, $C_P(j) = \sum_k \tilde p^{(j)}_k\, C(k)$. The full functional then takes the same form as before, with $B_{ij} = \sigma^P_{ij}$ the covariance matrix of these combinations. If A is diagonal (and possibly proportional to the identity), the inversion of the matrix $F_\theta$ may be better conditioned, and possible numerical instabilities arising from an ill-conditioned matrix A may be avoided.
On top of that, we can also implement constraints that the approximation has to fulfil. In particular, following what was done for the spectral function in [22,38], we can require that the polynomial approximation preserves the (weighted) area of the target function. Taking these constraints into account, the solution is modified accordingly (B.21). The final observable then reads $O_\theta \approx \sum_{j=0}^{N} g_{\theta,j}\, C_P(j)$ (B.22) for a given value of θ. The choice of θ is in principle arbitrary. A common choice is to take the value θ* that gives equal weight to the A and B functionals, $A[g_{\theta^*}] = B[g_{\theta^*}]$, i.e. equal weight to statistical and systematic errors. For a given choice of θ, it is important to make sure that the value of the final observable is stable under small changes in θ, in order to make sure that the procedure did not introduce any bias.
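The prescription $A[g_{\theta^*}] = B[g_{\theta^*}]$ amounts to a one-dimensional root-finding problem in λ (or θ). A generic bisection sketch, with toy monotone stand-ins for the two functionals (the real A and B would be evaluated from the lattice data at each λ):

```python
def lambda_star(Afun, Bfun, lo=0.0, hi=0.999, tol=1e-10):
    """Bisect for the value lambda* with A[g_lambda*] = B[g_lambda*],
    assuming A - B is monotone and changes sign on [lo, hi]."""
    f = lambda lam: Afun(lam) - Bfun(lam)
    assert f(lo) * f(hi) < 0, "no sign change on the given interval"
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(lo) * f(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

# toy stand-ins: the approximation error grows and the variance shrinks
# with lambda, so the two curves cross at a unique lambda*
lam = lambda_star(lambda l: l ** 2, lambda l: (1.0 - l) ** 2)
print(f"lambda* = {lam:.6f}")
```

The stability check described in the text then corresponds to evaluating the observable on a grid of λ values around λ* and verifying the absence of a trend.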

B.2 A different perspective
The previous reformulation in Sec. B.1 allows us to rely on arbitrary polynomials for the approximation. In this general picture it is useful to take a different perspective on the method: we can reduce the problem to finding a suitable correction to the optimal coefficients, i.e. $g_j = \gamma_j + \epsilon_j$, where $\gamma_j$ are the coefficients of the polynomial approximation coming purely from the functional A[γ], i.e. $\gamma = A^{-1}K$ as in Eq. (B.7), and $\epsilon_j$ is a correction that takes the data into account. We can then rewrite the functional accordingly, which is equivalent to the previous approach. It is then clear that the $\epsilon_j$ are by construction coefficients that should not modify the quality of the polynomial approximation but take care of the reduction of the statistical noise. In practice, this will of course depend on the choice of θ.

C Fit strategy
We discuss the general strategy for the Bayesian fits used in the analysis. We consider only linear fits, as these are the ones directly relevant for this work. To keep the discussion very general we consider a linear model of the form
$$y(p, x) = \sum_{\alpha=1}^{M} p_\alpha X_\alpha(x)\,, \qquad p = (p_1, p_2, \cdots, p_M)\,, \qquad (C.1)$$
where the $X_\alpha(x)$ are known coefficients (which in principle can depend on x) and the $p_\alpha$ are the M parameters we want to determine.

C.1 MAP with bounds
We address the fits using Bayesian statistics, in particular a maximum a posteriori (MAP) probability estimate, which relies on an augmented χ² with Gaussian priors. On top of that, we implement generic bounds on the parameters. We do this by "wrapping" the parameters in a function $p_\alpha = f(\pi_\alpha)$ which encodes the desired bounds. In this case, the fit is performed on the new parameters $\pi_\alpha$, and the prior is introduced accordingly: the augmented χ² contains a Gaussian prior term for each internal parameter $\pi_\alpha$. This allows us to deal with a more generic distribution for the parameters $p_\alpha$, depending on the shape of the wrapping function f. The parameters are found as usual by imposing $\partial \chi^2_{\rm aug}/\partial \pi_\gamma = 0$; note that in this case the problem is no longer linear due to the presence of f.
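A one-parameter toy version of this bounded MAP fit: the physical parameter is wrapped as p = tanh(π) ∈ (−1, 1), a Gaussian prior acts on the internal parameter π, and the augmented χ² is minimised here by a simple grid scan (the model, data, and prior widths are all invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)

# toy data for a one-parameter linear model y = p * x, with p bounded
x = np.linspace(0.0, 1.0, 20)
y = 0.6 * x + rng.normal(0.0, 0.05, size=x.size)
err = np.full_like(x, 0.05)

def chi2_aug(pi, prior_mu=0.0, prior_sigma=1.0):
    """Augmented chi^2: Gaussian prior on the internal parameter pi,
    while the physical parameter p = tanh(pi) stays inside (-1, 1)."""
    p = np.tanh(pi)
    chi2 = np.sum(((y[None, :] - p[:, None] * x[None, :]) / err) ** 2, axis=1)
    return chi2 + ((pi - prior_mu) / prior_sigma) ** 2

# MAP estimate by a dense scan over the internal (unbounded) parameter
grid = np.linspace(-5.0, 5.0, 20001)
pi_hat = grid[np.argmin(chi2_aug(grid))]
print(f"p = tanh(pi_hat) = {np.tanh(pi_hat):.3f}")
```

In a multi-parameter fit the grid scan would be replaced by a standard non-linear minimiser; the wrapping makes the problem non-linear even for a linear model, exactly as noted in the text.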

C.2 MAP with bootstrap
As outlined in the sections above, the presence of a "wrapping" function for the parameters $p_\alpha$ implies that their distribution is in general non-Gaussian. This is obvious from the fact that we assume the internal parameters $\pi_\alpha$ to be Gaussian and that the wrapping function implements bounds, therefore limiting the domain of $p_\alpha$. Instead of fitting the central value of the data and estimating the error from the inverse of the curvature matrix (the Hessian of the χ² with respect to the parameters), it is then more convenient to adopt a bootstrap approach, so that the procedure automatically takes into account any deviation from Gaussianity. In practice, one then fits all the bootstrap bins and reconstructs the distribution of the parameters, treating the error accordingly.
The approach we adopt consists in assuming a normal distribution for the internal parameters $\pi \sim N(\mu, \sigma)$, such that $p_\alpha = f(\pi_\alpha)$ is distributed according to our prior knowledge of the parameters. In practice, considering a set of $N_b$ bootstrap bins with corresponding data $y^b_i$, we perform $N_b$ fits to the data, where each time we use a different prior value $\pi^b_\alpha$ sampled from the normal distribution $N(\mu, \sigma)$. This ensures that the correct prior is assumed for $p_\alpha$. For example, in the case where the data contain little information and $\min(\chi^2_{\rm aug}) \approx \min(\chi^2_{\rm prior})$, the fit returns the prior information we encoded by hand.
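A toy sketch of this bootstrap strategy: each bin is fitted with a prior centre freshly drawn from N(µ, σ), and the distribution of the wrapped parameter p = tanh(π) is collected. All numbers below (bin counts, errors, true value) are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(5)

def fit_one_bin(y_b, pi_prior, prior_sigma=1.0, err=0.1):
    """MAP fit of a constant p = tanh(pi) to one bootstrap bin, with the
    prior centre pi_prior sampled anew for every bin (grid-scan sketch)."""
    grid = np.linspace(-4.0, 4.0, 4001)
    chi2 = np.sum(((np.tanh(grid)[:, None] - y_b[None, :]) / err) ** 2, axis=1)
    aug = chi2 + ((grid - pi_prior) / prior_sigma) ** 2
    return np.tanh(grid[np.argmin(aug)])

# 500 toy bootstrap bins of 8 data points fluctuating around p_true = 0.5
bins = 0.5 + rng.normal(0.0, 0.1, size=(500, 8))

# one fit per bin, each with a freshly sampled prior value
dist = np.array([fit_one_bin(y_b, rng.normal(0.0, 1.0)) for y_b in bins])
print(f"p = {dist.mean():.3f} +/- {dist.std():.3f}")
```

The spread of the fitted values across bins directly provides the (possibly non-Gaussian) error estimate, with no reference to the curvature matrix.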