The photon energy spectrum in $B\to X_s\gamma$ at N$^3$LL$'$

We present predictions for the photon energy spectrum in inclusive $B\to X_s\gamma$ decays mediated by the electromagnetic penguin operator $O_7$ to N$^3$LL$'$. We use soft-collinear effective theory (SCET) to resum the singular contributions in the peak region at large photon energy. In the tail region the resummed predictions are matched to fixed order at N$^3$LO, where we include the known fixed-order contributions for $O_7$ up to $\mathcal{O}(\alpha_s^2)$. We develop a method to suitably parametrize the still unknown $\mathcal{O}(\alpha_s^3)$ nonsingular corrections in terms of theory nuisance parameters, whose variations provide an estimate of the associated theory uncertainty. In this context, we also study different ways to treat higher-order cross terms in the matching. Another important aspect of our analysis is the short-distance scheme used for the $b$-quark mass $m_b$. We find that in the present context, the 1$S$ mass scheme, which was previously used up to 2-loop order, fails to work at 3-loop order, because the mass scheme enters at a soft scale much smaller than $m_b$ here, for which the 1$S$ scheme was not devised. Using instead the MSR mass scheme with $R\sim 1\,\mathrm{GeV}$, we obtain stable results with good perturbative convergence up to N$^3$LL$'$.


Introduction
The flavor-changing neutral-current b → sγ transition plays a key role in exploring the flavor sector of the Standard Model (SM) [1][2][3][4] and in searches for possible physics beyond the SM [5][6][7]. A prominent example is the inclusive B → X s γ decay, whose normalization is sensitive to beyond-SM contributions.
Experimental measurements of B → X s γ are most sensitive in the peak region at large photon energy E γ , where as a result also most information on the normalization of the B → X s γ rate comes from. Furthermore, the shape of the E γ spectrum is directly sensitive to the b-quark distribution function, known as shape function, which describes the relevant nonperturbative dynamics of the b quark within the B meson [8][9][10]. Recently, the first global fit exploiting all available experimental information on the B → X s γ spectrum [11][12][13][14] was carried out by the SIMBA collaboration [15], simultaneously extracting the normalization of the B → X s γ rate, encoded in the effective inclusive Wilson coefficient C incl 7 , the b-quark mass m b , as well as the shape function. The shape function is a universal object that also enters the description of inclusive B → X u ν [9,10,16] and B → X s + − [17,18] decays, where one restricts the phase space to small hadronic invariant masses to suppress the otherwise overwhelmingly large background from b → c ν transitions. In particular, it enters in the extraction of the Cabibbo-Kobayashi-Maskawa (CKM) matrix element |V ub | from B → X u ν, which is important for overconstraining the flavor sector of the SM as it is one of the few tree-level quantities. Inclusive determinations of |V ub | show some tensions with its determination from exclusive decays as well as the indirect determination from the CKM unitarity [19].
In the analysis of ref. [15], the theoretical predictions for the B → X s γ spectrum are obtained at NNLL +NNLO. The final fit results exhibit a similar size of theoretical and experimental uncertainties. On the experimental side, upcoming and future Belle II measurements [20,21] of B → X s γ and B → X u ν will further reduce the experimental uncertainties. To benefit from the improved experimental precision, the theoretical predictions have to be improved likewise.
In this work, we take an important step in this direction by extending the resummed predictions to the next order, N 3 LL , taking advantage of the recent computation of the 3-loop jet function and heavy-to-light soft function [22,23]. At this order, the 3-loop hard function, corresponding to the b → sγ form factor, is also needed but currently not known. Therefore, in our numerical results, we treat its unknown nonlogarithmic constant term at O(α 3 s ) as a theory nuisance parameter [24,25] which we vary as part of our perturbative uncertainties.
To obtain a complete description of the spectrum away from the endpoint, we have to match to the full fixed-order result. At N 3 LL , this requires one to perform the matching to N 3 LO. Since the full fixed-order results at this order are not known, we devise a method to parametrize the missing ingredients in terms of a set of theory nuisance parameters c k [24,25] in such a way that the matching can be performed in a consistent manner and the perturbative uncertainties due to the missing ingredients can be estimated. We denote the so-constructed matched result as N 3 LL +N 3 LO(c k ). It provides a description of the B → X s γ spectrum which benefits from the improved precision at N 3 LL in the peak region while consistently including the known fixed-order results up to O(α 2 s ) [26][27][28]. Throughout the paper, we focus on the contributions from the electromagnetic penguin operator, O 7 , in the electroweak Hamiltonian, which induces the to-be-resummed singular contributions that dominate at large photon energies. At sufficiently high order, operators other than O 7 also produce singular contributions, but these are always O 7 -like and are automatically included via the definition of C incl 7 . The purely nonsingular contributions from non-O 7 operators can simply be added to the order they are known as in ref. [15], so we do not discuss them here further.
Extending the resummation order to N 3 LL turns out to be more subtle than one might naively expect. We find that the 1S mass scheme [29][30][31], which was used at NNLL in refs. [15,32], fails to provide a stable prediction at N 3 LL . The cause for this failure lies in the intrinsic scale R 1S (µ S ) of the 1S scheme, which at the soft scale µ S becomes large and incompatible with the power counting of HQET. On the other hand, by using the MSR scheme [33] we are able to obtain stable predictions that exhibit good perturbative convergence. Furthermore, at N 3 LL it turns out to become necessary to switch to a shortdistance mass scheme also in the jet and hard functions, which makes the RGE of the hard function more involved, where m b plays the role of setting the hard kinematic scale.
The outline of this paper is as follows. In section 2 we present our theory framework and discuss all the ingredients for computing the B → X s γ photon energy spectrum at N 3 LL +N 3 LO(c k ). We discuss the implementation of the short-distance mass scheme in more detail in section 3. Our methodology for estimating the perturbative uncertainties of our predictions is given in section 4. In section 5, we present our numerical results and discuss alternative choices for the b-quark mass scheme and the treatment of higher-order perturbative terms. We summarize our findings in section 6.
2 Theory framework

Overview
We follow the setup of ref. [15] and write the B → X s γ photon energy spectrum as where m b denotes the b-quark mass in a short-distance scheme and The function P (k) is perturbatively calculable and corresponds to the partonic b → sγ spectrum with k ∼ m b − 2E γ . We write it as [15] P (k) = C incl 7 2 W s 77 (k) + W ns 77 (k) + 2 Re(C incl 7 ) i =7 3) The coefficient C incl 7 contains by definition all virtual contributions from operators in the electroweak Hamiltonian that give rise to singular contributions [15,34]. It is dominated by the Wilson coefficient C 7 of the electromagnetic operator O 7 . In this paper, we focus on the contributions proportional to C incl 7 2 , which are discussed in more detail in the following.
The function W s 77 (k) in eq. (2.3) accounts for the contributions to the partonic spectrum ∼ δ(k) and ln n (k/ m b )/k that are singular and dominant in the peak of the spectrum where k m b . They can be resummed to all orders based on their well-known factorization [35,36]. We will resum them to N 3 LL , as discussed in more detail in section 2.2. Note that the overall factor (2E γ / m b ) 3 in eq. (2.1) has a purely kinematic origin. It arises from the photon phase space integration and derivative operators in the photon field strength tensor of O 7 . As in ref. [15], we factor it out of the singular contributions and keep it unexpanded in the endpoint region.
The function W ns 77 (k) in eq. (2.3) contains the remaining nonsingular contributions to the partonic spectrum. They start at O(α s ) and are accompanied by powers of k/ m b relative to the singular contributions in W s 77 . Thus, they are power-suppressed in the peak region where k m b . On the other hand, in the tail region where k ∼ m b , they are of similar size as the singular contributions. We will include their known results at fixed order to O(α 2 s ), while the currently unknown corrections at O(α 3 s ) are estimated by introducing appropriate theory nuisance parameters, as discussed in section 2.3.
The remaining non-77 contributions W ns i7 and W ns ij in eq. (2.3) are purely nonsingular and thus only become relevant in the tail region. Since they do not have singular counterparts, they can be straightforwardly added to the order they are known, as was done in ref. [15]. We neglect them in the following, since our focus here is on the resummation and fixed-order matching of the dominant 77 contributions. Similarly, we neglect the remaining O(Λ QCD / m b ) terms in eq. (2.1), which contain subdominant resolved and unresolved contributions.
To ensure that eq. (2.1) reproduces the full fixed-order result in the tail with a smooth transition between the peak and tail regimes, we use profile scales to gradually switch off the resummation away from the peak region, as discussed in section 2.4.
The nonperturbative function F(k) in eq. (2.1) contains the leading shape function as well as the combination of subleading shape functions that appear at tree level in B → X s γ. It is discussed in section 2.5. The partonic spectrum P and the hadronic shape function F are completely factorized in eq. (2.1). This factorization enables a coherent description of the spectrum in both peak and tail region. In the tail region only the first few moments of F are relevant, while in the peak region its full form is required.

Singular contributions
The singular contributions W s 77 (k) are the leading contributions to the spectrum in the limit k m b . Their well-known factorization theorem [35,36] allows us to systematically resum the large logarithmic distributions to all orders in perturbation theory. Here we make use of the SCET-based factorization theorem following ref. [32] where h s , J, and C 0 are the hard, jet, and partonic soft functions, respectively. The hard and soft evolution kernels, U H and U S , evolve the hard and soft functions from their characteristic hard and soft scales, µ H and µ S , to the jet scale, µ J , thereby summing logarithms of the form ln(µ H /µ J ) and ln(µ J /µ S ). Since we choose to evolve everything to the jet scale, the jet evolution kernel U J (p 2 , µ J , µ J ) = δ(p 2 ) drops out. The hats indicate that an object is defined in a renormalon-free short-distance scheme, as discussed in more detail in section 3. Explicit results for the perturbative ingredients are collected in appendix A. In eq. (2.4), h s (µ H ), J(µ J ), C 0 (µ S ) are the boundary conditions for the evolution, which are evaluated at fixed order. When doing so, by default we always strictly reexpand their product to the given order in α s , i.e., we count α s (µ H ) ∼ α s (µ J ) ∼ α s (µ S ) and drop all higher-order cross terms in the product of their fixed-order series. By doing so, the strict fixed-order expansion of W s 77 in terms of a common α s (µ) is reproduced simply by taking all scales to be equal µ H = µ J = µ S = µ. This is different to refs. [15,32], where only the product J ⊗ C 0 is reexpanded, while the hard function is kept unexpanded as an overall multiplicative factor, which results in keeping certain higher-order cross terms in the fixed-order limit. The effect of these differences is studied in section 5.2.
To resum the singular corrections using eq. (2.4) to N 3 LL order, we have to include the fixed-order boundary conditions h s (µ H ), J(µ J ), C 0 (µ S ) to O(α 3 s ) and use the 3-loop noncusp and 4-loop cusp anomalous dimensions as well as the 4-loop beta function in the evolution factors U H (µ H , µ J ) and U S (µ J , µ S ). The jet and soft functions have been computed up to three loops in refs. [16,22,37] and refs. [16,23,38], respectively.
Regarding the hard function, its 3-loop anomalous dimension is also known via the consistency relation with the jet and soft anomalous dimensions and is given in ref. [23]. The full hard function is currently only known up to NNLO [32,39]. We account for all the logarithmic terms at N 3 LO, which are determined using the RGE in terms of the known anomalous dimensions and lower-order constant terms. The result is given in appendix A.2. Thus, the only missing ingredient to obtain W s 77 at full N 3 LL is the finite, nonlogarithmic 3-loop constant of the hard function, h 3 , which is defined by expanding the (pole-scheme) hard function at µ = m b as We treat this unknown constant as a theory nuisance parameter, where the range of variation is estimated using the Padé approximation We will see in section 5 that it only has a minor impact on the perturbative precision of our results. For future reference, we write the fixed-order expansion of the singular contribution up to N 3 LO as where we have made the fixed-order µ dependence and its order-by-order cancellation explicit. Numerically, we have where L n (x) ≡ [ln n (x)/x] + are the usual plus distributions defined in eq. (A. 4). Note that at this order the unknown 3-loop constant h 3 appears only in the δ(x) coefficient, so w s (3) 77 (x) is completely known for x > 0. The correction terms ∆w s(n) 77 (µ, x) in eq. (2.8) arise from switching to the short-distance b-quark mass m b , and their µ dependence separately cancels among them order by order. They are discussed in more detail in section 3.4, and their explicit expressions are provided in eq. (3.12).

Nonsingular contributions
The nonsingular contribution W ns 77 (k) is included at fixed order. It is obtained by subtracting the fixed-order singular terms from the full fixed-order result for dΓ/dE γ , which is known up to O(α 2 s ) [26][27][28]. We write its perturbative expansion up to N 3 LO as The overall 1/(1 − x) 3 factor is included by convention. Explicit expressions for w ns (1) 77 (x) and w ns (2) 77 (x) are given in eq. (S21) in ref. [15]. The O(α 3 s ) function w ns (3) 77 (x) is currently unknown. The ∆w ns(n) 77 (µ ns , x) terms arise from switching to the short-distance mass m b and are given in eq. (3.13).
In the peak region for k m b , the nonsingular are power-suppressed by k/ m b relative to the singular. Hence, the two can be considered as independent perturbative series, which are treated separately from each other. In particular, it is consistent to include the nonsingular only at fixed order, while the singular are being resummed. By contrast, for k ∼ m b , the separation into singular and nonsingular becomes ill-defined and only the full result given by their sum, W full 77 = W s 77 + W ns 77 , is meaningful. This is reflected by the fact that there are typically large numerical cancellations between the singular and nonsingular contributions for k → m b , as we will see explicitly in section 2.4. Consequently, W s 77 and W ns 77 must be included using the same perturbative expansion in this limit, i.e., at the same scale and the same perturbative order, to ensure that the cancellations between them can take place and the proper full result is recovered. This has important ramifications. First, since the full and nonsingular results are only known at fixed order, it is essential to turn off the resummation for W s 77 for k ∼ m b such that it also reduces to its fixed-order result. Second, the N n LL resummation reduces to the fixed O(α n s ) singular result, so consistently matching it to fixed order requires including the nonsingular to O(α n s ). Therefore, at N 3 LL we need W ns 77 to O(α 3 s ), which means we have to parametrize the unknown nonsingular function w ns (3) 77 (x). We do so by considering its required asymptotic behavior in the x → 0 and x → 1 limits. In particular, for x → 1 we have to account for the singular-nonsingular cancellations at O(α 3 s ), which basically implies that w are not independent. In addition, we want to exploit the parameterization to estimate the perturbative uncertainty due to the missing w ns (3) 77 (x). We begin by separating w ns (3) 77 (x) into a "correlated" and an "uncorrelated" piece, w ns (3) 77 (x) = w ns (3) cor (x) + w ns (3) uncor (x) . (2.11) The "correlated" term w ns (3) cor (x) is designed to completely cancel the singular corrections in the x → 1 limit without disturbing the hierarchy between singular and nonsingular contributions in the x → 0 limit. We define it as where the 3-loop singular function w 77 (x) is defined in eq. (2.8). The overall factor (1 − x) 3 simply cancels the overall 1/(1 − x) 3 in eq. (2.10). The remaining "uncorrelated" piece w ns (3) uncor (x) can now be considered independent of the singular contribution, so we can parametrize it. We do so by expanding it as , (2.13) where the function L(x) is positive for 0 < x < 1 and has the following asymptotics in the x → 0 and x → 1 limits, (2.14) We construct L(x) using w ns (1) 77 (x) with the expectation that its powers provide a reasonable guess of the possible shape of the higher-order function w ns (3) 77 (x) in the intermediate region 0 < x < 1. Furthermore, eq. (2.13) incorporates the knowledge that at O(α 3 s ) the nonsingular in the limit x → 0 is a degree-5 polynomial in ln x. Therefore we include up to five powers of L(x) to ensure we are able to probe the complete logarithmic structure in the small-x limit.
The parameters c ns 0...5 can be treated as theory nuisance parameters with zero central values and the following variation magnitudes: . The reason is that, similar to the leading-power case, it is expected that the universal cusp and a set of subleading noncusp anomalous dimensions govern the logarithmic structure at subleading power. Thus, we similarly exploit 4x w s(3) 77 (x) for estimating the typical size of δc ns 1...5 .
To assess the uncertainty due to the missing w ns(3) 77 we separately vary each nuisance parameter in the ranges given above. The different nuisance parameters are considered as independent such that the resulting individual uncertainties from varying them are added in quadrature. This is discussed in more detail in section 4. Of course, this means that the uncertainty necessarily increases by adding more parameters. Therefore, to obtain a realistic uncertainty estimate and avoid becoming overly conservative we also put the constraint that the total uncertainty estimate for the missing 3-loop correction does not exceed the size of the 2-loop corrections. To satisfy this constraint, the values for δc ns 2,3,4 are chosen somewhat smaller than the corresponding coefficients of 4x w s(3) 77 (x) in eq. (2.16). It is easy to see from eq. (2.14) that all powers of L(x) approach zero in the far tail of the spectrum. Thus, in this limit the constant term c 0 dominates. Therefore, we estimate the size of δc ns 0 using the Padé approximation from the corresponding lower-order corrections, Finally, we like to stress that the purpose of the parameterization given by eqs. (2.11) and (2.13) is not to construct an approximation of the unknown function w ns (3) 77 (x). Rather, the goal is first to enable a consistent matching at N 3 LL , which is essentially achieved by the separation in eq. (2.11), and second to obtain a reliable estimate of the perturbative uncertainty due to its unknown form. For this purpose, we only need to estimate the typical size of the theory nuisance parameters, say within a factor of a few, for which the above considerations are sufficient. However, since eq. (2.10) includes all known O(α 3 s ) contributions that are predictable from lower orders, and since the parameterization of the remaining unknown w ns (3) 77 (x) does include nontrivial information on its structure, we do expect some improvement in the perturbative precision of our predictions beyond O(α 2 s ), which will be reflected in the size of the resulting uncertainty, as we will see in section 5.

Profile scales
The resummation of the singular contributions is determined by the choices of the hard µ H , jet µ J , and soft µ S scales in the factorization theorem in eq. (2.4). To achieve the proper resummation they must be chosen according to the kinematics relevant in the different regions of the spectrum. In addition, the nonsingular scale µ ns determines the scale at which the fixed-order nonsingular terms in eq. (2.10) are evaluated.
For our scale choices we follow ref. [15]. The E γ spectrum has three parametrically distinct kinematic regions: • Shape function (nonperturbative) region: m b This corresponds to the peak of the spectrum where the full shape of the shape function is relevant and the soft scale is fixed to the lowest still-perturbative scale This corresponds to the region left of the peak, i.e., the transition region between the peak and the far tail. Here the soft scale has canonical scaling µ S ∼ m B − 2E γ .
This corresponds to the far tail of the spectrum, which is described by fixed-order perturbation theory. Here, the distinction between singular and nonsingular becomes meaningless and the resummation must be turned off to ensure that singular and nonsingular contributions properly recombine into the correct fixed-order result. This requires that all scales become equal The canonical value for the hard scale in all regions is µ H ∼ m b . In the first two regions the SCET resummation is applicable with the canonical scaling µ J ∼ √ µ S µ H . To account for these different scale hierarchies we use the common approach of profile scales [32,40] The key advantage of using profile scales to connect the descriptions in the different parametric regions is that they provide a smooth transition between the regions that is solely implemented in terms of scale choices, such that the ambiguity in the precise choices of the transition are equivalent to scale ambiguities, which by construction are formally beyond the order one is working and reduce as we go to higher order.
For our numerical analysis we employ the profile scales used in ref. [15], given by where the function f θ (x) provides a smooth transition from (2.19) Figure 1 shows the absolute values of the singular and nonsingular contributions as well as the full result at NNLO (i.e. without resummation), which is used to pick the transition points E 1 and E 2 between the different parametric regions. For E γ E 1 = 2.2 GeV, the singular contributions clearly dominate, which also corresponds to the nonperturbative region. For E γ E 2 = 1.6 GeV, there are large cancellations between the singular and nonsingular contributions. This corresponds to the fixed-order region, where the separation into singular and nonsingular is ill-defined and only their sum is meaningful. Therefore, the resummation of the singular must be turned off to ensure that this cancellation is not spoiled by it and the correct full result is recovered. Since there is little space between E 1 and E 2 , the transition region in between them effectively coincides with the intermediate shape function OPE region.
To summarize we use the following values for the profile scale parameters [15]: For each parameter, the first value in the set is the central value and the next two are the variations that we will use to assess perturbative uncertainties in section 4. The central values for the profile scales along with their individual variation ranges are illustrated in figure 2. Since the nonsingular contributions are treated at fixed order, a priori there is no canonical scaling to guide the choice of the nonsingular scale µ ns beyond the fixed-order region E γ ≥ E 2 . In practice, it is picked as the geometric mean of the hard and central jet scales to account for the fact that the nonsingular terms have some sensitivity to scales below m b , and this choice is varied up to the hard and down to the central jet scales as shown in figure 2.
Note that the individual scales are not independent of each other but are parametrized in such a way that their relative hierarchies are preserved upon varying the profile parameters. For example, they all depend on µ H , such that varying µ H up and down (by varying e H ) simultaneously moves the other scales up and down accordingly. In particular all scales always merge into a common value for E γ ≤ E 2 to properly turn off the resum-mation. Similarly, µ J and µ ns depend on µ S , such that varying µ 0 not only varies µ S but also moves µ J and µ ns up and down accordingly to preserve the hierarchy between them. The variations for µ J and µ ns parametrized by e J and e ns correspond to small deviations from the default hierarchy. This also means that the µ 0 , e J , and e ns variations smoothly turn off between E 1 and E 2 like the resummation itself, such that below E 2 only the overall µ H variation remains corresponding to the usual fixed-order scale variation.

Shape function
The leading-power factorization theorem for B → X s γ in eq. (2.4) separates all soft dynamics into the soft function . This definition of S(ω, µ) is such that it has support for ω ≥ 0 [32]. The soft function contains both perturbative soft radiation as well as the nonperturbative Fermi motion of the b quark inside the B meson. Following ref. [32], we further factorize it as where the partonic soft function C 0 can be calculated in perturbation theory, while the shape function F(k) is a nonperturbative object. It has support for k ≥ 0 and peaks around k ∼ Λ QCD . In the tail region where ω Λ QCD , the right-hand side of eq. (2.22) can be expanded in powers of Λ QCD /ω, where M n ∼ Λ n QCD are the moments Hence, in the tail region the leading nonperturbative corrections are encoded in the first few nonperturbative moments of F(k), and its moment expansion recovers the local-OPE description of the spectrum. On the other hand, in the peak region where ω ∼ Λ QCD , the full function F(k) is needed.
As discussed in ref. [32], an important feature of the factorization in eq. (2.22) is that it provides a common description of the nonperturbative effects across these different kinematic regions, incorporating all available perturbative information in the limit ω Λ QCD without having to explicitly carry out an expansion in Λ QCD /ω, whose precise region of validity would be unclear. That is, all perturbative corrections to moments of S(ω, µ) are encoded in the perturbative function C 0 (ω), while the shape function F(k) is a genuinely nonperturbative function that is formally independent of the perturbative order and can be extracted from experimental data.
The first few moments of F(k) are given in terms of B-meson matrix elements of local HQET operators, The hadronic parameters λ 1 and ρ 1 are matrix elements of local HQET operators defined in a short-distance scheme as discussed in section 3.3. The ellipses denote terms that suppressed by relative powers of Λ QCD / m b arising from subleading shape functions that are typically absorbed into F(k) [15], but which are not relevant for our purposes here. A general method for parametrizing F(k) via a systematic expansion around a given base model has been developed in ref. [32], which was used in ref. [15] to fit F(k) from data. Since our primary interest in this paper are the perturbative corrections, the precise form of F(k) is not relevant here. We only need a reasonably realistic model for it in order to illustrate our results numerically. For this purpose, we take the exponential base model used in refs. [15,32], given by Its normalization and first moment are By picking λ ≈ m B − m b ≈ 0.6 GeV this base model already provides a good fit of the experimental measurements [15]. Note that evaluating the N n LO soft function in a short-distance scheme involves taking n derivatives of F(k). Therefore, at N 3 LL , which needs the N 3 LO soft function, and assuming integer p, we require p ≥ 4 to ensure that the soft function vanishes for ω → 0, which in turn is required for the E γ spectrum to vanish at the kinematic endpoint E γ → m B /2.

Short-distance schemes
A major challenge in B physics is to parametrize nonperturbative effects in such a way that their extraction from experimental measurements is stable with (and ideally independent of) the perturbative order. Achieving this stability is not trivial due to infrared sensitivity of the involved perturbative series and the resulting ambiguity in the asymptotic series of perturbative QCD, which is known as the renormalon problem. The renormalon problem manifests itself in practice as poor or no convergence of the perturbative series even at low orders. Since physical (measurable) quantities are independent of the perturbative order, the large perturbative corrections at each order are compensated by corresponding large changes in some associated nonperturbative parameter. In other words, the renormalon ambiguity in the perturbative series is compensated order by order by an equal and opposite renormalon ambiguity in the nonperturbative parameter.
Conceptually, to resolve this issue, the renormalon must be identified and subtracted from both the perturbative quantity (C) and the associated parameter (p), such that both become renormalon free and perturbatively stable. To give a simple toy example, On the left-hand side, the renormalon only cancels between C and p. On the right-hand side, the so-called residual term δp is a perturbative series in α s that contains the renormalon. Its specific choice defines a specific so-called short-distance scheme. The renormalon then cancels within each of the parenthesis defining the short-distance C and p, which are now separately free of the renormalon.

Soft function
In reality, the structure is of course more complicated than the above simple toy example. In our case, the leading-power perturbative series that suffers from renormalon ambiguities is that of the partonic soft function C 0 (ω, µ), whose renormalons are cancelled by the nonperturbative object F(k). Its leading renormalon ambiguity of O(Λ QCD ) is due to the pole mass definition of the b quark, m pole b , which explicitly enters in the definition of the soft function S(ω, µ) in eq. (2.21) and henceforth shows up in all the moments of F(k). (At subleading power, also the jet and hard functions involve the pole-mass renormalon, which we come back to in section 3.4.) Furthermore, the hadronic parameter λ 1 , which first appears in the second moment of F(k), has a subleading O(Λ 2 QCD ) renormalon ambiguity [41]. Similarly, we expect the hadronic parameter ρ 1 , which first appears in the third moment, to have an O(Λ 3 QCD ) renormalon. We write the parameters in a generic short-distance scheme as The residual terms δm b , δλ 1 , and δρ 1 are defined to cancel the renormalon in their respective parameter such that the short-distance parameters on the left-hand side are renormalon free. The original 1 HQET parameters λ 1 and ρ 1 are defined in dimensional regularization as [42] The construction of the short-distance partonic soft function C 0 (ω, µ) with the appropriate renormalon subtractions is derived in detail in ref. [32]. Up to N 3 LO, we have where the original pole-scheme C 0 (ω, µ) is defined and given in appendix A.4. Importantly, for the renormalons to cancel on the right-hand side, it must always be fully expanded to a given fixed order in α s , including the δm b , δλ 1 , δρ 1 and their products with each other and with C 0 . With these subtractions both C 0 (ω, µ) and F(k) become renormalon free, up to yet higher-order renormalons. In particular, the moments of F(k) are then given by the short-distance parameters m b , λ 1 , ρ 1 , as shown in eq. (2.25). [Note that F(k) ≡ F(k) in eq. (2.22) is already the short-distance parameter.] To evaluate the convolution integral C 0 ⊗ F, we use integration by parts to move all the derivatives in eq. (3.4) to act on F [32]. The renormalon subtractions significantly improve the perturbative convergence of the soft function compared to the pole scheme, which we demonstrate numerically in section 5.3. In general, the residual terms are scale dependent, leading to a similar scale dependence of the short-distance parameter, which we suppress for simplicity in our generic notation. The scale dependence can be explicit, as e.g. for the MS mass, in which case it is usually governed by an associated RGE. It can also be only internal, as e.g. for the 1S mass or the MSR mass, in which case δm b (and also m b ) is formally scale independent (with only the usual scale dependence from truncating the perturbative series that is cancelled by higher orders). In either case, the residual terms must be expanded at the same scale µ, i.e., in terms of the same α s (µ), that is used for the perturbative series whose renormalon is supposed to be subtracted to ensure that the renormalon actually cancels. In our case this is the soft scale µ S at which we evaluate the fixed-order boundary condition C 0 (ω, µ S ) in the factorization theorem in eq. (2.4).
In this context, the power counting of the HQET Lagrangian provides a powerful constraint on the generic size of the residual mass term δm b ,

5)
A suitable short-distance scheme for the bottom-quark mass should respect the power counting of residual soft momentum δm b ∼ α s v · D ∼ α s k ∼ α s Λ QCD [43]. Note that the perturbative series for δm b starts at NLO and therefore scales like α s . As stated in the previous section, in B → X s γ the residual momentum of the bottom quark in the B meson scales like k ∼ m B − 2E γ , which in the peak region scales like Λ QCD . Thus, in the peak region one expects that a low-scale mass scheme, such as the 1S scheme [29][30][31] or the MSR scheme [33] with R ∼ Λ QCD , are applicable. In the context of B → X s γ, the 1S mass scheme has been extensively discussed in ref. [32], and we remind the reader of the main features of the MSR scheme in the following section.

MSR mass scheme
The MSR mass scheme is a short-distance mass scheme designed to subtract the pole-mass renormalon by introducing an infrared cutoff scale R as follows, where a MSR n coefficients are determined from matching the MSR mass scheme onto the MS mass scheme at m b (m b ). The infrared scale R controls the size of self-energy contributions which are absorbed into the mass definition [33,44]. In our analysis we use the so-called "natural" MSR mass definition, where the series coefficients a MSR n = a MS n (n l , n h = 0) , see ref. [45] for more details.
The MSR mass is a natural extension of the MS mass for R ≤ m b (m b ), which interpolates between all short-distance schemes with residual power counting δm b (R) ∼ R α s .
one approaches the MS mass, whereas in the opposite limit, R → 0 , the MSR mass formally approaches the pole mass. In practice, when taking this limit one encounters the Landau pole of the coupling constant. This issue is deeply related to the pole mass renormalon and cannot be addressed unambiguously.
Values of the MSR mass at different R scales are related by the so-called R-evolution equation [33], whose solution resums logarithms ln(R 1 /R 0 ) in the perturbative correction between m MSR b (R 1 ) and m MSR b (R 0 ). The R evolution can be used to obtain the MSR mass value at a low R from the MS mass and vice versa.
In contrast to the 1S scheme, the infrared scale R of the MSR scheme is an external parameter. We exploit this feature and pick R = 1 GeV to be of the same order as the soft scale in the peak region. In section 5.3 we demonstrate that this ensures a proper cancellation of the renormalon in the soft function. In principle, it is possible to pick a different value for the R scale in different regions of the spectrum by introducing a profile function R(E γ ) [46], as long as the R evolution is consistently used to relate the shape functions at different R scales. This can be used to enforce R ∼ µ S over the whole spectrum and to eliminate logarithms ln(µ S /R) in the series of δm b (R). Ref. [40] implements this Revolution setup for an analogous soft function for the thrust distribution in jet production. In our case the use of R evolution does not lead to a significant improvement in convergence, so for simplicity in our numerical analysis we use a fixed value of R.

Short-distance schemes for λ 1 and ρ 1
To express the B-meson matrix elements λ 1 and ρ 1 in a short-distance scheme, we use analogous schemes to the "invisible" scheme for λ 1 [32], where δλ 1 ∝ α 2 s . The reason for this α 2 s scaling in the invisible scheme compared to the kinetic scheme [47,48], where δλ 1 ∝ α s , is that in the invisible scheme one employs Lorentz-invariant UV regulators for regularizing the kinetic energy operator [41,49], while in the kinetic scheme the regulator is not Lorentz-invariant. Indeed it has been shown in ref. [32] that using the kinetic scheme for λ 1 leads to an over-subtraction of the u = 1 renormalon in the soft function. It is worth to note here that there is no analogous study for the u = 3/2 renormalon present in ρ 1 . Following these features of the invisible scheme for λ 1 , we write where we set R λ = R ρ = 1 GeV by default. Note that λ 1 and ρ 1 appear in the second and third moments of the shape function and contain an O(Λ 2 QCD ) and O(Λ 3 QCD ) renormalon ambiguities, respectively. Therefore by dimensional analysis they must scale like δλ 1 ∝ R 2 λ and δρ 1 ∝ R 3 ρ . Furthermore, we take δρ 1 ∝ α 3 s to impose a plausible "invisibility" of our scheme choice and to avoid diluting the lower-order corrections arising from δm b and δλ 1 . For δλ (2) 1 we use the value given in ref. [32] for the invisible scheme, δλ 1 and δρ (3) 1 are not available so far. Usually, one defines a short-distance scheme to all orders by exploiting the perturbative series of some physical and thus renormalon-free quantity. However, this is not strictly necessary, since after all the main goal of the renormalon subtractions is to obtain a stable perturbative result. Thus, here we take a pragmatic approach and simply define our "invisible" scheme for λ 1 and ρ 1 at O(α 3 s ) by choosing numerical values for δλ 1 and δρ 1 such that the resulting soft function at different perturbative orders manifests good convergence, i.e. that the size of scale variations reduces when including higher-order corrections, that the resulting uncertainty bands at different orders have reasonable overlaps, that the peak position for the soft function remains stable, and finally that it remains positive at small k and approaches zero with similar slopes at different orders. Following this procedure, we find a satisfactory convergence for the soft function, see figure 11, by taking In principle, we could also consider the perturbative convergence of the final spectrum. However, since it also receives contributions from the jet and hard functions, the dependence on the soft function is washed out in the spectrum. Therefore, we use the convergence of the soft function to define the short-distance scheme. This is also the most natural, since the soft function is the object containing the renormalons to be subtracted. This is somewhat similar to the "shape-function" scheme in ref. [50], where the second and third moments of the perturbative shape function with some, largely arbitrary, hard cutoff is used to define the short-distance parameters. (Similarly, the short-distance b-quark mass is defined based on the first moment.) The disadvantage of that approach is that it yields δλ, δρ 1 ∼ α s , which leads to massive oversubtraction similar to the kinetic scheme.

Subleading δm b corrections
As discussed in section 3.1, the leading renormalon at leading power comes from the b-quark mass that enters via the argument of the soft function. In addition to the soft function, the b-quark mass also enters in the hard and jet functions through the SCET label momentum p − ∼ m b . By default, label momentum conservation sets p − = m pole b . Formally, choosing a different label p − = m b amounts to a power-suppressed effect. For this reason, in ref. [15] the resulting corrections from changing to the m b scheme could effectively be absorbed into the nonsingular corrections. As we will see, at N 3 LL this is no longer viable, so instead we will explicitly switch both hard and jet functions to a short-distance mass scheme.
To derive the scheme change, we consider the partonic function W (k) appearing in eq. (2.3), which can be either the singular, the nonsingular, or the full contribution. It has mass dimension −1 and depends on two dimensionful quantities, k and m b . Therefore, by dimensional analysis it must have the form where all dependence on m b is made explicit on the right-hand side and w(x, α s , L) is a scaleless function of its arguments. Since W (k) is defined to be µ independent, the µ dependence on the right-hand is only the internal µ dependence from α s (µ) which cancels order by order. Therefore, we can pick µ = m b , which eliminates all logarithms and allows us to track the associated m b dependence via the dependence on α s (m b ). After switching the scheme, we can easily reintroduce the µ dependence by reexpanding α s (m b ) in terms of α s (µ). The partonic rate in the pole scheme is given by eq.
we find for the singular ∆w s(1) 12) and the nonsingular ∆w ns (2) 77 (µ ns , x) (3.13) As we have seen above, there are three sources of δm b corrections: 1. Shifting the argument k → k + δm b .
2. Changing to m b in the argument k/ m b , which yields the rescaling x → x/(1 + δm b / m b ).
3. Changing to m b in the µ dependence of α s ( m b ).
Considering just the singular contributions and only keeping the first and neglecting the latter two corrections amounts to only keeping the leading-power terms in eq. (3.10), (3.14) The relevant formal power counting here is δm b / m b ∼ λ 1, d/dx ∼ 1/x ∼ λ −1 , so all terms on the right-hand side are leading power. These terms are exactly reproduced by the factorized result at fixed order by changing the soft function to the m b scheme, which involves the analogous shift of its argument. In the numerical implementation, the derivatives d/d( m b x) = d/dk are moved via integration by parts to act on the shape function F(k) so they count as 1/Λ QCD . Since δm b ∼ Λ QCD , we see again that all terms in eq. (3.14) are leading power, counting as (δm b d/dk) n ∼ 1.
All terms ∼ x n w s (x, α s ) are thus induced by the second source. By moving the derivatives to act onto the shape function, we see that they are explicitly power-suppressed by x and hence nonsingular. For this reason, they could be included as part of the nonsingular correction terms ∆w ns 77 , as was done in ref. [15]. In the singular contributions, the k/m b dependence only appears in logarithms which are factorized into the soft, jet, and hard functions, where m b corresponds to the large p − label momentum, which only appears in the hard and jet functions, while the soft function only depends on the small momentum k. Therefore, the associated correction terms can be reproduced by the leading-power factorized result by changing the m b dependence in the hard and jet functions to the short-distance m b .
Finally, the third source produces the last term in eq. (3.10), Since it starts at O(α 3 s ) it first appears at N 3 LL . Formally, this term is also subleading power, because δm b / m b ∼ λ. However, since there is no explicit kinematic suppression by x, the x dependence itself is still singular ∼ 1/x, involving δ(x) and logarithmic plus distributions. Hence, this term cannot simply be absorbed into the nonsingular contributions but must be properly accounted for in the resummed singular contribution. The m b dependence in the corresponding fixed-order ln(µ/ m b ) also corresponds to the p − label momentum. Therefore, to account for the distributional structure of this contribution and to resum it, we consistently switch the hard and jet functions to the short-distance mass m b . The details of this procedure are discussed in the following two subsections.

Hard function
The hard function in a short-distance scheme is obtained by writing the b-quark mass in the pole-scheme hard function in eq. (A.5) in terms of a short-distance mass and reexpanding the result strictly in powers of α s (µ). At N 3 LO we obtain where the coefficients ∆H (n) m are given by 3 , ∆H Similarly, the RGE for the hard function in a short-distance mass scheme is derived by rewriting the pole mass in eq. (A.6) in terms of a short-distance mass, 19) and the hard cusp anomalous dimension coefficients Γ H n = −2Γ n are defined in eq. (A.1). Note that the reexpansion in eqs. (3.16) and (3.18) must be performed in terms of the same α s (µ), such that h s ( m b , µ) in eq. (3.16) indeed satisfies the RGE in eq. (3.18).
We expect that the renormalon associated with the pole mass cancels in the perturbative series of the hard anomalous dimension. The cusp anomalous dimension is universal and arises in the evolution of many perturbative objects with Sudakov double logarithms that do not involve the b-quark mass at all. Therefore, the Γ H series cannot know about the pole-mass renormalon, so the cancellation must happen between γ H and ∆γ H . For this reason, it is important to consistently expand ∆γ H in powers of α s (µ) to the same order as γ H .
The all-order solution to the differential equation (3.18) can be written as where U H is the usual hard evolution factor given in eq. (A.10), and the correction factor ∆U H is given by Note that in general, the µ dependence of ∆γ H (µ) coming from δ (n) m (µ) can be more involved than for the usual anomalous dimension, such that the µ integral may have to be performed numerically. Using the MSR mass and up to N 3 LL, we can still perform the integral by employing an analytic approximation analogous to the one used for the K and η integrals in eq. (A.3). We find up to N 3 LL where r = α s (µ)/α s (µ H ) and At NNLL only the first line on the right-hand side of eq. (3.22) is kept.

Jet function
Similar to the hard function in the previous section, we define the jet function in a shortdistance scheme by expressing m pole b in terms of a short-distance mass m b . To this end, we start with the jet function in the pole scheme given in eq.
The expansion coefficients ∆J 2 , ∆J

Perturbative uncertainties
For our predictions we can distinguish perturbative and parametric uncertainties. Parametric uncertainties arise from the uncertainty in input parameters, such as C incl 7 , |V tb V * ts | 2 , m b , F(k). These are not considered in the following. We normalize our numerical results by dividing out the overall prefactor Γ 0 C incl 7 2 , so the associated uncertainties drop out. We also ignore the parametric uncertainties due to the shape function F(k) and m b , which do affect the shape of the spectrum, because we do not compare with experimental B → X s γ measurements, in which case they would be determined by the fit to the data.
Our primary focus is on the perturbative results and their perturbative uncertainties arising from missing higher-order corrections. For these there are various different sources, which fall into two categories: • Profile scale variations: We identify three sources of perturbative uncertainties that are estimated by a suitable set of variations of the profile scales discussed in section 2.4. The resummation uncertainty ∆ resum is obtained by taking the maximum envelope of all 27 simultaneous variations of the profile function parameters e H , e J , µ 0 , which corresponds to scale variations in the resummed singular contributions, together with corresponding correlated variations in the nonsingular contributions. Note also that despite its name, ∆ resum reduces to the overall fixed-order scale variation in the fixed-order region. The nonsingular uncertainty ∆ ns is determined by varying the e ns parameter, which determines the central value of the nonsingular scale in the resummation regions. The matching uncertainty ∆ match comes from varying the transition point E 1 , which marks the start of the transition region. These three sources are considered independent and are thus added in quadrature, For notational convenience, we use the symbol ⊕ to denote addition in quadrature, x ⊕ y ≡ x 2 + y 2 .
• Theory nuisance parameter variations: The uncertainties ∆ h 3 and ∆ c ns are estimated by varying the nuisance parameters h 3 and c ns k , respectively, within the ranges given in sections 2. The nuisance parameters at N 3 LL +N 3 LO(c k ) are introduced in such a way that the scale dependence cancels at this order, i.e., all terms that are predicted by scale dependence are correctly included. This means that at N 3 LL +N 3 LO(c k ) the profile scale variations estimate the uncertainty due to the missing next order N 4 LL +N 4 LO, while the nuisance parameters capture the uncertainty due to the missing O(α 3 s ) ingredients at N 3 LL +N 3 LO(c k ).
The total perturbative uncertainty is obtained by adding all sources in quadrature, The nuisance parameter uncertainties only contribute at the highest order N 3 LL +N 3 LO(c k ), while at lower orders we simply have ∆ total = ∆ profile . Note that in ref. [15] the perturbative uncertainty was estimated from profile scale variations by taking the maximum envelope of 3 5 = 243 profile scale variations corresponding to simultaneous variations of the above five profile scale parameters. Here we have refined the estimation procedure by separating conceptually different sources of perturbative uncertainties, which leads to an overall more consistent picture of the resulting uncertainties when including the new highest order at N 3 LL . In part, this becomes possible because we are now able to reexpand the fixed-order hard function against the product of the fixed-order jet and soft functions. We have also checked that the total ∆ profile estimated as described above is comparable in size to what we obtain in our setup by taking the maximum envelope of all 3 5 = 243 variations excluding a small set of obvious outliers.  Table 1. Numerical values of required input parameters.

Results
In this section, we present our numerical results for the photon energy spectrum. All numerical results are implemented and obtained with the SCETlib [51] library. Our default numerical setup is as follows. The values used for input parameters are summarized in table 1. The spectrum is always divided by the overall normalization factor Γ 0 C incl 7 2 , so its numerical value is not needed. For the shape-function model in eq. (2.26) we use λ = 0.6 GeV and p = 4 as the default settings. We illustrate the impact of changing p and λ later in this section. We neglect finite-charm-mass corrections and work in QCD with n f = 4 massless quarks. We always use the 4-loop running of α s , which is sufficient for resummation at N 3 LL. We use the MSR scheme for the b-quark mass and adopt shortdistance schemes for λ 1 and ρ 1 as discussed in section 3.3. The impact of the short-distance mass scheme is discussed in section 5.3. Throughout this section, the colored bands always show the perturbative uncertainties obtained from profile scale variations ∆ profile .

Main results
To begin, in figure 3, we show the contribution of the resummed singular corrections to the full result at NNLL (left panel) and N 3 LL (right panel). The resummed contribution  Figure 4. The B → X s γ spectrum at different perturbative orders. The E γ spectrum itself is shown on the left, while the relative differences to the highest-order central value are shown on the right. In the top row, we use the same value p = 4 at each order, while in the bottom row we use successive values at each order. The colored bands show the perturbative uncertainty estimated by just profile scale variations, ∆ profile . The dashed gray line includes in addition the uncertainty due to h 3 . The solid black line further includes in addition the nonsingular nuisance parameters, corresponding to the total perturbative uncertainty at N 3 LL +N 3 LO(c k ). is indeed dominant across the peak of the spectrum, while it decreases rapidly in the tail and eventually changes sign at E γ 1.8 GeV, where the resummation is getting turned off. Here, only the full matched result is meaningful, which remains positive and slowly approaches zero in the far tail.
Our main predictions for the B → X s γ photon energy spectrum at different perturbative orders are presented in figure 4. In addition to the colored bands showing ∆ profile , the gray dashed line shows ∆ profile ⊕ ∆ h 3 , and the black solid line shows ∆ total defined in eq. (4.2). The first column shows the results for the spectrum, and the second column shows the relative difference to the central value at the highest order, i.e. to the red solid line on the left. In the first row, we use the same value for the shape-function parameter p = 4 for all orders, whereas in the second row, we use different values for p. In figure 5 we show the breakdown of the relative perturbative uncertainties into the individual contributions at NNLL +NNLO and N 3 LL +N 3 LO(c k ).
We remind the reader that the choice p = 4 was needed to ensure that the spectrum  at the highest order N 3 LL still vanishes in the limit E γ → m B /2. In practice, when fitting to the experimental data, the fit will always fix the precise shape near the endpoint to that of the data irrespective of the perturbative order, while the perturbative differences get moved (at least partially) into the fit result for F. By using the same fixed model for F at each order, we can directly assess the perturbative convergence in the spectrum. However, by using a common p, the spectrum at lower orders vanishes correspondingly faster, i.e., quadratically at NNLL and cubically at NLL , which also affects to some extent the shape of the spectrum into the peak. Therefore, in the lower panels of figure 4 we also show an alternative order comparison, where we do change the model at each order by using successively lower values for p at the lower orders, such that the spectrum vanishes linearly at each order. The results in figures 4 and 5 manifest a good perturbative convergence, especially from NNLL +NNLO to N 3 LL +N 3 LO(c k ). The relative uncertainties are under good control over the entire E γ range, except at the very endpoint where the spectrum vanishes so the relative uncertainties necessarily blow up. Apart from the very endpoint, the total uncertainty at N 3 LL +N 3 LO(c k ) is at most 15% and in most of the E γ range well below that.
The uncertainties at N 3 LL +N 3 LO(c k ) are substantially reduced compared to NNLL +NNLO, even accounting for the fact that not all 3-loop perturbative ingredients are known. As expected, the higher-order uncertainties estimated from profile scale variations (∆ resum , ∆ ns , ∆ match ) are much reduced. Also recall that in the fixed-order tail ∆ resum turns into the usual fixed-order scale variation. The uncertainty ∆ h 3 is visible but subdominant. Across the entire peak of the spectrum, the uncertainty ∆ c ns from the missing 3-loop nonsingular corrections is at most comparable to the other sources thanks to the power suppression of the nonsingular. As expected, in the tail below E γ E 1 = 2.1 GeV it starts to take over and becomes the dominant uncertainty. This demonstrates that in our approach we are able to substantially benefit from the increased precision of the N 3 LL resummation even in the absence of the full N 3 LO result. Furthermore, in the fixed-order tail the total uncertainty dominated by ∆ c ns is still reduced compared to the scale-variation based estimate at NNLL +NNLO. As discussed at the end of section 2.3, this is anticipated and justified because of the additional nontrivial perturbative information included at N 3 LO(c k ). The effect of varying the shape function model parameters p and λ is illustrated in figure 6. As expected, the position and height of the peak depend on the model parameters. We can clearly see how the value of p controls how fast the spectrum vanishes toward the endpoint E γ → m B /2. The value of λ determines the width of shape function, which as a result controls the width of the peak. We stress that these results are just meant as an additional illustration. In particular, the differences seen in figure 6 are not to be interpreted as an additional theoretical uncertainty in the predictions. As mentioned before, the actual shape of F will be determined by fitting to the experimental data.

Different treatments of higher-order singular cross terms
When evaluating the singular correction using the factorization theorem in eq. (2.4) by default we expand the fixed-order boundary conditions of the hard, jet, and soft functions against each other. This is usually done, because it ensures that when switching off the resummation in the far-tail region and adding the fixed-order nonsingular correction, the fixed-order results are exactly reproduced. In ref. [15] it was observed that in the 1S scheme and up to NNLL +NNLO the perturbative convergence in the resummation region is substantially improved by keeping the hard function unexpanded as an overall factor. The main reason is that this ensures that the hard function does not affect the shape of the spectrum, which receives relatively large corrections from the jet and soft functions. The disadvantage is that this has the danger of generating unphysically large higher-order corrections in the fixed-order limit. In ref. [15] it was checked that this does not happen in the region of interest.
In this section, we study the differences in our perturbative setup between expanding the hard function against soft and jet functions (our default) vs. keeping it unexpanded.  Figure 7. B → X s γ spectrum with unexpanded hard function.
A priori, the additional higher-order terms induced by keeping the hard function unexpanded can easily spoil the delicate cancellation between singular and nonsingular contributions in the far tail of the spectrum. To fix this problem, we compensate for these higher-order terms by adding a constant term that cancels them in the fixed-order region but is only a power-suppressed correction in the peak region, where we subtract the singular with unexpanded hard function, W s,unexp 77,FO ( m b ) and add it back with expanded out hard function, W s,exp 77,FO ( m b ). Both of these terms are evaluated at fixed order and at k = m b , corresponding to the tail limit x = 1, so in the peak region they are indeed power suppressed. This prescription allows us to take advantage of keeping the hard function unexpanded in the peak region while avoiding unphysically large corrections from higher-order cross terms in the fixed-order region, since they are explicitly removed in the x → 1 limit.
The numerical results using this prescription are shown in figure 7, where we adopt the MSR scheme with short-distance schemes for λ 1 and ρ 1 . We compare these results to our default setup using the expanded hard function in figure 8. Here one can see that, overall, both scenarios lead to somewhat compatible results. Nevertheless, the choice of expanding the hard function or not clearly has a large impact at lower orders (which are expected to be more sensitive to the treatment of higher-order cross terms). Keeping the hard function unexpanded leads to a rather unnatural reduction of scale variations in the peak region of the spectrum. This behavior is quite dramatic at NNLL +NNLO, where the uncertainty band with an unexpanded hard function (the gray band) barely captures the central line from the results with expanded hard function (solid blue line), and its size is almost as large as the scale variations at N 3 LL +N 3 LO(c k ). This is also visible in figure 7, where the blue band suspiciously narrows down in the peak region of the spectrum and competes with the orange band. Based on this set of observations we conclude that in our setup keeping the hard function unexpanded is disfavored. Hence, for our final numerical results we opt for the conventional treatment of fully expanding the fixed-order boundary  conditions of the hard, jet, and soft functions against each other, since it yields an overall more consistent picture of perturbative uncertainties and convergence.

Impact of short-distance schemes: 1S vs. MSR mass schemes
As already explained in section 3 the right choice of short-distance scheme for the b-quark mass plays a key role in stabilizing the predictions at different orders in perturbation theory.
In this section we illustrate the numerical impact of expressing m b , λ 1 , and ρ 1 in short- distance schemes. Moreover, we show that the 1S mass scheme, which was successfully used in previous works [15,32] up to NNLL +NNLO, starts to break down at N 3 LL , while the MSR mass scheme yields convergent, stable results.
In the following figures we show our numerical predictions for the soft function (left column) and photon energy spectrum (right column) at different orders using the pole (figure 9), 1S ( figure 10 and figure 12), and MSR (figure 11) schemes. In these plots, we always use our default setup modulo the different short-distance schemes as indicated. For the soft function, we always show the combination S(k, µ 0 ) ⊗ U S (k, µ 0 , 1.3 GeV), i.e., we use the soft evolution kernel to evolve the soft function to a fixed scale µ = 1.3 GeV. In this way, the µ 0 dependence cancels up to higher order, so we can use our default µ 0 variations to estimate the perturbative uncertainties for the soft function.
In figure 9 we clearly see that the soft function in the pole scheme suffers from a sizable renormalon ambiguity, which is intrinsic to the pole scheme, leading to a large negative deep before the peak. Moreover, the peak position varies significantly from one order to another, which reflects the instabilities in the first moments of the shape function. All these features are also mirrored in the corresponding results for the spectrum. This behaviour of the pole scheme was already observed in ref. [32] up to NNLL , and we see that it continues to get worse at N 3 LL .
The predictions in the 1S mass scheme are presented in figure 10. Although the differential decay rate up to NNLL is somewhat stable, the prediction fails dramatically at N 3 LL . In particular, the uncertainty band from scale variation is completely out of control. Adopting short-distance schemes for λ 1 and ρ 1 slightly improves the spectrum up to NNLL +NNLO, but the picture at N 3 LL +N 3 LO(c k ) does not change. Neither does it seem to be possible to substantially improve the convergence by adjusting the values of the residual short-distance coefficients δλ   The results in the MSR scheme are presented in figure 11. In this case we observe much more stable results in comparison to the 1S scheme. Here in the first row, where we only switch the b-quark mass to the MSR scheme, we still observe that the spectrum is very sensitive to the behavior of the soft function at small k. This sensitivity is reflected in the uncertainty estimates from scale variation. By subtracting the subleading renormalons present in λ 1 and ρ 1 , we find a substantial improvement in the peak region of the spectrum, and the result exhibits excellent convergence between all orders (second row). From these results we can conclude that the MSR scheme is indeed a much more suitable mass scheme for the B → X s γ spectrum when going beyond NNLL .
To understand the reason for the breakdown of the 1S scheme at N 3 LL we recall the relation between the pole and 1S schemes up to the 3-loop order,  Figure 11. The soft function (left panel) and the B → X s γ spectrum (right panel) in the MSR mass scheme. The top panels use the pole scheme for λ 1 and ρ 1 , while the bottom panels use a short-distance scheme. The convergence is significantly better than in the other mass schemes, and is further improved by adoption of short-distance schemes for λ 1   The mismatch between the size of R 1S and µ S ∼ k not only breaks the power counting of the EFT description of the decay rate, but also spoils the renormalon subtraction in the soft function. To see this explicitly, we consider the conversion between the 1S and MSR b , which formally does not involve a renormalon. We obtain the following perturbative series for ∆m b (R = µ) at various scales, where ≡ 1 is an auxiliary parameter denoting the perturbative order of the corrections. The perturbative series in the first line shows a rather good convergence for ∆m b at the hard scale. However, in this regime the perturbative expansion for the 1S scheme contains large logarithms of the form ln(µ/R 1S (µ))| µ=4.7 GeV ∼ ln(4.7/1.36). These logarithms are suppressed by R 1S (µ = m 1S b )/m 1S b = 1.36/4.7 and are therefore harmless in the fixed-order expansion. For example, in the context of calculating the total decay rate, it is well-known that the 1S scheme provides a good description for the bottom mass.
We can also define a natural scale for the 1S scheme, which we denote as µ 1S , at which all logarithms of the form ln(µ 1S /R 1S (µ 1S )) are resummed. It corresponds to the fixed point of the R 1S scale where R 1S (µ 1S ) = µ 1S , which yields µ 1S = 1.93 GeV. The perturbative series for ∆m b (µ 1S ) is shown in the second line of eq. (5.4). It shows the same good convergence as the first line, but with overall smaller corrections and a change of sign in the O( 3 ) coefficient.
Finally, the last line in eq. (5.4) shows the perturbative series for ∆m b (µ S ) at the soft scale µ S = 1.3 GeV. As one can see, the resulting series exhibits no convergence and breaks down at the 3-loop order. This behavior vividly explains the failure of the 1S mass scheme when used at the soft scale to remove the renormalon in the soft function.
Another interesting conclusion from the discussion above is that one can retain the use of the 1S scheme as soon as the actual soft scale in the problem is roughly of the same order of R 1S = µ 1S ∼ 1.93 GeV. To examine this hypothesis, in figure 12 we show the results for the soft function and the photon energy spectrum in the 1S mass scheme where the soft scale is chosen to have larger values, µ S ∈ {2.2, 1.9, 3} GeV. Indeed the resulting spectrum exhibits a significant improvement at all orders compared to figure 10, and in particular the N 3 LL +N 3 LO(c k ) result is now much more well behaved. However, in practice this setup is not really ideal since the soft scale is now much larger than Λ QCD and our default soft scale, which leads to rather large unresummed logarithms in the soft function. Consequently, the results in figure 12 do not reach the same level of stability as those in the MSR scheme in figure 11.

Conclusions
In this paper, we presented the photon energy spectrum in B → X s γ at N 3 LL +N 3 LO(c k ), mediated by the electromagnetic operator in the weak Hamiltonian. For the singular contributions we used the SCET factorization theorem to resum large logarithms, which arise in the description of the spectrum close to the kinematic endpoint. We accounted for the complete soft and jet functions at N 3 LO and treated the unknown nonlogarithmic constant of the 3-loop hard function as a nuisance parameter. In addition, the RG evolution is performed at complete N 3 LL, taking advantage of the fully-known 3-loop anomalous dimensions. We matched the resummed predictions to fixed order at the formal N 3 LO order, developing a method that allows us to consistently include the known NNLO results, while parametrizing the missing nonsingular corrections at 3-loop order in terms of suitable theory nuisance parameters c k . The variation of these nuisance parameters provides an estimate of the uncertainty that arises from our ignorance of these missing terms.
We incorporated nonperturbative effects by convolving the partonic spectrum with a universal shape function. The first moments of the shape function depend on the b-quark mass m b and the HQET parameters λ 1 and ρ 1 . It is crucial to define these parameters in a suitable short-distance scheme to avoid renormalon ambiguities that would spoil the convergence of perturbative series. Another important aspect of our analysis was the implementation and choice of an appropriate short-distance scheme at N 3 LL . By using the MSR mass scheme for m b and adopting an analogous scheme to the "invisible" scheme for λ 1 and ρ 1 we are able to obtain stable predictions. We also find, quite unexpectedly, that the 1S mass scheme, which has been successfully used in the past for this process up to NNLL , starts to badly break down at N 3 LL +N 3 LO(c k ). We demonstrated that the reason for this sudden breakdown is that the 1S scheme is ultimately not designed for use at very low scales. This is because its built-in infrared cutoff scales with α s (µ) and thus increases at lower scales and quickly becomes too large when the soft scale is lower than µ 1S = 1.93 GeV, such that it effectively breaks the power counting of the underlying HQET.
Our main results are presented in section 5.1. Our final predictions for the photon energy spectrum in the MSR scheme and invisible schemes for λ 1 and ρ 1 exhibit excellent perturbative stability, in particular considering the rather low scales involved in the problem. In particular, we observe a substantial improvement of the perturbative theory uncertainties from NNLL +NNLO to N 3 LL +N 3 LO(c k ). Importantly, even without the complete fixed O(α 3 s ) information available, with our method we are able to benefit from the increased perturbative precision at N 3 LL , allowing us to improve the precision of the theory predictions across the entire phenomenologically important peak region. Indeed, the uncertainties due to the missing fixed-order results at O(α 3 s ) only become dominant in the fixed-order tail of the spectrum, which is phenomenologically less relevant. Nevertheless, their calculation is still encouraged and needed to further reduce the theory uncertainties.
In the future, we look forward to confronting our improved predictions with both existing and future experimental measurements, enabling more precise determinations of the shape function F(k), the b-quark mass, and the normalization of the B → X s γ rate.
performing the integrals after expanding the numerators in α s . For our purposes here, the numerical accuracy of the approximate analytic solutions is sufficient [61].
Following ref. [32], we define the plus distributions The jet function is normalized such that J (n) −1 = 1. Explicit expressions for the coefficients J (n) m up to 3-loop order in our notation are given in refs. [22,62]. In our numerical implementation, we do not need to explicitly solve the jet-function RGE, because we always evolve the hard and soft functions to the jet scale.