Two-loop non-leptonic penguin amplitude in QCD factorization

We complete the calculation of the QCD penguin amplitude at next-to-next-to-leading order in the QCD factorization approach to non-leptonic B-meson decays. This provides the last missing piece in the computation of the QCD correction to direct CP asymmetries at leading power in the heavy-quark expansion.


Introduction
Direct CP violation arises from the interference of amplitudes with different CP-violating (CKM) and rescattering phases. The values of the CKM parameters in the Standard Model (SM) imply that direct CP violating asymmetries are either very small or require the measurement of rare decays. In B-meson physics, rare decays to charmless final states provide the best opportunity to study such CP violation, when there is an interference of an amplitude generated primarily by a tree-level W -boson mediated process and a loop-induced b → Dg * (→ qq) (D = d, s) amplitude, called the QCD penguin amplitude.
The theoretical calculation of direct CP asymmetries is very challenging, since it is usually not possible to calculate the rescattering phases in a process involving hadrons. The best prospects are offered by charmless decays to two (pseudoscalar or vector) mesons, in which case the QCD factorization approach [1][2][3] provides a rigorous and systematic approximation to the non-leptonic decay amplitudes for the leading term in the heavy quark expansion. 1 The matrix element of the effective Hamiltonian operators Q i (to be specified below) responsible for the B → M 1 M 2 decay can be expressed as 2 in terms of non-perturbative B → M form factors F BM (0), light-cone distribution amplitudes (LCDAs) f M φ M (u), and perturbatively calculable hard-scattering kernels T I i (u), T II i (ω, v, u). Importantly, the rescattering phases are present only in the latter, hence direct CP violation can be calculated once the form factors and LCDAs are known. It also follows that direct CP asymmetries are either of O(α s ), since phases arise from loop contributions to the kernels above, or of next-to-leading power O(Λ/m b ), where Λ m b denotes the strong interaction scale, in the heavy-quark expansion. Since both parameters are O(1/10) it is a priori unclear whether the direct CP asymmetries in charmless decays are short-or long-distance dominated.
The calculation of the short-distance direct CP asymmetry has therefore been of longstanding interest. The first non-vanishing O(α s ) contribution has been known for a long time from the first QCD factorization calculations [1,3,7]. As usual, the next term in the α s expansion is needed to check the reliability of the expansion and to reduce theoretical 1 Larger direct CP asymmetries have been observed in three-body decays [4]. However, these are caused by interference with strong phases which are generated by long-distance, hadronic resonance physics. 2 The overall sign refers to the case of two pseudoscalar final-state mesons. The factor of 1/4 arises in relation to the corresponding equation in [5], since the operators in the effective Hamiltonian (2.5) below are now defined with an overall factor 4G F / √ 2 rather than G F / √ 2. Note that this factor was missing in the corresponding equation in [6]. scale uncertainties. In the present case of direct CP asymmetries, this implies the nextto-next-to-leading order (NNLO) O(α 2 s ) correction to both interfering amplitudes. The O(α 2 s ) correction to the spectator-scattering kernels T II i (ω, v, u) is somewhat simpler to compute, since it is a one-loop effect, and has been obtained in [8][9][10] and [11] for the tree and the QCD and electroweak penguin amplitudes, respectively. The NNLO calculation of the kernel T I i (u) of the form-factor term in (1.1) has been performed about ten years ago [5,12,13] for the tree-induced amplitudes T , C, but only a partial result is currently available for the QCD penguin amplitude, P , from 1) the one-loop matrix element of the chromomagnetic dipole operator Q 8g [14], and 2) the two-loop matrix elements of the current-current operators Q p 1,2 [6]. In the present paper we present the calculation of the last missing, and most difficult piece of the NNLO computation, the matrix elements of the penguin operators Q 3−6 in the effective weak Hamiltonian. Two-loop vertex and two-loop penguin diagrams contribute to these matrix elements, which we compute by extending previous results from [5,6,13,15]. The NNLO computation of the penguin amplitudes provides the basis for a comprehensive reanalysis of the phenomenology of all 130 final states of two pseudoscalar and/or vector mesons from the ground-state nonet [7,16]. This analysis is deferred to a separate study, while here we present only a numerical result of the penguin amplitude, that puts the new NNLO contribution into perspective.
The paper is organized as follows. In section 2 we lay out the theoretical framework and formulate the problem as a matching calculation of the matrix elements of operators from the effective weak Hamiltonian to a certain four-quark operator in soft-collinear effective theory (SCET), which factorizes into the product of the B → M form factor and the LCDA of the light meson at the matrix-element level. Section 3 provides some technical details of the two-loop computations in QCD and SCET, and the structure of the result for the hard-scattering kernel before and after the convolution with the Gegenbauer expansion of the light-meson LCDA. The numerical size of the NNLO correction and the residual renormalization scale dependence of the full QCD penguin amplitude is evaluated in section 4. We conclude in section 5. Two appendices summarize the renormalization constants for the operators from the effective weak Hamiltonian required for this work, and list numerical tables from which the convoluted kernels used in section 4 can be reconstructed, including their dependence on the internal charm-quark mass.
2 Theoretical framework

Penguin amplitude in QCD factorization
On the fundamental level, what is commonly called the QCD penguin amplitude in nonleptonic decays is generated by the loop-induced weak-interaction process b → Dg * , where D refers to a down or strange quark, followed by g * → q=u,d,s qq, where the q andq end up in different mesons in the final state. This basic process defined by the quark flavour configuration can be dressed by quark and gluon loops.
Following the notation introduced in [7], the charmless two-body decay matrix element of the effective weak Hamiltonian can be decomposed in terms of CKM structures λ (D) p = V * pD V pb and flavour operators as The term T p A accounts for the flavour topologies of the form-factor and spectator-scattering terms in (1.1), and T p B is reserved for the 1/m b suppressed weak annihilation amplitudes. The QCD penguin amplitude corresponds to whereq s denotes the spectator anti-quark in theB meson. The coefficient α p 4 (M 1 M 2 ) contains the dynamical information, while the arguments of A encode the flavour composition of the final state M 1 M 2 and hence determine the final states M 1 M 2 to which the QCD penguin amplitude can contribute.
The matrix element of T p A contains kinematical factors as well as the form factors and decay constants that appear in (1.1), as defined in [7,16], such that α p 4 (M 1 M 2 ) is a dimensionless number composed of the Wilson coefficients C i of the operators Q i and the convolutions of the hard-scattering kernels with the meson LCDAs. In the QCD factorization approach α p where the plus (minus) sign applies to the decays where M 1 is a pseudoscalar (vector) meson. We focus on a p 4 (M 1 M 2 ) in this paper, which is the only leading-power contribution in the heavy-quark expansion. The normalization of a p 4 (M 1 M 2 ) is such that in terms of the Wilson coefficients of the effective weak Hamiltonian given in the following subsection. We note that at tree-level a p 4 (M 1 M 2 ) is independent of the final state and real. However, in higher orders in α s , the QCD penguin coefficient in QCD factorization depends on the identity of the final-state mesons through their LCDAs and acquires an imaginary part from loop corrections to the hard-scattering kernels. In the following we describe the calculation of and provide results for the O(α 2 s ) correction to a p 4 (M 1 M 2 ).

Operator bases
The calculation is done in the framework of the effective weak Hamiltonian for b → D transitions, which is given by We adopt the CMM operator basis [17], where the current-current and QCD penguin operators are defined as Here the sums run over the five quark flavours q = u, d, s, c, b. The electroweak penguin operators Q 7−10 and the electromagnetic dipole operator Q 7γ are irrelevant for the QCD penguin amplitude. Our definition of the chromomagnetic dipole operator, corresponds to the sign convention iD µ = i∂ µ + g s A A µ T A , and m b denotes the bottom quark mass in the MS scheme at the scale µ.
In dimensional regularization the operator basis has to be supplemented by a set of evanescent operators. In the CMM basis the one-loop evanescent operators are defined as Any four-vector in SCET can be decomposed as where the two light-like vectors n µ ∓ which define the collinear and anti-collinear directions, satisfy n − n + = 2. Eq. (2.10) also defines the perpendicular component. Collinear (anticollinear) modes have momenta with large component n We denote the collinear and anti-collinear SCET fields by ξ and χ, respectively, see figure 1. It turns out that there is only one physical SCET operator in our problem. It has the fermion contraction (χχ)(ξh v ) and is given by where h v is the heavy-quark field in heavy-quark effective theory (HQET). In contrast, the diagrams relevant to the penguin amplitude a p 4 lead to operators where the fermion lines are contracted in a different Fierz ordering, (ξχ)(χh v ), and are therefore of the "wrong-insertion" type (see [5]). The corresponding wrong-insertion SCET operators are conveniently chosen as 3 where we will need n up to 4 (strings with seven γ matrices in each bilinear). The operators O n are evanescent for n > 1, while O 1 is Fierz-equivalent to O 1 /2 in four dimensions. We therefore add O 1 − O 1 /2 as another evanescent operator. In the equations above we omitted the Wilson lines necessary to make the SCET operators, which are non-local on the light-cone [8], gauge-invariant.

QCD matrix elements
The renormalized matrix element of a QCD operator Q i has the perturbative expansion Note that the O n are different from those for the colour-suppressed tree amplitude in [5].
Throughout the paper, α s ≡ α s (µ) denotes the five-flavour strong coupling in the MS scheme. Moreover, a superscript on any quantity G labels the order in α s according to (2.14) The A ( ) ia in (2.13) denote bare -loop on-shell matrix elements of QCD operators. For = 2 we only need matrix elements of the physical operators, but for < 2 the treelevel and one-loop matrix elements of evanescent operators are required in addition since in terms such as Z ja the sum over j includes the evanescent operators. For the matching procedure it turns out to be convenient to split up the amplitudes A ( ) ia further into factorizable and non-factorizable diagrams, see figure 2, (2.15) The renormalization factors Z ij , Z α , δm, and Z ext account for operator, coupling, mass, and wave-function renormalization, respectively. The coupling is renormalized in the MS scheme, whereas the mass and the fields are renormalized in the on-shell scheme. In the matrices Z ij the row index runs over 3 , E 4 }. They were computed in [18,19] but need to be adjusted to our operator basis. We give the matrices Z ij explicitly in appendix A.

SCET matrix elements and hard-scattering kernels
On the SCET side the matrix elements of O a have a simpler structure. Once dimensional regularization is used as infrared regulator the on-shell renormalization constants are equal to unity, and the bare matrix elements M ( ) ab are non-zero only for > 1. At two loops, only the diagrams with a massive charm quark-loop insertion into the gluon line contribute. Moreover, there is no mass counterterm for a HQET quark. One therefore obtains Hereα s denotes the strong coupling in the four-flavour theory. Since the non-local SCET operators depend on a variable, the renormalization factors Y ab = Y ab (u, u ) are functions of two variables and the product in (2.16) has to be interpreted as a convolution. In the procedure of determining the ultraviolet operator renormalization factors Y ab one has to regulate infrared divergences other than dimensionally. While the physical operator O 1 is minimally subtracted in the MS scheme, the evanescent operators are renormalized such that their matrix elements with a non-dimensional infrared regulator vanish, so that eventually after renormalization and matching they can be dropped in the evaluation of a physical decay amplitude. Once the Y ab are at hand, one can use (2.16) with dimensional regularization as infrared and ultraviolet regulator for the on-shell matrix elements of the O a .
The hard-scattering kernels T i of the operators Q i , i = {1p, 2p, 3-6, 8g} are then extracted by matching the QCD operators onto SCET, whose steps are explained in detail in [5] and shall not be repeated here. At the end of this procedure we obtain a master formula for the expansion of the wrong-insertion hard-scattering kernels, which is a generalization of the master formula given in [5] to the case when the tree-level matching of the Q i involves evanescent SCET operators. For the tree, one-loop and two-loop kernels, we obtain b1 . (2.20) The factor 1/2 on the left-hand side appears, since quantities such as A (n) i1 are defined as the coefficients of O 1 , which after renormalization is replaced by O 1 /2. Together with a factor of 4 from extracting the factor 1/4 in (1.1) this implies T To arrive at (2.20) we traded the four-flavour coupling for the five-flavour one by means of the D-dimensional relationα s = ξ −1 45 α s , where ξ 45 = 1 + O(α s ) is given explicitly in [20]. The matching coefficient C F F can be determined from matching calculations for the b → u transition [20][21][22][23] and reads (L = ln(µ 2 /m 2 b )) The terms A ( )f 31 denote -loop factorizable matrix elements of Q 3 of the right-insertion type [5], whose result is proportional to the tree-level matrix element of the physical SCET operator O 1 .
All except three terms in (2.18) -(2.20) have an analogue in the wrong-insertion master formula for the tree amplitudes, which was discussed at length in [5]. The last term in (2.19) and the last line in (2.20) are new. They stem from tree-level matrix elements of the Q i proportional to evanescent SCET operators, convoluted with SCET matrix elements and renormalization constants that describe the mixing of evanescent with physical SCET operators. Such non-vanishing tree-level SCET evanescent operator contributions can appear whenever the operator in H eff contains a fermion bilinear with more than one Dirac matrix as is the case for Q 5,6 and the evanescent QCD operators in (2.8), (2.9).

NNLO calculation 3.1 Operator insertions and diagrams
Several insertions of the operators from the effective weak Hamiltonian contribute to the computation of the penguin amplitudes at higher orders. They are depicted in figure 3. Whereas the current-current operators Q p 1,2 can be inserted in a single manner only (see first panel of figure 3), there exist several ways of inserting the QCD penguin operators 4 Note that this factor 1/2 when ever T ( ) i appears is missing in eqs. (7)-(9) in [6]. Keeping track of all these insertions leads to quite some bookkeeping during the calculation. The leading order (LO) and next-to-leading order (NLO) contributions to the QCD penguin amplitudes have been known since long [1,3,7]. Also the calculation of the NNLO O(α 2 s ) correction involving one-loop spectator scattering dates back more than a decade [11]. The first NNLO calculation of a vertex correction to the leading QCD penguin amplitudes was the one-loop O(α 2 s ) insertion of the chromomagnetic dipole operator Q 8g [14]. More recently the current-current operator contribution has been completed [6], and in the present work we compute the remaining insertions from figure 3 at NNLO.
There are in total more than a hundred diagrams that one has to compute at NNLO. Those of the "penguin-type" contractions are shown in figures 4 and 5. We found after explicit calculation that the sum of all diagrams in figure 5 vanishes for all operator insertions. In addition, one has to insert the QCD penguin operators Q 3−6 into the "tree-type" diagrams which are shown explicitly in section 5 of [2]. Finally, the one-loop diagrams for the chromomagnetic dipole operator Q 8g are depicted in figure 6.

Details of the two-loop calculation
To obtain the vertex kernels T I i (u) the quark matrix elements D(uq)q(ūq)q(p)|Q i |b(p b ) must be calculated at the two-loop order. Some general kinematic features can be seen p 2 = q 2 = 0. The quark that goes into meson M 2 carries momentum fraction u ∈ [0, 1] of q, the anti-quark the remaining fractionū ≡ 1 − u. There is a kinematic threshold at u = 4m 2 c /m 2 b in the charm-loop diagrams. The problem at hand is a genuine two-scale problem with dimensionless quantitiesū and where the infinitesimally small quantity η > 0 determines the sign of the analytic continuation. For later convenience we also define the following additional kinematic variables [15] The methods that we apply in the two-loop calculation have become standard in contemporary advanced multi-loop computations. We work in dimensional regularization with D = 4 − 2 , where ultraviolet and infrared (soft and collinear) divergences appear as poles in . We first apply a Passarino -Veltman [24] reduction to the tensor structure of the amplitude. The Dirac and colour algebra is then performed by means of in-house routines. Subsequently, the dimensionally regularized scalar integrals are reduced to master integrals using the Laporta algorithm [25,26] based on integration-by-parts (IBP) identities [27,28]. To this end we use the package FIRE [29] and an in-house routine. The master integrals that stem from the insertion of penguin operators into the "tree-type" diagrams are known analytically since long [5,12,13,30] and evaluate to harmonic poly-logarithms (HPLs) [31]. Those that come from the reduction of the diagrams in figures 4 and 5 were computed analytically in [15] in terms of iterated integrals over generalized weight functions.
In the notation of [15] the generalized HPLs are defined as For the weight functions we have f 0 (x) = 1/x, while for any expression with w = 0 we define Moreover it turns out to be convenient to define the linear combinations In our calculation, we encounter the following expressions for w, Finally there is one new master integral that stems from the one-loop diagrams in figure 6, whose analytic result reads where S Γ = 1/((4π) D/2 Γ(1 − )) and ζ 2 = π 2 /6.

Calculation of the counterterms
The diagrams shown in figures 4 -6 and their calculation discussed in the previous subsection refer to the term A In the following we remark on the remaining terms in (2.20) complementing the discussion of the corresponding terms in [5] for the calculation of the colour-allowed and coloursuppressed tree amplitudes.
We begin with the terms on the right-hand side of the expression (2.19) for the oneloop kernels T , is the counterterm that renormalizes the operator Q i . The implicit sum over j includes the evanescent operators in the effective weak Hamiltonian. The next term, A i1 arises, because the factorizable one-loop terms that are not part of the kernels but of the full QCD form factors correspond to right insertions of Q 3 , while the fermion lines of the Q i are contracted in the wrong-insertion order for the QCD penguin amplitude. The difference must be put into T may give finite contributions to the kernels. Once again we find that these contributions actually vanish as → 0. However, this no longer holds true at the two-loop order.
The counter-and subtraction terms needed to obtain finite two-loop hard-scattering kernels T (2) i in (2.20) are much more involved. The first two lines of (2.20) represent ultraviolet counterterms for the operators Q i , including external field as well as bottom and charm mass renormalization. The internal quark masses are taken to be the pole masses, hence δm (1) is the counterterm for the pole mass. There are no internal massive quark lines in the O(α s ) matrix element of the chromomagnetic dipole operator Q 8g . However, the mass parameter m b that appears in the definition (2.7) of the operator is understood to be the MS mass and the mass counterterm is not included in the anomalous dimension matrix from [18]. The conversion to the pole mass then implies that the same counterterm δm (1) is applied multiplicatively to the bare amplitude A is the only place where fermion-tadpole contractions of the four-quark operators survive in the counterterms. The relevant diagram is shown in figure 7. For the one-loop tadpole diagrams without the mass counterterm insertion there is a cancellation between the two diagrams obtained from attaching the gluon to the external fermion lines to the left and above the four-quark vertex. This cancellation does not occur for the mass counterterm tadpole diagram, since it only exists for the massive b-quark line. The non-vanishing tadpole contribution is divergent and required to make the final result for T (2) i finite. The corresponding tadpole contribution vanishes for the Q p 1,2 matrix elements due to the (V − A) ⊗ (V − A) structure of these operators. In the bare two-loop matrix elements almost all the tadpole contractions cancel (see figure 5), and only two diagrams give a net contribution (see last line of figure 4).
The remaining terms on the right-hand side of the two-loop master formula (2.20) represent generalizations of similar structures in (2.19) and account for i) the use of the full QCD rather than SCET B → M form factors, ii) the renormalization of the SCET operators, iii) non-vanishing two-loop SCET bare matrix elements due to massive internal charm loops, iv) the finite subtractions required to make the renormalized matrix elements of evanescent operators vanish.
As was the case in the calculation of the topological tree amplitudes, the expression It is advantageous to rearrange some of the terms in (2.20). To this end, we define the one-loop kernel without the terms from the SCET evanescent operator renormalization, Notice that T (1)F i must be computed to O( 2 ), since it multiplies the renormalization constant Y (1) 11 , which contains 1/ 2 poles. Hence, the term A i1 , which is O( ) cannot be dropped here. We then combine terms in (2.20) as follows: The combination of renormalization constants∆ was already computed in [5]. The quan-tityÊ appears only for the matrix elements of Q 5,6 , which have non-vanishing tree-level contributions A (0) ib proportional to the evanescent SCET operator O 2 . As discussed in [5], the advantage of defining∆,Ê is that while the individual terms in these expressions depend on the infrared regulator chosen to compute the non-minimal evanescent operator renormalization constants,∆,Ê are regulator-independent, as indeed the final result must be. By applying infrared rearrangements, in which the product of one-loop renormalization factors appears as subgraph of the two-loop factor,∆,Ê can be computed directly without the need of ever introducing an explicit infrared regulator.
With these remarks we provide explicit expressions for

Hard-scattering kernels
Although most of the individual terms in the master formulas (2.18) -(2.20) have poles in the dimensional regulator , the total expression for the hard-scattering kernels (HSK) T ( ) i must be free of poles in , which we checked analytically to O(α 2 s ) and for all i ∈ {1u, 2u, 1c, 2c, 3-6, 8g}.
The tree-level and one-loop HSK are known from [1]. However, the calculation was performed with another operator basis for the effective weak Hamiltonian [32], which is less suitable for NNLO calculations than the CMM basis [17]. For completeness, we therefore begin by summarizing the tree-level and one-loop HSK in the CMM basis and in the notation of the present paper.
At tree-level the HSK relevant for the penguin amplitude read The variable s is as in (3.1) and also z f = (m 2 f −iη)/m 2 b is as before. The penguin function G(z f ,ū) coincides with the corresponding function defined in [3], while t 1 (u) is related to the one-loop vertex function g(u) in that reference by t 1 (u) = g(u) − 22. One then has 1c , Note that the HSK must be calculated to higher orders in when they enter a term in the master formula at higher loop order, since they usually multiply terms that are divergent in . We refrain from giving terms beyond O( 0 ) here since they are straightforward to derive.
At O(α 2 s ) only the kernel T (2) 8g of the chromomagnetic dipole operator is of a length suitable for printing, and given by the expression The kernel T (2) 8g was previously calculated in [14]. Our expression (3.23) confirms this result. 5 The expressions for the two-loop penguin HSK of the operators Q p 1,2 , Q 3−6 are long and complicated, although they are available analytically as a linear combination of the generalized HPLs introduced in section 3.2. We provide the analytic expressions for all HSK at O(α 2 s ) electronically as supplementary material to the present article.

Convolution in Gegenbauer moments
The HSK enter the formula of the QCD penguin amplitudes via the convolution with the LCDA φ M (u) of the light meson. The LCDA is expanded into the eigenfunctions of the one-loop renormalization kernel, where a M n ≡ a M n (µ) and C (3/2) n (x) are the Gegenbauer moments and polynomials, respectively. We truncate the Gegenbauer expansion (3.26) at n = 2, which is sufficient in practice.
The convolution of those parts of the HSK that come from operator insertions into "tree-type" diagrams is performed in the same way as in [5,12,13] and can be done completely analytically. The terms that stem from "penguin-type" diagrams, however, require a different and more refined treatment. A method that is applicable to the majority of the terms is to trade u for s = 1 − 4z c /ū as integration variable, which results in s = r and s = +i∞ as integration limits for s, The threshold atū = 4z c is mapped to s = 0. The main advantage of this substitution is the ability to perform the integration over s in terms of the same iterated integrals as in section 3.2. Subtleties arise, however, when taking the limits s → r and s → +i∞ of the integral function. In case of the lower limit individual terms contain power divergences proportional to 1/(r − s) n . They can be isolated via a Taylor expansion about s = r of the corresponding numerators and disappear in the sum of all terms. Taking the upper 5 The MS heavy-quark masses are used in [14], whereas we employ pole masses. However, since T does not depend on quark masses, the scheme conversion does not affect the one-loop kernel (3.23). Note that if one does not convert the MS mass m b in the definition of Q 8g into the pole mass, the tree-level HSK T (1) 8g is proportional to m b /m b , where the pole mass in the denominator arises from on-shell kinematics. In order to arrive at the expression in the last line of (3.22), one must then express one mass definition in terms of the other. To obtain the correct result for T (2) 8g , the conversion must be done with one-loop accuracy. The one-loop correction contributes to T limit requires an argument inversion of the generalized HPLs. This is done recursively via the following formulas: The recursion ends at HPLs of weight one, where the explicit form of the argument inversion can be easily computed, e.g.
which holds for arbitrary values of 1/t on the positive, imaginary axis. In this way, logarithmic divergences contained in generalized HPLs as s → +i∞ are made explicit, as in The divergences that arise in individual terms as s → +i∞ have to cancel in the end as well. This procedure yields generalized HPLs up to weight five, which we evaluate numerically by means of the program GiNaC [33,34]. The complex number a in (3.28) is in principle arbitrary. The choice a = i in the argument inversion turns out to be convenient for the numerical evaluation.
There are a few terms to which the above procedure cannot be applied since the dependence on the kinematic variables is more involved, e.g. in products of HPLs of different u-dependent arguments. Fortunately, these terms are easy to integrate numerically and/or are free from kinematic thresholds. In the latter case we derive Mellin-Barnes (MB) representations for the integrals and perform the convolution over u analytically. The subsequent integration over the MB variables can be done numerically to high accuracy using MB.m [35].
In this way, we obtain all terms in the leading QCD penguin amplitudes that involve powers of L = ln(µ 2 /m 2 b ) completely analytically. In the L 0 pieces a few terms are obtained only as an interpolation in z c . We present all the expressions out of which the penguin amplitudes are built in the next section.
As an independent check we evaluated the convolution integrals for a fixed value of the charm-quark mass numerically, based on a grid of 230 points in u that was designed to capture the singular behavior of the hard-scattering kernels at the endpoints u → 0, 1 and at the charm threshold u = 1 − 4z c . We then fitted the integrands to a suitable ansatz in the singular regions, while we used interpolating functions in the regions where the integrands are smooth. In this way we compared our results for the convolution integrals for all considered Gegenbauer moments and 25 different values of the charm-quark mass, and found agreement between the two methods at the subpercent level, except for a few outliers that are due to large numerical cancellations.

Penguin amplitudes with NNLO accuracy 4.1 Analytic expressions
In the CMM basis the leading QCD penguin amplitudes a p 4 with p = u, c become, after expanding through to the second Gegenbauer moment, 1u + δ pc I (2) 1c 2u + δ pc I (2) 2c . of the light-meson LCDA. They are known analytically [3], 3) where we used the variable r = √ 1 − 4z c . The functions g 1 (r) and g 3 (r) can be found in (B.2). At O(α 2 s ), all terms involving L were also obtained in a fully analytic manner, while the L 0 terms are analytic only for I (2) 1u , I (2) 2u and I (2) 8g . For the L 0 term in I (2) 1c , I (2) 2c and I 3−6 we calculated numerical values for several hundred points in the interval 0.01 ≤ z c ≤ 1. We provide the data tables for these functions electronically in ancillary files, and in this write-up present fits which reproduce the data at the level of 5 per mille for all 0.01 ≤ z c ≤ 1. For the physical region 0.05 < z c < 0.2 the agreement is well below the per-mille level. All functions that contribute at O(α 2 s ) are collected in appendix B.

Numerical results
The convoluted kernels and fit coefficient tables allow for a fast evaluation of the two-loop penguin amplitude. In the following we discuss the size of the new correction and present for the first time the numerical value of the complete leading QCD penguin amplitude at NNLO. For the sake of comparison with previous partial results, in particular [6], we adopt the same values for the hadronic parameters as in that reference, which in turn are mostly the same as adopted in [5] for the evaluation of the topological tree amplitudes at NNLO in QCD factorization. The form-factor contribution to a p 4 depends only on the Gegenbauer moments of the light-meson LCDA. The spectator-scattering contribution further depends on form factors, the B-meson decay constant and LCDA, and light-quark masses.
The penguin amplitude coefficients a u,c 4 depend on the final state mesons through the hadronic parameters. We present the result for the πK final state: In both equations the first two lines originate from the form-factor term (kernels T I i in (1.1)). The third line is the spectator-scattering contribution (kernels T II i in (1.1)), which was computed to O(α 2 s ) in [11]. Spectator scattering has only a small effect on the penguin amplitude for r sp = 0.434 and we do not discuss it further.
The second line in (4.7), (4.8) represents the NNLO correction to the form-factor term, which is further split into the contribution from the current-current operators Q p 1,2 and the penguin operators Q 3−6,8g . The first reproduces the result from [6]. The second, [0.33 + 0.38i] P 2 , Q 3−6,8g , is the main result of this paper. Note that the penguin operators contribute equally to a u 4 and a c 4 . Comparing the new correction to the LO (unlabelled) and NLO (labelled V 1 and P 1 ) contribution, we see that the real part constitutes a (10 − 15)% correction relative to LO and is a sizeable fraction of the NLO term. The imaginary part is especially important as it is responsible for the existence of a direct CP asymmetry and vanishes at LO. In case of a u 4 (πK), the new contribution represents a −27% correction, and for a c 4 (πK) it reaches −54%. However, for both, the real and imaginary part there is a strong cancellation between the NNLO correction from the current-current operators Q p 1,2 and from the penguin operators Q 3−6,8g , resulting in a much reduced overall NNLO correction.
The numerical situation is illustrated in figure 8 in the complex a p 4 plane, which shows the LO, NLO and NNLO approximation to the QCD penguin amplitude, including the spectator-scattering term. At NNLO, we show in addition the previous result (labelled NNLO| Q 1,2 ) from [6] and for comparison the result when the current-current operator rather than the penguin operator contributions to the NNLO form-factor term are excluded (labelled NNLO| Q 3−6,8g ). That the full NNLO result lies between the two previous points illustrates the above mentioned cancellation. Note that the LO and NLO numerical values shown in the figure do not correspond precisely to the values of the LO and NLO terms in (4.7), (4.8). The reason is that in the NNLO result we employ Wilson coefficients  Figure 9: Scale dependence of a u 4 (πK) (upper row) and a c 4 (πK) (lower row). The spectator-scattering contribution is not included here. Real part to the left, imaginary part to the right. Line coding: full NNLO (solid, dark-grey/blue), NNLO| Q 3−6,8g (dash-dotted, dark-grey/blue), NNLO| Q 1,2 (dashed, dark-grey/blue), NLO (solid, light-grey/orange), LO (dashed, light-grey/orange).
C i in the CMM basis evolved to the bottom-quark mass scale with next-to-next-to-leading logarithmic accuracy, while in the NLO (LO) result we use only next-to-leading logarithmic (leading logarithmic) accuracy. Eqs. (4.7), (4.8), on the other hand, represents a split-up of the NNLO expression into terms of different orders in α s , but always employing next-to-next-to-leading logarithmic Wilson coefficients. As a consequence the full NNLO result is even closer to the NLO result than the cancellations in (4.7), (4.8) suggest. 6 The natural renormalization scale of the form factor term is the bottom-quark mass scale. In figure 9 we display the residual renormalization scale dependence of the formfactor contribution to a p 4 . The various lines refer to the approximations of figure 8 as explained in the caption, with the solid, dark-grey (blue) line representing the full NNLO result. At this order the scale dependence is negligible, especially for µ > 4 GeV. The last line in (4.7) and (4.8) provides an estimate of the theoretical uncertainty of the entire QCD penguin amplitude coefficient a p 4 , which is also shown in figure 8. Figure 8 also displays the uncertainty at LO and NLO for comparison. Somewhat surprisingly, the uncertainty is larger at NNLO than at NLO. Most of this effect is caused by the NNLO O(α 2 s ) correction to spectator scattering, and hence was present already in [11]. The total uncertainty arises to a large part from the uncertainty in hadronic input parameters, which cannot be reduced by perturbative calculations. The dominant hadronic uncertainties are due to the second Gegenbauer moment of the light-meson LCDAs and, despite the smallness of the spectator-scattering contribution, the inverse moment λ B of the B-meson LCDA. We note, however, that the two-loop correction to the form-factor term calculated in the present paper roughly doubles the sensitivity to the first two Gegenbauer moments of the light-meson LCDA. Similarly, the sensitivity of Im [a c 4 ] FF to the value of the charm-quark mass is larger in the NNLO approximation than at NLO.

Conclusion
In this paper we computed the remaining missing piece to the leading QCD penguin amplitude at NNLO in QCD factorization from the two-loop matrix elements of the penguin operators Q 3−6 in the effective weak Hamiltonian. The new contribution is sizeable and cancels in part the previously known current-current operator contribution. With this result, direct CP asymmetries in charmless B decays, which are largely determined by the interference of the QCD penguin amplitude with the topological tree amplitudes, can for the first time be calculated including the first QCD correction. The Belle II experiment is expected to measure many charmless B decays with unprecedented precision in the near future, motivating a reconsideration of the analysis of complete sets of charmless final states with pseudoscalar and/or vector mesons in the framework of QCD factorization. Such analyses can now be performed with NNLO perturbative calculations.

A Renormalization constants
Here we list the matrices Z ij needed for operator renormalization. The row index runs over i = {Q p 1 , Q p 2 , Q 3−6 , Q 8g }, while the column index j labels 3 , E (1) 4 3 , E (2) 4 The matrices Z ij were computed in [18,19], but need to be adjusted to our operator basis since our definition of Q 8g differs from that in [18,19]. We split up the matrices into several blocks, which in our operator basis, for n f = 5 quark flavours and T f = 1/2 read Moreover, we define the following combinations of functions g 0 (r) = H w − 1 (r) + 2 ln(2) = − ln(z c ) , g 1 (r) = H w + 1 (r) − iπ = ln 1 + r 1 − r − iπ = ln r + 1 r − 1 , g 2 (r) = ψ (1) 1 6 − 2π 2 , g 3 (r) = H w + 1 ,w + 1 (r) − iπ g 1 (r) + π 2 2 = 1 2 ln 2 1 + r 1 − r − iπ g 1 (r) + π 2 2 , g 4 (r) = H w − While all terms involving L were obtained in a fully analytic manner, the L 0 terms are available in analytic form only for I (2) 1u , I 2u and I (2) 8g . For the L 0 term in I (2) 1c , I 2c and I (2) 3−6 we calculated numerical values for 318 points in the interval 0.01 ≤ z c ≤ 1. We provide the data tables for these functions electronically in ancillary files. In the following write-up we present fits which reproduce the data at the level of 5 per mille for all 0.01 ≤ z c ≤ 1. For the physical region 0.05 < z c < 0.2 the agreement is well below the per-mille level. We construct this fit by making the following ansatz for the zeroth, first and second Gegenbauer moment, respectively: respectively. The fitted coefficients a 118.7297335 Table 6: Coefficients of the fit functions F (6,j) for j = 0, 1, 2.