Quark mass effects in double parton distributions

Double parton distributions can be computed from the perturbative splitting of one parton into two if the distance between the two observed partons is small. We develop schemes to take into account quark mass effects in this computation, and we study these schemes numerically at leading order in the strong coupling. Furthermore, we investigate in detail the structure of the next-to-leading order corrections to the splitting kernels that include quark mass effects.


Introduction
To make the best physics use of measurements at the Large Hadron Collider, it is important to understand the strong-interaction dynamics of hadron collisions to the best degree possible. Among many other things, this motivates the study of double parton scattering (DPS), where in a single proton-proton collision two partons in each proton undergo a hard scattering. Generically, the importance of this mechanism increases with collision energy, so that it is relevant for future hadron colliders at least as much as for the LHC.
Double parton distributions (DPDs) are the nonperturbative ingredients in the calculation of DPS cross sections. A corresponding factorisation formalism has been developed in [13] and underlies the present work. In this formalism, a DPD depends on the momentum fractions x 1 and x 2 of the two partons extracted from a proton, and on the transverse distance y between these two partons. It also depends on the renormalisation scales µ 1 and µ 2 associated with each parton. We assume them to be equal throughout this work, since taking µ 1 ̸ = µ 2 would not add much to our discussion. We will limit ourselves to DPDs without colour correlations between the two partons; an extension of our results to the colour correlated case should be possible but would require additional work.
Consider a DPS process with the two hard scatterings taking place at a scale µ. The DPDs of the two colliding protons enter the factorisation formula in the form of a double parton luminosity d 2 y θ(y − y min ) F a 1 a 2 (x 1a , x 2a , y; µ) F b 1 b 2 (x 1b , x 2b , y; µ) , (1.1) where a i and b i label the species and polarisation of the scattering partons. The lower cutoff y min in the integration over y is to be taken of order 1/µ. In the formalism of [13], a subtraction term depending on the same cutoff appears in the overall cross section and removes double counting between the cross sections for single and for double parton scattering. The dependence on y min cancels in the sum of terms, up to higher-order corrections beyond the accuracy of the calculation. It important to understand which distances y are most relevant in the integral (1.1). In the limit of perturbatively small y, one may decompose the DPD as F = F spl + F intr . The first term represents the case where the two extracted partons originate from a single parton, which can be computed in terms of a perturbative splitting kernel and an ordinary parton distribution (PDF). The second term represents the "intrinsic" two-parton content of the proton and can be expressed in terms of a twist-four distribution. In fixed-order perturbation theory, one obtains a power behaviour F spl ∼ y −2 and F intr ∼ y 0 at small y. This suggests that the double parton luminosity (1.1) should be strongly dominated by the region where y is close to y min . In the overall cross section, however, the contribution from this region is largely cancelled by the subtraction term just mentioned. Furthermore, it was found in [13] that the stated power behaviour can be substantially flattened by evolution from the scale ∼ 1/y at which a fixed-order calculation of F spl is reliable to the scale µ of the hard scatterings. Finally, for some parton combinations, F spl is suppressed compared with F intr by α 2 s rather than by α s . In summary, the relevant y region in a cross section with hard scale µ extends to values substantially larger than 1/µ, with details depending on kinematics and the parton channels. This region may include both perturbative distances, where one has a natural decomposition F = F spl + F intr , and nonperturbatively large y, where one must resort to modelling F . Guidance for such modelling may for instance be obtained from quark model studies [24][25][26][27][28][29][30][31][32][33][34], from calculations in lattice QCD [35,36], or from the sum rules that DPDs must obey [1,[37][38][39][40][41].
Heavy-quark masses play a quite nontrivial role in the computation of DPS. The number n F of active quark flavours in a double parton luminosity depends on µ and on the particular scheme used to compute the hard-scattering cross sections. If one has for instance m b ≪ µ ≪ m t , then it is appropriate to use DPDs with n F = 5 active flavours, and m b and m c can be neglected in the hard scattering. However, as explained above, the relevant y range in the double parton luminosity may include regions where y is comparable to 1/m b or 1/m c . In these regions, computing F spl with massless charm and bottom quarks is a poor approximation. This calls for a more realistic scheme for evaluating F spl across the relevant range of y in the perturbative region. To develop and assess such schemes is the purpose of the present work.
There is a certain similarity between the problem just stated and the role of n F in the computation of transverse-momentum dependent single-parton distributions (TMDs). Taken in impact parameter space, a TMD f (x, b; µ) depends on a transverse distance just like F (x 1 , x 2 , y; µ), and at small impact parameter b the TMD can be computed in terms of a perturbative matching kernel and a PDF. The treatment of heavy-quark masses in this case was investigated in detail in [47]. An important difference with the present case is that the relevant distances of b in a TMD cross section are of order 1/q T , where q T is a measured transverse momentum, whereas there is no such simple relation between the process kinematics and the relevant distances y in DPS.
This paper is structured as follows. In section 2, we recall some theory results that will be needed in our work. In section 3, we present general schemes for treating heavy flavours in splitting DPDs. In section 4 these schemes are studied numerically at leading order (LO) in α s . In sections 5 and 6 we analyse the structure of massive splitting kernels at next-to-leading order (NLO) in α s , first for the case of a single heavy quark and then for the case of charm and bottom. In section 7, we show how the number and momentum sum rules for DPDs imply corresponding sum rules for the massive splitting kernels. This is used in section 8 to construct a model ansatz for these kernels at NLO. Our main results are summarised in section 9. Additional numerical examples are given in appendix A, and technical material relevant to the model in section 8 can be found in appendix B.

Theory background
In this section, we set up our notation and recall basic results about scale dependence, flavour matching, and the splitting mechanism for DPDs.
To denote PDFs and DPDs for n F active flavours, we respectively write f n F a 1 (x; µ) and F n F a 1 a 2 (x 1 , x 2 , y; µ), where a 1 and a 2 specify the flavour and polarisation of a parton. A generic massive quark flavour is denoted by Q, and light quark flavours by q or q ′ .
We consider only colour-singlet DPDs in this work. Unless stated otherwise, our arguments apply to both unpolarised and polarised partons. For ease of notation, we will in general give explicit relations for the unpolarised case and note how they generalise to the polarised one. Notice that for transverse quark or linear gluon polarisation, DPDs carry one or more Lorentz indices and depend on the transverse vector y rather than its length y. For ease of writing, we will not indicate this explicitly.
We will explicitly indicate sums over flavour labels, where it is understood that the flavour sums run over the active flavours of the quantities being summed. We use the convention that f n F a 1 = 0 , F n F a 1 a 2 = 0 , P n F a 1 a 0 = 0 , V n F a 1 a 2 ,a 0 = 0 , U n F a 1 a 2 ,a 0 = 0 if a 0 , a 1 , or a 2 is heavier than the n F active flavours, (2.1) where P denotes the DGLAP evolution kernels and the kernels V and U are introduced in sections 2.3 and 7, respectively. For kernels involving n F light flavours and a heavy flavour Q, we set where A Q is the flavour matching kernel for parton distributions (section 2.2) and V Q the massive DPD splitting kernel defined in section 3.1.
In general, kernels like P n F of A Q,n F have an explicit n F dependence beyond the conditions (2.1) and (2.2) on their flavour indices. This dependence is absent in some channels at low perturbative orders. When this is the case, the superscript n F is omitted, i.e. its absence signals that the corresponding quantity is n F independent.
Due to charge conjugation invariance, all kernels in (2.1) and (2.2) remain the same if one changes quarks to antiquarks and vice versa. Furthermore, permutation symmetry implies F a 1 a 2 (x 1 , x 2 , y) = F a 2 a 1 (x 2 , x 1 , y) and corresponding relations for the kernels V and U . We will give explicit results only for channels that are independent w.r.t. these symmetry operations.

Scale dependence
Throughout this work, we assume that PDFs and DPDs are renormalised by the MS prescription for all of the n F active flavours, regardless of whether a flavour is considered as massive or massless. The corresponding DGLAP evolution kernels are denoted by P n F a 1 a 0 . For PDFs, we then have d d log where ⊗ denotes the usual Mellin convolution. In the factors of a convolution product we omit the momentum fractions that are integrated over, and it is understood that the product depends on one momentum fraction (which can be deduced from the context). As mentioned in the introduction, we consider DPDs with the same factorisation scale µ for both partons. The evolution equation then reads d d log µ 2 F n F a 1 a 2 (x 1 , x 2 , y; µ) = with separate Mellin convolutions P ⊗ 1 for each parton momentum fraction. It is understood that convolution products with a subscript 1 or 2 depend on two momentum fractions (x 1 and x 2 in the present case). We write the perturbative expansion of the evolution kernels as P n F a 1 a 0 (z; µ) = ∞ k=0 a n F s (µ) k+1 P n F (k) a 1 a 0 (z) , (2.6) where a n F s (µ) = α n F s (µ) 2π (2.7) and α n F s is the strong coupling constant for n F flavours renormalised in the MS scheme. The scale dependence of the coupling reads d d log µ 2 a n F s (µ) = β n F a n F s (µ) 2π (2.8) with β n F (a n F s ) 2π = − β n F 0 2 (a n F s ) 2 + O (a n F s ) 3 (2.9) and β n F 0 =

Flavour matching
We refer to changing the number of active flavours as "flavour matching". When matching from n F to n F + 1 active flavours, we call the first n F flavours "light" and the (n F + 1)st flavour Q "heavy".
The matching relation for the strong coupling is a n F s (µ) = ∞ k=0 a n F +1 where m Q is the mass of the heavy quark. The LO and NLO coefficients are n F independent and read (2.14) One readily verifies that (2.13) is consistent with the scale dependence of a s (µ) given by equations (2.8) to (2.10). The higher-order coefficients A n F (2) α and A n F (3) α can be found in [48]. The matching relation for PDFs reads a 1 a 0 (z, m Q ; µ) . (2.16) As is customary, we expand the matching kernels A Q,n F in a n F +1 s rather than in a n F s . The matching coefficients in (2.16) contain explicit logarithms: At LO, the matching kernels have the simple form A Q(0) a 1 a 0 (z) = δ n F a 1 l δ a 1 a 0 δ(1 − z) , (2.18) where δ n F al = 1 if a is one of the n F light flavours 0 otherwise (2.19) enforces the condition (2.2). The NLO matching kernels are n F independent and can be written as A Q(1) a 1 a 0 (z, m Q ; µ) = P n F +1(0) a 1 a 0 (z) − P n F (0) a 1 a 0 (z) log where we recall the convention specified in equation (2.1). This implies A Q(1) a 1 a 0 = 0 if a 1 or a 0 is a light quark or antiquark. In the other channels, we have for unpolarised partons, as well as  [49][50][51][52]. They are all independent of n F . Flavour matching for DPDs proceeds in full analogy to the PDF case and reads (2.25) One way to see this is to rewrite the matching equation (2.15) for PDFs as a matching relation between the twist-two operators that define PDFs with n F + 1 or n F active flavours. DPDs are defined in terms of the same twist-two operators, containing the product of an operator for parton a 1 and an operator for parton a 2 at relative transverse distance y. This readily implies the matching relation (2.25) as a generalisation of (2.15), just as the renormalisation group equation for the twist-two operators implies the evolution equation (

DPDs at short distance
As we just mentioned, DPDs are defined as the matrix elements of the product between two twist-two operators that are separated by a distance y in the transverse plane. In the limit where this distance y is much smaller than a typical hadronic scale, one can perform an operator product expansion and thus express DPDs in terms of short-distance coefficients and matrix elements of single operators with definite twist. The leading operators have twist two, and their matrix elements are PDFs. This corresponds to the first term in the decomposition F = F spl + F intr we already mentioned in the introduction. From now on, we focus on this term and omit the superscript "spl" for brevity. A detailed account of its properties can be found in [53,54], where the associated short-distance coefficients (called "DPD splitting kernels") are computed up to NLO for unpolarised massless partons. The general form of the splitting formula for DPDs with n F massless flavours reads is a generalised Mellin convolution depending on two momentum fractions. A useful relation is where D depends on two momentum fractions and C and f depend on one momentum fraction. The DPD splitting kernels can be expanded in the strong coupling as with coefficients of the form Here we have introduced the mass scale corresponding to the distance y, The LO splitting kernels are n F independent and have the kinematic constraint where the function V on the right-hand side depends on only one momentum fraction. With this constraint, the general form (2.26) turns into The functions V (1) a 1 a 2 ,a 0 (z) are equal to the LO DGLAP kernels P (0) a 1 a 0 (z) with all plus-distributions and δ(1 − z) terms removed. The structure of the DPD splitting kernels at NLO (i.e. at order a 2 s ) will be discussed at the beginning of section 5. The relations from (2.26) to (2.33) hold if the partons a 1 and a 2 are unpolarised or polarised.

Schemes for treating heavy quarks in splitting DPDs
As discussed in the introduction, the computation of a DPS cross section requires the double parton luminosities (1.1), which involve DPDs for all distances y above a value of order 1/µ, where µ is the typical scale of the hard-scattering processes. The DPDs need to be evolved to this scale, starting from a scale at which they can either be computed (for small y) or modelled (for large y). In the present section, we discuss how to treat massive quarks in different regions of small y.
Before doing so, let us briefly discuss the transition from small to large y. For small y and massless partons, one should compute the DPDs at a scale proportional to µ y given in (2.31), since this avoids large logarithms in the coefficients (2.30) of the splitting formula (2.26). For large y, it is natural to model the DPDs at a scale in the GeV region, which must be large enough to use the perturbative expansion of the DGLAP kernels when evolving to higher scales. To interpolate between the two regimes, one may take the starting scale for DPD evolution proportional to which tends to µ y for small y and to µ min = b 0 /y max for large y. In the numerical study of section 4, we will take µ min = 1 GeV and the functional form With p = 2, this function is widely used in the phenomenology of transverse-momentum dependent parton distributions. We will instead take the form with p = 4, which tends more rapidly to µ y for small y. A rather similar function was used in [55].

Splitting kernels including mass effects
The DPD splitting formula (2.26) is applicable if all quark masses in the perturbative splitting process can be neglected. The corresponding LO kernels are given in reference [6] for all polarisations, and the unpolarised NLO kernels can be found in [53,54]. This version of the splitting formula is appropriate if the characteristic mass scale µ y = b 0 /µ of the splitting is much bigger than the masses of the active quark flavours in the DPD. If µ y is much larger than the masses of the first n F quark flavours but similar in size to the mass m Q of the n F + 1st flavour, then the latter should be treated as massive in the computation of the 1 → 2 splitting kernels V . In this case, we can use the splitting formula with a perturbative expansion The label n F on V Q indicates the number of flavours that are treated as massless. Notice that only the n F light quark flavours are taken as active in the PDF on the r.h.s. of (3.3), i.e. the heavy flavour Q only appears in the splitting kernel. In analogy to the flavour matching kernels (2.16), we expand the massive splitting kernels in a n F +1 s . At LO one finds only one channel where heavy quarks can be produced by the splitting, namely g → QQ. The corresponding kernels can readily be obtained by extending the calculation in section 5.2.2 of reference [6]. We find where K i (w) denotes the modified Bessel functions of the second kind. Kernels with exactly one observed heavy flavour are zero at this order, whereas kernels with only light flavours are equal to their massless counterparts, if a 1 and a 2 are light. (3.6) The situation at NLO is significantly more involved and will be analysed in section 5. Consider now the unpolarised g → QQ kernel from (3.5) in the limits µ y ≪ m Q and µ y ≫ m Q . In the first case -which corresponds to y ≫ 1/m Q -the massive kernel vanishes exponentially, so that the production of heavy quarks is strongly suppressed. In the second case -which corresponds to y ≪ 1/m Q -one obtains Analogous limiting expressions hold for polarised quarks, except that for transverse polarisation the corrections to the small y limit are of order (y m Q ) 2 log(y m Q ). As one expects, the massive kernels approach the massless ones for µ y ≫ m Q , with power corrections in the small parameter y m Q . Notice that the massless LO kernel on the right-hand side of (3.8) does not depend on the number of active flavours, but the coupling a s it is multiplied with in the perturbative expansion (2.29) does.
Two heavy flavours If µ y is comparable in size to both m c and m b , one may want to treat both charm and bottom quarks as massive when computing the 1 → 2 splitting. The DPD with five active flavours is then given by with the perturbative expansion For brevity, we do not indicate that V cb assumes the presence of 3 light quark flavours. At LO, the only nonzero kernels for observed heavy flavours are and their polarised counterparts, whereas for observed light flavours one has V cb(1)

Schemes for one massive flavour
In this subsection, we consider a setting with n F light flavours and one heavy flavour Q. The DPD for n F + 1 active flavours is characterised by three mass scales: the scale Λ of nonperturbative interactions, the scale µ y associated with the distance between the two observed partons, and the mass of the heavy quark, which satisfies m Q ≫ Λ. The DPD factorises in different ways depending on the size of µ y .
For µ y ∼ Λ, the dynamics for the production of light flavours is purely nonperturbative and encoded in the n F flavour DPD F n F . The DPD for n F + 1 active flavours, including the production of heavy quarks, is then given by flavour matching for each of the two partons and given by equation (2.25).
For µ y ≫ Λ, the dynamics at nonperturbative scales is contained in PDFs for the n F light flavours, and we can distinguish three different factorisation regimes.
1. For µ y ≪ m Q , the dynamics at scales ∼ µ y is contained in the DPD splitting kernel V n F that describes the transition from the parton in the PDF to two light partons in an n F flavour DPD. The dynamics at scales ∼ m Q is contained in the flavour matching kernels for the transition from F n F to F n F +1 . This corresponds to factorised graphs as shown in figure 1(a).
2. For µ y ∼ m Q , splitting and heavy-quark excitation take place at the same scale, and the transition from the light parton in the PDF to the observed partons of the DPD is described by the massive splitting kernel V Q,n F introduced in the previous subsection. The structure of the corresponding graphs is given in figure 1 3. For µ y ≫ m Q , the quark mass effects are contained in the matching kernel for the transition from PDFs f n F to PDFs f n F +1 . The 1 → 2 splitting happens at much higher scales µ y and is described by the massless splitting kernel V n F +1 . The factorised graphs have the form of figure 1(c).
In section 5.3 we will use the preceding analysis to derive the limiting forms of V Q,n F for small and for large y. In the following, we discuss two schemes to compute the DPD in the full range of perturbative y. We will briefly comment on the transition to nonperturbative y in section 3.3.
(a) µy ≪ mQ Kernels with superscript Q are computed with the heavy flavour Q treated as massive, whereas kernels with superscripts n F and n F +1 are computed for n F or n F +1 massless quark flavours, respectively. For brevity, we omit the superscript n F in A Q and V Q .

Scheme with massless splitting kernels
We start with a simplified scheme in which only the splitting formula (2.26) for massless quarks is used. Not surprisingly, this requires rather coarse approximations. One may nevertheless want to use such a scheme, e.g. because at NLO accuracy only massless splitting kernels are presently available. In this purely massless scheme, one directly switches from the description of figure 1(a) to the one of figure 1(c) at a scale µ y = b 0 /y = γ m Q . The choice of the scheme parameter γ will be discussed in section 4.2.1; we anticipate that values γ ∼ 1 are most appropriate.
For µ y < γ m Q , the splitting DPD is computed for n F massless flavours from an n F flavour PDF. This is done at a renormalisation scale µ split ∼ µ y (3.12) to avoid large logarithms spoiling the perturbative expansion of the splitting kernel. The DPD is then evolved to the scale where flavour matching of the DPD from n F to n F + 1 active flavours is performed. The corresponding formulae read for µ y < γ m Q . (3.14) For µ y > γ m Q , the splitting DPD is computed for n F + 1 massless flavours from an n F + 1 flavour PDF: with µ split as in (3.12). The n F + 1 flavour DPD at any other scale is then obtained by evolution from the starting conditions in (3.14) or (3.15). The transition from n F to n F + 1 Figure 2: Illustration of the scheme with only massless splitting kernels. The green line indicates the scale µ split = µ y at which the DPD splitting formulae are evaluated, and the solid red line indicates the scale µ Q = m Q used for DPD flavour matching. For brevity, we write n ′ F = n F + 1 and omit the superscript n F in A Q . The region of nonperturbatively small µ y is not included in the plot (i.e. the zero point is suppressed on the axes for µ y and µ).
flavours in the PDF is obtained by flavour matching: Here and in the following it is understood that flavour matching for the strong coupling is also performed at the scale µ Q . A graphical representation of this scheme is given in figure 2.
The approach just described has an obvious shortcoming. It lacks the heavy-quark contributions that can be produced by splitting for µ y < γm Q , and it neglects the effects of finite m Q in the splitting kernels for µ y > γm Q , even though these effects can only be neglected for µ y ≫ m Q . An indication of these shortcomings is that the DPDs for some flavour combinations have large unphysical discontinuities at µ y = γ m Q , as we will see in section 4.1.1. Nevertheless, we will show in section 4.2.1 that, after one integrates over y to obtain a double parton luminosity (1.1), the massless scheme can actually give a fair approximation of the more realistic scheme presented next.

Scheme with massive splitting kernels
We now consider the case where the massive splitting kernels are used. For the transition between the three regimes shown in figure 1, we introduce two scales αm Q and βm Q with α ≪ 1 and β ≫ 1. For µ y < αm Q or µ y > βm Q , it is appropriate to use the two-step factorisation The choice of the scheme parameters α and β is a matter of compromise. To minimise the errors inherent in the two-step factorisation schemes, one should take α small and β large enough. On the other hand, the splitting kernels V Q,n F in the intermediate regime contain logarithms log(µ/m Q ) and log(µ/µ y ) at higher orders in a s , which cannot be kept small for any choice of renormalisation scale µ if the interval from α to β becomes too large.
The scheme just described is represented in figure 3. The splitting DPD is computed from and from with f n F +1 given by (3.16). In all cases, we take renormalisation scales µ split ∼ µ y and µ Q ∼ m Q as before. We will explain in section 5.6 why we prefer the choice µ split ∼ µ y in the massive kernels V Q,n F , where one may alternatively consider scales that depend on m Q or on both y and m Q .

Two heavy flavours (charm and bottom)
For two quark masses that are far apart, such as m b and m t , the schemes described above can easily be combined sequentially. However, the masses m c and m b are not well separated, and there is a region of y where it is appropriate to keep both of them in the 1 → 2 splitting kernels.
A scheme that takes this into account is shown in figure 4. The general idea remains the same as in the case of a single heavy flavour, with the main difference being that now there is a regime where both the c and b quarks are treated as massive. Notice that this scheme does not include a separate region µ y < αm c where the charm quark would be absent in the 1 → 2 splitting process and generated by flavour matching from F 3 to F 4 . This is because for α ≪ 1, the value of αm c is not large compared with nonperturbative scales. The n F = 4 splitting DPD is thus computed with the massive kernel V c,3 = V Q, 3 .
In addition, the figure explicitly shows that for low µ y the splitting is to be computed at the modified scale µ y * of equation (3.1) rather than at µ y , in order to keep scales in the perturbative region. Moreover, for values of y where µ y * differs appreciably from µ y , the perturbative splitting formula for DPDs loses its validity. One way to deal with this issue is described in section 4.1.
The computation of the DPD at perturbative values of y thus proceeds as follows: In equations (3.21) to (3.23) one may use either µ split ∼ µ y or µ split ∼ µ y * . The PDFs in (3.22) and (3.23) are obtained from f 3 by flavour matching at the appropriate scales.

Numerical studies with LO splitting kernels
We now present numerical studies of the schemes introduced in the previous section. Since the massive 1 → 2 splitting kernels are at present only known at leading order, we limit ourselves to that order. The initialisation and scale evolution of the splitting DPDs are performed with the ChiliPDF library [56,57]. Unless stated otherwise, scale evolution and flavour matching of DPDs, PDFs, and α s are all performed at LO, with flavor matching done at µ Q = m Q . We restrict ourselves to unpolarised partons.
To cover effects from all three heavy quarks, we consider DPDs evolved to the scale µ = Q at momentum fractions associated with one of the settings where Q is the invariant mass of the system produced in each of the hard-scattering processes, and √ s is the c.m. energy of the proton-proton collision. Examples of corresponding DPS processes are the production of two dijets with heavy flavors at the LHC for the first setting and double tt production at a future hadron collider for the second one. In both cases, Q is significantly larger than the mass of the heaviest active parton in the DPD, so that in the hard-scattering cross section all active partons may be treated as massless.
Given the special interest of like-sign W production for DPS studies [15,21,[58][59][60][61][62][63], we also considered the setting We did not find any features in this setting that are not also present in the one of equation (4.1) and therefore refer to appendix A for its discussion.

Splitting DPDs
We start by looking at the splitting DPDs. For each of the three settings just discussed, the parton momentum fractions x 1 and x 2 are chosen such that the final-state system corresponding to each hard scatter is produced at central rapidity, i.e.
As discussed in section 3.3, we initialise the splitting DPDs at the scale µ split = µ y * , where µ y * is specified by (3.1) with minimal value µ min = 1 GeV and by (3.2) with power p = 4. In addition, we modify the perturbative splitting formula by multiplying with an exponential damping factor, where V is either V (1) or V Q (1) . For the damping constants, we take The exponential gives a more realistic behaviour at nonperturbative y while not significantly affecting the perturbative region. For a motivation of the values in (4.6) we refer to section 9.2.1 of reference [13]. For the PDFs in the splitting formula, we take the default LO set of the MSHT20 parametrisation [64], with the associated values α s (m Z ) = 0.13 for the strong coupling and for the quark masses. We also produced plots for LO sets from HERAPDF2.0 and NNPDF4.0 [65,66] and find them to be quite similar to the ones for MSHT20. 1 In particular, they lead us to the same conclusions regarding the values of our scheme parameters α, β, and γ. The focus of our attention is the y dependence of the DPDs evolved to scale µ for different parton combinations. We will plot against µ y = b 0 /y rather than y, in order to facilitate comparison with the graphs in figures 2 to 4. Let us recall that we need the DPDs for µ y up to order µ when computing DPS observables.
For a given number n F of active flavours, we expect that the DPDs are smooth functions of y (and hence of µ y ), since y corresponds to a space-like distance between different fields, and there are no physical production thresholds associated with such a distance. In both the massless and the massive schemes we introduced earlier, we will however find discontinuities at the transition points between regimes where different approximations are made in computing the DPDs. While this is unavoidable, we regard it as desirable to minimise such discontinuities. We will use this as our default criterion to select the parameters α and β in the massive scheme.

Massless scheme
To begin with, we discuss the DPDs in the massless scheme of section 3.2.1, where the splitting DPDs are initialised for n F = 3, 4, 5 or 6 active flavours as µ y increases, with transition points at γm c , γm b , and γm t . The qualitative features we wish to discuss depend weakly on γ, and we find it sufficient to show plots with γ = 1 in the following.
In figure 5 we show the DPDs for the jet production setting at µ = 25 GeV introduced above. We see in figure 5(a) that the cc distribution has a discontinuity at µ y = γ m c . This  is readily understood: above this value, charm is treated as massless and F cc can be directly produced by g → cc splitting. Below this value, charm is only produced by evolution above the flavour matching scale µ c = m c . For F cc this requires two g → cc splittings in the evolution chain. The discussion of F bb , shown in 5(b), proceeds in full analogy. The discontinuity is even stronger in this case.
A very different behaviour is seen for the cb distribution in figure 5(c). This parton combination is not produced by direct splitting at LO, but only by scale evolution, which explains the zero crossing at µ y = µ = 25 GeV, where no evolution takes place. Above this value, the DPD is actually negative, because one has to evolve backwards from µ y to µ. To understand the tiny discontinuity of the distribution at γ m b , we note that above this value F bb is directly produced by splitting (see figure 5(b)), so that F cb is produced more abundantly by evolution. The corresponding effect at γ m c is too small to be visible in the plot. We remark in passing that when the 1 → 2 splitting is evaluated at NLO, the cb channel can directly be produced by c → cb and b → cb for µ y > γ m b , so that one can expect a more pronounced discontinuity in that case.
As an example for a channel with one observed heavy flavour, we show the gb distribution in figure 5(d). We see a small discontinuity at µ y = γ m b , above which F gb can be produced directly by b → gb splitting, or by one evolution step from F bb . The fact that the discontinuity is not very pronounced suggests that the dominant contribution in this region comes from the evolution of F gg via a g → b splitting. Indeed, one finds that F gg is larger than F bb by more than a factor 100 around µ y = γ m b .
For DPDs with only light flavours, we find a smooth behaviour at the transition points in µ y , as one may expect. The behaviour of the n F = 6 DPDs in the tt setting (4.2) is analogous to the one just discussed and therefore not shown here.

Massive scheme
We now investigate the scheme where massive splitting kernels are used along with massless ones. DPDs with n F = 5 are computed as laid out in section 3.3. DPDs with n F = 6 are obtained from these by flavour matching if µ y < αm t , whereas for µ y > αm t the prescription of section 3.2.2 is used. We computed distributions with different scheme parameters, namely α = 1/2, 1/3, 1/4 and β = 2, 3, 4, and will compare a subset of these in the following. We do not consider smaller α or larger β in order to avoid large logarithms in the intermediate mass region, as explained in section 3.2.2.
Let us first look at distributions in setting (4.1), beginning with F bb shown in figures 6(a) and 6(b). Instead of the large discontinuity at γ m b in the massless scheme, the massive scheme has a much smaller discontinuity at αm b and a tiny one at β m b . The tiny jump at β m b is due to switching from massive to massless DPD splitting kernels -which according to (3.8) is a small effect -and to changing from a 4 to a 5 flavour gluon density in the splitting formula. Quantitatively, we find that this jump is somewhat smaller for β = 2 than for β = 4. The discontinuity at αm b can also be ascribed to two effects. For µ y above this value, bb can be produced by direct splitting, although the massive splitting kernels are exponentially suppressed at µ y ≪ m b according to (3.7). Furthermore, the bb distribution is initialised at scale µ y to the right of αm b , whereas to the left of this point it can be produced by evolution only above the flavour matching scale µ b = m b , i.e. with a shorter evolution path towards the final scale µ. Quantitatively, we find that the absolute size of the discontinuity is smaller for α = 1/4 than for α = 1/2. Among the scheme parameters we considered, the mildest discontinuities for the bb distribution are thus obtained for the combination α = 1/4 and β = 2.
We now turn to the gb distribution shown in figures 6(c) and 6(d), which has only small discontinuities in the massless scheme. In the massive scheme, a discontinuity appears at µ y = α m b , and its absolute size varies only weakly with α. To explain this discontinuity, we recall that gb can be produced by one evolution step from the very large gg distribution. Just to the right of the discontinuity, this evolution starts at scale µ y ≈ αm b , whereas just to the left it starts only at µ b = m b . This large difference in evolution length is absent in the massless scheme, where the transition happens at µ y = γ m b with γ ∼ 1.
A second (and perhaps more intriguing) feature of the gb distribution in the massive scheme is the significant discontinuity at µ y = β m b , which strongly increases with β. To understand this behaviour, let us compare how gb can be produced in the massive scheme for the region β m c < µ y < β m b and in the massless scheme for µ y > γ m b (which coincides with the massive scheme for µ y > β m b ). In the massless scheme, all production modes shown in figure 7 contribute for µ y > γ m b and are evaluated with massless 1 → 2 splitting kernels. The PDFs in the splitting formula are evaluated for 5 flavours, and the b quark PDF has been obtained from a four-flavour gluon PDF by flavour matching and subsequent evolution as illustrated in the lower box of figure 7(a). For the contributions in figures 7(b) and 7(c), evolution is necessary to produce the final gb distribution, as indicated by the DGLAP splitting in the upper boxes. This implies that these contributions vanish if the 1 → 2 splitting is evaluated at the target scale, i.e. for µ y equal to µ = 25 GeV. As µ y approaches this value, the direct splitting contribution in figure 7(a) therefore becomes dominant. In the massive  Figure 7: Production modes of F gb for µ y > γ m b in the massless scheme and for β m c < µ y < β m b in the massive scheme. For µ y > β m b the two schemes coincide. Horizontal parton lines correspond to unobserved spectators. In panel (b) F bg is shown for clarity of presentationfrom this F gb is readily obtained by swapping the two partons. In the same panel, V (1) must be replaced by V b(1) for β m c < µ y < β m b in the massive scheme. scheme, the PDFs in the splitting formula are evaluated for four flavours for β m c < µ y < β m b , so that only the modes in figures 7(b) and 7(c) -now evaluated with four-flavour PDFs and massive 1 → 2 splitting kernels -contribute to the gb distribution. This explains why the discrepancies between the massless and massive scheme become larger as µ y comes closer to µ = 25 GeV (for any β ≤ 25 GeV /m b ).
At which value of β should one switch between the two regions within the massive scheme? For the contribution in figure 7(c), the two regions differ only by the number of active flavours in the gluon distribution, which is a higher-order effect. The contribution in figure 7(b) is evaluated more precisely for µ y < β m b , because mass effects are included in the kernel for g → bb. For equal momentum fractions of b and b, the massive kernel differs from the massless one by more than 100% if µ y = m b , but for µ y ≥ 2m b the difference is less than 15%. Most importantly, the contribution of figure 7(a) is missing for µ y < β m b if one works with LO kernels, which becomes a serious deficiency if β is too large. Our preferred choice is therefore β = 2, which we already favoured in the description of the bb distribution. Notice that the shortcoming of the massive scheme for larger values of β should be mitigated if one could work with splitting kernels at NLO, which include the process g → gb of figure 7(a). This provides a particular motivation to study the massive NLO kernels in sections 5 and 8. We return to this particular issues of the gb distribution in section 8.3.
The features of the cb distribution in the massive scheme are similar to the gb case, including a discontinuity at β m b that grows with β (but is less pronounced than for gb). The explanation of this finding is similar to the one just presented. For the cc and gc distributions, we find only tiny discontinuities in the massive scheme. We recall that there is no transition point αm c in this case.
The two-gluon distribution, shown in the bottom row of figure 6, differs only a little between the massive and the massless schemes. Nevertheless, we can see a small discontinuity at αm b , which is more pronounced for smaller α. Visibly, it matters that to the left of this point, gluons evolve with n F = 4 flavours up to the scale µ b = m b and with n F = 5 flavours above, whereas to the right of this point, they evolve with n F = 5 starting at the initialisation scale µ y . Similar small discontinuities are seen in DPDs for light quark flavours, showing their sensitivity to the gluon distribution during evolution. Of course, the presence of these small discontinuities in the massive scheme does not lead us to conclude that it is less realistic than the massless one (which makes more drastic approximations). Clearly, the absence of discontinuities in the y dependence does not guarantee that an approximation for the DPDs is very precise.
To avoid the largest discontinuities across different parton channels for the dijet production setting, we choose α = 1/4 and β = 2 as our preferred scheme parameters. In figure 8 we show the corresponding DPDs for the same parton combinations that we already presented in figure 5 for the massless scheme.
To see whether the same parameters give a good description in other situations, we show in figure 9 the tt and gt distributions for the tt production setting (4.2). The pattern of discontinuities in the massive scheme is the same as the one we see in figure 6 for b quarks, confirming our above choice of parameters. Remarkably, the discontinuity of the tt distribution amounts to almost two orders of magnitude at µ y = γ m t in the massless scheme, and to more than one order of magnitude in the massive scheme with α = 1/2. DPDs for our preferred scheme parameters are shown in figure 10. It is remarkable that the discontinuities of the bb and cb distributions at αm b are not entirely washed out by evolution to the much larger final scale 1 TeV.

Double parton luminosities
So far our focus has been on the DPDs and their y dependence. This dependence is not directly observable in DPS processes, where DPDs only appear in integrals over y. As already mentioned in the introduction, the relevant quantities are double parton luminosities where we express the lower cutoff on y in terms of a momentum scale ν following [13]. We always set ν = µ in this section, except for figures 11 and 12. We evaluate these luminosities at parton momentum fractions that correspond to a kinematic setting where the system produced in the first and second hard scattering has rapidity Y and −Y , respectively. For equal invariant mass Q of both systems, this corresponds to At this point, we recall that the splitting DPDs F spl discussed so far are the leading contribution in a small-distance expansion that contains also an "intrinsic" contribution F intr .  While subleading in y, this contribution can be important for parton combinations where the splitting DPD is small, and it is generally growing more strongly when the parton momentum fractions decrease. Following section 9.2.1 of [13], we model the intrinsic contribution as with µ 0 = 1 GeV. The normalised Gaussian with the widths given in equation (4.6) is used to model the y dependence of the DPDs, and the x 1 and x 2 dependent prefactor ensures a sensible behaviour of the DPDs as the momentum fractions approach the kinematic threshold x 1 + x 2 → 1. The ansatz (4.10) is made for n F = 3 active flavours, and flavour matching is used to obtain the DPDs for higher n F .
We note that the distinction between a splitting and an intrinsic part of the DPD is unambiguously defined only in the small y limit; for large y we simply define our DPD model as the sum of the regularised splitting form (4.5) and the intrinsic term (4.10) (after evolving them to a common scale). We emphasise that this model has not been tuned against data and is likely too simplistic, but we estimate that it contains enough realistic features for the studies that will follow.
The decomposition F spl + F intr induces a decomposition of the double parton luminosity (4.8) into four contributions: where the arguments and integration boundaries are the same as in equation (4.8). The superscripts "1v1", "1v2" etc. follow the nomenclature introduced in [10].
In figure 11, we show the different contributions to L ccbb , L cbgg , L gcbg , and L gggg for the dijet production setting (4.1). We see that the 1v1 and 2v2 and contributions are of comparable size, except for L ccbb , where the 1v1 contribution is dominant for small and intermediate values of the rapidity Y . The luminosity for four gluons is by far the largest one. To which extent this channel dominates the production of two dijets containing heavy flavours depends of course on the relative size of the relevant parton-level cross sections and jet functions in given kinematics. To study this question is beyond the scope of the present work.
For the tt production setting (4.2), the mass scheme dependence is most pronounced for DPDs containing top quarks, so that we expect significant mass effects for the luminosities L tttt and L gtgt shown in figure 12. We see that for both channels, the 1v1 contribution dominates in the full Y range considered.
At this point, we recall that the luminosities shown here need to be combined with luminosities that correspond to the double counting subtraction terms in the overall cross section. The importance of these terms, which we will not investigate in this work, can be estimated by varying the lower cut-off of the y integration. This is because the dependence on this cutoff approximately cancels between DPS and the subtraction terms, up to perturbative orders that are beyond the accuracy of the calculation. A detailed discussion is given in sections 6.2 and 9.2 of [13]. The largest variation with ν seen in figures 11 and 12 appears in the 1v1 contributions to ccbb, gggg, and tttt at low Y . One can hence expect the double counting subtraction to be most important in these cases.
Given the very different y dependence of F spl and F intr , the dominant region of y in the partial luminosities (4.11) in general differs appreciably between the 1v1 and the 1v2 or 2v1 terms. We can therefore expect corresponding differences in the sensitivity of these terms to details of the heavy-flavour treatment. The 2v2 contribution is of course independent of how heavy quarks are treated in the splitting DPDs.

Dependence on the scheme parameters
We now study how the double parton luminosities depend on the scheme parameters -α and β in the massive scheme and γ in the massless one. To this end, we consider the ratios , (4.12) whose denominator is evaluated with our preferred values in the massive scheme (see figures 11 and 12). Within the massive scheme, we plot r(β) for a range of values β = 2, 3, 4 where logarithms in the massive splitting formula remain of moderate size. For the massless scheme, we plot r(γ) with γ = 1/2, 1, 2. We discard values γ < 1/2, for which quarks as treated as massless in the splitting at scales much below their mass, as well as values γ > 2, for which direct g → QQ splitting is omitted at splitting scales much larger than m Q . In figure 13 we show ratios for the different contributions to L ccbb in the dijet production setting. Generally, deviations from unity are larger in the massless scheme than in the massive  one (note however that different parameters are varied in the two cases). Furthermore, the ratios show a pronounced Y dependence in many cases. If the bb pair is produced by splitting (figures 13(a) and 13(c)), the largest deviations are slightly below 30% in the massive scheme and slightly above in the massless one. Deviations are smaller if only the cc pair originates from splitting ( figure 13(b)). This is in line with the weak scheme dependence we observed for F cc in figure 8(a). We now turn to the channels in figure 11 for which the heavy flavours are not directly produced by 1 → 2 splitting. An example is the cbgg channel shown in figure 14. Again, the deviations of the ratios from unity are somewhat larger in the massless scheme than in the massive one, but they remain below 15% in both cases. Deviations in the gcbg luminosity are not larger than those shown for cbgg. In the gggg channel, we find a parameter deviations of at most 6%, which originate in the small but visible scheme dependence we observed for F gg in figure 6.
Plots for the tt production setting (4.2) are shown in figure 15. Deviations for the tttt luminosity reach at most 20% in the massless and at most 40% in the massless scheme. For the gtgt channel, effects are smaller in either scheme, and for gggg they do not exceed 1%.
In both settings, we thus find that the largest scheme dependence arises for parton combinations that can directly be produced by 1 → 2 splitting in both DPDs. Results in the massive scheme depend on α and β at the level of at most 30%.
One may ask whether there is a choice of γ in the massless scheme that typically gives the best agreement with the more realistic results in the massive scheme. Whilst for ccbb luminosities in figure 12, ratios closest to unity are obtained for γ = 1/2, the preferred value for the gtgt luminosities in figure 12 is between 1/2 and 1. Values below 1/2 or above 2 would not improve the global agreement. For 1/2 ≤ γ ≤ 1 the luminosities in the massless scheme approximately reproduce the ones in the massive scheme with α = 1/4 and β = 2. Typical deviations reach 30% in either direction and can depend strongly on Y , i.e. on the momentum fractions in the DPDs.

Splitting scale variation
To estimate the importance of higher-order corrections in the DPD splitting formulae (2.26) and (3.3), one may vary the scale µ split at which they are evaluated before the DPD is evolved to the final scale µ (with flavour matching at an intermediate scale if appropriate). It is customary to vary the renormalisation scale in a fixed-order formula by a factor of 2 around its central value. We modify this to min µ min , µ y * /2 ≤ µ split ≤ 2µ y * , (4.13) which avoids renormalisation scales below µ min = 1 GeV. This results in an asymmetric scale variation for µ y * < 2µ min , which translates to y > 0.57 GeV −1 with our choice of y * (y). In the following, we study the impact of this scale variation on double parton luminosities in the massive scheme with α = 1/4 and β = 2. Starting with the dijet production setting, we show in figure 16 the splitting scale dependence for the same luminosities that were given in figure 11. The variation of the 1v1 terms is far greater than for the 1v2 and 2v1 terms, which is not surprising because the former involve two splitting DPDs and the latter only one. We also observe that the bands are more asymmetric for the sum of the 1v2 and 2v1 terms than for 1v1. This is because the former receive important contributions from the region y > 0.57 GeV where the scale variation (4.13) becomes asymmetric. In the channels cbgg, gcbg, and gggg, the scale variation is of considerable size and depends rather weakly on Y . By contrast, the scale variation of for ccbb is very weak at low Y but large at high Y . An explanation for this remarkable feature is given in appendix A.
The splitting scale dependence of double parton luminosities in the tt production setting is shown in figure 17 and shows similarities to the cases just discussed. In the 1v1 terms, we find a large scale variation for gtgt at all Y and for tttt at high Y .

Variation of the flavour matching scale
Apart from the DPD splitting scale µ split , the evaluation of DPDs in the schemes of section 3 involve a second scale choice, namely for the scale µ Q at which flavour matching is performed for DPDs, PDFs, and α s . It is therefore natural to study the impact of varying this scale as well. In analogy to (4.13), we vary the matching scale in the interval where the minimum-prescription on the l.h.s. is relevant for the charm quark mass.
In the following, we also compare results with flavour matching evaluated either at LO or at NLO. For the default choice µ Q = m Q , this makes no difference, since all one-loop flavour matching kernels are zero at that point. For different scale choices, this is no longer the case. We note that computing A Q at NLO but V and V Q at LO corresponds to order α s in all kernels appearing in the factorised graphs of figure 1. Of course, the overall accuracy remains at LO with such a hybrid choice.
Notice that not only the computation of the splitting DPDs involves flavour matching, but also the evaluation of the intrinsic part F intr . The latter is initialised for n F = 3 flavours   figure 11 for the dijet production setting. Central curves are for µ split = µ y * , and bands correspond to the variation specified in (4.13). Notice that the 1v2 and 2v1 contributions have been added.  for all y and thus requires several steps of DPD flavour matching in our settings with n F = 5 or 6 for dijet or tt production, respectively. The matching scale dependence of luminosities in the dijet production setting is shown in figure 18. With LO flavour matching we find large scale uncertainties, except in L 1v1 ccbb at  Corresponding plots for the tt production setting can be seen in figure 19 and show a very similar picture. At LO, the scale variation for the 2v2 terms is huge, in particular for the tttt channel, whereas only small variations are found in all cases at NLO.
Since the scale variation at NLO is very small, we expect that higher-order graphs as-   figure 12 for the tt production setting.
sociated with flavour matching will provide only small corrections to the results obtained with µ Q = m Q (at LO or NLO). This is in stark contrast to the higher-order corrections to DPD splitting, where based on the scale variation exercise, we must expect very substantial corrections to the leading-order results.

Massive splitting kernels at NLO
The large variation of LO splitting DPDs reported in the previous section provides a strong motivation for evaluating the splitting kernels at NLO accuracy, i.e. at two loops. For massless kernels and unpolarised partons, this has been achieved in [53,54], and the extension of these calculations to polarisation is rather straightforward. By contrast, the massive splitting kernels are only known at LO (see equations (3.5) and (3.6)). To compute them at NLO is well beyond the scope of the present work. Instead, we derive their limiting behaviour for small and large y, as well as their dependence on n F and on the renormalisation scale µ. In a later section, we will also show which constraints follow from the DPD sum rules [1,68] for unpolarised partons. From now on it is understood that PDFs f n F , DPDs F n F , the splitting kernels V n F and V Q,n F , and the flavour matching kernels A Q,n F are to be evaluated at the following arguments:

4)
A Q,n F a 1 a 0 ≡ A Q,n F a 1 a 0 (z, m Q ; µ) , (5.5) unless indicated otherwise. Notice our convention to use momentum fraction arguments x, x 1 , x 2 in distributions and z, z 1 , z 2 in kernels.

Two-loop splitting graphs with massive quarks
Some properties of the NLO massive splitting kernels follow directly from the Feynman graphs for the associated splitting process. Let us briefly review these and point out in which ones massive quark lines appear. We distinguish between "LO channels" with splitting processes that are already possible at LO (g → qq, q → qg, g → gg, and g → QQ) and "NLO channels", which first appear at two-loop order (q → QQ, g → Qg, q → Qq, q → Qq, and corresponding channels with q or q ′ instead of Q). Recall that we denote light flavours by q and q ′ and heavy ones by Q. In detail, we have 5. q → QQ. The diagram for this channel is obtained from the one for the massless q → q ′ q ′ kernel by replacing all q ′ quark lines by massive lines, as depicted in figure  21(k).
6. g → Qg. The diagrams for this channel correspond to the ones for the massless g → qg kernel, again with all quark lines replaced by massive ones. They are shown in figures 21(e) to 21(j).
7. q → Qq. The diagram for this channel is obtained from the one for the massless q → q ′ q kernel by replacing all q ′ quark lines by massive lines, shown in figure 21(l).
The two-loop diagrams for q → Qq and q → Qq are related by charge reversal of the heavy-quark line, which results in Qq,q . (5.6) The two-loop graphs for NLO channels with observed light partons do not involve any heavyquark lines, so that we have V a 1 a 2 ,a 0 for the corresponding parton combinations. Figure 20: Virtual NLO diagrams for the amplitude that contain heavy-quark lines, indicated by red double lines. The tree graphs in the complex conjugate amplitude are not shown. In Feynman gauge, graphs with eikonal lines are to be added, as specified in figure 4 of [53].

Reminders about massless kernels
To begin with, let us recall two properties of the massless two-loop splitting kernels derived in [54,68].
1. The part V n F [2,1] of a two-loop kernel that is multiplied by log(µ 2 /µ 2 y ) can readily be deduced by taking the µ derivative of the splitting formula (2.26) and using the evolution equations for the DPD, PDF, and a s (µ). One obtains 2. The n F dependence of the NLO kernels enters only via quark loops in virtual graphs. This affects the two channels q → qg and g → gg, whereas the NLO kernels for all other channels are n F independent. The n F dependence naturally appears via β n F 0 , see equations (4.17) and (4.23) in [54], and we write for (a 1 a 2 , a 0 ) = (qg, q) and (gg, g), where ℓ = 0, 1. Since V β [2,ℓ] originates from virtual graphs, it is proportional to δ(1 − z 1 − z 2 ), just as the LO splitting kernels in (2.32). Using (5.7) and the n F dependence of the LO DGLAP kernels, we can derive that for (a 1 a 2 , a 0 ) = (qg, q) and (gg, g).
We note that in the expression of (5.7) for (a 1 a 2 , a 0 ) = (qq, g) the n F dependence cancels between the third and fourth terms, owing to the relation (2.11).
The preceding relations also hold if the produced partons a 1 and a 2 are polarised. The first two DGLAP kernels in (5.7) are then polarised, whilst the third one is unpolarised since it refers to the incoming parton a 0 .

Limiting behaviour for small or large distances
In the limits of small or large y -corresponding to large or small momentum scales µ y , respectively -the massive splitting kernels can be expressed in terms of massless splitting kernels and flavour matching kernels. This follows from the consistency between the different factorisation regimes in figure 1. If in the intermediate regime (where the massive splitting kernels are used) one takes µ y ≫ m Q , then one must recover the factorisation regime in which the heavy flavour Q is treated as massless in the 1 → 2 splitting process, whereas for µ y ≪ m Q one must recover the regime in which Q decouples in the 1 → 2 splitting. Comparing the DPD splitting formulae for the three regimes (see section 3.2.2) we thus find the following.
For µ y ≫ m Q (i.e. in the small-y limit) the massive kernels can be expressed by n F + 1 flavour massless kernels, convoluted with a flavour matching kernel: This corresponds to the right panel in figure 1, where one has first a flavour matching in the PDF and then the 1 → 2 splitting for n F + 1 massless flavours. The O(a 2 s ) term in equation with δ n F a 0 l defined in (2.19). For µ y ≪ m Q (i.e. in the large-y limit) one has This corresponds to the left panel in figure 1, where one has a 1 → 2 splitting process for n F flavours and then DPD flavour matching to n F + 1 flavours. The O(a 2 s ) kernel reads where the term with A α appears because V Q,n F is expanded in a n F +1 s and V n F in a n F s .

Scale dependence
Truncated at NLO, the massive splitting (3.3) formula reads Inserting (5.14) on the l.h.s. of the double DGLAP equation (2.4) gives a n F +1 whilst for the r.h.s. of (2.4) one obtains a n F +1 The prefactor 1/(πy 2 ) has been omitted in both equations. Using the relation (2.28) for the convolutions with three factors, one derives 18) The first equation in (5.17) confirms the µ independence of the LO kernels given in (3.5) and (3.6). Taking into account the n F dependence of the LO DGLAP kernels, one obtains the explicit expressions v RGE qq,g = V [2,1] qq,g + . In accordance with the inspection of the two-loop Feynman graphs, we find that the only kernels with an n F dependence are those for q → qg and g → gg.
The analogues of equations (5.19) to (5.25) for polarised partons a 1 and a 2 are obtained by replacing the one-loop V kernels on the r.h.s. with their polarised counterparts, as well as the LO DGLAP kernels in the convolutions ⊗ 1 and ⊗ 2 .
As a cross check, let us verify that the form (5.18) has the correct behaviour at small and at large y. To this end, we take the scale derivative of the limiting relations (5.11) and (5.13). Using dV n F (1) a 1 a 2 ,a 0 /d log µ 2 = 0 and dV Given the relations (3.6) to (3.8) and the fact that at LO there is no splitting from a light parton to a light and a heavy one, we see that (5.28) and (5.29) are fulfilled.

An explicit parametrisation
We now give an explicit parametrisation of the massive two-loop kernels that incorporates their small-and large-y limits as well as their dependence on µ. It reads with v n F ,RGE given in (5.18), where we abbreviate The function V I a 1 a 2 ,a 0 is µ independent and interpolates between the small and large y limits of the kernels with the properties For dimensional reasons, it depends on the product of y and m Q . The functions k 11 (y m Q ) and k 02 (y m Q ) in (5.30) can be replaced with other functions that have the same limiting behaviour for y → 0 and y → ∞, provided that one makes a corresponding change in V I . Note that we do not allow for an n F dependence of V I . Indeed, the n F dependence of the two-loop kernels originates from graphs with massless quark loops, and the contribution of these graphs is fully contained in the first two terms and in the last term of (5.30).
To verify that the parametrisation (5.30) has the correct limits for small and for large y, we recall that the modified Bessel functions K i satisfy As a consequence, one has Along with the limiting behaviour (5.32) of V I a 1 a 2 ,a 0 , one readily finds that This is agrees with (5.11) and (5.13), given that for µ = m Q , the LO matching kernels A Q(1) in these equations are zero. Since the µ dependence of the parametrisation (5.30) resides entirely in its last term, the correct limiting behaviour at any other scale follows from the discussion at the end of the previous subsection.
The explicit form of our parametrisation (5.30) for the different channels reads V Q(2) qq,g = V [2,0] qq,g + V [2,1] qq,g log where ∆β 0 is defined in (2.14) and V β [2,0] in (5.8). In equations (5.40) to (5.42) we have inserted the explicit forms (5.7) of V [2,1] for the relevant channel. The analogues of (5.36) to (5.42) for polarised partons are obtained by replacing the V kernels on the r.h.s. with their polarised counterparts, as well as the DGLAP kernels in the convolutions ⊗ 1 and ⊗ 2 . The interpolating function for g → gg splitting receives contributions only from virtual graphs with a massive quark loop (see figures 20(c) to 20(e)). It therefore has the form where the function on the r.h.s. depends on one instead of two momentum fractions. In section 7.3 we will show that the interpolating function for g → qq is zero, i.e.
V I qq,g = 0 (5.44) with corresponding relations for longitudinal or transverse quark polarisation. We already noted that for NLO channels with light observed partons, the massive two-loop kernels are equal to their massless counterparts. Correspondingly, one finds V I a 1 a 2 ,a 0 = 0 for the relevant parton combinations.

Large logarithms
For both µ y ≪ m Q and µ y ≫ m Q , the massive splitting kernels involve two physical scales of very different size. The expressions just derived allow us to explicitly investigate the large logarithms that appear at NLO in these limits. Notice that according to (5.34), the function k 02 (y m Q ) counts as a large logarithm in the limit of small y, whereas this is not the case for k 11 (y m Q ). Both functions vanish for µ y ≪ m Q .
From the explicit forms (5.36) to (5.42) it is clear that no choice of µ can remove all large logarithms, not even for light observed partons. In the following we will focus on the scale choice µ ∼ µ y . This provides an element of continuity between the massive regime and the two regimes with massless DPD splitting kernels in figure 3 -for the latter, µ y is the only relevant scale in the 1 → 2 splitting process. Moreover, the numerical study in [54] showed that the logarithmic part V n F [2,1] gg,g of the gluon splitting kernel gives rise to large NLO corrections when the splitting formula is evaluated at scales away from µ y . In fact, this kernel contains large logarithms in several kinematic limits.
We now discuss the different parton channels in turn.
• LO channels with observed light partons. With the choice µ ∼ µ y the expressions for all three channels g → qq, q → qg, and g → gg receive logarithmic contributions of the type To analyse the small-y limit, we use the explicit form of V where the ellipsis denotes non-logarithmic contributions from V I + V [2,0] . With the choice µ ∼ µ y one thus finds that V Q(2) QQ,g has the same large logarithm as its counterpart V Q(2) qq,g for light quarks, whilst V Q(2) QQ,q has no large logarithm at all.
• Channels with one observed heavy quark. In the large-y limit, one has so that the production of one heavy quark does not decouple in the limit µ y ≪ m Q . At this point we must remember that we defined the DPDs in the MS renormalisation scheme, which does not yield heavy-quark decoupling for scales much smaller than the quark mass, see e.g. [48]. We note in passing that at order a 3 s , non-decoupling in the limit µ y ≪ m Q will also occur in the channel g → QQ, because the three-loop kernel contains a term A gg,g according to (5.12).
For the scale choice µ ∼ µ y , the two-loop kernels in (5.49) and (5.50) yield splitting DPDs F Qg and F Qq that are negative and enhanced by a large logarithm log(µ 2 y /m 2 Q ). This is similar to the negative values of the heavy-quark PDF that one obtains when flavour matching at scales µ ≪ m Q . Both in the DPD and in the PDF case, subsequent DGLAP evolution up to the scale m Q yields a distribution close to zero, provided that a s log(m 2 Q /µ 2 ) is sufficiently small. Indeed, the large logarithm in (5.49) and (5.50) is then cancelled by a corresponding term with log(m 2 Q /µ 2 ) from DGLAP evolution, and further evolution terms come with at least one more factor of a s log(m 2 Q /µ 2 ). Let us now turn to the small-y limit. In this case, we have where the ellipsis again denotes non-logarithmic contributions. For the choice µ ∼ µ y , a large logarithm remains only in the g → Qg channel, which can be rewritten as This is just the expression of the graph in figure 7(a). We recall that in section 4.1.2, this was identified as a large missing NLO contribution when the massive splitting scheme is evaluated at LO. It led us to choose a rather low value for the upper limit β m Q of the µ y region where massive splitting kernels are used. One might expect that using massive splitting kernels at NLO will mitigate the problem and possibly allow for using larger β. Of course, this should be confirmed by a numerical study, which is beyond the scope of this work.
The preceding discussion, and in particular the relations (5.46) to (5.53) are readily generalised to the case of polarised partons.

NLO splitting kernels with two heavy flavours
In the previous section, we analysed the massive NLO 1 → 2 splitting kernels for one heavy flavour Q with mass m Q . We now extend our analysis to kernels for two massive flavours, which appear in the scheme we laid out in section 3.3. The corresponding LO kernels have been given in section 3.1. In analogy to equations (5.1) to (5.5), it is understood that the massive kernels are taken at arguments V cb a 1 a 2 ,a 0 ≡ V cb a 1 a 2 ,a 0 (z 1 , z 2 , y, m c , m b ; µ) (6.1) unless specified otherwise.

Expressions in terms of kernels with one heavy flavour
Inspection of the relevant Feynman graphs allows us to obtain the kernels for massive charm and bottom from the ones for a single heavy flavour. We discuss the different types of parton channels in turn.
LO channels with observed light partons. For the LO channels g → gg, g → qq, and q → qg, heavy-quark effects arise solely from closed quark loops, as shown in figure 21. Each of these loop graphs is evaluated once for charm and once for bottom, and there are no graphs that involve both heavy flavours. We now decompose the kernels for three light flavours and one heavy flavour Q as a 1 a 2 ,a 0 is the massless three-flavour NLO kernel and ∆V Q(2) a 1 a 2 ,a 0 (m Q ) contains the contributions from diagrams with massive quark loops (see figures 20(a) to 20(e)). Each term on the r.h.s. contains the ultraviolet counterterms in the MS scheme for the flavours running in the quark loops. The sum of counterterms for the kernel (6.2) hence corresponds to renormalising the strong coupling for 3 + 1 flavours, in agreement with (3.4).
According to our discussion of the relevant Feynman graphs, the kernel for three light and two heavy flavours is then given by 3) The sum of the kernels on the r.h.s. contains the ultraviolet counterterms for 3+1+1 flavours, which corresponds to the use of a 5 s in the perturbative expansion (3.10) of V cb . Combining the two previous equations, we obtain Here and in the following, we explicitly indicate the dependence of kernels on the two quark masses, but omit the arguments z 1 , z 2 , y and µ.
LO channels with observed heavy partons. In the NLO kernels for g → cc and g → bb, the two massive quarks enter on a different footing. For definiteness, let us consider the kernel for g → cc, from which the one for g → bb is readily obtained by interchanging the roles of c and b quarks, i.e. NLO channels with observed heavy partons. These are the channels q → QQ, g → Qg, and q → Qq with Q = c, b. The relevant graphs involve massive lines only for the observed flavour Q, as seen in the second and third rows of figure 21. They are hence not sensitive to the presence of a second heavy quark and read V cb(2) a 1 a 2 ,a 0 (m c ) = V c(2) a 1 a 2 ,a 0 (m c ) for (a 1 a 2 , a 0 ) = (cc, g), (cg, g), (cq, q) (6.8) for produced charm. For produced bottom, one has to replace the flavour labels and the quark mass.
NLO channels with observed light partons. Since the two-loop graphs for these channels do not involve any heavy-quark lines, we have V cb(2) a 1 a 2 ,a 0 for the corresponding parton combinations, just as in the case of a single heavy flavour.
In the following subsections, we will verify that (6.4), (6.7), and (6.8) have the correct limits for small or large y and the correct dependence on the renormalisation scale.

Limiting behaviour for small or large distances
For small or large y, the massive splitting kernels with two heavy flavours can be reduced to the convolution of massless splitting kernels with flavour matching kernels, just like in the one-flavour case (section 5.3). This follows from the consistency between the different factorisation regimes in figure 1 for the case that the heavy-quark subprocesses contain two heavy flavours instead of one.
PDF matching for two heavy flavours. Instead of successively matching a three-flavour PDF first to four and then to five active flavours, one may directly match with a perturbative expansion of the kernel in a 5 s : For brevity, we do not indicate explicitly that A cb assumes the presence of 3 massless quark flavours. At tree-level, we have with δ 3 a 1 l defined in (2.19), whereas at one-loop order the matching coefficients are additive: Results for A cb up to three loops can be found in reference [69]. Up to two loops, the coefficients can be obtained from the single-flavour case, as shown in [70].
In full analogy, one can perform two-flavour matching for the strong coupling, with coefficients at LO and NLO.
General limiting behaviour. For small y, or more precisely for µ y ≫ m b , one finds that the two-flavour kernels can be expressed as Expanding this in the five-flavour strong coupling a 5 s gives for the NLO coefficient. In the opposite limit of large y, or more precisely for µ y ≪ m c , the two-flavour kernels may be expressed as where the O(a 2 s ) term reads We now show that the kernels derived in section 6.1 correctly fulfil these limits, again considering separately the different groups of parton channels.
LO channels with observed light partons. To have the same small-y limit on both sides of the relation (6.4), we need where we have used (5.11) and (6.16). To see that this is the case, we use (6.12) and the relation (6.20) which holds because V n F (2) depends linearly on n F . For (6.4) to have the correct large-y limit, we need a 1 a 2 ,a 0 (6.21) according to (5.13) and (6.18). Given (6.12) and (6.14), this is indeed fulfilled.
LO channels with observed heavy partons. According to (5.11) and (6.16), the consistency of relation (6.6) in the small-y limit requires which is satisfied because of (6.12). In the large-y limit, both sides of equation (6.6) tend to zero according to equations (5.13) and (6.18). An analogous analysis applies to V cb (2) bb,g .
NLO channels with observed heavy partons. For definiteness, we consider again the case of observed charm quarks. The small-y behaviour given in (5.11) and (6.16) implies that V cb (2) cc,q and V The consistency of the relation (6.8) with (6.23) is evident, whilst for (6.24) it requires This holds due to (6.12) and the fact that A b(1) cg = 0. In the large-y limit, both V cb (2) cc,q and V c (2) cc,q tend to zero, whilst ga,a for a = g, q (6.26) according to (5.13) and (6.18). The consistency of (6.8) is again ensured by the relation (6.25).

Scale dependence
The analysis in section 5.4 can be extended to the case of two heavy flavours in a straightforward manner. Starting from and adapting the steps that lead from (5.14) to (5.17) and (5.18), one finds the following scale dependence of the massive splitting kernels with two heavy flavours: We now show that this is fulfilled by the two-flavour kernels in (6.4), (6.7), and (6.8).
LO channels with observed light partons. The scale dependence of the two-loop kernels for one heavy flavour is given by (5.18), and for the massless kernel with three flavours one has a 1 a 2 ,a 0 (6.30) with the r.h.s. given in equation (5.7). Inserting this on the r.h.s. of the relation (6.4), one obtains Here we have used that for observed light partons the kernels V Q(1) on the r.h.s. of (5.18) are equal to their massless counterparts V (1) . Since P n F (1) and β n F 0 are linear functions of n F , we have so that equation (6.31) reduces to This is consistent with (6.29), because the LO kernels V cb(1) are equal to V (1) for observed massless partons.
LO channels with observed heavy partons. Using (5.18) and (6.29) together with the fact that the DGLAP kernel P (0) qq does not depend on the number of active flavours, one finds that V cb (2) cc,g in (6.6) has the correct scale behaviour.
NLO channels with observed heavy partons. For NLO channels, the kernels V Q(1) and V cb(1) on the r.h.s. of (5.18) and (6.29) are zero, leaving only terms with flavour non-diagonal DGLAP kernels, which are independent of n F . This ensures the correct scale dependence of the kernels V cb(2) a 1 a 2 ,a 0 in (6.8).

Explicit parametrisation
An explicit parametrisation of the kernels V cb(2) is readily obtained from the relations in section 6.1 and the parametrisation of V Q(2) in section 5.5. For the LO channels, this gives V cb (2) qq,g = V [2,0] qq,g + V [2,1] qq,g log (2) cc,g = V I cc,g (m c ) + k 11 (y m c ) V [2,0] qq,g − k 02 (y m c ) V [2,1] qq,g where we abbreviated For definiteness, we have indicated all dependence on the heavy-quark masses and also given the expression for V cb (2) bb,g . According to (6.8), the kernels V cb (2) for NLO channels are directly obtained from (5.40) to (5.42) by appropriate substitution of parton labels and masses.
Large logarithms. The analysis of logarithms in (6.34) to (6.38) is very similar to the one in section 5.6, with similar conclusions regarding the large logarithms that remain if one chooses µ ∼ µ y . The most notable change is that for the NLO corrections from massive quark loops, which appear in all LO channels, one needs to replace log(µ 2 y /m 2 Q ) in (5.45) with the sum of logarithms log(µ 2 y /m 2 c ) + log(µ 2 y /m 2 b ).

DPD sum rules and massive splitting kernels
In this section we will show that the number and momentum sum rules for unpolarised DPDs [1,68] imply corresponding sum rules for the massive splitting kernels V Q . These sum rules are easily verified for the LO results given in (3.5) and (3.6). More importantly, they provide valuable constraints on the interpolating terms V I in our parametrisation (5.30) of the NLO kernels. Throughout this section, it is understood that all partons are unpolarised unless stated otherwise. As in section 5, we consider n F light flavours along with one heavy flavour Q.
The DPD sum rules do not directly apply to the distributions F (x 1 , x 2 , y) we have considered so far. They require an integral of these distributions over all distances y in the transverse plane, which due to the 1/y 2 behaviour from perturbative splitting is logarithmically divergent at small y if carried out naively. In [13,68] it was shown that the sum rules hold if this splitting singularity is renormalised in the MS scheme (with the integral over y carried out in 2 − 2ϵ dimensions before modified minimal subtraction of the resulting pole at ϵ = 0). Moreover, a matching formula was derived that connects these distributions F MS (x 1 , x 2 ) to the integral of F (x 1 , x 2 , y) over y in two dimensions, evaluated with a lower cutoff on y. The corresponding matching kernels were computed in [68] at LO and in [53] at NLO accuracy.
For DPDs with n F + 1 active flavours, the matching relation reads where we use the notation for an integral over a ring in the y plane. The ν dependence cancels between the two terms on the r.h.s. of (7.1), up to power corrections in Λ/ν, where Λ is a hadronic scale. We therefore require ν ≫ Λ. In line with the convention in (5.2), we will suppress arguments and write in the following. The perturbative expansion of the matching kernels U reads In the following derivations, the kernels U will be needed at different values of ν but always the same µ. Using that the ν dependence appears only via the ratio µ/ν in (7.5), we write U n F a 1 a 2 ,a 0 (µ/ν) ≡ U n F a 1 a 2 ,a 0 (z 1 , z 2 ; ν, µ) , (7.6) where the separate dependence on µ, due to the factors of a s (µ) in (7.4), is suppressed on the l.h.s. The coefficients U n F [k,ℓ] are related to V n F [k,ℓ−1] via Details about the U kernels are given in equations (79), (116) and (140) of reference [53]. In the following, we need that the coefficients U [1,0] are n F independent, whilst U n F [2,0] has the same type of n F dependence as V n F [2,0] . In analogy to (5.8), we can hence write for (a 1 a 2 , a 0 ) = (qg, q) and (gg, g), The kernels in the matching equation (7.1) have been computed only for massless quarks. They are associated with physics at momentum scales above ν, so that in our setting with a heavy flavour Q we can use them if ν ≫ m Q . This is ensured by taking For DPDs with n F + 1 active flavours, the momentum sum rule reads 10) and the number sum rule is given by 11) where N a 2v is the number of valence quarks with flavour a 2 in the hadron. The parton label a 2v denotes the difference between a 2 and a 2 , i.e. we write F a 1 a 2v = F a 1 a 2 −F a 1 a 2 for DPDs and likewise for various kernels appearing below. We also use the shorthand notation introduced in reference [68], where integrals over momentum fractions are denoted by for functions of one or two momentum fractions. The multiplication with a power of a momentum fraction is indicated by operators X n and X n 2 , which act as The integration boundaries in (7.10) and (7.11) are readily inferred from the support properties of the DPDs, and one finds that the x 2 integrations run from 0 to 1 − x 1 in both sum rules. It is furthermore understood that the left-and right-hand sides of (7.10) and (7.11) still depend on the momentum fraction x 1 , which is not written out explicitly.
In the following calculations, we will use the identities derived in section 6.1 of [68], where it is understood that the overall expressions depend on x 1 in both cases.

Sum rules for the massive splitting kernels
The sum rules just spelled out involve an integral over DPDs from the perturbative to the nonperturbative region of y. To make contact with the DPD splitting kernels, we split the integration over y into the two intervals [y β , y α ] and [y α , ∞], where with β ≫ 1 and α ≪ 1, and with αm Q being in the perturbative region. 2 The lower limit y β is equal to ν/b 0 with ν given in (7.9). A number of integrals over y that will be used in the following is collected in table 1.
For y β ≤ y ≤ y α , we have used the massive DPD splitting formula, whereas for y > y α we used DPD flavour matching to relate F n F +1 to F n F . This is in line with the scheme we presented in section 3.2.2, except that for large y one cannot compute F n F using a splitting formula. We now use the analogue of the matching formula (7.1) for n F flavours and take ν = αm Q , which satisfies ν ≫ Λ by assumption. This gives Inserting (7.17) and (7.18) into (7.1), we obtain our master formula where for the term going with U n F +1 we have expressed f n F +1 in terms of f n F and used the relation (2.28) for a triple convolution. For the argument µ/ν of the U kernels (see (7.6)), we have introduced the notation r β = µ/(β m Q ) , r α = µ/(αm Q ) (7.20) in analogy to (7.16). Using the DPD sum rules for both F n F +1,MS and F n F ,MS in (7.19), we will obtain sum rules for the massive splitting kernels V Q,n F .

Momentum sum rule
When inserting (7.19) into the momentum sum rule (7.10) for a DPD with n F + 1 active flavours, one needs the expression where D n F is the expression in square brackets on the last line of (7.19), and where we used (7.14) in the last step. We now use the momentum sum rule for the PDF matching kernels, which ensures the consistency of the momentum sum rules for PDFs with n F and n F + 1 flavours and can readily be checked for the LO and NLO expressions (2.18) and (2.20). This gives where we could drop δ n F b 2 l because D n F is constructed from quantities with n F flavours. The first term on the r.h.s. of (7.23) is equal to (1 − X) f n F b 1 according to the DPD momentum sum rule for n F flavours. Putting everything together and using the relations (7.14) and (7.15) for integrals over convolution products, we obtain The r.h.s. of the DPD momentum sum rule (7.10) for n F + 1 flavours can be expressed in terms of an n F flavour PDF as and is to be evaluated at momentum fraction x 1 . Using that (7.24) and (7.25) must be equal for any set of PDFs f n F , we obtain the desired momentum sum rule for the heavy-quark kernels V Q a 1 a 2 ,a 0 : At LO, the only heavy-quark kernel is V Q(1) QQ,g , which appears in the sum rule for a 0 = g and a 1 = Q. The term of order a n F +1 s in (7.26) reads the r.h.s. of (7.27) reduces to For the l.h.s. of equation (7.27) one obtains the same expression, using the result (3.5) for the LO splitting kernel and the relevant integral in table 1. This confirms the validity of the momentum sum rules for the massive splitting kernels at LO. The term of order (a n F +1 s ) 2 in equation (7.26) is given by where we have rewritten the perturbative expansion (7.4) of U n F in terms of a n F +1 s using the matching equation (2.12) for the strong coupling. Since all quantities on its r.h.s. are known, (7.30) provides a non-trivial constraint on the massive two-loop splitting kernels.

Number sum rule
In analogy to (7.21), we now need the expression where D n F denotes again the expression in square brackets on the last line of (7.19). The number sum rule for the PDF matching kernels reads which is easily verified for the LO and NLO expressions (2.18) and (2.20). Inserting this in (7.31) and performing the sum over b 2 , we get The first term on the r.h.s. is equal to N a 2v + δ b 1 a 2 − δ b 1 a 2 f n F b 1 by virtue of the DPD number sum rule for n F flavours. With this, we obtain Expressing the r.h.s. of the number sum rule (7.11) for n F +1 flavours in terms of a n F flavour PDF, we get Using that (7.34) and (7.35) must be equal, we obtain the number sum rule for the massive splitting kernels: Let us explicitly verify this at LO for the kernel V QQ,g . In this case, the term of order a n F +1 s in (7.36) reads Using the expression (3.5) of the massive splitting kernel together with (2.21) and (7.28), one finds that both sides of this equation evaluate to The term of order (a n F +1 and provides another constraint on the massive NLO splitting kernels.

Sum rules for the interpolating function V I
Let us now rewrite the sum rules (7.30) and (7.39) for the two-loop kernels in terms of the interpolating term V I in our parametrisation (5.30). To this end, it is sufficient to evaluate these sum rules at the scale µ = m Q , where they simplify considerably. Using (5.30) and the integrals in table 1, we find that the y integral needed in the sum rules reads yα y β where we defined yα y β d 2 y 1 πy 2 V I a 1 a 2 ,a 0 (z 1 , z 2 , y m Q ) = v I a 1 a 2 ,a 0 (z 1 , z 2 ) . (7.41) Since V I tends to zero for both y → 0 and y → ∞, the y integral in (7.41) is finite for α → 0 and β → ∞. The r.h.s. of equation (7.41) is evaluated in this limit, as are the integrals in table 1.
Momentum sum rule. For µ = m Q , the momentum sum rule (7.30) simplifies to Here we have replaced the two-loop PDF matching kernel A Q(2) with its non-logarithmic part A Q [2,0] , given that the other terms in (2.17) vanish for µ = m Q . The one-loop matching kernels A (1) α and A Q(1) are zero at that point. Inserting (7.40) for the y integral on the l.h.s. of equation (7.42) and the decomposition (7.5) for the two-loop U kernels on its r.h.s., we obtain Using the relation (7.7) between U [k,ℓ] and V [k,ℓ−1] , we find that all terms depending on α or β cancel and obtain the momentum sum rule Following the same steps that lead from (7.42) to (7.43), we obtain

Explicit determination of the NLO kernel for g → qq
The constraints discussed in the previous subsections can be used to fully determine the kernel V Q(2) qq,g for a gluon splitting into a light qq pair. To this end it is crucial to realise that this kernel can be written as V Q (2) qq,g = V (2) qq,g + v Q qq,g , (7.47) where V (2) is the massless kernel. The term v Q contains the contributions from the graph in figure 22 and its complex conjugate, together with the MS counterterm of the heavy-quark loop. Since this is a virtual graph with no lines crossing the final-state cut, v Q includes the momentum conservation constraint δ(1 − z 1 − z 2 ), just like the LO splitting kernels (2.32).
As indicated in the figure, v Q factorises into two parts, qq,g (z 1 , z 2 , y m Q ) g(m Q ; µ) , (7.48) where the upper part is simply given by the LO g → qq splitting kernel. The function g that represents the lower part cannot depend on y, since this variable is defined in terms of the lines in the upper part of the graph (see section 4.1 of [53] for details). Likewise, g cannot depend on the momentum fractions z 1 and z 2 of the lines in the upper part. It is straightforward to compute it from the renormalised one-loop graph. Here, we show instead how its expression can be obtained from the results we already derived so far. qq,g . The heavy quark is indicated by a red double line.
The µ dependence of g is readily obtained from (7.47) and (7.48): where we indicated only the µ dependence of the functions. In the last step, we used equations (2.30), (5.17), and (5.19). The solution of (7.49) is The integration constant g 0 can be fixed using the number sum rule (7.39) for V qq,g . Evaluating this sum rule at µ = m Q , we obtain qq,g (α −1 ) − U (2) qq,g (β −1 ) (7.51) where we used that A Q(2) qg = 0 and that U (2) qq,g is independent of n F . The evaluation of both sides proceeds in full analogy to the steps leading from (7.42) to equation (7.43). Using the relation (7.7) between U and V kernels, we find that the sum rule integral over g 0 must be zero. Since g 0 is a constant, this implies g 0 = 0 . (7.52) The full expression for V Q(2) qq,g thus reads Comparing this result with the general parametrisation (5.36), we find that the interpolating function V I qq,g is zero, as we anticipated in (5.44). Whilst the sum rules are only valid for unpolarised DPDs, a representation analogous to (7.47) and (7.48) holds also for polarised quarks. Since it involves the same function g for the heavy-quark loop on a gluon line, the result (7.53) generalises to longitudinal or transverse polarisation of the quark and antiquark, with the appropriate V kernels for polarised g → qq splitting.

A model for the unpolarised massive splitting kernels at NLO
In this section, we formulate a model ansatz for the interpolation part V I of the massive two-loop kernels, which may be used as long as these kernels have not been computed. The interpolating functions must satisfy the limiting behaviour (5.32) at small and large y, which does not provide any constraints on their form at intermediate y ∼ 1/m Q . For unpolarised partons, such constraints are provided by the sum rules (7.44) and (7.46) derived in the previous section, which involve an integral over y. We therefore limit our model ansatz to the unpolarised case.
To begin with, we set V I a 1 a 2 ,a 0 (z 1 , z 2 , y, m Q ) = k 00 (y m Q ) v I a 1 a 2 ,a 0 (z 1 , z 2 ) , (8.1) which goes to zero for y → 0 and y → ∞, and which is consistent with the normalisation condition (7.41) according to the rightmost entry in table 1. The form (8.1) is inspired by the result for (3.5) of the LO kernels, which contain k 00 (y m Q ) multiplied by a function of z 1 and z 2 . To be clear, we do not expect that the NLO splitting graphs with massive lines yield such a simple factorised form of the y dependence, so that our ansatz is only guided by simplicity at this stage.

Constraints on the interpolating functions
The factorised ansatz (8.1) allows us to directly use the sum rule constraints (7.44) and (7.46) on the two-parameter functions v I a 1 a 2 ,a 0 (z 1 , z 2 ). To determine these is not a trivial task, since in general the kernel for given parton labels (a 1 a 2 , a 0 ) appears in different sum rules. In the following we present a sequential construction that ensures that the sum rule constraints are satisfied for all parton channels. The kernels for the channels q → QQ, q → Qq, and q → qg have to be treated simultaneously, because they enter together in two distinct momentum sum rules. In addition to these momentum sum rules, one finds three number sum rules for Sum rule constraints. The momentum sum rules for the interpolating functions read where we used (5.8), (5.9) and (7.8) to simplify the difference of V or U kernels for n F + 1 and n F . In (8.4) we also used v I qQ,q + v I qQ,q = 2v I qQ,q and corresponding relations for the V and U kernels on the r.h.s. The relevant number sum rules are given by Since the only heavy-quark contribution to V Q(2) gq,q is due to the virtual graph in figure 20(b) and its complex conjugate, the function v I gq,q includes a factor δ(1 − z 1 − z 2 ). It is therefore uniquely fixed by (8.7) and reads . Given that v I gq,q (z 1 , z 2 ) = v I qg,q (z 2 , z 1 ), the momentum sum rule in (8.4) reduces to The next step is to fix v I QQ,q and v I Qq,q . To this end, we make an ansatz v I QQ,q (z 1 , where c i , d i are numerical coefficients and f i , g i are basis functions. We have made the colour factor C F T F explicit, which is readily obtained from the Feynman graphs in figures 21(k) and 21(l). The function v I QQ,q must be symmetric under z 1 ↔ z 2 , so that we require the same symmetry for the individual basis functions f i . Of course, (8.12) follows from (8.11).
The set of basis functions f i and g i must be large enough to admit a solution of the integral equations (8.3), (8.5), (8.6), and (8.9). As a first attempt, one may take the functions that appear in the corresponding massless kernels, i.e. in V [2,0] q ′ q ′ ,q and V [2,1] q ′ q ′ ,q for f i , and in V [2,0] q ′ q,q and V [2,1] q ′ q,q for g i , given that the massive two-loop graphs for these channels have the same topology as their massless counterparts. It turns out that this set is insufficient. We do, however, obtain solution to all constraints when including the functions that appear in U [2,0] q ′ q ′ ,q in the set of f i , and those appearing in U [2,0] q ′ q,q in the set of g i . This choice is motivated by the observation that these kernels enter the integral equations in the same manner as the functions v I . The resulting basis functions are specified in appendix B.
Kinematic constraints. It is plausible to expect that the DPDs computed with massive splitting graphs are not more singular than their massless counterparts in certain kinematical limits. A detailed analysis for the massless case has been given in sections 4.2 to 4.5 of [54] for the limits x 1 + x 2 → 1, x 1 + x 2 ≪ 1, x 1 ≪ 1, and x 2 ≪ 1.
In the three latter cases, it turns out that individual terms in the massless kernels can be more singular than the sum over all terms. In our ansatz for the massive kernels, we should therefore enforce the same type of cancellation between super-leading terms as happens in the massless case.
To discuss the singular behaviour, it is useful to change variables in the splitting kernels. With the form (2.27) of the convolution ⊗ 12 , one readily finds for the momentum fractions in the splitting DPD.
In the limit of small x 1 + x 2 , one finds that the convolution of a 1 → 2 kernel with a PDF is enhanced if the kernels grow at least like ∼ z −2 for small z. Some of the individual basis functions f i and g i are in fact more singular than the complete massless kernels, and one finds super-leading terms The massless kernel for q → q ′ q is finite for z → 0, whilst individual terms in the basis functions behave like ∼ log 2 (z) , ∼ log(z) in some g i . (8.16) In this case, we impose the constraint that the massive splitting kernel should not be more singular for z → 0 than its massless counterpart, even though this does not enhance the resulting DPD at small x 1 + x 2 . Requiring that terms of the form shown in (8.15) and (8.16) cancel in (8.10), (8.11), and (8.12) imposes constraints on the coefficients c i and d i .
The limits x 1 ≪ 1 and x 2 ≪ 1 (with generic values of the other momentum fraction) correspond to u ≪ 1 andū ≪ 1, respectively. As explained in section 4.4 of [54], a singular behaviour of the splitting DPD for u ≪ 1 arises from terms in the kernels that grow at least like 1/u, and from terms with a power of 1 − z 2 = 1 − (1 − u)z in the denominator. Inspection of our basis functions reveals that some of them lead to a super-leading behaviour for the convolution of some g i with a PDF in the limit u ≪ 1. Requiring that these terms cancel places additional constraints on the coefficients d i . The analysis in [54] also includes the triple Regge limits x 1 ≪ x 1 + x 2 ≪ 1 and x 2 ≪ x 1 + x 2 ≪ 1, which respectively correspond to taking u ≪ 1 orū ≪ 1 in the small-z limit of the kernels. It turns out that these limits yield no additional constraints for the kernels at hand.
The kinematic requirements just specified, together with the integral equations (8.3), (8.5), (8.6), and (8.9) result in a system of linear equations for the coefficients c i and d i . With our choice of basis functions, this system is under-constrained and admits a plethora of solutions. In section 8.2 an algorithm for picking a unique solution will be presented.
For the channels g → QQ, g → Qg, and g → gg, one finds again several sum rules that have to be fulfilled simultaneously, namely two momentum sum rules and a number sum rule for Sum rule constraints. The momentum sum rules for the interpolating functions read whilst the number sum rule for v I QQ,g is given by Our strategy for this case is the same as for the previous one, and we make the ansatz where the different colour factors are again deduced from the relevant Feynman graphs. As before, we take as basis functions the terms that appear in the massless kernels V [2,0] , V [2,1] , and U [2,0] for the relevant parton channel. For f F i , we must include additional functions proportional to (1 − z 1 − z 2 ) −1 in order to be able to fulfil all constraints. A simplification occurs for v I gg,g , which includes a factor δ(1 − z 1 − z 2 ) according to (5.43). Only basis functions with this structure are therefore needed in (8.25). Details on the resulting set of basis functions are given in appendix B.
Kinematic constraints. As in the previous case, we obtain additional constraints by requiring that the singular behaviour of massive kernels and the resulting splitting DPDs should not be stronger than for their massless counterparts.
For the limit of small x 1 + x 2 , we find super-leading terms in the small-z behaviour of the basis functions, namely

A particular solution for the kernels
As just stated, the constraints from the sum rules and kinematic limits for the interpolating functions v I do not uniquely fix the coefficients in our expansion on basis functions. Since we are building a model ansatz, we are free to impose additional requirements. It appears natural to require that the different expansion coefficients do not become too large overall, given that this is true if one expands the massless kernels on the same basis functions.
To turn this into a criterion that leads to a unique solution, let us first build some intuition in a simple toy example. We consider a system with three coefficients c 1 , c 2 , and c 3 and one linear constraint where A, B, and C are fixed. In geometric terms, the solutions of this under-determined system form a plane (c 1 , c 2 , Ac 1 + Bc 2 + C) in the three-dimensional (c 1 , c 2 , c 3 ) space. c 1 and c 2 can then be fixed by choosing the point on this plane that has a minimal distance from the origin (0, 0, 0), which in a global sense minimises the size of the coefficients. This point is obtained by minimising the length L of the vector (c 1 , c 2 , Ac 1 + Bc 2 + C), which is given by The minimum of L is obtained by requiring which yields two more linear constraints for the coefficients c 1 and c 2 . This approach is readily generalised to the case where one has n coefficients c 1 , . . . , c n and m < n linear constraints. Without loss of generality, one can use the constraints to write the first m coefficients c 1 , . . . , c m as linear combinations of the remaining n − m coefficients c m+1 , . . . , c n , i.e.
The length of the vector (c 1 , . . . , c n ) is then given by and the linear constraints that fix the remaining coefficients c m+1 , . . . , c n are The expansion coefficients for the interpolating functions v I that we obtain with this procedure are given in the ancillary files associated with this paper on the arXiv. We visualise them in figures 23 and 24 and find that they are all of order unity, as desired.   Figure 25: The gb distribution for n F = 5 at the scale µ = µ y * where the splitting formula (2.26) is evaluated. The momentum fractions x 1 and x 2 correspond to the dijet production setting (4.1). The splitting is computed from V 5 ⊗ f 5 (massless, blue curves), from V b ⊗ f 4 (massive b, green curves), or from V cb ⊗ f 3 (massive c and b, red curves). With massive splitting kernels, the gb distribution starts at NLO, and the interpolating part V I of the kernels is modelled as described in the present section. We have multiplied the DPD with y 2 in order to remove the leading power behaviour F gb ∝ 1/y 2 at large µ y .

Numerical illustration
With a model for the massive NLO kernels at hand, it is natural to ask to which extent the shortcomings of the LO description discussed in section 4 are mitigated when splitting is computed at NLO.
Investigating the interplay between NLO splitting and DGLAP evolution requires the numerical evaluation of the convolution integral (2.27) on interpolation grids in x 1 , x 2 , and y for all flavor combinations in V a 1 a 2 ,a 0 . An efficient and accurate implementation of this is nontrivial and beyond the scope of the present work. Instead, we will study DPDs at the scale where the splitting formula is evaluated, i.e. at µ = µ y * with µ y * specified in (3.1), (3.2) and the surrounding text. For this it is sufficient to evaluate the convolutions (2.27) at selected values of x 1 and x 2 and for selected flavors, which can be done with standard quadrature routines.
This allows us to address the discontinuity at µ y = β m b which we observed for F gb at LO in figures 6(c) and 6(d). This discontinuity increases with β and led us to use the rather small value β = 2. We recall that in the massive scheme, five-flavor DPDs are obtained from the convolution of the massive kernels V b with four-flavor PDFs if βm c < µ y < βm b , and from the convolution of massless kernels V 5 with five-flavor PDFs if µ y > βm b (see figure 4). With LO kernels, the description for µ y < βm b is then missing the contribution from the graph in figure 7(a). This deficiency is cured at NLO, where that graph is taken into account via the splitting kernel for g → gb.
In figure 25 we show F gb for n F = 5 active flavors at the scale µ = µ y * , computed with different treatments of the charm and bottom masses. We note that for µ y ≥ m b one has µ y * ≈ µ y . For the PDFs in the splitting formula we take the NLO distributions of the MSHT20 parametrisation, with n F = 3, 4, or 5 active flavours as appropriate. 3 For the strong coupling we use the value corresponding to these distributions, namely α s (M Z ) = 0.118 for n F = 5. The quark masses are as given in (4.7).
Comparing the solid and dotted blue curves in figure 25(b), we see that even with massless kernels the contribution from NLO splitting to F gb is important. This is because the LO term goes with f b evaluated at the scale µ y * , which is much smaller than f g and even vanishes when µ y * = m b . With massive splitting kernels, the LO term is zero and the NLO result originates entirely from f g .
Most importantly, we see in figure 25(b) that the NLO results computed with V b or V 5 are close to each other in the entire region 2m b ≤ µ y ≤ 4m b . Switching from one to the other description in this region (i.e. taking 2 ≤ β ≤ 4) hence results in only a small discontinuity of F gb (x 1 , x 2 , y; µ y * ) in y. This discontinuity should become even smaller when the DPDs are evolved to higher scales, e.g. to µ = 25 GeV as in figure 6. This is because under evolution F gb receives a substantial contribution from the convolution of F gg with the g → b splitting kernel. We verified that in the kinematics of figure 25(b) F gg barely changes when switching from massive to massless kernels. The upshot of our discussion is that with NLO kernels one can indeed take larger values of β and thus extend the region in which mass effects are taken into account.
To exhibit the importance of the interpolating part V I of the NLO kernel in our model, we also show in figure 25 the results obtained when setting this part to zero. Comparison between the solid and dashed curves reveals that for decreasing µ y , i.e. when the b quark decouples, the impact of V I is very mild. By contrast, the interpolating kernel remains more important for increasing µ y : even at µ y = 20 GeV it accounts for about 20% of the full NLO result. This is not surprising, because we modelled the y dependence as V I gb,b ∝ k 00 (y m b ), which decreases like y exp(−2y m b ) for y → ∞ but only like y 2 log 2 (y m b ) for y → 0. We recall that a term with k 00 (y m b ) actually appears in the massive LO kernel (3.5) for g → bb splitting.
Taking the difference between our model for V I and V I = 0 as an estimate of the model uncertainty, we conclude that taking NLO effects into account should be worthwhile even if the interpolating part of the massive kernels has not yet been calculated.

Conclusions
We have investigated quark mass effects for the 1 → 2 splitting process in DPDs, which dominates the distributions in the limit of small interparton distances y. Such mass effects should be taken into account if y is comparable to the inverse 1/m Q of a heavy-quark mass. We have set up schemes to compute the splitting part of DPDs in the full range of perturbatively small distances y. In the massive scheme, 1 → 2 splitting kernels including mass effects are used in a y interval around 1/m Q . For smaller y, the quark Q is treated as massless in the splitting process, whereas for large y, the DPDs including the heavy flavour are obtained by standard flavour matching, which involves the same matching kernels as flavour matching of PDFs. A graphical representation of these three regimes is given in figure 1. The massive 1 → 2 splitting kernels at LO accuracy can be found in equations (3.5) and (3.6). A simpler, purely massless scheme, is formulated in which only massless 1 → 2 splitting kernels are used. The different schemes are depicted in figures 2 and 3 for a single heavy flavour, and in figure 4 for the combination of charm and bottom quarks.
We numerically studied the different schemes at LO accuracy, focusing on the y dependence of the DPDs and on the double parton luminosities that appear in physical cross sections, both evolved to scales much larger than the masses of the active quarks. This is done using the ChiliPDF library, which we plan to make public in the future. We find a moderate dependence of parton luminosities on the parameters α and β that specify at which y the calculation switches between the three regimes of the massive scheme described above. Luminosities calculated in the massless scheme roughly approximate the massive results if the scheme parameter γ takes values between 1/2 and 1, with deviations that can strongly depend on the parton momentum fractions. We furthermore see a substantial dependence of our results on the scale at which the DPDs are computed using the LO splitting formula, which hints at large corrections from higher-order corrections.
Whilst the calculation of massive splitting kernels at NLO is beyond the scope of this work, we derived several properties of these kernels, including their limiting values for small or large y, and their dependence on n F and the renormalisation scale µ. These properties are encoded in explicit parametrisations, given in (5.36) to (5.44) for one heavy flavour and in (6.34) to (6.38) for charm and bottom. These results hold for all parton polarisations. The non-trivial part of the NLO kernels is described by interpolating kernels V I a 1 a 2 ,a 0 that depend on two momentum fractions and on y m Q .
For unpolarised partons, DPDs obey momentum and number sum rules. Using that these must hold for both n F and n F + 1 active flavours, we derived corresponding sum rules for the massive splitting kernels. Their all-order form is given in equations (7.26) and (7.36), and their NLO part in equations (7.30), (7.39), (7.44), and (7.46). Using these sum rule constraints, we have constructed a model ansatz for the unpolarised massive splitting kernels at two loops. Within this model, we find that the inclusion of NLO effects reduces unphysical discontinuities when switching from massive to massless b quarks in the calculation of F bg .
It would of course be preferable to have an explicit calculation of these kernels. This must, however, be left to future work.

A Numerical studies for W pair production
In sections 4.1 and 4.2 we presented numerical results for the two kinematic settings in (4.1) and (4.2). Here we show and discuss some analogous plots for the setting in (4.3), which corresponds to the production of two W bosons at the LHC. This is because of the phenomenological importance of W -pair production, and it shows to which extent mass effects for charm and bottom persist in DPDs at the scale µ = m W and in the associated parton luminosities. Compared with the dijet production setting (4.1), the DPDs in the W production setting are (i) evolved to a higher scale and (ii) evaluated at parton momentum fractions that are larger by a factor m W /(25 GeV) ≈ 3.1.
As one may expect, discontinuities in the y dependence of the splitting DPDs are generally weaker at µ = m W than at µ = 25 GeV. This is clearly seen for the cb distribution by comparing figures 26(b) and 8(c). Nonetheless, figure 26 shows that discontinuities for µ y just above 1 GeV persist even after evolution to the electroweak scale.
For double parton luminosities, we consider the two parton combinations ccbb and cbsc, which respectively contribute to opposite-sign and like-sign W -pair production. The following plots for ccbb can be compared with the ones for ccbb in the dijet production setting, because with splitting, evolution and flavour matching evaluated at LO, DPDs for bb and bb are identical.
It is therefore not surprising that the qualitative behaviour of the ccbb luminosity in figure 27(a) is the same as for ccbb in figure 11(a). In particular, the 1v1 contribution is dominant and has a large ν dependence at low Y . In the cbsc channel, the ν dependence is very small everywhere, as expected for a parton combination that cannot be directly produced by 1 → 2 splitting.
The scheme dependence of the ccbb luminosity is shown in figure 28. Compared with the dijet production setting in figure 13, the qualitative behaviour of the ratios is very similar, whilst their deviation from unity is somewhat reduced in most cases. In both settings, the smallest deviations in the massless scheme are obtained for γ = 1/2. The dependence of the luminosities on the splitting scale (figure 29) follows the same pattern we found in the two other settings. For ccbb, which can be produced directly by gluon splitting in both DPDs, the scale variation is weak at low Y and grows towards large Y . By contrast, a strong scale variation with little Y dependence is seen for cbsc, just like in panels (b) to (d) of figure 16. The band for the sum of 1v2 and 2v1 contributions is quite asymmetric in figure 29(b), which indicates important contributions from large y also in this setting.
For the matching scale dependence, we observe large effects in figure 30 when flavour matching is performed at LO, even though the matching scales for charm and bottom are much smaller than the scale µ = m W at which the DPDs are evaluated. As in the kinematic settings for dijet or tt production, the scale variation becomes small when the NLO matching coefficients are included in the calculation.
Let us finally investigate the reason for the peculiar splitting scale dependence in channels with a heavy QQ pair in both DPDs, seen in figures 16(a), 17(a), and 29(a). In the left panels of figure 31, we show the bb splitting DPD with momentum fractions that correspond to the ccbb luminosity at Y = 0 or Y = 3 in the W production setting. For Y = 0 there is some dependence on the splitting scale at low µ y , but in this region the DPD is very small and has little influence on L ccbb . For Y = 3, there is a strong splitting scale dependence over the full range of µ y , which results in a strong scale dependence of L ccbb .
To elucidate the origin of this behaviour, we show in the right panels of figure 31 what happens if we set all DPD splitting kernels other than V bb,g to zero. For Y = 0 this has almost no impact on the DPD if µ y is above 10 GeV, whereas for Y = 3 it significantly decreases the DPD unless µ y is close to the final scale µ. This means that for x 1 = x 2 (corresponding to Y = 0) the bb distribution at large µ y arises almost entirely from direct g → bb splitting. For x 1 ≪ x 2 (corresponding to Y = 3), a significant part of F bb originates from other splitting channels such as F gb and F gg by DGLAP evolution. This is not surprising because the 1 → 2 splitting kernels for these channels are enhanced for asymmetric momentum fractions with a soft emitted gluon. This kinematic enhancement partially compensates the factor of a s that comes with each DGLAP evolution step.
With DGLAP evolution after the 1 → 2 splitting process playing an important role for      figure 27 for the W production setting. Central curves are for µ split = µ y * , and bands correspond to the variation specified in (4.13).   figure 27 for the W production setting. Central curves are for µ Q = m Q , and bands correspond to the variation specified in (4.14).  Dependence on µ split of the bb splitting DPD in the W production setting, computed with α = 1/4 and β = 2 in the massive scheme. The parton momentum fractions correspond to the ccbb luminosity at Y = 0 (top row) or at Y = 3 (bottom row). According to (4.9) this means x 1 = x 2 ≈ 5.7 × 10 −3 for Y = 0 and x 1 ≈ 2.9 × 10 −4 , x 2 ≈ 0.12 for Y = 3. In the left column the complete result is shown, whereas in the right column only the contribution from g → bb is taken into account in the DPD splitting formula.
large Y , changing µ split has a big impact because it significantly changes the interval [µ split , µ] in which the DPD can evolve. For Y = 0, DPD evolution after the 1 → 2 splitting plays a minor role at high µ y , and the plot in figure 31(a) shows that the resulting splitting scale dependence is very weak.
The preceding analysis is corroborated by the y dependence of the DPD at large µ y . One can read off in figure 31 that for Y = 0 it is very close to the 1/y 2 behaviour of the massless DPD splitting formula, whereas it is much flatter (close to 1/y) for Y = 3. The significant flattening of the y dependence of splitting DPDs by DGLAP evolution was already observed in [13].

B Basis functions for the model in section 8
In this appendix, we specify the basis functions that appear in the decompositions (8.10) to (8.12) and (8.22) to (8.25). Given their large number, we discuss their general structure and common building blocks here. An explicit list of all basis functions can be found in the ancillary files associated with this paper on the arXiv.