Two-loop splitting in double parton distributions: the colour non-singlet case

At small inter-parton distances, double parton distributions receive their dominant contribution from the splitting of a single parton. We compute this mechanism at next-to-leading order in perturbation theory for all colour configurations of the observed parton pair. Rapidity divergences are handled either by using spacelike Wilson lines or by applying the delta regulator. We investigate the behaviour of the two-loop contributions in different kinematic limits, and we illustrate their impact in different channels.


Introduction
To interpret measurements at the Large Hadron Collider in its high-luminosity phase, it is of great importance to understand the strong-interaction part of proton-proton collisions as much as possible. This provides a strong motivation for the study of double parton scattering (DPS), which is a mechanism in which two pairs of partons initiate two separate hard-scattering processes in a single collision. Due to an increased parton flux for decreasing parton momentum fractions, the mechanism becomes more important for higher collision energy. It is therefore relevant also to the planning of future hadron colliders at the energy frontier. The theoretical investigation of DPS started long ago [1][2][3][4][5][6][7], and in the last decade several approaches to a systematic description in QCD have been put forward [8][9][10][11][12][13][14][15][16][17][18][19][20]. Following early experimental studies [21,22], a variety of DPS processes have been analysed at the Tevatron and the LHC, see [23][24][25][26][27] and references therein. There is also a large body of phenomenological work on such processes. An example is the recent study [28] of four-jet production, which also contains a wealth of further references. For a detailed account of theoretical and experimental aspects of DPS, we refer to the monograph [29]. Whilst DPS is often suppressed compared with single parton scattering (SPS), it is enhanced in certain kinematic regions, in particular when the products of the two hard-scattering processes have small transverse momenta [8,9,12,13] or when they are far apart in rapidity [25,30,31]. There are also channels in which SPS is suppressed compared with DPS by coupling constants. A prominent example is the production of like-sign gauge boson pairs W + W + or W − W − [27,[32][33][34][35][36], which also provides a background to the search for new physics with like-sign lepton pairs [37,38]. To compute DPS cross sections, one needs double parton distributions (DPDs), which generalise the concept of parton distribution functions (PDFs) to the case of two partons. To determine DPDs from experimental data is considerably more complicated than the determination of PDFs, and our knowledge of DPDs remains quite fragmentary. There is a large number of calculations of DPDs in quark models [39][40][41][42][43][44][45][46][47][48][49], and information about the Mellin moments of DPDs can be obtained in lattice QCD [50,51]. One may also try to construct an ansatz for DPDs that fulfils non-trivial theory constraints [52][53][54][55]. A regime in which DPDs can actually be computed is when the transverse distance y between the two partons becomes small. The two observed partons are then produced from a single parton in a perturbative splitting process, so that the DPD is given by the convolution of a splitting kernel with a PDF. Several phenomenological studies suggest that this splitting mechanism can have a substantial impart on DPS observables [16,18,19,[56][57][58][59][60]. The study [61] finds visible effects even for like-sign W pairs, where the splitting contribution starts at order α 2 s rather than α s . The perturbative splitting mechanism in DPS is intimately connected with a double counting problem between DPS and SPS in physical cross sections [13,62]. A systematic framework to solve this problem has been presented in [19]. Broadly speaking, the perturbative splitting mechanism in DPS is cut off for small inter-parton distances y, where its contribution is better described as a higher-order corrections to SPS since the splitting occurs at the same scale as the hard collision. Double counting is removed by a subtraction term, which can easily be constructed using the perturbative splitting formula for DPDs. The quantum numbers of two partons in a hadron can be correlated in various ways [6,13,14]. Among these, the correlation between their colour state is least well studied, and most investigations are restricted to the case in which the parton colour is uncorrelated. It was realised early on that colour correlations between the partons are suppressed in DPS cross sections by Sudakov factors [63]. A simple numerical estimate in [14] found this suppression to be quite strong for hard scattering process at the electroweak scale, whereas it becomes marginal for scales around 10 GeV. Colour correlations in DPS could thus have a visible effect in the production of additional jets with moderate p T or of mesons with open or hidden charm. In the context of perturbative splitting, colour correlations between the partons appear naturally, and how strongly they are suppressed by Sudakov logarithms was not addressed in [14]. To leading order (LO) in α s , the splitting kernels for DPDs have been computed in [13], and it was found that colour correlations between the two partons are generically large. In many situations, LO results receive substantial corrections from higher orders. This is typically true for hard-scattering cross sections in hadron-hadron collisions, which motivates an analysis of DPS at next-to-leading order (NLO). Since such an analysis requires all perturbative ingredients to be available at NLO, we computed the splitting kernels for DPDs at this order in [64], restricting ourselves to unpolarised partons and uncorrelated parton colours. The purpose of the present paper is to extend that computation to the colour correlated case. Among other aspects, this will enable us to investigate the colour structure of small x logarithms. In a different setting, colour effects in small x dynamics for double hard scattering were investigated in [65], using the BFKL formalism. The computational techniques employed in [64] can readily be applied to the colour correlated case. However, a major new element is the appearance of so-called rapidity divergences, which are closely related with the Sudakov logarithms mentioned above. Such rapidity divergences are familiar from factorisation for processes with small measured transverse momenta, and it is interesting that they appear in DPS even if transverse momenta are integrated over. A variety of regulators to handle these divergences can be found in the literature [66][67][68][69][70][71][72][73][74][75][76]. In this work, we use two of them, which provides a strong cross check of our results. As one alternative, we take spacelike Wilson lines as proposed by Collins [68]. This regulator has been used in the factorisation proof for Drell-Yan production in [68] and in our work on factorisation for DPS [13,19,20,77,78]. It may be regarded as a descendant of the formalism in the original work of Collins, Soper, and Sterman [79,80], where an axial gauge was used. Our second alternative is the so-called δ regulator of Echevarria et al. in the form described in [71,72]. This regulator has been used in several two-loop calculations for transverse-momentum dependent (TMD) factorisation [71,72] and also for the two-loop soft factor in DPS [81]. To the best of our knowledge, our work presents the first use of the Collins regulator in a two-loop setting. This paper is organised as follows. In section 2, we recall the basics of DPDs with non-trivial colour dependence and then work out the formalism necessary for treating ultraviolet and rapidity divergences in the calculation of perturbative splitting DPDs. Details of the two-loop calculation are given in section 3. Analytical results for the two-loop kernels are presented in section 4, and some numerical illustrations of two-loop effects are shown in section 5. We conclude in section 6. A number of formulae and technical explanations can be found in appendices A to D.

General theory
We begin this section by recalling the basic properties of DPDs with a non-trivial dependence on the parton colour in the formalism set up in [13,78]. We then describe the interplay between ultraviolet renormalisation and the treatment of rapidity divergences, and we show how these operations are to be carried out at the level of hard splitting kernels for DPDs at small inter-parton distance y. The main result of this section is a set of formulae that specify how to compute the two-loop splitting kernels for renormalised DPDs from bare kernels and the soft factor for double parton scattering.

Overview
We are interested in collinear, i.e. transverse-momentum integrated DPDs, which we denote by R 1 R 2 F a 1 a 2 (x 1 , x 2 , y, µ, ζ p ). These distributions depend on a number of variables.
• x 1 and x 2 are the longitudinal momentum fractions of partons a 1 and a 2 in a fast moving proton. In general, the labels a 1 and a 2 also specify the polarisation of the two partons, but throughout this work we consider the case where both partons are unpolarised.
With proper adjustments, the results of the present section readily generalise to the polarised case.
• y is the spatial distance between the two partons in the plane transverse to the proton momentum. We write y for the distance vector and y for its length |y|.
• µ is the renormalisation scale. As shown in [78], it is possible to define separate scales µ 1 and µ 2 associated with the two partons. Throughout the present work, we take these scales to be equal. This is natural when computing the contribution to DPDs in which the two partons originate from a single hard splitting process. From this, one can obtain distributions with different µ 1 and µ 2 by using the appropriate evolution equations.
• ζ p is a parameter associated with the regularisation of rapidity divergences. It is boost invariant and has dimension mass squared. Its precise definition depends on the rapidity regulator used and will be given below. A corresponding dependence is familiar from transverse-momentum dependent parton distributions (TMDs). It is remarkable that we encounter the same type of dependence for collinear distributions.
We now discuss the colour dependence. The operator matrix elements associated with DPDs have four open colour indices, two for the partons a 1 and a 2 in the process amplitude and two for the same partons in the conjugate process amplitude. Examples are shown in figure 1. We organise the colour dependence by coupling the two indices for a 1 to an irreducible representation R 1 and those for a 2 to an irreducible representation R 2 such that one has an overall colour singlet. The relevant representations depend on the parton type. In all cases, both partons can be in the colour singlet (R 1 = R 2 = 1). In addition, we have the following possibilities.
• For F qq , the two quark pairs can each be coupled to an octet (R 1 = R 2 = 8).
• For F gq , the two gluons can be coupled to an asymmetric or a symmetric octet (R 1 = A or R 1 = S). The two quarks are coupled to an octet (R 2 = 8) in both cases. Analogous combinations exist for F qg .
• For F gg , each of the two gluons can be coupled to an antisymmetric or a symmetric octet (in all four combinations), or to the 27 representation (R 1 = R 2 = 27). Furthermore, one gluon can be coupled to a decuplet and the other to an anti-decuplet (R 1 = 10 and R 2 = 10 or vice versa).
We have the same representations when quarks are replaced by antiquarks. Of course, the octet representations of SU(3) are readily generalised to the adjoint representations of SU(N ). The coupling of two adjoint indices in SU(N ) with N = 3 involves the generalisations of R = 10, 10, 27, and an additional irreducible representation [82,83]. Throughout this work, we give results for singlet and octet representations for general number of colours N . Results for the decuplet and antidecuplet are given for general N or for N = 3, whereas for R = 27 we always set N = 3. We call the colour space basis for DPDs just described the "t channel basis". An alternative, which we call "s channel basis", is to couple the indices of a 1 and a 2 to irreducible representations in the amplitude and in the conjugate amplitude [6]. The conversion between the two colour bases can be found in [84]. We note that the combinations of R 1 and R 2 considered in [6,13,84] and in the original version of [78] are incomplete; this mistake is corrected in the erratum of [78]. We now recall some basic properties of the DPDs, which were derived in [78]. Their ζ p dependence is described by a Collins-Soper equation which is analogous to the one derived by Collins and Soper for TMDs [79]. We write J instead of K for the kernel relevant to collinear DPDs in order to avoid confusion with the TMD case. Figure 1: Assignment of colour labels for a two-quark distribution (left) and a quark-gluon distribution (right).
The kernel R J depends only on the multiplicity of the representation R, so that R 1 J = R 2 J for all possible colour combinations in a DPD. Moreover, it is independent of the type of partons a 1 and a 2 . In the singlet channel, one has the exact relation so that colour singlet DPDs have no rapidity dependence. Quite remarkably, the octet kernel 8 J(y, µ) is equal to the Collins-Soper kernel for the rapidity evolution of gluon TMDs [81]. The renormalisation scale dependence of J is given by with an anomalous dimension that has a perturbative expansion Here and in the following we write a s (µ) = α s (µ) 2π . (2.5) The one-loop coefficients in (2.4) read The scale dependence of the DPDs is given by the double DGLAP equations [78] ∂ ∂ log µ 2 where R denotes the conjugate of the representation R. Note that conjugation is only relevant for the decuplet and anti-decuplet, whilst R = R for the singlet, all octets, and R = 27. The convolutions are defined as The effective integration range in these convolutions is smaller than indicated, because DPDs F (x 1 , x 2 ) vanish for x 1 + x 1 > 1 as a consequence of momentum conservation. The same support property holds for DPD splitting kernels and renormalisation factors. Note that the factors x 2 1 and x 2 2 that multiply ζ p in the arguments of the evolution kernels in (2.7) are fixed by the arguments of the DPD on the l.h.s., i.e., they are not part of the convolution integrals on the r.h.s. The reason for this rescaling of ζ p is explained in section 2.5. The rapidity dependence of the DGLAP kernels is given by where we abbreviate RR P (z, µ) = RR P (z, µ, µ 2 ). The evolution kernels have a perturbative expansion Perturbative splitting. In the limit of small y, the DPDs are given by the perturbative splitting form where we abbreviate with γ being the Euler-Mascheroni constant. Corrections to (2.11) are of order y 2 Λ 2 , where Λ is a hadronic scale. They arise from twist-four contributions in the operator product expansion, as explained in section 3.3 of [19]. We define a convolution product between a two-variable and a one-variable function as with abbreviations (2.14) When denoting the kernel in the rightmost expression of (2.13) by V (z 1 , z 2 ), we also have the relations so that u can be expressed either in terms of the momentum fractions of the full convolution integral or the momentum fractions of its integrand. Note that the factors x 1 x 2 multiplying ζ p on the r.h.s. of (2.11) are fixed by the arguments of the DPD on the l.h.s., in analogy to our earlier discussion for evolution kernels. Convolutions w.r.t. different variables are associative, i.e. we have where A, B and C depend on one momentum fraction, whilst D depends on two momentum fractions and a rapidity parameter. We can hence write these convolutions without brackets.
Here we have defined the combined convolution We also have where A ⊗ B is the usual Mellin convolution for functions of a single momentum fraction. We note in passing that for rescaled functions, the convolution integrals take the same form as for the usual Mellin convolution, We extend our notation for convolutions to include appropriate summation over indices denoting the parton type (quarks, antiquarks, gluons). For functions A ab (x), B ab (x) and f a (x) of a single momentum fraction, we write as usual We do not use the summation convention for parton indices, i.e. sums are indicated explicitly when indices are given. For functions V a 1 a 2 ,a 0 (x 1 , x 2 ) and F a 1 a 2 (x 1 , x 2 ) of two momentum fractions, we define For the combination of convolutions in the first and second momentum fraction, we then have

Bare distributions and soft factors
As specified in [78], DPDs are constructed from hadronic matrix elements, which we call "unsubtracted" following the nomenclature of Collins [68], and from the soft factor for DPS. In analogy to the modern definition of TMDs in [68], this construction implements the cancellation of rapidity divergences. It also avoids the explicit appearance of the soft factor in the factorisation formula for the cross section. This is important for minimising the amount of non-perturbative functions, given that the cross section includes the region of large y, where the soft factor cannot be computed in perturbation theory.
To treat rapidity divergences, we use two schemes in our work. One is the δ regulator in the form described in [71,72], and the other is the regulator of Collins using spacelike Wilson lines [68]. With compatible definitions of the rapidity parameter ζ p in the two schemes, we must obtain identical results for the final two-loop splitting kernels, which constitutes a strong cross check of our procedure and computation. We note that several other schemes have been used in the literature, such as the analytic regulator of [66,67], the CMU η regulator of [73,74], the exponential regulator of [75], and the "pure rapidity regulator" of [76]. There are also two earlier variants of the δ regulator, described in [69] and [70], and compared with the Collins regulator in [85] and [70], respectively. A detailed discussion of TMDs in the above schemes is given in appendix B of [86]. The construction described in the following is laid out in [78] for the Collins regulator and adapted to the δ regulator as described in app. B of that reference. We regulate ultraviolet divergences by working in 4 − 2 space-time dimensions. To start, we define bare (i.e. unrenormalised) and unsubtracted distributions with open colour indices, where n i = 0 for quarks and antiquarks, and n i = 1 for gluons. We use light-cone coordinates v ± = (v 0 ± v 3 )/ √ 2 for any four-vector v µ and write its transverse part in boldface, v = (v 1 , v 2 ). It is understood that the transverse proton momentum p is zero and that the proton polarisation is averaged over. The colour indices r i and r i (i = 1, 2) are in the fundamental or the adjoint representation as appropriate. For unpolarised quarks and gluons, the twist-two operators in (2.23) read Repeated colour indices are summed over. W (ξ, v) is a Wilson line along the path parameterised by ξ − sv with s going from 0 to ∞. Details on the direction v of the path are given below. The twist-two operator for antiquarks is given by O ii q (y, z) = − O ii q (y, −z). All operators (including Wilson lines) are constructed from bare fields. DPDs for definite colour representations of the two partons are now defined by with the colour space matrices M R 1 R 2 given in appendix A. In the splitting formula for bare DPDs, we also need the definition of bare PDFs, which reads Here the operators (2.24) appear in the colour singlet representation, i.e. with their colour indices set equal and summed over. Bare unsubtracted DPDs in momentum space are defined by a Fourier transform w.r.t. the inter-parton distance, These are the primary quantities we will compute at two loops in momentum space, as described in section 3. The bare soft factor S B,a 1 a 2 for DPS is defined as the vacuum matrix element of Wilson line operators along two different directions v L and v R , as specified in section 3.2 of [78]. It is a matrix in colour space with 8 indices that can be turned into a matrix R 1 R 2 ,R 1 R 2 S a 1 a 2 in representation space by multiplication with appropriate projectors. A non-trivial result for collinear DPS factorisation (but not for its TMD counterpart) is that this matrix is independent of the parton types a 1 and a 2 , that it has nonzero entries only for R 1 = R 1 and R 2 = R 2 , and that these entries depend only on the common multiplicity of the four representations. One can thus denote them by R S, with R being any one of the four representations. The soft factor depends on a variable related with the rapidity dependence, and it satisfies a Collins-Soper equation for | | 1. The kernel J B (y) is the unrenormalised analogue of J(y, µ) in (2.1). One easily derives a composition law for the region where | R |, | L | and | R + L | are large, using the differential equation (2.28) and the initial condition R = L . This composition law is used to absorb part of the soft factor into the DPD for the right-moving proton and the other part into the DPD for the left-moving one. We now specify R and L , as well as the rapidity parameters ζ andζ, which are respectively associated with the right-and the left-moving proton. We write x 1,2 (x 1,2 ) for the parton momentum fractions of the DPD for the right (left) moving proton in the collision, with the large momentum component of the proton being p + (p − ).
• In the scheme of Collins, one takes Wilson lines with finite rapidity, where the rapidity of a Wilson line along v is defined as Y = 1 2 log |v + /v − |. The variables R and L are given by where Y R and Y L are the rapidities of the Wilson lines in the soft factor and unsubtracted DPDs, and Y C is an additional rapidity introduced for defining subtracted DPDs. For an explanation of the rationale behind this, we refer to section 10.11.1 in [68] and section 2.1 in [87]. Removing the regulator corresponds to taking the lightlike limit of Wilson lines, i.e. Y R → ∞ and Y L → −∞, whilst keeping Y C fixed.
Rapidity parameters for DPS are defined as and satisfy the normalisation condition where s is the squared centre-of-mass energy of the proton-proton collision. Note that the definition (2.31) refers to the momenta of the colliding protons. This differs from the convention in the modern TMD literature, where ζ refers to the momentum of the extracted parton [68]. Such a definition would be awkward in the case of DPDs, where the parton momenta often appear in convolution integrals.
Denoting by v R (v L ) the direction of the Wilson line with positive (negative) rapidity, we get (2.33) It is understood that v + R and v − L are positive, whilst v + L and v − R are negative, such that the Wilson lines are spacelike. Concentrating now on the right-moving proton, we define the boost invariant quantity • With the δ regulator, Wilson lines are taken along lightlike paths. Rapidity divergences are regulated by exponential damping factors exp(δ − z + ) and exp(δ + z − ) in the Wilson lines pointing in the plus or minus directions, where z + and z − go from 0 to −∞. The regulating parameters δ + and δ − have dimension of mass and transform like plus or minus components under boosts along the collision axis. Both parameters are sent to zero when the regulator is removed.
The matrix element defining the bare soft factor depends on the product δ + δ − because of boost invariance, and for dimensional reasons the logarithms associated with rapidity divergences depend on y 2 δ + δ − . We write where ζ andζ again satisfy (2.32). Introducing the boost invariant variable we have Notice that ρ is dimensionless here, whereas it has dimension of mass squared for the Collins regulator.
In both schemes, the bare unsubtracted DPD F B us for a right-moving proton depends on the plus-momentum fractions x 1 , x 2 , on y, and on the rapidity regulator variable for Wilson lines along v L . Because of boost invariance, the dependence on the regulator variable is via the combination ρ. For the Collins regulator, there are two boost invariant variables p + v − L and v + L v − L , which must appear in the combination (2.34) because the Wilson lines are invariant under a rescaling of v L . The subtracted (but still unrenormalised) DPD F B is defined as [78] , (2.39) where L depends implicitly on ρ and ζ p as specified in (2.35) or (2.38). An analogous definition is made for the left moving proton. We recall that bare quantities are independent of µ if expressed in terms of the bare coupling α 0 . It is understood that all factors in (2.39) are to be taken in 4 − 2 dimensions, until renormalisation is performed at a later stage.

Bare distributions in the short-distance limit
In the limit of large ∆ or small y, bare unsubtracted DPDs can be expressed as a convolution of splitting kernels with bare PDFs. We write with kernels being related by in accordance with (2.27). The kernels have perturbative expansions The bare and renormalised couplings are related by α 0 = µ 2 α s Z α /S , where Z α is the renormalisation constant for the running coupling. We give our results for two definitions of the MS scheme parameter, The construction (2.39) is multiplicative in y space and hence involves a convolution in ∆ space. We therefore exclusively work in y space when constructing subtracted DPDs. For the associated splitting kernel we then have Notice that V B depends on the combination x 1 x 2 ζ p , which according to (2.31) and (2.36) depends the plus momenta x 1 p + and x 2 p + of the partons in the DPD, as does ρ defined in (2.34) or (2.37). In fact, the kernels V B and V B us cannot depend on the proton momentum but only on quantities involving parton kinematics. We emphasise that x 1 x 2 ζ p and ρ remain fixed when the kernels are convoluted with a PDF. We expand the bare soft factor S B (y, 2 L ) and the bare splitting kernel V B (y, ζ) in analogy to V B us (y, ρ) in (2.42), and then obtain The  B us to be a linear function of log ρ. The relation between splitting kernels in ∆ and y space is different for the two regulator schemes and will be discussed next. B us depend on δ + via the dimensionless ratio ρ. Furthermore, they depend on and on the parton momentum fractions z 1 and z 2 . The kernel W (n) B us does not depend on ∆, because it is dimensionless and there is no other quantity with mass dimension to form a boost invariant and dimensionless ratio. With the Fourier integrals given in appendix B, one can perform the Fourier integral in (2.41) and obtains where ζ 2 = π 2 /6. As noted below (2.47), V B us is a linear function of log ρ, which we write as depending only on , z 1 , and z 2 . By virtue of (2.48), an analogous decomposition holds for W Collins regulator: The coefficients W (n) B us in the perturbative expansion (2.42) depend on the rapidity regulator via the dimensionless and boost invariant ratio ρ/∆ 2 . Since V (2) B us is a linear function of log ρ, the same must be true for W (2) B us . We write the latter as Fourier transform to y space then gives with the integrals in appendix B, where in analogy to (2.48). Again, the coefficients V (2,m) B us depend only on , z 1 , and z 2 . Notice that the highest pole in has the same order for V (2) B us and W (2) B us , in contrast to the case of the δ regulator.

One-loop soft factor for collinear DPS
At small y, the soft factor can be computed in perturbation theory. At order a s , the graphs for the bare soft factor have one gluon exchanged between two Wilson lines with different rapidity. The corresponding Wilson lines are joined to each other as in figure 2a, or they are separated as in figures 2b and c. Graphs a and b are familiar from the soft factor for single-parton TMDs, whereas graph c is specific to DPS. The complete set of graphs is given in figure 25 of [13]. An important result is that graphs b and c give the same result for equal transverse distance between the two Wilson lines. The parts of the graphs that are independent of the colour structure read with the Collins regulator. For details on the Feynman rules used, we refer to section 3 of [13] and to appendix D of [78]. A factor 2 in (2.54) takes into account the second graph in figures 2b and c. With v − L and v + R being positive and v + L and v − R negative, one finds that in U c all poles in − are on the same side if + < 0, so that one obtains zero in this case. For + > 0, one can evaluate the − integral by picking up the residue of the gluon propagator pole and thus obtains U c (y) = U b (y). For the δ regulator, one needs to replace and in (2.54). One can then repeat the argument just given for the Collins regulator. We note that the analyticity properties of the regulated eikonal propagators are essential to obtain the above (this includes the fact of using spacelike and not timelike vectors v L and v R for the Collins regulator). Other regulators, such as the η regulator [73,74], do not readily allow for the contour deformations in the integration over − , so that further considerations are required in such cases. The graphs in figure 2a are evaluated along the same lines, and one readily finds U a = −U b (0) for both regulators. Following the analysis in [68], we exclude graphs with a gluon exchanged between eikonal lines of equal rapidity. Such graphs are zero for lightlike Wilson lines and tree-level gluon propagators, but not for Wilson lines with finite rapidity. The corresponding Wilson line selfinteractions do not appear in the derivation of factorisation, and their appearance in the soft factor defined as a vacuum matrix element of Wilson lines may be regarded as an artefact. In fact, the corresponding loop integrals are not even well defined for spacelike Wilson lines, because they contain pinched poles (see appendix B in [86]). A more systematic treatment of this issue would be desirable but is beyond the scope of this work. Let us now evaluate the loop integrals (2.54) for the two regulators.
Collins regulator: For the Collins regulator the result of performing the − and + integrations can be obtained from equations (3.41) and (3.44) in [13] by setting the gluon mass used in that work to zero. Taking the limit Y R − Y L 1, we obtain where we have added graphs that always appear in combination in the DPS soft factor. We note that the infrared divergences of the individual graphs cancel in this combination. Carrying out the Fourier transformation (see appendix B), we obtain Here we have defined 1 J with 2C F , we can compare with the expression of the TMD soft factor for quarks in [86] and find that our result is consistent with equation (B.11) in that work.
δ regulator: Starting from (2.54) with the replacements (2.55) and (2.56), and performing the integrals over − and + , one obtains For > 0 one can replace 1/( 2 − 2δ + δ − ) with 1/ 2 because the difference between the two corresponding integrals goes to zero for δ + δ − → 0. The Fourier integral then gives Using (2.38) and expanding in , we get J with 2C F and setting x 1 = x 2 = x, we find that our result is consistent with equation (B.27) in [86]. To see this, we use their equation (B.2) and take into account that they define light-cone coordinates as v ± = v 0 ∓ v 3 .

Renormalised DPDs and splitting kernels
The renormalisation of R 1 R 2 F B has been briefly described in section 5.2 of [78]. It proceeds by convolution with a renormalisation factor RR Z for each of the two partons, in close similarity to what is done in the colour singlet case [13,64,88]. A new aspect for colour non-singlet channels is that DPDs and renormalisation factors depend on the rapidity parameter ζ : Parton indices are suppressed here and should be reinstated according to (2.20), (2.21), and (2.22). This should be done before carrying out the sum over the colour representations R 1 and R 2 , whose possible values differ between quarks and gluons. The scale dependence of RR Z is given by we can easily construct the one-loop renormalisation factor in the MS scheme as This follows from the RGE derivative to be used in (2.66), together with the requirement that RR P has the form (2.9) and is independent.
Inserting the splitting form (2.45) for R 1 R 2 F into (2.65), we obtain which can be solved for Z −1 order by order in a s . We see that the factor 11 Z(µ) renormalises both colour-singlet DPDs and ordinary PDFs. It has no ζ dependence, because 1 γ J = 0.
We also need the one-loop coefficient of the renormalisation constant for the running coupling in (2.46), which reads where T F = 1/2, and n F is the number of active quark flavours. The running coupling in 4 dimensions satisfies Rescaling of the rapidity parameter. As is the case in the evolution equation (2.7), the rescaling factors x 2 1 and x 2 2 in (2.70) are fixed by the momentum fractions on the l.h.s. and are not part of the convolution integrals. They appear because the renormalisation factor for parton a i can depend on the plus-momentum x i p + of that parton, but not on the plus-momentum of the other parton or the plus-momentum of the proton. This becomes evident when renormalisation is formulated at the level of twist-two operators and not of their matrix elements [78]. The definition of ζ p in (2.31) or (2.36) refers to the squared proton plus-momentum p + , so that the rescaled factors x 2 1 ζ p and x 2 2 ζ p refer to the square of the appropriate parton plus-momentum. Let us show how these rescaling factors combine to a dependence on x 1 x 2 ζ p in the last line of (2.70). Adapting the argument given for the renormalisation of two-parton TMDs in section 3.4 of [78], one finds that the renormalization factor R Z satisfies a Collins-Soper equation where we make all arguments explicit, including the parton momentum fraction z. Here Λ(µ) is the additive counterterm that renormalises the Collins-Soper kernel We note that the right-hand side becomes R Λ(µ 1 ) + R Λ(µ 2 ) + R J B (y) if one takes different renormalisation scales for the two partons. Using (2.74), we have Using this and its analogue for the factor with x 2 2 ζ p in (2.70), we get Of course, this only happens if one takes equal renormalisation scales for the two partons.

Two-loop splitting kernels for DPDs
Expanding the renormalised splitting kernel in (2.70) to second order, we obtain where it is understood that V (1) and V (2) also depend on two parton momentum fractions, which are integrated over as specified in (2.13). At one loop, the bare splitting kernels V B are finite at = 0, so that their renormalised version in (2.78) is simply given by Here we use the following notation for the expansion of an dependent quantity Q( ) in a Laurent series around = 0: B given in (2.47) and the counterterm contributions given by It is useful to rewrite the second counterterm as where we abbreviate for the remainder of this section. Since V B ( ) has no poles in 1/ , the O( ) terms inside the square brackets of (2.83) are not relevant. We now derive an explicit form of the renormalised two-loop kernels for the two regulators. For the sake of legibility, we omit colour representation labels in the following.
Rapidity divergences must cancel in the presence of the ultraviolet regulator, i.e. the terms with log ρ must cancel in 4 − 2 dimensions. This leads to the consistency relation B ∝ S −1 , and R ∝ S , this relation can indeed hold for different choices of the MS scheme parameter S . Using (2.83) and (2.86), we obtain B us can have only single poles in , which must cancel against those in the counterterm V ct 1 . This implies the condition where we used (2.79). Combining the finite terms, we find The difference between the two definitions of the MS scheme parameter S in (2.44) starts at O( 2 ), as can be seen from (2.59) and (2.60). Since V (2,0) B us has only single poles in , we conclude that V (2) fin is identical in the two schemes. We note that the convolutions withP in (2.88) and (2.89) involve a sum over representation labels when these are restored according to (2.82). Let us check that the ζ p dependence of V (2) is as required by the Collins-Soper equation (2.1). According to (2.78) and (2.89), we have where we used the short-distance expansion of J(y) derived in section 7.2 of [78]: As a further cross check, we take the derivative of V (y, µ ζ) ⊗ f (µ) w.r.t. µ. Using (2.9), we find that the double DGLAP equation (2.7) is satisfied to the required order in a s .
Collins regulator: The decomposition (2.52) of the bare unsubtracted kernel and the explicit form (2.61) of the one-loop soft factor for the Collins regulator give The cancellation of rapidity divergences in 4 − 2 dimensions leads to the same relation (2.86) as for the δ regulator. Using this relation, we obtain and hence Since V ct 1 contains only single poles in , we find that the coefficient of the double pole in V where we used (2.79) and R = 1 + O( 2 ). The remaining single pole of (2.95) must cancel against the one in V ct 1 , which implies Combining the finite terms in (2.95) with those of V ct 1 , we find that the renormalised two-loop kernel V (2) has the form (2.89) with (2.98) Using (2.96), we can rewrite (2.100) The dependence on the MS scheme parameter S cancels out in R 2 V (2,0) B us and R V B , so that V (2) fin is identical for the standard and the Collins definition of the MS scheme.
Scheme dependence. We have seen that V (2) fin is independent of the MS scheme implementation, both for the δ and for the Collins regulator. The only scheme dependence of the two-loop kernels therefore comes from the term with c MS in (2.89). This leads to a MS scheme dependence of the DPD at small y, denotes the term of order a n s in the expansion (2.78). Of course, this dependence must cancel in a physical quantity. To see how this happens, let us consider the double Drell-Yan process. As is well-known, the unsubtracted hard scattering cross section RR σ us for R = R = 1 has a single 1/ pole at O(α s ), due to collinear divergences related to the incoming quark and antiquark. Double poles 1/ 2 cancel between real and virtual corrections. This cancellation does not take place if R = 1, because the colour factors of real and virtual graphs differ in this case [63]. At order a s , the product S −1 −2 leads to a scheme dependent term proportional to c MS in RR σ us . We can determine its coefficient by using that the subtraction of collinear and soft-collinear poles is tantamount to the convolution with an inverse renormalisation factor ( RR Z) −1 for each of the incoming partons. This ensures the µ independence of the physical cross section. According to (2.68) we have for the double pole at O(a s ). The scheme dependent part of the subtracted hard cross section at this order is therefore −c MS a s R 1 γ (0) J 2 times the Born termσ (0) . Expanding the double Drell-Yan cross section to NLO, we have twice this factor times the LO cross section from the scheme dependence of the hard cross sections. This cancels the scheme dependence due to the NLO part (2.101) of the splitting DPDs for each incoming proton.

Two-loop calculation
Most of the techniques described in section 4 of [64] for the calculation of the two-loop kernels with R = 1 carry over to general representations R. We hence begin this section by recalling just briefly the main steps of the calculation. We then discuss the aspects specific for colour non-singlet channels, namely the colour factors and the treatment of the rapidity dependence. To compute bare unsubtracted two-loop kernels, we apply the factorisation formula (2.40) to the DPD F B us,a 1 a 2 /a 0 (∆, ρ) of partons a 1 and a 2 in a parton a 0 . Expanding the formula in a s and writing f B,b/a 0 for the bare PDF of parton b in parton a 0 we obtain for the second order term. The bare unsubtracted two-loop kernel is hence directly given by the two-loop graphs for the bare unsubtracted DPD of partons a 1 and a 2 in an on-shell parton a 0 . We compute with massless quarks and gluons, treating both ultraviolet and infrared divergences by dimensional regularisation. In the second step of (3.1) we used where the second relation holds because the corresponding loop integrals do not depend on any mass scale.

Channels, graphs, and colour factors
Depending on the order at which a splitting process a 0 → a 1 a 2 first appears, we distinguish between • LO channels g → gg, g → qq, q → qg, and • NLO channels g → qg, q → gg, as well as q j → q j q k , q j → q jqk , q j → q kqk , where j and k label quark flavours that may be equal or different.
More LO and NLO channels are obtained by interchanging partons a 1 and a 2 or by charge conjugation, where the kernels for charge conjugated channels are identical or opposite in sign (see section 4). At order a 2 s , a dependence on the rapidity parameter ζ p only appears in the LO channels. At two-loop order, we have real graphs with one unobserved parton in the final state, as shown in figures 3 and 4. In addition, there are virtual graphs with a vertex or propagator correction on one side of the final state-cut, see figure 5. In figure 3, we have indicated those graphs that have topology UND (upper non-diagonal). They play a special role and will be discussed below. In Feynman gauge, one has additional graphs with an eikonal line. These graphs can be obtained from the ones without eikonal lines by applying the graphical rules given in figure 6. Graphs obtained by applying these rules to graphs with UND topology will be referred to as UND graphs as well. The flavour structure of the channels without external gluons can be decomposed as where V q q,q (z 1 , z 2 ) = V qq ,q (z 2 , z 1 ). The graphs for the kernels on the r.h.s. are shown in figures 4(k) to 4(o).
Colour structure of graphs. Let us now take a closer look how the different graphs behave as a function of the colour representations R 1 and R 2 . It is simplest to discuss this in light-cone gauge A + = 0, where graphs with eikonal lines are absent. At LO, one has a single graph without eikonal lines for each splitting process. As a consequence, a kernel for the representations R 1 and R 2 can be obtained from the colour singlet one by a simple rescaling: with factors c a 1 a 2 ,a 0 (R 1 R 2 ) given in (4.5). We have exactly the same rescaling for virtual correction graphs at NLO. This is because for a given channel, all virtual subgraphs in figure 5 depend on the colour indices of the external legs via t a ij or f abc , as does the tree-level graph in the same channel. For real emission graphs, the situation is less simple. We find that graphs that have the same colour factors in the singlet channel do not necessarily have the same colour factors for other representations. Examples are graphs 3(a), 3(b), 3(c) and graphs 4(a), 4(b). Interestingly, several graphs with UND topology have a zero colour factor for some representations. Examples are graph 3(c) for R 1 R 2 = AA, SS, graphs 3(o) and 4(d) for R 1 R 2 = 8A and 8S, and graph 4(i) for R 1 R 2 = AA. Figure 3: Real graphs for LO channels. The parton lines at the bottom of the graph correspond to a 0 , and those on top to a 1 , a 2 , a 2 and a 1 from left to right. Additional graphs are obtained by interchanging a 1 ↔ a 2 and by reflection w.r.t. the final state cut. Graphs with topology UND (upper non-diagonal) are marked here and further discussed in the text. Figure 4: Real graphs for NLO channels. Additional graphs are obtained by interchanging a 1 ↔ a 2 and by reflection w.r.t. the final state cut. The association of parton lines is as in figure 3.  Despite these complications, we do find a number of simple patterns in the colour factors for the real graphs. Together with the simple scaling rule for virtual graphs, we find the following.
• Denoting the sum of real and virtual two-loop graphs for a channel by R 1 R 2 Γ a 1 a 2 ,a 0 , we find that the two octet channels for gluons are related as However, there is no such relation in the q → gg channel.
• With the same notation, we find 10 10 Γ gg,a = 10 10 Γ gg,a = 0 for a = g, q. (3.6) This result is valid for general N , as well as the equality c gg,g (10 10) = c gg,g (10 10) = 0. As a consequence, all splitting kernels are zero in the decuplet sector, both at LO and at NLO.
• For the sum R 1 R 2 Γ real a 1 a 2 ,a 0 of real two-loop graphs, we find a structure for g → qq, and for q → gg, where N = 3 for R 1 R 2 = 27 27 as usual. The functions Γ F and Γ A are colour independent. We observe that the terms going with Γ A have the same scaling with R 1 R 2 as the LO and the virtual NLO graphs, whereas the terms going with Γ F have a different scaling. We note that the colour factors of single graphs for R 1 R 2 = 11 may be linear combinations of C A and C F or of C A C F and C 2 F . For instance, the colour factor of graph 3(j) in the singlet channel is C A − 2C F , and the one of graph 4(i) is which is the form we will use in the presentation of our results.
• For q → gg, we obtain nonzero results also in the mixed octet channels. They satisfy AS Γ real gg,q = − SA Γ real gg,q (3.10) and cannot be written in terms of Γ A gg,q and Γ F gg,q .
Apart from the contribution from real and virtual graphs to V B us , the full two-loop kernel V (2) receives contributions from the ultraviolet counterterms (2.82). These are single or double poles in 1/ that cancel the corresponding poles in V B us , so that their colour structure must match that of the ultraviolet divergent terms in the sum of real and virtual two-loop graphs.
Of course, virtual two-loop graphs are only present in LO channels. The explicit colour dependence of the one-loop splitting kernels RR P appearing in the counterterms is given in equation (7.91) of [78].
In LO channels, we finally need to include the contribution going with the one-loop soft factor in (2.47). According to (2.61) and (2.64), R S B depends on the representation R via a global factor R γ (0) J , so that for its contribution to R 1 R 2 V (2) we have This is the colour structure of the counterterm V ct 2 and of the terms with double logarithms in the expression (2.89) of V (2) . It gives zero for R 1 R 2 = 11, where rapidity divergences cancel between real and virtual graphs. The scaling relations (3.5) and (3.6) hold for the sum of real and virtual two-loop graphs, and hence also for all counterterms. Therefore, they hold for the full two-loop kernels V (2) in the channels g → gg, g → qg, and q → qg. Since they also hold for the LO kernels, they hold for the full splitting DPDs up to order a 2 s . For channels without external gluons, the kernels on the r.h.s. of (3.3) receive contributions from exactly one graph, see figure 4(k) to 4(o). For each of them, one therefore has a unique colour factor at given R 1 R 2 , which gives the colour factor of the full two-loop kernel and hence for the full splitting DPDs up to order a 2 s . These factors will be given in (4.35). For q → gg, the relations in (3.8) and (3.10) extend to the full two-loop kernel because only real graphs and their counterterms contribute in that case. The colour structure in the channel g → qq is more complicated than the relation (3.9) for real graphs and will be given in (4.21) and (4.22).

Computation of the graphs and handling of rapidity dependence
Let us briefly specify the kinematics of the two-loop graphs. We work in a frame where the momentum of the incoming on-shell parton a 0 has a plus-component k + and zero minus-and transverse components. Momenta are assigned as shown in figure 7(a) for real graphs; for virtual graphs one has to remove the line a 3 and set k 3 = 0. The minus-and transverse components of k 1 and k 2 must be integrated over, because the fields in the operators (2.24) have a light-like distance from each other. In addition, one must integrate over ∆ − , given that y + = 0 in the matrix element (2.23). We write z 1 = k + 1 /k + and z 2 = k + 2 /k + for the plus-momentum fractions of the observed partons, which corresponds to writing the splitting kernel as V (z 1 , z 2 ). In the convolution (2.13) of this kernel with a PDF, one integrates over z with z 1 = uz and z 2 =ūz, with u and u given in (2.14). Figure 7: Left: Assignment of momenta in real emission graphs. Plus momenta are fixed to k + 1 = z 1 k + , k + 2 = z 2 k + , and ∆ + = 0. Compared with the symmetric assignment in [19] we have shifted k 1 and k 2 such that ∆ appears only on the r.h.s. of the cut. Right: Assignment of momenta in vertex correction graphs to the left of the cut. An analogous assignment holds for loops to the right of the cut, with k 1 and k 2 shifted by +∆ and −∆, respectively.
Real graphs. One of the minus-momentum integrals in real graphs can be carried out trivially by using the on-shell condition of the emitted parton a 3 . The two remaining minusmomentum integrals are performed by closing the integration contour in the complex plane, picking up exactly one propagator pole per integral. Details are given in section 4.2.1 of [19]. After these steps, the remaining integrals are over the transverse momenta k 1 and k 2 in 2 − 2 dimensions. We find it useful to write for the plus-momentum fraction and transverse momentum of the emitted parton a 3 . The on-shell condition then reads 13) and the convolution (2.13) of the splitting kernel with a PDF then can be written as an integration over z 3 from 0 to 1 − x with (3.14) Let us first assume that we take lightlike Wilson lines without any rapidity regulator in the operators (2.24). For nonzero z 3 this is sufficient for obtaining well-defined results. Instead of Feynman gauge, one can then also use the light-cone gauge A + = 0 to compute the graphs. We computed in both gauges and found agreement. At the point z 3 = 0, we encounter two types of singularities in the real graphs. They arise if parton a 3 is a gluon, which must couple to an eikonal line if one computes in Feynman gauge.
1. Soft singularities are due to the region where both k 3 and z 3 go to zero at fixed k 2 3 /z 3 . The minus-momentum k − 3 of the final state parton a 3 remains finite in this limit and thus does not lead to a suppression due to other lines in the graph going far off shell.
We regulate these singularities by working in 4 − 2 dimensions. The volume of the transverse momentum integration d 2−2 k 3 then contributes a fractional power z 1− 3 in the relevant momentum region. With one factor 1/z 3 from the invariant phase space of a 3 and another factor 1/z 3 from the eikonal propagator, one obtains a behaviour z −1− 3 for the graph, unless there are additional factors of z 3 in the numerator. The convolution integral in z 3 is well defined, provided that one takes < 0, as is appropriate for the regularisation of divergences in the infrared.
2. Rapidity singularities are due to the region in which z 3 becomes small but k 3 does not.
With k − 3 scaling like 1/z 3 in this limit, at least one line in the graph goes far off shell, and its propagator denominator gives a factor z 3 . With 1/z 3 from the phase space of a 3 and 1/z 3 from the unregulated eikonal propagator, one obtains a singular z −1 3 behaviour of the kernel if there are no additional powers of z 3 from additional off-shell lines or from the numerator of the graph.
We find that the only graphs with rapidity divergences are those that have UND topology (see figure 3 and figure 6). These graphs must be evaluated with a rapidity regulator. All other graphs can be evaluated without such a regulator, with a caveat that will be discussed in appendix C.
Let us note in passing that the two kinematic regions just discussed resemble the ultrasoft and soft modes in soft-collinear effective theory (SCET) [89,90]. Ultrasoft gluons have sufficiently small momentum components to couple to collinear lines without carrying them far off-shell. By contrast, the minus-momentum of soft gluons is large enough to carry rightmoving collinear lines far off shell. We now show how the two different rapidity regulators work for the two-loop graphs under consideration. For this part of the calculation, we work in Feynman gauge. If we use the δ regulator, the eikonal propagators with momentum k 3 have a denominator k + 3 ± iδ + . In the sum over complex conjugate graphs, we find that only the real part remains, where c.c. denotes the complex conjugate and ρ = k + 1 k + 2 /(δ + ) 2 . Note that this is consistent with the definition (2.37) for ρ, because in the DPD we have k + 1 = x 1 p + and k + 2 = x 2 p + for the observed partons. If we use the Collins regulator, we have in the sum over complex conjugate graphs, where PV denotes the principal value prescription and ρ = 2k + 1 k + 2 v − L |v + L | in accordance with (2.34). Here we have used the on-shell condition (3.13) for the emitted gluon. Notice that for the δ regulator, the eikonal propagators do not affect the loop integrals over transverse momenta, whereas for the Collins regulator they do. We explain in appendix C that it is not convenient to carry out the transversemomentum integration with eikonal propagators of the form (3.16). Instead, we first make use of simplifications that arise in the limit ρ → ∞. In order to combine the rapidity singularities in the real graphs with those from virtual graphs and from the soft factor, we use the distributional identities where following [91] we use the notation for plus distributions. It is understood that the functions multiplying the distributions in (3.17) to (3.19) are sufficiently smooth and vanish outside the interval 0 ≤ z 3 ≤ 1. A derivation of (3.17) and (3.18) is given in appendix D. Note that z 1 and z 2 depend on z 3 as specified in (3.14). Comparing (3.17) with (3.18), we see that the term with log ρ involves the same loop integral for both regulators. This implies that the contribution from real graphs to the coefficient V B us receives an additional contribution from the last term in the square brackets of (3.18). The loop integrals for the log ρ terms (3.17) or (3.18) can be carried out with elementary methods, once one has taken the limit z 3 → 0 in the integrands. The same holds for the integrals involving the last term in the square brackets of (3.18) if we write perform the integrals for fixed α, and then take the derivative at the point α = 0. To carry out the loop integrals going with L 0 (z 3 ) in (3.17) or (3.18), as well as the loop integrals for graphs without rapidity divergences, we proceed as described in [64]. We use integrationby-part reduction [92,93] as implemented in LiteRed [94]. The resulting master integrals are evaluated using the method of differential equations [95][96][97][98]. For the latter, we perform in particular a transformation to the canonical basis [99] using Fuchsia [100]. The initial conditions for the differential equations are taken at the point z 3 → 0. To compute the loop integrals in this limit, we use the method of regions [101]. After carrying out the loop integrations, we have terms in which L 0 (z 3 ) is multiplied by a function going like z 0 3 or like z − 3 in the limit z 3 → 0, which respectively corresponds to a rapidity or to a soft singularity. In the former case, the plus-prescription regulates the 1/z 3 behaviour in convolutions over z 3 . In the latter case, we can rewrite L 0 (z 3 ) z − 3 = z −1− 3 because is to be taken negative for soft singularities. In a final step, we expand our results around = 0 and use the distributional identity [102] We find that in the sum over all real graphs, the terms explicitly given on the r.h.s. are multiplied with a factor z 3 . As a result, the δ(z 3 ) contribution disappears, whereas the plusdistribution terms simplify to z 3 L n (z 3 ) = log n z 3 with n = 0, 1, 2.
Virtual graphs. For virtual graphs, we have the kinematic constraints k − 2 = −k − 1 and k 2 = −k 1 , and we take k − 1 and k 1 as independent integration variables. In addition, we must integrate over a loop momentum . For vertex correction graphs, the momentum routing is shown in figure 7(b). The integrals over ∆ − , k − 1 , and − are performed by complex contour integration in a similar way as for the real graphs. This leaves us with integrals over k 1 , , and + . The + integral runs over a finite interval: outside this interval, one of the minusmomentum integrals gives zero because all poles are on the same side of the real axis. Writing + = z k + , we find endpoint singularities at z = 0, which are either soft singularities or rapidity singularities. Their discussion is fully analogous to the one for z 3 = 0 singularities in the real graphs. Rapidity divergences are made explicit by using the analogues of the distributional identities (3.17) and (3.18) with z instead of z 3 . Note that in this case z 1 and z 2 are fixed during the integration over z , which does however not change the r.h.s. of the identities. Soft singularities are made explicit with the analogue of (3.21) after the transverse-momentum integrals have been performed. The virtual graphs that have rapidity divergences are those obtained by the rules in figure 6 from vertex correction graphs in which the momentum is carried by a gluon, such as in figure 5(f).
Combining all graphs. After the steps described so far, we can add the contributions from real and virtual graphs, using that the latter are multiplied with δ(z 3 ) due to kinematics. Since the loop integrals accompanied by log ρ are elementary, we can compute V (2,1) B us to all orders in and thus verify the condition (2.86), which ensures that all rapidity divergences cancel when combining the unsubtracted kernels with the soft factor. For the δ regulator, we find that the single poles in V (2,0) B us satisfy the consistency relation (2.88). For the Collins regulator, we denote by δV (2,0) B us the sum of contributions from the last term in (3.18) and from its analogue for z . We find that This is exactly what is required to fulfil the consistency relations (2.96) and (2.97) for double and single poles with the Collins regulator and to obtain the same result for the finite part V (2) fin with both regulators (see (2.90) and (2.98)). Our calculation thus establishes the cancellation of rapidity divergences at all orders in , as well as the cancellation of all poles in for two rapidity regulators, and we obtain the same finite result for V (2) in both cases. We regard this is as a strong cross check of our results. At this point, we must point out an issue with the definition of the δ regulator. In [72] it is stated that in the twist-two operator defining an unsubtracted TMD, the parameter δ should be rescaled by the parton momentum fraction x; see equation (3.9) in that paper. A corresponding rescaling in our case would lead to some modification of the term log ρ V (2,1) B us ( ) in (2.85). The second term with log ρ in the same equation comes from the soft factor and would not be modified. One would then obtain a different result for the final kernel V (2) and thus lose agreement with the result for the Collins regulator. Moreover, the extra term from the modification would be multiplied by V (2,1) B us ( ), which has a pole in 1/ according to (2.86). Since the counterterms in our calculation are completely fixed, the modification would not give a finite result for = 0. Clearly, our calculation does not admit any rescaling of δ in the unsubtracted DPD. The rescaling prescription for δ in [72] is a consequence of the fact that the ζ parameter appearing in the Z factor for TMDs was defined with respect to p + rather than xp + in that work. As we explained in section 2.5, the rapidity parameter in a Z factor must refer to the parton momentum in the renormalised operator. If this is the case, no rescaling prescription for δ should be applied. 2

Results for the two-loop kernels
In this section, we present several general features of our results for the DPD splitting kernels, with a focus on their colour structure and various kinematic limits. Numerical illustrations will be given in section 5. The full analytic expressions of the kernels are given in the ancillary files associated with this paper on arXiv. According to (2.89), the two-loop kernels have the form with L defined in (2.12), c MS given in (2.60), and R γ (0) J in (2.6). The full splitting DPD at two-loop accuracy is obtained from (2.78). In the following, we call V [2,0] the non-logarithmic part and V [2,1] the logarithmic part of the two-loop kernel. The last line in (4.1), including the constant c MS , will be referred to as the double-logarithmic part. Let us recall that the one-loop splitting kernels have the form so that the convolution (2.13) simplifies to with u = x 1 /x and x = x 1 + x 2 . These kernels were already computed in [13] and are given by By definition, one has c a 1 a 2 ,a 0 (11) = 1 for all channels. The colour-singlet kernels 11 V (1) (u) are equal to the familiar DGLAP splitting functions without the plus prescriptions and delta functions: We recall thatū = 1 − u. Setting N = 3 in (4.5), we obtain numerical values which shows that colour correlations are quite large for the splitting into gg and qg. The dependence of the two-loop kernels on the momentum fractions z 1 and z 2 can be cast into the form with regular terms V [2,k] reg (uz,ūz) that are finite or have an integrable singularity at z = 1 for 0 < u < 1. Such a singularity is at most a power of log(1 − z). Note that the values u = 0 and u = 1 are outside the region relevant for DPDs in cross section formulae, where both x 1 and x 2 must be strictly positive. The behaviour of the kernels in the limit u → 0 or u → 1 is discussed in section 4.4. Terms with 1/[1 − z] + and δ(1 − z) appear only in LO channels. It is noteworthy that V [2,0] does not have any term going with the plus distribution 1/[1 − z] + . All such terms appearing in two-loop graphs cancel when the one-loop counterterms are added. We have no explanation for this finding. We note that a more general form of the kernel is given in equation (110) of [64], where the function multiplying 1/[1 − z] + depends on both u and z. This can be brought into the form (4.9) by expanding that function around z = 1 and moving all terms with powers of (1 − z) into the regular term.
Symmetries. Let us now discuss the properties of splitting kernels that follow from charge conjugation invariance. Channels in which all external partons are quarks or antiquarks will be called "pure quark channels" in the following. The kernels for pure quark channels that are related by charge conjugation are equal. For kernels with external gluons, we have RR V qq,g (z 1 , z 2 ) = RR Vq q,g (z 1 , z 2 ) , where the sign factor η(R) is −1 for R = A and +1 otherwise. The relation in the last line implies AS V gg,g = SA V gg,g = 0 , 10 10 V gg,g = 10 10 V gg,g . (4.11) A detailed proof of these relations is somewhat lengthy and shall not be given here. In a first step, one can prove relations for the unsubtracted distributions F us,a 1 a 2 , using the charge conjugation properties for the colour projected twist-two operators given in (A.8). At this stage, a sign factor η(A) = −1 appears for each gluon pair coupled to an antisymmetric octet. Extending these relations to the subtracted distributions F a 1 a 2 is easy, because the soft factor appearing in that step depends only on the multiplicity of the colour representations and does not distinguish between quarks and antiquarks. Corresponding relations for the splitting kernels are then obtained from the splitting formula (2.11) by using the charge conjugation relations for PDFs.
Together with the trivial symmetry relation we find that the kernels are even under the exchange z 1 ↔ z 2 , whereas R 1 R 2 V gg,q (z 1 , z 2 ) is odd for the combinations R 1 R 2 = AS and SA and even for all others.

Colour dependence
We now specify the colour dependence of the two-loop kernels for each partonic channel. In the following, R(uz,ūz), D(u), and S(u) respectively denote contributions to the terms V reg (uz,ūz), δ(1 − z) V δ (u), and V p (u) [1 − z] + in (4.9). The functions R, D, and S are independent of the number of colours N . They may or may not depend on the representations R 1 R 2 , as indicated by the presence or absence of a corresponding superscript. The coefficient β 0 , defined in (2.72), is the only place where a dependence on the number n F of active quarks appears in the kernels. From now on, we set T F = 1/2 for simplicity. We also recall that all results for R = 27 are valid for N = 3 only.
g → gg : Using the scaling factors in (4.5) one may write the non-logarithmic and logarithmic parts of the g → gg kernel as For some of the functions in (4.17) one finds simple relations between different colour channels: gg,g , and D β [2,1] gg,g are independent of R. They originate from contributions with simple LO scaling, such as virtual graphs and associated counterterms. In particular one finds that [2,1] gg,g = 11 V (1) gg,g . (4.18) • As a consequence of (3.5), one has gg,g (4.19) for the full two-loop kernels. The functions R, S, and D in (4.17) are hence equal for R 1 R 2 = AA and R 1 R 2 = SS.
• Due to charge conjugation invariance, the kernels for R 1 R 2 = AS and SA are zero at all orders.
• An interesting feature is that the plus distribution terms in the octet channels vanish, This is because these channels have a vanishing colour factor for the eikonal line graphs with UND topology, which are the only two-loop graphs giving rise to plus distribution terms.
g → qq : For the g → qq kernel, one finds are independent of R.
• All terms going with C F scale like the LO kernels.
• The relations (3.9) for the sum of real graphs lead to the following simple relations for the regular and plus-distribution terms with colour factor C A : q → qg : The colour structure for this channel has the form qg,q (4.23) Here, some but not all functions obey simple relations between different colour representations: • As in the previous channels, the δ distribution terms in the non-logarithmic part exhibit LO scaling, such that D qg,q , and D β [2,0] qg,q are independent of R. The same holds for the term going with β 0 in the logarithmic part, which is given by qg,q (4.24) in analogy to (4.18).
• All terms with C 2 F scale like the LO kernels.
• As a consequence of (3.5), there is a simple scaling relation for the full kernels in the two octet representations: The functions on the r.h.s. of (4.23) are hence equal for R 1 R 2 = 8A and R 1 R 2 = 8S.
• As in the g → gg channel, one finds that the plus-distribution terms vanish for the octet representations, This can be again explained by a vanishing colour factor of the eikonal line graphs with UND topology.
We now turn to the kernels for the NLO channels, which contain regular terms but no δ or plus distributions. The double logarithmic part of V (2) is zero in these channels.
g → qg : The kernel for g → qg has the form qg,g for k = 0, 1 . Noteworthy features of this channel are: • The terms with C F scale like the LO kernels.
• The full kernels for the two octets are related by as a consequence of (3.5). The functions for R 1 R 2 = 8A and R 1 R 2 = 8S on the r.h.s. of (4.27) are hence equal.
q → gg : As a consequence of the relations in (3.8), one can write gg,q for k = 0, 1 (4.29) in all colour channels except for the mixed octets.
• The terms multiplying C A C F in the square brackets are colour independent, whereas the terms multiplying C 2 F have an additional scaling factor • Since c gg,g (10 10) = 0, one has 10 10 V gg,q = 0, and the same holds for R 1 R 2 = 10 10.
In the mixed octet channels, one has AS V [2,k] gg,q = − SA V [2,k] gg,q =c(AS) AS R [2,k] gg,q for k = 0, 1 (4.31) with a global colour factorc Pure quark channels. Using the decomposition (3.3) and the symmetry relations (4.12) and (4.15), one can deduce results for all pure quark channels from the five kernels q q ,q . (4.33) For R = 1, the first two have a colour factor C F (C A − 2C F ), whereas the last three have a colour factor C F . One finds Because each kernel corresponds to a single two-loop graph, one finds simple scaling relations between octet and singlet: q q ,q = c qq,g (88) . Relations between DPDs at small y. According to (4.25) and (4.28), the two-loop kernels for both q → qg and g → qg follow the same scaling relation between the two octets as the q → qg splitting at one loop. We therefore have the relation for the DPDs at small y. Corrections to this and the following three relations are of order O(a 3 s ) and O(y 2 Λ 2 ). The kernels for the octet combinations AA and SS are opposite to each other in the g → gg channel, but they are not for q → gg because of the terms with C 2 F in (4.29). Using the values of the rescaling factors c andc, we obtain the relation We expect the relative difference between AA F gg and − SS F gg to grow with x 1 + x 2 , as the quark singlet distribution then becomes more important than the gluon distribution in the splitting DPDs. This is confirmed by the numerical examples in section 5.
The mixed octet distribution is driven by the valence combination f q − fq, with the minus sign between quark and antiquark distributions resulting from the fourth relation in (4.10). Unlike all other two-gluon distributions, AS F gg (x 1 , For two gluons in the decuplet representation, we find 10 10 F gg = 10 10 F gg = 0 (4.39) at two-loop accuracy. This result holds for all N . It is an interesting question whether these distributions remain small when y is in the non-perturbative region. For pure quark channels, we get simple relations between the singlet and octet distributions if the quark flavors differ, because in this case only the kernel V qq ,q or V qq ,q contributes. As a consequence, one has for q = q . For equal quark flavors, no simple relation is obtained at two-loop accuracy. We emphasise that the relations (4.36) to (4.40) should be evaluated for µ ∼ x 1 x 2 ζ p ∼ 1/y in order to avoid logarithmically enhanced contributions from higher orders. An exception is the relation (4.39), which is stable under evolution. In the following subsections, we discuss the behaviour of the two-loop kernels and their convolution with PDFs in various kinematics limits. We closely follow the presentation for the colour singlet kernels in [64] where appropriate. In the channels g → gg, q → qg, and g → qg, we will not give results for kernels for R 1 R 2 = SS, which are proportional to the kernels for R 1 R 2 = AA according to (4.19), (4.25), and (4.28). For the mixed octet combinations, we can limit our attention to AS V gg,q . Finally, we will no longer discuss the kernels in the decuplet sector, which are all zero.

Threshold limit: large x 1 + x 2
Let us start with the threshold limit x → 1. From (2.13), (4.1) and (4.9), we deduce that the convolution of kernels with PDFs gives 3 The power for the subleading contributions in the last line is n = 0 or n = 1 and follows from the behaviour of V [2,k] reg (uz,ūz) for z → 1. The contribution enhanced by log(1 − x) is due to plus distribution terms. Such terms are absent in V (1) . They appear in V (2) only for LO channels, where they are accompanied with a logarithm L and hence vanish for µ = y/b 0 . The relevant coefficients read 11 V [2,1] gg,g p = 4C 2 A p gg (u) , 27 27 with the functions p(u) given in (4.7). Here we write V [2,1] ... | p instead of V [2,1] ..., p for better readability. The plus distribution terms in the g → gg and q → qg channels are zero for R 1 R 2 = AA and R 1 R 2 = SS.

Small x 1 + x 2
To understand how the x → 0 limit is analysed, let us first consider the ordinary Mellin convolution (2.8) of two functions P (z) and f (z). If both functions approximately behave like 1/z at small z, then the convolution integral approximately goes like dz/z over a wide range x z 1 in which the arguments of both P (z) and f (x/z) are small. If this range is wide enough, its contribution dominates the result. The same observation applies to the convolution (2.13) if V (uz,ūz) ∼ 1/z 2 for small z.
To be more specific, one finds that for V (uz,ūz) = w(u)/z 2 with an enhancement factor where k is a positive integer in the upper line of (4.44). The form of f (x) in the lower line provides a reasonable first approximation for the small x behaviour of most PDFs, with α not too far away from zero. Based on this discussion, we now extract the leading 1/z 2 power behaviour of the two-loop kernels for the different channels.
g → gg : q → gg : In several (but not all) cases, we observe Casimir scaling by a factor C F /C A between the q → gg and g → gg kernels: gg,g for R 1 R 2 = 11, AA, SS, 27 27 , 27 27) 16 The kernel AS V [2,0] gg,q (uz,ūz) goes like 1/z 2 at small z, but the region z 1 does not dominate the small x limit of its convolution with PDFs in (4.38). This is because the valence combination f q (z) − fq(z) has a small z behaviour far away from the 1/z law that underlies the argument at the beginning of this subsection.
Remaining channels: For the splitting q → q q , we also find Casimir scaling: 4 for all R and k = 0, 1 . The kernels for the other pure quark channels, as well as those for q → qg and g → qg have no leading 1/z 2 behaviour at small z. They behave like z n with integer n ≥ −1, and their convolution with a PDF is not dominated by the region x z 1. We observe that in the channels with a leading 1/z 2 behaviour, the non-logarithmic part R 1 R 2 V [2,0] of the kernel scales with c(R 1 R 2 ) like the LO terms, whereas the logarithmic part R 1 R 2 V [2,1] in general does not. (An exception to this rule is the case R 1 R 2 = AS, which is special as we just explained.)

Small x 1 or x 2
We now analyse the limit in which x 1 or x 2 becomes small compared with x 1 + x 2 , i.e. the limit u → 0 or u → 1. The second limit can be reduced to the first by swapping the parton indices a 1 and a 2 , and we will always take u → 0 for definiteness. The limit u → 0 is subtle because it gives rise to singularities in the two-loop kernels that are absent for finite u. We isolate these singularities by writing where the first term is integrable over z in the limit u → 0, whereas the second one develops a 1/(1−z) singularity at z = 1. This singularity arises from graphs in which the initial parton a 0 splits into a 2 and another parton. That parton has a momentum fraction 1−z 2 = 1−z(1−u), which appears in the denominator of its propagator. For V r and for the terms with a plus or δ distribution in V (2) , one can readily take the limit u → 0. One obtains 1/u for the leading behaviour. For V s , one must take the u → 0 limit at 4 The result for W [2,2] q q ,q in equation (161) of [64] is wrong and should read W [2,2] q q ,q = (CF /CA) W [2,2] q q ,g .
the level of the convolution V s ⊗ f and not at the level of the kernel. This part of the kernel has the general form where on the r.h.s. we have changed variables from z toz = 1−z. For ease of writing, we omit the superscript [2, k] on V s and on the coefficients h henceforth. h 1,a and h 1,b are numerical constants, whilst h n (z, u) and h 1,c (z, u) are functions which in the limits of smallz or small u are either finite or have a singularity of order logz or log u. Notice that no explicit factor 1/u appears when taking the u → 0 limit of V s . We hence erroneously discarded the contribution from V s when discussing the limit of small x 1 in our previous work [64]. However, all terms except for the last one in (4.50) give rise to a 1/u behaviour of the convolution qq,g ≈ c qq,g (88) 11 V [2,0] qq,g , 88 V [2,1] qq,g ≈ c qq,g (88) 11 V [2,1] qq,g . (4.61) q → gg : gg,g for R 1 R 2 = 11, AA, SS, 27 27 , 16 where we omit the case R 1 R 2 = AS for the reason given after (4.47).
Remaining channels: qq,g , for all R and k = 0, 1 . The kernels for all other channels have no leading 1/z 2 behaviour. Taking the small x limit of the small u results in section 4.4 means to select the convolution terms (1/z) ⊗ f , since this will give a small x enhancement as explained in section 4.3. Such a behaviour is only found in very few cases, namely in where in the last step we have performed the convolution for specific choices of the PDF, as detailed in (4.44). If we first take u → 0 and then the small x limit, we have instead with a leading u −1 log u behaviour but no small x enhancement. Clearly, the approach to the triple Regge limit strongly depends on the direction in the plane of x 1 and x 2 in this case.

Numerical illustration
In this section, we illustrate the impact of NLO corrections on the splitting DPDs. For easier comparison, we use the same settings as in section 5.4 of our study [64] for the colour singlet case. We take y = 0.022 fm for the inter-parton distance. This corresponds to a scale at which the logarithm L in (2.12) is zero. For the PDFs in the splitting formula, we take the NLO set of CT14 [103], along with the coupling α s used in that set. 5 This gives α s (10 GeV) = 0.178 , α s (5 GeV) = 0.212 , α s (20 GeV) = 0.153 (5.2) at the scales on which we focus in the following study. We work with n F = 5 active quark flavours.
The splitting DPD at two-loop accuracy is given by the sum of the LO and NLO terms F (1) and F (2) defined in (2.102). Note that we evaluate the one-loop expression F (1) with NLO PDFs here. Thus, the comparison of F (2) with F (1) directly reveals the impact of the two-loop splitting kernels. As we saw in [64], the difference between evaluating F (1) with LO or NLO PDFs is typically quite large, at least in the gg channel. This simply reflects that PDFs at moderately large scales depend quite strongly on the order at which they are determined and evolved.

Two-gluon distributions
In figure 8 we show the full splitting DPD F gg = F gg as a function of x 1 for different values of x 2 . We take scales µ = x 1 x 2 ζ p = µ y and use the standard definition of the MS scheme, so that according to (2.60) the scheme parameter in the two-loop kernels (4.1) is c MS = −ζ 2 /2. The sign of the distributions (not visible in the logarithmic plots) is the same as at LO, i.e. 11 F gg and SS F gg are positive and AA F gg and 27 27 F gg are negative. The relative size of RR F gg for different R reflects the scaling factors of the LO kernels given in (4.5) and (4.8). In some panels of figure 8, one can see slight differences in the shape of the distributions, which are due to the two-loop terms. To see this more clearly, we plot the ratio F (2) F (1) in figure 9. We find that the a 2 s corrections exhibit a quite varied structure as a function of the momentum fractions x 1 and x 2 . Not exceeding 30%, they are of moderate size in this setting. Note that RR F (2) RR F (1) is nearly equal for R = A and R = S at small x 1 + x 2 , whilst for large values of x 1 + x 2 a clear difference between the two representations is observed. The growing difference with increasing x 1 + x 2 is what we anticipated when discussing our analytic result (4.37). We also evaluated F (2) F (1) for µ y = 5 GeV and µ y = 80 GeV, always with central values µ = x 1 x 2 ζ p = µ y of the scales. Compared with the choice in (5.1), the overall size of the    ratio changes due to the changed value of α s (µ y ). There is also a mild change in the shape of the curves, due to the scale dependence of the PDFs in the splitting formula. An example is shown in figure 10. We continue our study with the value µ y = 10 GeV in (5.1). To gauge the dependence of our results on the PDFs used in the splitting formula, we produced plots for the NLO sets of ABMP16 [106], NNPDF3.1 [107], and MSHT20 [108]. 6 As illustrated in figure 11, the curves for ABMP16 and MSHT20 are quite similar to those for CT14 up to about x 1 + x 2 ∼ 0.4, whereas differences become more pronounced for larger x 1 + x 2 . The same holds for NNPDF3.1. In this case, the gluon distribution at µ = 10 GeV has a zero for x ≈ 0.7, which leads to a corresponding zero in RR F (1) gg . At order a 2 s , the splitting DPDs have a dependence on the MS scheme definition, which cancels at the level of DPS cross sections as explained at the end of section 2.6. According to (4.1), a change in the scheme parameter c MS results in a shift in the ratio RR F (2) RR F (1) that is proportional to a s R γ (0) J and independent of x 1 and x 2 . Comparing figure 12 with figure 9(c), we see that this shift is moderate for two-gluon distributions in the octet channels but quite large for R = 27. We now investigate how the importance of a 2 s contribution changes if we evaluate the DPDs at different scales. In figure 13, we show the ratio F gg for three different values of µ, taking x 1 x 2 ζ p = µ y in all three cases. We see that for both µ = µ y /2 and µ = 2µ y , the impact of the two-loop kernels becomes substantial at least in some region of x 1 and x 2 . This holds for all colour representations and is in stark contrast to the case µ = µ y . A change of the rapidity parameter ζ p at given µ affects only the double logarithmic part of the kernel (4.1). For the ratio RR F J and independent of x 1 and x 2 (as does the change of MS scheme definition). We show the ratio for x 1 x 2 ζ p = µ with µ = µ y /2 and µ = 2µ y in figure 14 and see that also in this case the impact of the two-loop terms is in general large. The large NLO corrections are mainly due to the single logarithmic terms L V [2,1] in the two-loop kernel. We conclude that, at least for two-gluon DPDs, it is most favourable to use the fixed-order splitting formula for µ and x 1 x 2 ζ p very close to µ y , so as to avoid large radiative corrections       gg for x 2 = 10 −3 , evaluated at different values of µ = x 1 x 2 ζ p . from higher orders. The DPD at any other scale can then be obtained by using the evolution equations (2.1) and (2.7). Let us finally take a look at the mixed octet distribution AS F gg , which starts at order a 2 s and is driven by the quark valence distributions. Comparing the plots in figure 15 with those in figure 8, we see that AS F gg is tiny compared with the two-gluon distributions in the channels that receive a contribution from g → gg splitting. The zero of AS F gg at x 1 = x 2 is a consequence of the asymmetry of this distribution under the exchange x 1 ↔ x 2 .

Quark-gluon distributions
We now turn our attention to the quark-gluon channel. Here, a new phenomenon occurs compared with the two-gluon case. This is seen in figure 16, where we show the term F (1) and the separate contributions from the subprocesses q → qg and g → qg to F (2) . For x 1 ∼ x 2 and for x 1 > x 2 , the a 2 s contributions are small compared with the a s term, but when x 1 becomes much smaller than x 2 , the contribution from g → qg quickly overtakes the a s term and completely dominates the sum of terms.  This can be understood from our results for the limit of small u = x 1 /(x 1 + x 2 ) in section 4.4. According to (4.56), the two-loop contribution F (2) (g → qg) behaves like 1/u in that limit, whereas F (1) and F (2) (q → qg) do not have a 1/u enhancement. Since in figure 16 we plot contributions to the scaled DPD x 1 x 2 F at fixed x 2 , the contribution from g → qg tends to a constant for small x 1 and the contributions from q → qg go to zero. Closer inspection reveals that F (2) (q → qg) has a log 2 u enhancement compared with F (1) for small u, which explains why the a 2 s term for q → qg slowly overtakes the a s term in that limit. As explained in section 4.4, the 1/u enhancement of F (2) (g → qg) originates from the graph in figure 4(c), where a gluon with momentum fraction of order u is emitted from a gluon with much larger momentum fraction. This is not possible in the q → qg channel at order a 2 s . As is well known in small x physics, higher orders will enhance the 1/u behaviour by factors of log k u but not lead to higher powers like 1/u 2 . Hence, one does not expect a change as drastic as the step from one to two loops in figure 16 when going to three loops or higher. This situation is reminiscent of hard-scattering cross sections, where the step from LO to NLO accuracy can be huge if qualitatively new contributions first appear at NLO. The graph in figure 4(c) is not only responsible for the 1/u behaviour of F (2) (g → qg), but it also contributes to the LO splitting g → gg followed by one step of LO evolution with the splitting of a gluon into qq. At small u, one has |F qg | in all relevant colour channels, so that |F (1) qg | will quickly increase in size when evolved to higher scales.

Conclusions
DPDs at small inter-parton distance y are dominated by the splitting of one parton into two. We computed the kernels that describe this splitting at order a 2 s for all colour configurations of the observed partons. All partons are taken to be unpolarised. The colour dependence of the DPDs is classified by a pair of irreducible representations R 1 and R 2 of the colour group. For representations different from the singlet, Sudakov double logarithms in the renormalisation scale µ and the rapidity parameter ζ p appear in the splitting kernels. These logarithms go along with rapidity divergences in the intermediate stages of the calculation, which need to be regularised appropriately. We computed the kernels independently with the Collins regulator, which uses spacelike Wilson lines, and with the δ regulator of Echevarria et al. Intermediate results differ in the two cases, for instance by the order of ultraviolet poles in DPD matrix elements and the soft factor, but in both cases we find complete cancellation of ultraviolet and rapidity divergences when all building blocks are assembled. The resulting splitting kernels are identical for both regulators. In the context of multi-loop calculations, the Collins regulator is more involved because eikonal propagators depend not only on plus-momenta. However, in the limit of large Wilson line rapidities, these propagators take a simple distributional form. With this form, loop integrals are no more difficult than those to be computed with the δ regulator, where eikonal propagators depend only on plus-momenta. Whilst in our two-loop calculation, there is at most one eikonal propagator per graph, we see no impediment to using the same procedure at higher orders. At order a s , the splitting DPDs for a given parton combination are proportional to each other for the different colour channels R 1 R 2 . This no longer holds at order a 2 s , where the colour dependence of the splitting kernels presents a rather varied picture. Nevertheless, we identified a number of simple patterns.
1. At two-loop accuracy, the splitting DPD for gg in decuplet representations is zero, and the distributions 8A F qg and 8S F qg for the gluon in the antisymmetric and symmetric octet are proportional to each other.
2. At two loops, terms containing Sudakov double logarithms are proportional to the oneloop splitting result times the one-loop anomalous dimension (2.6) of the Collins-Soper kernel for the rapidity dependence of the DPD.
3. A threshold logarithm log(1 − x 1 − x 2 ) appears in the splitting DPDs at two loops and is accompanied by a renormalisation group logarithm L = 2 log(yµ/b 0 ). Its coefficient is nonzero for the distributions 11 F gg , 27 27 F gg , 11 F qq , 88 F qq , and 11 F qg , but vanishes for gg and qg distributions in the octet channels.
4. When the sum z = z 1 +z 2 of parton momentum fractions in the two-loop kernel becomes small, a leading 1/z 2 behaviour is found for g → gg, q → gg, and g → qq. This gives rise to a high-energy logarithm log(x 1 + x 2 ) when convoluted with PDFs of a suitable shape, as specified in (4.44). If µ equals µ y = b 0 /y, the leading 1/z 2 terms for different channels R 1 R 2 scale exactly like the one-loop kernels, but for other values of µ they do not.
5. There is no simple pattern for the colour dependence of splitting DPDs in the limit of small u = x 1 /(x 1 + x 2 ). The strongest enhancement in this limit is by a factor u −1 log u. It appears in the g → gg and q → gq channels and goes along with a renormalisation group logarithm L.
6. In the triple Regge limit x 1 x 1 + x 2 1, a double enhancement like u −1 log(x 1 + x 2 ) appears in 11 F gg , SS F gg , and 27 27 F gg for PDFs of a suitable shape. It originates from a 1/(uz 2 ) behaviour of the splitting kernels and is accompanied by a factor L. In the case of SS F gg , the subprocess q → gg is enhanced in this way, whereas g → gg is not. The behaviour of other DPDs depends on the order in which the limits x 1 x 1 + x 2 and x 1 + x 2 1 are taken.
7. Because of charge conjugation invariance, the distributions AS F gg or SA F gg receive no contribution from g → gg, whilst q → gg andq → gg contribute with opposite sign. The splitting formula for these channels hence involves the valence distributions f q − fq.
At two-loop accuracy we find that AS F gg = − SA F gg .
Full analytic expressions of the kernels are given in the ancillary files associated with this paper on arXiv. A detailed numerical study of two-loop splitting effects in DPDs is beyond the scope of this work. However, the plots in section 5 show a few examples in selected kinematics. We find that for gg distributions, the a 2 s kernels provide corrections not exceeding a few 10% relative to the a s terms if we evaluate the splitting formula at µ = x 1 x 2 ζ p = µ y . However, when we change µ by a factor of 2 in either direction, the a 2 s corrections become of order 100% and larger in parts of the x 1x 2 plane. To obtain reliable perturbative results for scales different from µ y , it is therefore better to evaluate the splitting formula at µ y and then use the appropriate evolution equations. The distribution AS F gg is found to be tiny compared with distributions that receive a contribution from g → gg splitting. For qg distributions at µ = x 1 x 2 ζ p = µ y , we find small to moderate a 2 s corrections for x 1 ∼ x 2 and x 1 > x 2 . For x 1 x 2 , however, the a 2 s contribution from g → qg splitting completely dominates F qg , because it has a 1/u enhancement that is absent in the q → qg channel up to that order. With the results of the present paper, we have added a further ingredient for an analysis of double parton scattering at NLO. An extension to the case of polarised DPDs seems straightforward with the methods developed here and in [64]. Complications due to the use of γ 5 and µνρσ in dimensional regularisation can be avoided using the procedure employed in [78]. We intend to come back to this in future work.

A Colour space matrices
In this appendix, we list the explicit matrices in colour space that appear in the relation (2.25) between DPDs for definite colour representations for two-gluon distributions F aa bb gg . These matrices are obtained from equation (I.11) of [78]. The matrices for antiquarks are the same as for quarks. Note that according to the relation between quark and antiquark operators given below (2.24), the index i in O ii q refers to the quark in the amplitude, whereas in O ii q it refers to the antiquark in the conjugate amplitude.
Charge conjugation. Let us see how the colour projected products of twist-two operators transform under charge conjugation. One can use light-cone gauge A + = 0 for this purpose, which avoids the need to transform Wilson lines. Denoting the charge conjugation operator in the space of fields by U c , we have where the transformation for O g follows from the transformation t a ij U c A µa U † c = −t a ji A µa of the gluon potential. The space-time arguments of fields do not change under charge conjugation. We see that charge conjugation is tantamount to interchanging quark and antiquark labels and transposing the colour indices in the fundamental representation, provided that one contracts each adjoint index a of gluonic operators with a generator t a . To make use of this, we rewrite the colour matrices M R 1 R 2 as After this, all generators t a in M R 1 R 2 are associated with the adjoint index a of a gluon operator. The charge conjugation behaviour of the contraction The sign factor η(R) is −1 for R = A and +1 otherwise, and it understood thatḡ = g. The minus sign for each antisymmetric octet reflects the fact that the tensor f aa c leads to a commutator [t a , t a ] in (A.5) and (A.6). Furthermore, we see that charge conjugation interchanges decuplet and antidecuplet.
transverse loop integrals. We find, however, that such integrals do not occur for the graphs Integrating over a test function, we have for the difference of the two versions For sufficiently small σ, we can replace ψ(t) with ψ(0) in the first term on the r.h.s., and for the second term we can use the mean value theorem for integrals. This gives with some value t σ ∈ [2σ, 1] different from the one that appears in (D.3). Rescaling the integration variable as t = στ , we finally obtain which concludes our proof. The distributional identity (3.17) we use with the δ regulator has the same form as (D.1) with t 2 + σ 2 (1 − t) 2 instead of t 2 − σ 2 (1 − t) 2 on the l.h.s. The PV prescription is not needed in that case. The proof of this identity proceeds in analogy to what we have shown, with the simplification that one does not need to split integrals into two parts as we did in (D.2) and (D.8).