Proof of sum rules for double parton distributions in QCD

Double hard scattering can play an important role for producing multiparticle final states in hadron-hadron collisions. The associated cross sections depend on double parton distributions, which at present are only weakly constrained by theory or measurements. A set of sum rules for these distributions has been proposed by Gaunt and Stirling some time ago. We give a proof for these sum rules at all orders in perturbation theory, including a detailed analysis of the renormalisation of ultraviolet divergences. As a by-product of our study, we obtain the form of the inhomogeneous evolution equation for double parton distributions at arbitrary perturbative order.


Introduction
In high-energy hadron hadron collisions, two or more partons in each incoming hadron may simultaneously take part in hard-scattering subprocesses. The importance of such multiparton interactions increases with the collision energy, because the density of partons in a hadron increases as their momentum fraction becomes smaller. Theoretical predictions with highest possible accuracy are required for a wide range of final states and kinematic regions in order to use the full potential of the LHC and of possible future hadron colliders. It is therefore highly desirable to develop a theory and phenomenology of multiparton interactions based on first principles in QCD. The most important contribution typically comes from double parton scattering (DPS), which is the subject of this work. To compute DPS cross sections one needs double parton distributions, which give the probability density for finding two specified partons inside a hadron. Our knowledge of these distributions is still very poor, given their dependence on several variables and the difficulty to separate the DPS contribution to a physical process from the contribution due to single hard scattering. Often a e-mail: markus.diehl@desy.de one makes the simplest assumption that the two partons are entirely uncorrelated. This can at best be a first approximation. In the region of relatively large momentum fractions many studies in dynamical models find that correlations are actually strong (see [1] for a recent review).
In such a situation, theoretical constraints on double parton distributions (DPDs) are valuable. One type of constraint comes from the mechanism in which the two partons originate from the short-distance splitting of a single parton. This mechanism, which we will call 1 → 2 splitting, dominates DPDs at small distance between the two partons and can be expressed in terms of perturbative splitting functions and the usual single parton distributions (PDFs). The impact of 1 → 2 splitting on DPDs and DPS cross sections has been investigated from several points of view, see e.g. [2][3][4][5][6][7][8][9][10][11][12][13]. A different constraint is provided by the DPD sum rules proposed by Gaunt and Stirling [14], which express the conservation of quark flavour and of momentum and nicely fit together with the interpretation of DPDs as probability densities. Attempts to construct DPDs satisfying these sum rules were made in [14,15]. A study of W + W + and W − W − production with the DPDs proposed in [14] can be found in [16].
DPDs depend on a renormalisation scale, just like ordinary PDFs. This dependence is described by a generalisation of the DGLAP evolution equations. Due to the 1 → 2 splitting mechanism, these equations contain an inhomogeneous term whose form at leading order (LO) has been extensively discussed in the literature [17][18][19][20]. Gaunt and Stirling noted that if the sum rules they postulated are valid at some scale, then their validity is preserved by LO evolution to any other scale [14]. More detailed analyses of how the momentum sum rule remains valid in the presence of parton splitting can be found in [8,21]. Still lacking is however a full proof that the sum rules hold at some starting scale. A derivation using the light-cone wave function representation is given in appendix C of [22]. This framework formalises the physical picture of the parton model and in particular allows one to keep track of kinematic and combinatorial factors. An explicit analysis of ultraviolet (UV) divergences and of the associated scale dependence is however missing in [22]. The same holds for rapidity divergences that are present in light-cone wave functions (but cancel in the DPDs appearing in the sum rules).
The aim of the present paper is to provide an explicit proof of the DPD sum rules in QCD. After defining DPDs and stating the sum rules in Sect. 2, we show in Sect. 3 how the sum rules arise at the first nontrivial order in a simple perturbative toy model. We recall the essentials of light-cone perturbation theory in Sect. 4 and then use this formalism in Sect. 5 to give an all-order proof of the sum rules for bare (i.e. unrenormalised) DPDs. In Sect. 6 we show how DPDs are renormalised and establish that the sum rules remain valid if UV divergences are subtracted in a suitable scheme. In Sect. 7 we explore the consequences of our analysis on the evolution of DPDs: we obtain the general form of the inhomogeneous term beyond LO (confirming the NLO result given in [20]), we derive sum rules for the associated evolution kernels, and we cross check that the sum rules are preserved by evolution at any order in α s . Our findings are briefly summarised in Sect. 8.

Definitions and sum rules
In this section, we recall the definition of PDFs and DPDs and then state the sum rules to be proven in the remainder of this work. We restrict ourselves to unpolarised partons and transverse-momentum integrated distributions throughout. Bare PDFs and DPDs are defined as where n j = 0 if parton j is a fermion, n j = 1 if it is a gluon, and n j = −1 if one considers scalar partons. We use lightcone coordinates v ± = (v 0 ± v 3 )/ √ 2 for any four-vector v μ and write its transverse part in boldface, v = (v 1 , v 2 ). The twist-two operators read for unpolarised quarks, antiquarks and gluons, respectively and are constructed from bare (i.e. unrenormalised) fields. The transverse index i in (2) is to be summed over. We only consider colour-singlet DPDs here, so that the colour indices of the quark or gluon fields in (2) are implicit and to be summed over. For scalar partons we have Throughout this work, we use light-cone gauge A + = 0, so that Wilson lines do not appear in the above operators. Note that, just like ordinary PDFs, colour-singlet DPDs are free of rapidity divergences. Modifications of the standard lightlike Wilson lines in Feynman gauge (or their equivalents in other gauges) can be employed to regulate such divergences as discussed for DPS in [23,24], but this is of no concern for our present work. The DPD in (1) depends on the transverse distance y between the two partons. The discussion of sum rules requires introducing their Fourier transform to transverse momentum space, where D = 4 − 2ε is the number of space-time dimensions in dimensional regularisation. For the discussion of graphs in the next sections, it is useful to introduce the momentum space Green functions in terms of which we have and The above distributions contain ultraviolet divergences and require renormalisation. Renormalised distributions will be denoted without the subscript B and have an additional dependence on the renormalisation scale μ. We postpone a detailed discussion to Sect. 6 but already note here that the integral over y in (4) diverges at y = 0 in D = 4 dimensions and hence requires additional renormalisation. This divergence is due to the 1 → 2 splitting mechanism mentioned in the introduction and leads to the inhomogeneous term in the evolution equations for DPDs. It only appears for DPDs depending on the momentum , but not for their counterparts depending on the distance y, as was already noted in [2]. The DPD sum rules hold for the momentum space distributions at zero transverse momentum, which we abbreviate as Up to the additional renormalisation just mentioned, these distributions correspond to the integral of F j 1 j 2 (x i , y; μ) over all y, as can be seen in (4). The momentum sum rule for DPDs reads whereas the conservation of quark flavour in QCD corresponds to the number sum rule where j 2 denotes a quark or an antiquark. The parton label j v indicates the difference of parton and antiparton distributions, i.e. F j 1 j 2,v = F j 1 j 2 − F j 1 j 2 . The number of valence partons j in a hadron is denoted by N j v , so that e.g.
Using the convention that j denotes a quark if j is an antiquark, we have N u v = 2 in an antiproton.

Analysis of low-order graphs and its limitations
In this section, we show for a simple example how the DPD sum rules for bare distributions can be obtained from Feyn-man graphs in covariant perturbation theory. We consider all graphs at lowest order in α s . In a second example, we exhibit the limitations of this covariant approach. This leads us to use light-cone perturbation theory in the following sections, where we formulate a proof that is valid at all orders in α s . It is clear that neither covariant nor light-cone perturbation theory are suitable for actually computing parton distributions, which are non-perturbative objects. We must thus assume that general properties of Green functions -in our case the sum rules -remain valid beyond perturbation theory. This is similar to the spirit of perturbative proofs of factorisation in QCD [25,26].
We use a toy model with scalar "quarks" of two flavours, u and d, which we take to be mass degenerate. The coupling between these quarks and the gluons is as required by gauge invariance. We consider a scalar "hadron" that has a pointlike coupling to u andd and compute parton distributions at lowest order in perturbation theory.

Sum rules with a gluon PDF
Let us first consider the case in which parton 1 in the sum rules is a gluon. At lowest order, the gluon PDF appearing on the r.h.s. of the sum rules is given by four graphs, two of which are depicted in Fig. 1. The other two are obtained by reversing the arrows on the quark lines (so that in graph a the gluon couples to thed instead of the u). They lead to identical expressions due to the symmetries of our model, viz. charge conjugation and the identical masses of the two quark flavours. The contributions of the graphs to the gluon PDF are where the integration element d P DF is given by Here g denotes the strong coupling and m the quark mass. For simplicity we set the coupling between the hadron and the quarks to 1. A factor of two for the diagrams with reversed arrows on the quark lines is included in these expressions. For brevity, we omit the subscript B for bare distributions throughout this section.
To obtain the graphs for the DPDs appearing in the sum rules, one can start from the PDF graphs and insert the operator for parton 2 on one of the lines that go across the final state cut in the PDF, as illustrated in Fig. 2. For our specific case, the result of this procedure is shown in Fig. 3. The operator (3) for scalar quarks or antiquarks simply provides a factor 1 in the graphs, and we obtain where the integration elements d D P D,i (i = 1, 2) are given by with 1 = p − k 1 − k 2 and 2 = k 2 . Notice the particular form of the second momentum fraction arguments in F gd , which will turn out to be useful for showing the sum rules. Reversing the arrows on the quark lines in Fig. 3, one obtains the remaining one-loop contributions to the DPDs with one gluon. Given the symmetries of our model, we have . . .

(a)
Graph for a PDF . . .

Fig. 2
Transition from a given PDF graph to a corresponding DPD graph Fig. 3 Graphs for gluon-quark or gluon-antiquark DPDs corresponding to the gluon PDF graphs in Fig. 1. The loop momenta are defined in (19) and (20), and each graph is for the momentum fractions specified in (15)- (17). Four more graphs are obtained by reversing the direction of arrows on the quark lines p p and likewise for the other three graphs.
We notice that the expressions from PDF and DPD graphs that correspond to each other are already quite similar . While the momentum dependent part of the numerators agrees exactly, the main difference are the two additional denominator factors in the DPD, which are for the line that runs across the final state cut in the PDF. In order to get rid of these, we perform the integrations over their minus momentum components using the theorem of residues. After suitable variable substitutions, we can close the integration contour in such a way that we only pick up the poles of the two lines corresponding to parton 2. This removes the additional denominator factors in the DPD and sets the corresponding momenta to their on-shell value, just like it is the case in the PDF. The relevant substitutions read for F gd 1.2 and F gd 2.2 , where in the last case k − 2 is kept as an integration variable. The integration over k − sets the line corresponding to parton 2 on the l.h.s. of the cut to its on-shell value, while the same is achieved on the r.h.s. of the cut by integrating over k − . After performing the integrations over all minus components, we obtain for the expressions in (11)-(17) and where we have abbreviated (25) and the measure for the remaining integrations is The similarity between PDF and DPD expressions has become very close. To show the sum rules, we note that at order α s , the only nonzero DPDs involving a gluon are those we have already discussed. In particular, F gū , F gd and F gg only appear at order α 2 s . For the number sum rule for u quarks, we first combine all contributions from graphs 1.1 and 1.2: In the first step we used the symmetry between u andd in our model, and in the second step we performed a change of variables in F gd . The last equality is easily seen from the explicit expressions in (21) and (23). One readily derives the analogue of (27) for the contributions from graphs 2.1 and 2.2 and hence for the sum over all graphs. Since N u v = 1 in our model, this shows that the number sum rule for u quarks is fulfilled. In the same manner, one can show the number sum rule ford quarks. For the momentum sum rule, we start again with the contributions from graphs 1.1 and 1.2: The first two steps are justified just like their analogues in (27), and the last two steps make again use of (21) and (23). An analogous relation can be derived for graphs 2.1 and 2.2 and thus for the sum over all graphs, which confirms the momentum sum rule.
We have seen that both sum rules hold individually for each PDF graph and the set of corresponding DPD graphs. This is already suggested by Fig. 2 and will remain true in the all-order proof in Sect. 5. However, a crucial step in the preceding derivation was that for each DPD graph we could perform the integrations over minus momenta in such a way that, after applying the theorem of residues, the momentum of parton 2 to the left and to the right of the final state cut was set on shell. This is not readily possible for other graphs, as we shall now see.

Sum rules with a quark PDF
We consider now the case in which parton 1 in the sum rules is a u quark (the expressions for d quarks are identical for symmetry reasons). At order α s the number of graphs contributing to each PDF is much higher than in the previous section. There are the real emission graphs in Fig. 4, as well as graphs with a cut quark loop and a vertex or propagator correction to the left or to the right of the cut. For the graph in Fig. 4a and the corresponding graphs for DPDs, one can establish the validity of the sum rules exactly as in the previous section. This situation is however different for the remaining graphs of Fig. 4.
To illustrate where the problem is, let us consider the graph in Fig. 4b and the corresponding DPD graphs in Fig. 5. Before integrating over minus momenta, we make the same change of variables as previously, using (19) for F ug 2.1 and (20) for F ud 2.1 . At this point, one finds that in both graphs it is impossible to close integration contours such that one picks up only the propagator poles of parton 2, given that on the r.h.s. of the Fig. 4 Real emission graphs contributing to quark or antiquark distributions. Four more graphs are obtained by reversing the arrow on the quark line cut their pole in k − is on the same side of the real axis as the propagator pole of thed that directly couples to the hadron. Performing the integrations over the minus momenta, we find and where and d is defined as before in (26). Whereas the first term in (30) has the desired structure for establishing the sum rules, this is not the case for the second term, which originates from the "unwanted" propagator pole when the minus momentum integrals are performed using residues. We have explicitly checked that this extra term disappears when one sums over all contributing graphs, and the sum rules remain valid also in this example. However, the simple proof we used in the previous section no longer works. This situation bears some similarity to the proof of cancellation of Glauber gluons in single [25,26] or double hard scattering [27]. For simple cases, this cancellation can be established in covariant perturbation theory, using the theorem of residues for integrations over minus momenta in a similar way as here, but for more complicated graphs, that method turns out to be cumbersome [27]. It is not clear whether a general proof could be given at all in that framework. In the next sections we will see that light-cone perturbation theory provides a powerful tool for proving the sum rules at all orders in the strong coupling, as it is for establishing Glauber gluon cancellation [25][26][27] .

Light-cone perturbation theory
Light-cone perturbation theory (LCPT, also called lightfront perturbation theory) is quite similar to old-fashioned time ordered perturbation theory, with the difference that the vertices of a graph are ordered in "light-cone time" The rules of LCPT can be derived from regular covariant perturbation theory by performing the integrations over all internal minus momenta, thus setting all internal lines on-shell. This is for instance shown in chapter 7.2.3 in [26], whose normalisation conventions we adopt in the following. Further discussion of LCPT can be found in [28][29][30][31][32][33][34][35].
Because in LCPT all internal lines are treated as on-shell, PDF and DPD graphs with identical x + ordering of vertices automatically have the same denominator structure, which we will use to prove the validity of the DPD sum rules to all orders for bare distributions. For brevity we give only the basic rules of LCPT here and discuss details and subtleties when we encounter them. As in the rest of this paper, we use light-cone gauge A + = 0 for the gluon.
1. Starting from a given Feynman graph, one assigns a lightcone time x + j to each vertex and considers all possible orderings of the x + j . When drawing LCPT graphs, we follow the convention that x + increases from left to right on the l.h.s. of the final state cut, while on the r.h.s. it increases from right to left. 2. Coupling constants and vertex factors are the ones known from covariant perturbation theory. An exception are momentum dependent vertices, which are discussed below.
3. Plus and transverse momentum components, k + l and k l , of a line l are conserved at the vertices. 4. For each propagating line l in a graph, one has a factor 1/(2k + l ) together with a Heaviside step function (k + l ) if the routing of k l is from smaller to larger values of x + . 5. Loop momenta are integrated over their plus and transverse components with measure 6. For each state i between two vertices at consecutive lightcone times where P − i is the minus component of the sum of all external momenta entering the graph before x + i . The sum is over the on-shell values of the minus components of all lines l between x + i and x + i+1 .
For particles with spin, one furthermore has to consider that the dependence of the propagator numerators on the particles minus momenta leads to a separation into propagating and instantaneous contributions. Decomposing the covariant four-momentum as where by definition k and k os only differ in their minus components, one can rewrite the covariant fermion propagator as The first term describes the propagation of a fermion and the second term the propagation of an antifermion, both with positive plus momentum as stated in point 4 above. The third term is independent of k − and hence instantaneous. In LCPT graphs it gives rise to a vertical fermion line whose ends are associated with the same light-cone time x + . For the gluon propagator in light-cone gauge, we have where n is the light-like vector projecting on plus components, kn = k + . In analogy with (36), the first term in (37) can be further decomposed into parts with (k + ) or (−k + ). The last term in (37) is again instantaneous. The decomposition (35) has to be made for all Feynman rules in covariant perturbation theory that have a dependence on minus momenta in the numerator, in particular for the three-gluon vertex. We will however only be concerned with gluon lines that are either internal to a graph or associated with the twist-two operator (2) for an observed parton. Any gluon vertex in a graph then has all its Lorentz indices contracted with a gluon propagator. The difference (k −k os ) μ ∝ n μ between a covariant and an on-shell momentum gives zero when contracted with G μν g , so that in the three-gluon vertex we can simply replace k with k os in the numerator factor.

All-order proof for bare distributions
In this section we derive the DPD sum rules for bare, i.e. unrenormalised, distributions. We consider graphs in LCPT at arbitrary fixed order in α s , using perturbation theory in the same spirit as discussed at the beginning of Sect. 3. Having established the sum rules for any fixed order in α s , one immediately obtains their validity for the sum over all perturbative orders.

Representation of PDFs and DPDs in LCPT
To begin with, we adapt the representations (6) and (7) of PDFs and DPDs in terms of Green functions to the LCPT formalism. For the bare PDF we find where The label g now specifies a cut LCPT graph, which has a definite light-cone time ordering of vertices. We label the independent momenta in a graph by k i , always starting with k 1 for the observed parton. The total number of independent momenta is N (g) − 1, and M(g) − 1 of these go across the final state cut. We collectively write {x} and {k} for the set of light-cone momentum fraction and transverse momentum arguments of a graph. As already discussed in Sect. 3, we can obtain DPDs from the PDF for parton j 1 by inserting the operator for a second parton on one of the final state parton lines. The result is where we have a sum over all parton lines l on which the operator for parton j can be inserted, with δ j, f (l) selecting the parton type and δ(x l −z) the plus momentum fraction. Notice that both the plus momentum and the transverse momentum components of the two observed partons are equal on both sides of the final state cut. The former is always the case for a DPD, whilst the latter holds because for the sum rules we are considering the case = 0.
Starting from (7) we have arrived at (40) by making the variable substitutions and performing the integrations over k − and k − . As described in Appendix B of [27], the result of these integrations is that on each side of the final state cut the two vertices corresponding to the operator insertions for the observed partons j 1 and j are associated with the same light-cone time x + . This is not surprising, because the two operators O j 1 and O j 2 in the definition (1) of a DPD are taken at the same light-cone time.
We now perform the integration over k − 1 in the expression (38) of the PDF. The following is a simplified version of the argument in chapter 14.4.3 of [26]. A general LCPT graph for the PDF, illustrated in Fig. 6, can be written in the form and The "numerator" in (42) includes vertex factors, propagator numerators and factors of ±i, and its detailed form is not relevant for the present argument. The graph is cut across the final state F A , and the sums in I , I and F are over intermediate states ξ either before or after the light-cone time of the operator insertion. Let us now take the sum over all graphs g that differ only by the state F A where the cut is made but are otherwise identical. Numbering the states in F(F A ) from 1 to N , we can write where we abbreviated D f = l∈ f k − l,os . Rewriting the δ function in this expression as we find Using the theorem of residues, one can easily see that this expression vanishes for N ≥ 2 when one integrates over k − 1 . For N = 1 on the other hand, the initial δ function in (40) is reproduced. Thus one can conclude that for a PDF only needs to consider those x + orderings of the vertices for which there are no states "later", i.e. with greater x + , than the operator insertion vertices H and H on each side of the final state cut.
The preceding argument can be repeated for the K − integration in the expression (40) for a DPD, and one finds again that only time orderings with no intermediate state after the operator insertions contribute. As a result, we have . . . Fig. 7 LCPT graphs for a PDF or a DPD. It is understood that the subgraphs denoted by blobs are identical in panels (a) and (b). One may interchange the time ordering between the vertices V 1 and V 2 , and independently the time ordering between the verticesṼ 1 andṼ 2 where the sum over all graphs g can be restricted to the time orderings just discussed. We integrated the DPD over its second momentum fraction z with weight z m , as is required for the sum rules (where m = 0 or m = 1).

Equality between PDF and DPD graphs
The next step in the proof is to establish the equality for all graphs g and all partons l that contribute in (48) and (49). This includes the statement that there is a unique correspondence between the LCPT graphs g that contribute to f j 1 and those that contribute to F j 1 j , as indicated in Fig. 7.
We already showed in the last section that in a PDF graph the vertex H of the twist-two operator insertion must be later in light-cone time than the vertex V 2 where the final state line l leaves the graph. Let us now show that there can be no instantaneous propagator attached to a twist-two operator insertion. The vertex V 1 where parton 1 leaves the graph must then come before H as well, and in a DPD graph both V 1 and V 2 must come before H . In both PDF and DPD graphs, V 1 may come before or after V 2 . Corresponding statements hold forṼ 1 ,Ṽ 2 andH to the right of the cut.
To show this, we identify the vertex rules (in momentum space) associated with the unpolarised twist-two operators (2). For quarks we get a factor γ + /2 that connects the Dirac indices of the parton to the left and the right of the cut, and for antiquark we get a factor −γ + /2. This gives zero when multiplied with the instantaneous part of the fermion propagator (36) because (γ + ) 2 = 0. In light-cone gauge A + = 0, the twist-two operator for gluons gives a factor (k + ) 2 δ ii . Here k + denotes the gluon plus momentum, which is equal on both sides of the cut. The index i (i ) denotes the gluon polarisation index to the left (right) of the cut and is restricted to be transverse. This gives zero when contracted with the instantaneous part of the gluon propagator (37).
We thus find that LCPT graphs contributing to a PDF are related to those contributing to a DPD by inserting the operator for the second parton on a final state line in the PDF graph. This is analogous to the statement we used for Feynman graphs in Sect. 3 but now includes the statement about the relative time orderings between vertices. With PDF and DPD graphs having the same time ordering of vertices, they have identical light-cone energy denominators (33). We now compare the numerator factor associated with the line l selected by the operator for parton 2 in the DPD and the factor associated with the corresponding final state line in the PDF. In the DPD, the momenta k and k carried by l to the left and the right of the cut have the same plus and transverse components, as already noted after (40). Their on-shell values are hence identical as well, i.e. k os = k os .
Furthermore, k is equal to the momentum of the corresponding final state line in the PDF. If l is a quark, we get in the DPD graph. Here a factor (/ k os + m)/(2k + ) appears for each propagating line (see (36)), the factor γ + /2 comes from the twist-two quark operator, and the factor 2 on the l.h.s. is taken from the l.h.s. of (50). On the r.h.s. of (52) we recognise the factor for the final state line l in the PDF graph, which proves (50) for quarks. The same argument is easily repeated for antiquarks. If l is a gluon, we get in the DPD graph, where the factor 2/k + = 2(x l p + ) −1 on the l.h.s. is again taken from the l.h.s. of (50). The r.h.s. of (53) is the factor for a final state gluon in the PDF graph, which completes the proof of (50).

Number sum rule
We can now insert the relation (50) into the expression (49) for the integral of the DPD over its second momentum argument and set the power m = 0. The validity of the number sum rule for a bare DPD then requires that It thus remains to show that where j 2 denotes either a quark or an antiquark. The sum over l on the l.h.s. gives the number of partons with flavour j 2 crossing the final state cut in the PDF graph, minus the corresponding number of partons with flavour j 2 . If the observed parton j 1 in the PDF has flavour j 2 (j 2 ), that number is increased (decreased) by 1. The result is obviously equal to the difference of partons with flavour j 2 and those with flavour j 2 in the hadron, which is indeed N j 2,v . The number sum rule is thus verified.

Momentum sum rule
Inserting the relation (50) into (49) with m = 1, we find that the momentum sum rule holds if g l This just means that the sum over the momentum fractions x l of all partons crossing the final state cut in a PDF graph must be equal to 1 − x 1 , which is a direct consequence of momentum conservation. This concludes our proof of the sum rules for bare DPDs.

Renormalisation
Up to now, we have essentially shown that the parton model interpretation of the DPD sum rules is reflected in the graphs that represent single or double parton distributions in QCD, where quark number and parton momentum are conserved quantities. However, these graphs have short-distance singularities that must be renormalised. It is known that the literal interpretation of PDFs as probability densities can be invalidated by renormalisation. This is most obvious for the positivity of the distributions, because one has to subtract terms that become infinite if the UV regulator is removed. It is hence important to establish whether the sum rules retain their validity after renormalisation in a specified scheme. We will show that this is indeed the case for the MS scheme. As in the previous section, our arguments are valid at arbitrary order in α s .

Convolution integrals
The calculations in the following sections make heavy use of multiple convolution integrals involving functions of one or two momentum fractions. To keep expressions readable, we introduce a shorthand notation that avoids giving the explicit arguments of the integrands, and we derive some useful rules of computation. In the following let D be a function of two momentum fractions, while A, B, C are functions of one momentum fraction only. We define where the integration boundaries are determined by the support properties of the functions under the integral. Specifically, when a one-variable function, say A, is a PDF we have A(x) = 0 if x < 0 or x > 1, and when a two-variable function is a DPD we have D( The same properties hold also for the associated evolution kernels and renormalisation factors (which may include delta and plus distributions at the endpoints of their support). The convolution with respect to the second argument of D is introduced in analogy to (57) and denoted by ⊗ 2 . We then have a combined convolution A second type of convolution integral we will encounter is Interchanging the order of integrations, it is straightforward to show that so that we can write these convolutions without the brackets. Using the support properties of the functions stated above and making the integration boundaries explicit, one also has where ⊗ denotes the usual Mellin convolution for functions of a single momentum fraction. As a shorthand for integrals over momentum fractions, we write for functions of one or two momentum fractions, respectively. We further introduce operators X n and X n 2 that act on a function by multiplying with a power of the appropriate momentum fraction, such that in convolution integrals X n A and X n 2 D can be used without explicitly giving their momentum arguments. For functions of one argument, one has the well-known rules and it is easy to see that for functions of two momentum fractions 2 We will also make use of the following identity: where the subscript on the convolution symbol in the last line indicates that the result depends on the momentum fraction x 1 .

Renormalisation of DPDs
We will now analyse how the DPDs in the sum rules, i.e. momentum space DPDs evaluated at = 0, have to be renormalised. Let us first consider position space DPDs F B (y). These have short-distance singularities for each of the twisttwo operators in their definition (1). For a single-parton distribution these singularities are renormalised with factors Z i, j (x; μ, ) as follows: Correspondingly a bare DPD is renormalised with a Z factor for each parton, i.e.
The Fourier transform (4) from position to momentum space involves an integration over all y and produces an additional short-distance singularity, which is due to the 1 → 2 splitting mechanism. As discussed in [13,36], this mechanism dominates the DPD at small y and gives with perturbative coefficient functions V B (x 1 , x 2 , y; μ) that depend on y via powers of (yμ) 2ε α s (μ). The Fourier transform of (69) w.r.t. the transverse distance y has a logarithmic singularity in D = 4 dimensions and gives a simple pole 1/ε for D = 4 − 2ε. Notice the difference between this and the UV divergences in the twist-two operators, which lead to higher powers of 1/ε with increasing powers of α s . The splitting singularity in F B ( ) is renormalised additively with a renormalisation factor Z i 1 i 2 , j (x 1 , x 2 ; μ, ) depending on two momentum fractions: Introducing single and double Mellin moments by the integrals one easily sees that (70) turns into in agreement with the leading-order analyses in [17,18].
The renormalisation factor for PDFs has a perturbative expansion whereas its analogue for the splitting singularity has no treelevel term: For later use, we define the inverse Z −1 of the PDF renormalisation factor by The expansion (73) implies a corresponding expansion can easily be expressed in terms of Z (n) i, j by solving (75) order by order in α s . At first order, one simply has Z Implementation of the MS scheme The derivations in the remainder of this paper will be significantly simplified by using a particular implementation of the MS renormalisation scheme. We start with the definition of this scheme given in Sect. 3.2.6 of [26], where the bare and the renormalised couplings are related by and a generic renormalisation factor reads where the order M(n) of the highest pole depends on the quantity being renormalised. The tree-level value Z (0) is not important here. The coefficients B nm and Z nm are independent of ε, but Z and thus Z (0) and Z nm may depend on additional variables like momentum fractions. The standard choice for the factor S ε is S ε = (4π e −γ ) ε , where γ is the Euler-Mascheroni constant. The alternative S ε = (4π) ε / (1 − ε) was proposed in [26], and the following arguments are valid in both cases. The counterterms in (77) and (78) contain finite parts that result from multiplying powers of 1/ with powers of S . We now define a second renormalisation scheme by and where B nm and Z nm are again independent of ε. The counterterms in this scheme are pure poles in . This will be essential for the arguments in the following sections. Let us show that the two schemes defined by (77), (78) and by (79), (80) give the same renormalised quantities at ε = 0. To this end we introduce a rescaled strong coupling so that (79) and (80) become and Consider now a renormalised quantity R(α s , ε, B nm , Z nm ) in the first scheme and its counterpart R (α s (ε), ε, B nm , Z nm ) in the second scheme. Because (77) and (78) have the same functional form as (82) and (83), the renormalised quantities in both schemes also have the same functional form, i.e.
The coefficients B nm and Z nm (B nm and Z nm ) are uniquely fixed by the requirement that renormalised quantities have no poles in ε when expanded in α s and ε (α s and ε). Since α s (ε) only differs from α s by terms of order ε, one must also obtain an expression without poles in ε when expanding R in α s and ε. As a consequence, the renormalisation coefficients in the two schemes are identical, B nm = B nm and Z nm = Z nm , so that (84) implies With α s (0) = α s we thus find that in the primed scheme the value of R at the physical point is In the original scheme, the value of R is at the physical point. Consider now a case in which the renormalised quantity is an observable (e.g. the hadronic vacuum polarisation). Then it must have the same value in the two schemes, and we can conclude that α s = α s . It follows that for any other quantity, including quantities that are not observables (such as renormalised PDFs or DPDs), the two schemes give the same result at the physical point ε = 0. For the standard choice S ε = (4π e −γ ) ε , the relation (79) takes the form of a minimal subtraction scheme with μ 2 = μ 2 /(S ε ) 1/ε = μ 2 e γ /(4π). This way of implementing MS subtraction is in fact well known in the literature. Our above argument shows that one can also use the implementation of (79) and (80) for different choices of S ε . In the remainder of this work, we will use this implementation, omitting the primes on α s , B nm and Z nm .

Number sum rule
We are now ready to prove the number sum rule for renormalised DPDs. Using the notation introduced in Sect. 6.1, we need to show that the difference is zero. Using the expression (70) for the renormalised DPD and the rules of computation from Sect. 6.1, we get The number sum rule for renormalised PDFs implies a sum rule for their renormalisation factors, which reads A proof can be found in Sect. 8.6 of [26]. We recall that in the case where i is a gluon we define ı = i, so that the above relation is valid for all parton labels. For the first term in (89) we thus have Here we have used the number sum rule for unrenormalised DPDs, as well as the relation between bare and renormalised PDF in the term proportional to N i 2,v . Putting all terms together, we obtain In the last step of (92) we expressed the bare PDF in terms of the renormalised one, using the inverse renormalisation factor introduced earlier. In the MS scheme, the perturbative expansion coefficients Z (n) i 1 i 2 , j and Z (n) i, j involve only pole terms in , and it is easy to see from (75) that the same is true for Z −1(n) i, j . The tree-level part of Z i 1 , j in (93) is proportional to δ i 1 , j and hence vanishes when multiplied with the combination of Kronecker symbols in parentheses. R i 1 i 2 k is hence a sum of pure pole terms in . Since i 1 i 2 is finite at = 0 according to its definition (88), we can conclude that R i 1 i 2 k = 0 and hence i 1 i 2 = 0. This proves the number sum rule for renormalised DPDs.
The previous argument implies that k R i 1 i 2 k ⊗ Z k, j = 0, which together with (93) yields a number sum rule for the renormalisation factor of the splitting singularity.

Momentum sum rule
The proof of the momentum sum rule for renormalised DPDs proceeds in close analogy to the previous subsection. We need to show that is zero. The first term of this expression can be rewritten as The momentum sum rule for renormalised single-parton distributions implies that i X Z i, j = 1 (97) for any j, as shown in Sect. 8.6 of [26]. Using this and the momentum sum rule for bare DPDs, we have and hence In this expression we can rewrite using (64). Multiplying (75) with x and again using (64), we see that x Z −1 i, j (x) is the inverse of x Z i, j (x) w.r.t. Mellin convolution and matrix multiplication. This allows us to write with The tree-level part of Z i 1 , j (x) cancels in this expression because it is proportional to δ(1 − x), so that in the MS scheme R i 1 is a sum of pure pole terms in . Since i 1 is finite at = 0, we must have R i 1 = 0 and hence i 1 = 0, which concludes our argument. (102) we obtain a momentum sum rule

DPD evolution and its consequences
The renormalisation of momentum space DPDs results in an inhomogeneous evolution equation, which at leading order in α s has been known for a long time [17,18]. Our all-order formulation of DPD renormalisation in Sect. 6.2 allows us to derive the form of the corresponding evolution equation at arbitrary order in α s . Differentiating (70) and using that bare distributions are independent of the scale μ, we obtain The DGLAP equations for the renormalised PDFs f i = where P i, j (x; μ) denotes the usual DGLAP evolution kernels. We therefore have We now define evolution kernels P i 1 i 2 , j (x 1 , x 2 ; μ) associated with the splitting singularity in DPDs: which is equivalent to and gives us the form for the inhomogeneous double DGLAP equation at arbitrary order in perturbation theory. Our result confirms the form given for NLO evolution in equation (16) of [20]. Comparing (108) with (109) we see that the renormalisation factor for the splitting singularity satisfies the same form of evolution equation as the renormalised DPD, in analogy to the renormalisation factor for PDFs in (105). With the definitions (71) for single and double Mellin moments, we find in agreement with the LO formulae derived in [17,18]. Equations (104)-(110) are valid in D = 4 − 2 dimensions, and whenever the l.h.s. of an equation is finite for → 0, it is understood that this limit can be taken. In the MS scheme, renormalisation factors contain only pole terms in , plus an independent tree-level term in the case of Z i, j (but not of Z i 1 i 2 , j ). An important consequence of this is that the evolution kernels P i, j and P i 1 i 2 , j are independent of . They must be finite for → 0, and any terms with a positive power of would induce positive powers of on the r.h.s. of (105) or (108). Such terms cannot appear, because the form of the renormalisation scale derivative in D dimensions implies that there are only terms of order n with n ≤ 0 on the l.h.s. of (105) and (108).
Using these results, we can obtain the kernel P i 1 i 2 , j by isolating the term of order 0 on the r.h.s. of (107). The only renormalisation factor with a tree-level term in that equation is Z −1 j,k , so that P i 1 i 2 , j is given by the term of order 0 in dZ i 1 i 2 , j /d ln μ 2 . Using (111), we thus find that where Res denotes the residue of the single pole in . Applying the same type of reasoning to (105), we obtain Res Z i, j x; α s (μ), .
Combining these relations with the sum rules (94) and (103) for the 1 → 2 renormalisation factor Z i 1 i 2 , j , we readily obtain corresponding sum rules 1−x 1 0 dx 2 P i 1 i 2,v , j (x 1 , x 2 ) = δ i 1 ,ı 2 − δ i 1 ,i 2 − δ j,ı 2 + δ j,i 2 P i 1 , j (x 1 ), for the 1 → 2 evolution kernels, where we have given momentum fractions explicitly instead of using the compact notation of Sect. 6.1. We note that the momentum sum rule (115) has the same form as a corresponding sum rule derived in [37,38] for "two-body inclusive decay probabilities", see equation (49) in [38]. These quantities were introduced to describe the evolution of hadronic jets. To investigate their relation with our 1 → 2 splitting kernels goes beyond the scope of this paper, but we note that the two types of functions start to differ at order α 2 s [39]. The derivation of the DPD sum rules in Sect. 6 did not refer to any particular value of μ and is hence valid independently of the renormalisation scale. As a consistency check, one can use the all-order form (109) of DPD evolution and the sum rules just stated to verify that the DPD sum rules (9) and (10) are stable under a change of scale. This means that the renormalisation scale derivative of their l.h.s. is equal to the renormalisation scale derivative of their r.h.s. Let us first show this for the number sum rule. Using our compact notation, the derivative of the l.h.s. of (10) can be written as according to (109) and the rules of computation from Sect. 6.1. In the first term of (116) we can use the DPD sum rule at scale μ and in the last term the sum rule (114) for the 1 → 2 evolution kernel. The second term is zero thanks to the sum rule for the PDF evolution kernels, which readily follows from (90) and (113). We thus get On the r.h.s. we recognise the derivative d f i 1 /d ln μ 2 and thus the appropriate derivative of the r.h.s. of the DPD number sum rule (10). For the momentum sum rule, we proceed in full analogy and start with d d ln μ 2 The second term on the r.h.s. is zero thanks to the momentum sum rule i X P i, j = 0 (120) for the DGLAP kernels. Using the DPD momentum sum rule at scale μ and the relation (115), we get d d ln μ 2 where in the last step we have used the relation (64). The last line of (121) is the scale derivative of the r.h.s. of the DPD momentum sum rule (9), as required. We note that the inhomogeneous term in the double DGLAP equation is essential for the preceding arguments to work. For leading-order evolution, this was already emphasised in [8,21].

Conclusion
The sum rules proposed by Gaunt and Stirling [14] present one of the few general constraints on double parton distributions that are currently known. This has motivated us to give a detailed proof for them in QCD. We saw in Sect. 3 that an analysis of Feynman graphs in covariant perturbation theory yields the sum rules in simple cases but quickly becomes complicated for certain types of graphs, which makes this technique unsuitable for a general proof. Instead, we used light-cone perturbation theory in Sect. 5 to show the validity of the DPD sum rules for bare, i.e. unrenormalised, distributions at any order in the coupling. In Sect. 6 we analysed the renormalisation of DPDs and showed that in the MS scheme this procedure yields renormalised distributions that again satisfy the sum rules. As by-products of our analysis, we derived in Sect. 7 an all-order evolution equation for DPDs in momentum space and obtained sum rules for the kernel P i 1 i 2 , j that appears in the inhomogeneous term of that equation. These sum rules can be used to verify explicitly that the DPD sum rules are consistent with evolution at any order in perturbation theory. They will also provide valuable cross-checks for the calculation of P i 1 i 2 , j beyond the known order α s . Introducing a compact notation and deriving a number of relations for convolution integrals in one or two variables (Sect. 6.1) allowed us to keep the computations for renormalised DPDs reasonably short and transparent.
To construct DPD models that fulfil the sum rules -exactly or approximately -is by far not an easy task [14,15]. The results of the present work provides an additional motivation for further efforts in this direction. In a forthcoming numerical study [40] we will show how the sum rules can be used to improve existing models for DPDs in position space.