Elements of a theory for multiparton interactions in QCD

We perform a detailed investigation of multiple hard interactions in hadron-hadron collisions. We discuss the space-time, spin and color structure of multiple interactions, classify different contributions according to their power behavior and provide several elements required for establishing all-order factorization. This also allows us to analyze the structure of Sudakov logarithms in double hard scattering. We show how multiparton distributions can be constrained by lattice calculations, by connecting them with generalized parton distributions, and by calculating their behavior at large transverse parton momenta.


Introduction
When two hadrons collide at high energies, more than one parton in one hadron can have a hard interaction with a parton in the other hadron and produce particles with large mass or transverse momentum. The effects of such multiparton interactions are suppressed or average out in sufficiently inclusive observables, but they have important consequences for the details of the hadronic final state. The possible importance of multiparton interactions has been realized long ago [1,2] and phenomenological estimates have been given for many final states such as four jets (possibly including b quarks) [3][4][5][6][7][8][9], jets associated with photons or leptons [10], four leptons produced by the double Drell-Yan process [11][12][13] or from two charmonium states [14][15][16], as well as a number of channels with electroweak gauge bosons [17][18][19][20][21][22][23][24][25][26]. Experimental evidence for multiple hard scattering has been found in the production of multijets [27][28][29] and of a photon associated with three jets [30][31][32][33].
A mini-review of the subject can be found in [34] and an overview of how multiparton interactions are modeled in current Monte Carlo event generators is given in [35]. At LHC energies, the phase space for having several hard interactions in a protonproton collision is greatly increased compared with previous experiments, and it is expected that the effects of multiple interactions will be important in many processes [36][37][38][39]. This poses a challenge in searches for new physics and at the same time offers the possibility to study multiple interactions in much more detail than before. First experimental results on multiple hard scattering at the LHC have already appeared [40] and more can be expected in the near future [41].
Understanding multiparton interactions is also important for heavy-ion physics, where pp or proton-nucleus collisions are used as a baseline for collective effects in nucleus-nucleus collisions. Compared with pp collisions, multiparton interactions with nuclei have the additional feature that the different scattering partons may come from the same nucleon or from different nucleons in the nucleus. Dedicated investigations of multiple interactions in pA collisions can be found in [42][43][44][45][46].
Phenomenological estimates of multiparton interactions, as well as their implementation in event generators, are based on a rather simple and physically intuitive picture, whose basic ingredient is the probability to find several partons inside a proton. On the other hand, a systematic description of multiparton interactions in QCD has not been achieved so far. In the present work, we present a number of steps in this direction. A brief account of our main results has been given in [47]. We require all parton-level scatters to have a hard scale, so that the concepts of hard-scattering factorization and of parton distributions can be applied. Since transverse momenta of final-state particles play a crucial role in the characterization of multiple interactions, we fully keep track of this degree of freedom and base our discussion on transverse-momentum dependent multiparton distributions.
In section 2 we give a lowest-order analysis of multiple hard scattering. We find that the intuitive picture just mentioned emerges for a subset of all relevant contributions to the cross section, but that there are other contributions which may be of comparable size and hence call for further investigation. In section 3 we take first steps to extend existing factorization theorems for single-hard scattering processes with measured transverse momentum [48][49][50][51] to the case of multiple hard scattering. While many ingredients for a full proof of factorization are still missing (and the possibility that factorization is broken cannot be ruled out), we obtain a number of encouraging results that allow us in particular to analyze the structure of Sudakov logarithms. Section 4 gives more details about the distribution of two quarks or antiquarks in the proton, in particular about the effects of spin correlations and the possibility to learn more about multiparton distributions by calculating their moments in lattice QCD or by linking them to generalized parton distributions. The predictive power of perturbation theory is increased in kinematics where all observed transverse momenta (as well as their vectors sums) are large on a perturbative scale. Complications and simplifications that arise in this regime are discussed in section 5, where we will also encounter the conceptual problem of separating single from multiple hard-scattering contributions in a systematic and consistent fashion. Section 6 contains our conclusions.

Momentum and position space structure
In this section we investigate the structure of multiparton interactions in momentum and position space, restricting ourselves to graphs with the lowest order in the strong coupling.
To avoid a clutter of indices we consider scalar partons described by a hermitian field φ, deferring the inclusion of spin and color degrees of freedom to sections 2.2 and 2.3. Our derivation of the cross section formula for multiparton interactions uses standard methods. For cross sections integrated over transverse momenta in the final state, similar derivations can be found in the literature [52,53]. The extension to cross sections differential in transverse momenta is new. For ease of language we refer to the colliding hadrons as protons throughout this work, bearing in mind that our results apply without change to pp collisions or to any other hadron-hadron collision.

Definition of multiparton distributions
We begin by defining the multiparton distributions that appear in the cross section formula we will derive shortly. The following definitions need to be completed by a prescription to renormalize ultraviolet divergences and by Wilson lines that take into account collinear and soft gluons as required to achieve factorization for the cross section. These issues will be discussed in section 3.
The building block from which multiparton distributions can be defined is the n parton correlation function where T denotes time-ordering andT anti-time-ordering of the fields. This function describes the emission of n partons in a scattering amplitude and in its complex conjugate. Throughout this work we assume an unpolarized target: if the target carries spin then an average over its polarization is implicit in (2.1) and all subsequent expressions. The parton four-momenta in the correlation function are subject to the constraint In (2.1) we have chosen the position of the first field in the matrix element to be ξ ′ n = 0. Taking this position as arbitrary and integrating over it with a factor exp(−iξ ′ n l ′ n ) yields a delta function for the constraint (2.2). The structure of the cross section will be more transparent if we use symmetric variables 3) The constraint (2.2) then turns into n i=1 r i = 0 (2.4) k 1 − 1 2 r 1 k n − 1 2 r n . . . y 1 + 1 2 z 1 1 2 z n k n + 1 2 r n k 1 + 1 2 r 1 . . . y 1 − 1 2 z 1 − 1 2 z n p p and we can rewrite the correlation function (2.1) as where we have replaced r n using (2.4). In addition we have used translation invariance to shift position arguments in the matrix element by ξ n /2. Substituting position variables according to and z n = ξ n , we obtain The assignment of momentum and position arguments is shown in figure 1. We now introduce light-cone coordinates v ± = (v 0 ± v 3 )/ √ 2 and v = (v 1 , v 2 ) for any four-vector v. In a frame where p = 0 we define multiparton distributions This can be written as where we have used the abbreviation (2. 10) for the bilinear parton operators and traded the factors k + i for derivatives ∂ ↔ + = 1 2 (∂ → − ∂ ← ) + acting on the fields. When going from (2.8) to (2.9) we have replaced the time-or antitime-ordered products appearing in (2.7) by usual products, which are understood to be normal ordered. To justify this it is crucial that the arguments of all fields in the operators (2.10) have a vanishing plus-component. For a generic configuration with all y i and z i different from zero and from each other, all fields in (2.10) have a spacelike separation, so that they commute because of causality and can be written in any order. The case where fields have a lightlike separation requires special treatment, and different methods for this case have been used in the literature for related matrix elements, see [54,55] and [56]. As we shall see in section 5, lightlike field separations in (2.9) also lead to divergences that need to be regulated.
We also introduce distributions that depend partially or entirely on transverse positions (y i and z i ) instead of transverse momenta (k i and r i ): O(y i , z i ) p (2.11) and In the arguments of (2.11) and (2.12) it is understood that the average transverse position of the first two field operators is y n = 0. The three forms (2.9), (2.11) and (2.12) can be used interchangeably, and each of them has advantages in different situations. As we shall see, the momentum representation (2.9) naturally appears in Feynman graph calculations, the mixed representation (2.11) has a rather simple physical interpretation, and the position space representation (2.12) is most convenient for the discussion of Sudakov logarithms. The factors of 2π, k + i , and 2p + in (2.8) to (2.12) have been chosen such that the collinear (i.e. transverse-momentum integrated) distribution (2.13) as well as the distribution admit a probability interpretation. F (x i , y i ) is the probability to find n partons with plus-momentum fractions x i and transverse distances y i from parton number n, and F (x i , k i , r i = 0) is the probability to find n partons with plus-momentum fractions x i and transverse momenta k i . By contrast, F (x i , k i , y i ) is not a probability (due to the uncertainty relation one cannot simultaneously fix transverse momentum and transverse position) but rather has the structure of a Wigner distribution [57] in the transverse variables. Its integral over all k i gives the probability to find partons at transverse positions y i , and its integral over all y i gives the probability to find partons with transverse momenta k i . A related interpretation for generalized parton distributions can be found in [58]. In figure 1 we can identify k i as the "average" transverse momenta of the partons and y i as their "average" transverse position, where the "average" is taken between the partons to the left and to the right of the final-state cut in the figure. In a physical process, this corresponds to an average between partons in the scattering amplitude and its complex conjugate.
The interpretation of multiparton distributions becomes more explicit if one represents them in terms of the light-cone wave functions of the target, see [59]. Most conveniently derived in the framework of light-cone quantization, this representation is analogous to the wave function representation for single-parton densities [60] and generalized parton distributions [61,62]. The distributions in (2.13) and (2.14) can be written in terms of squared wave functions in impact parameter or transverse-momentum space, which makes their probability interpretation manifest. The wave function representation also offers a way to model multiparton distributions in the region of large momentum fractions, where one can expect a small number of partonic Fock states to be dominant. We shall not pursue this avenue in the present work.
In later chapters we will also need collinear distributions that depend on the momentum transfer variables r i , (2. 15) We will see in the following section that F (x i , y i ) or equivalently F (x i , r i ) appear in multiple-scattering cross sections. This is not the case for the distributions 16) which give the probability to find n partons with momentum fractions x i and unspecified transverse positions or transverse momenta. We note that the integrals over k i in (2.13), (2.15) and (2.16) are logarithmically divergent and require appropriate regularization, which will be discussed in sections 5   The definitions in this section are given for right-moving partons, with x i being plusmomentum fractions. Analogous definitions for left-moving partons are obtained by exchanging the plus-and minus-components of all position and momentum vectors.

Cross section for n hard scatters
We now evaluate the cross section for a process with n scatters at parton level, as sketched in figure 2. We work in a reference frame with p =p = 0 and consider kinematics where the squared c.m. energy q 2 i of each scatter is large and where each transverse momentum |q i | is much smaller than q + i and q − i . Defining we can then approximate where s = (p +p) 2 is the squared overall c.m. energy. We neglect the target mass throughout, so that s ≈ 2pp ≈ 2p +p− and the flux factor in the cross section is 1/(4pp). One can trade the momentum fractions x i andx i for q 2 i and the rapidities where we have again used (2.18). We note that for the very high s achieved at the LHC, both x i andx i are rather small, except if |Y i | or q 2 i is very large.
The cross section for n hard scatters is given by where the combinatorial factor C contains a factor k! for each set of k identical hardscattering final states. 1 The remnant of proton p (p) consists of m (m) spectators with momenta p X,j (pX ,j ). H i denotes the squared matrix element for the ith hard scatter, with truncated propagators of the incoming parton lines. H i includes integration over the internal phase space of the final state produced by the hard scatter, with only the fourmomentum q i kept fixed. If such a final state is the decay product of a single particle with mass M and width Γ (e.g. a W or a Higgs boson) then H i includes a factor which in the limit of narrow width constrains q i to be on the mass shell. If the final state is a stable single particle with mass M , then H i includes a delta function 2πδ(q 2 i − M 2 ) (2. 23) so that together with the integration element d 4 q i /(2π) 4 in (2.21) one obtains the correct one-particle integration measure d 3 q i 2q 0 i (2π) 3 . We now rewrite the cross section in terms of the correlation functions (2.1). To this end we use j=1 p X,j ) e iξn(p− m j=1 p X,j ) 1 An often used notation for two hard scatters is to write m/2 in the place of 1/C, with m = 1 if the hard-scattering final states are identical and m = 2 if they are distinct.
Using the analogous relation for the matrix element betweenX andp and rewriting the momentum conservation constraint in (2.21) as we can express the cross section as 26) where in the last step we have switched to the set of symmetric variables (2.3). They have the important property that the kinematic constraints on r i andr i do not involve the final-state momenta q i , which will lead to a great simplification below.
Hard-scattering approximation. The parton-level scattering processes involve a hard scale, which we collectively denote by Q 2 ∼ q 2 i without assuming a particular hierarchy among the individual squared momenta q 2 i . The case where one of them is much larger than the others is of particular relevance for the description of the underlying event, but we shall not investigate the consequences of such a hierarchy in the present work. In the graph of figure 2 it is understood that partons emerging from the shaded blobs have virtualities much smaller than Q 2 . The components of the various four-momenta thus scale like and where Λ denotes the size of the transverse momenta |q i | or the scale of non-perturbative interactions, whichever is larger. The momentum conservation constraint δ (4) (r i +r i ) enforces that the components r + i ∼r − i ∼ Λ 2 /Q (2.29) are small, although by general scaling arguments they could be of order Q. The constraint δ (4) (q i − k i −k i ) leads to 30) up to relative corrections of order Λ 2 /Q 2 . We make these approximations in the correlation functions Φ andΦ and see that the longitudinal momenta of the partons entering the hard scattering are fixed by the final-state kinematics. In the squared hard-scattering matrix element H i (q i , k i ,k i , r i ,r i ) we can neglect all transverse momenta and all components of order Λ 2 /Q. With (2.30) this only leaves a dependence on the independent variables q + i and q − i . Since H i is invariant under a boost along the z axis, it can then only depend on 2q + i q − i ≈ q 2 i . Altogether we then have . (2.31) Inserting this into the cross section (2.26) and using the definition (2.8) of the multiparton distributions gives (2.32) Rewriting d 4 q i = p +p− dx i dx i d 2 q i , we obtain our final result for the cross section in momentum representation, where we have introduced the cross section for the ith parton-level subprocess and used the approximation (2.18). We have carried out the integrations overr i using the constraints δ (2) (r i +r i ), so that the distributions for the two protons are evaluated at opposite values of their last arguments. Fourier transforming these to position space, we have (2.35) and the distributions are evaluated at equal values of y i . Transforming also the arguments k i andk i , we have where all position arguments in the two distributions coincide. The interpretation of the distributions F (x i , k i , y i ) discussed in section 2.1.1 extends to the cross section formula (2.35). In each individual hard subprocess, two partons with average transverse momenta k i andk i produce a final state with transverse momentum q i . The ith scatter occurs at an average transverse distance y i from the nth scatter. The hard scatters are approximated to be local in transverse space, so that their average distance is equal to the average distance between the colliding partons in each proton. We thus find a rather intuitive interpretation of the variables in our cross section formula, provided that we "average" the transverse momenta and positions between the amplitude and its conjugate. Let us however emphasize that we have obtained (2.35) from calculating Feynman graphs using standard hard-scattering approximations, without any appeal to classical or semiclassical arguments.
Integrating the cross section over all transverse momenta q i we obtain a simple result in terms of collinear multiparton distributions. This formula has long been known and provides the basis of most phenomenological analyses of multiple interactions in the literature. It was derived in [52] for scalar partons in a way very similar to the one we have employed here.

Single vs. multiple hard scattering
The approximations we have made in the previous section give the leading term of an expansion in powers of Λ/Q. Let us investigate how the resulting cross section (2.35) scales with Q. As can readily be seen from its definition (2.11), the mass dimension of To obtain this power behavior, it is essential that the distribution is invariant under a boost along the z axis. For instance, a hadronic matrix element that transforms like the plus-component of a vector would be proportional to p + or another large plus component and thus scale like Q times the appropriate power of Λ. Note that the dependence of F (x i , k i , y i ) on the large scale Q via renormalization group or Sudakov logarithms (see section 3) is neglected at the level of power counting. The hard-scattering cross sections have a power behaviorσ i ∼ Q −2 , and the integrations over transverse momenta count as Finally, the distances y i in (2.35) are generically of size 1/Λ so that d 2 y i ∼ Λ −2 . Putting all ingredients together, one finds for the cross section of n hard scatters. One obtains of course the same result if the power counting is done for the representations (2.33) in momentum space or (2.36) in position space, using Let us compare this with the cross section for producing the final states with momenta q i in a single hard scattering. With the factorization formula for this case reads whereσ is the appropriate hard-scattering cross section and f (x, k) and f (x,k) are transverse-momentum dependent single-parton densities. The definition of f (x, k) can be obtained from (2.11) by setting n = 1, which gives a power behavior f (x, k) ∼ Λ −2 . We now make (2.40) differential in the internal momentum variables of the final state, which we choose as and q i with i = 1, . . . , n − 1. We then have The differential hard-scattering cross section on the r.h.s. behaves as Q −2n , so that we have (2.43) We obtain the important result that if one leaves the cross section differential in the transverse momenta q i , the contributions from single and from multiple hard scattering have the same power behavior in the large scale Q, so that multiple hard scattering is not power suppressed. It is easy to see that the power behavior in (2.38) and (2.43) holds for any combination of single and multiple hard scatters, e.g. when producing the final states with momenta q 1 and q 2 in a single hard scatter and each final state with momentum q 3 , q 4 , etc. in a hard scatter of its own. Let us now see what happens if we integrate over the q i . In the multiple-scattering mechanism, each transverse momentum q i is the sum k i +k i of two parton momenta and thus limited to be of size Λ, so that the phase space volume is n i=1 d 2 q i ∼ Λ 2n . With a single hard scattering, however, the individual momenta q i can be as large as the hard scale Q, and only their sum q is limited to be of order Λ by the constraint q = k +k in (2.42). The phase space volume in this case is therefore for the cross sections integrated over all transverse momenta. Multiple-scattering contributions are now suppressed by at least one power of Λ 2 /Q 2 and are hence power corrections to the contribution from a single hard scattering, as has been known for a long time [63]. This is indeed necessary for the validity of the familiar collinear factorization theorems, which only take into account single hard scatters. Power counting in the hard scale Q provides an essential criterion for determining which contributions to the cross section are important. There are, however, other important factors to keep in mind. We already mentioned Sudakov logarithms in q 2 i /Q 2 , which appear in the cross section differential in q i and are different for single and multiple hard scattering. They will be discussed in section 3.4. Another aspect in which single and multiple scattering contributions differ is the dependence on the momentum fractions x i andx i , which can be rather small as we remarked after (2.20). We will return to this point in section 2.4.

Impact parameter representation
The cross section in (2.36) involves distributions F (x i , z i , y i ) that depend on the transverse positions of the scattering partons but still refer to proton states with definite (zero) transverse momenta. In this section we give a formulation completely in transverse position space, closely following the construction of impact-parameter dependent parton distributions in [62,64,65].
To begin with, we define a non-forward correlation function Φ(l i , l ′ i ; p, p ′ ) exactly as in (2.1) but with a state p ′ | having a different momentum than the state |p . Using the same arguments as in section 2.1.1 we can derive a representation of the form (2.7) for Φ(l i , l ′ i ; p, p ′ ), with p| replaced by p ′ |. The constraints on the parton momenta read in this case. In the same manner we define multiparton distributions F (x i , k i , r i ; p, p ′ ), F (x i , k i , y i ; p, p ′ ) and F (x i , z i , y i ; p, p ′ ) as in (2.8) to (2.12), but taken between states p + , p ′ | and |p + , p . Note that we take the same plus-momentum in the bra and ket state, even if their transverse momenta are different. We now consider a transverse boost, i.e. a Lorentz transformation that changes the transverse components of a four-vector v as and leaves plus-components unchanged. Invariance under this transformation implies In impact parameter space we then have We now introduce proton states with definite impact parameter: One readily obtains their normalization of momentum eigenstates (recall that at fixed p one has dp 0 /p 0 = dp + /p + in the invariant integration element). For later use we also give the projector on one-particle states, which is readily checked by taking the matrix element between the one-particle states in (2.50) and using (2.51). We finally define the center of momentum of m particles with plus-momenta p + i and transverse positions b i as By virtue of Lorentz invariance, this is a conserved quantity. Note the analogy between (2.46) and non-relativistic boosts if v is a momentum and if one replaces plus-momenta by masses. The center of momentum is thus the analog of the center of mass in the non-relativistic case, which is of course conserved. Let us consider the matrix element of the same operator as in (2.12), but taken between impact parameter instead of transverse-momentum eigenstates. We have The delta function in the last line reflects the conservation of the center of momentum, which equals for the bra and ket states in the matrix element, respectively. Here x s = 1 − n i=1 x i and z s is the center of momentum of the spectator partons. We define impact-parameter dependent multiparton distributions by If we set z i = 0 then the matrix element in (2.54) is taken at d = 0 and hence becomes diagonal. We can interpret F (x i , z i = 0, y i ; b) as the probability to find n partons with plus-momentum fractions x i in a target that is localized in impact parameter space, with parton number n at a transverse distance b from the center of the target and partons 1 to n − 1 at relative transverse distances y i from parton n. Inverting (2.56) and setting ∆ = 0 we get and can therefore represent the multiple-scattering cross section (2.36) as Integration over q i leads to z i = 0 as in (2.37). The resulting cross section formula was already derived in [66], and it has a very intuitive geometric interpretation shown in figure 3. As already noted after (2.36), the approximations we have made for the hard-scattering subprocesses imply that each pair of colliding partons in the hadrons p andp must be at the same position in impact parameter space. The relative distances y i between the partons are hence the same in both hadrons, but the distance of the partons from the center of their parent hadron is in general different in p andp. The relative transverse distance b −b between the hadrons is integrated over in the cross section. Our result (2.58) shows that the representation of the cross section in terms of impactparameter dependent distributions remains simple even if the transverse momenta q i are kept fixed. In the geometric interpretation just described, we then have to replace "distances" by "average distances", with the average taken between the amplitude and its conjugate. What is lost in this case is a probability interpretation of the multiparton distributions. The two fields associated with a parton in the target are now taken at a relative transverse distance z i , whose typical size is |z i | ∼ 1/|q i |. Figure 3. Visualization of the cross section formula (2.58) for n = 2 when q 1 and q 2 are integrated over. Each hard scatter produces a heavy gauge boson in this example.

Reduction to single-parton distributions
In order to build a phenomenology of multiple interactions, one needs a simple ansatz for multiparton distributions as a starting point. It is natural to approximate those distributions that have a probability interpretation by the product of single-parton densities. In this section we show how one can formally implement this approximation and generalize it to the distributions F (x i , z i , y i ) or F (x i , k i , y i ), which do not represent probabilities.
To this end we insert complete sets of intermediate hadron states in the operator product appearing in the multiparton distributions: Note that the two parton fields in each operator O(y i , z i ) are associated with the same plus-momentum fraction x i in the multiparton distributions. The approximation that gives a product of single-parton distributions is to assume that among all intermediate states |X i the dominant ones are single-proton states. This reduces the complete sets of intermediate states to the projection operators (2.52), and one obtains Translation invariance and the definition (2.50) of impact-parameter states imply and hence Using (2.54) for n = 1, we have where f (x, z; b) can be written as A reader familiar with generalized parton distributions will recognize that is a transverse-momentum dependent generalized parton distribution at zero skewness. We will shortly need the collinear distributions as well. Introduced long ago in [64,65], the impact parameter density f (x; b) gives the probability to find a parton with momentum fraction x at a transverse distance b from the center of the proton. The delta function on the r.h.s. of (2.63) implies that in (2.62), so that we obtain the desired approximation Figure 4. Illustration of the approximate relation (2.70) for n = 2.
of a multiparton distribution. Setting z i = 0 and integrating over b, we obtain in particular the collinear multiparton distribution F (x i , y i ) in terms of impact-parameter dependent single-parton densities, This relation is illustrated in figure 4, which uses the representation of parton distributions as squared light-cone wave functions we mentioned briefly before (2.15).
Let us now insert (2.69) into the cross section (2.58). For measured transverse momenta q i , the different single-parton distributions are entangled by their z i dependence. By contrast, the q i integrated cross section simplifies to where the only integration variable linking the different factors is the relative distance ρ = b −b, and where we have renamed the integration variableb to y n in the second step. In different forms, this relation (or more precisely its analog for quarks and gluons instead of scalar partons) has long been used as a starting point of phenomenological studies, see e.g. [67][68][69][70][71] and [8,72]. 2 As observed in [59] for the case of collinear distributions, the reduction of multiparton to single-parton distributions also takes a simple form in the transverse-momentum representation. This remains true if one keeps the transverse parton momenta unintegrated. To see this, we integrate (2.69) over b and Fourier transform w.r.t. y i and z i as specified by (2.11) and (2.12). Changing integration variables from b and y i to the impact parameter 2 We note that in [8,72] the impact parameter arguments of f are ρ − yi and yi instead of yi − ρ and yi (if we translate to our notation). This is equivalent in the spin independent sector, where the single-parton distributions are independent of the direction of the impact parameter.
arguments of the distributions on the r.h.s. of (2.69), we obtain where we recall that r n = − n−1 i=1 r i . Integrated over the momenta k i this simply reads The arguments r i in (2.73) can easily be anticipated from figure 1. We emphasize that the relations (2.69) to (2.73) have been obtained by restricting a sum over all intermediate states to a single proton. We do not have a motivation for this restriction other than observing that it results in neglecting correlations between different partons in the proton. It seems plausible to assume that this is a reasonable first approximation, at least in a certain region of variables, but one should not expect it to be very precise. Possible deviations from this approximation and their phenomenological consequences have recently been discussed in [8, [73][74][75][76][77].

Parton spin
Let us now see how the scattering formulae (2.33) to (2.37) are modified in QCD, where partons have nonzero spin. In (2.21) to (2.31) the squared amplitude H i of the ith hard scattering and the hadronic matrix elements of parton field operators acquire spinor indices in the case of quarks and Lorentz indices in the case of gluons. These indices can be treated as in the case of a single hard scattering. For the time being we still omit color degrees of freedom, which will be discussed in section 2.3.

Quarks
The correlation function for n quarks entering the hard scattering is When this is integrated over the parton minus-momenta, the anti-time and time ordering can be omitted and one can reorder the fields as Figure 5. The numbering of fields in two-parton distributions specified in (2.85). The color indices j, j ′ , k and k ′ will be discussed in section 2.3.1.
where a = q, ∆q, δq labels the polarization and We recognize the operators that appear in the definition of single-parton densities for unpolarized, longitudinally polarized and transversely polarized quarks, see e.g. [78,79]. For antiquarks entering the hard scattering one proceeds in an analogous way. The corresponding operators are The overall minus sign in (2.82) reflects a change in the order of field operators from q α (y i − 1 2 z i )q β (y i + 1 2 z i ) toq β (y i + 1 2 z i ) q α (y i − 1 2 z i ), cf. our remark after (2.76). In the case of O ∆q a further minus sign is included in Γ ∆q , so that the operator corresponds to the difference of antiquarks with positive and negative helicity.
From now on we concentrate on two-parton distributions. The formalism can be extended without conceptual difficulties to higher multiple interactions, but the resulting expressions become rather unwieldy. As one encounters nontrivial features already for double hard scattering, it is natural to elaborate this case first. To simplify the discussion, we introduce a compact notation for the Fourier transformed matrix element of a product of field operators ϕ. Their indices are assigned according to as shown in figure 5. Throughout this paper we consider unpolarized incident hadrons, so that an average over the proton spin is understood in (2.84). A two-quark distribution is then given by and if the parton with momentum fraction x 2 is an antiquark one has instead In straightforward extension of the case of single-parton distributions [79], the matrix elements defining distributions for quarks and antiquarks are thus connected as with sign factors σ q = σ δq = +1 and σ ∆q = −1. Definitions and relations analogous to (2.86), (2.87) and (2.88) hold for the case where the parton with momentum fraction x 1 is an antiquark. The previous arguments can be repeated for the partons in the left-moving proton, with the roles of plus and minus components interchanged. We define the hard-scattering cross section for a right-moving quark and a left-moving antiquark aŝ with spin projectors constructed from the collinear momenta k i,c introduced before (2.79), i.e. k + i,c = k + i for right-moving partons and k − i,c = k − i for left-moving ones, with all other components equal to zero. The spin projectors match the Fierz decomposition (2.78) and the operators in (2.80) and (2.82), and they can be expressed in terms of quark or antiquark spinors as in (2.79). It is understood that for each label δq or δq the cross section (2.89) depends on a transverse Lorentz index, which has not been explicitly displayed. In most reactions the partonic subprocess involves only chirality conserving interactions. Since incoming quarks and antiquarks are approximated as massless in the hard scattering, only the combinations Figure 6. Graphs for double hard scattering initiated by different combinations of quarks and antiquarks in the amplitude and the conjugate amplitude. σ q,q ,σ ∆q,∆q ,σ q,∆q ,σ ∆q,q andσ δq,δq are then nonzero. For parity conserving processes such as the production of a virtual photon, one is left with onlyσ q,q ,σ ∆q,∆q andσ δq,δq . Hard-scattering cross sectionsσ i,āa for right-moving antiquarks and left-moving quarks are defined as in (2.89) with an appropriate change of spinor indices.
We now have everything at hand to write down the expression for the double-scattering graphs of figure 6a and b. For a single quark flavor, one has where S = 2 if the final states of the two hard scatters are identical and S = 1 otherwise. It is straightforward to Fourier transform the previous expressions either from the interparton distance y to the relative transverse momentum r, or from average transverse momenta k i ,k i to transverse positions z i , as we did in (2.9), (2.12) and (2.33), (2.36) for scalar partons.
Notice that (2.91) involves a polarization dependence in the multiparton distributions and hard-scattering cross section. This is because, even for unpolarized hadron beams, the polarization of the two partons with momentum fractions x 1 and x 2 can be correlated among themselves. We will discuss this in more detail in section 4.1.1.
The two-quark and quark-antiquark distributions considered so far have the form O 1 O 2 , where the O i are bilinear operators from (2.80) or (2.82). As we discussed after (2.14), these distributions can be interpreted as probabilities or pseudo-probabilities in the sense of Wigner distributions for two partons in the proton that carry momentum fractions x 1 and x 2 , respectively.
There are further double-scattering graphs that contribute to the cross section and involve distributions which represent interference terms rather than probabilities. In figure 6c we show the case where the parton with momentum fraction x 1 is a quark in the scattering amplitude and an antiquark in the conjugate scattering amplitude. Such interference terms in fermion number have no equivalent in single hard-scattering processes, where they are forbidden by fermion number conservation. For their description we introduce interference distributions (2.92) In the absence of a probability interpretation, the choice of quark vs. antiquark labels in the Dirac matrices is pure convention. We assign labels such that a indicates a quark and a an antiquark in the amplitude, i.e. for the parton indices 1 and 2 in figure 5. The graph in figure 6c contributes to the cross section as We see that the contraction of Dirac indices ties together the two hard-scattering kernels and the spin projectors P , so that one cannot define separate partonic cross sectionsσ 1 andσ 2 . The power behavior of this contribution is the same as in (2.91). Taking different quark flavors into account, we obtain further interference terms. The contributions in figure 7a and b involve the interference of different quark flavors, and those in figure 7c and d the combined interference in fermion number and flavor. The relevant matrix elements are easily written down, reading e.g. (ū 3 Γ a 2 d 2 ) (d 4 Γ a 1 u 1 ) for the lower part of figure 7a.
Which interference distributions are of appreciable size is interesting from the point of view of nucleon structure and important for phenomenology. One may for instance imagine that diquark-like correlations in the nucleon wave function play an important role in this context. In section 2.5 we will argue that for small values of x i both fermion number and quark flavor interference distributions should become relatively small.

Gluons
If gluons enter a hard-scattering subprocess, special attention needs to be paid to their polarization. In covariant gauges such as Feynman gauge, an unlimited number of rightmoving gluons with polarization in the plus direction can be attached to a hard scattering graph without any power suppression. The effect of these gluons is resummed into Wilson lines, which we will discuss in detail in section 3.2.1. Alternatively, one may work in lightcone gauge A + = 0, where the corresponding gluon polarization is absent. One then has to be careful about subtle effects from Wilson lines at infinity, see our remark at the end of section 3.2.1.
Once the right-moving gluons with plus polarization (and the left-moving gluons with minus polarization) are taken into account, the leading contribution to the cross section comes from gluons with transverse polarization, corresponding to field operators A j with j = 1, 2. It is for these gluons that one introduces parton distributions similar to those of quarks. To decompose the product of two gluon potentials with transverse polarization, we use the relations where the indices j, j ′ , k, k ′ = 1, 2 are restricted to be transverse and where ǫ jj ′ is the two-dimensional antisymmetric tensor with ǫ 12 = 1. The tensor τ jj ′ ,kk ′ is symmetric and traceless in each of the index pairs (jj ′ ) and (kk ′ ). As an analog of the decomposition (2.77) for fermions, we can thus write for the squared hard-scattering matrix element, where in the last step we have used the relation τ jj ′ ,l l ′ τ l l ′ ,kk ′ = τ jj ′ ,k ′ k . The tensors depending on j, j ′ in (2.95) are to be contracted with a product A j ′ A j of gluon potentials in the multigluon correlation function In analogy to the definition (2.8) for scalar partons (and in contrast to the one for quarks) we include a factor k + i for each gluon i when defining multi-gluon distributions F from Φ. One then obtains with polarization labels a = g, ∆g, δg and The operators O g and O ∆g appear in the usual densities for unpolarized and longitudinally polarized gluons. By contrast, O l l ′ δg describes the interference of two gluons whose helicities differ by two units, or equivalently the difference between linear gluon polarization in two orthogonal directions. Such distributions have previously been discussed in different contexts, see [80,81] and [82][83][84][85][86][87].
In going from (2.96) to (2.97) we have traded gluon potentials for field strengths using the relation G +j = ∂ + A j valid in the light-cone gauge A + = 0. Under the Fourier transform this turns k + i A j ′ A j into (k + i ) −1 G +j ′ G +j and explains the factor 1/(x i p + ) for each parton in (2.97). It is plausible that gluon field strengths rather than potentials should appear in the definition of gluon distributions, since G µν has a simple behavior under gauge transformations and can be used to construct gauge invariant operators. How a definition with G +j emerges in Feynman gauge is rather involved and has been shown explicitly for the case of a single hard scattering in [88]. Parton-level cross sections for gluons are defined as in (2.89) with spin projectors following from (2.95). In P g one readily recognizes the average over the two transverse gluon polarization. The expressions (2.98) to (2.100) are for right-moving gluons. For left-moving gluons, one has to change + into − coordinates in (2.98) and reverse the sign of the ǫ tensor in Π ∆g and P ∆g . This is because in a covariant decomposition of the matrix elements the two-dimensional ǫ tensor arises from the four-dimensional one as ǫ jj ′ = ǫ +−jj ′ and thus changes sign when + and − coordinates are interchanged. Using our shorthand notation (2.84) we can write two-gluon distributions as Of course, there are also multiparton distributions involving both quarks and gluons. When discussing the mixing of two-quark and two-gluon distributions in section 5.1.3 we shall need quark-gluon distributions of the type with a 1 = g, ∆g and a 2 = q, ∆q.

Color
In contrast to single-parton densities, where two parton fields are always coupled to a color singlet, multiparton distributions have a nontrivial color structure. We limit ourselves to two-parton distributions here, i.e. to correlation functions with four parton fields. In this section we give general decompositions of their color structure. Dynamical aspects where color plays an essential role will be encountered throughout section 3, as well as in sections 5.1.3 and 5.2.2.

Quarks
For two-quark distributions we write where j, j ′ and k, k ′ are color indices and N is the number of colors. The indices 1 and 2 on the quark fields are shorthand for the position space arguments associated with momentum fractions x 1 and x 2 , as given in (2.86). For ease of writing we do not display the polarization labels a 1 , a 2 of F when not necessary. The functions 1 F and 8 F can be projected out as We see that for N = 3 the quark lines carrying the same longitudinal momentum are coupled to color singlets and color octets in 1 F and 8 F , respectively. 3 Obviously, only 1 F admits an interpretation as the joint density of quarks with momentum fractions x 1 and x 2 , summed over their respective colors. The prefactor of 8 F in (2.103) has been chosen such that it also appears in the projection (2.104). For this choice the color singlet and color octet distributions enter with equal weight in the cross section of processes where hard scatters produce color-singlet systems. In this sense, the size of 8 F relative to 1 F directly indicates its relevance to phenomenology. For parameterizing the color structure of F jj ′ ,kk ′ one can alternatively use 1 F and the matrix element in which quark lines carrying different longitudinal momentum couple to color singlets. We note that this combination becomes equal to 8 F in the limit of large N . It can be rewritten in terms of matrix elements that involve bilinear quark operators with no uncontracted color or spinor indices. This is achieved by a Fierz transform of Γ a 2 Γ a 1 w.r.t. the spinor indices ofq 3,j and q 1,j , followed by a Fierz transform w.r.t. the other two indices. Writing O a 1 ,a 2 = (q 3,j Γ a 2 q 2,k ) (q 4,k Γ a 1 q 1,j ) ,Õ a 1 ,a 2 = (q 4,k Γ a 2 q 2,k ) (q 3,j Γ a 1 q 1,j ) (2.108) one has For convenience we use the notation 8 F for general N .
where τ is defined in (2.94) and the transverse indices k, k ′ of the tensor operators on the r.h.s. are summed over as appropriate. The global minus sign in both equations comes from the reordering of fermion fields between O andÕ. The inverse transformation goes with the same matrices. Of course, the distributions 1F do not have a probability interpretation since the quark fields coupled to color singlets carry different momentum fractions.
To illustrate that the color octet combination 8 F need not be small let us consider a three-quark system, as is done in constituent quark models. Irrespective of the details in the model, the color part of the three-quark wave function is ǫ jkl . The color structure of a two-quark distribution is thus given by where l is the color index of the spectator quark and therefore summed over. With (2.104) one readily finds 8 F = − √ 2 ( 1 F ). The combination in (2.106), where the quark lines {13} and {24} are coupled to color singlets is then 1 The preceding expressions can easily be adapted for the quark-antiquark distributions F a 1 ,ā 2 defined in (2.87). With color indices labeled as in figure 5, the corresponding matrix element reads (q 2,k Γā 2 q 3,k ′ ) (q 4,j ′ Γ a 1 q 1,j ) and is decomposed as on the r.h.s. of (2.103) with interchanged indices k and k ′ . An extra minus sign appears in the transformation laws (2.109) and (2.110) whenever a label ∆q is changed to ∆q, because Γ ∆q = −Γ ∆q .
An analogous color decomposition can finally be made for the interference distributions I a 1 ,ā 2 defined in (2.92), In analogy to (2.106) one can alternatively use 1 I together with By the same transformation as in (2.109) and (2.110), with appropriate sign changes for the antiquark matrices Γā 2 , one can rewrite this as a linear combination of matrix elements where the quark bilinears have no uncontracted spin or color indices. Using the relation t a jk ′ t a j ′ k = 1 2 δ jk δ k ′ j ′ − 1 2N δ jk ′ δ j ′ k one can also rewrite (2.112) as The transformation between 1 I, 8 I and3I, 6 I is orthogonal. For N = 3 we can rewrite δ jk ′ δ j ′ k − δ jk δ j ′ k ′ = ǫ jj ′ l ǫ k ′ kl and recognize that3I describes the case where the quarks with momentum fraction x 1 are coupled to a color antitriplet, whereas 6 I describes the case where they form a sextet.

Gluons
The color structure for multi-gluon distributions requires the coupling of several color octets and is hence more involved than for quarks. For a two-gluon distribution we proceed by first coupling each of the gluon pairs {14} and {23} to irreducible representations and then coupling these pairs to an overall color singlet. For the color structures that can mix with quarks we write (2.118) with a shorthand notation G a ′ Π a i G a = Π jj ′ a i G a ′ ,+j ′ G a,+j for the contractions of gluon polarization indices. As is readily seen from , , each of the pairs {14} and {23} in 1 F , A F and S F is respectively coupled to a singlet, an antisymmetric and a symmetric octet. For hard-scattering processes producing color singlet states, these distributions enter the cross section as The ellipsis in (2.118) and (2.120) stands for terms where the gluon pairs are in higher color representations. For SU (3) these are 10, 10, 27, and the full decomposition reads with tensors [89] t aa ′ ,bb ′ In 10 F the indices (aa ′ ) are coupled to 10 and (bb ′ ) to 10, whereas in 10 F the opposite is the case. The normalization factors in (2.121) are such that the production of color singlet particles involves the combination Useful relations between the f and d tensors can be found in [90]. We conclude this section with the color decomposition of the quark-gluon distributions introduced in (2.102). The quark lines can only couple to a color singlet or octet, which has to be matched by the gluon lines in order to obtain an overall singlet. A complete decomposition is thus given by , , The factor i in (2.123) has been chosen so that A F is real valued (since if aa ′ c is Hermitian w.r.t. the indices a and a ′ ). The normalization factors multiplying A F and S F are the geometric means of their counterparts in (2.103) and (2.118).

Power counting and dominant graphs
In section 2.1.3 we have already compared the power behavior in Λ/Q of single and multiple hard scattering cross sections. We now take a closer look at this issue and extend our analysis to the interference of single and multiple scattering. As building blocks for establishing the power behavior of the cross section we take correlation functions Φ n involving n parton fields and amplitudes T k→m for hard-scattering processes with k incoming partons and m final-state particles. The relevant parton correlation functions are obtained by replacing the scalar parton fields in (2.1) by quark fields q and q, or by the transverse components A j of the gluon potential as in (2.96). To treat quarks and gluons on a common footing, it is convenient to use modified correlation functions Φ ′ n that are divided by √ l + for each quark or antiquark line with momentum l, and modified hard-scattering amplitudes T ′ k→m that are multiplied with the corresponding factor √ l + . Furthermore, pairsq β and q α of quark fields in Φ ′ n are contracted with one of the Dirac matrices Γ a in (2.81) that give the dominant contributions to the cross section. 4 The products T ′ k→m T ′ * k ′ →m of modified hard-scattering amplitudes with their complex conjugates are to be contracted with the corresponding Dirac matrices specified in (2.78).
have the same mass dimension and are invariant under boosts along the z axis, the power behavior of the modified correlation functions is regardless of the parton species. The power on the r.h.s. is just the mass dimension of Φ ′ n . By definition, all internal lines of the hard-scattering subgraphs are off shell by order Q 2 , so that the power behavior of the amplitudes T ′ k→m (where the propagators of external particles are truncated) is also determined by their mass dimension. For the processes considered in the following, one has as can readily be checked for the example graphs in figure 8. For definiteness we consider the production of two particles with large masses and respective four-momenta q 1 , q 2 . Examples are the weak gauge bosons W , Z or a Higgs boson. The power behavior of the cross section is the same if we replace one or both of the heavy particles by a set of light particles such as a lepton pair or a pair of jets, provided that we integrate over the internal phase space of the final-state particles while keeping q i fixed. Replacing for instance a particle with momentum q i and mass M i by two massless particles with momenta p 1 and p 2 , we have to change For the purpose of power counting, it is not important which of the matrices Γa is taken and which pairs of quark fields are contracted together if there are more than two of them. We will not specify these details in the present section and use Φ ′ n in a generic sense. Likewise, color indices are not relevant for power counting and will be omitted. where dΩ 1 is the solid angle of p 1 in the rest frame of q i . We recall that x i = q + i /p + andx i = q − i /p − are defined in terms of final-state momenta and thus directly observable. According to (2.127) the scaling behavior of the squared hard-scattering amplitudes changes by 1/Q 2 when going from (2.128) to (2.129), which is compensated by the phase space volume sdΩ 1 ∼ Q 2 . One may put restrictions on the phase space integration, such as a minimum transverse momentum of p 1 , as long as dΩ 1 remains of order 1. For each further final-state particle, the squared amplitude acquires an extra 1/Q 2 , which is compensated by an extra phase space integration with volume of order Q 2 .
After these preliminaries we can establish the power behavior of the conventional mechanism with a single hard scattering, shown in figure 8a. The cross section formula can be obtained in exactly the same way as in section 2.1.2. Omitting all factors that are not relevant for power counting (including the δ functions constraining x ixi in (2.128)) we have where for simplicity we have not displayed the momentum arguments of T ′ , Φ ′ andΦ ′ (which can readily be inferred from figure 8a). It is understood that in the second step we have made the collinear approximation and neglected l andl in the hard scattering, as well as l − compared withl − , andl + compared with l + . The power behavior of the integration regions is d 2 l ∼ Λ 2 and dl − ∼ dl + ∼ Λ 2 /Q, so that together with (2.126) and (2.127) we obtain (2.131) For the double hard-scattering contribution in figure 8b one has Note that we have used the constraint δ(q + 1 − l ′+ 1 −l ′+ 1 ) to fix the large component l ′+ 1 at its value q + 1 in the collinear approximation, thus leaving the integral over the small componentl ′+ 1 . If instead one uses the constraint to fixl ′+ 1 = q + 1 − l ′+ 1 one would have to count the integration element dl ′+ 1 as order Λ 2 /Q sincel ′+ 1 can only vary by that amount. An analogous remark applies to the constraint δ(q − 1 − l ′− 1 −l ′− 1 ). The power behavior of (2.132) is and hence the same as for single hard scattering, in agreement with the result we obtained for scalar partons in section 2.1.3.
Let us now see how the power behavior changes if on one side of the final-state cut the two quark-antiquark annihilation graphs are connected by a hard gluon. We then have an interference between double hard scattering and a single hard-scattering process as shown in figure 8c, This is power suppressed compared with the contributions in figures 8a and b and may therefore be neglected. It is instructive to see why the power counting changes between (2.132) and (2.134). Compared with T ′ * 2→1 T ′ * 2→1 , the hard-scattering amplitude T ′ * 4→2 is down by a factor of 1/Q 4 , which in the example of figure 8c is due to two additional quark propagators and one additional gluon propagator relative to figure 8b. The additional loop integrations over the large components l ′+ 1 andl ′− 1 in (2.134) each scale like Q, but for the transverse momentum integrations one now has Altogether one has thus lost a factor of Λ 2 /Q 2 . By an analogous argument one finds that the differential cross section for the pure single hard-scattering mechanism in figure 8d is power suppressed by a factor of Λ 2 /Q 2 compared with the one in figure 8c.
The graphs in figure 8c and d involve single hard scatters with four incoming partons. There is, however, also an interference between double hard scattering and single hard scattering with two incoming partons. This involves correlation functions for three partons, of which at least one must be a gluon due to fermion number conservation. An example is shown in figure 9a, which gives (2.135) This is the same power behavior as the squared single and double hard scattering contributions in figures 8a and b, so that interference terms of this type are not power suppressed.
The example graph at hand has a suppression by α s since two-gluon fusion into gauge bosons only starts at one-loop level, but for other final states like jets there is no such suppression. We will encounter these interference terms again in section 5.2.1 (see figure 38). Adding a hard gluon between the two single scatters on the left of figure 9a leads to the interference between different single hard-scattering processes in figure 9b. In the same way as above one finds that it is power suppressed by Λ 2 /Q 2 compared with the leading contributions to the cross section.
The contributions discussed so far have hard-scattering subprocesses with the same number of incoming partons from one and the other proton. One can, however, also have a parton in one proton scatter on two partons in the other proton. Examples for this are shown in figures 9c and d, and one finds that their power behavior is the same as for figure 9b.
The pattern emerging from the preceding examples is clear: leading-power contributions are obtained as long as all hard-scattering processes involve only two incoming partons. This includes contributions from single scattering, double scattering and their interference. For each hard scattering initiated by four partons one has a suppression by Λ 2 /Q 2 , and each hard scattering initiated by three partons comes with a suppression factor Λ/Q.

Cross section integrated over transverse momenta
So far we have considered the cross section differential in q 1 and q 2 . We now discuss how the power counting is changed when the cross section is integrated over these transverse momenta. As we already observed in section 2.1.3, the integration measure d 2 q 1 d 2 q 2 counts differently depending on whether q 1 and q 2 are both restricted to be of order Λ, or whether they can individually be of order Q and only their sum q 1 + q 2 is restricted to size Λ. The latter requires a single hard-scattering process in both the amplitude and its conjugate. For our examples we thus find an integration volume d 2 q 1 d 2 q 2 of order Λ 2 Q 2 for graphs 8a, d and 9b, c, d, whereas in the other cases the integration volume is of order Λ 4 . The resulting power behavior of the cross section is given in the figures.
We see that the pattern of power suppression is different from the one we found for the cross section differential in q 1 and q 2 . In particular, the leading-power contribution now comes only from the standard single hard-scattering in graph 8a.
The power behavior of the other contributions can be made more transparent by taking a closer look at the correlation functions they involve. As is evident from (2.130), the singlehard-scattering contribution of graph 8a goes with the transverse-momentum integrated correlation function dl − d 2 l Φ ′ 2 and its counterpart forΦ ′ 2 , which are proportional to the usual collinear quark or antiquark densities. By contrast, integration of (2.134) over q 1 and q 2 gives a four-parton correlation function (2.136) and its counterpart forΦ ′ 4 . In these correlation functions all independent transverse parton momenta are integrated over, and correspondingly all field operators have the same transverse position. In physical terms, the single hard scattering in the conjugate amplitude has forced all hard scatters in figure 8c to take place at the same transverse position. 5 By contrast, the double hard scattering contribution in figure 8b has two pairs of fields with a relative transverse distance y as we have seen in section 2.1.2, corresponding to two hard scatters taking place at positions that can be separated by a typical hadron size. This difference has recently been pointed out in [66]. One obtains the same twist-four correlators (2.136) when integrating the contribution of graphs 8d and 9d over q 1 and q 2 .
Similarly, one finds that the transverse-momentum integrated cross sections from graphs 9a, b and c involve correlation functions which are proportional to collinear twist-three distributions. Again, a single hard scattering in the amplitude or its conjugate is enough to put all fields at the same transverse position. The power behavior like Λ 4 /Q 4 of the integrated cross sections for graphs 8c and d is now readily understood, as it involves a collinear twist-four distribution for both colliding protons, each of which is responsible for a power suppression by Λ 2 /Q 2 . Likewise, graphs 9a, b and c involve the product of two collinear twist-three distributions, and graph 9d the product of a twist-two with a twist-four distribution. In both cases the integrated cross section is therefore suppressed by Λ 2 /Q 2 (which happens to be the same suppression factor as for the double-hard-scattering contribution of graph 8b).
In the transverse-momentum integrated cross section, graphs 8c and 9a with a double hard scattering in the amplitude play no particular role compared with their respective counterparts, graphs 8d and 9b, which involve the same correlation functions and have the same power behavior. Indeed, one may regard graphs 8c and 9a simply as higher-twist contributions with disconnected hard-scattering graphs on one side of the final-state cut, rather than associating them with multiple hard scattering. This was recently advocated in [66].
We emphasize that such a view is appropriate only if the cross section is integrated over transverse momenta. For observed transverse momenta q 1 and q 2 we have a different power behavior for graphs 8c and d, as well as for graphs 9a and b. In particular, the interference contribution from graph 9a then has the same leading-power behavior as graphs 8a and b. Let us also note that for graph 9a the quark and antiquark in each proton are not at the same transverse position for fixed q 1 and q 2 . If we express the correlation functions Φ ′ 3 and Φ ′ 3 through matrix elements p|A j (0)q(ξ 2 )Γ a q(ξ 1 )|p and p|A k (0)q(ξ 1 )Γā q(ξ 2 )|p then the transverse-momentum integrations in (2.135) can be carried out and give

Effects at small x
Typical values of x i andx i at the LHC can be quite small, as we already noted after (2.20). At √ s = 7 TeV and Although phenomena at small x are not the main focus of this work, we wish to make a few comments on them in the present section.
We begin by recalling that the usual densities for quarks, antiquarks and gluons rise with small x. This rise can be approximately described by power laws q(x) ∼q(x) ∼ x −1−λq and g(x) ∼ x −1−λg , with exponents λ q and λ g between 0 and 1 that depend on the factorization scale µ. The abundance of small-x partons can be understood as a consequence of repeated radiation, which is essentially described by ladder graphs. Such graphs are in particular resummed by the DGLAP evolution equations, which make the rise at small x steeper when µ is increased.
In the simple approximation where correlations between partons are neglected, multiparton distributions are the product of single-parton densities as discussed in section 2.1.5. The distribution of n quarks or antiquarks then approximately behaves like all momentum fractions are sufficiently small. If all momentum fractions are of similar size, x i ∼x i ∼ x, this gives a factor x −2n(1+λq) in the cross section (2.35). If the same final state is produced by a single quark-antiquark annihilation, the corresponding factor is only x −2(n+λq) according to (2.42). 6 The multiple scattering mechanism is thus enhanced for small momentum fractions, both for observed and integrated transverse momenta q i . In terms of graphs, this enhancement can be traced back to multiple ladders, one for each pair of partons with the same momentum fraction x i in figure 2.
We expect that such an enhancement exists, although the above estimate based on completely uncorrelated partons is likely too simplistic.
Note that a strong rise at small x is only observed for parton densities that mix with gluons under evolution, but not for combinations like q(x) −q(x) or u(x) − d(x), which rise more slowly than x −1 . A corresponding pattern is expected for multi-parton distributions. Since they cannot mix with multigluon distributions, the interference distributions in fermion number or quark flavor discussed at the end of section 2.2.1 are not enhanced at small x. We hence expect them to play a minor role in small-x kinematics.
The preceding arguments apply to both quark and gluon distributions in the framework of hard-scattering factorization, and based on the experience with single-parton densities one expects them to be relevant for momentum fractions x i ∼x i of order 10 −2 or smaller. At very small x the gluon is by far the dominant parton species in the proton, and one may use high-energy factorization and the BFKL approach to describe the dynamics of gluon ladders. The primary expansion variable of this approach is log 1 x , rather than the ratio Q/Λ used in the power counting arguments on which hard-scattering factorization is based. Basic quantities in high-energy factorization are Green functions depending on transverse gluon momenta, which bear close resemblance with the transverse-momentum dependent gluon distributions discussed in this work and naturally allow one to keep track of transverse momenta q i in the final state.
Investigations of multiparton scattering in the BFKL approach can be found in [91][92][93][94][95]. In agreement with the arguments given above, one finds that the two-gluon distribution receives a contribution from two independent BFKL ladders, with a small-x exponent twice as large as for a single BFKL ladder [92]. More complicated graphs with four gluons in the t channel have been analyzed in [92,94,95]. As to the high-energy behavior of three t channel gluons, all solutions found so far for the corresponding evolution equations have a weaker small-x growth than a single BFKL pomeron [96,97].
Let us see how small-x dynamics affects the different graphs investigated in section 2.4. In line with our above discussion, we assume that correlation functions for four partons give a faster growth of the cross section with energy than correlation functions for two partons. We have no definite expectation for the corresponding behavior of three-parton correlation functions; if the results just mentioned for gluons in the small-x limit are a guide, then contributions with three t channel partons are not favored for small momentum fractions. For the cross section differential in q i we then find that the multiple-scattering mechanism of figure 8b is actually favored over the single-scattering graph 8a, which has the same power behavior in Λ/Q but a weaker rise at small x. With the caveat just mentioned, the interference contribution in figure 9a is expected to be less important.
The cross section integrated over q i is dominated by the conventional single-scattering mechanism in figure 8a by power counting in Λ/Q. Among the contributions that are suppressed by Λ 2 /Q 2 the double-scattering graph 8b is enhanced at small x. To a lesser extent the same is true for graph 9d, which involves four t channel partons in only one of the two protons. There may be situations where the small-x enhancement overcompensates the power suppression by Λ 2 /Q 2 , for instance in the high-energy production of minijets, where the hard scale Q is not too large. In such cases the BFKL approach may be more adequate than the one using hard-scattering factorization.

The "effective cross section"
The cross section for double hard scattering is often written as σ ds = σ 1 σ 2 /(C σ eff ), where σ 1 and σ 2 are single hard scattering cross sections, C is the combinatorial factor introduced below (2.21) and σ eff is an "effective cross section" characterizing the strength of multiple interactions. Let us see to which extent such a formula holds true in the light of the results we have derived so far.
Under the assumption that there are no correlations between different partons in the target hadron we derived the factorized form (2.70) for multiparton distributions in a model theory with scalar partons. This derivation carries over to the color singlet distributions of two unpolarized quarks, antiquarks or gluons, i.e. to 1 F q 1 ,q 2 , 1 F q 1 ,q 2 , 1 Fq 1 ,q 2 , 1 Fq 1 ,q 2 and 1 F g,g , where the two quark flavors q 1 and q 2 may be different. If one further assumes that the impact-parameter dependent distributions of a single quark, antiquark and gluon have for all parton species c, then the cross section (2.71) for double hard scattering takes the form The second form of (2.141) has recently been given in [98] and uses that the Fourier transform F (r) of F (b) depends only on r 2 because of rotation invariance. It is natural to ask whether (2.139) extends to the cross section differential in . Inserting this into the cross section formula (2.36) does not lead to a factorized form, because x i z i appears in the arguments of the impact parameter profile F . A simplification occurs however if the measured transverse momenta q i are large compared to a hadronic scale Λ (while being much smaller than Q). 7 The typical values of z i in the cross section are then small compared with 1/Λ, whereas typical values of b and b + y are of hadronic size. We can thus approximate and σ eff as in (2.141). Both (2.139) and (2.142) can be made differential in further variables describing the sets of particles produced by the two hard scatters. If one integrates these relations over kinematic variables in the presence of cuts, they only retain their validity if each cut refers to particles in one of the two sets but not in both.
The assumptions that allow one to relate the cross sections for double and single hard scattering by a single process independent constant σ eff are quite strong, and a number of effects can invalidate (2.139) and (2.142): • an impact parameter profile F c (b) that is not the same for different parton distributions. The effect of this was estimated for a specific model in [75].
• a correlation between the x and b dependence in the single-parton distributions . Evidence that such a correlation is appreciable for x above 0.1 comes from the calculation of the Mellin moments dx  [100,101] for γp → J/Ψ p in terms of generalized parton distributions shows that the average squared impact parameter b 2 of small-x gluons in the proton has a weak logarithmic dependence on x [72,102,103]. An estimate of how a correlation between x and b in f c (x; b) affects multiparton interactions has been made in [104].
• correlations between different partons in the proton, which invalidate the relations (2.69) and (2.70) between two-parton and single-parton distributions. In [105] it was argued that such correlations are significant.
• an appreciable size of multiparton distributions that describe spin correlations between two partons (section 2.2), of distributions where partons with the same momentum fraction x i are not coupled to color singlets (section 2.3), or of interference distributions in fermion number or quark flavor (section 2.2.1).
Finally, the assumption that the observed cross section is given by contributions from either single or double hard scattering is invalid if their interference (see figure 9a) is important. All in all, we feel that (2.139) or (2.142) may be useful for order-of-magnitude estimates but should be used with great caution. Of course, one may define σ eff as the ratio (dσ 1 /dΓ 1 ) (dσ 2 /dΓ 2 ) (S dσ ds /dΓ 1 dΓ 2 ) of differential cross sections for single and double scattering. Since this ratio can depend on the process and on all kinematic variables, σ eff is then not a universal constant.

Beyond lowest order: factorization and Sudakov logarithms
So far we have analyzed the lowest-order graphs that contribute to multiple scattering processes. For a systematic treatment in QCD we need to go beyond this approximation and in particular take into account graphs where additional gluons are exchanged. A complete analysis should eventually establish whether an all-order factorization formula can be written down for a given observable. We will not attempt to do this here, but provide some building blocks for such an analysis. We use the framework of hard-scattering factorization, which essentially organizes the dynamics according to virtualities (as opposed to high-energy or small-x factorization, where the organizing principle is based on rapidity). We focus on the cross section differential in small transverse momenta and in particular investigate the structure of Sudakov logarithms. In section 3.5 we will make some remarks on transverse-momentum integrated cross sections, described by collinear factorization. For reasons given in section 3.2 we will concentrate on the double Drell-Yan process, i.e. on the production of two electroweak gauge bosons, which for definiteness we take to be virtual photons. Likewise, we will use the single Drell-Yan process as an example when we recall the ingredients for factorization with a single hard scattering.

Dominant graphs
One of the first tasks when establishing factorization for a given process is to identify the dominant graphs in the kinematic limit one is interested in. The appropriate tool for hardscattering factorization is the method of Libby and Sterman [106,107], which we briefly recapitulate. The first step is to trade the limit of large kinematic invariants (which we collectively denoted by Q earlier) for the limit of vanishing masses of all partons. In doing so, one uses that up to an overall normalization the quantities of interest depend on the ratio of Q and the masses. If we keep small transverse momenta in the differential cross section, those must be sent to zero as well. 8 One is thus led to examine which graphs and which phase space regions give rise to mass divergences. Such divergences come from the poles of Feynman propagators, but only if for a suitable loop integration variable there are poles on both sides of the real axis, which "pinch" the contour of the loop integration. If there is no pinch, the poles can be avoided by deforming the integration contour. One finds that lines that give pinch singularities are either soft (i.e. all their momentum components are close to zero) or collinear to one of the incoming or outgoing particles of the process. All other lines are far off-shell (possibly after complex contour deformation). The leading contribution in the large Q limit of a given graph comes from regions of phase space in the vicinity of the pinch singular configurations just described. To obtain a factorization formula, one has to express subgraphs with collinear or soft lines in terms of matrix elements that make sense beyond perturbation theory. Parton densities and related quantities are constructed from these matrix elements. Off-shell lines are organized into hard subgraphs, which can be calculated perturbatively.
A physically intuitive interpretation of the previous construction is provided by the Coleman-Norton theorem [108]. The pinch singular configurations of a graph correspond to a scattering process where the lines with collinear momenta correspond to classical trajectories in space-time. The trajectory associated with each line is proportional to its four-momentum, so that it shrinks to a point for soft lines. In the "reduced graph" that represents the corresponding classical process, off-shell lines in the original graph are likewise contracted to points.
The preceding analysis is based on the denominators of Feynman propagators and gives only a necessary condition for the occurrence of mass singularities. A power counting analysis taking into account the numerators of Feynman graphs (similar to the one we gave in section 2.4) provides further restrictions on the contributions that actually dominate a given observable. At this level, the polarization of gluon lines is found to play a crucial role.
For single Drell-Yan production at fixed small transverse photon momentum, one finds that the dominant graphs have the structure shown in figure 10a. For each of the colliding protons there is a collinear subgraph. On either side of the final-state cut there is one hard subgraph producing the final state boson, connected with each collinear subgraph by exactly one fermion line and an arbitrary number of gluon lines, which must have polarization in the plus direction for right-moving and in the minus direction for leftmoving collinear gluons. Finally, there is a soft subgraph with soft gluons attaching to either of the collinear subgraphs. There are no soft gluons coupling to the hard subgraphs.
The dominant graphs for double Drell-Yan production are easily identified and just have an additional hard subgraph for the second produced gauge boson on either side of the final-state cut. As we have already seen in section 2.4, hard subgraphs that are connected to each collinear graph by a single parton line have leading power behavior. The power counting for the soft graph is not affected by having one or two hard subprocesses. Finally, the absence of soft gluons coupling to a hard subgraph has the same reason as in the single Drell-Yan case, namely that such soft gluons increase the number of hard propagator denominators in the hard subgraph, without providing a compensating large numerator factor or phase space volume. The space-time representation in the sense of the Coleman-Norton theorem is shown in figure 11 for the graphs in figure 10. Parton lines from one and the other proton meet at one point in the t-z plane and annihilate into a gauge boson. For double Drell-Yan production, the two bosons are produced at the same point in t and z. The transverse momenta of partons and the produced bosons are neglected in this interpretation (see above), so that the classical scattering process takes place at fixed transverse coordinates (x and y).

"Rescattering" contributions
Before discussing in detail the leading graphs in figure 10, we wish to comment on graphs of the type shown in figure 12a. They have been associated with "rescattering" in the literature [109,110] and were calculated in terms of two hard 2 → 2 QCD processes, where the parton with momentum k in the figure is treated as an outgoing parton in the first scattering and as an incoming parton in the second one. It is understood that the transverse momenta p 1 , p 2 andp are all large (we denote their order by p T below).
We argue here that this is not a correct way to calculate the graph, at least not within the usual hard-scattering factorization framework used in [109,110]. According to our discussion in the previous section, the lines that enter or exit a hard-scattering subgraph must correspond to pinched singularities and thus admit a classical space-time interpretation in the sense of the Coleman-Norton theorem. This is not possible for the line with momentum k in figure 12a. As illustrated in figure 12b, the two partons emerging from a hard 2 → 2 process have large transverse momenta and, being on shell, thus have finite rapidities. In other words, their velocity in the z direction is smaller than the speed of light. As soon as such a parton has propagated over a finite distance, it can no longer scatter on another parton from one of the two initial protons, since those partons do move with the speed of light along z. The proper treatment of the parton with momentum k is thus to regard it as an internal line in a single hard-scattering process with three incoming partons (l 1 , l 2 ,l) and three outgoing ones (p 1 , p 2 ,p). As we saw in section 2.4 such a contribution is power suppressed (if p 1 + p 2 +p is integrated over, it involves a parton distribution of higher twist) and can hence be neglected.
Put differently, the parton with momentum k is generically far off-shell in the leading region of the graph in figure 12a. A kinematical analysis readily shows that the final-state momenta p 1 , p 2 ,p fix the sum l + 1 + l + 2 to a large value of order p T , up to small corrections of order 1/p T . The value of l + 1 is however integrated over a large interval of order p T . For a particular value of l + 1 in this interval, the propagator of k does have a pole, but this pole is not a pinch singularity (the gluons adjacent to k are far off-shell when k 2 = 0 and their propagator poles are a distance of order p T away in the complex l + 1 plane). One can thus deform the integration contour of l + 1 such that k 2 is always of order p 2 T and thus large. 9

Collinear and soft gluons
We now return to the graphs in figure 10. They contain an arbitrary number of collinear and soft gluons, and further simplifications are required to obtain a useful factorization formula that involves a limited number of nonperturbative quantities.
In existing factorization theorems, the effects of collinear and of soft gluons are described by Wilson line operators, to all orders in the strong coupling. The possibility to obtain such a simple structure is crucial for establishing factorization. Detailed analyses of this issue can be found in [48][49][50][51] for single Drell-Yan production or for its crossed-channel analogs, the production of back-to-back hadrons in e + e − annihilation or semi-inclusive deep inelastic scattering (SIDIS). By contrast, for hadron-hadron collisions producing back-toback jets or hadrons with measured transverse momenta, serious obstacles to establishing factorization have been identified in [111] and in previous work cited therein. A systematic treatment of transverse-momentum dependent factorization for jet or hadron production in multiple hard scattering will probably need to wait until a suitable formulation for single hard scattering has been found.
We therefore limit our considerations in this section to the double Drell-Yan process. Extending our arguments to the production of other colorless particles is trivial if the hard scattering is initiated by quarks or antiquarks and should be possible if it is initiated by gluons. We shall not attempt to give a full proof of factorization even for double Drell-Yan production. Instead, we will analyze the lowest-order graphs with an additional exchanged collinear or soft gluon. To a large part this will be a recapitulation of the corresponding analysis for the single Drell-Yan process. We nevertheless give the necessary steps in some detail, in order to see how the arguments generalize to double hard scattering. We will pay particular attention to the color indices for quarks and antiquarks, since the color structure of two-parton distributions is nontrivial compared with the single-parton case. Finally, we will point out which further issues need to be settled to obtain a full proof of factorization. Figure 13 shows an example where several gluons collinear to the right-moving proton p couple to a left-moving quark or antiquark. The quark or antiquark is thus taken far off shell, so that its propagator and its coupling to the gluon belong to one of the hardscattering subprocesses.

From collinear gluons to Wilson lines in parton distributions
We now recapitulate the analysis of one such coupling, which is well-known from single Drell-Yan production, taking particular care of color indices and of the distinction between quarks and antiquarks. The relevant part of the graph in figure 14a can be written as When actually calculating the hard scattering, one can nevertheless integrate l + 1 along the real axis; the pole of 1/(k 2 + iǫ) then provides an absorptive part to the hard-scattering amplitude. The possibility to deform the integration contour of l + 1 justifies the perturbative treatment of the propagator for k. a ℓ +l Figure 14. Collinear gluons in the Drell-Yan process. Top row: subgraphs with a right-moving gluon coupling to a left-moving quark or antiquark before it annihilates. Bottom row: corresponding graphs after the off-shell propagators have been replaced by eikonal lines. where in a shorthand notation we write . . .q j A α,a . . . and . . . q j ′ . . . for the hadronic matrix elements of the right and left moving proton, respectively. The subscript c onl indicates the collinear approximation specified after (2.90), i.e.l − c =l − ,l + c = 0 andl c = 0. Instead of the spinor u(l c ) for the incoming quark we could also use the projection operator P (l), see the discussion after (2.79) and (2.90). The vertex with the produced photon and the spinor for the incoming right-moving quark are not needed for our argument and have been omitted. Our sign convention for the strong coupling g is such that the covariant derivative reads D µ = ∂ µ + igA µ .
The expression in (3.1) has the structure R α H α , where R is the matrix element of the right-moving proton and H a hard-scattering amplitude. One therefore has |R + | ≫ |R − |, |R|, whereas all components of H α are generically of the same size. To leading-power accuracy we therefore have R α H α ≈ R + H − . We now introduce an auxiliary spacelike where in the last step we have used the conditions on the components of R α , H α and v α just stated, as well as |ℓ + | ≫ |ℓ − |, |ℓ| for the momentum ℓ of the right-moving gluon. For reasons given in the next section, we have provided an iǫ prescription to the factor 1/ℓv in (3.3) such that the pole in ℓ + is on the same side of the real axis as in the propagator of the off-shell quark that couples to the photon in figure 14a: In the hard-scattering amplitude we have thus traded the coupling −igt a γ α of the gluon to the quark and the adjacent quark propagator i γ(ℓ +l c ) for the coupling −igt a v α of the gluon to a so-called eikonal line and the eikonal propagator i/(ℓv + iǫ).
Repeating the same steps for the graph in figure 14b gives The change from an incoming quark to an incoming antiquark in the hard scattering has changed the overall sign of the propagator i γ(ℓ +l c ), which is reflected in an overall sign change of the eikonal propagator i/(ℓv + iǫ). On the other hand, the momentum flow in the graph and the resulting iǫ prescriptions have remained the same. It is instructive in this context to compare Drell-Yan production with SIDIS, where one has an outgoing quark or antiquark in the hard scattering. The corresponding graphs are shown in figure 15a and b. Taking the same vector v as before, we have Compared with graphs 14a and b, the relative flow of the momenta ℓ andl c in the off-shell quark or antiquark has changed. Hence the corresponding propagator has a denominator instead of the one in the second equation of (3.4). As a result, the sign of iǫ in the eikonal propagator is now reversed. A graphical notation for eikonal lines needs to specify the flow of the momentum ℓ relative to 1. the color flow (and hence the fermion number flow in the quark line which is represented by the eikonal line). This determines the overall sign of the eikonal propagator. We denote the color flow by an arrow on the eikonal line, which points in the same direction as the arrow on the original fermion line.
2. the flow of the large momentuml c in the original fermion line, which is either an incoming or an outgoing line in the hard-scattering subprocess. This determines the sign of iǫ in the eikonal propagator. We indicate this graphically by a full or an empty circle at the end of the eikonal line, such that the large momentum flows from the full to the empty circle. Since incoming and outgoing partons in the hard scattering can be associated with a classical path in space-time according to section 3.1, the full circle represents the past and the empty circle the future time direction.
The corresponding Feynman rules are given in figure 16, and the graphs corresponding to the eikonal representation in (3.6), (3.7) and (3.8) are shown in the bottom rows of figures 14 and 15. 10 We now briefly review how eikonal lines are generated by Wilson line operators in the hadronic matrix elements that appear in a factorization formula. The relevant part of the expression (3.6), together with the relevant integrations over momentum and position variables reads Using the representation we can rewrite this as (3.12) 10 Our graphical notation differs from that in the literature. In [48] for instance, an arrow on the eikonal line was associated with the flow of the large momentum, and the overall sign due to the color flow was indicated by explicit color indices and taken into account in the vertex between a gluon and an eikonal line, rather than in the eikonal propagator.

Introducing the Wilson line
where P denotes path ordering, we recognize the term in square brackets in (3.12) as the term of order g in the expansion of W † (ξ; v). In a full factorization proof, one has to show that the coupling of two or more collinear gluons to the incoming quark line in figure 14a exponentiates, so that their combined effect is the replacement in the operator defining the parton distribution. Likewise, the expression in (3.7) corresponds to the one-gluon term in the replacement The conditions we imposed on v after (3.2) hold in a frame where p moves fast to the right. One readily finds that in the rest frame of p one has v 0 > 0, so that the Wilson line (3.13) relevant for Drell-Yan production has a path pointing into the past. By contrast, the reversed sign of iǫ in the eikonal propagators for SIDIS corresponds tō with a future-pointing Wilson line The preceding discussion was for right-moving collinear gluons and generalizes trivially to left-moving collinear gluons in the proton with momentump. The corresponding Wilson lines are to be defined with an auxiliary vector w that satisfies and either |w − | ∼ w + or |w − | ≪ w + in a frame wherep moves fast to the left. In the rest frame ofp one then has w 0 > 0. The manipulations in the preceding arguments are all concerned with a single hardscattering subprocess at a time, so that they readily apply to double Drell-Yan graphs such as in figure 13, where they give the order g part of a Wilson line for each quark or antiquark operator in the multiparton distributions. The full operator for a two-quark distribution then reads for instance The open color indices j, j ′ , k, k ′ , which were carried by quark fields in the lowest-order formula, are now carried by the "ends" of the four past-pointing Wilson lines. The projection on color singlet and color octet distributions is done as in (2.103).
Let us now mention how the previous arguments need to be generalized to obtain a complete factorization proof for double Drell-Yan production.
• The step from (3.5) to (3.6), which eliminates an internal fermion propagator in the hard-scattering graph, is elementary when applied to the lowest-order hard scattering. For more complicated graphs (with loop corrections or further external gluons) one needs a Ward identity to achieve this simplification. In a model theory with Abelian gluons, this is quite simple to establish, see e.g. [50, chapter 10.8]. The formulation for QCD is more complicated and involves external ghost lines in addition to external gluons in the hard scattering (see [50, chapters 11.3 and 11.9]).
• We have considered only one gluon coupling to each hard-scattering subgraph. One needs to show that the coupling of an arbitrary number of gluons exponentiates and gives a full Wilson line W (y; v) or its complex conjugate. Again, this is simple to show for Abelian gluons (see [50, chapter 10.8]). To the best of our knowledge, an explicit proof for transverse-momentum dependent distributions in QCD has not yet been given.
We note that the present and the previous point only concern one hard-scattering subprocess at a time. It should therefore be straightforward to extend arguments valid for the single Drell-Yan process to the case of double Drell-Yan production.
• The two Wilson lines W ( 1 2 z 2 ; v) and W (y + 1 2 z 1 ; v) in (3.19) correspond to gluons in the scattering amplitude, where all gluons fields should be time ordered. With v 2 < 0 the gluon operators in one Wilson line have a spacelike separation, so that they commute and can readily be brought into the order required by path ordering. Two gluon operators in different Wilson lines do not necessarily have this property, and the possibility to reorder the fields needs to be investigated. A similar statement holds for the two Wilson lines W † (− 1 2 z 2 ; v) and W † (y − 1 2 z 1 ; v) that correspond to gluons in the conjugate scattering amplitude.
• The operator in (3.19) is not explicitly gauge invariant, because the Wilson lines end at different positions at infinity, namely at a i − ∞v with finite spacelike a i for i = 1, 2, 3, 4. The same issue already arises for single-parton distributions and has been discussed in [113,114] for lightlike Wilson lines, i.e. for v 2 = 0. In a gauge where the gluon potential (and any product of gluon potentials) has zero expectation value at a − ∞v, one can trivially complement the operator (3.19) with Wilson lines that go in the transverse direction and connect the lightlike Wilson lines to a common reference point, e.g. to −∞v.
After projecting the open color indices at this reference point onto color-singlet or color-octet combinations, the resulting operator is explicitly invariant under local gauge transformations. The extra Wilson lines in the transverse direction are essential in the gauge vA = 0, where the Wilson lines in (3.19) reduce to unity, see the discussion in [113].
As we will see in section 3.2.3, the choice v 2 = 0 is not suitable for transversemomentum dependent factorization. To obtain a gauge invariant definition of the relevant parton distributions, one needs to extend the procedure just described to the case where v 2 < 0. This holds both for single and multiple hard scattering.
Coupling of a soft gluon to a collinear parton that (a) enters the hard scattering or (b) is a spectator.

Soft gluons and the soft factor
We now turn to the exchange of soft gluons between right-and left-moving partons, i.e. to the soft subgraph in figure 10, and show how it can be described in terms of a soft factor that is defined as the vacuum expectation value of Wilson lines. For definiteness we consider one soft gluon with momentum ℓ, exchanged between the soft subgraph S α and the collinear subgraph R α of the right-moving partons. Here α is the polarization index of the gluon considered, and the indices for any other external gluons are omitted for simplicity. We assume that the components of ℓ are of comparable size, |ℓ + | ∼ |ℓ − | ∼ |ℓ|, as well as the momentum components of all other soft gluons attached to S. The components of S α are then also comparable to each other. Since |R + | ≫ |R − |, |R| we then have Introducing an auxiliary spacelike vector w as in (3.18) with |w − | ≪ w + , we furthermore have Sw ≈ S − w + , so that we can write with a factor w α /(ℓw + iǫ) that will eventually turn into a Wilson line. The iǫ prescription for the pole at ℓw = 0 is adequate for ℓ flowing from S into R in the scattering amplitude, i.e. on the left of the final-state cut in figure 10. We note that this prescription corresponds to the one for collinear gluons in the previous section, cf. figures 14 and 17a. The approximations in (3.20) and (3.21) break down in the so-called Glauber region, i.e. for soft momenta dominated by their transverse components, |ℓ| ≫ |ℓ + |, |ℓ − |. A major part of a factorization proof for hadron-hadron collisions is to establish that this momentum region does not contribute to the final factorization formula. With the iǫ prescription we have chosen, the pole of 1/(ℓw + iǫ) is on the same side of the real ℓ − axis as the propagator pole of the quark with momentum l + ℓ in figure 17a, which is readily seen by adapting (3.4). In the graph of figure 17a one can avoid the Glauber region by a contour deformation to complex ℓ − . With the same contour deformation one can however not avoid propagator poles in graphs where the gluon couples to a spectator parton (rather than to the parton Figure 18. Graphical illustration of the Ward identity (3.22). Shown is only the part of the quark-antiquark distribution to the left of the final-state cut. The Feynman rules for eikonal lines are given in figure 16. entering the hard subprocess). An example is shown in figure 17b, where the pole in ℓ − of the propagator for the line p − l − ℓ is on the opposite side of the real axis than the pole of the propagator for the line l + ℓ in figure 17a. To apply the Ward identities discussed below, one has to make the same contour deformation for ℓ − in both graphs of figure 17 and will thus pick up a residue contribution from the propagator of the spectator parton. For single Drell-Yan production one can show that the sum over all such residue contributions cancels due to unitarity, see the discussion in [50, chapters 14.3 and 14.4] and in the original literature cited therein. We do not know whether and how such arguments can be extended to the case of double hard scattering and leave this issue as an important task for further investigation. We will proceed under the assumption that such an extension can be made.
Following the procedure for single Drell-Yan production, the next step in our argument is to use a Ward identity to relate the collinear subgraph with a gluon attachment to the same subgraph without a gluon. For the correlation function describing quark-antiquark emission and an additional gluon in the amplitude, this identity reads and is depicted in figure 18. Analogous identities can be written down for the emission of two quarks or two antiquarks, with a factor i/(ℓw + iǫ) for each quark line and −i/(ℓw + iǫ) for each antiquark line in the amplitude. We leave it to future work to give a general proof of these identities, but verify them here for two simple examples. Our first example is a quark-antiquark pair with a pointlike coupling to a target. The corresponding two-parton distribution is then proportional to δ jk i(γ l 1 ) −1 ⊗ (−i)(γ l 2 ) −1 , where l 1 and l 2 are the respective momenta of the quark and antiquark, and j and k are their respective color indices. The tensor product ⊗ refers to the spinor indices, whose coupling at the vertex with the target we need not specify for our argument. Attaching a soft gluon in the amplitude, we have to add the graphs in figure 19a and b. Contracting the gluon polarization index α with ℓ α and using the same trick as in the step from (3.5)  Figure 19. Graphs for a gluon coupling to a quark-antiquark system that originates from a colorless target via a pointlike vertex. The indices j, k and a refer to color. Figure 20. Graphs for a gluon coupling to a three-quark system that originates from a colorless fermion via a pointlike vertex. j, k, l and a are color indices.
Multiplication with Sw/(ℓw + iǫ) gives (3.22) for this particular case. As a second example we take a colorless fermion target coupled to three quarks by a pointlike vertex. The two-quark distribution is then proportional to where l 3 is the momentum of the spectator quark. Coupling a gluon to this system, we get the three graphs shown in figure 20, which after contraction with ℓ α give Figure 21. Diagram with one soft gluon exchanged between the left-and right-moving partons to the left of the final-state cut.
with the tensor V a ijk = t a jm ǫ mkl + t a km ǫ jml + t a lm ǫ jkm . This tensor is zero, because it is completely antisymmetric and hence proportional to V a ijk ǫ ijk = 2(t a jj + t a kk + t a ll ) = 0. Multiplication with Sw/(ℓw + iǫ) finally gives the equivalent of (3.22) for a two-quark distribution.
The preceding arguments can readily be adapted for a soft gluon attached to leftmoving collinear partons by exchanging + and − components of the relevant vectors. The auxiliary vector w is then replaced by v as in (3.2), with |v + | ≪ v − . Likewise, one can repeat all arguments for soft gluons in the conjugate amplitude, i.e. to the right of the final-state cut in figure 10. In the corresponding Ward identities one then has to use the Feynman rules on the r.h.s. of figure 16.
Consider now the diagram in figure 21, where in the amplitude one gluon is exchanged between the left-and right-moving partons. Its contribution to the cross section is proportional to where in the last step we have shifted the integration variables l 1 andl 2 . For simplicity we have omitted a global factor, as well as the expressions for qq → γ * , which in the hardscattering approximation only depend on the external momenta q 1 and q 2 and thus do not appear under the loop integrals (see section 2.1.2).
To provide a representation beyond perturbation theory, we represent the soft subgraph (which for two external gluons is just the gluon propagator) as a matrix element, Here we have omitted the time ordering between the fields, which requires justification when ξ andξ do not have a spacelike separation. We gloss over this point here (see also our discussion at the end of section 3.2.1) but return to it briefly at the end of section 3.3.1. Using (3.27) and (3.11) we then have for the first term in (3.26) where ξ iT denotes the four-vector with ξ + iT = ξ − iT = 0 and transverse components ξ i . The corresponding expression for the diagram without soft gluon exchange is obtained by replacing the last line in (3.28) by δ jm δ kn . Using (3.13), we recognize the factors in square brackets in that line as the order g terms in conjugate Wilson lines W † (ξ 1T ; w) and W † (ξ 2T ; v). In the transverse plane, the paths of these Wilson lines are at the positions that are Fourier conjugate to the transverse quark momenta l 1 andl 2 in (3.28). The three other terms in (3.26) give analogous contributions, with Wilson lines W (ξ 2T ; w) and W (ξ 1T , v) at the positions that are Fourier conjugate to the transverse antiquark momenta l 2 andl 1 , respectively.
After a change to symmetric momentum and position variables as specified between (2.1) and (2.7), and after restoration of global kinematic factors, the second to fifth lines on the r.h.s. of (3.28) turn into the product F a 1 ,ā 2 (x i , z i , y)Fā 1 ,a 2 (x i , z i , y) of two-parton distributions in transverse position space, and the Wilson lines are to be evaluated at the appropriate transverse positions of the quark or antiquark fields in the definition of these distributions.
It is straightforward to repeat the preceding derivation for a soft gluon exchanged to the right of the final-state cut, as well as for the case where the gluon crosses this cut. For an model theory with Abelian gluons, it is not difficult to see how soft subgraphs with an arbitrary number of external gluons add up to full Wilson lines, in close analogy to the case of single Drell-Yan production. We do not attempt here to give a corresponding proof for the nonabelian theory, given that even for the single Drell-Yan process this is quite involved. The structure suggested by our analysis of one-gluon exchange is however clear: the effect of all soft subgraphs is to multiply the Born-level cross section (2.36) in position space representation by a soft factor. This factor is the vacuum expectation value of a product of Wilson lines, with one Wilson line for each external quark or antiquark in the multiparton distributions. We thus have where the "further terms" describe the remaining combinations of quarks or antiquarks in the two-parton distributions, as discussed in section 2.2.1. The soft factor reads S qq mm ′ ,nn ′ ;jj ′ ,kk ′ (z i , y) We notice that Wilson lines W (ξ; v) W † (ξ; w) and W (ξ; w) W † (ξ; v) are contracted pairwise in their color indices. From our derivation we see that this color contraction follows from the fact that the hard scatters produce color-singlet particles, so that the color indices of annihilating quarks and antiquarks are directly contracted with each other. The "further terms" in (3.29) have a soft factor S qq multiplying Fā 1 ,ā 2 F a 1 ,a 2 and a soft factor S I multiplying the product of interference distributions Iā 1 ,a 2 I a 1 ,ā 2 . These factors are defined in analogy to (3.30) with an appropriate interchange of arguments and indices for W and W † . In analogy to two-parton distributions, we can represent S qq in a singlet-octet basis for index pairs jj ′ , kk ′ , etc.
Defining the matrix we then have One can of course rewrite the cross section in terms of distributions F (x i , k i , y) or F (x i , k i , r) depending on transverse momenta. The result involves a Fourier transformed soft factor and is a convolution in transverse-momentum variables. The soft factor (3.30) for double Drell-Yan production generalizes the corresponding factor appearing in the single Drell-Yan process, which reads for the annihilation of a right-moving quark with a left-moving antiquark. The color indices are now contracted to an overall singlet, as they are in 11 S qq . The analog of (3.33) is where the "further term" corresponds to a right-moving antiquark and a left-moving quark.
For the discussion in subsequent sections we note that at z = 0 the product of Wilson lines in (3.34) reduces to the trace of the unit matrix, so that S q (0) = 1. Similarly, one finds from (3.30) and (3.31) that To close this section let us collect the issues in the soft-gluon sector that need to be worked out for a full factorization proof. Some of them we have already mentioned.
• One needs to show that the exchange of gluons in the Glauber region cancels in the cross section. Such a cancellation requires a specific choice of iǫ prescription in the eikonal propagators. For the prescription in (3.21), which corresponds to pastpointing Wilson lines, one can show that Glauber gluons do cancel in single Drell-Yan production. It is natural to expect that the same prescription is appropriate for the double Drell-Yan process, if there is any choice for which Glauber gluons decouple in that case.
• The Ward identity (3.22) for attaching one gluon to a collinear subgraph needs to be proven, and it needs to be extended to the case where additional gluons are attached to the subgraph. One then needs to show that the attachment of an arbitrary number of gluons exponentiates to the Wilson lines in the soft factor (3.30).
In a model theory with Abelian gluons, a corresponding proof should be a rather simple extension of the corresponding arguments for single-parton distributions, which can be found in [50, chapter 10.8]. An explicit proof for transverse-momentum dependent factorization in QCD is still lacking even for single hard scattering, as far as we know.
• It must be shown that explicit time ordering of the gluon operators in the soft factor (3.30) can be omitted. It must also be established that one can complement the Wilson lines along v and w in the soft factor in such a way that one has an explicitly gauge invariant definition. We expect that for both issues it should be possible to extend a proof for single Drell-Yan production to the double Drell-Yan process, but we are not aware of an explicit proof for the single Drell-Yan case.
The second and third bullet items are closely connected with the corresponding points for collinear gluons, which we discussed at the end of the previous section.

Towards a factorization formula
In section 3.2.1 we have seen how collinear gluons give rise to the Wilson line operators (3.19) in the matrix elements defining multiparton distributions. However, these Wilson line operators contain not only collinear but also soft gluons, which are already taken into account in the soft factor (3.30). At the level of graphs, this is reflected in the fact that the gluon momentum ℓ in figures 14 and 17a can be either collinear or soft. To prevent double counting of soft gluon contributions, the factorization formula for the cross section requires appropriate subtractions.
Let us briefly recapitulate how this problem can be solved for single Drell-Yan production. The necessary subtractions can be performed by dividing out vacuum matrix elements of the form (3.34). There is a certain freedom of whether to absorb these matrix elements into the soft factor or into the parton distributions that appear in the final factorization formula . The former choice was made in the original work [48] of Collins and Soper,11 whereas both [49] and [51] have made the latter choice. Finally, in recent work by Collins [50] (see [115] for a brief summary) all matrix elements of the form (3.34) have been absorbed into the parton distributions, which gives a factorization formula without an explicit soft factor. Whichever choice is made, a consistent formulation requires one to take matching iǫ prescriptions in eikonal lines when treating collinear or soft gluon attachments, as we did in (3.3) and (3.21).
Another detail that admits several choices is the direction of the path in Wilson lines. In section 3.2.1 we have seen that the approximations needed for right-moving collinear gluons require a vector v that corresponds either to large negative or to central rapidity, where we define the rapidity of a spacelike vector as One cannot take the limit y v → −∞, i.e. one cannot take v lightlike in the parton density, since this would give divergences from the region where gluons coupling to eikonal lines have small ℓ + but large ℓ − , i.e. from the region of large negative gluon rapidities [48,50,116].
The approximations for soft gluons in section 3.2.2 require a vector v with large negative rapidity and a vector w with large positive rapidity. Again one cannot take the limit where y v → −∞ and y w → +∞, as we shall see explicitly in section 3.3.1. However, this limit can be taken for appropriate combinations of matrix elements, which leads to important simplifications, see [49] and [50]. A further technical point is that the matrix elements discussed so far include contributions from self energy graphs of Wilson lines and from graphs where gluons are exchanged between different Wilson lines pointing in the same direction (see e.g. figure 24 below). For spacelike vectors v and w such graphs give infinite results already at tree level, as shown in appendix A of [117]. Such graphs do not appear in the derivation of the factorization formula: as we have seen in the two previous sections, Wilson lines appear when treating gluon exchange between partons that have a large rapidity difference. The offending graphs cancel in the combination of matrix elements that appears in the final factorization formula, but in the individual factors they must be explicitly excluded. (Only in the scheme of [50] do these graphs already cancel in the parton distributions.) Finally, the hard-scattering subgraphs have radiative corrections themselves. Since we require the produced bosons to have small transverse momenta, there are only virtual corrections: radiation into the final state can only be collinear or soft and is included in the collinear or soft factors. For Drell-Yan production at one-loop accuracy, one thus only has the vertex correction to the quark-antiquark-photon three-point function. The regions of soft and collinear gluon momenta in the virtual graphs have to be explicitly subtracted in the definition of the hard-scattering cross section, in order to ensure that this factor is dominated by large virtualities. This removes in particular the well-known soft divergences of the virtual graphs (which in the more familiar case of inclusive observables cancel when real emission graphs are added).
We expect that the above procedure can be generalized to the case of double Drell-Yan production. The division by vacuum expectation values of the form (3.34) will be replaced by multiplication with the inverse of the matrix (3.30) in color space. We leave it to future work to show that this can actually be done. In the remainder of this section, we will take a closer look at the elementary building blocks of factorization, namely at the soft factor in (3.30) and at the dependence of the proton matrix elements of the operator (3.19) on the direction v of the Wilson lines.
To conclude this section we note that a factorization theorem for the double Drell-Yan process also needs to provide a proper separation between the production of the two gauge bosons by one or two hard-scattering processes. We will discuss this problem in section 5.2.3.

The soft factor at small transverse distances
If all three transverse distances y, z 1 , z 2 are small compared with a hadronic scale Λ −1 , the soft factor in (3.30) is dominated by perturbative dynamics and can be evaluated in perturbation theory. In this section we compute the short-distance form of this factor to leading order in the strong coupling.

The basic graphs
The expansion at O(α s ) of the soft factor (3.30) involves the three types of graphs shown in figure 23, which we now calculate. Graphs a and b already appear in the soft factor (3.34) for the single Drell-Yan process, whereas graph c is specific for multiparton interactions. As discussed in the previous section, we discard graphs as in figure 24, where gluons are exchanged between Wilson lines pointing both along v or both along w.
To regulate ultraviolet divergences, we work in 4−2ǫ dimensions, and to exhibit infrared divergences in individual graphs we use a small gluon mass λ. For the time being we omit the color matrices t a in the Feynman rules. One finds that each pair of graphs in figure 23 gives the same result, so that in the following we only list the expressions of the left-hand graphs and multiply by two. Before renormalization, the vertex correction graphs in figure 23a give Figure 24. Graphs with gluon exchange between eikonal lines having the same rapidity. These graphs are excluded in the evaluation of soft factors, as discussed in the text.

39)
where we recall that the vectors v and w have zero transverse momenta and satisfy v − > 0, v + < 0, w + > 0, w − < 0. For ℓ + < 0 all poles in ℓ − are on the same side of the real axis, so that this region gives a zero contribution to the integral over ℓ − . For ℓ + > 0 we close the integration contour in ℓ − around the pole of the gluon propagator and obtain As both poles in (ℓ + ) 2 are on the same side of the real axis, one can deform the integration contour to −∞ < (ℓ + ) 2 < 0 and obtains We see that both vectors v and w must be chosen away from the light cone, since taking even one of them lightlike gives an infinite result for U a . With the definition (3.37) for the rapidity of a spacelike vector, we have We require a large rapidity difference, |y w − y v | ≫ 1, i.e. v − w + ≫ v + w − . This is satisfied by the choice y v ≪ 1 and y w ≫ 1 made in section 3.2.2, but it also allows one of v or w to have central rapidity, as long as the rapidity of the other one is large (positive or negative). We can then approximate tanh(y w − y v ) ≈ 1. The expression (3.41) is ultraviolet divergent, and using MS subtraction we obtain the renormalized result For the graphs in figure 23b, which describe gluon emission into the final state, we have Comparing with (3.40) we observe that U a + U b = 0 at ξ = ξ ′ . For ξ = ξ ′ the integral over ℓ in (3.44) is ultraviolet finite and no subtraction is needed before we set ǫ = 0. We thus obtain where b 0 = 2e −γ and γ is the Euler number. The graphs in figure 23a and b give the full O(α s ) contribution for the vacuum matrix element S q defined in (3.34), which appears in single Drell-Yan production. Let us evaluate this contribution as a side result. The color factor for all graphs is N −1 tr(t a t a ) = C F in this case, so that we have S q (z, µ) = 1 + S(z, µ) + O(α 2 s ) with for z = 0. Notice that the infrared divergences regulated by a gluon mass λ have cancelled in the sum over all graphs, as is required for a perturbative evaluation of S q . For z = 0 the relation U a + U b = 0 ensures that the condition S q (0) = 1 does not receive radiative corrections, in agreement with the general result discussed below (3.34). This complete cancellation between real and virtual corrections plays a crucial role for collinear factorization, see section 3.5. The fact that the limit z → 0 of (3.46) is infinite rather than zero is due to our use of modified minimal subtraction. For z = 0 the ultraviolet divergences in U a and U b cancel each other (as do the finite parts of the graphs), so that there is no ultraviolet subtraction for their sum. At z = 0, however, U a + U b does require ultraviolet subtraction since the first term is ultraviolet divergent whereas the second term is not.
Finally, the graphs in figure 23c give (3.47) For ℓ + < 0 all poles in ℓ − are on the same side of the real axis and give a zero net result, whereas for ℓ + > 0 the integral over ℓ − can be obtained from the residue of the gluon propagator. This gives the same expression as in the second step of (3.44). The graphs in figure 23c hence give the same result as those in figure 23b: For later use we note that the graphs in figures 23b and c change sign when the flow of color charge in one of their eikonal lines is reversed, whereas those in figure 23a remain the same.

Soft factor associated with two-quark distributions
With these building blocks at hand we can construct the soft factor associated with twoparton distributions in the cross section formula (3.33). We first discuss the factor S qq in some detail and give the results for S qq and S I in the end.
The graphs contributing to S qq are those in figure 25. In the graphs of figure 25a and b 1 there is either no gluon coupling to the pair {14} or no gluon coupling to the pair {23} of eikonal lines, so that the lines have the same color coupling at the bottom and the top of the graph. The color factor for the vertex correction graphs 25a is C F , independent of the color flow along the eikonal lines and of the way in which their color indices are coupled.
For figure 25b 1 we have color factors 1 N 2 tr(t a t a ) tr 1 1 = C F for 11 S qq , where the prefactors 1/N 2 and 4/(N 2 −1) come from the analog of the color decomposition (3.31) for S qq . We note that these color factors are the same as for the ladder graphs we shall discuss in section 5.1.3. By contrast, the graphs in figure 25b 2 and c can change the color coupling of the eikonal lines. As an illustration, consider the first graph in figure 25b 2 with the color indices at the bottom multiplied by octet matrices t b jj ′ t b kk ′ . The color structure is then and we see that this graph contributes to both 81 S and 88 S. If the indices jj ′ and kk ′ are coupled to singlets, then the graph is proportional to t a mm ′ t a nn ′ and thus contributes to 18 S. For the first graph in figure 25c the color factors are analogous, with the difference that one has instead of (3.50). Putting everything together and using the results of the previous section, we have Figure 25. Graphs for the soft factor associated with two-quark distributions. The "mirror graphs" not shown are as in figure 23. The numbers, position arguments and color labels on the eikonal lines correspond to those of the quark lines in the distribution F a1,a2 , see figure 5.
The terms S c come with an overall minus sign, because in graph 25c one of the eikonal lines coupled to a gluon has its color flow reversed compared with graph 23c, which reverses the sign in the eikonal propagator according to the Feynman rules in figure 16. The matrix elements in (3.52) are such that we can replace S a and S b = S c by their Figure 26. A higher-order graph for the soft factor, which connects more than two eikonal lines.
sum, which is infrared finite according to (3.46).
with a color factor and linear combinations Note that the µ dependence of S has canceled in S d and S y . At one-loop level the soft factor for two-quark distributions can thus be expressed in terms of its analog S for single-parton distributions. This simplification will most likely no longer hold at higher orders in α s , since one then has graphs like in figure 26, where more than two eikonal lines are connected.
It is easy to see that the soft factor S qq (z i , y) for quark-antiquark distributions is obtained from S qq (z i , y) by replacing z 2 → −z 2 . This corresponds to the interchange of the labels 2 and 3 in the definitions of F a 1 ,a 2 and F a 1 ,ā 2 (see (2.85) to (2.87)) and uses the result that the kernels S b and S c for gluons that cross or do not cross the final-state cut are identical.
The soft factor multiplying the interference distributions is obtained from the same graphs as those in figure 25, apart from changes in the color labels and in the color flow of the eikonal lines. Making the appropriate replacements of position arguments in S qq we which can be rewritten as (3.57)

Collins-Soper equation
As is well-known, cross sections with measured transverse momenta |q i | ∼ q T much smaller than the hardest scale Q in the process contain large logarithms in q T /Q, which need to be summed to all orders in α s in order to have a perturbatively stable result. A powerful method to resum these Sudakov logarithms is due to Collins, Soper and Sterman (CSS) [118]. This method uses transverse-momentum-dependent factorization and is therefore, up to now, limited to the production of color singlet particles, such as a Drell-Yan lepton pair or a Higgs boson. In this section we show how this formalism extends to double Drell-Yan production, and we calculate the corresponding Sudakov factor to next-to-leading logarithmic accuracy. We begin by a brief account of the CSS method for processes with a single hard scattering initiated by quark-antiquark annihilation. As we mentioned in section 3.2.3, the Wilson lines in the definition of transverse-momentum dependent parton distributions must be taken along a direction v with a finite rapidity. Their dependence on this rapidity is governed by the Collins-Soper (CS) equation [48], whose solution resums Sudakov logarithms to all orders. By Lorentz invariance, the distributions depend on v via the scalar parameter 12 (3.58) 12 To avoid the appearance of square roots, our definition (3.58) follows the convention of Ji et al. [51] and differs from the one of Collins and Soper [48], with ζ 2 | here = ζ [48] . Figure 27. Graphs describing the dependence of a single parton distribution on the Wilson line rapidity y v at order α s . The complex conjugate graphs are not shown. Note that graph b includes the case where the gluon couples to the left quark line, i.e. graph a is included in b.
As discussed earlier, we require v to be spacelike (although the manipulations in the following would also work for timelike v, which is the choice adopted in [51]). In terms of the rapidities y v of v and y p of the proton, we have for a right-moving proton, where M is the proton mass. The approximations are valid for y v ≪ y p , which we have seen to be necessary in section 3.2.1. This implies ζ ≫ M . Following the original paper [48] we work here with "unsubtracted parton distributions" in the parlance of [51,119] and [50], i.e. with matrix elements of operators constructed from quark fields and Wilson lines along v as in (3.19). Subtractions for the soft momentum region are not made in the parton distributions but in the soft factor that appears in the cross section (see [49]). The fields in the operator are renormalized, so that the distributions depend on an ultraviolet renormalization scale µ. Fourier transforming from transverse momentum to position space, we then have single-quark distributions f (x, z; ζ, µ). The µ dependence is given by a homogeneous renormalization group equation [48] d d log µ f (x, z; ζ, µ) = 2γ q α s (µ) f (x, z; ζ, µ) , (3.60) where γ q = 3C F α s /(4π) + O(α 2 s ) can be identified with the anomalous dimension of the quark field in the axial gauge vA = 0 (where the Wilson lines reduce to unity). In a covariant gauge it corresponds to renormalization of the composite operator W (ξ)q(ξ) or q(ξ)W † (ξ), see (3.19).
In terms of graphs, the dependence of f on v arises from the propagators of eikonal lines and from their coupling to gluons, as is obvious from the Feynman rules in figure 16. A power counting analysis shows that dominant contributions to ∂f /∂ζ come from regions where the momentum ℓ flowing through eikonal lines is either hard or soft; contributions with ℓ collinear to the proton are power suppressed. The only important graphs where ℓ is a hard momentum have the form of a vertex correction shown in figure 27a (at higher orders this vertex graph becomes dressed with further gluon and quark lines). If the momentum ℓ flows through spectator partons (see figure 27b and c), the hard region is power suppressed and only the region of soft ℓ is important.
The contribution from the region of hard momenta ℓ reads Since ℓ is hard, the kernel G can be calculated perturbatively. G is extracted from subgraphs whose external lines are the quark on the left of the final-state cut and the adjacent eikonal line, or the corresponding two lines on the right of the cut. As a result, G is independent of the transverse distance z between the two quark field operators. Furthermore, it depends only on the longitudinal quark momentum xp rather than on the proton momentum p, so that the dependence on ζ is via the combination xζ = 2(xp)v/ |v 2 |. For dimensional reasons this parameter must be divided by µ, since G is dominated by hard momenta and hence independent of nonperturbative scales. At leading order in α s , one obtains G from the graph of figure 27a and the complex conjugate graph, with subtractions made for the region of soft ℓ. Using MS subtraction for the ultraviolet divergences, one finds [48,119] G If the momentum carried by the eikonal lines and hence by the gluons they couple to is soft, we have graphs with soft gluons coupling to parton lines that move fast to the right. We can then use the same procedure as in section 3.2.2, i.e. approximate the gluon coupling to the right-moving particles and then use a Ward identity. The gluons then couple to eikonal lines with large positive rapidity, associated with a vector w, with one eikonal line for each parton line attached to the collinear subgraph. This gives where we used ∂/(∂ log ζ) = −∂/(∂y v ) and where we have explicitly displayed the dependence of S q defined in (3.34) on the two vectors v and w. As shown in [50], one can take the limit of lightlike w in (3.63) and thus has . (3.65) Having taken the limit of infinite y w the dependence of K on y v has disappeared as well, since by Lorentz invariance K could only depend on y w − y v . For large z the kernel K is dominated by nonperturbative dynamics, just as S q . For sufficiently small z we can however use the perturbative expression of S q derived in the previous section. From (3.46) we readily obtain verifying that at lowest order K is independent of v and w. An analysis of the ultraviolet divergences in G and K shows that they satisfy renormalization group equations [48] so that the sum G + K is independent of µ. At lowest order in α s this is readily verified from (3.62) and (3.66), which also give the leading term of the anomalous dimension. The O(α 2 s ) term is also known [118]. We are now ready to generalize the CS equation to (unsubtracted) two-quark distributions. The contribution from hard eikonal momenta ℓ is again given by vertex graphs as in figure 27a, with one graph for each of the four quark legs. This gives just the sum of kernels G(x 1 ζ/µ) + G(x 2 ζ/µ). As for the soft momentum region, the argument leading to (3.63) can be repeated. In section 3.2.2 we have seen that the Ward identity for soft gluons coupling to collinear lines in a two-parton distribution has a nontrivial color structure. We therefore now have a matrix equation in color space, Analogous equations hold for quark-antiquark and for interference distributions, with kernels K constructed from the appropriate soft factors S. Note that the kernels are sensitive to the color charge of partons (i.e. to the difference between quarks and antiquarks) but not to their polarization. For K this follows from the analogous property of S, whereas for G it follows from parity invariance applied to the relevant subgraph with one external quark and one eikonal line. Putting everything together, we can write the CS equation for a two-quark distribution into the form According to (3.67) the µ dependence cancels between the kernels G and K in (3.71). The matrix M is independent of µ as well. This can be traced back to the fact that the only ultraviolet divergent graphs for S (and hence for K) have the form of a vertex correction as in figure 25a. As discussed in the previous section, these graphs only contribute to the color diagonal elements 11 S and 88 S and are the same in both cases, since they are insensitive to the color of the different eikonal lines. Finally, the dependence of F a 1 ,a 2 , F a 1 ,ā 2 etc. on the renormalization scale is given by for both 1 F and 8 F . This is because ultraviolet renormalization in F is performed for individual operators W (ξ)q(ξ) andq(ξ)W † (ξ).

General solution
Before solving (3.71) let us first consider the simpler equation where we have abbreviated We have included the argument of the running coupling in G since this will be needed shortly. For the moment we do not assume that K is given by a perturbative expansion. The solution of (3.74) can be obtained by adapting the well-known solution of the CS equation for single-parton distributions [48,118,120]. We have where F µ 0 specifies the initial condition of evolution in ζ. The scale µ 0 should be chosen such that F µ 0 does not depend on widely disparate scales. If this is not possible because z i and y widely differ in size, further steps may be required in order to resum all large logarithms. We note that since S is µ independent (see below), the µ dependence of F µ 0 is given by the same renormalization group equation as in (3.73). The Sudakov exponent in (3.76) reads with x equal to x 1 or x 2 . It is well-known from the solution f (x, z; ζ, µ) = exp −S(xζ, z, z; µ 0 ) f µ 0 (x, z; µ) (3.78) of the CS equation (3.64) for single-quark distributions. We note that a more general set of solutions can be obtained by multiplying xζ with a constant C 2 in the upper integration limit of (3.77); the initial condition F µ 0 in (3.76) then depends on that constant. One can easily restore the C 2 dependence of the expressions to follow, but for simplicity we limit ourselves to the choice C 2 = 1 here. The integrand of (3.77) contains functions that depend on the ratio of two large scales. To make this dependence explicit, one uses the renormalization group equation (3.67) for G and K. Obviously, K 12 has the same µ dependence as K, so that the µ dependence cancels between G and K 12 in (3.74) and (3.77). Using (3.67) one can rewrite (3.79) Inserting this into (3.77) one can perform the integration over ζ ′ for the terms containing γ K or K 12 and obtains The term with γ K in (3.80) gives rise to the leading double logarithm in xζ/µ 0 , whereas the other terms give only single logarithms. Using (3.62) and (3.68) and neglecting the running of α s one has at leading order in α s . A more precise expression is obtained by rewriting dµ/µ = 1 2 dα s /β(α s ), where β = dα s (µ)/d log µ 2 . After expanding 1/β(α s ) in α s , the integral in (3.80) is straightforward to evaluate for the one-loop expression (3.62) of G and the two-loop expression of γ K (given e.g. in [118]).
The form (3.80) is valid even if K 12 (z 1 , z 2 , µ) cannot be evaluated perturbatively because one or both of z 1 and z 2 are large. If both distances are small, K(z i , µ) and thus K 12 (z 1 , z 2 , µ) is given by a power series in α s (µ). An alternative form of the Sudakov exponent [118] is then obtained by rewriting (3.79) as G xζ ′ /µ, α s (µ) + K 12 z 1 , z 2 , µ, α(µ) where we now distinguish between the explicit µ dependence of K 12 and the implicit dependence via the running coupling. One then obtains where all perturbative functions are evaluated with α s at the same scale. With the one-loop expression of K in (3.66) we have A natural choice for the starting scale of evolution in ζ is thus with a constant C 1 of order 1. If one takes C 1 = b 2 0 then K 12 vanishes. It is now easy to write down the solution of the full CS equation (3.71) for a two-parton distribution. It is given by with S given by (3.80) for arbitrary values of z i and by (3.84) if both z 1 and z 2 are small. The logarithm in the second line has been chosen such that it coincides with the one that multiplies 2K 12 when one evaluates −S(x 1 ζ) − S(x 2 ζ) from (3.80). Other choices are possible and lead to different initial conditions 1 F µ 0 and 8 F µ 0 . Unless all distances z 1 , z 2 and y are small, the matrix M cannot be calculated perturbatively and we cannot further simplify the exponentiated matrix in (3.87). Nevertheless, (3.87) contains some important information, since it gives the explicit form of the dependence on the large scales x 1 ζ and x 2 ζ. In particular, we see that to leading double logarithmic accuracy, where only squared logarithms of x 1 ζ/µ 0 and x 2 ζ/µ 0 are retained, the Sudakov factor for two-quark distributions is the same for 1 F and for 8 F and given by the product of the corresponding Sudakov factors for single-quark densities with momentum fractions x 1 and x 2 . At next-to-leading logarithmic accuracy, 1 F and 8 F mix under evolution in ζ, with the amount of mixing depending on the transverse distances z 1 , z 2 and y.
It should be possible to generalize the CS equation (3.71) and its general solution (3.87) to the case of multiparton distributions for more than two partons. The same holds for multi-gluon distributions, where the general structure will remain the same but the kernels G and K will be different.

Small transverse distances
Let us now consider the situation when z 1 , z 2 and y are all small enough to calculate M in perturbation theory. The kernels K(z i , µ) in (3.71) are then given by (3.66) at leading order in α s , and the Sudakov exponent S in the solution (3.87) of the CS equation can be evaluated from (3.84). It remains to investigate the matrix e LM in (3.87), where we abbreviate We treat the kernel M qq for two-quark distributions F a 1 ,a 2 in detail and discuss its analogs for quark-antiquark distributions F a 1 ,ā 2 and interference distributions I a 1 ,a 2 later. Using the definitions (3.70), (3.72) and our perturbative result (3.53) for S qq , we readily find with c given in (3.54) and in analogy to (3.55). Using the explicit form (3.66) or the renormalization group equation (3.67) for K, we see that M is µ independent, as we anticipated earlier. We must of course choose a scale in α s when using the one-loop result (3.66) for evaluating K d and K y , which gives rise to a residual scale dependence of order α 2 s . The situation is the same as for a physical (and hence formally µ independent) quantity evaluated in fixed-order perturbation theory. An appropriate scale of α s in K d and K y will be constructed from z i and y.
Let D qq be the diagonal matrix with the eigenvalues of M qq and let E qq be the matrix whose columns are the corresponding eigenvectors, i.e.
One then has The matrix (3.89) has eigenvalues and a matrix of eigenvectors so that Let us see how this matrix behaves for One can then Taylor expand the square root in (3.93) and obtains if K y > 0, whereas the role of d + and d − in (3.97) is interchanged if K y < 0. In both cases one gets where we have traded the color factor c = 1/ √ Since c ∼ 1/N , the condition (3.96) holds in the large-N limit. Inserting (3.98) into (3.87), we see that 1 F (ζ) is then controlled by the initial condition 1 F µ 0 because the admixture from 8 F µ 0 is suppressed, although only by 1/N . Whether 8 F (ζ) is dominated by 1 F µ 0 or 8 F µ 0 depends on whether the 1/N suppressed factor in the lower row of (3.98) or the exponential e −Lb 2 Ky is smaller. In either case 8 F (ζ) is parametrically smaller than 1 F (ζ). To which extent the large-N limit gives a valid description of the physics for N = 3 depends on the relative size of K y and K d , as well as the relative size of 1 F µ 0 and 8 F µ 0 . This can only be decided by a more detailed analysis, which we will not attempt here.
There is, however, a region of phase space where (3.96) holds beyond the large-N limit. From its definition (3.90) we see that K d vanishes if z 1 = z 2 = 0. In the double-scattering process one has |z 1 | ∼ |z 2 | ∼ 1/q T , so that the limit |z 1 |, |z 2 | ≪ |y| is relevant in the region where |y| is much larger than 1/q T . Taylor expansion then gives for the factor in the off-diagonal elements of the matrix (3.98), whereas for the exponential factor we find where we have chosen µ 0 as in (3.86). We thus find that the octet admixture in the ζ evolution of 1 F (ζ) is power suppressed by |z 1 ||z 2 | y 2 , and that 8 F (ζ) is power suppressed compared with 1 F (ζ) by .

(3.103)
In straightforward generalization of single Drell-Yan production [48,50], an adequate choice of ζ in the cross section for double hard scattering is x 1 x 2 ζ 2 ∼ Q 2 . Together with |z 1 ||z 2 | ∼ 1/q 2 T this gives p ∼ (α s /π) log(Q/q T ). Again, a more quantitative picture can only be obtained by a detailed analysis.
The preceding results all rely on the validity of perturbation theory for the soft CS kernel and thus require not only z 1 and z 2 but also y to be perturbatively small. We cannot draw any strict conclusions about the case where z 1 and z 2 are small, whereas y is in the nonperturbative region. However, we observe that the power suppression parameter |z 1 ||z 2 | y 2 becomes smaller rather than larger in this case. One may thus speculate that the general features of our analysis, namely the autonomous ζ evolution of 1 F and the suppression of 8 F will continue in the nonperturbative regime.
We conclude this section by noting that the CS equation and its solution for quarkantiquark distributions F a 1 ,ā 2 is readily obtained from the previous results by replacing z 2 → −z 2 and that corresponding replacements are to be made for Fā 1 ,a 2 and Fā 1 ,ā 2 . This follows from the corresponding property of the soft factor S qq discussed at the end of section 3.3.2.
Interference distributions. The Collins-Soper equation for interference distributions 1 I a 1 ,ā 2 and 8 Iā 1 ,ā 2 has the same form as (3.71) with F replaced by I. The appropriate kernel M I in the perturbative regime follows from S I in (3.57) and reads with K d and K y given in (3.90). This matrix has eigenvalues and exponentiates to For |K d | ≪ K y , i.e. if |z 1 |, |z 2 | ≪ |y|, we can Taylor expand the eigenvalues as The result simplifies if we use the orthogonal matrix that implements the basis transformation from 1 I, 8 I to the combinations3I, 6 I introduced in (2.117). We then have (3.109) Repeating the argument that led to (3.103) we obtain with p given in (3.102). Likewise, comparing (3.109) with (3.98) we find For |z 1 |, |z 2 | ≪ |y| the sextet combination of I is hence suppressed compared with the antitriplet one, which in its turn is small compared with the singlet combination 1 F . We finally note that, in contrast to the case of F , the limit |K d | ≪ K y just discussed does not give the same result as the large-N limit. In the latter one finds with relative corrections of order K y /(N K d ). This expansion is obviously not useful if one has |K d | ≪ K y . More generally, the large-N limit for the ζ evolution of I will only be useful in kinematical regions where K y /(N K d ) is small enough for N = 3.

Collinear factorization
The analysis in the previous sections was concerned with transverse-momentum dependent (TMD) factorization, i.e. with cross sections differential in transverse momenta that are small compared with the large scale. We now turn to collinear factorization, adequate for cross sections with integrated transverse momenta. We point out the main changes compared with TMD factorization but do not work out the formalism in detail. As in previous sections, we first recapitulate the situation for single Drell-Yan production. The main changes compared with TMD factorization are as follows.
• We recall from section 3.2.2 that, after a complex contour deformation that avoids the Glauber region, the effect of soft gluon exchange is described by a soft factor S q (z). Integration of the cross section (3.35) over q sets z equal to zero in this factor. Because the cancellation between real and virtual graphs gives S q (0) = 1 as discussed in section 3.3.1, there is no net effect of soft gluons in the q T integrated cross section.
With the elementary soft factor S q reduced to unity, subtractions of soft-gluon contributions as discussed in section 3.2.3 are not required, neither for the parton distributions nor for the hard-scattering subprocess.
• Setting z = 0 in the quark and antiquark distributions f (x, z), which is equivalent to integrating f (x, k) over k, gives rise to short distance singularities in addition to those that are removed by defining the distributions with renormalized quarks fields and Wilson lines. The dependence on the ultraviolet subtraction scale µ is described by the well-known DGLAP evolution equations.
The rapidity divergences (from gluons with small ℓ + and large ℓ − ) that prevent us from taking lightlike Wilson lines in the definition of f (x, z) cancel between real and virtual corrections when z = 0 [116]. Indeed, the relevant one-loop graphs are those in figure 27b and c (without the derivative −∂/∂y v ), and the approximation discussed in section 3.4 , which connects these graphs to the soft factor S q (z), is valid for any ℓ − as long as ℓ + is small.
Collinear parton distributions can hence be defined with lightlike Wilson lines and do not depend on a parameter ζ. Correspondingly, the q T integrated cross section is free of Sudakov logarithms. In the operator definition of f (x, µ), the Wilson lines (3.113) The sections of the paths that go to infinity in W † (− 1 2 z; v) and W ( 1 2 z; v) have cancelled, and a path of finite length between − 1 2 z and 1 2 z remains.
• The hard-scattering subprocess now receives radiative corrections not only from virtual graphs but also from real ones, since emission of partons with large transverse momenta in the final state is permitted once we do not fix the transverse momentum q T of the Drell-Yan photon. As already mentioned, a subtraction for the soft-gluon region is not required in this case, in contrast to the situation for TMD factorization discussed in section 3.2.3. Subtractions are however needed for the regions where momenta are collinear to one of the partons entering the hard subprocess. These subtractions must be performed in a way that matches the ultraviolet subtractions in the parton densities. In particular, the µ dependence due to collinear subtractions in the hard subprocess has to cancel against the µ dependence of the parton distributions in the cross section.
Let us now investigate the situation for double Drell-Yan production, limiting ourselves to a one-loop analysis as we have done throughout the preceding sections. A key to understanding the role of soft gluons is to set z i = 0 in S qq (z i , y), which results from integrating the cross section (3.33) over q i . Our discussion in section 3.3.1 implies that at z 1 = z 2 = 0. The one-loop contributions to 11 S qq cancel between the vertex corrections 25a and the real graphs 25b 1 , in full analogy with the case of S q discussed above. In 18 S qq and 81 S qq we have a cancellation between the real graphs 25b 2 and the virtual graphs 25c. We see from (3.114) that in the q T integrated cross section for double Drell-Yan production the contributions from color singlet and color octet distributions decouple from each other, and that they have a different behavior concerning soft gluon exchange. In the term with color singlet distributions 1 F we have a cancellation of soft gluon effects, in full analogy to single Drell-Yan production. Also, the graphs for the hard-scattering subprocess are exactly as in the single Drell-Yan process, with a cancellation of the soft-gluon region but with necessary subtractions for the regions of collinear parton momenta. From our above discussion it follows that collinear two-parton distributions 1 F can be defined with the same operators as their single-parton analogs, with lightlike Wilson lines W [y − 1 2 z 1 , y + 1 2 z 1 ] and W [− 1 2 z 2 , 1 2 z 2 ] between quark and antiquark fields, and that their contribution to the q T integrated cross section is free of Sudakov logarithms. Like their single-parton counterparts, the distributions have ultraviolet divergences; the scale dependence that follows from their subtraction will be discussed in section 5.3.2.
The contribution of collinear color-octet distributions 8 F is quite different. Because real and virtual graphs have different color factors, soft gluon effects do not cancel between them, and their net effect is described by 88 S qq (y). As a consequence, the different factors in the cross section formula require soft subtractions, as they do in the case of measured transverse momenta. Since the color indices of [q(− 1 2 z 2 )W † (− 1 2 z 2 ; v)] k ′ and [W ( 1 2 z 2 ; v)q( 1 2 z 2 )] k are not contracted, the two Wilson lines do not merge into a single one of finite length, and the same holds for their analogs with arguments y − 1 2 z 1 and y + 1 2 z 1 . The vector v in the Wilson lines cannot be taken lightlike, so that collinear octet distributions will depend on a parameter ζ. The resulting Collins-Soper equation gives rise to Sudakov logarithms, which suppress the color octet contribution to the q T integrated cross section. This important result was already obtained in [121], based on the observation that in the hard-scattering subprocesses there is no cancellation of the soft-gluon region. An adequate scale µ 0 for the initial condition of the CS equation will in this case be a hadronic scale, inverse to the typical distance |y| between the two scattering partons.
Let us finally take a look at the interference distributions I a 1 ,ā 2 . From (3.56) we find a soft factor There is hence no cancellation of soft-gluon effects, so that a formulation of collinear factorization will in this case be similar to the one for the color octet distributions 8 F just discussed, with the additional complication of mixing between the color singlet and octet channels.

Spin structure
Multiparton distributions have a nontrivial spin structure because the polarizations of different partons can be correlated among themselves, even in an unpolarized proton. In the following two sections we first investigate some general properties of spin correlations between two quarks and then show that they have observable consequences in multiple scattering cross sections. We will encounter several examples for parton spin correlations in section 5.2.2.

Spin decomposition
Let us first take a closer look at the spin dependence of the two-quark distributions F a 1 ,a 2 introduced in (2.86), making use of rotation and parity invariance. We always assume that the hadron is unpolarized, i.e. that the matrix element (2.86) is averaged over the hadron spin. The simplest cases are the distributions which are parity even, i.e. scalar functions. By contrast, the distributions F q,∆q and F ∆q,q are parity odd, i.e. pseudoscalar functions. Their general form is where f 1 , f 2 and f 3 are scalar functions with the same arguments as in (4.1). The scalar functions are in general neither even nor odd in k i or y since their dependence on k 1 y, k 2 y and k 1 k 2 is not constrained by symmetry. Note that the three two-dimensional vectors y, k 1 and k 2 are linearly dependent, so that the three cross products in (4.2) are linearly dependent as well. Expressing e.g. y as a linear combination of k 1 and k 2 one obtains and can thus write F q,∆q as ǫ jj ′ k j 1 k j ′ 2 times a single scalar function. However, that scalar function is singular when k 1 and k 2 become collinear, as is evident from the denominators in (4.3). To avoid such artificial singularities, one can use (4.2) if it is necessary to make the appearance of ǫ jj ′ explicit.
Using that 1 2 (1 ± γ 5 ) projects on quarks with definite helicity, one can readily identify the combinations of quark polarizations that are described by the above functions. In a schematic notation one has where the superscript in q ± denotes the quark helicity. The distribution F ∆q,∆q thus describes the degree to which the two quark helicities are aligned rather than antialigned, whereas F q,∆q and F ∆q,q describe the correlation between the helicity of one of the quarks and one of the cross products in (4.2).
To illustrate that spin correlations between two partons need not be small, let us consider the simple case of a SU (6) symmetric three-quark wave function of the proton. Its spin-flavor part reads where + and − respectively indicate that the quark spin is aligned and antialigned with the proton spin. As is well known, this wave function gives ∆u/u = 2/3 and ∆d/d = −1/3 for the longitudinal polarization of u and d quarks, which reproduces at least the trend of what is empirically found for the lowest x moments of the polarized quark densities. For two-quark distributions one finds and thus an appreciable correlation between the longitudinal polarizations of the quarks. Of course, the study of a three-quark wave function tells us little about partons with x ∼ 10 −2 or smaller, which are of particular relevance for LHC phenomenology. To the extent that they are known, polarized single-parton densities in this x range are small compared with their unpolarized counterparts, which means that there is only a weak spin correlation between a small-x quark and the proton as a whole. This is not too surprising, given that a small-x quark and the proton are far apart in phase space. It does however not imply small spin correlations between two quarks that have small but comparable momentum fractions x 1 ∼ x 2 and are thus closer in phase space. How large such correlations are is an important open question.
The distributions defined with one or two tensor operators O j δq are associated with transverse quark polarization, since 1 2 (γ + + s j iσ j+ γ 5 ) projects on quarks with a transverse spin vector s j . We now discuss their parametrization in terms of scalar or pseudoscalar functions. Let us begin with F j ∆q,δq and F j δq,∆q , which transform like two-dimensional vectors. They can hence be written as a sum of three scalar functions that are respectively multiplied by y j , k j 1 and k j 2 . Only two of these functions are independent because of the linear dependence of the three vectors. A minimal parametrization is obtained by taking y j andỹ as basis vectors. This gives where we have inserted the proton mass M on the r.h.s. so that f and g have the same mass dimension as F . Here and in the following we denote scalar functions by f and pseudoscalar ones by g. The latter can be represented in the same way as F q,∆q . If one wants to avoid pseudoscalar functions, one can replace the basis vectorỹ j by Since bothỹ and k y are orthogonal to y, they must be proportional to each other, and explicitly one findsỹ If one inserts this into (4.8) then k j y is multiplied by scalar functions, which are however singular when k 1 + k 2 and y become collinear. One could replace k 1 + k 2 in (4.9) by another linear combination of k 1 and k 2 , but this would only move the singularities to a different part of phase space.
The distributions F j q,δq and F j δq,q transform like axial vectors, so that one has A decomposition in terms of scalar functions can be obtained by replacing y j with ǫ jj ′ k j y . The tensor distribution F jj ′ δq,δq can finally be written as + y jỹj ′ +ỹ j y j ′ M 2 g s δq,δq + y jỹj ′ −ỹ j y j ′ M 2 g a δq,δq . (4.12) Notice that the four basis tensors t jj ′ p in this decomposition are orthogonal to each other, i.e. t jj ′ p t jj ′ q ∝ δ pq with p, q = 1, 2, 3, 4. Contracting (4.12) with the transverse polarization vectors s j 1 s j ′ 2 of the quarks, we see in particular that f δq,δq goes with s 1 s 2 and thus describes the correlation between the two transverse quark spins.
In summary, we can represent the spin structure of F a 1 ,a 2 (x i , k i , y) by eight scalar and eight pseudoscalar functions for each combination of quark flavors. The pseudoscalar functions can be traded for scalar ones, which have however artificial singularities for particular values of the vectors y, k 1 and k 2 .
If one integrates over transverse momenta to obtain collinear distributions, one finds because one cannot construct a pseudoscalar function with only one vector y. Likewise, the functions g in (4.8), (4.11) and (4.12) integrate to zero, so that we are left with (4.14) The eight functions f on the right-hand side now depend on x 1 , x 2 and y 2 and are obtained from their counterparts in (4.1), (4.8), (4.11) and (4.12) by integration over k 1 and k 2 . The above decompositions are given for distributions in a right-moving proton. For a left-moving proton one has to change the sign of ǫ jj ′ and hence ofỹ and of all pseudoscalar functions, see our remark below (2.100). Analogous decompositions can be written down for distributions F a 1 ,ā 2 , Fā 1 ,a 2 , Fā 1 ,ā 2 that involve antiquarks, as well as for interference distributions I a 1 ,ā 2 and Iā 1 ,a 2 .
Symmetry properties The terms appearing in the decompositions (4.1) to (4.14) are consistent with rotation and parity invariance. Let us now discuss their symmetry properties. Using that the operators in (2.80) satisfy O * a (y i , z i ) = O a (y i , −z i ) one finds that the distributions F a 1 ,a 2 (x i , k i , y) are real valued, For distributions that are purely defined in momentum or position space, see (2.9) and (2.12), this implies These functions are in general not real-valued, nor are the scalar or pseudoscalar functions one can introduce to parameterize them in analogy to (4.1) to (4.12).
For the symmetry properties of parton distributions with respect to time reversal, the Wilson lines appearing in their definition are essential. As we argued in section 3.2.1, multiparton distributions involve the past pointing Wilson lines W given in (3.13). Upon time reversal these turn into the future pointing Wilson lines W ′ in (3.17). A distribution is called T even (odd) if it is even (odd) under time reversal without taking into account this change of Wilson lines. The time reversal invariance of strong interactions thus implies that T odd distributions are only nonzero thanks to Wilson line effects; a prominent example from spin physics is the Sivers distribution function [122]. However, time reversal does force distributions to vanish if they are T odd and have Wilson lines that are invariant under time reversal. This is the case for the Wilson lines along a finite lightlike path that appear in collinear single-parton densities and in the collinear two-parton distributions 1 F in the color singlet sector, as we discussed in section 3.5. By contrast, collinear color octet distributions 8 F , as well as interference distributions 1 I and 8 I, have Wilson lines that do change under time reversal.
After these preliminaries we can now investigate the time reversal properties of twoquark distributions. We find with sign factors η q = +1 and η ∆q = η δq = −1, where the superscripts indicate the type of Wilson line in the matrix element defining the distributions. The relations (4.15) to (4.17) also hold for the distributions F a 1 ,ā 2 , Fā 1 ,a 2 , Fā 1 ,ā 2 with antiquarks and for the interference distributions I a 1 ,ā 2 , Iā 1 ,a 2 .
Since the scalar functions parameterizing F W a 1 ,a 2 (x i , k i , y) are in general neither even nor odd in y, they are not T even or odd either. The scalar functions that parameterize the collinear distributions in (4.14) are however even in y. As a consequence, F j ∆q,δq (x i , y) and F j δq,∆q (x i , y) are T odd and all other distributions in (4.14) are T even. For the color singlet sector this implies whereas the corresponding color-octet distributions 8 F j ∆q,δq (x i , y) and 8 F j δq,∆q (x i , y) can be nonzero due to the Wilson lines appearing in their definitions. Analogous statements hold for the corresponding distributions with one or two antiquarks. Collinear interference distributions 1 I and 8 I are not restricted by time reversal invariance.

Spin effects in gauge boson pair production
In this section we show that the quark spin correlations discussed in the previous section have observable consequences in multiparton interactions. As we did earlier in this paper, we consider the production of a pair of gauge bosons γ, Z or W . We include the decay of each boson into a lepton pair, which carries information on the spin state of the gauge boson. While these processes have a rather small cross section, they may be suited for experimental studies due to their clean final-state signature. We do not present a full analysis here, but highlight the effects of selected parton spin correlations.
For simplicity we limit our attention to those distributions that do not involve explicit vectors y orỹ on the r.h.s. of the decompositions in the previous section, i.e. to F q,q , F ∆q,∆q and the term δ jj ′ f δq,δq in F jj ′ δq,δq . For definiteness we analyze the graph of figure 6a, with two quarks emitted from the right-moving proton and two antiquarks from the left-moving one. We approximate the transverse momenta q i of the bosons by zero when calculating their production and decay, as deviations from this limit are suppressed by powers of q T /Q. The partonic cross section for the production of a lepton pair can be written as the product of a production tensor of the boson and a tensor for its decay, where µ is associated with the boson in the amplitude and µ ′ with the one in the complex conjugate amplitude. For unpolarized or longitudinally polarized quarks one easily finds with coefficients A and B depending on kinematic variables and the vector and axialvector couplings of the gauge boson to the quark q. In the case of a photon one has B = 0. The transverse tensors g µµ ′ ⊥ and ǫ µµ ′ ⊥ have as nonzero components g 11 ⊥ = g 22 ⊥ = −1 and ǫ 12 ⊥ = −ǫ 21 ⊥ = 1. From (4.20) it follows that the overall cross section depends on the combinations F q,q Fq ,q + F ∆q,∆q F ∆q,∆q and F q,q F ∆q,∆q + F ∆q,∆q Fq ,q (4.21) of multiparton distributions. The contraction of g µµ ′ ⊥ and ǫ µµ ′ ⊥ with the vector boson decay matrices results in different angular distributions of the leptons. If one integrates over their angles then only the contribution from g µµ ′ ⊥ remains. We thus find that nonzero values of F ∆q,∆q and F ∆q,∆q modify both the total rate and the lepton angular distribution compared with the contribution from the unpolarized term F q,q Fq ,q .
We now turn to transverse quark polarization. In this case the production tensor from each hard scattering depends also on the transverse indices associated with the quark polarization. We recall that the quark field bilinearsq iσ +j γ 5 q are chiral odd, so that in the helicity basis they correspond to quarks or antiquarks with opposite helicities in the scattering amplitude and its conjugate. As a result transverse quark polarization does not contribute to the production of a W boson. Keeping only the term δ jj ′ f δq,δq in the decomposition (4.12) of F jj ′ δq,δq and the corresponding term in the decomposition of F kk ′ δq,δq , one finds for the neutral bosons γ or Z where the polarization indices µ, µ ′ belong to the boson with momentum q 1 and the indices ν, ν ′ to the boson with momentum q 2 . We observe that the polarization indices of the two bosons are entangled in (4.22). Contracting with the well-known boson decay matrices, one obtains an azimuthal dependence like cos(2ϕ), where ϕ is the relative azimuthal angle between the two leptons (as opposed to the antileptons). 13 We thus obtain the important result that a correlation between transverse quark and antiquark spins, as expressed by the distribution f δq,δq in (4.12), leads to a correlation between the decay planes of the two produced bosons.
It is instructive to rewrite the production tensors in terms of boson polarization vectors ǫ + = −(0, 1, i, 0)/ √ 2 and ǫ − = (0, 1, −i, 0)/ √ 2, which respectively correspond to angular momentum +1 or −1 along the z axis. One finds We can easily understand why each boson is transversely polarized. Recall that a massless quark and antiquark can only annihilate into a vector boson if their helicities are coupled to ±1. Since we neglect the transverse momentum of the bosons, their angular momentum along z must also be ±1. The tensors g µµ ′ ⊥ and iǫ µµ ′ ⊥ in (4.23) correspond to the same boson polarization in amplitude and conjugate amplitude and thus do not give rise to an azimuthal dependence in the leptonic decays, but they do give different distributions in the polar angles of the leptons, or equivalently in their rapidities. By contrast, (4.24) involves the interference between J z = 1 and −1 for each of the bosons, which readily translates into the cos(2ϕ) dependence already mentioned.
It is natural to expect that spin correlations between partons also lead to angular correlations in the final state for other double-scattering processes, such as the production of two dijets. In this case two-parton distributions involving linear gluon polarization can contribute as well. We note that in the analysis of [9] uncorrelated dijet planes were taken as a characteristic feature of the double-scattering mechanism. This is only adequate if parton spin correlations in the proton are negligible.
Returning to four-lepton production, let us compare our results for double hard scattering with the contribution from a single qq annihilation, remaining in kinematics where q 1 and q 2 can be neglected compared with Q. The corresponding Feynman graphs involve either quark exchange in the t or u channel, or an intermediate boson in the s channel in case one or both final-state bosons are charged, see figure 28a and b. In addition, the four-lepton final state can be produced by graphs as in figure 28c, where only one lepton pair comes from the decay of a vector boson. Such graphs were recently discussed and termed "single-resonance graphs" in [13]. They should be taken into account unless each lepton pair has an invariant mass inside the Z or W mass peaks.
The dependence of the single-scattering cross section on the azimuthal angle ϕ between the leptons can be deduced from symmetry arguments since we set q 1 and q 2 to zero in the parton-level process. Let us assume that the initial qq pair has total angular momentum J z = 1 along the z axis. This must then hold for both the scattering amplitude and its conjugate, since a single quark or antiquark in an unpolarized proton is unpolarized. One of the lepton pairs originates from the decay of a gauge boson and is thus in a partial wave with J = 1. The possible combinations of (J z 1 , J z 2 ) for the two lepton pairs are thus (2, −1), (1, 0), (0, 1) and (−1, 2), where the first and last possibilities are only possible for single-resonance graphs. In the cross section this gives a ϕ independent term and a modulation with cos ϕ from all graphs, as well as modulations with cos(2ϕ) and cos(3ϕ) from single-resonance graphs. The same angular terms are obtained when the initial qq pair has J z = −1, whereas the configuration where the pair has J z = 0 decouples in the hard-scattering graphs of figure 28. In summary, the cos(2ϕ) modulation we found in the double-scattering mechanism competes with single-scattering contributions involving single-resonance graphs. 14 One may hope that the two sources of cos(2ϕ) dependence can be separated by a more detailed analysis -making for instance use of the fact that the single-resonance graphs also give a cos(3ϕ) term -but this issue must be left to a dedicated study.

Mellin moments and lattice calculations
If one takes Mellin moments in x 1 and x 2 of the color singlet distributions 1 F , then the light-cone operators O a in their definition turn into local operators. The corresponding moments of single-parton densities can be calculated in lattice QCD, and we will now investigate to which extent the same can be done for two-parton distributions. From our discussion in section 3.5 it follows that the Mellin moments of color octet distributions 8 F do not involve local operators because of their Wilson line structure. We therefore limit ourselves to the color singlet sector.
Using the relation (2.88) and its analogs for F a 1 ,ā 2 and Fā 1 ,ā 2 , one obtains double Mellin moments with σ q = σ δq = +1 and σ ∆q = −1. For each label δq the moments M n 1 ,n 2 a 1 ,a 2 and the corresponding operator on the r.h.s. carry an additional transverse Lorentz index not displayed in (4.25). On the r.h.s. we have the twist-two quark operators familiar from the operator product expansion, where Let us rewrite the r.h.s. of (4.25) in a manifestly covariant form. We first introduce the covariant decomposition where the ellipsis represents terms with uncontracted vectors y µ and terms involving the metric tensor g µν . The reduced matrix element O n 2 q O n 1 q can only depend on the invariants py and y 2 . We then choose a frame where p = 0 and y + = 0, so that py = p + y − and y 2 = −y 2 . This allows us to write (4.25) in the desired form  A corresponding representation is readily obtained for M n 1 ,n 2 ∆q,∆q . For one or two polarization labels δq the analogs of (4.27) involve the tensor structures in the decomposition (4.14) of the two-quark distributions.
The matrix element in (4.27) and its counterparts with polarized quarks can be evaluated on a lattice in Euclidean spacetime if one chooses y 0 = 0. This is rather similar to lattice studies of transverse-momentum dependent single-quark distributions [123,124], with the main difference that the operators taken at different spacetime points are single quark fields in that case, whereas they are gauge invariant bilinear operators here. The restriction y 0 = 0 entails where p and y denote the spacelike three-vectors. Thus, the integral over all py at fixed y 2 on the r.h.s. of (4.28) can unfortunately not be evaluated from results on a discrete Euclidean lattice, where the maximal momentum is fixed by the lattice spacing. This is completely analogous to the single-quark case, as discussed in [124]. Despite this limitation of principle, we hope that lattice data in a certain range of py and y 2 will in the future provide genuinely nonperturbative information about the behavior of multi-parton distributions. We note that a lattice calculation has been reported in [125] for the correlation function of two vector currents at equal time in a proton at rest. This corresponds to setting n 1 = n 2 = 1 and py = 0 in (4.27). The reduced matrix element O n 2 a 2 O n 1 a 1 at py = 0 is directly related to an integral of the two-quark correlation function Φ(k 1 , k 2 , r) defined in (2.75). The relative plus-momentum r + is integrated over in that case, rather than being set to zero as required for the distributions F (x i , k i , r) that appear in double hard-scattering cross sections.

Relation with generalized parton distributions
In section 2.1.5 we derived an approximate relation between multi-and single-parton distributions in a model theory with scalar partons. We now extend this relation to distributions for two quarks or antiquarks, taking into account the different combinations of fermion number and color. For the time being we neglect aspects related to the proton spin, which will be discussed in section 4.3.1. The distributions we will deal with are The general result (2.69) for n scalar partons readily carries over to the two-quark distributions 1 F : where the impact parameter dependent single-quark distributions f a (x, z; b) are defined in analogy to the scalar case in (2.64) and (2.65). Setting z 1 = z 2 = 0 in (4.31), one obtains collinear distributions on both sides and has the probability interpretation represented in figure 4. As a counterpart to (2.72) one can transform the relation (4.31) into transversemomentum space, where it reads with distributions f a (x, k; ∆) defined in analogy to (2.66). Integrating over k i we obtain the relation recently given in [59]. Relations analogous to (4.31) and (4.32) are obtained for 1 F a 1 ,ā 2 by replacing the label a 2 withā 2 on both sides.
To reduce 1F to single-particle distributions we could repeat our earlier derivation that started with (2.59). We find it more convenient to work in the transverse-momentum rather than impact parameter representation. We insert a complete set |X of intermediate states between the color singlet operators (q 4 Γ a 2 q 2 ) and (q 3 Γ a 1 q 1 ) and assume that single-proton intermediate states dominate. We then have After an appropriate shift of the position arguments in the bilinear field operators and a change of integration variables from y, z 1 , z 2 to u 0 = 1 2 (z 1 − z 2 ), u 1 = 1 2 (z 1 + z 2 ) + y and The integrations over u − 0 and u 0 fix the momentum p ′ of the intermediate state.
In particular, its plus-momentum is p ′+ = (1 − x 1 + x 2 )p + , which reflects that the operatorq 3 Γ a 1 q 1 describes the emission of a quark with plus-momentum x 1 p + and the reabsorption of a quark with plus-momentum x 2 p + . The matrix elements in the approximation (4.34) are thus given by generalized parton distributions (GPDs), which play a prominent role in the description of hard exclusive processes, see [126][127][128] and the reviews [129][130][131]. To evaluate the unapproximated form in (4.33) one would need the corresponding matrix elements for all transitions p → X. This is obviously impractical, although for selected transitions to single baryons, e.g. for p → ∆(1232), some information can be obtained [129,131]. GPDs are defined by where P = 1 2 (p + p ′ ) and Γ a is one of the matrices in (2.81). The parameter is often called skewness. One finds that k is the average transverse momentum of the two quark legs and x their average plus-momentum divided by the average plus-momentum P + of the proton states. For ease of notation we do not indicate the polarization states of the protons, which are in general different. A parameterization of the matrix elements (4. 35) for spin 1/2 hadrons in terms of scalar functions can be found in [132]. Invariance under the transverse boost specified by v → v − (v + /P + ) P gives the relation f a (x, ξ, k; p, we can thus rewrite the relation (4.34) as where k ± = k 1 ± k 2 and In complete analogy one derives where the generalized parton distributions fā for antiquarks are defined by replacing O a (0, z) with Oā(0, z) in (4.35). One finds fā(x, ξ, k; ∆) = σ a f a (−x, ξ, −k; ∆) (4.41) with the same sign factors σ q = σ δq = +1 and σ ∆q = −1 that appeared in (2.88). We also note that generalized parton distributions with positive and negative skewness parameter are easily related to each other by taking the complex conjugate of (4.35). For the distributions 1F a 1 ,ā 2 and 1Ĩ a 1 ,ā 2 we obtain where we have again k ± = k 1 ± k 2 but now In this case we have |x| ≤ ξ, which describes the emission of a quark-antiquark pair. Again, this could be anticipated from figure 5 since now the parton lines combined to color singlets are {12} and {34}, with each pair being on the same side of the final-state cut in the double parton distribution. An important difference between the approximations for 1F , 1 I, 1Ĩ and the one for 1 F given in (4.31) is that the generalized distributions on the r.h.s. of (4.38), (4.40) and (4.42) do not reduce to collinear functions if we integrate over k 1 and k 2 , because these momenta appear in their fourth arguments.
The preceding derivations can easily be extended to distributions describing interference between different quark flavors. The distributions corresponding to figure 7 are defined with bilinear operatorsdΓu orūΓd. For a proton target, the ground state in the sum over intermediate states inserted between the two operators is then a neutron. Isospin symmetry relates the resulting matrix elements to matrix elements in the proton: n|d Γu|p = p|ūΓd|n = p|ūΓu|p − p|d Γd|p . Under the assumption of SU (3) flavor symmetry one can derive similar relations for distributions involving strange quarks [129,131].
Although the relation between multiparton distributions and GPDs is an approximation whose accuracy is not easy to estimate (and although our current knowledge of GPDs is far less advanced than that of ordinary parton densities) this relation provides opportunities to obtain information about multiple interactions that is hard to get by other means. One example are the different interference distributions discussed above, which are so far entirely unknown. Perhaps even more important is that GPDs give rather direct information about the distribution of single partons in the impact parameter b, which is Fourier conjugate to a transverse momentum transfer ∆ that can be measured in physical processes. This is in stark contrast to the interparton distance y in two-parton distributions, which appears as an integration variable in cross section formulae like (2.91) and is not directly related to observable kinematic quantities. We already mentioned in section 2.6 that studies of GPDs give evidence for a correlation between the longitudinal momentum and the impact parameter of partons in the proton. For values of the momentum fraction where such a correlation is strong, it is hardly plausible that there should be no correlation between x 1 , x 2 and y in two-parton distributions, even if there were important corrections to approximations like (4.31).

Spin correlations
We now take a closer look at the role of the proton spin in the approximate relation between double and single parton distributions, which we have glossed over up to now. For our purpose, a suitable choice to describe the spin state of a proton is the light-cone helicity λ = ± 1 2 , which is equal to the usual helicity in a frame where the proton plus-momentum p + tends to infinity (see e.g. [133] or [130, section 3.5.1]). We denote the corresponding momentum eigenstates by |p + , p, λ .
When inserting intermediate proton states between the two color singlet operators in a two-parton distribution for an unpolarized proton, we schematically have 44) or an analogous relation with states |p + , b, λ of definite transverse position. The sum over states on the r.h.s. thus includes single-parton matrix elements where the proton helicity differs in the bra and the ket state. Corresponding sums over polarization states should hence be inserted in the relations (4.31), (4.38), (4.40) and (4.42).
To discuss the implications of this observation, let us focus on the collinear distribution 1 F (x i , r). At the level of matrix elements we have a relation where the superscripts λ and λ ′ of f a denote the proton helicities as in (4.44) and an average over proton helicities is understood in 1 F . A standard decomposition of the spin dependence of generalized parton distributions involves two distributions H and E for unpolarized quarks and two distributionsH andẼ for longitudinally polarized quarks. The distributionẼ does not contribute in the case of zero skewness ξ = 0 we are dealing with here. Using the conventions and the matrix elements for definite proton light-cone helicity in eq. (54) of [130], we have where M is the proton mass and H q (x, ξ, t), E q (x, ξ, t) andH q (x, ξ, t) are the usual GPDs defined in [130]. H q andH q are the respective generalizations of the unpolarized and longitudinally polarized quark densities q and ∆q. Changing the basis of the proton spin states, one can see that E q is related to unpolarized quarks in a transversely polarized proton [65]. Inserting (4.46) into (4.45), we get The term with H corresponds to the simplest approximation of the two-parton distribution as a product of single-parton distributions, whereas the one with E appears in addition. E describes a correlation between the position of a single quark and the proton spin, and (4.48) shows how such a correlation may lead to a correlation between two quarks in an unpolarized proton. It is difficult to say whether this correction term alone already provides an improved approximation of 1 F q,q (x i , r), but one may take it as an indicator for the possible departure from a simple factorized ansatz that neglects all correlations. In cases where the correction from E is large, it is plausible to expect that the factorized ansatz involving only H will fail. Applying the same method to the distribution 1 F ∆q,∆q , we obtain from (4.47) Again one should be cautious regarding the validity of this approximation. For the region of small but similar x 1 and x 2 we already argued in section 4.1.1 that one may well have sizeable correlations between the longitudinal polarization of two quarks, even if there is little correlation between the longitudinal polarizations of one quark and the proton as a whole. Conversely, in kinematics where the product of quark-proton spin correlations in (4.49) is sizeable it seems natural to assume that quark-quark spin correlations are sizeable as well.

Perturbatively large transverse momentum
So far we have treated multiple interactions as a two-scale problem, in which the virtualities q 2 1 , q 2 2 ∼ Q 2 define a large scale whereas the transverse momenta |q 1 |, |q 2 | and the scale Λ of nonperturbative interactions are treated as small. We now make a distinction between the different scales previously treated as small, requiring |q 1 | ∼ |q 2 | ∼ q T to be large compared with the hadronic scale Λ. We thus have a three-scale problem characterized by the hierarchy Λ ≪ q T ≪ Q . (5.1) Large q i implies that at least some of the transverse parton momenta k i andk i must be large. The occurrence of partons with large transverse momentum k T can be thought of as resulting from the perturbative splitting of partons with low k T , which leads to a factorization formula for transverse-momentum dependent parton distributions in terms of a hard-scattering kernel and collinear distributions. This significantly adds predictive power since collinear distributions depend on fewer variables than k T dependent ones.
Example graphs for the case of a single-quark distribution are shown in figure 29. The description based on such graphs was extensively used for spin effects and azimuthal correlations in Drell-Yan production [134][135][136] and semi-inclusive DIS in [117,136,137], building on the seminal work in [48,118].
This description carries over to the case of two-parton distributions and is discussed in section 5.1. In the subsequent sections we investigate a competing mechanism for the generation of high transverse momentum, in which the two partons with momentum fractions x 1 and x 2 originate from the perturbative splitting of a single parton. We will see that this mechanism has profound consequences for the theoretical description of multiple interactions.

Ladder graphs at large y
The single and double ladder graphs in figure 30 are natural generalizations of the ladder graph for a single parton density in figure 29a. In the following we concentrate on these ladder graphs, bearing in mind that in a covariant gauge there are further graphs with eikonal lines as in figure 29b. Those do not change the conclusions we obtain for the ladder graphs. Indeed, they are absent in the axial gauge vA = 0, where the Wilson lines in the definition of parton distributions reduce to unity (apart from pieces at infinity, as discussed at the end of section 3.2.1).
When interpreting the graphs in figures 29 and 30 it is important to bear in mind that they represent a separation of dynamics at different scales, with lines attached to the lower blob having virtualities of order Λ, whereas propagators in the upper part of the graphs are for virtualities of the order of the large transverse momentum q T . An important feature of figure 30 is that no hard gluons are exchanged between the parton lines that have different momentum fractions x 1 and x 2 at the top of the graphs. The requirement that both l − 1 2 r and l+ 1 2 r have small virtualities forces |r| to be of order Λ, which translates into interparton distances |y| of hadronic size in the Fourier transformed distributions F (x i , k i , y). Figure 30. Ladder graphs for the region of small r with large k 1 (a) or large k 1 and k 2 (b). The power behavior refers to F (x i , k i , r) and is discussed in the text. Here and in the following we omit the dashed line that indicates the final-state cut as in figure 29.

Power behavior
Let us first investigate the general power behavior associated with ladder graphs. In the following we refer to q T as the hard scale (compared with Λ), keeping in mind that q T is still much smaller than Q. We proceed in a similar way as in section 2.4. In particular, we use the modified parton correlation functions Φ ′ n , which contain a factor 1/ √ l + for each quark or antiquark of momentum l and in which pairs of quark fields are contracted with a Dirac matrix Γ a from (2.81). For the transition from k to m partons in the t channel, we correspondingly use hard-scattering coefficients V ′ k→m that include a factor √ l + for each incoming quark or antiquark and a factor 1/ √ l + for each outgoing one. Spinor indices in V ′ are contracted with an appropriate matrix 1 2 γ + , 1 2 γ + γ 5 , 1 2 iσ +j γ 5 for outgoing lines and with 1 2 γ − , 1 2 γ 5 γ − , 1 2 iσ −j γ 5 for incoming ones. V ′ is invariant under a boost along z and thus can only depend on the scale q T but not on Q (cf. the corresponding argument for Φ ′ in section 2.4). One thus obtains as one can easily check for the example graphs below. Note that compared with the hard-scattering amplitudes in (2.127) we now have 3m instead of m because V ′ includes the propagators of the outgoing partons, as is appropriate for the calculation of parton distributions. The power behavior for the single-ladder graph in figure 30a can be obtained from 3) The factor p + and the integrations over minus-momenta come from the definition of F , whereas k + 1 k + 2 compensates the corresponding factors in V ′ and Φ ′ . It is understood that V ′ includes a δ function for each parton line going across the final-state cut. This does not affect the power counting, since one could first consider the hard-scattering amplitude without cut and then take the appropriate discontinuity in the s channel. The momenta k 2 ± 1 2 r and l 1 ± 1 2 r attach to the parton distribution at the bottom of the graph and hence have virtualities of order Λ, whereas k 1 ± 1 2 r emerges from the hard scattering and hence has virtuality of order q T . As a result, the momentum components k − 2 , l − 1 , r − ∼ Λ 2 /p + and |l 1 | ∼ Λ are small and can be neglected in the hard-scattering kernel V ′ . We used this when rearranging the order of integrations in the second step. By contrast, the large components k − 1 ∼ q 2 T /p + and k + 1 , k + 2 , l + 1 ∼ p + are to be kept in V ′ . For the power behavior we obtain with (2.126) and (5.2). We recognize the 1/q 2 T behavior that is characteristic of the splitting of one parton (the incoming quark) into two partons (the outgoing quark and the gluon).
The power behavior for the double-ladder graph in figure 30b is obtained by the same type of analysis: If all transverse momenta are small, the distribution F (x i , k i , r) scales of course like Λ −4 .
Let us now see how the power behavior of the two-parton distributions translates into the power behavior of the cross section where we have omitted numerical factors as well as labels for parton species, spin and color. We have multiplied the cross section with s 2 for convenience, since this gives factors sσ(x ixi s) of order 1 on the r.h.s. To have both large q 1 and q 2 requires at lowest order in α s either a single-ladder graph in the distribution for each colliding proton, or a double-ladder graph in one of the distributions with no hard gluons in the other, as shown in figure 31. In both cases one has for the product of distributions, and the integration volume d 2 k 1 d 2k 1 δ (2) (q 1 − k 1 −k 1 ) is of order Λ 2 sincek 1 and thus k 1 − q 1 are restricted to be of size Λ. Similarly, one finds d 2 k 2 d 2k 2 δ (2) (q 2 − k 2 −k 2 ) ∼ Λ 2 in both cases, so that the overall power behavior is By similar arguments one finds that the power behavior remains the same at higher order in α s , when one can have more than two ladders in the graphs for the cross section. Any decrease by a factor Λ 2 /q 2 T in the product F (x i , k i , r) F (x i ,k i , −r) is compensated by an increase from Λ 2 to q 2 T in the integration volume over the transverse parton momenta k i ork i .

Factorization formulae
Let us now investigate the structure of the factorization formulae for ladder graphs. Still omitting spin and color indices for the moment, we write our result (5.3) as . (5.8) Figure 31. Ladder graphs contributing to the cross section for boson pair production at large q T .
The lower limit on the l + 1 integration reflects that the gluon with momentum l 1 − k 1 crosses the final-state cut and hence cannot have negative plus-momentum. Up to a numerical factor, the second expression in square brackets is just the transverse-momentum dependent double-parton distribution F (u 1 , x 2 , l 1 , k 2 , r) with u 1 = l + 1 /p + . The first factor in square brackets is invariant under boosts along z and can thus only depend on k 1 and l + 1 /k + 1 = u 1 /x 1 . We can write it as a numerical factor times k −2 1 P u 1 /x 1 , k 1 , where P is dimensionless. If P is a scalar it depends on k 1 only via k 2 1 and only because there are other dimensionful variables ζ and µ, which we have not displayed for ease of writing. For certain parton polarizations, P is a tensor with transverse indices and can hence depend on the components of k 1 , as discussed below.
We can finally Fourier transform (5.8) from r to y and then have where the factor 1/π has been chosen for convenience. Inserting this and its analog for F (x 1 ,x 2 ,k 1 ,k 2 , y) in the cross section formula (2.35), we have where we have used the δ function constraints in (2.35) to eliminate k 1 andk 2 . We can now approximate q 1 −k 1 ≈ q 1 and q 2 − k 2 ≈ q 2 , after which the integrations overk 1 and k 2 only concern the double-parton distributions, which are then integrated over both transverse-momentum arguments. The result is Only collinear two-parton distributions appear on the r.h.s., so that the relation (5.9) is only needed in the form Repeating the preceding arguments for the double-ladder graph, one obtains from the result (5.5) and for the contribution of figure 31b to the cross section, where again only collinear two-parton distributions appear. The analog of (5.12) for single-parton distribution reads At order α s the kernel P (x/u) is just the usual DGLAP splitting function, up to terms proportional to δ(1 − x/u) which will be discussed shortly. To see this, let us consider the k T integrated parton density defined with a naive cutoff, Using that the transverse-momentum dependent density depends only on the square of k we then have and comparing with the DGLAP equation for the l.h.s. we can identify the kernel in (5.15) as the familiar splitting function. The preceding argument is oversimplified in two respects. Firstly, the calculation of f (x, k) for large k only involves real graphs like those in figure 29 at leading order in α s , because to obtain a parton with large k one needs a recoiling parton in the final state. (Higher-order graphs can include virtual loops, so that our argument cannot be applied any more.) By contrast, the evolution equation for the collinear parton density f (x), which is integrated over all k, involves both real and virtual graphs at order α s . The latter give contributions proportional to δ(1 − x/u) to the DGLAP splitting kernels, which are absent from P (x/u) in (5.15). Secondly, the distribution f (x, k) must be defined with non-lightlike Wilson lines as discussed in section 3, which leads to a dependence on the parameter ζ defined in (3.58). Since the collinear distribution f (x) has no such dependence, it is the kernel in (5.15) that must depend on ζ. The explicit calculation in [117] shows that the ζ dependence of P (x/u) comes with a factor δ(1 − x/u), which in the light of our discussion in section 3.2.3 is plausible if one observes that the point x = u in (5.15) corresponds to infinite negative gluon rapidity in figure 29.
Let us mention that there is an analog of (5.15) for the distribution f (x, z) at small transverse distance z. The corresponding hard-scattering kernel differs again from P (x/u) by terms proportional to δ(1 − x/u). This is because f (x, z) is given by an integral d 2 k e izk f (x, k) over all k, so that already at order α s virtual graphs appear in addition to real ones.
We now turn our attention to the role of color in two-parton distributions at high transverse momentum, which we have glossed over so far. Since the graphs we are discussing do not connect parton lines with different momentum fractions x 1 and x 2 , the color coupling of the distributions on the left and on the right of the factorization formulae (5.12) and (5.13) are the same. For distributions 1 F (x i , k i , y) in the color singlet channel, the kernels P in (5.12) and (5.13) coincide with the one in (5.15) at least at leading order in α s , since the relevant hard-scattering graphs to be calculated are identical. The leading-order kernels for the color octet distributions 8 F (x i , k i , y) differ by an overall color factor from those for 1 F (x i , k i , y), which is the subject of the next section. Note that, unlike their color singlet analogs, the collinear color octet distributions appearing on the r.h.s. of (5.12) and (5.13) depend on ζ as discussed in section 3.5. Given our discussion of the cancellation or non-cancellation of soft contributions between real and virtual graphs in section 3.5, we expect that the kernels for the position space distributions 1 F (x i , z i , y) and 8 F (x i , z i , y) at small z i will differ by more than an overall color factor already at leading order. A systematic investigation of this is left to future work.
The power counting in section 5.1.1 and the discussion in the present section do not depend on whether the parton lines in the ladder graphs are quarks or gluons. As is well-known for single parton distributions, a quark with high transverse momentum can originate from a gluon with low transverse momentum and vice versa. The corresponding elementary ladder graphs are shown in figure 32 below.
We now discuss the spin structure of the ladder graphs. As we have seen in sections 2.2.1 and 2.2.2, there are three polarization combinations for each quark or gluon in a two-parton distribution, which we can choose as unpolarized (q, g), longitudinally polarized (∆q, ∆g) and transversely (δq) or linearly (δg) polarized. The possible transitions between these combinations in the factorization formulae (5.12) and (5.13) are restricted by symmetries. Transverse quark polarization δq is described by a chiral-odd operator q iσ +j γ 5 q and the chirality of light quarks is conserved in hard-scattering subprocesses, so that the only transitions for transversely polarized quarks are of the form δq → δq. In the longitudinally polarized sector one has all possible transitions between ∆q and ∆g on the left and on the right of (5.12) and (5.13), with transitions to other polarizations being forbidden by parity invariance. Likewise, one has all possible transitions between unpo-larized quarks and gluons. In addition, ladder graphs allow the transitions g → δg and q → δg from unpolarized collinear distributions to linearly polarized gluons at high k, as has been observed in the study of single Higgs production in [83,84]. Since δg corresponds to a helicity difference of two units between the gluon on the left and the gluon the right of the final-state cut, there is no collinear distribution for a single linearly polarized gluon in a proton, so that transitions δg → g and δg → q played no role in [83,84]. However, one finds that the corresponding hard-scattering kernels are nonzero and hence allow these transitions for two-parton distributions. One thus has all possible transitions between q, g and δg.
For distributions that involve polarizations δq or δg the kernels P in (5.12) and (5.13) are tensors with transverse Lorentz indices, constructed from δ jl and from the large transverse momentum k 1 or k 2 in the ladder. Explicit calculation at order α s shows that the kernel P jl δq δq for the transition from F l δq,a to F j δq,a (with arbitrary a) is proportional to δ jl . As a result, ladder graphs do not generate the distributions g s δq,δq and g a δq,δq in the decomposition (4.12) of F jj ′ δq,δq (x i , k i , y), given that they come with tensors that are absent in the collinear distributions F l l ′ δq,δq (x i , y) according to (4.14). This adds to the predictive power of the perturbative mechanism at large transverse momenta. For transitions involving linear gluon polarization we find kernels where the transition from F l l ′ δg,a to F jj ′ δg,a is described by P jj ′ ,l l ′ δg δg , the transition from F g,a to F l l ′ δg,a by P l l ′ δg g etc., and where k i is k 1 or k 2 . The second tensor in (5.18) is symmetric and traceless and describes two units of orbital angular momentum, which compensates the mismatch of helicities in the transitions g → δg, q → δg, δg → g and δg → q. For later use we note that terms involving this tensor vanish by rotation invariance when (5.9) or (5.13) is integrated over k 1 and k 2 .
The representation of the cross section derived in this section is based on a two-step procedure. In the first step we have used factorization to separate the annihilation processes qq → V into vector bosons V with mass or virtuality of order Q from transverse-momentum dependent two-parton distributions, in which the largest scale is q T . In a second step, we have used factorization to compute these distributions in terms of hard-scattering processes at scale q T and distributions that reflect physics at a hadronic scale Λ, where it turned out that in the cross section we only need the latter distributions integrated over k 1 and k 2 .
An alternative procedure is to first use factorization to represent the graphs in figure 31 as the product of collinear two-parton distributions and inclusive hard-scattering processes qq → V + X, where at lowest order in α s the unobserved system X consists of just one gluon. In a second step one can then simplify the corresponding hard-scattering kernels by taking the limit q T ≪ Q we are interested in. The relation between these two procedures has been studied in detail for single Drell-Yan production or for semi-inclusive deep inelastic scattering in [117] and [134][135][136][137]. An important property of the procedure using transversemomentum dependent distributions in a first step is that it permits the resummation of Sudakov logarithms with the method of Collins, Soper and Sterman [118]. Figure 32. Elementary ladder graphs for transitions between quarks and gluons.

Color factors and quark-gluon transitions
Let us now compute the color factors for the transitions from collinear to transversemomentum dependent two-parton distributions, restricting ourselves to the leading order in α s . For this it is sufficient to consider ladder graphs for the hard scattering, because graphs with eikonal lines as in figure 29b can be eliminated by choosing the gauge vA = 0.
Multiplying the color structure of the ladder graphs in figure 32 with the appropriate color matrix for the incoming partons, one obtains the following color transitions: Reversing the fermion lines in the ladder graphs changes the order of multiplication for the t matrices on the l.h.s. of the above relations. This leads to a change of sign on the r.h.s. for the transitions A g → 8 q and 8 q → A g, whereas transitions between 8 q and S g or between singlets are unchanged. This implies that the difference of distributions for q andq does not mix with gluons in the singlet or the symmetric octet channel but does mix with gluons coupled to an antisymmetric octet.
In the cases where there is mixing, the relation (5.12) for a double-parton distribution at large k 1 becomes a matrix equation, which can be written as where the arguments of F and P are as in (5.12). For polarizations δq and δg the distributions and kernels carry tensor indices, which were discussed in the previous section and will be omitted here. The matrix structure in (5.20) can be generalized to the relation (5.13) for double-ladder graphs, but the resulting expressions are rather cluttered with indices and will not be given here.
In the color singlet sector one has where n F is the number of quark flavors and where the second parton a may be an unpolarized or polarized quark, antiquark or gluon. 15 To obtain the color factors for the off-diagonal elements one must take into account the prefactors in the definitions (2.103), (2.118) and (2.123). In the upper left 2 × 2 submatrix of 1 P we recognize the structure of the mixing matrix in the usual DGLAP equations. Mixing in the symmetric octet sector involves the vectors If a indicates a gluon, one should replace 8 F q,a + 8 Fq ,a by S F q,a + S Fq ,a and 8 F q,a − 8 Fq ,a by A F q,a − A Fq ,a . The splitting matrices now read (5.24) 15 We note that in [47] the possibility of transitions between q, g and δg was overlooked, and only the mixing between q and g was considered.
There is no mixing for the combinations q [ 1 F q,a − 1 Fq ,q ], nor for the difference of distributions for two quark flavors, nor for matrix elements where the quark flavors differ on the two sides of the final-state cut as in figure 7. In these cases the behavior at large k 1 is described by (5.12) with P replaced by C F P qq for color singlet and by − 1 2N P qq for color octet combinations.
Experience with the usual parton densities tells us that gluons quickly dominate over sea quarks as one goes to momentum fractions below 0.1 (except possibly if one considers very low factorization scales). One therefore expects that for typical values of x 1 and x 2 at LHC, two-parton distributions at high transverse momentum are dominated by those combinations that can originate from gluons in (5.20).
Comparing (5.21) with (5.23) and (5.24) we find that the color factors are always smaller in the octet channels than in the singlet channel, with the biggest suppression occurring for P qq . In the large-N limit the singlet matrix 1 P has one eigenvalue N 2 P qq and two eigenvalues with color factors N for the submatrix in the g, δg sector. Both S P and A P have the same two eigenvalues, but with color factors N 2 instead of N , and another eigenvalue of order 1. One can hence expect a dominance of color singlet distributions for sufficiently large transverse momentum, which would significantly simplify the theoretical description and the phenomenology of multiple interactions. How strong the suppression of color octet channels is in given kinematics should, however, be studied quantitatively before drawing strong conclusions.
We have also calculated the color factors for higher color representations of gluon distributions, restricting ourselves to N = 3 as we did in (2.121). Mixing with quark distributions is of course not possible in this case. We find that the color factors for decuplet and antidecuplet distributions are zero, so that ladder graphs do not admit these color combinations, at least not at leading order in α s . For the 27 representation we obtain where a = g or δg. The color factor is equal to −1 and thus smaller in magnitude than the factors N or 1 2 N we have in the singlet and octet sectors, respectively. The color factors we have obtained agree with those given in [138], provided that one restores a missing factor 2/N in the expression of P 8f in eq. (54b) of that paper. 16 Our color factors for transitions in the gluon sector are also in agreement with eq. (A.6) in [140].
The splitting matrices for longitudinally polarized quarks and gluons in (5.20) are obtained from the upper left submatrices for unpolarized quarks and gluons in (5.21) to (5.24) by changing the kernels but keeping the color factors. Likewise, the splitting kernel P δq δq for transverse quark polarization comes with the same color factors as P qq . In both cases, it is again the color singlet sector that has the largest color factors and will therefore be enhanced at high transverse parton momentum.
Let us finally consider ladder graphs for the quark-antiquark interference distributions I represented in figure 5c. The color independent part of the splitting kernel is different Figure 33. Graphs for a quark-antiquark distribution that involve the perturbative splitting of one into two partons on both sides of the final-state cut. The power behavior refers to F (x i , k i , r) and is discussed in the text.
for distributions I and F because of their different spin structure (in one case a gluon is exchanged between a quark and an antiquark line and in the other case between two quark lines), but we shall not pursue this issue further here. We can, however, easily determine the color structure of the graphs. For definiteness, consider the exchange of a gluon between the two lines with momentum fraction x 1 and color indices j, j ′ . The color decomposition that remains invariant under this exchange is the one in (2.116), since For N = 3 we thus find a color factor 1 3 if the quarks are coupled to a sextet and − 2 3 if they are coupled to an antitriplet. Both factors are smaller than C F = 4 3 in the color singlet channel.

Parton splitting at high transverse parton momenta
We now turn to another mechanism that generates large transverse momenta in multiparton distributions: the perturbative splitting of one parton into two partons, both of which subsequently take part in hard-scattering processes. This mechanism turns out to be enhanced by powers of q T /Λ in the cross section. In addition, it leads to conceptual issues concerning the very notion of multiparton interactions. In the following sections we derive several results about the splitting mechanism, but we will be left with a number of open questions for future research.

Power behavior
There are a multitude of graphs involving the splitting of one parton into two partons, and in order to assess their importance we use power counting as our first guiding principle. Simple examples for parton splitting are shown in figure 33. They allow all transverse momenta k 1 , k 2 and r to be large. This leads to large virtualities for k 1 , k 2 and r, so that their minus components are large as well, At lowest order in α s one has disconnected graphs as in figure 33a, which leads to the kinematic constraint We will see shortly that this constraint plays a special role when two-parton distributions are combined to calculate a cross section. The power behavior of graph 33a can be determined using the same method as in section 5.1.1, and we have where the power counting of integration volumes is dr − ∼ dk − 1 ∼ q 2 T /p + and dk − 2 ∼ Λ 2 /p + in order to fulfill the constraint (5.28) (5.29) is proportional to the transverse-momentum dependent distribution of a gluon in the proton.
Starting at order α 2 s one has connected graphs as in figure 33b. The restrictions (5.28) are then lifted, and we obtain a power behavior The graphs just discussed describe the transition from two to four partons in the t channel. Let us compare them with transitions starting from three or four partons. The corresponding graphs admit a variety of topologies, and we shall not give a comprehensive treatment here. Important examples are shown in figures 34 and 35. They are subject to different kinematic restrictions: • graphs 34a and 35a require |k 1 + k 2 | ∼ Λ and are thus analogous to 33a, • in analogy to 33b, graphs 34b and 35b produce partons with unconstrained transverse momenta, • in graphs 34d and 35d one must have |k 1 + 1 2 r| ∼ Λ since the rightmost quark line is disconnected, • graphs 34c and 35c are subject to both constraints |k 1 + k 2 | ∼ Λ and |k 1 + 1 2 r| ∼ Λ.
The power behavior of the resulting two-parton distributions can be obtained by the same methods as previously and is given in the figures. We see that within a given kinematic group, the graph with the smallest number of partons initiating the hard scattering is dominant by power counting, i.e. two partons in cases a and b, and three partons in cases c and d. The graph with the leading power behavior also has the lowest power of α s . Figure 34. Example graphs for the transition of three to four partons in the t channel and the corresponding power behavior of F (x i , k i , r). They involve the splitting of one into two partons only on one side of the final-state cut.
c d Figure 35. As figure 34, but for the transition of four to four partons in the t channel.
We note that for graphs starting with four partons there are topologies leading to different kinematic constraints, such as those in figure 36. We shall not discuss these in the following. So far we have assumed that the transverse momenta k 1 , k 2 , r are all large. However, the graphs we have discussed remain under perturbative control in more restricted kinematics as well. In graph 33a for instance, we need large transverse momenta for the four upper parton lines, i.e. large k 1 ± 1 2 r (recall that k 2 ≈ −k 1 for this graph). This allows either r or k 1 to be small, as long as the other is large. Both configurations will be important in our further discussion. The power behavior we have derived above remains unchanged in those kinematic regions, as we shall see explicitly in section 5.2.2.
Cross section. Let us now see how the different contributions to multiparton distributions enter the cross section for large q 1 and q 2 . Taking the lowest-order parton splitting Figure 36. Ladder graphs with the kinematic constraint |k 1 − k 2 | ∼ Λ and the resulting power behavior of F (x i , k i , r).
contribution of figure 33a for both protons, we arrive at the graph in figure 37a. Both k 1 + k 2 andk 1 +k 2 are restricted to be of order Λ in this case, which implies In other words, the produced bosons must be almost back to back in transverse momentum. To determine the power behavior of the cross section, we note that the integration element d 2 k 1 d 2k 1 δ (2) (q 1 − k 1 −k 1 ) scales like q 2 T since k 1 can be freely chosen of size q T . Once this choice is made, k 2 can only differ from −k 1 by an amount of order Λ, so that d 2 k 2 d 2k 2 δ (2) (q 2 − k 2 −k 2 ) scales like Λ 2 . The corresponding constraint onk 1 +k 2 is then automatically fulfilled by virtue of (5.31). With d 2 r ∼ q 2 T and the scaling behavior (5.29) the cross section formula (5.6) then gives Going one order higher in α s , one has graphs as in figure 37b with a connected two-to-four parton transition on one side. The constraint (5.31) is then lifted, and q 1 and q 2 can be chosen independently. The integration elements in the cross section formula scale as in the previous case, but due to the stronger falloff in q T in (5.30), one now has At yet higher order in α s one obtains the same power behavior if both two-parton distributions contain a connected two-to-four parton transition: the extra factor Λ 2 /q 2 T from the two-parton distribution is compensated by an increase from Λ 2 to q 2 T in the loop phase space, since both k 1 and k 2 can then be chosen independently of order q T .
We note that both (5.32) and (5.33) contribute at the same power of Λ 2 /q 2 T if one integrates the cross section over q 1 and q 2 in a region of size q T . This is because the contribution (5.32) has a restricted phase space of order d 2 q 1 d 2 q 2 ∼ Λ 2 q 2 T . In the differential cross section, however, the contribution (5.32) gives a peak in the distribution of q 1 + q 2 , which is enhanced not only by a power of α s but also by q 2 T /Λ 2 . There are more contributions to the cross section with the same power behavior as the one we have just encountered. We recall from our discussion in section 2.4 that in the differential cross section, double parton scattering has the same power behavior as Figure 37. Graphs for the cross section with two-to-four parton transitions in the t channel for both colliding protons. Graph a only contributes when |q 1 + q 2 | ∼ Λ.
a b Figure 38. Graphs for the cross section with two-to-three parton transitions in the t channel for both protons. a b Figure 39. Graphs for the cross section with a single hard scattering.
the interference of two hard scatters in the amplitude with a single hard scatter in the conjugate amplitude (see graph 9a). If the two partons initiating the two hard scatters in the amplitude come from the splitting of a single parton, we have graphs like in figure 38. 17 For graph 38a one finds the same scaling behavior (5.32) as for graph 37a, and for graph 17 The single hard scattering to the right of the final-state cut proceeds through a loop in our example, because gluons have no direct coupling with electroweak gauge bosons. Other processes, like the production of two dijets, can proceed already at tree level. The powers of αs in our example are thus not representative of the generic case, whereas powers of Λ/qT are.
38b one finds the same scaling behavior (5.33) as for graph 37b. The same power behavior is again found for the case where the two gauge bosons are produced in a single hard scatter, both in the amplitude and its conjugate. The corresponding cross section formula can be found in (2.42). For incoming gluons the hard scattering proceeds through a loop as on the r.h.s. of graphs 38 a and b, whereas for incoming quarks or antiquarks one has graphs as those in figure 39. If |q 1 + q 2 | ∼ Λ then one needs no parton exchange of virtuality q T in any of the parton densities, and one immediately finds the same power behavior as in (5.32). If |q 1 + q 2 | ∼ q T then at lowest order in α s one parton distribution has large transverse momentum generated by a ladder graph as shown in fig. 39b. According to (5.15) one has f (x, k) ∼ α s /q 2 T for |k| ∼ q T , and for the cross section one obtains the same power behavior as in (5.33).
Adapting our discussion at the end of section 5.1.2 we see that the graph in figure 39b can be calculated either in terms of transverse-momentum dependent parton densities, one of which involves a ladder graph, or in terms of collinear parton distributions and the parton-level process qq → V 1 V 2 + g, where V 1 and V 2 denote the produced vector bosons. The result is the same in both cases.
In a similar way, figures 37 and 38 can be interpreted as graphs for two-boson production by a single hard scattering process at one-loop level, namely by gg → V 1 V 2 for graphs a and gg → V 1 V 2 + g for graphs b. The quark lines in each loop are then typically off-shell by order Q, which is the hard scale set by the final state. Note that this differs from the case when one interprets the same graphs as representing double hard scattering (figure 37) or the interference of double and single hard scattering ( figure 38). In that case the quark lines in the loops (except for those on the r.h.s. of graphs 38a and b) are understood to have typical virtualities of order q T , which allows one to treat them as incoming on-shell partons in the tree-level subprocesses qq → V , whose large scale is Q. Detailed inspection of the quark loops in figure 37 shows that they receive contributions with the same scaling behavior in q T /Q from the two regions where all quark virtualities are either of order q T or of order Q. 18 One thus obtains the same power behavior for the graphs in each of the two interpretations just discussed. The interpretation in terms of a single hard-scattering process producing two gauge bosons for graphs 37a, 38a and 39a and two gauge bosons plus a gluon for graphs 37b, 38b and 39b makes it clear that each group of graphs has the same scaling behavior, respectively given by (5.32) and (5.33).
In section 5.1.1 we found that ladder graphs as in figure 31 contribute to the scaled cross section with a power Λ 2 /q 4 T , with no distinction between the cases where q 1 + q 2 is of order Λ or q T . This means that the contributions of figures 37, 38 and 39 are enhanced over the ladder graphs by q 2 T /Λ 2 for |q 1 + q 2 | ∼ q T and by q 4 T /Λ 4 for |q 1 + q 2 | ∼ Λ. As we already discussed in section 2.5, one can however expect that at small x 1 and x 2 this enhancement is counteracted by the stronger rise of the ladder graphs with decreasing parton momentum fractions, since the ladder contributions involve two-parton distributions, whereas the graphs in figures 37, 38 and 39 depend on single-parton distributions. Whether this small-x enhancement is more important than powers of q 2 T /Λ 2 cannot be determined on generic grounds, so that one will want to keep both types of contribution in a flexible theoretical description.
With this in mind, we now turn our attention to the graphs in figures 40 and 41, which involve parton splitting and thus single-parton distributions for one proton but a twoparton distribution for the other. In the graphs of figure 40 the two-parton distributions in one proton force r to be of order Λ, whereas in figure 41 an additional gluon exchanged between partons with momentum fraction x 1 and x 2 allows r to be of order q T .
The corresponding power behavior is readily obtained from our results for the relevant parton distributions (given in figures 30, 33 and 35) and the available loop phase space in each graph. We find and the analogous scaling behavior with an extra power of α s for the graphs in figure 41.
For |q 1 + q 2 | ∼ q T , we thus find the same behavior as for the ladder graphs in figure 31, which involve however two two-parton distribution in the cross section and therefore have a stronger small-x enhancement. In the region |q 1 + q 2 | ∼ Λ we have an extra power of q 2 T /Λ 2 , as in the other parton splitting contributions discussed so far. a b Figure 42. Graphs with three partons in the t channel for one proton (a) or for both (b).
In figure 34 we have graphs initiated by proton matrix elements with three partons in the t channel. Examples for their contribution to the cross sections are given in figure 42. They behave as The contribution from graph 42a is thus suppressed compared with the one from graph 37a, although only by a power of Λ/q T , which corresponds to the loss of one power Λ/q T between the splitting graphs initiated by two or three partons in the t channel (cf. figures 33a and 34c). Likewise, graph 42b is suppressed by Λ 2 /q 2 T compared with graph 37a, thus having the same power behavior as graph 40a, but lacking the small-x enhancement of the latter. An analogous situation is found for graphs that are like in figure 42 but have an extra gluon across the final-state cut (constructed e.g. from graphs 33b or 34d) and thus contribute to the region |q 1 + q 2 | ∼ q T . To the extent that one power of Λ/q T is a small enough suppression parameter and that the small-x enhancement of four-parton matrix elements is important, one can hence neglect contributions involving three-parton matrix elements. Our results for the power behavior of the different contributions are collected in table 1.
The contributions in the first three rows of the table were recently investigated in [98]. It was pointed out in that work that the 2 × 4 contribution in our table has a further enhancement compared with the 4×4 term, which is due to the fact that the latter involves the product F (x i , r)F (x i , −r) of two distributions that decrease with r, whereas the former involves only one factor F (x i , r) multiplied by the perturbative splitting contribution that is approximately r independent for |r| ∼ Λ.
To compare our results with those in [98] we note that |q 1 + q 2 | ∼ δ ′ was required to be in the perturbative domain in that work, whereas we treat it as comparable to a soft scale. Our power counting results apply to this case as well as far as the q T behavior is concerned, if one understands Λ as either |q 1 + q 2 | or a generically soft scale, without the ability to distinguish between them. What is important for our results is the hierarchy Λ ≪ q T ≪ Q, which in the notation of [98] reads δ ′ ≪ δ ≪ Q. The fact that in [98] four-jet production rather than the double Drell-Yan process was studied does not prevent us from a partons example graphs power behavior (t channel) Table 1. Power behavior to the scaled cross section s 2 dσ/(dx 1 dx 2 dx 1 dx 2 d 2 q 1 d 2 q 2 ) from various contributions. An entry m × n in the first column means that the cross section involves matrix elements with m and n partons in the t channel for the first and the second proton, respectively.
comparison since, as we pointed out earlier, our power counting results hold independently of the particular hard-scattering processes. We agree with [98] that the 2 × 4 and the 4 × 4 contributions to the cross section respectively behave like 1/q 2 T and 1/q 4 T , and that the 2 × 2 contribution does not have a 1/q 2 T behavior but depends logarithmically on q T . However, the authors of [98] write that the 2 × 2 term is comparable to the 2 × 4 and 4 × 4 terms. We emphasize that the 2 × 2 contribution goes like 1/Λ 2 in the scaled cross section and is therefore power enhanced compared with the other two contributions for q T ≫ Λ. This comes out of our power counting analysis and is confirmed by explicit calculation, see (5.74) below. What can potentially make the 2 × 4 and 4 × 4 terms more important is their small-x behavior, as we noted above.

Splitting in two-parton distributions
After the general analysis in section 5.2.1 we now investigate splitting contributions to two-parton distributions in more detail. We begin with the graph in figure 33a, which describes the splitting process g → qq.
From the color structure of the graph we readily find that it gives rise to color octet distributions that are suppressed compared with the color singlet ones by a factor The color singlet distributions are given by where α and β are polarization indices of the gluon potentials in the correlation function Φ g , whose definition follows from (2.96). Φ g is already summed over the gluon color indices, and the corresponding trace over color matrices has given a factor 1 2 . As discussed in section 2.2.2, α and β are restricted to be transverse at leading-power accuracy. The second and third line of (5.37) represent the hard part of the process, where we can neglect the difference between the transverse and minus-components of k 1 and k 2 , see (5.29). We introduce and change integration variables from k − 1 and k − 2 to k − and κ − . The integration over κ − only concerns the gluon correlation function Φ g , which can be decomposed as [82] xp + dκ − Φ g,jj ′ (κ) where M denotes the proton mass. In terms of the operators introduced in (2.98) we have f g 1 is the usual transverse-momentum dependent density of gluons, whereas the gluon Boer-Mulders function h ⊥g 1 describes linearly polarized gluons and is essentially unknown at present (see [85][86][87] for processes where this distribution could be studied). Writing the product of propagator denominators in (5.37) as we see that the integrations over r − and k − can conveniently be performed using the theorem of residues, after a change of variables to (k − 1 2 r) − and (k + 1 2 r) − . Performing the fermion trace, we finally obtain With the abbreviationū = 1 − u the kernels read T l l ′ q,q (u) = −T l l ′ ∆q,∆q (u) = δ l l ′ (u 2 +ū 2 ) , T l l ′ δq,δq (u) jj ′ = −2δ l l ′ δ jj ′ uū (5.43) and U l l ′ mm ′ q,q (u) = −U l l ′ mm ′ ∆q,∆q (u) = −2τ l l ′ ,mm ′ uū , U l l ′ mm ′ δq,δq (u) jj ′ = 2τ l l ′ ,j ′ m ′ δ jm u + 2τ l l ′ ,jm ′ δ j ′ mū − 2τ l l ′ ,mm ′ δ jj ′ uū , (5.44) where j and j ′ are the indices of the Dirac matrices iσ +j γ 5 in the definition of the distributions. The kernels T and U not listed in (5.43) or (5.44) are zero. Note that k − 1 2 r is half the difference between the transverse parton momenta k 1 − 1 2 r and k 2 + 1 2 r on the left of figure 33a, whereas k + 1 2 r is half the corresponding momentum difference on the right. To ensure that the quark lines with momenta k 1 ± 1 2 r and k 2 ± 1 2 r in figure 33a are far off-shell it is sufficient that one of the transverse momenta r and k is large, as already mentioned earlier. This implies that (5.42) describes the large r behavior of F a 1 ,ā 2 (x i , k i , r) for small k 1 and k 2 , as well as its behavior for large k at small r.
We see that the short-distance splitting process gives rise to a rich spin structure, with all chiral even two-parton distributions being nonzero. The relations F q,q = −F ∆q,∆q and F ∆q,q = −F q,∆q reflect that the perturbative gluon splitting leads to a 100% correlation between the helicities of the quark and antiquark: if the quark has positive helicity the antiquark has negative one, and vice versa. For values of u around 1 2 , the transverse spin correlation encoded in F δq,δq is as large as the unpolarized distribution F q,q .
The splitting contributions to other two-parton distributions are obtained in close analogy to the case we have just discussed, and in the following we only give the relevant starting expressions and results. A reader not interested in the details may skip forward to the paragraph after equation (5.60).
The graph in figure 33a also contributes to the interference distributions I a 1 ,ā 2 , with the same ratio 8 I a 1 ,ā 2 1 I a 1 ,ā 2 = −1 √ N 2 − 1 of octet and singlet distributions as in (5.36). The expression for 1 I a 1 ,ā 2 can be obtained from the one in (5.37) by interchanging (k 2 − 1 2 r)γ and (k 1 + 1 2 r)γ in the fermion trace. The result has the same structure as in (5.42), with the kernels T a 1 ,ā 2 replaced by and the kernels U a 1 ,ā 2 replaced by All other kernels are zero. We see that the splitting contribution to the interference distributions I a 1 ,ā 2 is generically of the same size as for the distributions F a 1 ,ā 2 . We now turn to the analog of figure 33a for the splitting process q → gq. This graph (not shown here for brevity) involves propagators for the outgoing gluons and requires a choice of gauge. If we work in the light-cone gauge An = A + = 0 with n = (1, 0, 0, −1)/ √ 2, then the gluon propagator has a numerator D αβ (ℓ) = −g αβ + n α ℓ β + ℓ α n β ℓ + (5.47) and the q → gq splitting contribution to quark-gluon distributions reads Since j is a transverse index, the numerator factor of the first gluon propagator simplifies to −g αj + n α (k 1 − 1 2 r) j /(k 1 − 1 2 r) + . If we work in covariant gauge instead, these two terms correspond to the first two terms of the gluon field strength G +j = ∂ + A j − ∂ j A + + O(g) in the operator definition of the quark-gluon distribution. An analogous statement holds for the second gluon propagator. The expression (5.48) involves the quark correlation function Φ q for an unpolarized proton, for which one has to leading-twist accuracy, or equivalently The q → gq splitting process gives rise to all possible color couplings in the quark-gluon distribution in (2.123), with color factors Contrary to the case of g → qq analyzed above, the splitting mechanism now favors color octet distributions over color singlet ones. Evaluating (5.48) we obtain with T l l ′ g,q (u) = δ l l ′ (1 +ū 2 )/u , T l l ′ ∆g,∆q (u) = δ l l ′ (1 +ū) , T l l ′ ∆g,q (u) = −iǫ l l ′ (1 +ū 2 )/u , T l l ′ g,∆q (u) = −iǫ l l ′ (1 +ū) , T l l ′ δg,q (u) jj ′ = 2τ l l ′ ,jj ′ū /u (5.53) and U l l ′ m g,δq (u) k = 2δ l l ′ δ kmū /u , All other kernels are zero, in particular F δg,∆q is not generated by the splitting mechanism at leading order in α s . Analogous results can be derived for the splittingq → gq.
The splitting of one gluon into two gives a contribution to two-gluon distributions, which reads in the gauge A + = 0. Evaluating this expression, we obtain a result with the same structure as for g → qq in (5.42), 1 and The kernels T ∆g,g , T δg,g , U δg,g and U δg,∆g are respectively obtained from T g,∆g , T g,δg , U g,δg and U ∆g,δg by interchanging u ↔ū and the appropriate indices. The remaining kernels are zero. For the different color combinations we find where as in the case q → gq color octet distributions are enhanced over color singlet ones. The factors for the higher color representations in the case N = 3 are 10 F a 1 ,a 2 g→gg = 10 F a 1 ,a 2 g→gg = 0 , The 27 representation is hence even more strongly enhanced than the two color octet combinations. Decuplet and antidecuplet distributions are not generated by perturbative splitting at lowest order. We recall that this was also the case for the ladder graphs discussed in section 5.1.3. We see that the perturbative splitting mechanism gives rise to a multitude of twoparton distributions at high transverse momentum, which we have collected in table 2. As the comparison of (5.42), (5.52) and (5.56) shows, a common feature of all channels is the dependence on the transverse momenta k and r.
Position space. The Feynman graphs for the splitting contributions are naturally evaluated in momentum representation. We now transform our results to position space. We restrict our attention to the splitting g → qq since the other distributions can be treated in close analogy. Using the relation of the MacDonald functions we obtain For the factor appearing in F q,∆q and F ∆q,q one finds in a similar fashioñ For small k 2 y 2 we can approximate the MacDonald functions and perform the integral over t, which gives where γ is the Euler number. In the short-distance limit y 2 → 0 we thus have a logarithmic divergence in D(k, y) and hence in the distributions F q,q (x i , k i , y), F ∆q,∆q (x i , k i , y) and F δq,δq (x i , k i , y).

Contribution to the cross section
We now investigate how the splitting contribution to quark-antiquark distributions in figure 33a enters in the cross section for double hard scattering, as shown in figure 37a.
Concentrating on the factors that depend on transverse momenta, we have with the cross section formula (2.33) and the distributions from (5.42) Each integral is infrared finite but has a logarithmic divergence at large k ± . This logarithmic divergence also appears if we use the impact parameter space representation (5.62). According to (2.36) the cross section is then proportional to The last line diverges logarithmically for y = 1 2 z and y = − 1 2 z. At these points one respectively has y − 1 2 z 1 = − 1 2 z 2 and y + 1 2 z 1 = 1 2 z 2 , so that the singularities correspond to configurations where partons are at the same transverse position, either to the right or to the left of the final-state cut.
To understand the origin of this ultraviolet divergence, we go back to the graph in figure 37a. As mentioned in section 5.2.1 this graph receives leading contributions from two kinematic regions. In the first region, the virtualities and transverse momenta of the quarks are of order q T and thus much smaller than Q, whereas in the second region they are of order Q. The approximations that are necessary to derive factorization for double hard scattering are only valid in the first region. However, the integrand in (5.71) does not decrease fast enough with k ± = 1 2 (k 1 − k 2 ± r) to suppress the second region, so that the factorization formula (2.33) requires a suitable regularization in order to remove contributions from that region. A corresponding statement holds in the position space formulation.
A simple way to regularize the cross section formula in impact parameter space is to impose a lower cutoff 1/µ 2 on (y + 1 2 z) 2 and (y − 1 2 z) 2 . The integral in the last line of (5.72) then becomes times the same expression with l → l ′ and m → m ′ . The integral in (5.73) behaves like log µ/|q| for µ ≫ |q|. The µ dependence of the cross section obtained in this way must cancel when one adds the contribution from figure 37a in the region of transverse loop momenta of order Q. That region is naturally associated with single hard scattering as discussed in section 5.2.1. At this point, one must obviously be careful to avoid double counting between the parts of the graph that one associates with single or with double hard scattering. The analogous double counting problem in multijet production has been pointed out in [141].
To use a cutoff in (5.72) is of course rather ad hoc, and there should be better ways to construct a consistent factorization scheme in which the formula for double-parton scattering has a controlled ultraviolet behavior and in which the double counting problem is properly taken care of. One may for instance think of subtracting the perturbative splitting contribution of figure 33 at large momenta or small transverse distances in the definition of the two-parton distributions, so that graphs like in figure 37 are not included in double hard scattering at all. To solve this issue is a nontrivial task and must be left to future work.
We already remarked that the integrals in (5.71) are finite in the infrared. This is due to the numerator factors and can be understood in simple physical terms, as noted in the detailed analysis given in [142]. The points where one of the four momenta k + , k + − q, k − or k − − q vanishes correspond to configurations where one of the four g → qq splitting processes in figure 37a proceeds in strictly collinear kinematics. The amplitude for the collinear splitting g → qq is zero because an on-shell gluon has helicity ±1, whereas the helicities of q andq add up to zero due to chirality conservation for massless quarks.
Referring to the end of section 5.2.1 we finally determine the dependence of (5.70) on q T and on Λ. With |q 1 + q 2 | ∼ Λ the second line scales like 1/Λ 2 , and with the behavior of the third line just discussed we find where µ is an ultraviolet cutoff much larger than q T .

Parton splitting in collinear distributions
The results of the previous section are relevant not only for transverse-momentum dependent two-parton distributions but also for collinear ones. As we have seen, collinear two-parton distributions appear in transverse-momentum integrated cross sections and in cross sections at perturbatively large q T via the ladder graphs discussed in section 5.1.
Since k 1 and k 2 are not fixed in collinear distributions, the splitting contributions we computed in section 5.2.2 are relevant for F (x i , r) at large r and, after Fourier transform, for F (x i , y) at small y.

Ultraviolet behavior
Integrating (5.42) over k 1 and k 2 , i.e. over k and κ, one formally obtains where the integration over κ gives the collinear gluon distribution f g 1 (x 1 + x 2 ), whereas the term with h ⊥g 1 disappears due to rotation invariance. In the case where T l l ′ a 1 ,ā 2 ∝ δ l l ′ , i.e. for F q,q , F ∆q,∆q and F δq,δq , the integral over k is ultraviolet divergent. The corresponding integrals of F q,∆q and F ∆q,q are proportional to ǫ l l ′ r l r l ′ and hence vanish, as they must according to the constraint (4.13) from parity invariance. An analogous discussion can be given for the interference distributions I a 1 ,ā 2 and for the distributions resulting from the splitting processes q → gq or g → gg. In all cases, contributions going with the Boer-Mulders functions h ⊥q 1 or h ⊥g 1 vanish after integration over κ and one is left with contributions from the unpolarized distributions f q 1 or f g 1 . With the kernels T or V given in (5.43), (5.45), (5.53) and (5.57) we find that the splitting mechanism generates nonzero collinear two-parton distributions F q,q , F ∆q,∆q , F δq,δq , I q,q , I ∆q,∆q , I δq,δq , F g,q , F ∆g,∆q , F δg,q , F g,g , F ∆g,∆g , F g,δg , F δg,δg , (5.76) as well as the distributions obtained by interchanging the first and second subscripts in (5.76) or by replacing quarks with antiquarks in F g,q and its polarized counterparts. With the exception of F δg,∆q (and F ∆q,δg , F δg,∆q , F ∆q,δg ) these are indeed all collinear distributions that are allowed by parity invariance and that are chiral even. For the distributions depending on polarization indices we have F jj ′ δq,δq ∝ I jj ′ δq,δq ∝ δ jj ′ , F jj ′ ,kk ′ δg,δg ∝ τ jj ′ ,kk ′ , F jj ′ δg,q ∝ F jj ′ g,δg ∝ 2τ jj ′ ,l l ′ r l r l ′ = 2r j r j ′ − δ jj ′ r 2 . (5.77) To further investigate the ultraviolet divergence mentioned below (5.75) we focus on F q,q for definiteness. Since D(k, r) in (5.64) falls off as 1/k 2 for fixed r and as 1/r 2 for fixed k, one obtains logarithmic divergences if one integrates over one or both of these variables. To regulate these divergences one may work in 4 − 2ǫ dimensions. The result for F q,q (x i , k i , r) is then the same as in (5.42) with a modified kernel T l l ′ q,q (u; ǫ) = δ l l ′ u 2 + (1 − u) 2 − ǫ . which is zero due to rotation invariance. We note that integrating over k 1 , k 2 and r puts all four fields in the matrix element defining F q,q at the same transverse position, so that one obtains a twist-four operator. If (5.79) were not zero but finite after subtraction of the logarithmically divergent pieces, the graph in figure 33a would contribute to the scale evolution of a twist-four distribution. The vanishing of (5.79) thus reflects the fact that distributions of twist four and of twist two (the collinear gluon distribution in (5.75)) do not mix under evolution. The same zero result is obtained in any regularization scheme that respects rotational invariance.
Integrating over k at fixed nonzero r and using the integral representation in (5.64), one obtains µ 2ǫ d 2−2ǫ k D(k, r) = π 1−ǫ Γ 2 (1 − ǫ) Γ(1 − 2ǫ) Γ(ǫ) r 2 µ 2 −ǫ = π 1 ǫ + log µ 2 r 2 + const. + O(ǫ) . This contains an ultraviolet pole and an associated logarithm of the renormalization scale µ 2 . The value of the constant is not of relevance for our discussion. If r = 0 then the integral on the l.h.s. is scaleless and therefore vanishes. To isolate the ultraviolet singularity in that case, one can for instance give a small mass to the quarks. Up to corrections of order m 2 this leads to µ 2ǫ d 2−2ǫ k 1 k 2 + m 2 = π 1−ǫ Γ(ǫ) (5.81) The ultraviolet pole and the associated logarithm are hence the same as for nonzero r.
If one defines the collinear distribution F q,q (x i , r) = d 2 k 1 d 2 k 2 F q,q (x i , k i , r) in the MS scheme, the above 1/ǫ pole is subtracted, together with a constant. To leading order in α s one finds for the scale dependence 19 where P q,g (u) = α s 2π is the familiar DGLAP splitting function (now including a color factor T R = 1/2, unlike the function P qg we used in section 5.1.3). We come back to this in the next section. Let us note that with the results in (5.52), (5.53) and (5.56), (5.57) we obtain relations analogous to (5.82) for F g,q and F g,g . On the r.h.s. of these relations we respectively find the DGLAP splitting functions P g,q (u) and P g,g (u), except for terms proportional to δ(1 − u) in P g,g (u). Quite interestingly, the situation changes if we consider F q,q (x i , y) instead of F q,q (x i , r). The Fourier transform of (r 2 ) −ǫ in 2 − 2ǫ transverse dimensions is d 2−2ǫ r e −iry (r 2 ) −ǫ = 4 1−2ǫ π 1−ǫ Γ(1 − 2ǫ) Γ(ǫ) (y 2 ) −1+2ǫ , (5.84) which can be seen by writing (r 2 ) −ǫ = Γ −1 (ǫ) ∞ 0 dα α ǫ−1 e −αr 2 , performing the integral over r and then the one over α. The factor Γ(ǫ) responsible for the ultraviolet divergence in (5.80) is thus canceled if one Fourier transforms from r to y, and the result is finite for ǫ = 0, The 1/y 2 behavior can be obtained directly in 4 dimensions by setting z = 0 in (5.62). We thus find that F q,q (x i , r) requires an ultraviolet subtraction for the graph in figure 33a, whereas F q,q (x i , y) does not. Let us see what we obtain if we define a modified y 19 We note that both αs and the gluon distribution f g in (5.75) also have a scale dependence, which becomes relevant at order α 2 s in the evolution equation.
dependent distribution as the Fourier transform of the ultraviolet subtracted distribution F q,q (x i , r), F mod q,q (x i , y) = d 2 r (2π) 2 e −iry F q,q (x i , r) . (5.86) We have F q,q (x i , r) g→qq = f (x 1 , x 2 ) π log µ 2 r 2 + g(x 1 , x 2 ) , (5.87) where the explicit expressions of f and g are easily obtained but not relevant for our discussion. As shown in appendix A the Fourier transform of (5.87) gives where 1 y 2 with b 0 = 2e −γ . We thus find the same y dependence in (5.88) and (5.89), up to terms concentrated at the singular point y = 0. This is not surprising since the ultraviolet divergent term in (5.80) is independent of r. In section 5.1.1 we have shown that the contribution of ladder graphs at large y to the cross section involves an integral d 2 y F (u i , y)F (ū i , y) (5.91) according to (5.11) and (5.14). Since the collinear two-parton distributions behave like 1/y 2 at small y, the above integral has a linear divergence for small y 2 and is hence not defined as it stands. A corresponding linear divergence is found if one Fourier transforms from y to r, where according to (5.87) the distributions behave like log(r 2 /µ 2 ) for large r. The ultraviolet subtraction already included in the definition of F (u i , r) is hence not sufficient to render the integral in (5.92) finite. The reason for the unphysical divergences in (5.91) and (5.92) is that the cross section formulae containing these integrals have been derived for the region where 1/y 2 or r 2 is much smaller than q 2 T . We thus encounter a similar problem as in section 5.2.3, with the difference that the divergence to be regulated is now linear instead of logarithmic. To make the cross section formulae (5.11) and (5.14) well-defined, one must either remove or suppress the y integral in the region where |y| is not large compared with 1/q T , or one must define F (x i , y) such that in this region the contribution from perturbative splitting as in figure 33 is subtracted. Along with such a procedure, one must provide a prescription for evaluating the splitting contribution at small |y| in such a way that there is no double counting, as discussed in section 5.2.3.
Integrating the cross section (2.91) over q 1 and q 2 , we readily obtain the integral in (5.91) with u i = x i andū i =x i . The discussion of the previous paragraph carries over to that case, with the difference that the requirement for the validity of the cross section formula is then |y| ≫ 1/Q instead of |y| ≫ 1/q T . For the corresponding momentum integral (5.92) with u i = x i andū i =x i one must require |r| ≪ Q instead of |r| ≪ q T . We note that in [143] it was proposed to regulate this integral by imposing an upper cutoff r 2 < min(q 2 1 , q 2 2 ). By itself this is clearly insufficient to obtain a reliable result, since the contribution from r 2 outside that region is large and needs to be evaluated as well.

Scale evolution
Let us now investigate the scale evolution of collinear two-parton distributions. We focus on the color-singlet combinations 1 F , which are most closely related with single-parton densities as we already saw in section 3.5. For definiteness we consider the quark-antiquark distribution 1 F q,q , which we studied extensively in the previous section. The generalization to other parton and polarization combinations is straightforward.
The dependence on the scale µ of collinear parton distributions arises from the regularization and subtraction of ultraviolet divergences in their definition. This involves divergences from self-energy graphs (which also occur in transverse-momentum dependent distributions and can be expressed in terms of suitable Z factors) and divergences from regions of large transverse parton momenta. For a single-parton distribution the contribution from the high-transverse-momentum tail was already discussed in section 5.1.2.
The scale dependence in the collinear distributions 1 F (x i , y) arises from self-energy graphs and in addition from the ladder graphs in figure 30, which according to (5.12) and (5.13) give rise to ultraviolet divergent integrals d 2 k 1 d 2 k 2 1 F (x i , k i , y) unless one performs suitable subtractions. Since the ladder and self-energy graphs have exactly the same structure as for single-parton distributions, the corresponding evolution equation reads for a quark-antiquark distribution. The splitting functions P a,b now include the contributions from virtual corrections, unlike the corresponding kernels in section 5.1.3. Note that the labels b 1 and b 2 do not take the value δg here, because the corresponding kernels P q,δg and Pq ,δg vanish due to rotation invariance, see our remark below (5.18).
, (5.94) where the Wilson line W [ξ ′ , ξ] is defined in (3.113) and where { . . . } ren,µ indicates that each bilinear operatorq W q is renormalized at scale µ in the same way as for single-parton distributions. As long as the transverse distance y between the two bilinear operators remains finite, no further ultraviolet divergences appear, and one has the product of two renormalized twist-two operators. As remarked earlier in the literature, one may choose different renormalization scales µ 1 and µ 2 for the two operators, which appears useful when one has two hard-scattering processes with rather different hard scales. The separate evolution equations in µ 1 and µ 2 are then simply the usual ones with a single DGLAP kernel.
For the collinear distributions 1 F a 1 ,a 2 (x i , r) that depend on the relative momentum r the situation is different, as we have seen in the previous section. The splitting graph in figure 33a and higher-order corrections as in figure 33b give rise to additional ultraviolet divergences. Their subtraction leads to an inhomogeneous term in the evolution equation. At leading order in α s one has F q,b 2 (x 1 , u 2 , r) where the extra term follows from (5.82). At higher orders in α s , the inhomogeneous term will involve a convolution integral, as can be anticipated from the graph in figure 33b. The appearance of the extra term in the evolution equation can also be understood in the impact parameter representation by writing F q,q (x i , r) as a Fourier transform F q,q (x i , r; µ) = d 2 y e −iry F q,q (x i , y; µ) ren,µ .

(5.96)
Since F q,q (x i , y) has a 1/y 2 singularity at small y, the integral over this variable is logarithmically divergent and requires a subtraction in addition to those already made in F q,q (x i , y). We have indicated this extra subtraction by [ . . . ] ren,µ . For the distribution F q,q (x i ; µ) = def F q,q (x i , r = 0; µ) = d 2 y F q,q (x i , y; µ) ren,µ (5.97) the evolution equation (5.95) has long been known in the literature, see [144][145][146] and the recent detailed study [147]. We wish to comment in this context on an ansatz that is often made in phenomenological studies, in which the y dependent two-parton distributions are written as where f (y) is a smooth function normalized as d 2 y f (y) = 1. A typical choice for f (y) is e.g. a Gaussian or a sum of Gaussians. This type of ansatz is obviously inconsistent if F (x i , y; µ) is defined from the product (5.94) of twist-two operators, since the µ dependence of the l.h.s. is then given by the homogeneous evolution equation whereas the µ dependence on the r.h.s. is governed by the inhomogeneous evolution equation (5.95). If one instead defines the y dependent distribution as the Fourier transform of F (x i , r) as in (5.86) then the ansatz (5.98) is consistent regarding evolution since by construction F mod (x i , y; µ) evolves as in (5.95). We do however not think that this procedure is satisfactory. As we have seen in (5.89), F (x i , y) mod differs from F (x i , y) only by terms proportional to δ (2) (y), and such terms do not appear in the ansatz (5.98), which is smooth and finite at y = 0. In more physical terms, we recall that the inhomogeneous term in the evolution equation (5.95) has its origin in the 1/y 2 behavior of F (x i , y) at short distances, which is not described by (5.98).
We have seen in section 5.2.3 that this short-distance behavior prevents us from using either F (x i , y) or F mod (x i , y) in the double-scattering factorization formula as it stands. An ansatz like (5.98) with a smooth function f (y) does not have this problem and may be regarded as modeling a y distribution where the perturbative splitting contribution that gives rise to the 1/y 2 singularity has been removed. Since the ansatz is ad hoc, one cannot say which evolution equation should then be used on both sides of (5.98). Our discussion suggests that the homogeneous form (5.93) may be more appropriate, at least for values y of typical hadronic size, which are of course most important when the ansatz is used in the factorization formula. With this choice, one also retains consistency with respect to evolution if one makes the additional ansatz F (x i , µ) = f (x 1 , µ)f (x 2 , µ), as is often done. To find a systematic solution that treats both splitting and non-splitting contributions in a consistent manner remains a task for future work.
For the reasons discussed in section 3.5, the evolution of color octet distributions 8 F differs from the one of 1 F , and these differences have not yet been worked out in detail. However, the issues discussed in the present section affect 8 F in the same way as 1 F , given that both the ladder graphs in figure 30 and the splitting graph in figure 33a differ only by overall factors between the singlet and octet channels. They hence give rise to the same logarithmic divergences when the relevant transverse momenta are integrated over. One may therefore expect that, once a solution of the above problems for singlet distributions is found, it will be possible to adapt it to the octet sector.

Conclusions
We have investigated several aspects of multiparton interactions in QCD. Such interactions can contribute to hadron-hadron collisions whenever one has a final state with several groups of particles for which the vector sum of transverse momenta is small compared with the large scale Q that characterizes the process. As we have shown in sections 2.1.3 and 2.4, multiple interactions are then not power suppressed in 1/Q compared with the mechanism where these groups of particles are produced in a single hard scattering. Examples are the production of two lepton pairs originating from the decay of two vector bosons with low transverse momenta, or the production of two dijet pairs that are approximately backto-back. For small parton momentum fractions x, which are typical of collisions at the LHC, multiple hard scattering can even be enhanced because one expects multiparton distributions to rise faster with decreasing x than single-parton densities, as we argued in section 2.5.
Given the importance of transverse momenta in the final state, we have given a factorization formula for multiple hard scattering in terms of multiparton distributions that depend on the transverse momenta of the partons. Such a formula can be fully derived for lowest-order Feynman graphs and generalizes the more familiar description in terms of collinear (i.e. transverse-momentum integrated) multiparton distributions given in the literature [52,53,66]. A physically intuitive interpretation is obtained if one expresses the cross section in a mixed representation, in which the multiparton distributions depend on the average transverse momentum of the partons and on their average transverse distance from each other, where the "average" refers to the scattering amplitude of the process and its complex conjugate. These distributions have the structure of Wigner functions.
The simple picture just sketched is however complicated by the presence of correlation and interference effects, some of which have been pointed out earlier in the literature [89]. The spin and color of the partons described by a multiparton distribution can be correlated, and such correlations change the overall rate of multiple interactions. Twoquark distributions allow two color couplings, which we classified as color singlet and color octet, whereas for gluons a number of color couplings appear in addition to the color singlet one, see section 2.3. In section 4.1.2 we have shown that spin correlations can also affect the distribution of particles in the final state, using four-lepton production as an example. Further contributions to the cross section can come from interference effects in fermion number or in quark flavor (figures 6c and 7) and from the interference between single and multiple hard scattering (figure 9a). One can however expect that these interference effects will not benefit from the small-x enhancement of multiple interactions mentioned above (although in the case of interference between single and multiple scattering the situation is not entirely settled as explained in section 2.5). Regarding "rescattering contributions" of the type shown in figure 12a, we have shown that their evaluation in terms of two sequential scattering processes with on-shell external partons is inappropriate and that, when calculated properly, such contributions are suppressed by powers of 1/Q. How large the above correlations and interference effects are remains an important open question, both for the phenomenology of multiple interactions and from the point of view of hadron structure. The possibility to study moments of multiparton distributions on the lattice as explained in section 4.2, as well as the approximate relations with generalized parton distributions we derived in sections 2.1.5 and 4.3 provide two possible avenues to investigate these issues further.
A proper factorization formula in QCD requires much more than an analysis of the lowest-order Feynman graphs contributing to the process in question. In section 3 we have taken first steps towards a factorization proof for double hard scattering in terms of transverse-momentum dependent distributions. Our investigation only applies to processes where each hard scatter produces color-singlet particles, given the limitation of our current understanding for single hard-scattering processes [111]. For definiteness we have restricted our analysis to the double Drell-Yan process. We have shown how collinear and soft gluon exchange at order α s can be arranged into Wilson lines, which are basic building blocks in the construction of an all-order factorization formula. We have also seen that at this order soft-gluon effects cancel in factorization formulae that involve collinear two-parton distributions in the color singlet sector, whereas they do not cancel in the color octet sector. In section 3.2 we have listed the many issues that remain to be clarified and worked out for a full factorization proof. The most critical questions are probably whether one can show that the effect of soft gluons in the Glauber region cancels in the cross section and whether the double counting problem mentioned below can be solved in a satisfactory way.
Our calculation of soft-gluon effects at leading order in α s also allows us to investigate the structure of Sudakov logarithms in the double Drell-Yan process, extending the method of Collins, Soper and Sterman [118]. We find that the leading double logarithms are given by the product of the corresponding Sudakov factors for each single scattering process, whereas beyond this approximation soft gluon effects connect the two hard scatters in a nontrivial way. In the region where all transverse parton momenta are large and the transverse distance y between the two partons is small compared to a hadronic scale, we find that Sudakov effects favor the color singlet coupling in two-quark distributions. If this result could be generalized to large y, it would provide a valuable simplification.
In generic kinematics, the description of multiple interactions involves a multitude of terms, with many unknown distributions that describe correlation effects already in the case of double hard scattering (not to speak of the case with three or more scatters). The predictive power of the theory is increased in the region where the net transverse momentum q T for each final state produced by a hard scattering is large compared with the scale Λ of nonperturbative interactions (while still being small compared with the scale Q characterizing the hard-scattering processes). Apart from the possible simplification due to Sudakov effects just mentioned, the transverse-momentum dependent multiparton distributions can then be computed in terms of collinear distributions and a hard scattering at scale q T . The generation of high transverse momenta can proceed by ladder graphs as in figure 30, and we find that the color factors of these graphs favor the color singlet coupling in two-parton distributions.
A different mechanism is shown in figure 33, where one parton splits into two partons that subsequently take part in a hard scatter. By explicit calculation at order α s we find that this splitting mechanism generates a multitude of spin correlations between the two emerging partons. For qq distributions the color singlet coupling is preferred, whereas for qg and gg distributions the opposite is the case. Contributions from ladder graphs and from parton splitting graphs compete with each other in the double scattering cross section. An overview is given in table 1, where we see that compared with splitting graphs the contribution of ladder graphs is suppressed by powers of Λ/q T . On the other hand, the splitting graphs lack the small-x enhancement discussed earlier, so that one cannot decide on generic grounds which mechanism is more important in given kinematics. Finally, we find that splitting contributions require a modification of the formalism outlined so far, because they increase so strongly for decreasing interparton distance y that one obtains divergent integrals when inserting them into the factorization formulae. This is closely related with the problem that graphs like in figure 37a can either be interpreted as representing double hard scattering with parton splitting in each two-parton distribution, or as representing a single hard-scattering process at two-loop level. A consistent factorization scheme must ensure that there is no double counting of this graph in different kinematic regions. A satisfactory solution of these problems remains to be found, and as we argued in section 5.3.2 such a solution will also have consequences on the evolution equation for collinear multiparton distributions.
In summary, we find that a systematic description of multiparton interactions in QCD involves a considerable degree of complexity, but that there are several elements that hint at possible simplifications. More work is required to work out these simplifications and to put the theory on firmer ground.