Double parton distributions in the pion from lattice QCD

We perform a lattice study of double parton distributions in the pion, using the relationship between their Mellin moments and pion matrix elements of two local currents. A good statistical signal is obtained for almost all relevant Wick contractions. We investigate correlations in the spatial distribution of two partons in the pion, as well as correlations involving the parton polarisation. The patterns we observe depend significantly on the quark mass. We investigate the assumption that double parton distributions approximately factorise into a convolution of single parton distributions.


Introduction
Matrix elements of currents in a hadron offer a variety of ways to quantify and study hadron structure. In particular, information about correlations inside the hadron can be obtained from the matrix elements of two currents that are separated by a space-like distance. Such matrix elements can be calculated in lattice QCD, and there has been considerable activity in this area over the years [1][2][3][4][5][6][7][8][9][10][11]. These studies address a broad range of physics questions, such as confinement [1,2], the size of hadrons [3][4][5]8], density correlations [6], comparison with quark models [7], or the non-spherical shape of hadrons with spin 1 or larger [9][10][11].
We continued this line of investigation in a recent paper [12]. We performed a lattice computation of the matrix elements of two scalar, pseudoscalar, vector, or axial vector currents in the pion and compared our results with predictions of chiral perturbation theory. For the first time, we computed all Wick contractions that contribute to these matrix elements, whilst earlier work had focused on the case in which the two currents are inserted on different quark lines between the hadron source and sink operators (see graph C 1 in figure 4). We obtained signals with a good statistical accuracy for almost all contractions and were thus able to study their relative importance. Our results were compared with different models in [13,14].
Extending our work in [12], we will in the present paper use two-current matrix elements from the lattice to obtain information about double parton distributions (DPDs). DPDs describe the correlated distribution of two partons inside a hadron and appear in the cross sections for double parton scattering, which occurs when there are two separate hardscattering processes in a single hadron-hadron collision. The study of this mechanism has a long history in collider physics, from early theoretical papers such as [15][16][17][18][19][20][21] to the detailed investigation of QCD dynamics and factorisation that started about ten years ago [22][23][24][25][26][27][28][29][30][31][32][33]. After early experimental studies [34,35], a multitude of double parton scattering processes has been measured at the Tevatron and the LHC, see [36][37][38][39][40] and references therein. Some final states produced by double parton scattering are of particular interest because they are a background to search channels for new physics. A prominent example are like-sign gauge boson pairs W + W + and W − W − [40][41][42][43][44][45], the decay of which can yield like-sign lepton pairs. A wealth of further information about double parton scattering can be found in the monograph [46]. Double parton distributions remain poorly known, and their extraction from experimental data is considerably more difficult than the extraction of single parton distributions (PDFs). It is therefore important to have as much theoretical guidance as possible about the properties and behaviour of DPDs. Apart from approaches that focus on fulfilling theoretical constraints [47][48][49][50], there exists a large number of model calculations for the DPDs of the nucleon [51][52][53][54][55][56][57][58][59] and a smaller number for those of the pion [60][61][62].
A relation between the Mellin moments of DPDs and two-current matrix elements that can be computed on the lattice was written down in [23,27]. This generalises the relation between matrix elements of one current and the Mellin moments of PDFs, which has been extensively exploited in lattice studies, as reviewed for instance in [63][64][65]. Whilst knowledge of a few Mellin moments is insufficient for reconstructing the full DPDs, it allows one to investigate crucial features of these functions, such as their dependence on the distance between the two partons and on the parton polarisation. In the present paper, we pursue this idea for the DPDs of the pion, focusing on their lowest Mellin moments. We use the same lattice data as in our study [12]. Corresponding work on the DPDs of the nucleon is in progress, and preliminary results have been presented in [66].
This paper is organised as follows. In section 2, we recapitulate some basics about DPDs and then elaborate on the relation between their Mellin moments and the two-current matrix elements we compute on the lattice. This will in particular lead us to introduce the concept of skewed DPDs. In section 3, we describe the main elements of our lattice simulations (a full account is given in [12]) and investigate several lattice artefacts that are present in our data. Our results for zero pion momentum are presented and discussed in section 4. In section 5, we develop a parametrisation of the data for both zero and nonzero pion momenta, which will allow us to reconstruct the Mellin moments of pion DPDs, albeit in a model-dependent fashion. Our main findings are summarised in section 6.

Double parton distributions
To begin with, we recall some basics about double parton distributions. An extended introduction to the subject can be found in [67].
Factorisation for a double parton scattering process means that its cross section is given in terms of hard-scattering cross sections at parton level and double parton distributions for each of the colliding hadrons. For pair production of colourless particles, such as Z, W or Higgs bosons, this factorisation can be proven rigorously. A DPD gives the joint probability for finding in a hadron two partons with longitudinal momentum fractions x 1 and x 2 at a transverse distance y from each other. The distributions for quarks and antiquarks are defined by operator matrix elements as We use light-cone coordinates v ± = (v 0 ± v 3 )/ √ 2 and boldface letters for the transverse part v = (v 1 , v 2 ) for any four-vector v µ . The definition (2.1) refers to a reference frame in which the transverse hadron momentum is zero, p = 0. In a frame where the hadron moves fast in the positive z direction, x 1 and x 2 can be interpreted as longitudinal momentum fractions. The hadron state is denoted by h(p), and it is understood that an average over its polarisation is taken on the r.h.s. of (2.1) if the hadron has nonzero spin. Unless specified otherwise, the expressions of the present section hold both for a pion and for the nucleon (and in fact for any unpolarised hadron or nucleus).
• f ∆q 1 ∆q 2 is the density for finding two quarks with their longitudinal polarisations aligned minus the density for finding them with their longitudinal polarisations anti-aligned.
• f δq 1 q 2 describes a correlation between the transverse polarisation of the quark q 1 and the distance y of that quark from the unpolarised quark q 2 . In f q 1 δq 2 , the first quark is unpolarised and the second quark has transverse polarisation.
• f t δq 1 δq 2 describes a correlation between the transverse polarisations of the two quarks and their transverse distance y.
Decompositions of the same form as (2.4) can be given for the cases where one replaces one or both of the quarks by an antiquark, with the same physical interpretation as given above for two quarks. Note that the polarisation dependence of DPDs is not only interesting from the point of view of hadron structure, but can have measurable implications on double parton scattering, as was for instance shown in [27,44,45,68]. Lattice calculations can give information about the strength of the different spin correlations we just discussed. We note that cross sections for double parton scattering involve the product of two DPDs integrated over the interparton distance, The dependence of DPDs on y can hence not be directly inferred from experimental observables. If y is small, one can use perturbation theory to compute F a 1 a 2 in terms of PDFs and splitting functions [27,69]. By contrast, for large distances the y dependence is fully nonperturbative. Lattice studies can give information about this dependence, whose knowledge is crucial for computing double parton scattering cross sections. Both unpolarised and polarised DPDs can exhibit correlations in their dependence on x 1 , x 2 and y. We cannot address this aspect in our present study, because the matrix elements we compute are related to the lowest Mellin moments of DPDs, i.e. their integrals over both x 1 and x 2 . In principle, one could investigate higher Mellin moments, i.e. integrals weighted with powers of x 1 and x 2 . This would require extending the set of currents in (2.9) to currents that involve covariant derivatives and is beyond the scope of the present work.
Phenomenological analyses often make the assumption that in unpolarised DPDs the two partons are independent of each other. This gives the relation F a 1 a 2 (x 1 , x 2 , y) where f a (x, b) is an unpolarised impact parameter dependent single parton distribution. The question mark above the equal sign in (2.6) indicates that this is a hypothesis. Our lattice study allows us to test this indirectly in two different ways, as discussed in sections 2.4, 4.4 and 5.4. A related but different simplifying assumption is that unpolarised DPDs can be written as where f a (x) denotes a standard PDF and G(y) is a factor describing the dependence on the transverse parton distance. This assumption leads to the so-called "pocket formula", which expresses double parton scattering cross sections in terms of the cross sections for each single scattering and a universal factor σ −1 eff = d 2 y [G(y)] 2 . Whilst our study cannot address the factorisation between the x 1 , x 2 and y dependence assumed in (2.7), we can investigate the assumption that the y dependence is the same for all parton combinations (a 1 , a 2 ) in a given hadron. We will do this in section 4.5.

Matrix elements of local currents
The matrix element (2.1) involves fields at light-like distances and is hence not suitable for direct evaluation on a Euclidean lattice. What we can study in Euclidean space-time are the matrix elements where as in (2.1) a polarisation average is understood if the hadron h carries spin. The local currents J µ··· q,i we will consider here are For spacelike distances y, which we assume throughout this work, the two currents in (2.8) commute, so that one has Together with the fact that the currents in (2.9) are Hermitian, it follows that the matrix elements (2.8) are real valued. The currents transform under charge conjugation (C) and under the combination of parity and time reversal (P T ) as with sign factors and The combination of a parity and time reversal transformation gives and thus relates the matrix elements for y and −y.
Symmetry relations for pion matrix elements. For pion matrix elements, one has additional relations due to charge conjugation and isospin invariance. For η i 1 C η i 2 C = 1, which is the case for all current combinations considered in our work, one has where we indicated for which hadron the matrix element is taken but for brevity omitted the Lorentz indices and the labels i 1 , i 2 specifying the currents. Still for η i 1 C η i 2 C = 1, one finds 16) for c = +, −, 0, as well as A derivation of these relations can be found in [12, section 2.1].
Tensor decomposition and extraction of twist-two functions. The matrix elements in (2.8) are related to the lowest Mellin moments of DPDs as with the Mellin moments given by Here i 1 and i 2 refer to the currents in the matrix elements on the l.h.s. of (2.18). An analogous relation holds between I t and the lowest moment of f t . The relations (2.18) extend the wellknown connection between the Mellin moments of PDFs and the matrix elements of a single local current to the case of two partons.
In analogy to the case of PDFs, the matrix element (2.1) defining a DPD has support for both positive and negative x 1 and x 2 , with positive x i corresponding to a parton a i and negative x i to its antipartonā i . On the r.h.s. of (2.19), we have limited the integration region to positive momentum fractions. Note that if a 1 and a 2 are quarks and if i = V or T (but not A), then the quark-antiquark distributions on the r.h.s. enter with a minus sign. This is of special importance for distributions in a pion, whose valence Fock state consists of a quark and an antiquark. Relations analogous to (2.18) exist for higher Mellin moments in x 1 and x 2 and involve local currents with covariant derivatives [27], as is the case for PDFs.
Contrary to Γ j δq in (2.3), the tensor current J µν q,T in (2.9) is defined without γ 5 . As a consequence, the vector indices k 1 and k 2 in (2.18) do not give the transverse quark spin direction but the transverse quark spin direction rotated by +90 • in the x − y plane. This follows from the relation iσ j+ γ 5 = jk σ k+ .
The relations (2.18) still refer to Minkowski space, because they involve plus-components. To make contact with matrix elements evaluated in Euclidean space, we decompose the matrix elements (2.8) in terms of basis tensors and of Lorentz invariant functions A, B, C, D, E that depend on y 2 = y µ y µ and py = p µ y µ . We write For the operator combination T V , we subtract trace terms according to where it is understood that u µνρ is antisymmetric in µ and ν. The decomposition for M q 1 q 2 ,AA has the same form as the one for M q 1 q 2 ,V V , involving the same basis tensors but different invariant functions A ∆q 1 ∆q 2 , . . . , D ∆q 1 ∆q 2 . The decomposition for M q 1 q 2 ,V T is like the one for M q 1 q 2 ,T V with an appropriate change in the role of the Lorentz indices. In the following, we will not discuss the combination V T any further, because it can be traded for T V using the relation (2.10). The basis tensors are chosen as T T,C = −(g µρ p ν y σ − g µσ p ν y ρ + g µρ y ν p σ − g µσ y ν p ρ ) + 1 2 (g µρ g νσ − g µσ g νρ )py − {µ ↔ ν} , u µνρσ T T,D = −2 g µρ y ν y σ − g µσ y ν y ρ ) + 1 2 (g µρ g νσ − g µσ g νρ )y 2 − {µ ↔ ν} , u µνρσ T T,E = g µρ g νσ − g µσ g νρ . (2.22) The tensor components related to twist-two matrix elements can be identified from the l.h.s. of (2.18), taking into account that y + = 0 and p = 0 in that equation. For the basis tensors, a nonzero plus-component requires the vector p on the r.h.s. of (2.22), whilst a nonzero transverse component requires the vector y or the metric tensor. One thus finds that the invariant functions corresponding to operators of twist two are A q 1 q 2 , A ∆q 1 ∆q 2 , A δq 1 q 2 , A δq 1 δq 2 and B δq 1 δq 2 . We will call them "twist-two functions" in the remainder of this work. All of them are even functions of py due to the symmetry relation (2.14).
One can project out the invariant functions by multiplying the matrix elements with suitable linear combinations of basis tensors. For the twist-two functions, the relevant projections read with a normalisation factor N = p 2 y 2 − (py) 2 . (2.24) For spacelike y µ , which we are interested in, one has N < 0, so that the projections are always well defined. Using (2.18) and (2.20), one can derive the relation between Mellin moments of DPDs and integrals of twist-two functions over py: where in the first line we have all combinations of (a 1 , a 2 ) that appear on the r.h.s. of (2.18). The matrix elements (2.8) can be evaluated in Euclidean space-time at y 0 , i.e. with the two current operators taken at equal Euclidean time. This entails the important restriction where v = (v 1 , v 2 , v 3 ) denotes the spatial components of a four-vector v µ . Since the range of accessible hadron momenta p in a lattice calculation is finite, the range of the variable py is limited, and one cannot directly evaluate the integrals in (2.25). In addition, one needs data for nonzero hadron momentum p to access even a finite range in py. We note that the restriction (2.26) also applies if one computes the Mellin moments of transverse-momentum dependent single parton distributions (TMDs) on the lattice [70][71][72]. In that case, y µ is the distance between the quark and the antiquark field in the matrix elements that define the distributions. The same holds for lattice studies of single parton distributions in x space. There has been an enormous amount of activity in this area in recent years; we can only cite a few papers here [73][74][75][76][77][78][79][80][81] and refer to the recent reviews [82,83] for an extended bibliography.

Skewed double parton distributions
Together with the restriction (2.26), the necessity to perform an integral over all py in (2.25) presents a significant complication for relating matrix elements calculated on a Euclidean lattice with the Mellin moments of DPDs. This prompts us to extend the theoretical framework in such a way that we can discuss the physical meaning of the twist-two functions A a 1 a 2 and B δq 1 δq 2 at a given value of py.
To this end, we introduce skewed double parton distributions 1 Compared with the definition (2.1) of ordinary DPDs, we have an additional exponential e −iζp + y − here. As a consequence, the partons created or annihilated by the fieldsq and q in O a 1 and O a 2 have different longitudinal momentum fractions. A sketch is given in figure 1 for (a 1 , a 2 ) = (u, d) and the case where x 1 − 1 2 ζ, x 1 + 1 2 ζ, x 2 − 1 2 ζ and x 2 + 1 2 ζ are all positive. If x 1 − 1 2 ζ becomes negative, the u quark in the wave function of |h becomes an antiquarkū with momentum fraction −x 1 + 1 2 ζ in the wave function of h|. Corresponding statements hold for x 1 + 1 2 ζ, x 2 − 1 2 ζ and x 2 + 1 2 ζ. For nonzero ζ, the distributions (2.27) do not appear in cross sections for double parton scattering, but they may be regarded as a rather straightforward extension of the DPD concept. Let us take a closer look at some of their properties. The support region of the matrix element (2.27) in the momentum fraction arguments is the same as if all four parton fields were at the same transverse position. In that case, we would have a collinear twistfour distribution. The support properties of these distributions were derived in [84], and the argument given there does not depend on the transverse position arguments of the parton fields. The result given in [84] is equivalent to the interpretation of x 1 − 1 2 ζ, x 1 + 1 2 ζ, x 2 − 1 2 ζ and x 2 + 1 2 ζ as positive or negative momentum fractions, as described in the previous paragraph. 1 The term "skewed" refers to the parton momenta here, whilst the hadron momentum is the same in the bra and ket vector of (2.27). This is different from "skewed parton distributions", now commonly called "generalised parton distributions", which involve two instead of four parton fields, such that there is an asymmetry both in the parton and in the hadron momenta. For nonzero ζ there are hence different regions, in which one has either 1, 2 or 3 partons in the wave function of |h . With the constraints that the partons in the wave function of |h must carry the same total longitudinal momentum as those in the wave function of h|, and that this cannot be larger than the longitudinal hadron momentum, one obtains the constraint and the support region for (x 1 , x 2 ) shown in figure 2. For ζ = 0 this region becomes a square with corners (0, ±1) and (±1, 0), whereas for ζ = ±1 it becomes a square with corners (± 1 2 , ± 1 2 ). Using P T symmetry, one finds that where η i PT = +1 for an unpolarised parton and η i PT = −1 for a polarised one. The skewed DPDs can be decomposed in terms of scalar distributions as in (2.4), with the distributions on both sides depending additionally on ζ. The symmetry property (2.29) then implies f a 1 a 2 (x 1 , x 2 , ζ, y 2 ) = f a 1 a 2 (x 1 , x 2 , −ζ, y 2 ) (2. 30) and an analogous relation for f t .
Mellin moments. We define the lowest Mellin moments of skewed DPDs as and likewise for f t , where the integration region in x 1 , x 2 follows from figure 2. The moments are nonzero for ζ in the interval [−1, 1]. The generalisation of (2.25) to nonzero ζ reads x 1 x 1 Figure 2. Support region of the distribution F ud (x 1 , x 2 , ζ, y) in the momentum fraction arguments. The notation d|duū means that one has one d quark in the wave function of |h and duū in the wave function of h|. In both panels, the triangle for the region ud|du has the corners 1 2 |ζ|, 1 2 |ζ| , 1 2 |ζ|, 1 − 1 2 |ζ| and 1 − 1 2 |ζ|, 1 2 |ζ| . Notice that the parton configuration in each of the four triangles is the same for positive and negative ζ, whereas the configuration in each of the squares is different. which can readily be inverted for the function A a 1 a 2 (y 2 , py). In particular, one finds A a 1 a 2 (y 2 , py = 0) = 1 π 1 0 dζ I a 1 a 2 (y 2 , ζ) , where we have used the symmetry relation (2.30) to reduce the integration region to positive ζ. Rather than the Mellin moment of a DPD, a twist-two function at py = 0 is thus the average of the Mellin moment of a skewed DPD over the skewness parameter ζ.
Quantities that characterise the ζ dependence of I a 1 a 2 (y 2 , ζ) are the even moments in ζ, Odd moments ζ 2m+1 are zero because of the symmetry (2.30). To compute the moments ζ 2m , one needs A a 1 a 2 (y 2 , py) in the vicinity of py = 0. According to (2.26), this can be evaluated from Euclidean data with nonzero hadron momentum p. Relations analogous to (2.32) to (2.34) can be written down for I t δq 1 δq 2 and B δq 1 δq 2 in the place of I a 1 a 2 and A a 1 a 2 .

Factorisation hypotheses
We now discuss how the factorisation hypothesis (2.6) for DPDs can be formulated at the level of Mellin moments and twist-two functions. At this point, we specialise to the case where the hadron h is a π + . This avoids complications due to the proton spin, which are discussed in [27, section 4.3.1].
Let us take the lowest Mellin moment in x 1 and x 2 of (2.6). The Mellin moment of an unpolarised impact parameter dependent parton distribution is where F q,V (t) is the form factor of the vector current We then obtain from (2.6) We note that thanks to isospin invariance, one has F u,V = −F d,V . As this is not essential in the present context, we will not use it here. Since one cannot directly determine I ud (−y 2 ) from Euclidean correlation functions, one cannot directly test (2.37) with lattice data. We therefore derive an analogous relation for the twist-two function A ud (y 2 , py) at py = 0.
We recall from [27] that (2.6) can be obtained by inserting a complete set of intermediate states between the operators O a 1 (y, z 1 ) and O a 2 (0, z 2 ) in the DPD definition (2.1) and then assuming that the dominant term in this sum is the ground state. Following exactly the same steps for the skewed DPD (2.27), one obtains Here H q (x, ξ, t) is the generalised parton distribution (GPD) for unpolarised quarks in a pion; its definition can be found e.g. in [85, section 3.2]. The momentum fraction arguments x and ξ of H q are defined in a symmetric way between the incoming and outgoing hadron and parton momenta, with x referring to the sum of parton momenta and ξ to their difference, and with momentum fractions normalised to the sum of hadron momenta in the bra and the ket state. Both x and ξ are limited to the interval [−1, 1]. A pictorial representation of the GPDs that appear on the r.h.s. of (2.38) is given in figure 3(a). At this point, we must critically examine the support properties of the two sides of (2.38) in x 1 and x 2 . The support of the l.h.s. is shown in figure 2, whereas the one of the r.h.s. is the square delineated by −1 For ζ ≥ 0, this misses the kinematic constraint |x 1 | + |x 2 | ≤ 1 in F ud , whereas for ζ < 0 it is even larger.
In the matrix element (2.27), the order of the two operators can be interchanged, because the respective fields are separate by spacelike distances. In a schematic notation, we thus If we insert a set of intermediate states in the latter matrix element, we obtain (2.38) with ζ replaced by −ζ on the r.h.s. This is represented in figure 3(b). In that case, the mismatch between the support regions of the two sides is less bad for ζ ≤ 0 than for ζ > 0. We therefore retain (2.38) for ζ ≥ 0 and its analogue with ζ → −ζ on the r.h.s. for ζ < 0. This also satisfies the symmetry in ζ required by PT invariance and stated in (2.29), which is violated if one uses (2.38) for positive and negative ζ.
The mismatch of support properties just discussed also affects the case ζ = 0 and is hence not special to the skewed kinematics we are considering here. In fact, it is well known that the factorisation hypothesis (2.6) for DPDs violates the momentum conservation constraint x 1 +x 2 ≤ 1. From a theoretical point of view, inserting a full set of intermediate states between the operators in the DPD definition (2.1) or its skewed analogue (2.27) is of course a legitimate manipulation, but we see that the restriction of this set to the ground state leads to theoretical inconsistencies such as an incorrect support region or the loss of a symmetry required by PT invariance. How the sum over all states manages to restore the correct properties is difficult to understand in an intuitive manner. We note that a similar observation was made in [84] when discussing the support properties of PDFs and of higher-twist distributions.
Integrating both sides of (2.38) over their respective support regions in x 1 and x 2 and using the sum rule dx H q (x, ξ, t) = F q,V (t), one obtains (2.40) Using this for ζ ≥ 0 and inserting it into (2.33), we obtain A ud (−y 2 , py = 0) We note that (2.41) is expressed in terms of a two-dimensional vector y. This is different from the factorisation hypothesis we derived in [12, section 5.3], which involved the zerocomponents of currents and a three-dimensional vector y . Note that the two hypotheses (2.41) and (2.37) are based on the same assumption but are not equivalent to each other. Both are special cases of (2.40), obtained by either setting ζ = 0 or by integrating over ζ from 0 to 1. The assumption that the ground state dominates the sum over intermediate states could be a better approximation in one or the other case. Using our lattice results, we will investigate (2.41) in section 4.4 and (2.37) in section 5.4.

Lattice computation and lattice artefacts
We performed lattice simulations for the matrix elements (2.8) in a pion with the currents given in (2.9). We set y 0 = 0, so that on the lattice the two currents are inserted at the same Euclidean time, but with a spatial separation y . We generated data both for zero and nonzero pion momentum p. The lattice techniques we employed are explained in detail in [12, section 3]. In the following, we recall only the basic steps described in that work and then proceed to the specifics of our present analysis.

Lattice graphs
To evaluate the two-current matrix elements (2.8), we compute the four-point correlation function of a pion source operator at Euclidean time 0, a pion sink operator at Euclidean time t, and the two currents J i and J j at Euclidean time τ . The correlation function receives contributions from a large number of Wick contractions, which are shown in figure 4. We will also refer to these contractions as "lattice graphs" or simply as "graphs".
The relation between pion matrix elements and lattice graphs depends on the product of C parities of the currents. Omitting Lorentz indices and the dependence on the pion momentum p, and using the shorthand notation for the graphs or their symmetrised combinations, we have for η i C η j C = +1, which is satisfied for all combinations of currents considered in the present study. We note that this is no longer the case if one includes operators with covariant derivatives (corresponding to higher Mellin moments). One readily checks that (3.2) satisfies the general symmetry relation (2.17), as it must.
To compute the different graphs on the lattice, we use a variety of techniques as detailed in [12, section 3.3]. We make extensive use of stochastic sources, and for graph C 2 we use a hopping parameter expansion to reduce statistical noise for the propagation between the two currents.
For the disconnected graphs S 2 and D we need to subtract vacuum contributions, namely the product of a two-point correlation function of the pion source and sink with a two-point correlation function of the two currents. The latter corresponds to the vacuum expectation value 0| J i (y)J j (0) |0 . The vacuum subtraction for the disconnected graph S 1 involves 0| J i (y) |0 or 0| J j (0) |0 , which is zero because our currents carry Lorentz indices.
We anticipate that the doubly disconnected graph D in general gives a good signal for the four-point correlation function, but that there is a near-perfect cancellation between this correlator and its vacuum subtraction term. The result after subtraction is consistent with zero and has huge statistical uncertainties compared with those of any other graph. We will hence not be able to report useful results for graph D. Fortunately, we encounter no such problem for graph S 2 .

Lattice simulation and extraction of twist-two functions
We perform our simulations using the Wilson gauge action and n F = 2 mass degenerate flavours of non-perturbatively improved Sheikholeslami-Wohlert (NPI Wilson-clover) fermions. The gauge configurations were generated by the RQCD and QCDSF collaborations. We use two gauge ensembles, whose parameters are given in table 1. They have different spatial sizes, L = 32 and L = 40, which allows us to study finite volume effects in section 3.5. Despite having data for only a single lattice spacing, a = 0.071 fm, we are also able to investigate discretisation effects, as discussed in section 3.3.
For the ensemble with L = 40, we performed simulations with different κ values in the Here "light quarks" refers to the κ value used for simulating the sea quarks, whereas the other two values correspond to the physical strange and charm quark masses, as determined in [88] and [89] by tuning the pseudoscalar ground state mass to 685.8 MeV in the first case and the spin-averaged S-wave charmonium mass to 3068.5 MeV in the second case. Since our simulations are performed with an n F = 2 fermion action, the strange and charm quarks are partially quenched. The values of m π in (3.3) are obtained from exponential fits of the pion two-point function. We quote them only for orientation and do not attempt to quantify their errors. These masses are in reasonable agreement with the value in table 1 for light quarks, and with the mass of the pseudoscalar ground state quoted below (3.3) for strange quarks.
Pion matrix elements. For all lattice graphs, we compute the correlation functions with zero three-momentum p of the pion. For the connected graphs C 1 and C 2 , we additionally have data with finite pion momenta. These data are restricted to the L = 40 lattice and to light quarks. The pion momenta that can be realised on the lattice are given by where the components of P are integers and 2π/(La) ≈ 437 MeV in our case. For simplicity we write P = | P |. Graph C 1 is computed for all 18 nonzero momenta with P 2 ≤ 2, for 6 momenta with P 2 = 3, and for one momentum with P 2 = 4. For C 2 , we have results for all 6 momenta with P = 1.
The distance between the pion source and sink in the correlation functions is fixed to t = 15a ≈ 1.07 fm as a default. To investigate the influence of excited states, we also calculate graphs C 1 , C 2 and S 1 with t = 32a. The matrix element (2.8) is extracted from the ratio between the four-point correlation function around τ = t/2 and the pion two-point function. For graphs C 1 and A, we measure the τ dependence of the four-point function and fit to a plateau in the τ ranges specified in [12, equation (4.1)]. The quality of the corresponding plateaus is good for matrix elements that have a nonzero value within statistical uncertainties. For the remaining graphs, we extract the matrix element from data at τ /a = 7 and 8 if t/a = 15. For the C 2 and S 1 data with t/a = 32, we use τ /a = 16. A comparison of data with t = 15a and t = 32a is shown in section 3.4.
All lattice currents are converted to the MS scheme at the renormalisation scale As described in [12, section 3.4], this is done using a combination of non-perturbative and perturbative renormalisation and includes an estimate of the quark mass dependent order a improvement term.
Invariant functions. From the matrix elements (2.8), we determine the invariant functions for each individual value of y and p. This is done using a minimum χ 2 fit of the data for all tensor components to the decomposition (2.20). For invariant functions of twist two, we also use the projector method (2.23). In both cases, the statistical error of an invariant function at given y and p is computed using the jackknife method. To eliminate autocorrelations, we take the number of jackknife samples as 1/8 times the number N used of gauge configurations given in table 1.
For P = 0, the twist-two functions extracted with one or the other method show excellent agreement with each other and have statistical uncertainties of almost the same size. For P > 0, the values obtained with the projection method have much larger statistical errors than those obtained with a fit and provide only a very weak cross check. All data shown in the following are obtained by the fit method, both for P = 0 and P > 0.
In the remainder of this section, we investigate the extent to which our data are affected by lattice artefacts, largely following the corresponding studies in [12, section 4]. We only consider data with py = 0 here, because they have much smaller statistical errors than the data for py = 0. We will return to the case of nonzero py in section 5.
When discussing twist-two functions extracted from the correlation functions for particular lattice graphs, we will generically write A qq , A ∆q∆q , . . . , B δqδq , without reference to specific quark flavors q 1 and q 2 . This is because the distinction between u and d quarks in a pion only appears when lattice graphs are combined as specified in (3.2).

Isotropy and boost invariance
The decomposition (2.20) of matrix elements in terms of basis tensors and functions of y 2 and py assumes Lorentz invariance and thus requires both the continuum and the infinitevolume limit. If our lattice simulations are sufficiently close to these limits, then the values of twist-two functions extracted for individual points y and p with p y = 0 must not depend on the directions of y or p or on the size of p.
Let us test whether this is the case in our simulations for light quarks on the lattice with L = 40. We restrict our attention to graphs C 1 and C 2 , for which statistical errors are small enough to reveal the effects of interest. For the sake of legibility, we henceforth write y = | y | for the length of the spatial distance y between the two currents. We continue to use y 2 and py to denote the products y µ y µ and p µ y µ of four-vectors in Minkowski space. Since y µ is always spacelike in our context, this implies that y 2 < 0.
At large y of order La/2, we see a clear anisotropy with a saw-tooth pattern in all twisttwo functions that have sufficiently small errors. Examples are shown in figure 5. This pattern is expected on a lattice with periodic boundary conditions and can be understood in terms of "mirror images". The same effect has been seen and discussed in previous lattice studies  of two-current correlators [8,11], including our study in [12] that employed the same lattice data as the present work. As shown in [8], the effect of mirror charges at a given distance y is smallest for points y close to one of the space diagonals, i.e. the lines given by z = (z 1 , z 2 , z 3 ) with |z 1 | = |z 2 | = |z 3 |. To quantify this, we define θ( y ) as the angle between y and the space diagonal in the same octant as y . In [12, section 4.2], we found that a cut cos θ( y ) ≥ 0.9 (3.6) on the data efficiently removes the effect of mirror charges at large y, whilst keeping sufficient statistics.
A different type of anisotropy in the C 1 data is observed at small y and shown in figure 6. For A ∆q∆q , A δq q and B δqδq , the data with zero pion momentum exhibit a clear discrepancy between points y on a coordinate axis (i.e. with two components being zero) and all other points. This discrepancy is very strong for y below 5a and ceases to be visible above 7a. The data for A δqδq (not shown in the figure) have larger errors and show only a weak anisotropy for y < 4a. Only the function A qq is not affected by this phenomenon, for which we have no explanation.
By contrast, we find that the C 1 data with nonzero pion momentum and py = 0 are isotropic in y . For nonzero momenta, we can hence average all data with the same values of y and P , which greatly decreases statistical errors. We find good agreement between the P > 0 data and the P = 0 data with y on a coordinate axis for all twist-two functions except A ∆q∆q , where the agreement is only approximate.  Figure 6. Twist-two functions at small y for graph C 1 , with scaled pion momenta P = 0 and P = 1 as defined below (3.4). All points have py = 0 and are for L = 40 and light quarks. The data for P = √ 2, P = √ 3 and P = 2 agree with those for P = 1 within errors but are not shown for the sake of clarity. Data with cos θ = 1/3 correspond to y on a coordinate axis.
We now turn our attention to graph C 2 at small y. Here, we find a very strong anisotropy in the P = 0 data. This is shown in figure 7, where we distinguish points y on the coordinate axes, which have cos θ = 1/3, points with 1/3 < cos θ ≤ 2/3, and points with 2/3 < cos θ. We note that points in a coordinate plane, i.e. with at least one component of y equal to zero, have 1/3 ≤ cos θ ≤ 2/3. In all channels, we see a clear discrepancy between the points y on a coordinate axis and all other points. In addition, there is a significant mismatch between points with cos θ above or below 2/3 in several channels, most strongly so in A δq q . We recall that a strong anisotropy for C 2 at small y was also seen for the correlation functions in our study [12]. In section 4.2 of that work, we argued that this reflects an anisotropy in the Twist-two functions at small y for graph C 2 . All points are for L = 40, zero pion momentum, and light quarks. For A δqδq (not shown), one finds a clear anisotropy at y < 4a, whilst at larger y the statistical errors are too large for drawing firm conclusions. lattice propagator between the two currents, and that points selected by the cut (3.6) should give the most reliable results according to the analysis in [90].
We also have P = 1 data for C 2 , which we can compare with those for P = 0. As seen in figure 8, for A qq and A δq q the data at P = 1 are inconsistent with those at P = 0, regardless of the value of cos θ in the latter. Since for P = 1 the condition py = 0 requires y to lie in a coordinate plane, we can in fact not select points satisfying the cut (3.6) in this case. We therefore discard our data with nonzero P for C 2 . Testing boost invariance of twist-two functions at py = 0 in the presence of the cut (3.6) would require data with at least P = √ 2, which we do not have for C 2 .
At P = 0 and small y, we are now in a difficult situation. Points with large cos θ are preferred for C 2 , while for C 1 the points with the smallest possible value cos θ = 1/3 seem to be more reliable, given that they agree with the P > 0 data. Applying different cuts in cos θ to the data for C 1 and C 2 would prevent us from taking linear combinations of those graphs at a given y . However, we regard combining data point by point in y as highly desirable for a transparent and consistent treatment of statistical correlations in the jackknife analysis.
To avoid this problem, we choose to discard points with y < 5a in our further analysis, and to apply the cut in (3.6) to the P = 0 data for all lattice graphs. After this cut, data points with equal values of y are averaged also for P = 0. We thus avoid the regions where the anisotropy for C 1 and C 2 seen at P = 0 is most severe. For C 1 , a small discrepancy between the data with P = 0 and P > 0 is still visible up to about y ∼ 8a, but we consider this to be at an acceptable level. The result of this procedure is shown for graph C 1 in figure 9. The agreement between the data for different pion momenta is quite good, except for the function A ∆q∆q .
As an exception to the selection just described, we will in section 4.4 use the C 1 data for A qq down to y = 3a, given that in this particular channel there are no indications of anisotropy or a pion momentum dependence, as can be seen in figure 6(a).

Excited state contributions
As specified in section 3.2, we have a limited set of data with a separation of t = 32a between pion source and sink. Comparing this with our results for t = 15a allows us to assess the relevance of excited state contributions in our extraction of the pion matrix elements (2.8).
On our lattice with size L = 32, we have t = 32a data for graphs S 1 and C 2 . Unfortunately, these graphs give a statistical signal consistent with zero for all twist-two functions and for both source-sink separations. We hence limit the following discussion to graph C 1 on our L = 40 lattice.
In general, we find that the results for the two source-sink separation agree reasonably well for light quarks, as illustrated in the upper panels of figure 10. For strange quarks, the data have smaller statistical errors and we can clearly see discrepancies between t = 15a and 32a, as shown in the lower panels of the figure. Except for the case of A ∆q∆q , these discrepancies are, however, small when compared with the size of the twist-two functions.
In our data for charm quarks, the statistical signal and the agreement between the two source-sink separations is excellent for all twist-two functions, and even better than the one in figure 10(a). With the exception mentioned above, we thus find no indication for a sizeable contamination from excited states in our results.

Volume dependence
Let us finally compare our simulations for light quarks on the lattices with L = 40 and L = 32. In general, the data for the smaller lattice have larger jackknife errors. This is to be expected from the parameters that determine the statistical averaging in our simulations. Details for these are given in table 2 of [12]. For twist-two functions with a small relative error, we typically find a weak volume dependence compared with the size of the functions themselves, as shown in panels (a) to (c) of figure 11. In the case of panel (b), this dependence is, however, statistically significant. For functions that have large relative errors, the volume dependence appears to be more [fm] (f) A δq q , graph S2 Figure 11. Comparison of data for the two different lattice sizes in our study. All points are for zero pion momentum, light quarks, and subject to the cut (3.6).
pronounced in some cases, especially at low y. An example is figure 11(e). One may take this as a general warning against over-interpreting statistically weak signals in our simulations.

Results for zero pion momentum
In this section, we present our results for the twist-two functions at py = 0. All data shown in the following are for zero pion momentum and have been extracted from the lattice with L = 40 with our standard source-sink separation t = 15a. The data selection described at the end of section 3.3 removes regions in which we see strong lattice artefacts in the form of broken rotational or boost symmetry. As we explained in section 2.3, twist-two functions at py = 0 are not directly related to the Mellin moments of DPDs. Instead, they are Mellin moments of skewed DPDs, integrated over the skewness parameter ζ. As seen in figure 2, these moments receive contributions from parton configurations that are different from those in a DPD at ζ = 0. When interpreting the results of the present section, we will assume that these configurations are not dominant, and that the qualitative features of invariant functions at py = 0 are the same as for Mellin moments of DPDs at ζ = 0. The results presented in section 5.3 will lend support to this assumption.
Notice that each of the lattice graphs in figure 4 can contribute to each of the partonic regimes shown in figure 2. Examples for different regimes of the connected graphs C 1 and C 2 are shown in figures 12 and 13.

Comparison of graphs
In figures 14 and 15, we compare the contributions from different lattice graphs to the twisttwo functions for light quarks. The contributions from graphs S 1 and C 2 are multiplied with a factor 2 in the figures, since they always appear with this weight in physical matrix elements according to (3.2).
For all twist-two functions except A ∆q∆q , graph C 1 gives a very clear signal, which is positive for A δqδq and negative for the other functions. By comparison, the signal for the annihilation graph A is smaller than the one for C 1 by an order of magnitude or more, except for y > 20a, where the statistical errors prevent us from making a clear statement. The  function A ∆q∆q shows a different behaviour, with C 1 and A being of similar size and much smaller than C 1 for all other twist-two functions. We recall from section 3.3 that A ∆q∆q is more strongly affected by lattice artefacts than the other channels, see figure 9(b). A clear signal for the connected graph C 2 is only seen for A qq and A δq q , with a sign opposite to the one for graph C 1 . This signal is most important at small y. For the graphs S 1 and S 2 with one disconnected fermion loop, the signal we obtain is rather noisy in all channels. For graph D with two disconnected fermion loops, the signal after vacuum subtraction is even more noisy and not shown.
From our simulations with the strange quark mass, we have only data for graphs C 1 and A. In all channels, we obtain an excellent signal for C 1 , whereas for A the statistical significance is typically not much larger than one standard deviation. In the region 5a ≤ y ≤ 15a, we find that A is smaller than C 1 by one to two orders of magnitude, except for A ∆q∆q . For A qq and A δqδq , we see in figure 16 that the behaviour of A is quite flat, unlike the one of C 1 , so that at large y the two graphs become more comparable in size. As in the case of light quarks, the function A ∆q∆q behaves differently, with graph A being smaller than C 1 at y ∼ 5a and the data for both graphs having a zero crossing a bit below y = 9a. Recall, however, that also for strange quarks we see stronger lattice artefacts in A ∆q∆q than in other channels, as seen in figure 10(c).
From our simulations with the charm quark mass, we have data for all graphs except S 2 . A clear nonzero signal is seen for C 1 and C 2 up to y ∼ 10a to 15a, with 2C 2 being smaller than C 1 by at least one order of magnitude. The signal for A and S 1 is in general consistent with zero. The only exception to this is A δq q . For this function, we see a clear signal for 2S 1 at y around 5a, which is about 50 times smaller than the one for C 1 . We also see a weak 1σ signal for A, which we do not wish to over-interpret.
By and large, we find that for all quark masses the only graphs that give signals of   appreciable size are C 1 and, in several cases, C 2 . We therefore take a closer look at these graphs in the next subsection. The annihilation graph is negligible, except in the case of A ∆q∆q for light or strange quarks, where the signal from graph C 1 is small by itself. Disconnected graphs either have a negligibly small signal or large statistical errors.

Results for connected graphs
The contribution of graph C 1 to the twist-two function A qq for unpolarised partons is negative for all three quark masses in our study. We recall from (2.19) that the regime with a quark and an antiquark in the pion contributes with a negative sign to the lowest Mellin moment of a DPD. The same holds for the Mellin moment of a skewed DPD, and hence for A qq at py = 0. A negative sign of A qq is easily understood by the dominance of the valence qq Fock state, which is probed by graph C 1 as shown in the first panel of figure 12. The situation is different for graph C 2 , whose partonic representation always involves a higher Fock state of the pion. The Z-graphs in figure 13 probe the qq,qq and qq regimes in a similar manner. We find that for all quark masses, the contribution of C 2 to A qq is positive, which means that for a given distance y this graph gives a larger probability for finding a qq orqq rather than a qq pair.
Let us now take a closer look at the mass dependence of our results for graph C 1 . We multiply A δq q and B δqδq with the power of the meson mass m with which they appear in the decomposition (2.20) of two-current matrix elements. We see in figure 17 that for all twist-two functions except A ∆q∆q , the decrease with y becomes stronger with increasing quark mass, which simply reflects the decreasing size of the meson. At y ∼ 5a, the functions A qq , mA δq q and m 2 B δqδq are of comparable size for all quark masses, whereas A δqδq increases with the mass. The behaviour of A ∆q∆q for light and strange quarks is qualitatively different from the one of the other functions, as is evident from figure 18. For charm quarks, A ∆q∆q is approximately exponential in y, with a logarithmic slope similar to the one of A qq . A fit of the y dependence of the twist-two functions for light quarks is presented in section 5.2.
We now discuss graph C 2 , for which we have data with light quarks and with charm. For the functions A ∆q∆q , A δqδq and B δqδq , the light quark data is too noisy for a meaningful comparison with charm results, so that we focus on A qq and A δq q . As is seen in figure 19, the size of both functions is significantly smaller for charm quarks. This is plausible: as discussed in the previous subsection, the partonic interpretation of graph C 2 always involves a Fock state with at least two quarks and two antiquarks in the meson, whereas for C 1 we have the regime shown in the first panel in figure 12, which involves only the quark-antiquark  Fock state. The y dependence of A qq and A δq q is also qualitatively different for the two masses: for charm we observe a clear and steep exponential falloff, whereas for light quarks, the logarithmic slope of both functions decreases around y ∼ 0.5 fm.

Polarisation effects
A major aim of our study is to investigate the strength and pattern of spin correlations between two partons in a pion. We spelled out the physical interpretation of polarised DPDs in section 2.1. This interpretation extends to the corresponding twist-two functions at py = 0, provided that these are dominated by partonic regimes associated with DPDs at ζ = 0. Under this assumption, comparing A ∆q∆q and A δqδq with A qq indicates whether two partons prefer to have their spins aligned or anti-aligned, with A ∆q∆q referring to longitudinal and A δqδq to transverse polarisation. We will refer to these as "spin-spin correlations". Note that, according to (2.19), a qq pair with aligned spins contributes with a negative sign to A qq and A δqδq and with a positive sign to A ∆q∆q , whereas a qq pair with aligned spins contributes with a positive sign to all three functions. The comparison of myA δq q and m 2 |y 2 |B δqδq with A qq tells us about the strength of correlations between the transverse spin of one or both observed partons and the distance y between these partons in the transverse plane. We refer to this as "spin-orbit correlations" in the following. The pre-factors my and m 2 |y 2 | in myA δq q and m 2 |y 2 |B δqδq follow from the decompositions (2.4) and (2.18). We note that the probability interpretation of polarised DPDs implies positivity constraints [91] that extend the well-known Soffer bound for single parton distributions [92]. These bounds imply that |f ∆q∆q |, |f δqδq |, |myf δqq | and |m 2 y 2 f t δqδq | are bounded by f qq . Corresponding bounds do not hold for the lowest Mellin moments of DPDs because of the relative minus sign between quark and antiquark contributions in (2.19). They hold even less for the moments of skewed DPDs, which do not represent probabilities to start with. Nevertheless, in a loose sense, the size of A qq sets a natural scale for the other twist-two functions (multiplied with my or m 2 |y 2 | as appropriate).
Starting our discussion with graph C 1 , we see in the top panels of figure 20 that by far the strongest polarisation effect seen for light quarks is the spin-orbit correlation for a single parton, followed by the spin-orbit correlation involving both partons. Both the transverse and the longitudinal spin-spin correlations are very small. This is completely different from the simple picture of a pion as a qq pair in an S-wave, for which one would obtain 100% anti-alignment of both transverse and longitudinal spins.
All spin correlations increase considerably with the quark mass. For charm quarks, myA δq q is almost as large as A qq . Spin-spin correlations are also important for charm: the spins of the quark and antiquark are anti-aligned by about 75% for transverse and by about 50% for longitudinal polarisation. We note that this is still quite far away from the nonrelativistic limit, in which transverse and longitudinal spin correlations become equal.
We note that the pion mass for our simulations with light quarks, m π ≈ 295 MeV, is quite a bit larger than the physical value. A naive extrapolation of the polarisation patterns just described suggests that at the physical point the spin-orbit correlation for one polarised parton may be substantial, whilst correlations involving two quark spins might be even smaller than the ones we see for light quarks in the present study.
We now turn to our results for graph C 2 , which are shown in figure 21. For light quarks, we see a substantial spin-orbit correlation of order 50% for a single parton. The spin-spin correlation for longitudinal polarisation is also of order 50% for y ∼ 0.35 fm, but quickly decreases and is negligible already around y ∼ 0.5 fm. For all other spin dependent correlations, the data for light quarks are too noisy to extract any physics.
With charm quarks, we have an excellent statistical signal for all twist-two functions. We find that all spin correlations for graph C 2 are appreciable, apart from the one described by m 2 |y 2 |B δqδq . Notice that A ∆q∆q has the same sign for C 1 and C 2 , unlike all other twist-two functions. If (as suggested by the sign of A qq ) the dominant parton configuration probed by the twist-two operators is a cc pair for graph C 1 and a cc pair for graph C 2 , then the longitudinal parton spins tend to be anti-aligned in both cases.

Test of the factorisation hypothesis
We now test the factorisation hypothesis for A ud (y 2 , py = 0) that we derived in section 2.4. We restrict ourselves to the contribution from the connected graph C 1 . Taking the full combination of graphs in the first line of (3.2) is not an option because of the huge errors in our results for the doubly disconnected graph D. By contrast, we see in figure 14(a) that S 1 is consistent with zero for A qq (although within errors much larger than those on C 1 ). We find it plausible to expect that the contribution from D is even smaller than the one of S 1 , since D has two disconnected fermion loops with one operator insertion.
The factorisation hypothesis (2.41) involves the vector form factor of the pion. We have extracted this form factor from our lattice simulations, using the full number of 2025 gauge  configurations available for our lattice with L = 40. As we consider only the connected contribution to the two-current correlation function, we restrict ourselves to the connected graph for the form factor as well. We fit the form factor data to a power law We use two fit variants, which gives us a handle on the bias of the extrapolation to −t > 1.15 GeV 2 , where we have no data. Such an extrapolation bias is inevitable when we Fourier transform from momentum to position space, as is required in (2.41). In a monopole fit, we fix p = 1 and obtain M = 777(12) MeV. Leaving the power free, we obtain p = 1.173 (69) and M = 872(16) MeV. Both fits give a very good description of our lattice data, as shown in figure 14a of [12]. With the ansatz (4.1), the two-dimensional Fourier transform on the r.h.s. of (2.41) can be carried out analytically. We compute the remaining integral over ζ numerically. The results obtained with the two form factor fits agree very well for y > 0.2 fm. In panel (a) of figure 22 we compare the two sides of the factorisation hypothesis (2.41), and in panel (b) we show the ratio of the r.h.s. to the l.h.s. of the equation. We see a clear deviation from the factorised ansatz, which does however not exceed 30% in the considered y range. One may thus say that the factorised ansatz provides a rough approximation of the two-current correlator.

Physical matrix elements
We now investigate the combinations (3.2) of lattice graphs that appear in the matrix elements of currents between charged or neutral pions. We omit the doubly disconnected graph D throughout, because its statistical errors are much larger than the signal for any other graph. Since data for the full set of remaining graphs is only available for light quarks, we restrict our attention to this case. The results are shown in figures 23 and 24 for the flavour combinations ud and uu. The combinations dd and du can be obtained from the symmetry relations (2.16).
As can be expected from figures 14 and 15, the statistical errors of the physical combinations are significantly larger than those for the connected graphs alone. Nevertheless, we see a clear negative signal for A ud in a π + . As discussed in section 4.2, this can be understood as a dominance of the valence Fock state ud over Fock states that contain ud,ūd orūd. The function A uu in a π + has a clear positive signal at small distances y. This reflects the behaviour of graph C 2 and corresponds to a larger probability for finding two u quarks rather [ ] (f) myA δu u | π 0 Figure 23. Twist-two functions at py = 0 for the flavour combinations ud or uu in a π + or a π 0 . Lattice graphs are combined according to (3.2), except for of graph D, which is affected by huge errors and hence omitted. All results are for light quarks. than a uū pair at small separation y. Remarkably, the signal at small y is of comparable size for A uu and A ud , which implies that Fock states containing sea quarks do play an important role in this region. As for polarisation effects, a clear signal for ud or uu in a π + is only seen for myA δq q and shown in the right panels of figure 23. Comparing this with A qq , we see that spin-orbit correlations are appreciable for both flavour combinations. The flavour combination uu in a π 0 involves the sum C 1 + 2C 2 . We observe a very strong compensation between the two connected graphs, which results in a marginal signal for A uu and myA δuu . The twist-two functions for ud in a π 0 receive no contribution from connected graphs at all. Within errors, the corresponding results are zero for all combinations of currents, and we do not show them here.
Among all polarised twist-two functions other than myA δq q , a marginally nonzero signal is only seen for the longitudinal spin correlation A ∆u∆u in a π + or a π 0 . This is dominated by the contribution from C 2 in both cases and shown in figure 24.
Comparing the functions A ud and A uu in a π + , we see a clear difference in their y dependence. This is at variance with the assumption going into (2.7) and thus into the "pocket formula" for double parton scattering, which is that the DPDs for all parton combinations have the same y dependence in a given hadron. Of course, the twist-two functions at py = 0 are not directly related to DPDs at zero skewness ζ. However, it would be remarkable if the strong flavour dependence we see in πA qq (y 2 , py = 0) = 1 0 dζ I qq (y 2 , ζ) were absent in I qq (y 2 , ζ = 0).

Results for nonzero pion momentum
In this section, we use our data for nonzero pion momentum to study the py dependence of the twist-two functions. We restrict our study to graph C 1 for light quarks on the L = 40 lattice: only in this case do we have simulations for a sufficient number of pion momenta. Since graph C 1 dominates the twist-two matrix elements for ud in a π + , we will write A ud , A δud , . . . for twist-two functions and I ud , I δud , . . . for Mellin moments in what follows.

Fit ansatz for the py dependence
We start by proposing a functional ansatz for the twist-two functions, which is based on their relation (2.33) with the Mellin moments of skewed DPDs. We use this ansatz to fit the py dependence of our lattice data. This will allow for a model dependent extension of the twisttwo functions to all values of py, beyond the region (2.26) available on a Euclidean lattice. This will in turn allow for a model dependent extraction of the Mellin moments of DPDs at zero skewness. For ease of notation, we write A(y 2 , py) to denote any of the twist-two functions A ud , . . . , A δuδd , B δuδd . Likewise, we write I(y 2 , py) for the Mellin moments I ud , . . . , I δuδd , I t δuδd . The basis of our ansatz is the assumption that, in its support region −1 ≤ ζ ≤ 1, the skewed moment I(y 2 , ζ) can be approximated by a polynomial in ζ, I(y 2 , ζ) = π N n=0 a n (y 2 ) ζ 2n (5.1) with some integer N , where we used the symmetry relation (2.30) to restrict terms to even powers of ζ. We write = instead of ≈ in the spirit of a fit ansatz, i.e. we do not claim that and thus obtains the Taylor series An explicit representation is given by with rational functions For n = 0 and n = 1, these functions read In terms of the normalised quantities A(y 2 , py) = A(y 2 , py) A(y 2 , py = 0) ,â n (y 2 ) = a n (y 2 ) A(y 2 , py = 0) (5. Let us now describe our general fitting procedure. In order to achieve stable fits, we first determine the y 2 dependence of A(y 2 , py = 0). This includes the information from data with zero pion momentum and has typically much smaller errors than the data for nonzero py.
In a second step, we fit the y dependent coefficientsâ n (y 2 ) in the ansatz (5.10). To make the degrees of freedom of this fit explicit, we consider the moments ζ 2m (y 2 ) for m = 0, . . . , N . Inverting the relation (5.11), we obtain a n (y 2 ) = N m=0 (T −1 ) nm ζ 2m (y 2 ) , (5.12) where T is the (N + 1) × (N + 1) matrix with elements Since by definition ζ 0 (y 2 ) = 1, we can thus fit the py dependence of the twist-two functions to (5.10) and (5.12) with N fit parameters ζ 2 , . . . , ζ 2N at each value of y 2 . We call this "local fits" in the following, where "local" means "local in y 2 ".
To obtain a parametrisation of both the py and the y 2 dependence, we assume an expansion ζ 2m (y 2 ) = K k=0 c mk −y 2 k . (5.14) This is referred to as our "global fit". By virtue of (5.10) and (5.12), this corresponds to an expansion of A(y 2 , py) in powers of −y 2 . The condition ζ 0 (y 2 ) = 1 implies c 0k = δ 0k .

Fitting the data
We recall that we have data for p = 0, 1, √ 2, √ 3 and 2 in units of 2π/(La) ≈ 437 MeV. For a given value of y, this allows for a maximum value 4πy/(La) ≈ 6.28 y/(20a) for |py|. We apply the cut (3.6) on the angle θ to the p = 0 data, but not to the data with p > 0. We then average all data points with the same values of py and y 2 .
We find that the twist-two functions at py = 0 can be well described by a superposition of two exponentials, A(y 2 , py = 0) = A 1 e −a 1 (y−y min ) + A 2 e −a 2 (y−y min ) for y min ≤ y ≤ y max , (5.15) with y min = 5a = 0.355 fm and y max = 20a = 1.42 fm. We do not include data with y > y max , because they have large errors and are increasingly affected by finite size effects. The resulting fit parameters are given in table 2. Let us emphasise that these fits are not suitable for extrapolating the twist-two functions to values significantly below y = y min . We notice a relatively high value of χ 2 /dof in the fit for A δud . This is due to some scatter in the data at high y, which comes from points with large p. Repeating the fit with an upper limit y ≤ 15a, we find that χ 2 /dof decreases from 1.76 to 0.9 for A δud . By comparison, the value of χ 2 /dof in the fit for A ud decreases from 0.95 to 0.6 with the same reduction of the fitting range.
We then proceed and fit the py dependence to (5.10) and (5.12) locally in y 2 . To have enough data in these fits, we introduce bins in y and combine all points with (n − 1/2)a < function  Table 3. Parameters of the fit of the combined y 2 and py dependence of the normalised twist-two functions A(y 2 , py) to (5.10), (5.12) and (5.14) with N = K = 1.
y < (n + 1/2)a for integer n between 5 and 20. In addition, we fit the combined y 2 and py dependence of A to (5.10), (5.12) and (5.14). We explored fits with different maximum values N and K in the sums and find that, given the fit range and the statistical quality of our data, an adequate choice is N = 1 for local fits and N = 1, K = 1 for the global fit. The parameters of the global fit are given in table 3. If we take N = 2 instead, the error bands of the fit results for A increase significantly, whilst the decrease of χ 2 /dof is minor. We hence conclude that we would over-fit the data by choosing N = 2 or even higher values. We compare our data and fits in figure 25 for different functions at y = 15a and in figure 26 for A ud at y = 5a and 10a. We find good agreement between the local and global fits. Note that the twist-two functions are symmetric in py due to P T invariance, which is realised on the lattice. A departure from this symmetry in the data must therefore be due to statistical fluctuations. Many data points have admittedly large errors, which is a consequence of at least one of y or p being large. Nevertheless, the fitted parameters for all functions except A ∆u∆d are in general well determined, and the corresponding error bands of the fit results are reasonably small. As is seen in figure 25(b), the data for A ∆u∆d are much too noisy for fitting the py dependence, and we exclude this function from our further discussion.
In the data for y = 15a, we see an indication for zero crossings around |py| = 4 in several twist-two functions. That this can be reproduced with a superposition of the two functions h 0 (py) and h 1 (py) gives us some confidence in our fit ansatz.
Using our fits, we can compute the moment ζ 2 (y 2 ) associated with I(y 2 , ζ), which according to (5.11) follows from the curvature of A(y 2 , py) at py = 0. The results are shown in figure 27. We find again good agreement between the local and global fits. A clear y dependence of ζ 2 is observed, except for I δud . The values of ζ 2 are not too large, especially for small y. Their size does, however, imply that nonzero values of the skewness ζ must play some role in the integral representation πA(y 2 , py = 0) = 1 0 dζ I(y 2 , ζ).

Mellin moments of DPDs
We now use the global fit described in the last section to reconstruct the lowest Mellin moments of skewed DPDs. Let us re-emphasise that such a reconstruction is necessarily dependent on the functional ansatz we have made, given the impossibility to constrain the full py dependence of twist-two functions with lattice simulations. We recall that the results for the spin correlation ∆u∆d are too noisy and hence omitted in the following.
We can easily derive the analytic form of the Mellin moments for our fits by inverting the 2 × 2 matrix T mn in (5.13). This gives I(y 2 , ζ) = 3π 4 3 − 5 ζ 2 (y 2 ) − 5ζ 2 1 − 3 ζ 2 (y 2 ) A(y 2 , py = 0) .  The values of ζ 2 (y 2 ) for y ≤ 20a are in the range between 0 and 0.5 for all twist-two functions. The combination 3 − 5 ζ 2 in (5.16) is therefore always positive and varies between 3 and 0.5. We can hence anticipate that the dependence of the Mellin moments I(y 2 , ζ = 0) on y and on the polarisation indices should roughly follow the corresponding dependence of A(y 2 , py = 0). By contrast, the coefficient of ζ 2 in (5.16) has a larger variation and can change sign as a function of y. Our results for the y and ζ dependence of the Mellin moments are visualised in figures 28 and 29. Compared with the data entering our fit, we have slightly extended the y range from 5a down to 4a.
In the left panel of figure 30, we show the Mellin moments at ζ = 0 for the different polarisation combinations. Comparison with the data of the corresponding twist-two functions at py = 0 shows the close similarity between the two quantities. This corroborates the basic assumption of our discussion in section 4, namely that the qualitative features of twist-two functions at py = 0 are representative of the Mellin moments of ordinary DPDs.
With the caveats of choosing a functional ansatz and restricting ourselves to the connected graph C 1 , we can in particular extend our discussion for light quarks in section 4.3 to the Mellin moments of DPDs for the flavour combination ud in a π + : there is a substantial spin-orbit correlation for one transversely polarised quark or antiquark, whereas correlations involving transverse polarisation of both partons are rather small. This is one of the main results of our work.
DPDs at ζ = 0 satisfy sum rules, which have been proposed in [47] and can be proven rigorously in QCD [93,94]. These sum rules express momentum and quark number conservation.  Figure 27. Values of the moment ζ 2 (y 2 ) associated with I(y 2 , ζ), extracted by local fits (data points) and the global fit (bands).
The quark number sum rule for the flavour combination ud in a π + implies that 2π ∞ ycut dy y I ud (y 2 ; where Λ denotes a hadronic scale. The necessity of a lower cutoff on the y integral and the presence of an O(α 2 s ) term on the r.h.s. result from the singular behaviour of DPDs at perturbatively small distances y, as explained in [50]. To avoid large logarithms in the O(α 2 s ) term, one should take y cut ∼ 1/µ, and a standard choice is y cut = b 0 /µ, where b 0 = 2e −γ ≈ 1.12 and γ is the Euler-Mascheroni constant. With the renormalisation scale µ = 2 GeV of our analysis, this gives y cut ≈ 0.11 fm ≈ 1.56a. Extrapolating our global fit down to this This result is not too sensitive to the extrapolation in y: taking an upper integration boundary of 20a, we obtain −0.908 (63), whilst raising the lower integration boundary by a factor 2, we obtain −0.885 (72). Note that with a larger y cut , one expects a larger O(Λ 2 y 2 cut ) term on the r.h.s. of (5.17). Given the presence of this power correction in the theory prediction, we find its agreement with our result (5.18) quite satisfactory. We regard this as a strong cross check of our analysis, and in particular of the fit ansatz we have made.

Factorisation hypothesis for Mellin moments
With the Mellin moments reconstructed from our global fit, we can also test the factorisation hypothesis (2.37), which directly follows from the corresponding hypothesis (2.6) for DPDs.
To evaluate the r.h.s. of (2.37), we use the same two fits for the vector form factor of the pion as we did in section 4.4. The comparison of the left and right-hand sides of (2.37), as well as their ratio is shown in figure 31. We see the same trend as we did in figure 22 for A ud at py = 0. At small y, the result of the factorised ansatz is too large in absolute size, and at large y it is too small. The discrepancy at large y is even somewhat stronger for the Mellin moment I ud than it is for A ud . We draw the same conclusion as we did in section 4.4: the factorised ansatz for the unpolarised ud flavor combination in a π + can provide a rough approximation at the level of several 10%. In the sense that the factorised ansatz represents the assumption that the u and thed in a π + have independent spatial distributions, our result for I ud indicates that the two partons prefer to be farther apart than if they were uncorrelated.

Summary
This paper presents the first lattice calculation that provides information about double parton distributions in a pion. Our simulations are for a pion mass of m π ≈ 300 MeV, a lattice spacing of a ≈ 0.07 fm, and two lattice volumes with L = 32 and L = 40 points in the spatial lattice directions, respectively. We also have results for the pseudoscalar ground state made of strange or of charm quarks at their physical masses, in a partially quenched setup.
We compute the pion matrix elements of the product of two local currents that are separated by a space-like distance. From these tensor-valued matrix elements, we extract Lorentz invariant functions associated with the twist-two operators in the definition of DPDs. In the continuum and infinite volume limits, these functions depend on the pion momentum p µ and the distance y µ between the currents only via the invariant products py and y 2 . This allows us to detect discretisation and finite size effects in our data, and to devise cuts that minimise these artefacts. In particular, most results reported here are limited to distances y above 5a ≈ 0.35 fm. Comparing the data from our two lattice volumes, we find only mild differences in channels that have a good statistical signal. Comparing results obtained with different source-sink separation, we find little evidence for contributions from excited states in our analysis. The invariant twist-two function in the axial vector channel appears to be most strongly affected by several of the lattice artefacts.
Comparing the importance of different Wick contractions in the twist-two functions, we find that the connected graphs C 1 and C 2 in figure 4 are the most important ones in almost all cases. For light quarks, graph C 2 is as important as C 1 at small distances y between the two partons, which indicates that Fock states containing sea quarks are important in that region. As one would expect, this importance is strongly reduced for charm quarks, but it is still visible at a level below 10%. For light quarks, the combination of graphs C 1 and C 2 leads to a significant difference in the y dependence of the twist-two functions for the flavour combinations ud and uu in a π + .
We compute matrix elements for different combinations of the vector, axial vector and tensor currents, which respectively correspond to unpolarised partons and partons with longitudinal or transverse polarisation. For light quarks, we find surprisingly small correlations between the longitudinal or transverse spins of the two partons. By contrast, a large spin-orbit correlation is seen between the transverse component of y and the transverse polarisation of one of the partons. All spin correlations increase considerably with the quark mass, and for charm quarks we observe large spin-spin correlations for both longitudinal and transverse polarisation.
The invariant twist-two functions that we can determine on the lattice are not directly related to the Mellin moments of DPDs, but rather to the moments of what can be called "skewed" DPDs. To compute the Mellin moments of ordinary DPDs from two-current matrix elements, one needs the dependence of the invariant functions on the variable py on the full real axis. This is inaccessible on a Euclidean lattice. Fitting an ansatz for the py dependence to our lattice data, we can however reconstruct the Mellin moments by extrapolating this ansatz to the full py range. We find that the moments obtained in this way have a behaviour very similar to the one of the twist-two functions at py = 0. A valuable cross check of our procedure is the fact that the result for the unpolarised Mellin moment is in good agreement with the number sum rule that must be obeyed by the DPD for the flavour combination ud in a π + .
A starting point of many phenomenological studies is the assumption that unpolarised DPDs can be "factorised" into the single-particle distributions of each parton, which would mean that the two partons are independent of each other. We have formalised this assumption and tested it, both for the twist-two functions directly extracted from the lattice data and for the Mellin moments reconstructed by extrapolating a fit to these data. In both cases, we find that the two-parton correlator deviates from the factorisation ansatz by a few 10%, and that the sign of the deviation depends on the transverse distance y. More specifically, the two partons tend to be farther apart from each other than if they were independent of each other.
We see several directions into which the studies reported here should be extended. First and foremost comes the extension from a pion to a nucleon, which is of direct relevance for double parton scattering in proton-proton collisions. Work in this direction is underway. On a longer time scale, one will want to have simulations with finer lattice spacing and smaller quark masses. Data of sufficient quality for higher hadron momenta will extend the range in py that can be probed and thus allow for a better controlled extrapolation in this variable. Given the results obtained in the present work, we think that the efforts required for such studies will be rewarded with valuable physics insights.