Double parton distributions in the nucleon from lattice QCD

We evaluate nucleon four-point functions in the framework of lattice QCD in order to extract the first Mellin moment of double parton distributions (DPDs) in the unpolarized proton. In this first study, we employ an nf = 2 + 1 ensemble with pseudoscalar masses of mpi = 355 MeV and mK = 441 MeV. The results are converted to the scale mu = 2 GeV. Our calculation includes all Wick contractions, and for almost all of them a good statistical signal is obtained. We analyze the dependence of the DPD Mellin moments on the quark flavor and the quark polarization. Furthermore, the validity of frequently used factorization assumptions is investigated.

present work; preliminary results have been published in [68].
Our paper is organized as follows. In section 2, we review the different theoretical objects relevant to our study and explain how they are related to each other. Details of the lattice setup and ensembles we used are given in section 3, with more technical information being collected in an appendix. Sections 4 and 5 contain the results of our calculation. In section 4, the Mellin moments for different flavor and polarization combinations are presented, while the results of various factorization tests are discussed in section 5. Such tests are especially important in view of the fact that many phenomenological models of DPS use similar factorization assumptions. We summarize our findings in section 6.

Theory
In this section we review certain basics on double parton distributions (DPDs) and their relevance in the context of double parton scattering.

Double parton distributions
The DPD of a given hadron parameterizes the joint probability of finding two partons with certain polarization and momentum fractions x i at a given relative transverse distance y. For the case of the unpolarized proton, DPDs of quarks and antiquarks are defined by a proton matrix element of two operators: where indicates the average over the proton helicity states. In (2.1) we use light-cone coordinates v ± = (v 0 ± v 3 )/ √ 2, v = (v 1 , v 2 ) for a given four-vector v µ . Moreover, we work in a frame where the transverse proton momentum vanishes, i.e. p = 0. The light-cone operators appearing in (2.1) are defined as: O a (y, z) =q y − 1 2 z Γ a q y + 1 2 z where a specifies the quark flavor and polarization, which is determined by the spin projections Γ q = 1 2 γ + , Γ ∆q = 1 2 γ + γ 5 , Γ j δq = 1 2 iσ j+ γ 5 (j = 1, 2) . (2.3) In this notation, q refers to the sum over all quark polarizations. ∆q denotes the difference between positive and negative helicity contributions and, therefore, corresponds to the longitudinal quark polarization, whereas δq is the analogue for the case of transverse polarization.
The expression of the light cone operators given in (2.2) is only valid in light cone gauge, otherwise a Wilson line has to be inserted. Notice that the light cone operators have to be renormalized. This leads to a scale dependence of the operators and, consequently, of the DPDs. For brevity, we do not indicate the scale. Because of momentum conservation, the momentum fractions can take only values satisfying |x 1 | + |x 2 | ≤ 1. Negative momentum fractions are associated with antiquarks, i.e.: F a 1 a 2 (−x 1 , x 2 , y) = η a 1 C Fā 1 a 2 (x 1 , x 2 , y) , F a 1 a 2 (x 1 , −x 2 , y) = η a 2 C F a 1ā2 (x 1 , x 2 , y) , (2.4) where η a C = −1 for a = q, δq , η a C = +1 for a = ∆q . (2.5) DPDs fulfill certain sum rules, which have been proposed in [33] and proven in [69,70]. In this paper, we consider the number sum rule. In position space this can be formulated as: where f q (x) is an ordinary PDF for an unpolarized quark with flavor q and satisfies fq(x) = f q (−x). N q is the number of valence quarks with flavor q . The lower cutoff in the integral over y is necessary, because DPDs have a singular 1/y 2 behavior for y 2 → 0. This is caused by perturbative splitting processes, which are of O(α s (µ)). For more details see [71]. A common choice for the lower cutoff is y cut = b 0 /µ, where µ is the renormalization scale and b 0 = 2e −γ ≈ 1.12 with the Euler-Mascheroni constant γ.
The double parton distributions defined in (2.1) are needed to compute double parton scattering processes. The corresponding cross section can be written in terms of two DPDs, integrated over the transverse parton distance: (2.7) Hence, the dependence of DPDs on the transverse distance is not directly accessible in experiments. DPDs are often simplified and expressed in terms of single parton distributions within certain factorization approaches. The first procedure in this context is based on the insertion of a complete set of states between the operators in the matrix element in (2.1). Then it is assumed that nucleon states dominate, such that all other states can be neglected. This leads to an expression of DPDs in terms of impact parameter distributions f a (x, b): F a 1 a 2 (x 1 , x 2 , y) (2.8) This kind of factorization has been investigated on the lattice for the case of the pion. Significant differences between the r.h.s. and l.h.s. of (2.8) have been found, while the orders of magnitude are consistent with each other [67]. Similar observations have been made in quark model studies [49]. We shall perform analogous investigations for the case of the nucleon in section 5. The other factorization approach frequently used assumes a complete factorization w.r.t. all arguments: (2.9) This leads to the so-called "pocket formula", where the DPS cross section is written as a product of two SPS cross sections [72]: where i and j indicate the final states of the two hard scattering processes. C is a combinatoric factor, which is 2 if i = j and 1, otherwise. The effective cross section σ eff is defined by σ −1 eff = d 2 y[G(y)] 2 . The function G(y) must be independent of the quark flavor, which leads to the prediction that σ eff should be a universal constant. Since we are not able to resolve the x i dependence of DPDs in lattice studies, we cannot investigate to what extent factorization approaches w.r.t. the momentum fractions are valid. However, we shall perform the evaluation of DPD Mellin moments for different quark flavor combinations, such that we are able to check the universality of the function G(y).

Skewed double parton distributions
The DPDs defined in (2.1) can be extended by introducing an additional phase in the definition. This causes a difference between the momenta of the emitted and absorbed partons, respectively. We call the resulting functions skewed DPDs, which additionally depend on the skewness parameter ζ: The partons have momentum fractions x i ± 1 2 ζ. The sign of the fractions determines whether there is a quark (antiquark) in the proton wave function or an antiquark (quark) in its complex x 2 x 1 x 1 Figure 2. Support regions of F ud (x i , ζ, y) w.r.t. the arguments x i and ζ. This is shown for positive (left) and negative (right) skewness parameter ζ. For each sub-region we indicate the (anti-)quark content of the wave function and its complex conjugate. The notation u|ddu means that we have a u-quark in the proton wave function andddu in its complex conjugate.
conjugate. An overview of the corresponding regions is given in figure 2. If all fractions are positive, we have two quarks with momentum fractions x 1 − 1 2 ζ and x 2 + 1 2 ζ in the proton wave function and two quarks with x 1 + 1 2 ζ, x 2 − 1 2 ζ in its complex conjugate. This is sketched in figure 1 for the case of a u and a d quark. Because of momentum conservation, the region in the (x 1 , x 2 , ζ)-parameter space where the skewed DPDs are non-zero is restricted by The corresponding support region is also indicated in Figure 2. From P T invariance it follows that: F a 1 a 2 (x 1 , x 2 , ζ, y) = η a 1 PT η a 2 PT F a 1 a 2 (x 1 , x 2 , −ζ, −y) , (2.13) with η a PT = −1 for a = ∆q, δq , η a PT = +1 for a = q . (2.14) Moreover, one can give a decomposition of the skewed DPDs in terms of functions that are rotationally invariant in the transverse plane. F q 1 q 2 and F ∆q 1 ∆q 2 have even parity and are scalar quantities, therefore they are already rotationally invariant. By contrast F ∆q 1 q 2 and F q 1 ∆q 2 are parity-odd, which implies that they have to vanish. From invariance under time reflection, it also follows that the T -odd quantities F j 1 δq 1 ∆q 2 and F j 2 ∆q 1 δq 2 are zero. The remaining DPDs can be decomposed in terms of transverse vectors as follows: F q 1 q 2 (x 1 , x 2 , ζ, y) = f q 1 q 2 (x 1 , x 2 , ζ, y 2 ) , F ∆q 1 ∆q 2 (x 1 , x 2 , ζ, y) = f ∆q 1 ∆q 2 (x 1 , x 2 , ζ, y 2 ) , F j 1 δq 1 q 2 (x 1 , x 2 , ζ, y) = j 1 k y k mf δq 1 q 2 (x 1 , x 2 , ζ, y 2 ) , F j 2 q 1 δq 2 (x 1 , x 2 , ζ, y) = j 2 k y k mf q 1 δq 2 (x 1 , x 2 , ζ, y 2 ) , F j 1 j 2 δq 1 δq 2 (x 1 , x 2 , ζ, y) = δ j 1 j 2 f δq 1 δq 2 (x 1 , x 2 , ζ, y 2 ) + 2y j 1 y j 2 − δ j 1 j 2 y 2 m 2 f t δq 1 δq 2 (x 1 , x 2 , ζ, y 2 ) , (2.15) where m is the proton mass and ij is the antisymmetric tensor in two dimensions, with 12 = 1. Notice that y 2 = y µ y µ denotes the Lorentz invariant scalar product. In our case we have y + = 0, i.e. y 2 = −y 2 . For ζ = 0 the functions on the r.h.s. of (2.15) have the following physical interpretation: • f q 1 q 2 describes the probability of finding two quarks with momentum fractions x 1 and x 2 at a transverse distance y. It contains a sum over all quark polarization states, i.e. the quarks are unpolarized.
• f ∆q 1 ∆q 2 describes the difference between the probabilities of finding the two quarks with aligned or anti-aligned spins in the longitudinal direction. This gives a measure for the longitudinal quark polarization.
• f δq 1 δq 2 is the analogue of f ∆q 1 ∆q 2 for polarization in the transverse direction.
• f δq 1 q 2 describes the correlation between the transverse polarization of the first quark and the transverse distance y between the two quarks. f q 1 δq 2 can be interpreted in analogy, where the second quark is polarized and the first unpolarized.
• f t δq 1 δq 2 gives the correlation between the transverse distance y of the quarks and their transverse polarizations.
(2. 16) The matrix elements in the definitions (2.1) and (2.11) are not directly accessible on a Euclidean lattice, since they involve light-like distances. A way to circumvent this obstacle is to consider Mellin moments of skewed DPDs. The lowest Mellin moment is defined as: +η a 2 C f a 1ā2 (x 1 , x 2 , ζ, y 2 ) + η a 1 C η a 2 C fā 1ā2 (x 1 , x 2 , ζ, y 2 ) . (2.17) The integrals over x 1 and x 2 in (2.17) together with the exponentials e ix i p + z − i in (2.1) and (2.11) set z − i to zero, which is what we intended. The resulting matrix elements involve only local quark bilinears, which can be evaluated in lattice simulations.
These operators commute if the distance vector y is space-like, in particular for y 0 = 0. A consequence of this property is the relation: Moreover, the currents have definite transformation behavior under charge conjugation and the combination of parity and time reflection: where in analogy to (2.5) and (2.14) the sign factors η C and η PT are defined as: and The combined P T symmetry implies for the two-current matrix elements: In the context of DPDs we have to consider the current combinations V V , AA, V T , T V , and T T . The corresponding two-current matrix elements are by definition Lorentz tensors of a certain rank, which is determined by the involved currents. Therefore, the matrix elements can be decomposed in terms of Lorentz invariant functions and Lorentz tensors constructed from the four-vectors p and y. In order to reduce the number of independent quantities, we subtract trace contributions and consider symmetric combinations. For brevity we skip the arguments y and p of the matrix elements M : Here we write t {µν} = (t µν + t νµ )/2 and t [µν] = (t µν − t νµ )/2 for an arbitrary tensor t µν . The quantities A, B,... are Lorentz invariant functions, i.e. they only depend on py = p µ y µ and y 2 = y µ y µ . The decomposition of M µν q 1 q 2 ,AA , which is not explicitly given in (2.25), has the same form as the one for M µν q 1 q 2 ,V V and introduces the functions A ∆q 1 ∆q 2 , B ∆q 1 ∆q 2 , and C ∆q 1 ∆q 2 . Decomposing M µνρ q 1 q 2 ,V T works in analogy to M µνρ q 1 q 2 ,T V with the Lorentz indices interchanged appropriately. The basis tensors are given by: Notice that the tensorsũ T T,A . . .ũ T T,D are not trace-subtracted, which is in contrast to the analogous tensors u T T,A . . . u T T,D defined in [67]. For this reason, the last term in (2.25), which is proportional to the trace, involves a modified invariant function E δq 1 δq 2 rather than the original function E δq 1 δq 2 . The remaining functions A δq 1 δq 2 . . . D δq 1 δq 2 are the same as in [67]. Using the decomposition (2.15), we can relate the two-current matrix elements (2.18) to the DPD Mellin moments (2.17): Notice that the Dirac structure in the local tensor operator J µν q,T differs from that in the spin projection Γ j δq by an extra γ 5 , see (2.3) and (2.19). This corresponds to a rotation by 90°in the transverse plane, which follows from the relation iσ j+ γ 5 = jk σ k+ and has been taken into account in (2.27).
Comparing (2.27) with (2.25) we find the following relations between the DPD Mellin moments and the invariant functions: i.e. the Mellin moments are Fourier transforms of the invariant functions A a 1 a 2 and B δqδq . We refer to this subset of invariant functions as twist-two functions throughout this paper.
Since the Mellin moments are symmetric in ζ, which follows from (2.13), the inverse Fourier transform at py = 0 can be written as: We define even ζ-moments of the Mellin moments: whereas odd ζ-moments vanish because of parity. The last expression in (2.30) follows from inserting (2.28) and performing an integration by parts. Hence, the 2m-th moment in ζ is directly related to the 2m-th derivative in py of the corresponding twist-two function at py = 0.

Two-current matrix elements on the lattice
In order to perform lattice simulations, we switch to Euclidean spacetime in this section. The corresponding time component of a four vector x µ is denoted by x 4 instead of x 0 . In Euclidean spacetime, the matrix elements given in (2.18) can be directly calculated in lattice QCD if the distance between the two insertion operators is purely spatial, i.e. y 4 = y 0 = 0. In this section we describe the relation between the matrix elements and the lattice four-point functions defined below for the case of the nucleon and explain the techniques we use for the evaluation of the latter.

Four-point functions and matrix elements
Definition: We define the proton four-point function C ij, p 4pt as the correlator of a proton creation operator P (source), the corresponding annihilation operator P (sink), and the two local currents J i defined in (2.19): where the sum over z , z combined with the exponential injects a total proton momentum, and the operator projects onto positive parity. The proton creation and annihilation operators, which we also refer to as interpolators, are given by tri-quark operators matching the proton's spin J = 1/2 and isospin I = 1/2: where C is the charge conjugation matrix in spinor space, and [.] indicates a scalar quantity w.r.t. spinor indices. The traces in (3.1) are taken w.r.t. the open spinor indices introduced by the quark fields u a andū c , respectively. Furthermore, we define the two-point function: We denote the separation in Euclidean time direction between the source and the current insertions by τ , and the separation between the source and the sink by t.
Wick contractions: The evaluation of the correlation functions (3.1) and (3.4) leads to a definite set of Wick contractions. While there are only two contractions arising from permutations of uū-pairs in the two-point function, there is a multitude of possible contractions in the case of the four-point functions, which can be grouped into five types. Following the notation of [62] we call them C 1 , C 2 , S 1 , S 2 and D. They can be represented by the graphs illustrated in figure 3. S 1 , S 2 and D are disconnected contractions involving the sub-graphs G 3pt and G 2pt , as well as the loops L 1 and L 2 . Explicit expressions are given in appendix A.2. The explicit contribution of a given type depends on the flavor content of the inserted operators J qq ,i , which in general can be flavor changing. In the case of the graph C 1 this is indicated by the four flavors q 1 . . . q 4 of the quark lines connected to the current insertions, where the first two indices correspond to the flavor of the first operator J q 1 q 2 ,i and the last two flavor indices are those of the second operator J q 3 q 4 ,j . For the proton there are three independent contributions called C 1,uuuu , C 1,uudd and C 1,uddu , where the latter is not considered in this work, since we restrict ourselves to flavor conserving currents J qq,i = J q,i , see definition (2.19). If all considered quarks have the same mass, the graphs C 2,q and S 1,q depend only on the flavor q of the two propagators connecting the source or sink with one of the current insertions. Therefore, in the case of proton-proton matrix elements there are two possibilities for each contraction: C 2,u , C 2,d , S 1,u , S 1,d . For each of the contractions S 2 and D there is only Figure 3. Illustration of the five kinds of Wick contractions (graphs) contributing to a four-point function of a baryon. The explicit contributions for the graphs C 1 , C 2 and S 1 depend on the quark flavor of the current insertions (red points). In the case where all quark flavors have the same mass, C 2 only depends on the flavors of the two propagators connected to the source or the sink. These flavors have to be the same for proton-proton matrix elements. For the graphs S 1 , S 2 and D we also indicate the parts connected to the proton source and sink, i.e. G 3pt and G 2pt (blue), as well as the disconnected loops L 1 and L 2 (orange).
one contribution, which is flavor independent. Notice that we define the quantities C 1,uuuu , C 1,uudd , C 1,uddu , C 2,u , C 2,d , S 1,u , S 1,d , S 2 , and D as a sum of all quark permutations that share the same quark line topology. In particular, this includes permutations of the two u-quarks of the proton itself (see the definitions in (A.13), (A.15), and (A. 16)). In addition to the desired nucleon ground state, the interpolators P and P also create and annihilate excited states. In order to relate the four-point functions to physical matrix elements of the nucleon ground state, we have to ensure that these excited states are sufficiently suppressed. This can be achieved by taking large Euclidean time separations t and t − τ . In this context we define: where V = L 3 a 3 denotes the spatial volume. The factor 2V m 2 + p 2 on the r.h.s. of (3.5) ensures the correct normalization of states. In a similar manner, we define: and likewise for the other contractions C 1,uuuu , C 2,u , . . . that contribute to C 4pt ( y , t, τ ).
Let us now point out some properties of these contractions: Using (A.7) and (A.11) and P T invariance, as well as invariance under translations in the time direction we are able to deduce the relations where η ij PT is defined in (2.23). Notice that strictly speaking (3.7) is exactly fulfilled only if τ = t/2. In the limit of large Euclidean time separations 0 τ t we consider G ij, p to be constant w.r.t. τ so that the P T relations can also be applied to the ratio (3.5). Invariance under CP transformations, together with the relations (A.7) and (A.10), implies for all contractions: If η ij 4 = 1, which is the case for the matrix elements we consider in this work, the relations (3.7) and (3.8) imply that C 1,uudd , C 1,uuuu , S 1,u , S 1,d , S 2 , D are real-valued, whereas C 2,u , C 2,d can have non-vanishing imaginary parts. For these contractions we find Moreover, translational invariance implies that G ij, p ( y ) = G ji, p (− y ) for G = S 2 , D . (3.11) Contribution to physical matrix elements: Inserting a complete set of states between the interpolators and the current insertions and taking the limit 0 τ t (see (3.5)), we find: where u λ (p) is the usual spinor solution of the Dirac equation for the nucleon. Again we note that we set y 0 = y 4 = 0 so that the translation to Minkowski spacetime is trivial. By writing y 0 instead of y 4 on the r.h.s. of (3.12) we refer to the matrix elements in Minkowski spacetime, which we are actually interested in. For the parity projection P + defined in (3.2) the r.h.s. of (3.12) turns into the desired spin averaged proton matrix element. Considering the currents defined in (2.19) (we omit Lorentz indices for brevity), we can write: where M q 1 q 2 ,i 1 i 2 is the two-current matrix element (2.18) to be investigated. For the currents containing only the light quarks u and d, we find for the proton matrix elements: where J latt i (y) are the bare lattice operators. The renormalization factors Z A and Z V for the axial and vector currents do not depend on the renormalization scale, because the associated anomalous dimensions vanish. By contrast, Z T refers to the scale µ = 2 GeV . (3.17) The renormalization constants Z i specific to our lattice setup with β = 3.4 have been determined in [73] (see table X therein) using the RI -MOM scheme. They include the conversion to the MS-scheme at 3-loop accuracy. We summarize the corresponding values in table 1.
The matrix elements we are interested in contain two local operators. Hence, the twocurrent matrix element renormalized in the MS scheme is given by: In other words, the product of renormalized operators J MS i (y) J MS j (0) requires no additional renormalization, because we always consider a finite spacelike distance y between the two currents.

Technical details on Wick contractions
In the following, we discuss the technical details regarding the evaluation of each Wick contraction we have previously defined. A technical sketch of all graphs is shown in figure 4. Each contraction is calculated on a smeared quark point source S Φ, p z = Φ p S z . It is a diagonal spinor-color matrix located at position z, i.e. (S z ) ab αβ (y) = δ zy δ αβ δ ab , where z 4 is the nucleon source timeslice. Notice that here and in the following spinor indices are denoted by Greek letters α, β, . . . , whereas Latin letters a, b, . . . denote color indices of the fundamental representation. More details and explanations on the notation can be found in appendix A.1 Smearing: As already mentioned above, we apply a smearing function Φ p to the corresponding sources and propagators, in order to increase the overlap of the proton interpolators with the proton ground state. Φ p includes a phase injecting a momentum b p to each of the quarks, where p denotes the proton momentum. The method is known as momentum smearing [74], which is based on the Wuppertal smearing technique [75]. Explicitly, the smearing function Φ p reads:  (Φ p 0 )(x|y) = where we set = 0.25 and b = 0.45, in order to obtain a maximal overlap with the ground state. The value of the latter parameter is specific to our setup. The smearing function is applied n times, which is denoted by Φ = Φ n 0 . The gauge links U sm appearing in (3.19) are obtained from the original gauge links by applying spatial APE-smearing [76], which reduces unphysical short-distance fluctuations. M Φ, p z (y) denotes the source-smeared pointto-all quark propagator, which is obtained by solving: where D is the Dirac operator. This propagator is used for the construction of each of the contractions relevant for the four-point function C 4pt .

Stochastic propagators and improvements:
The all-to-all propagators required for the evaluation of most of the four-point graphs are estimated by use of time-local stochastic sources η ( ) t . In this context the spatial unit matrix is approximated in the following way: In the present study we employ Z 2 ⊗ Z 2 wall sources defined for a specific timeslice t, i.e. the entries can take the values The propagated stochastic source ψ ( ) t , which we call "stochastic propagator" in the remainder of this work, is obtained by solving: (3.23) It describes the propagation from any spatial position on timeslice t to any other site on the lattice. The off-diagonal components in (3.21) are pure noise. This noise is particularly large for near-diagonal terms, where the propagator takes large values. Quantities involving these terms can be improved by a method that has also been used in [77], where one exploits ultralocality of the action. The method consists of applying the hopping parameter expansion (HPE), where the Dirac operator is rewritten as D = (1 − H)/(2aκ). Subsequently, the corresponding propagator can be expanded in terms of powers H n using the geometric series. Depending on the situation there exists a maximal order N in the series, up to which the corresponding terms vanish exactly in the stochastic limit or, equivalently, for the exact solution of the propagator. This enables us to rewrite the propagator as: (3.24) The replacement D −1 → H N D −1 removes the first N terms in the expansion. In the context of our calculations this method is used in two different places. The first one is a propagator connecting two sites on the same timeslice, which is needed for the calculation of the C 2 graph. Since the hopping term H connects only nearest neighbors, we set aN ( y ) = |y 1 | + |y 2 | + |y 3 |. Taking into account the periodicity of the lattice, the exact definition of N ( y ) is: In the expression of the graph to be evaluated, we then have to replace ψ If the propagator is contracted with a Dirac matrix, e.g. in loops containing only one current, there is also a certain number of terms in the hopping parameter expansion that cancel. The number of terms that vanish depends on the Dirac matrix, see table 2.
Interpolator kernels: Before we continue to define the expressions to be evaluated, we introduce the compact notation: (3.27) We then define the annihilation operator kernel: Contracting with the quark field operators, this yields the baryon annihilation operator (3.3) itself. In analogy, the baryon creation operator kernel is defined as: In both cases, the index σ corresponds to the open fermion index, which is consistent with the fermionic nature of baryons. P + again denotes the parity projection operator (3.2).
The two point function C 2pt : The proton two-point function involves two Wick contractions arising from permutations of the two u-quarks. In terms of the smeared point-to-all propagator (3.20) evaluated at the source at position z, the total contribution for momentum p is given by: where z denotes the sink position. Together with the phase introduced by the factor E p (z −z), a sum over z at the sink timeslice projects onto the proton momentum p. The two-point function itself is defined as the average over all gauge fields, which is indicated by the .notation: where t is the source-sink separation in the time direction. A momentum projecting sum at the source is not necessary because of translational invariance, i.e. there is no dependence on the source position. In the second expression in (3.31), the omitted sum over z has been compensated for by a factor V . As previously discussed, the two-point function C 2pt is needed to normalize the twocurrent matrix element, see (3.5). Furthermore, the expressionG 2pt is part of the contractions S 2 and D, which will be discussed later.
Graph C 1 : The evaluation procedure of the C 1 graph is shown in figure 5. In total, this contraction involves five propagators, where three of them correspond to the smeared pointto-all propagator (3.20), which we refer to as forward propagator in the following. The two propagators connecting the current insertions and the sink are calculated from sources placed at the sink. Both propagators have to be Hermitian conjugated and multiplied by γ 5 on both sides in order to obtain the desired propagator in the forward direction. For these two propagators, we use two different methods: The first propagator is obtained from an inversion on a stochastic wall source η ( ) t , which is placed at the sink timeslice, see (3.23). This stochastic propagator is denoted by ψ ( ) t . From the smeared stochastic source, i.e. Φ p γ 5 η ( ) t (the γ 5 is needed to reverse the propagator), and the smeared forward propagator Φ p M Φ, p z , both contracted with the baryon annihilation kernel (3.28), we create a sequential source S p,( ) t , where: , which is called q 1 . This is constructed from the quantity Y . The open baryon spinor index σ of the sink is contracted with the baryon spinor index of the source, which is why it does not appear anymore in q 1 . Center: A part of the stochastic quark line b, which is called q 2 . Right: The complete C 1 graph, which is constructed from q 1,i and q 2,j (or in some cases q 1,j and q 2,i ) The exact contraction with the annihilation kernel, i.e. which index is contracted with which part, depends on the baryon type and the quark flavor of the local currents. This is discussed in detail in appendix A.3, where all possible expressions for S ( ) are listed, see (A. 18). An inversion on the momentum smeared sequential source yields a sequential propagator X The sequential source technique has been invented in [78]. The sequential propagator is connected to the second current insertion. In this technical context, the three quark lines between the proton source and sink can be distinguished w.r.t. the evaluation method of the involved propagators. We shall use the following labels: a forward propagator connecting the baryon operators b quark line with the stochastic source, the stochastic propagator, and one current insertion c quark line with the sequential propagator and the other current insertion.
Furthermore, we define: (3.35) The γ 5 in (3.34) and (3.35) again appears from reversing the sequential or stochastic propagator, respectively. The C 1 graph itself is obtained by calculating Here x is the position of the operator O j . In order to increase statistics, we perform a sum over all spatial positions at the insertion timeslice (volume average), exploiting spatial translational invariance. Depending on the quark flavors of the baryon and the insertion operators, there are several terms that have to be summed up to obtain the full C 1 contribution. This is explained in detail in appendix A.4 for the proton case and the operators O u,i O d,j and, O u,i O u,j i.e. the graphs C ij 1,uudd and C ij 1,uuuu .
Loops L 1 and L 2 : We implement two methods to calculate the loop L 1 , which is needed for the evaluation of the S 1 and D graphs. The first method involves the stochastic wall source η ( ) τ at the insertion timeslice τ and the corresponding propagator. Fluctuations introduced by the stochastic noise vectors are reduced by employing the hopping parameter expansion trick, which we have discussed previously. The number of omitted terms N depends on the inserted Dirac structure Γ i , see table 2. With the accordingly improved stochastic propagator ξ ( ),N τ (see (3.26)), we define: Alternatively, we compute the loop for fixed spatial positions using point sources. A disadvantage is that the calculation has to be repeated for each loop position we want to consider. This version is only employed for one of the two loops in the D graph: Furthermore, we define the volume average: The second kind of loop, L 2 , appears in the S 2 graph. It contains the two spatially separated current insertions, which are connected by two propagators. Using stochastic noise vectors is not feasible in this case. Thus, the loop is calculated from point-to-all propagators only: Statistics can be enhanced by averaging over several spatial positions x. For each position the calculation has to be repeated.
Graphs C 2 and S 1 : The C 2 and S 1 graphs are both constructed from a sequential source. For this we use the same source as for usual three-point functions, see (A.20). The corresponding sequential propagator X t is obtained by inverting: where again In the case of the C 2 contraction, the sequential propagator is connected to one current insertion. The other current insertion is contracted with the forward propagator. Both current insertions are connected by a stochastic propagator, which is improved by the HPE trick we have discussed earlier. Explicitly, we find for the C 2 graph: . (3.43) The S 1 graph consists of two disconnected pieces. The first one has the same structure as a usual three-point function calculated from a sequential source: The second part is given by the previously defined loop L 1 . If the quantum numbers permit, there are disconnected contributions from the vacuum expectation values of G 3pt and L 1 . These must be subtracted, in order to obtain the S 1 contribution we wish to calculate: Notice that the global sign corresponds to the permutation sign of the Wick contraction.
Graphs S 2 and D: The graph S 2 consists of a two-point contraction and the loop L 2 , whereas D consists of a two-point contraction and two L 1 loops. As for the S 1 graph, we have to consider vacuum contributions of the disconnected parts, which have to be subtracted. Notice that we defined the loop L 2 at a fixed spatial position. Hence, we are not able to perform a volume average like in the previous cases: We use two methods to evaluate the D graph: The first employs two stochastic loops L 1,st , which allows us to perform a volume average. In the second method, we replace one stochastic loop by a loop attached to a point source L 1,pt . This might reduce the stochastic noise but  Table 3. Details of the CLS ensemble which we use for the evaluation of the two-current matrix elements. Our simulation includes 990 configurations.
precludes the possibility to perform a volume average. For the doubly stochastic case, the D graph reads: Note that we use two different sets of stochastic sources for the two disconnected loops. Equation (3.47) is valid for the first method. For the second method L 1,st has to be replaced by L 1,pt . Furthermore, one has to replace the sum a 3 x by a volume factor V .

Lattice setup
The simulation is performed on the gauge ensemble H102 of the CLS collaboration [79]. It includes n f = 2 + 1 dynamical Sheikholeslami-Wohlert fermions and the tree-level improved Lüscher-Weisz gauge action. The extension is 32 3 × 96 with open boundary conditions in the time direction. The pseudoscalar masses are m π = 355 MeV and m K = 441 MeV, and the lattice spacing is a = 0.0856 fm, which corresponds to the inverse lattice coupling β = 3.4. More information can be found in table 3. From this ensemble we use 990 configurations. For the calculation of the ratio (3.5) we need to know the value of the nucleon energy E p = m 2 + p 2 in the given lattice setup. We obtain the corresponding value from an exponential fit to the two-point function data for each momentum. Moreover, the proton mass is needed in the decompositions (2.25) and (2.27). From our fits, we obtain m = 1.1296(75) GeV.
Our analysis requires a wide range of proton momenta. Explicitly, calculations are performed for the momenta with P = (0, 0, 0), (−1, −1, −1), (−2, −2, −2), (2, 2, −2), (2, −2, 2), (−2, 2, 2). Thus, the largest momentum has the absolute value | p| ≈ 1.57 GeV. To avoid artifacts possibly caused by the open boundary conditions, we place the source at t src = T /2 = 48a. The spatial position is chosen randomly for each configuration. The distance to the sink in time direction is t = t snk − t src = 12a for the case p = 0 and t = 10a otherwise. We evaluate the C 1 graph for all intermediate insertion times 0 < τ < t. A value for C 1 ( y ) is then given by a fit w.r.t. τ to a constant including a certain region around t/2, where excited states are seen to be sufficiently small. The remaining graphs are calculated for τ = t/2, i.e. τ = 6a for p = 0 and τ = 5a for p = 0. The disconnected parts L 2 (τ ) and L 1 (τ )L 1 (τ ) do not depend on the proton momentum. Hence, the corresponding calculations can be combined, which increases statistics. Consequently, the average insertion time for the contractions S 2 and D is slightly different from t/2, which should not be a problem as long as excited state contributions are small.
We perform the calculations for multiple proton sources located at different source positions, which further enhances statistics. The number of proton sources, as well as the number of stochastic noise vectors being used for each contraction is summarized in table 4. The propagators are smeared at the proton source and sink by n = 250 smearing iterations (3.19).

Data quality
In the following we want to consider the matrix elements V 0 V 0 and A 0 A 0 and discuss a number of artifacts. For the remainder of this paper we shall use the following notation for absolute values of 3-vectors: Nevertheless, we denote the usual 4-vector scalar product by y 2 = y µ y µ . Since y 0 = 0, one has | y | 2 = −y 2 . To avoid confusion, the n-th power of y = | y | is denoted by −y 2 n . For details on our notation, see appendix A.1. At the moment, we consider the data for single contractions instead of the complete four-point functions and, moreover, we restrict ourselves   to zero momentum, i.e. p = 0 or, equivalently, P = 0. In our study, we are interested in the dependence on the current distance y. For the C 1 graph we are able to investigate the dependence on the insertion time τ , which is plotted in figure 6 for V 0 V 0 and A 0 A 0 at y = (−3, 4, 3). We observe a reasonable quality of the data and plateaus around t/2. The values for C 1 ( y ) are obtained by a fit to a constant w.r.t. the insertion time τ , where we take into account the timeslices τ ∈ [t/2 − 3a, t/2 + 3a]. The corresponding fit bands are also plotted in figure 6. For all remaining contractions, the insertion time is fixed at τ = t/2 in our simulation, as discussed in the previous section.
For the remainder of this paper, we concentrate on the contributions C 1 , C 2 , S 1 and S 2 . For both versions of L 1 L 1 we have presented in section 3.2, and consequently for the D graph itself, we obtain statistical errors that are much larger than the signals of the remaining graphs. In contrast to our study [62] for the pion, this is already the case before carrying out the vacuum subtraction. As a consequence, we shall not consider contributions of the D graph in subsequent analysis steps.
In order to investigate possible anisotropy effects, we distinguish three sets of data points characterized by the angle θ between the distance vector y and the nearest lattice space diagonal: • cos θ = 1/3: data points placed on one of the lattice axes • cos θ > 0.9: data points in the vicinity of one lattice space diagonal (d) anisotropy, V 0 q V 0 q , S2 Figure 7. Visualization of anisotropies found in the four-point data. The data points are separated w.r.t. to the angle θ between the distance vector y and the next nearest space diagonal (see the text). These plots show the results for the C 1 contribution to V 0 , as well as the S 2 graph for small y (d).
the spatial directions, which is explained in detail in [58]. These are stronger along the lattice axes, since the mirror charges lie closer together in this case. This artifact can be observed in figure 7(a) at distances y > 12a, where the data with cos θ < 0.9 clearly lie above the data for y close to the lattice diagonals. The resulting "saw-tooth" pattern can be seen in each channel in the C 1 data. Another anisotropy effect is caused by the anisotropy of the lattice propagator and is present in all contractions involving at least one propagator directly connecting the two currents, i.e. the graphs C 2 and S 2 . Examples are plotted figure 7(b) and 7(c) for the C 2 graph and in 7(d) for S 2 . In these plots, we see a significantly different behavior of the data close to the lattice space diagonals and the remaining data points.
The lattice propagator anisotropy has been studied in detail, e.g. in [80,81], where it was found that lattice artifacts are most pronounced along the lattice axes, whereas they are moderate close to the lattice diagonals.

Extraction of twist-two functions
According to (2.25), the two-current matrix elements we obtain in our lattice simulation can be decomposed in terms of Lorentz invariant functions. The twist-two components, which are relevant in the DPD context, are parameterized by a certain subset of these invariant functions. We refer to these functions as twist-two functions. Explicitly, the twist-two functions are A qq , A ∆q∆q , A δqq , A qδq , A δqδq , and B δqδq . Since our calculation includes only light-quark operators, we can extract the twist-two functions for qq = uu, ud, dd. For proton DPDs, which we consider in this paper, the latter probes at least one sea quark. The twist-two functions are obtained by solving the overdetermined system of equations given by (2.25). This we do by χ 2 minimization. Before we go into physics interpretation, we discuss possible lattice artifacts seen in the data. If Lorentz invariance were intact, the extracted data points of the invariant functions would be boost-and rotationally invariant, i.e. for a given py they would be independent of the momentum p and the direction of y . In order to check this, the system of equations is solved separately for each graph and for each accessible direction of the distance vector y , i.e. we obtain one data point for each y 2 , py and θ, where θ is the angle between y and the nearest space diagonal on the lattice. We use the same classification of the data points w.r.t. θ as in section 3.4. Figure 8 shows the data obtained for the twist-two functions separated according to this scheme for some selected channels. As in the data of the bare two-current matrix elements, we observe the saw-tooth pattern in the C 1 data for large distances, which originates from mirror charges due to the periodic spatial boundary conditions in our lattice setup. This is plotted in figure 8(a) for P = 0 and in figure 8(b) for P = −(1, 1, 1) and py = 1.6. The data corresponding to distance vectors along one of the lattice diagonals are less affected by mirror charges. In figure 8(d) and 8(c) we again observe the anisotropy of the lattice propagators in the data of the C 2 graph. As discussed in the previous section, the propagator is less affected by this for distance vectors close to one lattice diagonal.
Beside the patterns already discussed, we find an anisotropic behavior of the twist-two function B δuδd for P = 0, which can be seen for all regions in y. The data points along a lattice axis have a significantly larger value than those corresponding to distance vectors in the vicinity of a space diagonal. This is shown in figure 8(e) and figure 8(f), where we compare these data with those for P = −(1, 1, 1). The data for non-zero momentum are consistent with the data for zero momentum if again y is close to a space diagonal. Therefore, we regard those data points as more reliable.
Based on this discussion, we will keep only data corresponding to distances y that satisfy (f) anisotropy, B δuδu , C1, py = 0 Figure 8. Visualization of anisotropies in the data of twist-two functions. We separate the data points w.r.t. the angle between y and the nearest diagonal in the same manner as in figure 7. This figure shows the corresponding results for the C 1 contributions to A ud (a) and A uu (b), as well as the C 2 contribution to A uu for small y (c) and large y (d). In panel (b) we plot the data for non-zero momentum and py = 1.6, whereas in the remaining plots P = 0. In panels (e) and (f) we show the data for B δuδd and B δuδd , respectively, where we distinguish only between cos θ < 0.9 and cos θ > 0.9. This is compared to the data for P = √ 3 at py = 0.
cos θ > 0.9 , when discussing physical results. As a further check of the reliability of our data, we compare the twist-two functions obtained for different proton momenta at py = 0. Because of Lorentz invariance, these should yield the same result within statistical errors. In figure 9 we compare the twist-two data obtained for the momenta P = 0, P = √ 3 and P = 2 √ 3. For each value of P , y 2 and py the data are extracted separately, taking into account all distances y satisfying (4.1) and all contributing momenta P .
In the case of C 1 , see e.g. figure 9(a), we observe consistency with Lorentz symmetry. In some cases small deviations are visible, as for A δdu shown in figure 9(b). In this case, the difference occurs between the data for P = 0 and P = 0. Notice that we used different source-sink separations for these two cases, hence, the discrepancy might be caused by excited state contributions.
At large distances y, Lorentz symmetry is also intact for the C 2 graph, as can be seen in figure 9 (c)-(e). However, once we go to smaller y, Lorentz invariance is clearly broken. The most extreme example for this is given by A δuu , which is plotted in panel (d). Deviations start to show up for y < 5a and become large for y < 4a.
The situation is even worse for the S 2 graph at y < 7a, where in the most extreme cases the data for P = 0 and P = 0 show different signs. As an example we show the corresponding data of A qq in figure 10(a). For larger y, consistency with Lorentz invariance can be observed in all channels; an example is given in figure 10(b). (e) Lorentz invariance, A dd , C2, py = 0 Figure 9. Comparison of the twist-two data for P = 0 (red), P = √ 3 (green) and P = 2 √ 3 (blue). This is shown for the C 1 contributions to A ud (a) and A δdu (b), the C 2 contributions to A uu (c), A δuu (d) and A dd (e). In the latter case we leave out the data for P = √ 3 for clarity, since they have large statistical errors.

Physical results for py = 0
In the following, we consider the data of twist-two functions extracted for each single graph. For the moment we restrict ourselves to P = 0. Again we take into account only the data points fulfilling (4.1) and solve the system of equations (2.25) for each value of y 2 and py = 0, i.e. data points for equal y = | y | are combined. In figure 11(a) and (b) we show the results for A qq and A δqq , where we compare the contributions of C 1 , C 2 and S 2 for a specific flavor combination. Panels (c) and (d) show the same comparison for C 1 and S 1 . It is observed that the most dominant contributions are those of the two connected graphs C 1 and C 2 . The C 2 data strongly increase towards small y, whereas C 1 is relatively large at all distances and shows a slow decay with increasing y. S 2 is smaller by orders of magnitude than the other contractions for y > 6a but very steeply increasing towards small y. Remember that in this region the S 2 graph strongly violates Lorentz invariance, as we have seen in the previous section. The S 1 contribution has rather large errors and is consistent with zero in all regions of y. For A qq we see a significant offset in the S 1 contribution. This offset is very small compared to the size of the connected contractions, except for very large distances, where the size of the offset and the decreasing signals of the connected contractions become comparable.
In the following discussion, we take into account only the C 1 and C 2 contributions, since all the other contractions are small compared to the connected graphs or, in the S 2 case, are not reliable due to violation of Lorentz symmetry. For our final result for the twisttwo functions, we add up all considered contractions according to (3.14), before solving the system of equations (2.25). Furthermore, we include the data for all considered momenta, see section 3.3.
Let us first look at the flavor dependence of the twist-two functions at py = 0. Since the spin-orbit correlations A δqq or B δqδq are multiplied by terms proportional to m y or m 2 |y 2 | (d) graph comparison, A δqq , P = 0, C1, S1 Figure 11. Comparison between the contributions of each Wick contraction to the twist-two functions for P = 0 and A qq (left) and A δqq (right). Panels (a,b) show the data for C 1 , C 2 and S 2 , whereas in (c,d) we compare the data for C 1 and S 1 .
in the decomposition (2.25), we always consider m y A δqq and m 2 |y 2 |B δqδq in the following discussion. The same applies to the corresponding DPD Mellin moments, see (2.27). In figure 12 we show the results for the twist-two functions A qq (a) and A δqq (b) for the different flavor combinations. Notice that for A δqq we have the four combinations uu, ud, du, and dd, whereas in all other cases the functions for ud and du are equal by permutation symmetry between the two partons. At large distances we have comparably large signals for uu, ud and du, while the ones for dd are much smaller. This changes for smaller y, where both uu and dd strongly increase. The size of dd becomes comparable to that of ud and du around y = 4a = 0.342 fm.
A very interesting aspect is the dependence on the quark polarization. We compare the corresponding channels in figure 13 for ud (a), uu (b) and dd (c). In all cases A qq is observed to be the channel with the largest signal. Polarization effects are significant in the case of ud, especially A δud and A δdu are very large. The signal in the remaining channels is smaller but clearly different from zero. In the case of uu and dd, polarization effects are suppressed. The largest polarized contribution is again A δqq in both cases.

Parameterization of the y 2 dependence
Further analysis steps require a parameterization of the results obtained for the twist-two functions. In the following, we adapt the approach we developed in [67]. For the description of the y 2 -dependence at py = 0 a sum of two exponentials is found to be suitable in most cases. For A ud and A δuδd it appears that this ansatz has to be slightly modified. As a general ansatz we write: where the fits are preformed for fixed δ. In the cases of A ud and A δuδd it turns out that δ = 1.2 is a suitable choice. In all other channels, a pure double exponential, i.e. δ = 0, is sufficient. For most of the fits we take into account each point in the region 4a ≤ y ≤ 16a. Thus, we ensure that the data points entering the fit are only mildly affected by the lattice artifacts that result in anisotropy effects or the breaking of boost invariance. For stability reasons the fit range is slightly modified in some channels. In all cases where the fit range is adjusted, we carefully checked that the data points within the modified fit range do not include such artifacts. An overview is given in table 5, where also the corresponding fixed value of δ is shown. In order to achieve that the parameters A i describe the relative weight of the two    (d) double-exponential fit for A δuu (py = 0, y 2 ) Figure 14. Data points for the twist-two functions compared to the corresponding curve resulting from a fit to the form (4.2). Each plot has a logarithmic scale for the vertical axis. exponentials at the lower fit boundary, we introduce a shift y 0 = 4a = 0.342 fm in the exponent. In the fits we neglect correlations between the data points.
The data points of the twist-two functions at py = 0 are plotted together with the curve resulting from the fit in figure 14. We take a logarithmic scale on the vertical axis to emphasize the double-exponential shape. As can be observed in the plots, the fitted curves describe the twist-two data reasonably well. The values obtained for the fit parameters A i and η i are listed in table 6, as well as the values of χ 2 per degree of freedom. The corresponding errors are computed using the Jackknife procedure.  Table 6. Results of the fit (4.2) to the twist-two functions at py = 0. The corresponding χ 2 /dof is listed in the rightmost column.

Parameterization of the py dependence
A parameterization of the twist-two functions is in particular mandatory for the evaluation of the py-integral in (2.28). The reason is that one has to extrapolate in py, since the accessible range is restricted by the largest proton momentum: In order to make an ansatz for the py-dependence, we consider the constraints on the ζdependence of the skewed DPDs. These are the symmetry relation (2.16) and the constraints (2.12) restricting the support region in ζ. Furthermore, we assume that the Mellin moment I(ζ, y 2 ) can be Taylor expanded around ζ = 0. Combining everything, we make the ansatz that the Mellin moment I(ζ, y 2 ) can be approximated by an even polynomial in ζ within the region |ζ| ≤ 1: This implies for the twist-two functions, which are related to the Mellin moments by a Fourier transform: A(py, y 2 ) = N n=0 a n (y 2 ) h n (py) , where the functions h n are defined as: It is easy to check that the functions h n (x) fulfill the following relations: We recall that A(py = 0, y 2 ) is already completely described by the double exponential ansatz in (4.2). Therefore, in the analysis of the py dependence, we consider the normalized twist-two function A(py, y 2 ) := A(py, y 2 ) A(0, y 2 ) = N n=0â n (y 2 ) h n (py) , (4.9) with the normalized coefficientsâ n (y 2 ) = a n (y 2 ) A(0, y 2 ) . (4.10) A useful quantity to investigate in the context of the py-analysis is the 2m-th moment in ζ of the DPD Mellin moment, which can be written as: If we insert our ansatz (4.5) combined with (4.9) and replace the 2m-th derivative of A according to (4.8), we find that ζ 2m can be expressed as: where we defined the (N + 1) × (N + 1)-matrix T T mn = (1 + 2n + 2m) −1 .
(4.13) Equation (4.12) can be inverted, so that we are able to express the coefficientsâ n in terms of the ζ-moments:â and hence A(py, y 2 ) = N n,m=0 One has ζ 0 (y 2 ) ≡ 1 by definition. Thus, the first non-trivial term in (4.15) is the one with m = 1. For each value of y 2 we can perform a fit with the functional form (4.15) with N fit parameters. These kind of fits are referred to as "local" fits in the following. Furthermore, we parameterize the moments of ζ in terms of powers of the distance y = −y 2 , i.e. we write such that we obtain a global parameterization describing both the y 2 and py-dependence: A(py, y 2 ) = N n,m=0 K k=0 T −1 nm c mk −y 2 k h n (py) . Since c 0k = δ 0k by definition, there are N (K + 1) parameters to be determined in a "global" fit to the parameterization (4.17).  Figure 15. py dependence of the twist-two function data and the corresponding local fits with N = 2, 3. This is shown for the functions A ud at y fit = 10a (a) and y fit = 12a (b), as well as for A uu (c) and A δdu (d) both at y fit = 10a. We plot all data points included by the fits for a given y fit , i.e. all data points in the range y fit ± 0.5a (see the text).
Local py-fits: The results obtained for the y 2 -fit are used to calculate the normalized function A(py, y 2 ), which is then fitted to the functional form (4.15) for certain values of y 2 . We perform two sets of fits using N = 2 or N = 3, i.e. there are two or three free fit parameters, respectively. The free fit parameters are the moments in ζ, i.e. ζ 2m with m = 1, . . . , N . For each accessible value of y 2 , there is a number of available data points that can be used to fit the py-dependence. This number strongly varies with y 2 . In order to avoid fluctuations caused by this circumstance we do not only consider the data points with y = y fit , but take into account all data points in a band y fit − 0.5a ≤ y ≤ y fit + 0.5a. The fit is carried out for y fit = νa, where ν ∈ [4,16] is an integer.
In figure 15 we show for selected channels the data points of A(py, y 2 ) entering the fit for a given y 2 in comparison to the resulting fit bands for N = 2 and N = 3. We observe   that the A data are reasonably described and the two fits are consistent within the statistical error. For N = 3 the fit tends to be sensitive to the data points at large py, which causes visible deviations relative to the fit with N = 2.
There are channels where the data of A are compatible with zero, which leads to a dominance of fluctuations. In these cases a reliable fit of the py-dependence is not feasible. We refer to these channels as the "bad" channels. Explicitly, they are given by the functions A ∆q∆q and B δuδu , as well as all polarized channels for the flavor combination dd. These channels will not be considered in the subsequent physics discussions.
The resulting values of ζ 2m are plotted in figures 16 and 17 (red data points). It appears that the moments are rather small ( ζ 2m < 0.25) and in almost all cases these show a linear dependence on the distance y. In most cases they are nearly constant. Deviations from     that behavior are seen for uu at small y, where the data tend to increase. However, this is the region where the violation of Lorentz invariance starts to show up in the corresponding channels, as we have discussed earlier. This might skew the py-dependence. The results that are not shown in the plots look very similar. An exception to this are the data for A δuδu , which carry large statistical errors.
The results for the ζ-moments are quite different from those we obtained for the pion [67], where we found a clear linear rise with increasing y. In that case, for y > 1 fm, values of ζ 2 > 0.5 were observed.
Global py-fits: In order to reduce the number of parameters entering our analysis, we perform a global fit on the A data using the functional form given in (4.17). This is again carried out for N = 2, 3. We have seen in the previous discussion that a linear dependence on y is sufficient to describe the ζ 2m behavior. Therefore, we take K = 0, 1 for the global fits. For N = 3 we restrict ourselves to K = 0, i.e. a constant, since for K = 1 we find that the data are overfitted. In total we have three fits, where we use (N, K) = (2, 0), (2, 1), (3, 0) with 2, 4 or 3 free fit parameters, respectively. In each fit we take into account all data points for which 4a < y < 16a. The resulting curves for K = 0 are plotted in figure 18, where again we show the py-dependence for fixed values of y 2 . As for the local fits, the two possibilities N = 2 and N = 3 yield comparable results; small deviations are found for large py.
In general, the value of χ 2 /dof differs only weakly between different fits of the same channel. In most channels, the differences are marginal ( 0.01). Hence, we consider the fit with (N, M ) = (2, 0) as reliable; the other two fits might already overfit the data. Exceptions are given by B δuδd (see the discussion below), and A δdu , A δud , where discrepancies up to 0.11 in χ 2 /dof are found. This can also be observed in the slightly different behavior of the fit bands for large py, see figure 18(d). In the last two cases, fits with N = 3 yield the smallest value for χ 2 /dof.
The ζ 2m curves resulting from the global fits are also shown in figures 16 and 17 (blue and light blue bands). The results for the fit parameters c mk are listed in table 7 to 11, where for completeness also the results of the "bad" channels (see the discussion above) are shown. In most cases, the linear fit barely differs from the fit to a constant. For a few exceptions, there is a better overlap with the data if the linear term is included. The most extreme example is given by I t δuδd , which is shown in figure 17(e) and 17(f). The corresponding χ 2 , see table 11, is slightly smaller. However, the linear fit must be considered with some caution, since there is a wide region in y where the moments ζ 2m become negative. For even moments this is mathematically inconsistent. The constant fit still covers the data points sufficiently well.  Table 7. Fit results for the parameters c mk of our global fit ansatz (4.17) obtained for the unpolarized channels A uu , A ud and A dd . We take into account (N, K) = (2, 0), (2, 1), (3,0).

Results for Mellin moments
From the fits described in the previous section, we are able to reconstruct the Mellin moments I(ζ, y 2 ). Combining (4.17), (4.2), (4.9) and executing the Fourier transform (2.28) we arrive at: In the following we discuss the corresponding results and physics implications. We take into account every channel except for those we characterized as "bad" channels in section 4.4.
Fit dependence: Figure 19 shows the results for the Mellin moments I(ζ = 0, y 2 ) for selected channels. We compare the bands obtained from the three different fits in order to estimate the systematic error introduced by the extrapolation in py. In each channel we observe consistency between the different fits, i.e. the three curves coincide within the error bands. The situation is the same for the channels which are not shown in the plots. Notice that also the bands for I t δuδd match within the statistical error, despite the fact that a linear dependence of the moments ζ 2m on y seemed to give a better description.
The agreement of the results for different fits also holds for ζ 0.6 in most of the channels that we have not excluded. As an example we show the results for I ud , I t δuδd and I uu in figure 20 (a-c). An exception is found for I δdu plotted in figure 20 (d), where clear deviations between the fits with N = 3 and N = 2 are found for ζ > 0.2. Notice that in this channel we found the largest variations between the values of χ 2 /dof of the different fits. At this point, we emphasize again that the ansatz (4.4) for the functional form of the DPD Mellin moments represents an expansion around ζ = 0. Consequently, the more terms of this expansion are taken into account, the more sensitive the results for large ζ become to fluctuations of the corresponding coefficients. Since the fit for (N, K) = (2, 0) yields already a consistent description of the data, we will base our physics discussion on the corresponding results.
Flavor comparison: We compare the results for the DPD Mellin moments w.r.t. the quark flavor in figure 21, using a logarithmic scale on the vertical axes. The results for I δqq are multiplied by m y, which follows from the decomposition (2.27). Like for the twist-two functions, we observe that in the case of two unpolarized quarks (see panel (a)) the dd signal is much smaller than that of ud and uu for large distances. At small y, the Mellin moments for uu and dd show a steeper slope than I ud . The same behavior is observed for I δqq in panel (b), where we compare only uu, ud and du, since we have classified dd as a "bad" channel.
A very interesting result is the different behavior of the Mellin moments I ud and I uu . In factorization assumptions as they are made in the pocket formula (see section 2.1) it is required that the dependence of DPDs on the transverse quark distance is independent of the  quark flavor, see (2.9). Our results clearly exclude this.
Polarization effects: In figure 22 we show the dependence of the Mellin moments on the quark polarization for ud (c) and uu (a). Again we only show the results for N = 2 and K = 0. As in the discussion of the twist-2 functions, we multiply the DPD Mellin moments I δqq or I t δqδq by m y or m 2 |y 2 |, respectively, which follows from the decomposition (2.27). The polarization dependence of the Mellin moments is very similar to that of the twist-two functions, which we already gave in figure 13. These are again shown in panel (d) and (b). We see that the unpolarized channels are clearly dominant for both flavor combinations. However, in the case of ud, there are visible polarization effects. They are especially large for I δud and I δdu , whereas Mellin moments I δuδd and I t δuδd are smaller but still significantly different from zero. At this point, we want to compare with the situation for ud in the π + , which was calculated in [67]. The corresponding results are also plotted in figure 22. Remarkably, the behavior of the Mellin moments (e), as well as the twist-two functions (f), for ud in a π + is comparable to the one for ud in a nucleon.
In the case of uu in the proton, polarization effects appear to be less important. Notice that in the corresponding plots we only show the results for I δuu and I δuδu , since the remaining functions belong to "bad" channels, as we have discussed before. The largest polarized Mellin moment is again I δuu . I δuδu is clearly non-zero for small distances, but the corresponding statistical error is quite large (> 50%). The sign of I δuδu indicates that the quark spins are more aligned than anti-aligned, which agrees with expectations from SU (6) symmetric valence quark wave functions [13]. However, the ratios I ∆u∆d /I ud = −2/3 or I ∆u∆u /I uu = +1/3 predicted by this model are clearly not observed in our results. The same conclusion can be drawn from the corresponding data of the twist-two functions.

The number sum rule
We consider the DPD number sum rule, which we have already stated in (2.6) in position space. We look at the flavor combination ud. The remaining two flavor combinations uu and dd cannot be investigated, since the corresponding expressions include sea quark contributions that would lead to diverging integrals over x 1 . In the considered case of one u and one d quark, splitting contributions are at least of second order in α s . Inserting the sum rule for ordinary PDFs in (2.6) we can write:  and ud (c,d). The left panels show the results for the Mellin moments obtained from the fit with (N, K) = (2, 0). In the right panels we again show the data for the corresponding twist-two functions, which was already plotted in figure 13. Panels (e) and (f) show the results for ud in the π + , which were calculated in [67].
The verification that this equations holds for the results we presented in the previous sections can be seen as a consistency check of our lattice calculations and our fitting ansatz. We evaluate the expression on the l.h.s. of (4.20) by inserting the parameters obtained from the y 2 fit and each of the three global py fits.   Each of the obtained results is very close to the value predicted by the sum rule with a largest absolute deviation of the mean of 0.07. The statistical error varies between 12% and 25%, i.e. it is larger than the systematic error which is introduced by the extrapolation in py. Evaluating the integral (4.20) implicitly includes an extrapolation for y > 16a. In order to estimate the corresponding systematic error, we decrease the upper integration boundary of the y integral to 16a = 1.37 fm. We obtain values which are at most 16% smaller. Thus, the systematic error from the extrapolation in y is at most of the size of the statistical error. Notice that there is no extrapolation to the lower boundary b 0 /µ ≈ 1.29a, since the lower boundary of the fit range is 1a in the unpolarized ud case.

Factorization Tests
A crucial aspect to be studied in the context of DPDs is the strength of parton-parton correlations. These are neglected in factorization assumptions like (2.8). In the following we want to check to what extent this factorization ansatz is valid.

Derivation
Equation (2.8) can be derived by inserting a complete set of states in the two-current matrix element appearing in (2.1) or (2.11) and then assuming that the intermediate nucleon states dominate, i.e. omitting all remaining contributions: Figure 23. Illustration of the approximation of a DPD in terms of GPD matrix elements f λλ for the flavor combination ud. Panel (a) shows the factorization ansatz to be used for the case ζ > 0, whereas (b) depicts the variant that we employ if ζ < 0.
By writing ? =, we emphasize that (5.1) is an assumption; its validity is investigated in this section. For the remaining derivation steps, we substitute the intermediate momentum p by: Furthermore, we set p = 0 and identify: This enables us to write a factorized expression of the skewed DPD defined in (2.11) in terms of GPD matrix elements f λ λ (x, ξ, p , p): F a 1 a 2 (x 1 , x 2 , ζ, y) where ξ = (p − p ) + /(p + p ) + . This factorization is shown pictorially in figure 23(a) for the flavor combination ud. In the following we concentrate on the case of two unpolarized quarks or two longitudinally polarized quarks. In these cases, the GPD matrix elements can be decomposed in terms of the GPDs H and E orH andẼ, respectively. For details we refer to equation (14) in [82]. The polarization sum in (5.4) can be replaced by: with t = t(ζ, r 2 ) from (5.3). Notice that for ξ = 0 the cross terms between H and E in (5.6), as well as the last three terms in (5.7) vanish. This is the case if the skewness parameter ζ is zero. For that case, the expressions in (5.6) and (5.7) have already been derived in [13], see equations (4.48) and (4.49) therein. Before we continue, we have to discuss an issue regarding the support region w.r.t. x i and ζ, which is different on the two sides of (5.4). On the r.h.s. the support region is constrained by −1 + ζ/2 < x i < 1 − ζ/2, whereas on the l.h.s. it is given by (2.12). Except for the case where ζ = 1, the two regions are distinct. Their mismatch is even more pronounced if ζ < 0. For this reason, we derive an alternative factorization formula by commuting the two operators in the two-current matrix element. Following the same steps as in the derivation of (5.4), we obtain: F a 1 a 2 (x 1 , x 2 , ζ, y) The corresponding support regions show the same relative behavior as for (5.4) and ζ > 0. Hence, we shall use (5.4) for ζ > 0 and (5.8) if ζ < 0 for the following calculations. A graphical representation of (5.8) can be found in figure 23(b). Taking the first Mellin moments on both sides in (5.4), we find I a 1 a 2 (ζ, −y 2 ) and an analogous expression for (5.8). The integrals over x i of the corresponding GPD matrix elements can be expressed in terms of the Pauli and Dirac form factors F 1 and F 2 (for f q ) or the axial and pseudoscalar 2 form factors g A and g P (for f ∆q ), which are the lowest Mellin moments of the GPDs H and E orH andẼ, respectively. Since the GPDs are invariant under rotations in the transverse plane, we can evaluate the angular part of the r-integral.
Considering I qq or I ∆q∆q and inserting ζ = 0 we can write: with the Bessel function J 0 . The validity of the equations (5.10) and (5.11) is one subject to be investigated in this section. Another relation can be derived by using (2.29) and performing the angular part of the r-integral in (5.9). This yields: A a 1 a 2 (py = 0, −y 2 ) Considering A qq and A ∆q∆q and replacing the integrals over x i of the GPD matrix elements by F 1 , F F 2 , g A , or g P , we arrive at: A qq (py = 0, −y 2 ) A ∆q∆q (py = 0, −y 2 ) where t is a function of ζ and r 2 as defined in (5.3), and In our lattice study we obtained data for the l.h.s. of (5.13), (5.14), (5.10), and (5.11). In the remainder of this section we investigate differences relative to the corresponding factorized expressions given on the r.h.s.. These can be calculated form the nucleon form factors, which can be evaluated in lattice studies.

The nucleon form factor
As already mentioned, the nucleon form factors as functions of the virtuality t can be obtained from lattice calculations. In this study we use the form factor data [83] which has been generated in the simulation described in [84]. In that work various gauge ensembles have been investigated; we take the form factor data for gauge ensemble H102, which is the same ensemble that is used in our DPD study. The form factor analysis carefully takes account of excited state contributions. The absolute value of the largest initial proton momentum that has been used is | p| = √ 6 · 2π/(La) ≈ 1.11 GeV. Notice that the final momentum is set to p = 0. In this setup, the largest available virtuality is t = −∆ 2 ≈ 1.02 GeV 2 .
In order to evaluate the integrals (5.13), (5.14), (5.10), and (5.11), we need to extrapolate the lattice results in t. To this end, we fit the form factor data to a power law of the form which is frequently used for parameterization of form factors. For each channel we perform two different fits with fixed values for the exponent, n = 2 and n = 3, whereas F (0) and M enter the fit as free fit parameters. The fits are performed employing the complete covariance matrix, i.e. taking into account correlations between the data points. The resulting curves are shown together with the form factor data in figure 24 for n = 3. The corresponding values of the fit parameters and of the χ 2 /dof are summarized in table 13 (vector current) and table 14 (axial current), respectively. In order to analyze the quality of the fit, we plot for each fit the ratio of the data and the fit value. This is shown in figure 25. From most of the fits we obtain a sufficiently good description of the form factor data. The only exception is found for F d 1 and n = 2, where we observe a relatively large discrepancy between the data and the resulting curve, see figure 25(a). Consequently, the corresponding χ 2 /dof has the very large value of 7.15. Hence, we perform an alternative fit using n = 4, which again yields a reasonable result. For the remainder of this section we discard the fit for F d 1 with n = 2 and instead use the fit for n = 4 in this channel.

Results
Before comparing the two sides of the factorization formulae (5.13), (5.14), (5.10), and (5.11), let us investigate the different terms on their r.h.s.. In figure 26 we compare the size of the integrals over these terms. Notice that the shown results are based on the form factor fits with the smallest χ 2 . In the unpolarized channels the F 1 F 1 -term is found to be dominant, whereas the remaining contributions are very small. As an example we show A ud (a) and A dd    (c), as well as I ud (d). In the longitudinally polarized case, the g A g A -term is also the most relevant one, but the relative size of the other contributions is larger than in the unpolarized cases. This can be observed e.g. in the result for A ∆u∆d , which is plotted in figure 26(b). A similar behavior is found in the other channels that are not shown in the plots. In the following, we consider the complete results of the r.h.s. of (5.13), (5.14), (5.10), and (5.11) obtained from the corresponding integrals over the form factors and compare them to the l.h.s.. The observed difference can be interpreted as a measure of the strength of the quark-quark correlations. If the values of the involved data points are large enough compared to the statistical error, we also compute the ratio of both sides, in order to better see similarities and differences. We start with (5.13), where the two sides, as well as the ratio of both sides is shown in figure 27 for A ud and A uu . The result for A dd (without the ratio, since the signal is not sufficiently clean) is plotted in figure 28(a) Figure 26. Comparison between the different terms contributing to the factorized expressions for the twist-two functions A ud , A ∆u∆d , A uu , and for the Mellin moment I ud at zero skewness ζ. In the keys we use the short notation K 34 := K 3 (ζ) + K 4 (ζ) r 2 /(4m 2 ) and K 35 := K 3 (ζ) + K 5 (ζ) r 2 /(4m 2 ). the form factor result correctly reproduces the size of the two-current data. Deviations are observed to be very small. From the ratio, we can read off the relative deviation, which is at most ∼ 20% for ud. For uu, deviations are seen to be typically around ∼ 20%. Notice that the F 2 F 2 -term and the mixed term play only a minor role in the integral formula, i.e. the F 1 F 1 -term (blue curve) is almost equal to the complete result.
The size of the two results also matches in the longitudinally polarized channels, as can be seen in figure 28 for A ∆u∆d (b), A ∆d∆d (c), and A ∆u∆u (d). A remarkable observation is the nearly perfect agreement within statistical errors in the case of A ∆u∆d . Notice that the twocurrent signal of A ∆d∆d is consistent with zero. Hence, the agreement of the corresponding curves and data points should be interpreted with some caution. In contrast to the unpolarized case, taking the complete integral instead of only the g A g A -term is crucial. Evaluating the integral over the g A g A -term only (the corresponding result is again shown by the blue curve) (f) Ratio FuFd/A ud for π + , py = 0 Figure 27. Left: Comparison of the twist-two functions A uu (a) and A ud (c) (green points) and the factorization results obtained by the integral (5.13). The red curve is obtained from the form factor fits with best χ 2 /dof. The orange band represents the envelope of the error bands of the different fits. Right: Ratio of the form factor integral and the corresponding twist-two functions, again shown for A uu (b) and A ud (d). In the panels (a) and (c) we also present the integration result taking into account only the first term (5.13) (blue curve). In panel (e) and (f), we show the corresponding results for the π + obtained in [67] for two different fits. (d) A∆u∆u vs gugu, py = 0 Figure 28. The twist-two functions A dd (a), A ∆u∆d (b), A ∆d∆d (c) and A ∆u∆u (d) compared to the corresponding form factor integral (5.13) or (5.14). The orange band again represents the envelope of the error bands for the different fits. The blue curve shows again the integration result of the first term in (5.13) or (5.14). yields a significant difference between the two sides of (5.14). In figure 27(e) and 27(f) we show again the factorization results for A ud for the π + , which has been investigated in [67]. The results obtained there are comparable with those of A ud in the nucleon that we have described above.
Finally, we want to consider the factorization for the Mellin moments I qq at ζ = 0 according to (5.10). We shall not discuss (5.11), since we do not have results of sufficient quality for I ∆q∆q , as we have concluded in section 4.4. Figure 29 shows the two sides of (5.10) (a), as well as the ratio (b) for quark flavor ud, while the analogous results for uu and dd (the latter again without the ratio) are shown in (c), (d) and (e). The integral again yields a consistent order of magnitude. However, the deviations of the two curves are found to be larger than for the factorization ansatz of the twist-two functions. The relative deviations are (f) I ud vs FuFd for π + , ζ = 0 Figure 29. Mellin moment I ud at ζ = 0 compared to its factorized result obtained from the corresponding integral (5.10) (a) and the ratio of the integral and the Mellin moment (b). The same is plotted for I uu (c,d) and I dd (e). For the latter the ratio is not shown. The orange curve shows the envelope of the error bands for every fit. The result of a integral where only the first term in (5.10) is taken into account is represented by the blue curve. Panel (f) shows the factorization result for I ud in the π + obtained in [67] for the two form factor fits considered in that work.
at most ∼ 40% for ud and ∼ 60% for uu. Again we compare with the situation for the π + , which is shown in figure 29(f). Especially for small distances y, the factorization result of I ud is closer to the two-current result for the Mellin moment in the pion case than it is observed for ud in the proton. Notice that regions where the integral gives a higher value than the two-current data, or vice versa, are consistently the same for the twist-two functions and the Mellin moments. For ud we observe the integral to be larger for y < 8a, while it is smaller if y > 8a. This means that in a joint observation of an u and a d quark, we find the two quarks farther apart than we would if they were uncorrelated. This is similar to ud in the π + described in [67]. For two quarks of the same flavor, the integration results are generally larger than the two-current data. An exception is given by the region y < 5a, where at least the twist-two function results indicate a sign change in the absolute difference.

Conclusions
This paper presents the first lattice calculation that provides information about double parton distributions in the proton. The distributions in the neutron are readily obtained from isospin symmetry. Our simulations are done on a 32 3 × 96 lattice with spacing a ≈ 0.086 fm and a pion mass of m π = 355 MeV. We compute the correlation functions (2.18) of two spatially separated currents in the proton and project out their twist-two parts. Our primary observables are the invariant functions A and B associated with that projection, see (2.25). They depend on the distance y µ between the two currents and on proton four-momentum p µ via the scalar products y 2 and py. We consider the vector, axial, and tensor current, whose twist-two components respectively correspond to unpolarized, longitudinally polarized, and transversely polarized quarks.
Lattice aspects. We evaluate all Wick contractions that contribute to the two-current correlation functions, making heavy use of stochastic sources, sequential sources, and the hopping parameter expansion. The statistical signal we obtain is in general very good for the connected graphs C 1 and C 2 and the disconnected graph S 2 , and fair for the disconnected graph S 1 (see figure 3). Only for the doubly disconnected graph D are the errors so large that we must exclude it from our analysis. Lattice artifacts manifest themselves in the invariant functions as a breaking of rotation invariance (i.e. a dependence on direction of y ) and a breaking of boost invariance (at given y = | y | and py the functions must be independent of p µ ). We find a significant amount of anisotropy in the C 1 data at large y and in the C 2 and S 2 data at small y. These can be interpreted as a finite size effect in the first case and as due to the anisotropy of the lattice propagator in the second case. We can largely remove these effects by selecting points y close to the lattice diagonals and by imposing a lower cutoff on y, which depending on the polarization channel is taken of order 4a ≈ 0.34 fm. After this selection, the violation of boost invariance is at an acceptable level, except for graph S 2 , where a momentum dependence is seen up to about y ∼ 7a. For larger y, the contribution of S 2 to physical matrix elements is small compared with the one from C 1 and C 2 . The contribution of S 1 is found to be small at the scale of C 1 and C 2 , except for larger y, where the errors on S 1 prevent us from drawing strong conclusions. For our final physics analysis, we restrict ourselves to the contributions of the connected graphs C 1 and C 2 , where C 2 is absent for the parton combination ud and C 1 is absent for dd.
Results. In a first stage, we analyze the invariant twist-two functions A and B at py = 0, where the statistical signal is best and the data can be plotted as a function of the single variable y. To connect these functions with DPDs, we slightly deform them by a skewness in the parton momentum fractions that is parameterized by ζ (see figure 1). Twist-two functions at py = 0 are then equal to the Mellin moments of skewed DPDs integrated over ζ. The size of these functions is seen to be largest for A qq and A δq q , with the former corresponding to unpolarized partons and the latter to the correlation between the transverse polarization of one parton and the parton separation. Our results exhibit a clear flavor dependence, with A ud and A δud ≈ A δdu decreasing more slowly with y than their counterparts for two u or two d quarks (see figure 12). For unpolarized quarks, this finding is of particular importance, because one of the assumptions made for deriving the pocket formula (2.10) for DPS cross sections is a universal y dependence of DPDs for all flavor combinations. Interestingly, A uu and A dd have a rather similar y dependence, although the former receives a contribution from C 1 but the latter does not.
The signal for spin dependent functions other than A δq q is best for the ud combination, whereas for qq = uu and dd it is mostly consistent with zero (see figure 13). In the ud channel, the invariant functions for two polarized quarks are significantly smaller than A δud . We see a clear difference between the spin-spin correlations A ∆u∆d and A δuδd for longitudinal and transverse polarization, which shows the inadequacy of simple non-relativistic pictures that predict them to be equal. Moreover, we find that the longitudinal polarization ratios A ∆u∆d /A ud and A ∆u∆u /A uu are significantly smaller in size than the values −2/3 and +1/3 obtained with a static SU(6) invariant wave function for the three valence quarks in the proton [13]. Interestingly, the pattern of polarization dependence for ud in the proton is quite similar to the one we found for ud in a π + in our previous work [67].
In the second stage of our analysis, we assume a parametric form for the y and py dependence of the twist-two functions (see (4.2) and (4.17)). We use this to fit our data and to extrapolate it to the full range of py, which is needed to compute the Mellin moments I qq , I δqq , . . . of DPDs at given skewness ζ. For flavor and polarization combinations with sufficiently small statistical errors, the results of fits with different numbers of parameters are consistent with each other for small to moderate ζ (see figures 19 and 20). This gives us confidence in analyzing the corresponding Mellin moments at ζ = 0 and thus to make closer contact with the physics of double parton scattering.
The flavor and polarization dependence of Mellin moments at ζ = 0 is very similar to the one of the associated twist-two functions at py = 0, which corroborates the physics conclusions discussed above (see figures 21 and 22). From the moment I ud , we can also evaluate the x integral of the number sum rule for DPDs [33,70]. We find excellent agreement with the predicted value of the sum rule (see table 12) and regard this as a strong check of our fitting ansatz and analysis procedure.
Correlation effects. Many models for DPDs rest on the assumption that the two partons are independent of each other. This assumption can be formalized and leads to factorization formulae for the twist-two functions A qq and A ∆q∆q ((5.13) and (5.14)), and for the associated Mellin moments ((5.10) and (5.11)). These functions are then expressed in terms of the nucleon Dirac and Pauli form factors F 1 and F 2 for unpolarized quarks, and of the axial and pseudoscalar form factors g A and g P for longitudinal quark polarization. We fit these form factors to lattice data from the same ensemble used for computing the two-current correlators, and then extrapolate the form factors in the momentum transfer. We find that the factorization formula for unpolarized quarks is to a good approximation saturated by the contribution from F 1 , whilst for longitudinal polarization it is important to include the contributions from both g A and g P (see figure 26).
We find that the factorization assumption for A ud and A uu at py = 0 works remarkably well, with deviations not larger than 20% in the y range considered (see figure 27). It works rather well also for A ∆u∆d , whereas for A dd and A ∆u∆u larger deviations from factorization are observed (see figure 28). The factorization for the Mellin moments I ud and I uu at ζ = 0 works rather well, albeit with deviations up to almost 60%, whereas for I dd the discrepancies are again larger (see figure 29). In other channels, the errors in our data or fits are too large for drawing solid conclusions.
Summary and outlook. In summary, we find that the calculation of two-current correlators on the lattice can provide valuable physics insight into two-quark correlations inside the proton, which are essential for understanding double parton scattering. Our main results are as follows. (i) The dependence of two-parton distributions on the distance y is not the same for different flavors. (ii) Spin-spin correlations between two quarks are remarkably small, in contrast to spin-orbit correlations. (iii) The functions we studied approximately factorize into separate functions for the individual partons.
Important challenges for future work are to perform simulations at smaller lattice spacings, so as to extend the y range where lattice artifacts can be controlled, and to move closer to the physical pion mass. Improvements that will allow the inclusion of disconnected graphs in the physics analysis are also highly desired. The results obtained in the present study strongly motivate us to make efforts in these directions.
Foundation (DFG, SFB/TRR-55), the German Federal Ministry of Education and Research (BMBF, grant 05P18WRCA1), and the European Union's Horizon 2020 research and innovation programme under grant agreement no. 824093 (STRONG-2020). Our simulations were performed on the QPACE 3 systems of the SFB/TRR-55. Here we used an extended version of the Chroma software stack [85] together with a KNL adaption of the Multi-Grid solver [86][87][88][89][90]. The graphs in this paper were generated using Jaxodraw [91,92] and the tikz library. All plots were created using the Matplotlib library [93].

A Notation and lattice technicalities
In the following, we list expressions that are useful for the calculation of the baryon four-point contractions introduced in section 3. This includes symmetry relations, as well as ingredients that are used to evaluate the four-point contractions on the lattice. Furthermore, we give details on the notation used in this paper.
• Spacetime dependencies are indicated by an argument if it represents a degree of freedom. If the corresponding variable is fixed (e.g. the source position of a point-to-all propagator) an index is used instead.
• Unless stated otherwise, traces and transpositions are taken w.r.t. spinor and color indices.
• For a given 4-vector y µ we denote the spatial components by y (identical in Minkowski and Euclidean spacetime). The spatial distance is denoted by y := | y |. If y 0 = 0, we have | y | 2 = −y 2 := −y µ y µ . In order to avoid confusion with the usual Minkowski scalar product y 2 , we explicitly write −y 2 n for the n-th power of y = | y |.
For better readability, spinor and color indices, as well as spacetime arguments are not always explicitly written in section 3.2. This applies if the considered objects have matrix or vector character w.r.t. these indices or arguments. We list some of the objects that are considered in this work and display their explicit notation in table 15. Notice that each of the mentioned expressions may have further dependencies which are not stated above. A product of these quantities is considered to be a matrix-matrix or matrix-vector product. As an example we rewrite (3.41) using the compact and the explicit notation, respectively: Object Symbol Degrees of freedom Explicit Gauge link U µ space × color 2 (U µ ) ab (x) Generic source S space × spinor 2 × color 2 S ab αβ (x) Smearing function Φ space 2 × color 2 Φ ab (x|y) Dirac operator D space 2 × spin 2 × color 2 D ab αβ (x|y) Propagator (x → y) M space 2 × spin 2 × color 2 M ab αβ (y|x) Point(x)-to-all(y) propagator M x space × spinor 2 × color 2 (M x ) ab αβ (y) Stochastic source/propagator η, ψ space × spinor × color η αa (x), ψ αa (x) Sequential propagator (at time t) X t space × spinor 2 × color 2 (X t ) ab αβ (x) Gamma matrices Γ spinor 2 Γ αβ  In some cases where spinor and color indices are written explicitly, we make use of the Einstein summation convention, i.e. indices that appear twice are to be summed over.

A.2 Explicit expressions for four-point Wick contractions
The baryons are created and annihilated by the interpolators (3.3). Referring to this equation, we assign the following integer numbers to the quark fields: For the nucleon we have Γ B = Cγ 5 and Γ A = P + , where C is the charge conjugation matrix and P + selects positive parity. As a consequence, we can relate: Each of the expressions G ijk , K 1 , K 2 and K 2 is pictorially represented in figure 30. The second identity in the last line of (A.6) is a consequence of translational invariance. We now consider the effects of P T transformations and the combination of complex conjugation and CP transformation on the previously defined expressions. The following relations are understood to be valid after integrating over the gauge fields: As discussed in section 3.1, there are five types of Wick contractions, which can be represented by the graphs depicted in figure 3. The explicit contributions depend on the quark flavors of the inserted operators and the baryon, which in our case is always a proton. Since the end points are always connected to the source at z and the sink at z , we shall not write the corresponding arguments of K 1,2 and K 2 in the following for brevity. For C 1 -type graphs we define: The contribution for a certain proton momentum is obtained by a discrete Fourier transform: C ij, p 1,uudd ( y , t, τ ) := a 6 z z e −i p( z − z ) C ij 1,uudd (z, z , y)| y 4 =τ,z 4 =0,z 4 =t , (A.14) with analogous expressions for the remaining contractions, which shall be defined in the following. The contributions for C 2 and S 1 can be written as: (A.20) A.4 C 1 contractions We now give explicit expressions for the sequential source S and the contraction q 1 needed for the calculation of the C 1 graph. We start with the contributions to O uu i (y) O dd j (0). The corresponding sub-graphs are illustrated in figure 31. If the last integer index of the sequential   . Depending on the evaluation method, we use different symbols to depict the propagators: The forward propagator M z is represented by a simple line, the stochastic propagator ψ by a zigzag line, and the sequential propagator X (without the incorporated forward propagator and the stochastic source) by a dashed line. The colors indicate the quark lines: Red corresponds to a, orange to b, and blue to c. The combination of the quark lines with the numbers (1), (2), (3) at the sink or (1), (2), (3) at the source determines the sequential source type S n(a)n(b)n(c) (see (A.18)) or the contraction S n(a)n(b)n(c) (see (A.19)), respectively. The resulting permutation is also given for each contraction. Moreover, at the bottom line of each panel we give the permutation of quark fields represented by the propagator and the corresponding sign, which enters the total contribution and hence the physical matrix elements. sources (A.18) is equal for two or more contractions appearing in the flavor specific sum, the corresponding sequential sources can be combined before the inversion. In the case considered, we are able to combine A with C and B with D, which in both cases gives: up to a global sign. Inserting the source (A.21), we obtain the corresponding sequential propagator X by an inversion of (3.33). The relative signs, which can be read off from figure 31, correspond to the permutations of fermionic fields. The two contributions (A, C) and (B, D) are then combined in the quantity q 1 by calculating the sum:      Like for C 1,uudd , we calculate the sequential propagator by inverting (3.33) with the source (A.23), and calculate Y (see (3.34)), which is then contracted with the sources (A.19) according to the permutation that can be read off in figure 32. It is possible to combine A with C and B with D before doing the spatial correlation, since each current insertion is connected to the same quark line within these pairs. In contrast to the C 1,uudd case, we have two terms contributing to C 1,uuuu consisting of q 1 q 2 products, where q 1 and q 2 are given by: Notice that in the BD case the current insertion indices i, j are exchanged compared to the AC case. Putting everything together, the total C 1,uuuu contribution reads: C ij, p 1,uuuu ( y , t, τ ) =