Determination of the Collins-Soper Kernel from Lattice QCD

We present lattice results for the non-perturbative Collins-Soper (CS) kernel, which describes the energy-dependence of transverse momentum-dependent parton distributions (TMDs). The CS kernel is extracted from the ratios of first Mellin moments of quasi-TMDs evaluated at different nucleon momenta.The analysis is done with dynamical $N_f=2+1$ clover fermions for the CLS ensemble H101 ($a=0.0854\,\mathrm{fm}$, $m_{\pi}=m_K=422\,\mathrm{MeV}$). The computed CS kernel is in good agreement with experimental extractions and previous lattice studies.


Introduction
The last decades witnessed a rapid increase in our understanding of hadron structure, leading to ever more ambitious goals, such as the investigation of multi-dimensional elements of nucleon structure, which are parameterized by functions like generalized parton distributions (GPDs) and transverse momentum-dependent parton distributions (TMDs). These functions contain significantly more information than collinear parton distributions. Determining the multi-dimensional structure is the goal for many modern (such as COMPASS at CERN [1], RHIC at BNL [2], and JLab12 [3]) and future (such as the Electron-Ion Collider (EIC) [4]) experimental facilities. Even so, the precise determination of TMDs and GPDs is exceedingly difficult from experimental data alone. Fortunately, lattice QCD has reached a level of maturity that allows supplementing the experimental data in many ways. This progress is made possible not only by Exaflop computing becoming a reality but also by the theoretical progress exploring new avenues to extract parton distributions from lattice simulations. As this field is very much in a state of flux and opinions differ strongly on which approach is the most successful, we cite here only a few relevant papers [5][6][7][8]. In many cases, lattice QCD and experimental measurements give access to complementary information, making a combined strategy very promising. In this paper, we discuss one such case, namely the Collins-Soper (CS) kernel [9], also known as the rapidity anomalous dimension [10].
The CS kernel is a fundamental nonperturbative function that characterizes the QCD vacuum [11]. It appears as the universal rapidity evolution kernel for TMDs and can be extracted by comparing experimental data at different scales. The CS kernel depends on transverse distance b, characterizing the relevant nonlocal parton correlator. The comparison of the most recent extractions made in refs. [12][13][14] demonstrates that presently available data (from Drell-Yan and Semi-Inclusive Deep-Inelastic scattering (SIDIS) reactions) are not sensitive to the value of the CS kernel at b 2 GeV −1 . The situation will certainly improve with inclusion in the fit of on-going and future measurements, but even the EIC will hardly restrict the CS kernel for b 3.5 GeV −1 . By contrast, lattice calculations for b ∼ 5 GeV −1 with reliable uncertainty could be possible within a few years. Several ways to extract the CS kernel were suggested within the last years [15][16][17][18]. All these suggestions are based on the measurement of quasi-transverse momentum-dependent parton distributions (qTMDs) -the correlation of two quark fields in a hadron wave function connected by a staple-shaped gauge link positioned in a space-like plane. It has been shown that in the regime of large hadron momentum and large staple length, a qTMD is a convolution of a physical TMD, a perturbative coefficient function, and a soft factor. The latter is an unknown nonperturbative function, depending only on b, that cancels (together with other multiplicative factors, particularly the Wilson line renormalization factors) in the ratio of qTMDs. The ratios of qTMDs give access to several different pieces of information. In particular, ratios of qTMDs at distinct hadron momenta (with the other parameters being identical) are exclusively sensitive to the CS kernel [15,18]. In this work, we evaluate such ratios and extract the CS kernel using the approach suggested in ref. [18].
One of the fundamental difficulties in the evaluation of quasi-distributions is the necessity to perform a Fourier transformation into momentum-fraction space (introducing the variable x). This transformation is notoriously ill-defined, already for one-dimensional observables (for a review of recent developments, see [19,20]), and even more so for qTMDs, where one should respect an entangled hierarchy of several scales. To by-pass this severe complication, we study the first moment of qTMDs (i.e., the integral over x). These are much simpler observables that were already studied on the lattice before the formulation of factorization theorems for qTMDs [21][22][23][24]. Thus, the methodology of the determination of the first moments of qTMDs is well established. On the theory side, qTMDs are related to TMDs by the factorization theorem, which is valid in the regime of large hadron momentum. The integral over the factorized expression gives us the factorization theorem of the moments, which contains an unknown function that accumulates the information about the x-dependence of TMDs. The central aspect of the present approach is the assumption that this function is almost independent of b. This is a model assumption. However, all previous studies (theoretical and phenomenological) show that it is a very good approximation. We expect that a systematic error introduced by such an assumption is much smaller than our approach's other uncertainties. However, let us stress that our method has a limited scope of application and is less valuable for determining other properties of TMDs. In contrast, the approaches used in [25,26] are more general.
As members of the CLS (Coordinated Lattice Simulations) collaboration, we subscribe to its general strategy of focusing on controlling the continuum limit. As the lattice spacing is reduced, standard QCD simulation algorithms suffer from critical slowing down, meaning that an increasing Hybrid Monte Carlo (HMC) simulation time is required to sample the configuration space fully. Suppose gauge (and fermion) fields fulfill (anti)periodic boundary conditions in all directions. In that case, the simulation will eventually become stuck in a fixed topological sector, and ergodicity is lost, typically for lattice spacing a < 0.05 fm. This effect is known as "topological freezing" [27]. To avoid the "topological freezing", the CLS collaboration uses open boundary conditions for lattices with fine spacing. So far, the CLS collaboration has generated about 50 different ensembles, which together allow to reliably control all systematic errors of results extrapolated to the physical point. In this paper, we present results for just one of these ensembles labeled H101. More ensembles and the extrapolation to the physical point will be analyzed in future work. This paper focuses on analyzing the applicability of the method proposed in [18] and demonstrates that it is more efficient than the methods used in the refs. [25] and [26].
The paper is partitioned into three main sections. In sec. 2 we review the factorization theorem for qTMDs and explain the theoretical setup for our study. The theoretical peculiarities and assumptions of the CS kernel determination from the first moment of qTMDs are described in subsec. 2.3. Sec. 3 is devoted to the details of our lattice analysis. In sec. 4 the extraction of the CS kernel is presented. Subsec. 4.1 presents technical details of the extraction, while we discuss systematic uncertainties in subsec. 4.2. The results are given in subsec. 4.3. The paper concludes with section 5.

Formalism
In this section, we introduce the notation and provide theory details used to determine the CS kernel. For the derivation of the presented expressions cf. ref. [18].

Factorization theorem for qTMDs
We study the following matrix element where f labels the quark flavor, Γ is a Dirac matrix, [x, y] is the straight gauge link between x and y, and |h(P, S) is a single-hadron state with momentum P and spin S. The vector b µ is orthogonal to the vectors P µ and v µ , (bP ) = (vb) = 0. In addition, the vectors v µ and b µ lie in an equal-time plane, i.e. they have no time-component and hence b 2 < 0 and v 2 < 0. For definiteness, we fix the normalization by v 2 = −1. A visualization of this configuration is given in fig. 1. Let us assume that the vectors P µ and v µ are positioned in the (t, x) plane 1 . We introduce light cone coordinates for all four-vectors a µ , a + = (a 0 + a 1 )/ √ 2 and a − = (a 0 − a 1 )/ √ 2. The transverse part of the vector is a µ T = (0, 0, a 2 , a 3 ). The basis vectors corresponding to the + and − components are denotedn and n respectively (n 2 = n 2 = 0, (nn) = 1). The decomposition of the momentum for a nucleon moving in the x-direction is (2.2) Figure 1: The geometrical configuration of the matrix element (2.1). The red lines represent the operator, and the blue line represents the hadron momentum.
In the limit of a very energetic hadron and large L, the matrix element W can be rewritten in the factorized form where Φ is a physical TMD distribution, Ψ is a combination of soft-factors [18], and the ellipsis represents the power suppressed correction discussed later. The factorization scales µ, ζ andζ are discussed in the following section. The perturbative coefficient function C H is known up to next-to-leading order (NLO) [16,18]. It reads Note that the coefficient function is strictly universal and independent of Γ. The expression on the RHS of (2.3) contains The operation on the RHS of (2.5) selects the TMD distributions of leading twist, whereas highertwist distributions are nullified and appear as part of the power corrections in (2.3). The factorization theorem (2.3) is valid in the parameter range defined by where Λ QCD is the characteristic low-energy scale of QCD. Essentially, one needs large hadron momentum, large longitudinal size of the contour L, and fixed transverse size b. The parameter must not be too large to guarantee the -independence of the function Ψ.
The presence of unknown nonperturbative factors prevents the direct determination of TMDs from the lattice matrix element (2.1). To eliminate the unknown and singular factor Ψ we consider a ratio of W 's evaluated at the same value of b but for different momenta. In this case, the functions Ψ cancel, and the result is expressed entirely in terms of the physical TMDs. For the extraction of the CS kernel, we use the ratio f ←h (P 1 , P 2 ; b; , L; v, S) = P + 1 W [Γ] f ←h (b; , L; v, P 1 , S; µ) P + 2 W [Γ] f ←h (b; , L; v, P 2 , S; µ) . (2.7) Substituting the expression (2.3) we obtain where the ellipsis denotes power suppressed terms. To cancel the factors Ψ we tookζ 1 =ζ 2 =ζ, which according to (2.10) implies ζ i = (2x i P + i v − ) 2 µ 2 /ζ. Note that the ratio (2.8) is independent of µ. The factorization theorems derived in refs. [16,17] are equivalent to (2.3). The only difference is that the authors of refs. [16,17] consider the qTMD together with a certain soft factor S bent , which is equivalent to the division of (2.3) by S bent . The factor S bent is constructed such that it cancels Ψ in the perturbative regime. Performing the inverse Fourier transformation, one determines the physical TMD distribution from the combination of qTMD and S bent . This approach gives access to the full TMD. However, it contains several fundamental complications. The two main complications are -the absence of proof that Ψ/S bent = 1 nonperturbatively and the fact that the inverse Fourier integration incorporates values of that violate the requirements of the factorization theorem (2.6). We expect these complications to be resolved in the future. For this work, the dependence on x is irrelevant since we are only interested in determining the CS kernel. We extract the CS-kernel from the ratios (2.8) evaluated at fixed .

CS kernel and evolution of TMD distributions
The factorization expression (2.3) contains a number of scales, which are combined into the variables µ, ζ andζ. The ultraviolet renormalization of W gives the overall dependence on µ [28], where we omitted the arguments of W for brevity. The factorization of collinear singularities introduces further dependence on µ, which cancels out between the coefficient functions and the functions Φ and Ψ. The scales ζ andζ result from the factorization of rapidity divergences [29]. These scales satisfy the relation Let us emphasize that the scale µ is present on the RHS of (2.10). In ordinary TMD factorization [29][30][31] one has the relation ζζ = (2p + 1 p − 2 ), where p i are the momenta of the colliding hadrons. In the factorization of W leading to (2.10) the second hadron is absent and replaced by Wilson lines. Therefore its momentum does not appear in (2.10).
The dependence on µ and ζ of the TMD Φ is given by a pair of equations, where γ F is the ultraviolet anomalous dimension, and K is the CS kernel. The anomalous dimension and the CS kernel depend only on the color representation F and quark flavor f . Since in the present work, we deal only with quark TMDs, the label f for γ f F and K f is not necessary, and we omit it in the following. Note that the CS kernel itself, which should be independent of the hadron type, depends on the employed momenta P 1 and P 2 only via the scale µ. However, P 1 and P 2 do influence the systematic error due to power corrections discussed in subsec. 4.2.
The integrability condition for the system (2.11, 2.12) yields where Γ cusp is the cusp anomalous dimension of light-like Wilson lines. The expression for γ F is known up to N 3 LO. The CS kernel is a generically nonperturbative function, but for small values of b it can be computed by means of the weak field approximation. The leading term is where L = ln(|b 2 |µ 2 /4e −2γ E ) with γ E = 0.5772... being the Euler constant, and C F = 4/3. The N 3 LO expression is derived in ref. [32], and the leading power correction in ref. [11]. In practice, it is convenient to resum the logarithms of (bµ), which significantly improves the perturbative convergence of the series [33]. In this case the leading term is where β 0 is the leading order QCD beta function where P is an arbitrary path connecting the points (µ, ζ) and (µ 0 , ζ 0 ) [34]. Path-independence is guaranteed by the integrability condition (2.13). For our purposes, it will be convenient to evolve TMDs along the path of constant µ. Choosing the straight path we obtain

Extraction of the CS kernel from ratios at = 0
In our lattice simulation we evaluate W at = 0. In this case, the expression (2.8) simplifies to where we also drop unimportant variables (L, v, S) from the argument for brevity. To explicitly extract the CS kernel we evolve both TMDs in ζ to the point ζ 0 using (2.17), and obtain .
(2.20) f ←h (2.22) computed using fits for the d-quark unpolarized TMDs in the proton (blue) and in the pion (black), and from the d-quark Sivers function (brown) in refs. [12,35,36], with uncertainty bands. The numbers to the right show the mean and the error for each case, where the error is given by the maximal deviation. Note that for the unpolarized TMD f 1 there is more precise experimental data available than for the Sivers function f 1T and thus M [f1] f ←h (b) can be determined with higher precision than M In general, the function r f ←h has complicated properties. For example, its numerator and denominator have potential problems with convergence at x → 0, see the discussion in ref. [18]. Also, the function r f ←h depends on µ. To simplify it and reveal its dependence on µ, we expand r [Γ] f ←h in the limit of small α s . The NLO perturbative expansion for this function is In the following we use this expression as our approximation for r [Γ] f ←h , assuming M [Γ] f ←h to be a constant in b. The scale µ is selected to nullify the logarithm in the square brackets of (2.21) where we used that |v − |= 1/ √ 2. The central point of our approach is the assumption that the function r f ←h is almost independent of b. Such a behavior is expected, because a different behavior of denominator and numerator in (2.20) can only come from the ln x term present in C H . Therefore, any essential deviation from the constant behavior implies a significant change of the x-profile between different values of b, which is unlikely. In fig. 2 we present examples of the function r f ←h for different TMDs computed using phenomenological extractions [12,35,36]. The deviation of r [Γ] f ←h from its mean value for b > 1 GeV −1 is (+1.2%,-3.5%) for the unpolarized TMD in the proton, and (+2%,-5%) for the unpolarized TMD in the pion. Let us mention that in the case of the Sivers function, the integrals are not sign-definite, and thus the ratios (2.20) are singular at certain points (see fig. 2(right)). This results in the enormous error band for r Sivers , and indicates possible issues in an application of the suggested method in this case. A phenomenological estimate of M f ←h is presented in fig. 3. The computation of r [Γ] f ←h using (2.21) based on this phenomenological M [Γ] f ←h is given by the dashed lines in fig. 2.
The observation that r [Γ] f ←h (or equivalently M [Γ] f ←h ) is approximately independent of b is central for our method of determining the CS kernel. Assuming r [Γ] f ←h to be constant we determine the CS kernel by where we omit the arguments of R and r. To determine the constant M [Γ] f ←h we normalize K computed on the lattice to the perturbation theory values with b in the regime where perturbation theory and the factorization formula are applicable. That requires simultaneously b 1/P + and b 1/Λ QCD . We select b ∼ 1 GeV −1 for which the perturbative series converges well [15,33]. For normalization we use the N 3 LO value in the resummed form [12,32]. The normalization procedure introduces a correlated systematic uncertainty. Assuming that r (M) has variance δr (δM) we compute the variance for K as Considering the phenomenological extractions shown in figs.2, we obtain δK ∼ 0.15 for the unpolarized proton case, and δK ∼ 0.06 for the unpolarized pion case. In the case of the Sivers function, this uncertainty is larger, δK ∼ 0.25. Therefore, using this approach, we can estimate the CS kernel from = 0 matrix elements with a systematic uncertainty of about ±0.1. Let us emphasize that the method described here does not depend on the error in the determination of absolute values of r, but only on the error of the deviation of r(b) from the constant. The analysis presented in this section is valid only for Γ-matrices corresponding to leading twist TMDs, for which Γ = Γ (2.5). For leading twist TMDs, the CS kernel is independent of Γ. This fact can be exploited for a cross-check of our computation. There are three Γ-matrices of leading twist TMDs Γ = {γ + , γ + γ 5 , iσ α+ γ 5 }, (2.26) where the index α is transverse. For these cases, the matrix elements Φ [Γ] are parameterized in terms of the leading twist TMDs according to [37,38] where µν T and g µν T are the transverse parts of the Levi-Civita and the metric tensor ( 23 T = − 32 T = 1, if the transverse plane is the (y, z) plane), s µ T and λ are the transverse and longitudinal components of the nucleon's spin vector, and M is the mass of the nucleon. Considering different polarization states of the hadron, we can access different TMDs independently.
The adopted method's main advantage is that lattice computations of R( = 0) are essentially simpler than the evaluation of qTMDs with x-dependence. In our case, we do not need to account for -dependent renormalization factors and power corrections induced by . We see this as an advantage of our analysis. In any case, the CS kernel extraction might be strongly affected by power corrections, which (as we expect) is a common property of all determinations of the CS kernel from the lattice. Better control of power corrections is vital for continued progress. At present, too little is known about these to predict which precision can be ultimately reached. The only point that seems inevitable is that further progress will require substantial work on all sidesperturbative QCD, lattice QCD, and experiment.

Evaluation of TMD matrix elements on the lattice
In this section we provide details of the computation of the matrix elements W [Γ] (2.1) on the lattice.

Computation of the TMD matrix elements
In the following, we consider Euclidean spacetime, and utilize the following notation for any fourvector, a = ( a, a 4 ), i.e. a represents the spatial vector components, whereas a 4 is the Euclidean time component. The TMD matrix element can be accessed on an Euclidean lattice by evaluating an extended three point function C 3pt . It is defined as: where the angle brackets indicate the sum over all gauge configurations of the employed gauge ensemble and the trace is taken with respect to spinor indices. The operators P and P are the proton interpolators These annihilate or create a tri-quark state sharing the same quantum numbers as the proton, which is implemented in the Chroma software system [39]. The spinor matrix Γ S projects out the state with positive parity (at rest) and the desired proton spin S, see (3.9). The expression (3.1) is invariant under shifts in time, assuming infinite or periodic time extensions. In our simulations, we use a lattice with open boundaries in the time direction. For that reason we shall use the source time slice t src 0. In the current case, the operator J Γ is represented by a non-local gauge-invariant quark bilinear (2.1), t src τ t snk Figure 4: Lattice nucleon three point function with the source nucleon located at t src and the sink nucleon at t snk separated by the distance t = t snk − t src . The blue line is the Wilson line with the shape shown in fig. 1.
Notice that in the latter case, there are two nonequivalent possibilities to implement the link path. The second is obtained by interchangingμ andν. We perform our calculations using both versions. In order to relate the three-point function C 3pt to the TMD matrix element W [Γ] f ←h we calculate the following quantity: 2 m 2 + P 2 C 3pt (P, S, C, t, τ ) C 2pt (P, S, t) 0 τ t = rsū (r, P )Γ S u(s, P ) h(P, s)| J Γ (C, τ ) |h(P, r) sū (s, P )Γ S u(s, P ) , (3.6) where the r.h.s. can be identified with W [Γ] f ←h if Γ S is chosen accordingly. In equation (3.6) the Euclidean time separations between source, insertion, and sink should be large. In this limit, the excited proton states, which also overlap with the proton interpolators, are exponentially suppressed. The ratio with the two-point function is constructed using C 2pt ( P , S, t) := tr Γ S P( P , t) P( P , 0) .
This weighting is required for the correct normalization. The evaluation of the fermionic integral implicit in (3.1) leads to several Wick contractions, of which we distinguish two kinds. These are referred to as "connected" and "disconnected" graphs, where the latter involves a fermion loop, including the operator J Γ . In this study, we restrict ourselves to the non-singlet quark contributions (particularly, u − d). In this combination, the disconnected graphs vanish exactly. Hence, we have to consider only one Wick contraction, which is schematically drawn in  superscript Φ, P indicates that the source has been treated with momentum smearing [40] in combination with HYP smeared gauge links [41]. Momentum smearing is crucial to realize large hadron momenta on the lattice, as it increases the overlap with the proton ground state dramatically. The propagator connecting the proton sink and the insertion operator is evaluated by the sequential source method [42]. The proton sink and the sequential source, which is placed at time slice t, are again improved by momentum smearing. The corresponding sequential propagator is denoted by . Notice that it should be Hermitian conjugated and followed by multiplication with γ 5 , since an inversion on the sequential source returns a backward propagator. In total, we can write the connected contraction as: where U latt [C] denotes the staple-shaped gauge link, which is constructed using the lattice gauge links (3.4) or (3.5), respectively. Notice that we are able to calculate C Γ,conn 3pt for all insertion time slices τ by performing only one sequential inversion, whereas the sink time slice t is fixed by the location of the sequential source.

Simulation setup
The two-point function, as well as the connected three-point graph C Γ,conn 3pt are evaluated on 2000 configurations of the H101 CLS ensemble [43]. It employs the tree-level improved Lüscher-Weisz gauge action and includes N f = 2 + 1 Sheikholeslami-Wohlert fermions. The extension is 32 3 × 96 with lattice spacing a = 0.0854fm. Additional ensemble parameters are given in table 1.
For each configuration, we perform calculations for four different quark sources located at random spatial position with the source time slice t src = 10a, and the source-sink separation t snk − t src = 11a. The quantity that enters the l.h.s. of (3.6) is obtained by a fit to a constant regarding the insertion time τ . The fit includes an interval of insertion time slices assuming that excited states are sufficiently suppressed. In the present work we consider τ ∈ [4a, 7a].
In our simulation the proton momentum P = (P 1 , 0, 0) has values P 1 ∈ {0, 1, 2, 3} 2π aL . The first case P 1 = 0 is used to extract the nucleon mass and cross-check the dispersion relation. The other cases provide us with three values for P + = {1.25, 1.74, 2.27} GeV, useful for the extraction of the CS kernel. The proton spin is oriented along the z-direction, i.e. we insert in (3.1) and (3.6). Here, the first term is used to project the proton at rest onto the positive parity state. Since the proton momentum points in x-direction, the proton spin lies in the transverse plane, and thus λ = 0. The vector v is taken to be v µ = (±1, 0, 0, 0), such that (v · P ) = ±P 1 . The vector b is b µ = (0, b y , b z , 0). In the case b y = 0 and b z = 0, the transverse link has step-like form (3.5).
In order to reduce autocorrelation, we apply the binning method to the measured C 2pt and C Γ,conn 3pt , with bin size 20. From the resulting binned samples, we create 100 jackknife samples, which are used for the estimation of the error propagation in all derived quantities, such as the ratio (3.6).

Extraction of lattice correlators W
To extract physical matrix elements, we parametrize the correlators W [Γ] using the most general parameterizations and fit the invariant structure functions. This approach has been developed in refs. [21,22], and successfully applied to the computation of moments of various TMDs in the references [22-24, 44, 45]. Note that, although this method can also be used to scan the -dependence of the correlators and extract the x-dependence of TMDs, the application to first moments ( = 0) is particularly simple and robust, and suited to the current task -the extraction of the CS kernel.
The detailed description of this procedure with explicit expressions is given in ref. [21]. Here, for illustration purposes, we present only the parametrization for the vector matrix element. It reads where for our calculation at = 0,ã i andb i are functions of the Lorentz-invariant combinations . Using the set of matrix elements W [γ µ ] evaluated at different µ, L and b ν we fit the invariant functionsã i andb i with a least-square fit.
In the following step, the fitted functionsã i andb i are grouped into the leading twist TMD combinations (2.26). Using that the spin-vector is an independent vector we identify the combinations ofã i andb i that refer to particular TMD distributions (2.27,2.28,2.29). In the vector case, the terms proportional to the unpolarized (f 1 ) and Sivers (f ⊥ 1T ) TMDs are Here, we introduce the notation W [F ] , which indicates the component of the matrix element W [Γ] with the leading contribution proportional to the TMD F . In other words, it is a qTMD (at = 0) with the quantum numbers corresponding to F . The axial and tensor qTMDs defined in (2.28) and (2.29) are computed analogously. In our calculation we have access to the 6 leading-twist TMDs that remain present in (2.27,2.28,2.29) after setting λ = 0 as implied by our choice of the proton spin. These are the unpolarized TMD f 1 , the Sivers function f ⊥ 1T , worm-gear-T function g 1T , transversity TMD h 1 , Boer-Mulders function h ⊥ 1 and pretzelocity h ⊥ 1T . The evaluation of the functions g 1L and h ⊥ 1L is also possible but requires simulation with a different spin orientation.

The CS kernel from lattice data
In this section, we present the CS kernel extraction from the W [F ] determined by the lattice simulations. We also estimate the systematic uncertainty and compare the results of the extraction with previous extractions.

Determination of the CS kernel
As an outcome of the lattice simulation, described in the previous section, we have the functions W wherev = sign(v · P ). It is important that many lattice-related artifacts, such as lattice renormalization factors, cancel in this ratio. This cancellation is not exact since there is an operator mixing (such as discussed in ref. [46]) that possibly depends on momentum. This point requires further investigation, which will be performed in the future. For the moment, we ignore these effects, expecting them to be small in comparison to other distortions, such as lattice artifacts and power  corrections to the factorization theorem.
We analyze the ratios (4.1) using the theoretical scheme presented in sec.2, and extract the CS kernel. The analysis has three principal steps, which we describe below.
(i) Extrapolation L → ∞. The large-L limit is an essential requirement to reach the TMD regime (2.6). To extract the large-L value of R [F ] we use a constant fit (for L > 2.98 GeV −1 ) independently for each combination of b, P + 1 /P + 2 and F . Example profiles of R [F ] in L and extracted plateau values are shown in figure 5. In the plateau fit we assume that ratios at the values forv = ±1 are the same (which is the consequence of parity conservation) and perform a combined fit for the left and right branches.
In many cases, the profile in L does not have a well-defined transition to an asymptote. Such a situation is especially typical for large values of b (b > 2.5 GeV −1 ). It indicates that the values of L used in our analysis are not large enough, and corrections ∼ b/L are significant. This part of the analysis provides the largest source of uncorrelated uncertainties. In the following plots, we specially shade the area b > 2.5 GeV −1 to indicate that this part of the extraction is not under control.
We faced problems with the extrapolation of R [F ] for the Sivers f ⊥ 1T , Boer-Mulders h ⊥ 1 and pretzelocity h ⊥ 1T cases. Particularly, the ratios R [f ⊥ 1T ] and R [h ⊥ 1 ] show anomalous growth at large b. The origin of this problem is not entirely clear. We associate it with the incomplete cancellation of lattice artifacts. Alternatively, such abnormal behavior can be described by a significant violation of the M = const. assumption, see also fig. 2 (right). The ratio R [h ⊥ 1T ] has a very small signal-to-noise value. For these reasons, we exclude these three cases 2 from the following consideration, which left us with three TMDs.
(ii) Determination of M. The phenomenological formula for the extraction of the CS kernel contains the unknown function M, which corrects the ratio for the unobserved x-dependence. In our approach we assume M(b) = const., and extract it from R  fig. 6. Generally, we found a good agreement between different momentum configurations and different fitting ranges. Partially, the difference between different momentum configurations is explained by the evolution effects for M, which are subleading in our analysis.
(iii) Extraction of K. We combine the extrapolated R  In all steps the statistical propagation of uncertainties is performed with the resampling method. All theoretical predictions, such as the perturbative CS kernel, evolution factors, etc, are evaluated by artemide [49] at N 3 LO with free parameters tuned as in the SV19 TMD-extraction [12].

Estimation of systematic uncertainties
The results of extraction are given in figs. 8 and 9 and discussed in the following section. The error-bars on these figures represent the statistical uncertainty only. The accurate estimation of systematic uncertainty is impossible at the current stage of research. However, let us at least identify the main sources and give an estimate of possible effects.
• Lattice artifacts. Presumably, this is the main source of uncertainty and the least controlled.
Mainly, the lattice artifacts arise due to differences between proton states evaluated at different momenta. This effect is visible in fig. 5 as a sudden change of asymptotics at large-L. In some cases (f ⊥ 1T , h ⊥ 1 , h ⊥ 1T ) these lattice artifacts completely ruin the extraction. An improved understanding of these artifacts and their control is essential for further development.
• Assumption M = const.. This assumption introduces a fully correlated systematic uncertainty, the size of which is estimated by equation (2.25). The values of δM are given in fig. 6, and the resulting values for δK are indicated on each figure.
• Small size of staple contour. The insufficient size of the contour generates a systematic deformation of the CS kernel at large b. We expect that the values with b > 2.5 GeV −1 are systematically higher (lower for |K|). This effect occurs because many configurations do not reach the plateau for available values of L, and thus corresponding values are systematically underestimated. Thus, in fig. 8 and 9 this area is shaded in red. The same effect but in a smaller amount takes place also at smaller values of b in some configurations. This is apparent in fig. 6, where some series of points are moving to the left or right with an increase of L, e.g. for g 1T at P + 1 = 2.27 GeV.
• Power corrections in P to factorization theorem. The factorization theorem is valid at large P , whereas the available values are not that large. The amount of violation of the factorization theorem can be estimated by the method suggested in ref. [18], which uses the fact that in the factorization limit, only the "good" components of the quark fields survive (2.26). In turn, it implies that certain tensor components, cf. (2.5), of the quark-bilinear are larger than others. Therefore, the ratio of a "bad" component to a "good" component provides an estimation for the size of power corrections to the factorization theorem. We performed such a comparison and found that these corrections amount to ∼ 40% at P + = 1.25 GeV, and ∼ 30% at P + = 2.27 GeV for b 2.5 GeV −1 . In fig. 7 we demonstrate the case where f T is the twist-three TMD that parametrizes the ∼ b µ component of the vector current (g µν T qγ ν q ∼ b µ M f T , ignoring common structures). The tested ratios exhibit the expected general behavior, i.e., they decrease with the increase of P + or decrease of b. Fortunately, these corrections do not significantly impact our extraction because they mostly cancel in the ratio R [F ] and are partially accounted for by the fit of M. However, these corrections will present significant problems for determining absolute values for TMDs from the lattice.
• Power corrections in b −1 to factorization theorem. At small values of b the factorization theorem is violated by 1/|b|P + corrections. Therefore, the range of small-b is not reliable. The effect of these corrections is apparent for b 0.8 GeV −1 , where the CS kernel is perturbative.
There are also numerous smaller sources of systematic uncertainties, which are not discussed here.
In general, we believe that our extraction, supplemented with the statistical and the given correlated uncertainties, provides a reliable estimation for the CS kernel in the range 0.8 b 2.5 GeV −1 .

Discussion
The extracted values of the CS kernel from three different TMDs, and three different momentum combinations are shown in fig. 8. Clearly, the present uncertainties are dominated by statistical noise. Nonetheless, we observe a good agreement between different channels, which confirms the CS kernel's universality. The noisiest channel is the worm-gear TMD g 1T . In this case, the main source of disagreement comes from the spread of M (see fig. 6). The uncertainty in the determination of M gives rise to the correlated uncertainty, which is denoted by δK and indicated for each case. In fig. 9 (left panel), we present the values of the CS kernel from the combined fit of different momentum configurations for each TMD. Effectively, the averaging over channels triples the statistics. We observe that the extractions from f 1 and h 1 are in excellent agreement with each other,  whereas the curves we obtain from g 1T have a rising tendency at large b. This effect mainly appears due to the small sizes of the contour. The right panel of fig. 9 shows the values for the CS kernel combined from all channels and represents the main result of our work. The combined value of δK is computed according to ref. [50]. We compare the obtained CS kernel with other results in fig. 10. The left panel shows the comparison with phenomenological extractions [12,13] and the purely perturbative CS kernel resummed at N 3 LO [32,33]. We observe some similarity between extractions in the range 1 < b < 2.5 GeV −1 (which is the most reliable part of our analysis). However, for b > 2.5 GeV −1 the discrepancy is essential, as well as the difference between the two phenomenological curves. Probably this indicates that these curves are primarily determined by the chosen analytic form of the parametrizations for such large b. The right panel of fig. 10 shows the comparison with other lattice computations, which were made in ref. [26] (in the quenched approximation) and in ref. [25] (where the CS kernel is extracted from the qTMD soft factor [51]). We observe a nice agreement.
Importantly, all lattice simulations demonstrate a weak variation of the CS kernel at large-b. This observation contradicts the popular assumption K(b) ∼ b 2 for large b, see e.g. refs. [13,52,53], and supports models with linear or constant asymptotics, such as in ref. [11,12,54].

Conclusion
In this work, we present a lattice computation of the Collins-Soper (CS) kernel from the first moments of qTMD distributions. The analysis is performed with dynamical fermions and stapleshaped Wilson lines for the CLS ensemble H101 with lattice spacing a = 0.0854 fm and unphysical quark masses. The results are shown in figures 8 and 9. For the first time the CS kernel is determined from three different TMDs (f 1 , g 1T , h 1 ). The results agree with each other, which confirms the universality of the approach.
The method of extraction used in this work was suggested in ref. [18] and is based on the NLO factorization formula for qTMDs [15][16][17][18]. In contrast to other methods [15][16][17], it does not require evaluating the x-dependence of qTMDs but uses only x-moments. The information on the x-dependence is contained in a certain integral combination of TMDs, called M (2.22). The central assumption of the approach is that M is independent of b, which we confirm (within statistical uncertainties) from our lattice data. The ambiguity in the determination of M is contained in the fully correlated uncertainty δK. The given values for δK may be exceeded by other (not estimated) systematic uncertainties. Overall, our results demonstrate that the method suggested in ref. [18] is suitable for determining the CS kernel. Our extraction of K is supplemented with a discussion of sources of systematic uncertainties. We identify the pure lattice artifacts as the largest and the least controlled source of uncertainty. Excessive lattice artifacts force us to exclude the functions f ⊥ 1T , h ⊥ 1 and h ⊥ 1T from our analysis. Other sources, such as power corrections to qTMD factorization and model assumptions, produce smaller and better-quantified uncertainties. We identify as the reliable range for the extracted CS kernel the interval 0.8 GeV −1 b 2.5 GeV −1 . We also estimate the size of power suppressed corrections to TMD factorization and find them to amount to ∼ 30 − 40% at our energies. These corrections do not strongly influence the present extraction of the CS kernel but will play a vital role in extracting the x-dependent TMDs.
In fig. 10 we demonstrate that our results are in agreement with previous lattice calculations [25,26]. In our case, the statistical uncertainty is smaller and comparable to the uncertainty of phenomenological extractions due to a larger number of channels for determining the CS kernel. For that reason, and because we can better estimate systematic uncertainties, we regard our result as a significant step forward in the lattice determination of the CS kernel. The comparison with the phenomenological extraction exhibits discrepancies, which are mainly due to systematic sources. Therefore, future studies should concentrate on the elimination of systematic contaminations.