Extraction of the Sivers function from SIDIS, Drell-Yan, and $W^\pm/Z$ boson production data with TMD evolution

We perform a global fit of the available polarized Semi-Inclusive Deep Inelastic Scattering (SIDIS), polarized pion-induced Drell-Yan (DY) and $W^\pm/Z$ boson production data at N$^3$LO and NNLO accuracy of the Transverse Momentum Dependent (TMD) evolution, and extract the Sivers function for $u$, $d$, $s$ and for sea quarks. The Qiu-Sterman function is determined in a model independent way via the operator product expansion from the extracted Sivers function. The analysis is supplemented by additional studies, such as the estimation of applicability region, the impact of the unpolarized distributions' uncertainties, the universality of the Sivers functions, positivity constraints, the significance of the sign-change relation, and the comparison with the existing extractions


Introduction
The asymmetry of the intrinsic nucleon structure is a fascinating phenomenon that demonstrates the complexity of the underlying theory of strong interactions, Quantum Chromodynamics (QCD). By their very nature, asymmetries are related to a nontrivial three-dimensional (3D) structure of the nucleon and could not be fully understood within the collinear partonic picture. A more adequate approach for studies of such phenomena is the transverse momentum dependent (TMD) factorization theorem [1][2][3][4][5][6][7], which resolves the transverse components of the parton's momentum in addition to the collinear one and is sensitive to the short (perturbative) and long (nonperturbative) distance QCD dynamics. The TMD factorization theorem associates the correlation of the nucleon's spin and the transverse momentum of an unpolarized parton with the Sivers function [8,9]. In this paper, we present a global QCD analysis and the extraction of the Sivers function from Semi-Inclusive Deep Inelastic Scattering (SIDIS), Drell-Yan (DY), and W ± /Z production data with next-to-next-to-leading order (NNLO) and next-to-next-to-next-to-leading order (N 3 LO) of TMD evolution and provide a detailed description of results presented in Ref. [10].
Within the TMD factorization theorem, nonperturbative effects of collinear and transverse parton motion are included in the TMD distribution (TMD PDF) and fragmentation (TMD FF) functions. In its modern form, the TMD factorization theorem has been proven in Refs. [4,7,11], to which we refer the reader for the theory details. A natural subject of the TMD factorization is inelastic sufficiently inclusive processes with two observed scales, such as SIDIS [3-5, 12, 13], DY [2, An essential aspect of TMD distributions is their scale-dependence (evolution), as predicted by QCD evolution equations. The analyses that described (or predicted) both SIDIS and DY data from Ref. [23,28,30] used the parton model approximation without the TMD evolution and demonstrated a good agreement with the experiment. These analyses assumed suppression of evolution effects in asymmetries and indicated [28,30] the Sivers function's universality, however, they did not capture the full complexity of QCD evolution for TMD distributions. On the other hand, the analyses that included TMD evolution, Refs. [27,31] had problems and did not achieve better agreement with the DY data compared to Refs. [23,28,30]. This situation constitutes a puzzle in establishing the status of the TMD factorization theorem because the scale dependence and the nonperturbative and universal CS kernel are among its principal elements, and the predictive power and precision of the TMD factorization should improve when higher orders are included.
• We use the SV19 [18] extraction as an input for unpolarized TMD distributions and nonperturbative TMD evolution (CS kernel). The selection of input unpolarized distributions is of principal importance for the analysis of TSSAs, because they enter both the numerator (convoluted with the Sivers function) and the denominator (the unpolarized cross-section) of the observed asymmetry. SV19 extraction is made by the simultaneous fit of the unpolarized DY and SIDIS data on cross-sections differential in the transverse momentum, spanning from relatively low energy experiments such as HERMES, to the highest energies of the LHC.
• The present fit is performed with NNLO and N 3 LO TMD evolution (together with NNLO matching for unpolarized distribution in SV19). The latest studies [18,[65][66][67] demonstrated the importance of the perturbative input for a good agreement of theory with the data.
• In contrast to many of the previous extractions, we do not use any collinear function in the ansatz for the Sivers function. In fact, a parametrization of the Sivers function starting from the collinear PDFs is not a well founded approximation, because the Sivers function generally belongs to a different class of functions, and thus such an approximation only biases the extraction.
• The key element of our analysis is the ζ-prescription [68]. In the nutshell, the ζ-prescription is a particular selection of the reference scale for the TMD evolution, which totally decouples nonperturbative evolution from the nonperturbative TMD distributions. In this scheme a TMD distribution is defined as a universal function without any specific relation to the collinear distributions. Exactly this property allows us to use NNLO or N 3 LO TMD evolution (with NNLO or N 3 LO hard coefficient functions, and NNLO matching for unpolarized distributions inherited from SV19) together with a free parametrization for the Sivers function in a strict theoretical manner.
• We use a conservative data cut, which guaranties that the selected data could be described within TMD factorization approach derived in the approximation of a large hard scale Q, and a low transverse momentum, q T . Experimental data are also carefully selected to avoid double counting. The resulting data set becomes relatively small in comparison to other extractions, however the latest 3D binning data set by HERMES [35] allows us to obtain enough data for the analysis.
Let us mention that the SV19 extraction [18] is also performed using these principles, and demonstrated a perfect agreement with the data. The numerical computations are performed with artemide [69] -the multi-purpose package for phenomenology within the TMD factorization framework. The results of the SV19 extraction can be found at [69] (and the analysis code can be found at [70]).
The paper is organized as follows. In Section 2 we recall aspects of the TMD factorization formalism needed to analyze and interpret the experimental data. The extraction procedure and the uncertainty estimation approach are described in Section 3. Section 4 is devoted to the description of results of the global QCD analysis of the data related to the Sivers function, exploration of the properties of extracted Sivers function, and extraction of the QS functions. We finally conclude and discuss further improvements in Section 5.

Formalism
We start by summarizing theoretical formalism and presenting expressions that describe DY process and SIDIS cross-sections within TMD factorization. An interested reader can find details about TMD factorization and derivations of cross-sections for these processes in Refs. [4,13,15,23,68,71].

Definition of TMD distributions
The unpolarized TMD PDF, f 1 , the unpolarized TMD FF, D 1 , and the Sivers function, f ⊥ 1T , parametrize the matrix element of the vector TMD operator. In the position space, they are defined as follows where P µ , S µ and M are the momentum, the spin-vector and the mass of the hadron, P µ h is the momentum of the produced hadron h, and b µ is a transverse vector 1 . TMD distributions depend on x and z, which are light-cone momentum fractions for incoming quark q and the produced hadron h correspondingly, and the transverse separation b. Also, TMD distributions depend on the ultraviolet µ and the rapidity ζ renormalization scales, which we discuss below. We adopt the standard notation for components of the light-cone decomposition, v µ = v + n µ + v −nµ + v µ T with n 2 =n 2 = 0 and n ·n = 1, and the transverse anti-symmetric tensor µν T = −+µν with 12 T = − 21 T = 1. The staple gauge-link points to +∞(−∞) in the case of TMD distributions measured in SIDIS (DY), and assures the gauge invariance of the TMD operator. The unpolarized distributions are independent of the direction of the staple gauge-link in SIDIS and DY, whereas the Sivers distribution changes sign while the absolute size remains the same, see Refs. [53][54][55], and 1 Let us note that the relative position of quark fields is important since it defines the direction of b µ , and hence the sign convention for the Sivers function. For instance, TMD operators are defined in Ref. [72] as ∼q(0)...q(λn + b), which results in −i µν T bµSν prefactor for the Sivers function, compare to Eq. (2.1). Taking the direction of b into account, the definition used here coincides with definitions in Refs. [60,72,73] so that the final expressions for the cross section also coincide. Notice also that Ref. [72] denotes the Sivers function in configuration space asf making explicit the Fourier transform from the momentum space to the position space and a derivative with respect to b. This notation corresponds to our notation of f ⊥ 1T as we start directly from parametrizations of TMD distributions in the position space.
For definiteness, in the formulas for a particular process we use the notation f ⊥ 1T for the Sivers function without explicit indication of the process, and the sign change between DY and SIDIS is implemented in calculations. All our results of the Sivers function extraction will be presented for the SIDIS definition.
The dependence on the scales µ and ζ is given by a pair of TMD evolution equations [4,68,74] where F is any TMD distribution (f 1 , f ⊥ 1T , or D 1 in the current context). The first equation is the ordinary renormalization group equation, with γ F being the ultraviolet anomalous dimension for the TMD operator. The second equation is the result of the factorization of rapidity anomalous dimension, with D being the Collins-Soper kernel 2 (or rapidity anomalous dimension). The Collins-Soper kernel is a fundamental universal function that has explicit operator definition and parametrizes properties of QCD vacuum [75]. It is a universal function, nonperturbative at largeb while at small-b it is calculable in terms of the perturbative expansion in the strong coupling constant α s , whereas it has to be extracted from the experimental data. Both quark and rapidity anomalous dimensions are known up to N 3 LO in the perturbative regime, see Refs. [76][77][78][79].
Using the evolution equations one relates measurements performed at different energies. It is convenient to select certain value of the pair (µ, ζ) as a reference scale. There are several choices of the reference scale (µ, ζ) used in the literature, see Refs. [4,17,68]. In this work we use the so-called ζ-prescription [68]. It consists in selection of the reference scale (µ, ζ) = (µ, ζ µ (b)) on the equipotential line (of (γ F , −D)-field) that passes through the saddle point. In this case, the reference TMD distribution, called the optimal TMD distribution, is independent on µ (by definition) and perturbatively finite in the whole range of µ and b. The solution of the TMD evolution equations from Eqs. (2.4, 2.5) can be written in the following simple form where F (x, b) on the right-hand side of the equation (2.6) is the optimal TMD distribution [65]. The functions ζ µ (b) is a known function [80] of the nonperturbative Collins-Soper kernel. In our notations, the optimal TMD distribution F (x, b) has no scaling arguments, which emphasizes its scale independence.

Sivers asymmetry in SIDIS
The differential SIDIS cross section of the inclusive hadron production in the electron scattering off a transversely polarized target (e(l) + h 1 (P, S) → e(l ) + h 2 (p h ) + X) has the following structure [13,[81][82][83] dσ dx dy dz dφ S dφ h dP 2 2 Our definition of the rapidity anomalous dimension corresponds toK and γν used in Refs. [4] and [74] as where q = l − l is the momentum of the virtual photon. The variables φ h and P hT are the angle and the absolute value of transverse component of the produced hadron's momentum, measured in the laboratory frame. The azimuthal angles for transverse components of the produced hadron (φ h ) and the spin of the target hadron (φ S ) are defined relative to the lepton plane [84]. The dots denote other angular modulations that are not interesting in the current context, and also the power suppressed structure functions [13], such as F U U,L and F , which do not contribute at our order of accuracy. We define the shorthand notation where f and D are TMD PDF and FF, J n is the Bessel function of the first kind and e q are electric charges of quarks q and the summation runs over all active quarks and antiquarks. Within the TMD factorization the expressions for structure functions F U U,T and F where C V is the quark vector form-factor and the hadron mass M is originated from the definition of the Sivers function Eq. (2.1). Let us emphasize the combination |P hT |/z that enters the argument of the Bessel function in Eq. (2.9). It is originated from the Lorenz transformation from the factorization frame, where the factorization theorem is derived, to the laboratory frame, where the experimental measurement is performed. This combination serves as a small parameter, and power corrections to Eqs. (2.10) and (2.11) have a generic size O((P hT /z/Q) 2 ). The accurate transformation between the frames must account for masses of initial and final hadrons. In this case, the argument of the Bessel function is more complicated [18]. Here, we omit these complications, which is valid in Q → ∞ limit.
The scales of the factorization should be selected such that µ ∼ Q, and ζ 1 ζ 2 = Q 4 [4,11,74,[85][86][87]. We use The resulting products of TMD distributions are to be evolved to the scale of the experimental measurement. Since the TMD evolution is independent of the flavor and the spin, all structure functions (at the leading TMD twist) have common evolution properties [88]. In the case of the ζ-prescription, using Eq. (2.6) one derives that products of TMD distributions in Eq. (2.9) turn into where we introduced the evolution factor Therefore, in the TMD factorization framework structure functions are Fourier transforms of products of three b-dependent universal factors: two TMD distributions f q←h and D q→h , and the The dependence of the single-spin Sivers asymmetry on Q at fixed values of x = 0.12, z = 0.32, and P hT = 0.14 GeV (these values correspond to a particular bin of HERMES [35]). Different perturbative orders are compared. In all cases unpolarized TMD PDF, TMD FF, the Sivers function and the nonperturbative part of the CS kernel are the same. The change of the perturbative order influences the order of perturbative part of CS kernel, TMD anomalous dimension. evolution factor R. Each of these factors governs dependence on a particular kinematic variable, x and z for TMD distributions, and Q for evolution factor, and altogether they are integrated over b with a Bessel function.
The single-spin Sivers asymmetry that is measured in SIDIS experiments, is the ratio of structure functions Combining expressions from Eqs. (2.10, 2.11, 2.13) we obtain the following formula The dependence on Q in (2.16) is enclosed in the factors R(b, Q). They are the only part of our computation that depends on the perturbative input since the hard coefficient functions |C V | 2 exactly cancel in the ratio Eq. (2.16). The perturbative order is defined by the order of TMD anomalous dimension (2.4) and by the perturbative part of CS-kernel (2.5) (see also Eq. (2.29)). Nowadays, these anomalous dimensions are known up to three-loop order, i.e. up to α 3 s [76][77][78][79]. This maximum order (the Γ cusp part is taken with one order higher, i.e. at α 4 s [89]) we refer as N 3 LO, according to the standard nomenclature (see Ref. [18] for extended discussion and references). Currently, one can define four consequent orders of perturbative input, starting from LO, which contains Γ cusp at LO, and null for other anomalous dimensions. In Fig. 1 we demonstrate 3 the comparison of different orders and the general behavior of asymmetry as a function of Q. The convergence of the series is good. The difference between orders is almost homogeneous at different Q and ∼ 50% at LO→NLO, ∼ −7% at NLO→NNLO, and ∼ 3% at NNLO→N 3 LO. Also, we notice 3 We anticipate and use in Fig. 1 our results of extraction of the Sivers function that we will perform in Sec. 4.1. The Q dependence of the asymmetry depends mainly on the evolution factor R that is known from the analysis of unpolarized data. The dependence on the parameters describing nonperturbative TMD functions is quite weak therefore a similar Q behavior is anticipated for all TSSAs that include J 1 a very rapid behavior of asymmetry at small values of Q. Fig. 1 is given for SIDIS and values of x = 0.12, z = 0.32, and P hT = 0.14 GeV are taken from a particular bin in HERMES kinematics [35]. For other values of x, z, and P hT , and for DY measurement, the behavior is similar.

Transverse single-spin asymmetry in DY process
Using notations of Ref. [15], the differential cross-section for DY reaction (h 1 (P 1 , S) + h 2 (P 2 ) → l + (l) + l − (l ) + X) can be written as where where q = l + l is the momentum of the electroweak boson, and y is its rapidity. The variables ϕ and q T are the angle and the absolute value of the transverse component of the momentum of the electroweak boson measured in the center-of-mass frame. The ellipsis denotes other DY structure functions. The shorthand notation from Eq. (2.9) for DY reads where f 1 and f 2 are TMD distributions, J n is the Bessel function of first kind and e q are electric charges of quarks q and the summation runs over all active quarks and anti-quarks. The TMD factorization gives the following expressions for structure functions (cf. (2.10, 2.11)) where C V (−Q 2 , µ 2 ) is the quark vector form factor for the space-like momentum transfer. Further steps are analogous to the SIDIS case (2.13) and the final expression for the transverse spinasymmetry in the ζ-prescription reads . (2.22) In some cases, discussed also in the following Sections, experimental measurements provide asymmetries which are related to A T U . In particular, the asymmetry A N [43] measured by STAR Collaboration is defined as the asymmetry relative to cos ϕ with S T oriented along y. In this case, cos ϕ = − sin(ϕ − φ s ) and thus Another important case is the process h 1 (P 1 ) + h 2 (P 2 , S) → l + l − + X (i.e. with the polarized hadron h 2 ) measured by COMPASS [40]. In this case where the exchange of Sivers and unpolarized TMD PDFs takes place in the numerator of (2.22). In order to explain the origin of the minus sign in (2.24) let us recall that the definition in Eq. (2.1) is written for the operator with γ + . The directions n andn are associated with the large components of P 1 and P 2 , correspondingly. Therefore, the Sivers function for the operator with γ − , corresponding to the large momentum P 2 , has a prefactor −i µν T b µ S ν , since +−µν = − µν T . The detailed discussion on different asymmetries related to A U T and relation between them can be found in Refs. [90,91].
At high Q the Drell-Yan pair production should account also for weak boson channels. To account for various channels in Eq. (2.19) one should replace q e 2 q → l,q,ch.
where z l and z q are combinations of couplings associated with lepton and quark vertices, and ∆ is the product of propagators multiplied by Q 4 . In the case of the neutral boson production, expressions for z f (where f = q or l) and ∆, for the channels in Eq. (2.25), ch. ∈ {γγ, γZ, ZZ} are where f = q, l. While in the case of charged-boson production there exists a single channel In these expressions e f is the electric charge, T 3 is the third component of the iso-spin, s W and c W are the sine and the cosine of the Weinberg angle, V f f is the element of the Cabibbo-Kobayashi-Maskawa matrix, Γ Z and M Z are the decay-width and the mass of the Z boson, and Γ W and M W are the decay-width and the mass of the W boson. Let us emphasize that in the presented expressions, we omitted the target-and product-mass corrections. Accounting for these corrections modifies many factors in formulas (2.7, 2.10, 2.11, 2.17, 2.21, 2.20), including the values of collinear variables x and z, values for P hT and ε (the corresponding expressions can be found in Refs. [18,92]). It was shown that the accounting for these power corrections improves the quality of extraction of unpolarized TMD distributions, see Ref. [18]. At small-Q or large q T /Q, which is a typical situation for the kinematical region of many experimental measurements, these corrections push values of kinematic variables closer to the phase space's border, and theory predictions become unstable. To stabilize predictions with respect to these corrections, we decided to neglect mass corrections in this analysis and leave a more profound analysis of the influence of target and produced hadron mass corrections to a future publication. We also make sure that neglecting mass corrections does not spoil our results by restricting the data used in our analysis to small values of q T /Q, where the influence of target and product mass corrections is relatively small.

Nonperturbative parametrization for the Sivers function
The optimal TMD distributions in the expressions (2.16, 2.22) are universal nonperturbative functions which should be determined from the comparison with experimental data. In principle, at small values of b these functions could be related to collinear distributions through the operator product expansion. Generally, accounting of the small-b relation significantly reduces the parametric freedom since any modification of small-b behavior appears as power corrections in b 2 . Thus, the typical ansatz for a TMD distribution F (x, b) has the following generic form is a collinear parton distribution, ⊗ is an integral convolution in variable x (for twist-2 collinear distributions it is the Mellin convolution), and f N P is some function that parametrizes nonperturbative b-shape of the TMD distribution such The expressions for perturbative matching coefficient for unpolarized distributions were obtained at NNLO in Refs. [73,93], and for Sivers function at NLO in Ref. [73]. The ansatz (2.28) works well for the unpolarized TMD distributions, where corresponding collinear distributions are known very precisely from different experiments. In particular, the SV19 global fit of DY and SIDIS data made in Ref. [18] is based on this ansatz with NNLO coefficient functions. The SV19 fit was performed with NNLO and N 3 LO TMD evolution, and with the Collins-Soper kernel parameterized as is the resummed expression for the perturbative part, and c 0 is a fitting parameter. The model (2.29) has linear asymptotic at large-b. This behavior is in agreement with the model calculations and analysis of large-b behavior of the Collins-Soper kernel performed in Refs. [75,94]. The coefficient c 0 can be related to the gluon condensate and its extracted values agrees [75] with its known values. The SV19 fit demonstrates perfect agreement of the theory with the data. In particular, χ 2 /N pt is 1.1 for DY (with N pt = 457) and 0.95 for SIDIS (with N pt = 582). The pion unpolarized TMD PDF was extracted in the same framework in Ref. [80], what we refer to as Vpion19. The values of nonperturbative parameters for SV19 and Vpion19 fits together with Monte-Carlo replicas are publicly available via artemide repository 4 [69].
In the case of the Sivers function the perturbative NLO matching has been derived in Refs. [60,73]. It has the the following expression (see also Sec. 4.5) where a s = g 2 /(4π) 2 is the strong coupling constant and the function T (x 1 , x 2 , x 3 ) parametrizes twist-3 quark-gluon-quark correlator ∼q(z 1 )γ + F µ+ (z 2 )q(z 3 ). This function mixes with functions ∆T , G ± that parametrizes another twist-3 correlations, and in (2.30) are collectively denoted as T . The twist-3 distributions have support −1 < x 1,2,3 < 1 with x 1 + x 2 + x 3 = 0, and the integral convolution ⊗ in (2.30) projects it to a single variable x. The tree order term in Eq. [61,63,95,96]. The QS function is not an autonomous function, in the sense that its evolution involves the values of arguments outside of the line (−x, 0, x), and mixes with functions ∆T and G ± [97]. None of these functions is known, and thus accounting for small-b asymptotic in the sense of Eq. (2.28) is not a feasible way of constructing the expression to be used in phenomenology. Therefore, in the present work, we will not use the small-b matching for Sivers function. Instead, we consider the optimal Sivers function as a generic nonperturbative function that we will extract directly from the data. We do not put any special restrictions on the parametrization, apart from the usual constraints. We require f ⊥ x −1 to ensure integrability and vanishing of the Sivers function at x = 0 and x = 1. Also, we require that is a function of x and b 2 to mimic the operator product expansion structure. We have explored many parametric forms and selected the following one, which is flexible enough to reveal the Sivers function, but at the same time is not overwhelmed with free parameters: The b-dependent factor mimics f N P (x, b) used in SV19 fit, with a reduced number of parameters.
Notice that b and x dependencies do not factorize in our parametrization. The experimental data on Sivers asymmetries is available for various final states, including charged pions and kaons. The quark composition of those final states allows access to u, d, s quark flavors but is not sufficient to distinguish other sea quarks, such asū,d, ands. The Sivers function for heavy quark flavors b and c cannot be extracted with the current data either. Thus, we will distinguish separate functions for u, d, s quarks, and a single sea Sivers function forū,d ands quarks. We nullify the Sivers function for b and c flavors. We also set β s = β sea and s = sea = 0, since they are not restricted by the existing experimental data. Large-x region of the data is also limited at the moment to x 0.5 and we therefore are using a general (1 − x) factor in our parametrization. In total we have 12 free parameters: N u , N d , N s and N sea that dictates the general scale, β u , β d and β sea that gives small-x asymptotic (β i > −1), u and d to fine-tune of valence distributions, and r 0 , r 1 and r 3 for x-dependence in parameterization of transverse momentum behavior (r i > 0).
Let us emphasize that the absence of small-b matching in the optimal Sivers function is not in contradiction with the perturbative order of TMD evolution (NNLO and N 3 LO in the current case) or the perturbative order of matching to other distributions (NNLO for unpolarized distributions). The utilization of different orders for components in TMD factorization is consistent within the ζprescription, as well as, in other schemes with fixed reference scale for TMD distributions, discussed e.g. in Ref. [94], but is not consistent in the resummation-like schemes e.g. used in Refs [27,29,31]. In the latter scheme, one would need to use the matching function for Sivers function at N 3 LO, which is not yet available [73]. For resummation-like schemes of scale-fixation, where the scales of TMD distributions depend on b in an arbitrary manner, such an approach is inconsistent. In this case, the orders of TMD evolution and matching coefficients must be adjusted to guarantee the compensation of scaling logarithms.

Global analysis procedure
In this Section we discuss basic principles of the global QCD analysis, data selection, fit procedure, and the study of the limits of TMD factorization.

Data selection
The TMD factorization theorem is derived in the limit of large-Q and a small relative transverse momentum δ, defined as Figure 2. Distribution of the experimental data over the values of x and δ Eq. (3.1).
The large-Q requirement is needed to suppress the power corrections ∼ M 2 /Q 2 and ∼ Λ 2 /Q 2 , where Λ is a general nonperturbative scale of QCD. Since M and Λ are ∼ 1 GeV, we impose the restriction Q > 2 GeV, which limits possible power corrections to around 10 − 20% for the lowest energy data points. The optimal values of δ for applicability of TMD factorization were studied in Ref. [65] (and were further confirmed by independent studies in Refs. [18,67]), where it was shown that phenomenologically TMD factorization is valid for δ < 0.2 − 0.3, and is strongly violated for large values of δ. In the current study we impose δ < 0.3, assuring that we accommodate data points from as many experiments as possible, still preserving applicability of TMD factorization, see Fig. 2. Summarizing our data selection cuts, we apply the following selection criteria These restrictions are consistent with the applicability of the TMD factorization theorem as discussed in Ref. [65]. However, we hope that a part of power corrections cancels in the ratio of structure functions measured experimentally (2.16, 2.22). The more stringent conditions (say δ < 0.2) would secure the TMD approach, but they are hardly applicable to the modern data, which is dominated by the low-energy measurements. Our data selection cuts (3.2) are the most stringent among all other extractions of Sivers function, compare to Refs. [19][20][21][22][25][26][27][28][29][30]. The Sivers asymmetry in SIDIS has been measured by HERMES [34,35], COMPASS [36,39] 5 and JLab Hall A [41] collaborations. DY measurements of the transverse spin-asymmetry were performed by the COMPASS Collaboration [40] in the pion-induced DY process and by the STAR Collaboration [43] in W ± /Z production. After application of our data selection cuts (3.2) we have 76 data points in total (63 for SIDIS, and 13 for DY). The distribution of the points in the (x, δ)-plane is shown in Fig. 2. The synopsis of data is presented in Table 1.
A large portion of the SIDIS data comes from a recent HERMES analysis [35] that uses a threedimensional kinematic binning and enlarged phase space. It is the three-dimensional binning that allows a clean separation of the TMD factorization region. On the contrary, the Compass and JLab measurements provide effectively "one-dimensional binning", i.e., only one of the kinematic variables has narrow binning, while the rest are integrated over a wide range. Only the P hT -differential measurements could be studied in such cases. The z-differential and x-differential measurements have P hT integrated over the full kinematic range and thus could not be fully described by the TMD factorization theorem. Even for the P hT -differential binning, the TMD factorization is hard to apply due to the presence of z −1 in the data selection rules (3.2). Almost every bin of COMPASS Hermes [35] JLab [41,42] Total 76 Table 1. Synopsis of the data sets used in the analysis. The fourth column "# Points" shows the number of data points selected after application of cuts from Eq. (3.2) and the total number of available data points. The last column shows the average uncorrelated error for points in the data set (after application of (3.2)).
and JLab measurements borders with a region of the phase space where the TMD factorization is strongly violated (P hT /z ∼ Q). Consequently, we were forced to use the average kinematics to include these data points into the fit. The ignorance of the bin integration effects is compensated by large uncertainties of these measurements but could lead to a systematic error in our extraction. We also use the averaged kinematics for HERMES measurement as it is suggested by the HER-MES collaboration, because effects of the bin-integration are already included in the systematic uncertainty of the data 6 .
In the case of DY measurements, the bin integration effects are larger due to the larger bin sizes. These effects are especially significant for electroweak boson production, where the crosssection changes rapidly. Thus, we perform the integration over the bin size separately for the numerator and denominator of Eq. (2.22).

Fit procedure and estimation of uncertainties
To estimate the goodness of theory prediction against the experimental measurements we use the χ 2 -test function defined as where m i is the central value of the i'th measurement, t i is the theory prediction for it, and V ij is the covariance matrix. The covariance matrix is build as usual ) are uncorrelated (correlated) uncertainties of i'th measurement. In the case of asymmetries the main source of the correlated uncertainties is the beam/target polarization dilution. This definition takes into account the nature of experimental uncertainties, and gives a faithful estimate of the agreement between the experimental data and the theory prediction.
The evaluation of the theory prediction for a given set of model parameters is made by artemide. Artemide [69] is the fortran library for numerical computations within TMD factorization approach. It allows for a flexible implementation of any nonperturbative model for TMD distributions alongside with NNLO and N 3 LO perturbative order precision [76,79,93,98]. The evaluation of χ 2 function and the data analysis is performed via the Python interface for artemide, which is (together with all programs used for the current fit) available in Ref. [70]. The minimization of The measured polarized asymmetries are weighted by unpolarized cross-section, see (2.16, 2.22). The unpolarized cross-section is also evaluated by artemide with SV19 and Vpion19 sets. In contrast to other extractions, the SV19 set of unpolarized distribution is extracted from the fit to the large set of DY and SIDIS data without application of additional normalization conditions.
There are two sources of uncertainties in the determination of the Sivers function. The main one is the uncertainties of the experimental data. The other important source is the uncertainties in the determination of unpolarized TMD distributions and the nonperturbative part of TMD evolution. We have estimated both uncertainties using the replica method [100]. The present precision of the experimental data does not restrict values of parameters for the Sivers function strongly, and the χ 2 -test function does not have an isolated global minimum. The obtained uncertainty bands and underlying distributions are very broad and asymmetric, see Fig. 3. Therefore, we use the notations from Eq. (3.5) to faithfully represent the uncertainties of our determination. However, we stress that only the whole set of obtained replicas is meaningful and provides complete information.
To estimate the uncertainties due to the experimental data, we follow the procedure described in Ref. [100]. The method's essence is to generate pseudo-data replicas derived from the experimental data with central values randomly distributed according to the experimental uncertainties. In this procedure, one accounts for the origin of the uncertainties and properly re-scales the errorbands in the pseudo-data. For each replica, we find the values of the nonperturbative parameters that minimize the value of χ 2 -test function. The resulting set of parameter-vectors samples the empirical probability density distribution of parameters. Therefore, with a sufficiently large number of replicas, we can reliably estimate uncertainties for the Sivers function and related observables. We use 500 replicas, which give a reliable precision (2-3 significant digits) that we have verified by computing and comparing to 1000 replicas for several cases.
The uncertainty for SV19 extraction of unpolarized TMD distributions is accounted for by including the distribution of 300 replicas from SV19 analysis (both for NNLO and N 3 LO). To estimate the uncertainties due to unpolarized TMD PDFs, we have minimized χ 2 for each replica of SV19 set with central values of experimental data points. Note that this computation requires a re-evaluation of the normalization factors for asymmetries for each replica.
After the implementation of these two procedures, we have two distributions of model parameters -due to the data uncertainty and due to the uncertainties of unpolarized TMD distributions. For the ideal data/theory input, almost Gaussian statistics for distributions of parameters and observables should be observed. In our case, we have found the distributions that deviate significantly from the Gaussian shape (especially the ones due to the experimental uncertainties). Typically, they are skewed and have long, power-like tails. Two examples of replica distributions (we select parameters r 1 and N sea as examples with the widest distribution due to SV19 uncertainty) and their parameters determination are shown in Fig. 3. We also observed that the distribution due to SV19 uncertainty is much narrower (in most cases by order of magnitude) and less skewed compared to the distribution due to the uncertainty of the experimental data. Therefore, we use the mean value of the distribution due to SV19 uncertainty as to the central fit value (CF value). CF value is the value of our best estimate of the true values for the parameters. The uncertainty is given by 68% confidence interval (68%CI) computed with distribution due to the data uncertainty using the bootstrap method, see Ref. [101]. The results for observables are presented in the form where δ 1,2 distances to boundaries of 68%CI. For continuous functions such as the Sivers function, we use the same method for each point; see an example in Fig. 4. The resulting distributions of replicas are available as a part of artemide distribution [102].
Let us explain our choice of CF values. There are several strategies for determination of CF value used in the literature, compare e.g. with [22,29,31]. Usually, one uses the fit to central values of data, and thus it is close to the true minimum of χ 2 test. However, such a choice is also problematic because resulting parameters could lie outside of 68%CI (see e.g. [29]). Such a situation could happen due to the over-fitting or due to the skewness of distributions. The usage of CF value avoids these problems because averaging over SV19 replicas washes out possible over-fitted cases and remains close to the global minimum of χ 2 , and therefore is highly probable.
In the plots that represent our results, we do not show the uncertainty due to the unpolarized SV19 input. The main reason is that it is small in comparison to the data-related uncertainty. Another reason is that two uncertainty bands could not be combined to the total uncertainty as a quadrature due to the essential non-Gaussianity of distributions. The accurate determination of total uncertainty requires the generation of multiple replicas for each replica of SV19 and thus is very computationally costly. Provided that the uncertainty due to the data is dominating in all cases compared to the uncertainty due to the unpolarized distributions, we have decided to showcase only the former. However, generally speaking, the uncertainty band due to the unpolarized input is not negligible, contrary to the commonly used assumption. To our best knowledge, this is the first estimation of such uncertainty in the analysis of polarized TMD distributions. The TMD factorization works at small values of δ, see Eq. (3.1), see also discussion in Ref. [92]. A priori, the size of power corrections, which violate the factorization approach, is not known. For that reason, one should implement a data selection cut, Eq. (3.2), and exclude the data with large values of δ. In this section, we perform a survey of different values of δ and thus test the boundary for the TMD factorization for asymmetries.

Test of the factorization region limit
To test TMD factorization's applicability, we perform fits of the data selected with different values of δ in Eq. (3.2). The fits are executed at NNLO accuracy. In the region where the factorization theorem is applicable, one expects the value of χ 2 /N pt 1, and grows to larger values outside of the applicability region. In other words, the plot of χ 2 /N pt versus δ should have a plateau in the validity range of the factorization theorem. Such a test has been suggested in Ref. [65] and successfully applied for unpolarized SIDIS and DY data analysis, where it was found that the optimal data cut is δ ∼ 0.2 − 0.25, see Refs. [18,67]. It has been shown that for DY processes, the data with δ > 0.3 are poorly described by the TMD factorization formula. In contrast, the situation for SIDIS is better, and one could go up to δ ∼ 0.3 − 0.35 [18] without significant loss of quality of the fit. These numbers served as the rationale for the initial estimation of our current data selection cut in Eq. (3.2).
The results of our current test are shown in Fig. 5, which has a clear plateau χ 2 /N pt 1 for δ < 0.4. The quality of the fit drops drastically for δ > 0.4 for SIDIS. This result agrees with the general expectations. Indeed, one could expect that power corrections partially cancel in asymmetry, and thus the kinematic range for the applicability of factorization theorems becomes slightly wider. Since, in the SIDIS unpolarized case δ < 0.3 − 0.35 the observation of rough agreement for δ 0.4 is anticipated.
The situation for DY is less certain because the total number of points is small. All points included into the fit have δ < 0.22 (see Fig. 2). There is only one additional point to include. This point is measured in pion-induced DY at COMPASS [40], and it has δ = 0.36 and a wide q T -bin up to values q T Q. This point is outside of the applicability range, and the prediction strongly disagrees with the measurement (χ 2 ∼ 8). The main source of the disagreement is the denominator in Eq. (2.22), which becomes negative. The negative values for cross-section are typical for TMD factorization formula in the region beyond its validity. To get the positive cross-section valid in the full range of q T one should match it to the collinear picture via the so-called Y -term [4]. This goes far beyond the present study.
We conclude that even though the region of TMD factorization widens slightly for asymmetries, one has to be cautious when including the data outside of the TMD factorization region. In the following sections, we analyze only the data with δ < 0.3. This value corresponds to our best estimate of the region of data appropriate for the TMD factorization approach description. Future work that will include matching to the collinear factorization is needed to widen the region of the data used in the global analysis.

Results of extraction
This is the main Section of our work. We describe in detail results of N 3 LO extraction of the Sivers function, also presented in Ref. [10]. We discuss the Sivers function in momentum and position spaces, discuss positivity constraints, show the 3D tomography of the nucleon via the Sivers function, extract the Qiu-Sterman functions, and study the significance of the sign change of the Sivers function between SIDIS and DY.

Fit of the data
Using the approach described in the previous sections, we performed several fits with different setups. In particular, we distinguish the fits with and without inclusion of DY data, with a purpose to estimate the universality of the Sivers function. Also we performed separate fits at NNLO and N 3 LO perturbative precision for the TMD evolution. The synopsis of χ 2 values is presented in Table 2. The distribution of contributions to χ 2 per experiments is shown in Table 3. The values of nonperturbative parameters extracted in these fits are given in Table 4 Table 2. Values for χ 2 /Npt in different fits. Note, that for the cases included in the fit the CF value of χ 2 lies outside the 68%CI. This is because CF realizes the minimum of χ 2 distribution, whereas the 68%CI (roughly) excludes 16% of boundary replicas. The main observation that follows from Table 2 is that the Sivers function extracted in SIDIS data only nicely describes the DY data, even without extra tuning. Indeed, the Sivers function extracted at NNLO from SIDIS data results in χ 2 /N pt = 1.29 for DY data, and the agreement improves slightly at N 3 LO. The overall quality of the description of SIDIS and DY data from the SIDIS data only fit with χ 2 /N pt = 0.93 indicates the consistency of the data with the universality of the Sivers function. The inclusion of DY data into the fit reduces the χ 2 associated with DY. It modifies some nonperturbative parameters, especially those related to the high-x behavior of the Sivers function, such as β and , see Fig. 6. Let us mention that this is the first consistent description of polarized Drell-Yan data in the TMD formalism with TMD evolution. The inclusion of DY data in the fit does not result in the worsening of a description of SIDIS data, and thus the fit demonstrates the absence of the tension between SIDIS and DY data. The inclusion of both SIDIS and DY data sets results in a better overall description of the data χ 2 /N pt = 0.88 (at N 3 LO) and the most precise determination of the Sivers function. We, therefore, believe that our extraction demonstrates the universality of the Sivers function.
The values of model parameters extracted with and without DY data agree within 68%CI, see Fig. 6. The main impact of the inclusion of DY data into fit happens on CF values of parameters (r 1 , N d , N sea , β s ) and to a lesser degree on (β u , u ). Other parameters are almost unaffected by the inclusion of DY data due to the bigger experimental uncertainty of DY data, see Table. 1.
A previous attempt to describe DY W and Z data with TMD evolution was made in Ref. [31]. It    Table 4. Values of parameters obtained in various fits. The visual representation of the table is given in Fig. 6.
faced serious problems giving the best χ 2 /N pt ∼ 1.5 − 1.9 for W ± /Z data (depending on additional assumptions on the evolution of the Sivers function). In our case, the χ 2 /N pt for DY and W ± /Z data close to 1 even without the inclusion of these data into the fit. Unfortunately, it is not possible to directly compare our and [31] approaches and find the origin of the disagreement. Although the main spirit of both works is similar, the number of smaller differences is significant. Let us mention the differences between these works that, in our opinion, influence the results. First, in SV19 and in the present extraction, we operate with the data that definitely belong to the TMD factorization region. In contrast, in Ref. [31] the cut is much wider, δ < 0.75, assuming possible cancellation of power corrections to TMD factorization (see Sec  Figure 7. Description of HERMES data [35] for π ± and K ± , only data with δ < 0.5 are shown. The data are presented as the function of x and the 3D binning of the data is indicated by the bin sizes in P hT (GeV) and z. Solid (open) symbols data used (not used) in the fit. Blue line is the CF and the blue box is 68%CI of the fit of the data and prediction for the data not used in the fit.
differences are the TMD evolution implementation (ζ-prescription vs. CSS-like ansatz) and the nonperturbative model for the Sivers function. In particular, our parametrization for the Sivers function is more flexible compared to Ref. [31] and allows sea quark contributions to be large in the large-x region.
In the remainder of the Section, we will discuss details of the description of various SIDIS and DY data sets coming from various experiments considered in this analysis (see also Table 3). We present results of the description and discuss the data.
HERMES data set [35]. In our fit we use the latest updated data on Sivers asymmetry in SIDIS by the HERMES Collaboration [35] on the proton target for π ± , K ± . The incident electron energy is P lab = 27.5 GeV. Events were selected subject to the requirements Q 2 > 1 GeV 2 , W 2 > 10 GeV 2 , 0.1 < y < 0.95, and 0.023 < x < 0.6. Hadrons were accepted if 0.2 < z < 0.7. The data are presented in a three-dimensional binning in x, z, and P hT (GeV). The correlated uncertainty of the data is 7.3% due to the accuracy of the target polarization determination. Importantly, the systematic uncertainty of HERMES data already includes possible effects of the bin-integration, and thus the theory prediction for this data set must be evaluated using the average bin kinematics. For the SIDIS subset, the largest χ 2 /N pt is for K − production measured at HERMES (typical values ∼ 1.6 for 12 points). The next-to-the-largest χ 2 /N pt is for K + -production measured at Hermes (typical values ∼ 1.3 for 12 points). The rest of the SIDIS data, for π ± and h ± , have partial Figure 8. Description of JLab HALL A data [41,42] for π ± and K ± . Solid (open) symbols data used (not used) in the fit. Blue line is the CF and the blue box is 68%CI of the fit of the data and prediction for the data not used in the fit. χ 2 /N pt which are smaller than 1. These relatively large contributions may be related to either poor knowledge of Kaon fragmentation functions or sea quark Sivers function. Description of HERMES data is presented in Fig. 7 where we plot only the data with δ < 0.5 and show description for both the data used in the fit (solid points) and the data not used in the fit (open points). JLab data set [41,42]. Jefferson Lab experiments in HALL A measured Sivers asymmetry on 3 He target for π ± [41] and K ± [42]. The experiment, conducted at Jefferson Lab using a 5.9 GeV electron beam, covers a range of 0.14 < x < 0.34 with 1.3 < Q 2 < 2.7 GeV 2 . SIDIS events were selected using cuts on the four-momentum transfer squared Q 2 > 1 GeV 2 , the hadronic final-state invariant mass W > 2.3 GeV. The data were presented as Bjorken-x projection. We show the description of JLab HALL A data [41,42] in Fig. 8. P hT (GeV) Figure 9. Description of COMPASS SIDIS data [36] for π ± and K ± , only data with δ < 0.5 are shown. Solid (open) symbols data used (not used) in the fit. Blue line is the CF and the blue box is 68%CI of the fit of the data and prediction for the data not used in the fit. Compass08 [36] and Compass16 [39] data sets. COMPASS measured the Sivers asymmetry using different targets (iso-scalar samples from 2003-2004 data [36] and proton sample for unidentified charged hadrons from 2010, multi-dimensional data [39]) with incident muon energy P lab = 160 GeV. In these measurements, the cuts on the photon virtuality Q 2 > 1 GeV 2 and the mass of the hadronic final state W 2 > 25 GeV 2 were applied, as well as 0.1 < y < 0.9. To simulate the isospin target (deuteron), we make the iso-spin rotation for components of the Sivers function The measurement Compass08 is made for π ± and K ± fragmenting hadrons (we omit the π 0 and K 0 measurements because SV19 extraction does not have these fragmentation functions). The Compass16 measurements is made for charged hadrons h ± , which in SV19 are approximated as sum of pion and kaon components h ± = π ± + K ± ignoring the higher-mass contribution. We show the description of COMPASS SIDIS data [36] in Fig. 9 and [39] in Fig. 10. One can see that, as in previous cases, the data description is good even for the data not used in the fit. CompassDY [40] data set. The data were taken using a high-intensity π − beam of 190 GeV and the transversely polarized isoscalar NH 3 target. Sivers asymmetry was extracted using di-muon events with the invariant mass between 4.3 GeV and 8.5 GeV. The measured asymmetry, A U T , is given in (2.24). Notice that our definition of A U T from Eq. (2.24) corresponds to the definition from Ref. [40] The data is presented in the one-dimensional binnings over x π , x N , x F , q T . In non-q T binning, the integration over q T spans up to 5 GeV, i.e., includes the domain with q T > Q. Therefore, only the q T -binned data could be analyzed within TMD factorization. We show a description of the data in Fig. 11. One can see that the resulting Sivers function describes well the data on q T -dependence that we use in the fit and predict the data on x F -dependence not used in the fit.
STAR [43] data set. The STAR Collaboration at RHIC measured the transverse single-spin asymmetry of weak boson ( charged (W ± ) and neutral (Z/γ)) production in polarized proton-proton collisions at √ s=500 GeV. It is described by A N (2.23) with inclusion of modified factors (2.26, 2.27).
The results were presented as a function of rapidity, y, and the bosons' transverse momentum, q T .
The measured values of asymmetry are much higher (up 60%) than typical asymmetries in SIDIS, which present a certain problem in their description. We show the description of STAR data [43] in Fig. 12. One can see that our global analysis gives a good description of q T dependent data for W ± production. We also describe well y dependent data that is not used directly in the fit for W ± and a single point for Z-boson production that we use in the fit. It is the first agreement with the data of extraction of the Sivers function with TMD evolution to our best knowledge. For the DY subset, the main contribution to the χ 2 /N pt is due to a single Z−boson production point (A N = 0.6 ± 0.33) measured at RHIC. Despite the large error, this single point contributes significantly with ∆χ 2 = (2.9, 1.6, 2.8, 1.6) into fits (SIDIS at NNLO, SIDIS+DY at NNLO, SIDIS at N 3 LO, SIDIS+DY at N 3 LO). Let us notice that for W and Z bosons productions, one should also account for contributions of c and b quarks, which are currently neglected. N 3 LO fit does not essentially change the result of the fit compared to NNLO. It is expected because the difference between NNLO and N 3 LO evolution is relatively marginal, see Fig. 1, especially in comparisons to the large uncertainties of experimental measurements of asymmetries. The values of χ 2 are practically unchanged. As for the values of parameters, we observe that they agree within the error-bands, thus corroborating the stability of evolution effects and the fit results.

Sivers function in the position and the momentum spaces
The extracted Sivers function in position space for u and d quarks is shown in Fig. 13. Its values have notably large uncertainties, which we demonstrate by shaded areas. Another distinctive feature of our extraction is a non-positive definiteness of the Sivers function. The Sivers function does not have the probabilistic interpretation and can have nodes [103,104], which is realized by the parameter . Moreover, the presence of a node is predicted by various models [103,[105][106][107]. The Sivers function for u quark in our extraction, see Fig. 13, turns positive at large-x. However, it can stay negative within 68%CI. Although such behavior looks unusual, it does not contradict any known properties of the Sivers function.
In the momentum representation the TMD distributions for unpolarized quarks are defined as 7 7 Notice that we do not distinguish the symbols for the Sivers functions in the position and the momentum spaces, they are related by the Fourier transform of Eq. (4.4). It is intended by the functional arguments, b or k T , which function we use.  Figure 12. Description of the transverse single-spin asymmetry data [43] for W ± and Z boson production measured by STAR in polarized proton-proton collisions at √ s = 500 GeV. Left column, the data as a function of y for W ± and Z, the right column, the data as a functions of qT GeV for W ± . Solid (open) symbols data used (not used) in the fit. Blue line is the CF and the blue box is 68%CI of the fit of the data and prediction for the data not used in the fit.
The momentum space representation has complicated evolution properties since the TMD evolution factor is multiplicative in the position space. The notion of the optimal TMD distribution is less useful in the momentum space because it involves the integration over all scales. For that reason, we only show the TMD distributions in the momentum space at a fixed scale. Figure 14.
Sivers function in the momentum space (black solid line) for u, d, sea, and s quarks at x = 0.1 and µ = 2 GeV. The blue band is the 68%CI. The gray dashed line is the unpolarized TMD PDF extracted in SV19 shown for the comparison (for u and sea-quark the Sivers function is multiplied by −1 and sea-quark the Sivers function is compared toū unpolarized TMD PDF). Fig. 14. The Fourier transformation, Eq. (4.4), effectively inverses the ranges of variables. Therefore, a large uncertainty at large-b (given by parameters r 0,1,2 ) transforms to a large uncertainty at small-k T . For comparison, we also show the values and uncertainties of the unpolarized TMD PDFs extracted in SV19 fit. We observe that the Sivers function's typical size is about 4-5 times as small as the corresponding unpolarized distribution. Figure 14 shows the functions at x = 0.1, for other values of x of the data used in our fit x ∼ 0.01 − 0.25 profiles are similar. In Fig. 15, we demonstrate the impact of QCD evolution in the momentum space. We show u quark Sivers function calculated by Eq. (4.4) at four different scales Q = 1.5, 5, 20, 91 GeV. As one can see, the evolution modifies the shape and the amplitude of the Sivers function.

Positivity constraints for the Sivers function
In Ref. [108] the positivity constraints for TMD distributions were derived assuming the positivedefiniteness of the polarization matrix due to its probabilistic interpretation in the parton model. In particular, the positivity constraint involving the Sivers function is where g 1T is the worm-gear T or Kotzinian-Mulders [109,110] function. Generally, such positivity constraints are not respected in the quantum field theory due to renormalization effects, which are only enhanced in the TMD case by renormalizing rapidity divergences. Recall in particular that even cross-sections become negative in the region outside of the TMD factorization validity. In some cases the violation of positivity constraints is very significant, e.g., for linearly polarized gluon TMD PDF discussed in Ref. [111]. As far as our analysis includes the TMD evolution, we expect that the positivity constraint is not applicable, given that it is based on the tree order approximation argument. Nonetheless, it is instructive to check the constraint from Eq. (4.5).
In Fig. 16 we plot the function as the function of x and k T at µ = 2 GeV. One has pos > 0 (pos < 0) for the regions where Eq. (4.5) is (not) satisfied in the absence of g 1T contribution.
For the values of the Sivers function we take the largest boundary of 68%CI. We observe that the positivity constraint is satisfied everywhere except for the unmeasured large-x region. If we consider the lowest boundary of 68%CI the region pos > 0 is much larger, in particular, u quark satisfies Eq. (4.5) in the full range of (x, k T ). Also the picture depends on the scale, and improves (in the sense that the the region pos > 0 becomes wider) for larger scales. We conclude that our extraction does not contradict the positivity constraint in the regions reached by the experimental data used in this analysis.

3D tomography of the nucleon and the Sivers function
The magnitude of the Sivers function extracted in our fit is generally much smaller than the unpolarized TMD PDF. To present the distortion effect on the unpolarized quarks driven by the hadron polarization, we introduce the momentum space quark density function where k T is a two-dimensional vector (k T x , k T y ). This function reflects the TMD density of unpolarized quark q in the spin-1/2 hadron totally polarized inŷ-direction, S T = (S x , S y ), where S x = 0, S y = 1, compare to Eq. (4.2). In Fig. 17 we plot ρ at x = 0.1 and µ = 2 GeV. To present the uncertainty in unpolarized and Sivers function, we randomly select one replica for each point of a figure. Thus, the color fluctuation roughly reflects the uncertainty band of our extraction. The presented pictures have a shift of the maximum in k T x , which is the influence of Sivers function that introduces a dipole modulation of the momentum space quark densities. This shift corresponds to the correlation of the Orbital Angular Momentum (OAM) of quarks and the nucleon's spin. One can see from Fig. 17 that u quark has a negative correlation and d quark has a positive correlation. Without OAM of quarks, such a correlation and the Sivers function are zero, and thus we can observe in Fig. 17 the evidence of the presence of OAM of u and d quarks in the wave function of the nucleon. Let us also discuss the tomographic scan of the nucleon both in x and k T . We plot in Fig. 18 the momentum space quark density function ρ 1;q←h ↑ (x, k T , S T , µ) from Eq. (4.7) as function of both x and k T x in order to assess the region in which the Sivers effect has the most influence. The color scheme is chosen to be proportional to the function elevated to power 1/3 in order not to underestimate the region where the function is not big. The asymmetry in color and contours between negative and positive k T x indicates the asymmetry of the distribution and the important influence of the Sivers function. From Fig. 18 one can see that the existing data indicate that most of the correlation between the spin and the motion of the partons happens in the region of large to moderate x. In the low-x region, the momentum space quark density becomes almost symmetric, and it indicates that the Sivers effect becomes smaller and corresponding experimentally observed asymmetry is small. Of course, one has to consider that there is no experimental data in the low-x region available yet, so our findings must be corroborated by the future Electron-Ion Collider data. At the same time, it is crucial to explore the large-x region where the effect is the largest, and the future Jefferson Lab 12 GeV data will be important for the exploration of this region.

Determination of the Qiu-Sterman function
At small-b the Sivers function f ⊥ 1T (x, b) can be expressed via the operator product expansion (OPE) through the collinear twist-3 distributions [59,73,86,112,113]. In our determination we do not use this relation as twist-3 functions are largely unknown. Instead, we will use the opposite strategy and determine the collinear twist-3 component from the extracted Sivers function. Such a determination has a limited power, and allows to extract only Qiu-Sterman (QS) function with certain systematic uncertainty. Nonetheless, such an extraction is meaningful, especially because the information on twist-3 distributions is very limited. Moreover, the extraction of QS function given here is much less theoretically biased in comparison to other extractions, such as those made in in Refs. [27,31], where QS function is parametrized via twist-2 distributions and expected to have DGLAP-type evolution equation.
The complete expression for matching of the Sivers function to collinear twist-3 distributions at NLO was derived in Ref. [73]. In the ζ-prescription, this expression reads whereȳ = 1 − y, N c = 3 is the number of colors, C F = (N 2 c − 1)/(2N c ) = 4/3, a s = g 2 /(4π) 2 is the strong coupling constant, and L µ = ln(µ 2 b 2 e 2γ E /4). The function T is the twist-3 collinear distribution defined by the matrix element p, s|gq(z 1 n)[z 1 n, z 2 n] n F µ+ (z 2 n)[z 2 n, z 3 n]q(z 3 n)|p, s (4.9) where F µν is the gluon-strength tensor, n is a light-cone vector. The function G (+) is a similar matrix element with three F µ+ 's. Its explicit form is not important for the present discussion and can be found in Ref. [73]. The notation P ⊗ T refers to the leading order evolution kernel for T q (−x, 0, x). It has the form of a complicated integral convolution that involves function T q , ∆T q (the analog of T with γ µ → γ µ γ 5 ) and G (±) . The expression for this kernel can be found in Refs. [73,97]. It is crucial that the evolution term involves twist-3 function for a generic argument (x 1 , x 2 , x 3 ), but not just (−x, 0, x) as for QS matrix element. Moreover, the dominant contribution to this convolution is given by the integral along (−x, x − ξ, ξ)-line with ξ ∈ [x, 1], whereas the contribution from the QS-component (−ξ, 0, ξ) is suppressed by almost two orders of magnitude [97,114]. The scale µ in (4.8) is the scale of OPE, and present only on the right-hand-side of Eq. (4.8). The sum of all terms becomes µ independent, so that the left-hand-side, corresponding to the optimal Sivers function, does not depend on µ.
The right-hand side of Eq. (4.8) depends on four nonperturbative functions, each of which is a function of two variables (x 1 , x 2 , −x 1 − x 2 ). To reduce the number of unknowns we set such that L µ = 0. This choice essentially reduces number of functions and parameteric freedom since the remaining functions are only T q (−x, 0, x) and G (+) (−x, 0, x), i.e. QS-functions for the quark and the gluon. The resulting expression can be inverted by means of the perturbation theory This expression can be written as To use this expression, we should select a reasonably small value of b, such that power corrections are negligible. Simultaneously, b could not be too close to 0 because this region corresponds to a very high-energy and thus unreliable in the current extraction. The reasonable compromise is b 0.11 GeV −1 such that µ b = 10 GeV. In this case, we could estimate the introduced systematic uncertainty due to omitted power corrections as O(M 2 b 2 ) ∼ 1%, which is smaller than perturbative uncertainties at this scale. Extraction of the QS function at lower scales, µ ∼ 2 GeV, is not reliable in this approach as the corresponding value of b ∼ 0.5 GeV −1 is relatively large, and the power corrections become to be not negligible. The gluon function G (+) is also unknown, so we set it to be zero. The resulting QS-functions are shown in Fig. 19 by the black line, with 68%CI (blue band). To estimate the uncertainty due to the unknown gluon contribution we approximate The resulting band for CF value is shown in black and in green for 68%CI. The effects of gluons are not negligible for x 0.2. The extracted QSfunction is in general agreement with the model computations made in the light-cone wave function model in Ref. [114]. We also compare our results to other extractions of the QS functions. These are the extraction from Ref. [30] made in the parton model approximation with SIDIS, DY, pp and A N asymmetries; the NLL extraction from Ref. [29] from SIDIS data; and the NLO/NNLL extraction from Ref. [31] from SIDIS (and DY) data. One can see that our results confirm the signs of the QS functions for u and d quarks found in Refs. [29][30][31].
We obtain non-negligible functions for s and sea quarks, and our extraction shows bigger functions in relatively large-x regions and disagrees with the signs obtained in Ref. [31]. The reason partially because our QS functions for u quarks that changes sign in the large-x region. Another reason is that Ref. [31] and Ref. [29] use collinear unpolarized distributions to parametrize the QS functions and therefore cannot obtain sizable functions for sea quarks in the large-x region. The QS function belongs to a different type of function, and we believe that parametrizations of twist-3 functions that utilize collinear twist-2 functions are not optimal and may bias the results of the extraction.
We have studied the functional shape of the Sivers functions, and in particular we constrained all > 0 to remove the nodes from the Sivers function. It turns out that a good description of the data with χ 2 /N pt < 1 is still possible. Another study that we performed was dedicated to the large-x behavior of the Sivers functions, namely, we added an extra factor (1 − x) to our ansatz (this choice is inspired by the model calculations made in Refs. [53,54], where it was found that the Sivers functions behave as (1 − x) 2 in the large-x region). The resulting fit is also good with χ 2 /N pt < 1, and, in particular, for the sea quarks and s-quarks the resulting functions become much smaller in the large-x region.
We conclude that the current data do not constrain the large-x (and small-x) behavior of the Sivers functions and they exhibit large uncertainties in the region of x > 0.3 and x < 0.01. Future data from EIC and JLab 12 will be very important for exploration of both small-x and large-x behavior of the Sivers functions. Figure 19. Qiu-Sterman function at µ = 10GeV for different quark flavors, derived from the Sivers function (4.11). Our results are labeled as BPV20. The black line shows the CF value. Blue band shows 68%CI without gluon contribution added. The green band shows the band obtained by adding the gluon contribution estimated to be G (+) = ±|T d + Tu| as described in the text. Our results are compared to JAM20 [30] (gray dashed line with the error corridor hatched), PV20 [29] (magenta hatched region), ETK20 [31] (violet hatched region, dashed line).

Analysis of the sign change
The sign-change of the Sivers function (2.3) is one of the principal predictions of the TMD factorization theorem. It follows from the nontrivial shape of the gauge-link contour within TMD operators (2.1) and would be absent in the case of a straight gauge link. Here, we attempt to estimate the significance of the sign-change.  Figure 20. To make a test of the sign change, we performed an independent fit of SIDIS and DY data with f ⊥ 1T [SIDIS] = +f ⊥ 1T [DY ] , i.e., assuming the Sivers function does not change the sign. The fit is performed at N 3 LO. The comparison of fits with and without sign-change is presented in Table 5. The CV fit demonstrates good values of χ 2 /N pt = 1.00, with the 68%CI being [1.08, 1.22]. The (normalized) histograms of χ 2 replicas for same-and opposite-sign fits are shown in Fig. 20, together with χ 2 distribution for N pt − 1=75 degrees of freedom. The p-values of different cases are calculated as areas under the sampling distribution in [χ 2 tot , ∞) interval, and given in Table 5. The case f ⊥ 1T [SIDIS] = +f ⊥ 1T [DY ] has somewhat higher χ 2 , and consequently lower p-value. Nonetheless, the difference is not large, and 68%CI almost overlap. Therefore, we conclude that one cannot strictly discriminate with the current experimental data the possibility of the Sivers function having the same sign in DY and SIDIS.
The  81, 1.27] in the case of the sign-change, correspondingly). Simultaneously, the 68%CI for the total χ 2 is broader and located at higher values. This indicates a tension with the data in the same-sign approach, namely, the Sivers function that provides a better description for SIDIS gives a worse description for DY and vice-versa.
It is also instructive to compare Sivers functions extracted in both fits. We have found that the parameters extracted in both cases agree within 68%CI's, except for N sea -parameter, which flips the sign. It shows that u, d, and s components are mainly constrained by the SIDIS data, where the dominant contribution comes from q + γ * → q sub-process. In the DY process, the anti-quarks play a more significant role since the dominant sub-process is q +q → γ * . Given that the unpolarized TMD for anti-quarks is much smaller than for quarks, the sign for anti-quark Sivers function almost exclusively defines the sign of the asymmetry of W ± /Z production in polarized p + p collision.

Conclusions
We extract Sivers function from the global fit of SIDIS, pion-induced Drell-Yan and W ± /Z-bozon production experimental data. For the first time, using TMD evolution, we demonstrate the universality of TMD factorization description for SIDIS and DY transverse spin asymmetries. Our analysis is done in the ζ-prescription with the unpolarized TMD distributions and nonperturbative CS-kernel extracted in [18] (SV19), together with NNLO and N 3 LO TMD evolution. Our results compare well in magnitude with the existing extractions [19][20][21][22][23][24][25][26][27][28][29][30][31][32] and confirm the sign of Sivers function for u and d quarks while we also obtain a non-negligible Sivers function for s quark and anti-quarks. The analysis was done with artemide package [69]. The fitting codes and the results of the extraction (in the form of replica-distribution for model parameters) are publicly available at [70] and [102].
To demonstrate the Sivers function's universality, we perform an independent fit of Sivers function from the SIDIS data only and confirm that it also describes well the DY data without any need for re-fitting. It is the first explicit check of universality for Sivers function with TMD evolution to our best knowledge. The previous successful attempt was made in the parton model approximation in Refs. [28,30]. Moreover, it is the first time SIDIS and DY data on transverse-spin asymmetries are consistently described together with a good χ 2 . The previous attempt to make a joined fit of Sivers function with TMD evolution [31] faced a problem due to the difficulty in describing the large values of asymmetries in W ± /Z-production data measured by RHIC. In our analysis, we do not observe any difficulties with this data set. Although our approach is based on the same general theoretical ground as that of [31], our approach has a number of improvements with respect to others, and each of them could be deciding. One of such improvements is the usage of the ζ-prescription. The central feature of the ζ-prescription is the separation of perturbative and nonperturbative elements of the TMD factorization. So, we non-controversially use the NNLO or N 3 LO TMD evolution, NNLO small-b matching for unpolarized ingredients (TMD PDF, TMD FF, and CS-kernel), without specification of the collinear limit for the Sivers function. On the one hand, we use the best possible perturbative input and unpolarized nonperturbative parts fitted to the global data. On the other, the Sivers function is extracted as an entirely nonperturbative function of x and b, and such a parametrization allows for sizable contributions from sea quarks in the large-x region, and this may be a decisive difference with respect to the analysis of Ref. [31].
In turn, the extracted Sivers function was used to determine the QS-function, with the NLO matching relation. To our best knowledge, it is the first unbiased determination of the QS-function since all previous extractions made certain assumptions on its evolution.
Another important point is the conservative selection of the data. The TMD factorization is valid at small values of δ = q T /Q for DY, and δ = P hT /(zQ) for SIDIS. In our analysis we used only the data with δ < 0.3, which is much more strict compared to other fits [29,31,31]. It resulted in a relatively smaller data pool (76 points in total), which is guaranteed to belong to the TMD factorization domain. Additionally, we performed the test of limits for δ, and found that one can raise δ to 0.4 in the case of the transverse single-spin asymmetry measured in SIDIS.
We have also performed a test of the sign-change relation between SIDIS and DY definitions of the Sivers function. We found that the fit without sign-flip converges to values of χ 2 only slightly worse than the fit with the predicted sign-flip. Therefore, we cannot statistically disregard this possibility. We have observed that the sign of the DY asymmetry is strongly correlated to the sign of the Sivers function for sea quarks, which is also apparent from the partonic channel consideration. Therefore, to clearly distinguish sign-flip/non-sign-flip scenarios, one needs the data with more substantial restrictions on the sea contribution, such as DY and kaon-production in SIDIS. Indeed, the on-going analysis of DY production by STAR, COMPASS, and the future Electron-Ion Collider will constraint the sea quark Sivers function.
We present in Figs. 17 and 18 the momentum space tomographic slices of the transversely polarized nucleon. These slices are representations of the three-dimensional (3D) nucleon structure encoded in TMD PDFs. The future and existing facilities such as the Electron-Ion Collider and Jefferson Lab 12 GeV Upgrade physics programs aim at sharpening our understanding of the 3D structure of the nucleon. We will study the impact of JLab and EIC data on the Sivers function's knowledge in the forthcoming publication.
Our results set a new benchmark and the standard of precision for studies of TMD polarized functions. They will be important for theoretical, phenomenological, and experimental studies of the 3D nucleon structure and the planning of experimental programs of existing and future facilities, such as Jefferson Lab 12 GeV Upgrade, Electron-Ion Collider, and others [44][45][46][47][48][49][50][51].