Non-perturbative structure of semi-inclusive deep-inelastic and Drell-Yan scattering at small transverse momentum

We consider semi-inclusive deep inelastic scattering (SIDIS) and Drell-Yan events within transverse momentum dependent (TMD) factorization. Based on the simultaneous fit of multiple data points, we extract the unpolarized TMD distributions and the non-perturbative evolution kernel. The high quality of the fit confirms a complete universality of TMD non-perturbative distributions. The extraction is supplemented by phenomenological analyses of various parts of the TMD factorization, such as sensitivity to non-perturbative parameterizations, perturbative orders, collinear distributions, correlations between parameters, and others.

In this article, we consider the unpolarized observables that have the simplest structure and are accessible in a relatively large number of experiments. It allows us to extract the quark unpolarized TMD distributions and the NP part of TMD evolution. There were plenty of extraction of these elements within various schemes [13][14][15][16][17][18][19][20][21]. The distinctive feature of this work is the simultaneous consideration of two kinds of reactions: DY and SIDIS. Previously, a global fit of both processes was performed only in ref. [18]. We demonstrate that the global description of both processes is straightforward and does not meet any obstacle. The description is based on the latest theory developments, such as next-to-next-to-leading order (NNLO) and N 3 LO perturbative parts together with ζ-prescription. In addition, we make a special emphasis on the following topics that previously did not attract serious attention: the universality and the theory uncertainties.
The theoretical work done in recent years for the development of the elements of TMD factorization has been noticeable. The significant efforts were committed in the perturbative calculations for TMD distributions at small-b [22][23][24][25][26][27][28]. Together with the N 3 LO results for universal QCD anomalous dimensions [29][30][31][32][33], it leads to an extremely accurate perturbative input. The consistent composition of all elements is made employing the ζ-prescription [11,21]. The ζ-prescription is essential for current and future TMD phenomenology because it grants a unified approach to observables irrespectively of the order of perturbative matching. So, the collinear matching procedure that is fundamental for resummation approaches (such as in refs. [4,[34][35][36][37][38][39]) or b * -like prescriptions (such as in refs. [5,18,40,41]), is considered just as a part of the model for a TMD distribution in the ζ-prescription. Therefore, unpolarized TMD distributions (extracted in this work with NNLO matching) and the TMD evolution (extracted in this work with NNLO and N 3 LO matching) are entirely universal and could be used for the description of other processes, where the matching is not known at such a high order.
The factorization theorem declares a strong universality of NP elements. This statement refers to two important facts relevant for the present study: a) the evolution kernel is the same for all processes where the TMD factorization theorem is valid; b) the TMD parton distribution functions (TMDPDF) are the same in DY and SIDIS experiments. The test of universality needs an analysis of different types of experiments at the same time. Although the universality is a cornerstone of the approach, we have not found any dedicated tests in the literature. In order to check and proof universality properties of the TMD approach, we perform a three steps analysis: I. Firstly, we consider only the DY measurements, and analyze TMDPDF f 1 (x, b) and rapidity anomalous dimension (RAD), D(µ, b). The DY data sets have a vast span in x and Q, therefore, it is possible to extract f 1 (that dictates the x-dependence of the cross-section) and D (that dictates the Q-dependence of the cross-section) without a significant correlation between these functions. This analysis is conceptually similar to the previous work [20], albeit some improvements.
II. Using the outcome of the previous step (D and f 1 ), we consider the SIDIS measurements and extract the TMDFF, D 1 . Assuming the universality of TMD distributions, one should be able to describe the SIDIS cross-section with a single extra function D 1 . It is a nontrivial statement since the SIDIS cross-section has 4-degrees of freedom, only two of which are affected by D 1 . Additionally, the present SIDIS data are concentrated in a range of small-Q that is unreachable for DY experiments.
III. Finally, we perform a simultaneous fit of DY and SIDIS data. Given the excellent quality of the separate DY and SIDIS fits, this stage could provide only a fine-tune of non-perturbative parameters.
These three independent analyses provide a consistent and congruent picture of the TMD factorization and allow the extraction of three non-perturbative functions (unpolarized quark TMDPDF, TMDFF and quark evolution kernel). We find that our results are in full agreement with the depicted scenario, which gives a solid confirmation of the declared universality.
Apart from the described test of universality and extraction of TMD distributions in this work, we perform many additional tests and checks of the TMD approach: we test the phenomenological limits of the TMD factorization for SIDIS; we check the dependence of the TMD prediction on the collinear input; we perform an overall test of impact power suppressed contributions to the TMD factorization; we check the impact of experimental constraints on the final phase space configurations (like fiducial cross sections and lepton cuts at LHC, bin shapes in HERMES kinematics). Altogether the tests can form a comprehensive picture of TMD factorization and its accuracy. We have observed the impact of some input uncertainties, f.i. the ones from collinear PDF, to the prediction is unlucky large. Still, we restrict ourself to the indication of problematic regions, leaving it as an invitation for the further developments in the future.
Given the number of details needed for the presentation of this work, we split the discussions into almost independent parts. The first part, sec. 2, contains the description of the TMD factorization theorem for unpolarized DY and SIDIS cases. In this section, we articulate all relevant formulas, including a lot of small corrections and details that were previously only mentioned. This part provides a comprehensive collection of theory results, which can be useful for comparison with other works and future tests, and it can be considered a theory review. The sec. 3 is devoted to the review of the available SIDIS and DY data suitable for unpolarized TMD phenomenology. The sec. 4 presents the details of the comparison of the theory expression with the experimental data. It contains the definition of χ 2 -test, the interpretation of the experimental environment, and some details of the numerical implementation that is made by artemide package [42]. The following sections 5, 6 and 7 describe the fit program outlined earlier, and they are devoted to DY(only), SIDIS(only), and DY and SIDIS(together) fits. Each of these sections contains several subsections describing impact tests of specific peculiarities. Finally, we collect the information on the resulting NP functions in sec. 8.

Cross sections in TMD factorization
In this section, we present in detail the cross sections of SIDIS and DY processes in the TMD factorization, skipping their derivation that can be found in refs. [1][2][3][4][5][6][7][8][9][10][11]. The main aim of the section is to collect all pieces of information on theoretical approximations and models that are used in the fit procedure.

SIDIS cross-section
The (semi-inclusive) deep-inelastic scattering (SIDIS) is defined by the reaction (l) + H(P ) → (l ) + h(p h ) + X, (2.1) where l is a lepton, H and h are respectively the target and the fragmenting hadrons, and X is the undetected final state. The vectors in brackets denote the momenta of each particle. The masses of the particles are In the following, we neglect the lepton masses, but keep the effects of the hadron masses.
Approximating the interaction of a lepton and a hadron by a single photon exchange, one obtains the differential cross-section where e is the lepton charge, and J µ is the electro-magnetic current.

Factorization for hadronic tensor
In this work we consider the transverse momentum dependence of the cross section which is factorizable in terms of transverse momentum dependent (TMD) distributions in the limit of q T Q, where q T is defined in eq. (2.22) and Q is the di-lepton invariant mass. We refer to the literature about the proof of factorization of the processes related to this work [1,[4][5][6][7][8][9][10]. In order to specify the properties of the TMD distributions and the factorized hadronic tensor, we start fixing the basic notation.
The formulation of the factorization theorem is done in the hadronic Breit frame (alternatively, we can call it "the factorization frame"), where the momenta of hadrons are almost light-like and back-to-back. The light-like direction to which the hadrons are aligned defines the decomposition of their momenta, as we detail for each type of experiment. For SIDIS we set with n 2 =n 2 = 0, (nn) = 1. Here, we have also introduced the common notation of a vector decomposition The traverse component of a vector is extracted with the projector We also use the convention that the bold font denotes vectors that have only transverse components. So, they obey Euclidian scalar product: For unpolarized hadrons, the factorized hadronic tensor in its complete form reads where the index f in the sum runs through all quark flavours (including anti-quarks), e f is a charge of a quark measured in units of e. The function C V is the matching coefficient for vector current to collinear/anti-collinear vector current and the factorization (µ) and rapidity (ζ) scales typical of the TMD factorization are shown explicitly. The variables x S and z S are the collinear fractions of parton momentum, which are invariant under boosts along the direction of n,n, but are not invariant for a generic Lorentz transformation. The unpolarized TMDPDF and TMDFF from partons of flavor f are defined as Here, W v (x) are Wilson lines rooted at x and pointing along vector v to infinity. In the case of SIDIS, the Wilson lines in TMDPDF(TMDFF) points to future (past) infinity. The functions h ⊥ 1 and H ⊥ 1 are Boer-Mulders and Collins functions respectively and they are defined as where µν T = +−µν . In formulas (2.11-2.14) we have omit for brevity the obvious details of operator definitions, such as T (T )-ordering, color and spinor indices, rapidity and ultraviolet renormalization factors.
Boer-Mulders and Collins functions (2.13,2.14) do not contribute to the angle averaged crosssection. However they can appear when cuts on the phase space distributions of final particles are introduced by the experimental setup. In this work we will not consider these effects, and leave their study for the future.
The TMD distributions depend on b 2 only. Therefore, the angular dependence can be integrated explicitly with the result The functions W f are dimensionless and scale-independent functions. The experimental configurations are not usually provided in the factorization frame, and the correspondence between the measured quantities and the ones that appear in the factorization theorem is often non-trivial. It happens in fact, that a Lorentz transformation affects the power corrections to the cross section presented here. We detail this in the next sections.

Kinematic variables
For the SIDIS cross-section one introduces the following scalar variables: In the experimental environment one typically measures the transverse momentum defined as the one of the produced hadron with respect to the plane formed by vectors q and P . The projector corresponding to these transverse components is given by the tensor g µν ⊥ defined as In what follows, we denote the transverse components of v µ in the factorization frame as v µ T , see eq. (2.7), while transverse components projected by g ⊥ are v µ ⊥ . To describe the target-and produced-mass corrections, it is convenient to use the following combinations The definition of ς 2 ⊥ in eq. (2.20) contains p 2 h⊥ = p hµ p hν g µν ⊥ . The measured transverse momentum p ⊥ is different from the one defined in the TMD factorization. The transverse momentum used in the factorization theorem q T is defined with respect to the hadron-hadron-plane. These corresponding transverse components are extracted by the tensor g µν T in eq. (2.7). In terms of hadron momenta the tensor g T reads Using projectors (2.21) and (2.19), it is straightforward to derive the relation between q 2 T = q µ q ν g µν T and p 2 ⊥ = p µ h p ν h g ⊥,µν : Using these definition we rewrite the elements of the SIDIS cross-section formula in the terms of observable variables. The differential volumes of the phase space are where ψ is the azimuthal angle of scattered lepton, and ϕ is the azimuthal angle of the produced hadron. The variables x S and z S in eq. (2.10) are where we have used the variable q 2 T (2.22) for simplicity. The kinematic corrections presented above are usually small when Q M, m. In this case the relation between observed and factorization variables simplifies Notice that the data of SIDIS at our disposal are taken at energies comparable with hadron masses and thus target mass correction could be significant. The contributions dependent of hadron masses could in principle be classified as power corrections. However we consider more appropriate to distinguish these corrections from others of different origin. Thus we will not use the approximate formulas in eq. (2.26). The phenomenological test of this assumption is given in sec. 6.2.

Leptonic tensor
The leptonic tensor for unpolarized SIDIS is In order to express the convolution of the leptonic tensor with a hadronic tensor we define the azimuthal angle of a produced hadron as [2]: (2.28) and we define As the result we obtain .
The kinematical rearrangements of the variables produce the appearance of the cos φ and cos 2φ terms in the second line of eq. (2.29), that is, there are contributions to the structure functions F cos φ U U and F cos 2φ U U , see also [43]. Similarly, the convolution of lepton tensor with the spin-1 part produces also contribution to the cos φ and cos 2φ parts. The terms ∼ p 2 ⊥ /Q 2 presented in eqs. (2.29,2.30) are to be modified by the power corrections to the TMD factorization. Moreover, these terms could be artifacts of the violation of transversality of hadron tensor (2.90) at the leading order of the TMD factorization. The status of these corrections could not be resolved without the computation of the power corrections.
The final expression for the cross section in eq. (2.31) explicitly shows that a part of power corrections has a kinematical origin, and thus are independent of the factorization theorem and can be taken into account with the present formalism without contradictions. Examples are, the factor 1 − ς 2 ⊥ that is a part of the phase-space element, and the difference between z S and z that is a consequence of the TMDFF definition. The separation between kinematical power corrections and higher orders in the power expansion of the cross-section is however not neat, because a detailed study of the factorization theorem correction is still not complete. The admixture of these effects can be seen in the second line of eq. (2.31), which is the present status of our understanding. In the fit of the presented later we omit the contribution W h ⊥ 1 H ⊥ 1 in eq. (2.31) and perform a check of the importance of mass-corrections for the agreement with experimental data in sec. 6.2.

DY cross-section
The Drell-Yan pair production (or DY for shortness) is defined by the process h 1 (P 1 ) + h 2 (P 2 ) → l(l) + l (l ) + X, (2.32) where l, l are the lepton pair, h 1 , h 2 are the colliding hadrons, and the symbols in brackets denote the momentum of each particle. In the following, we include the hadron masses and we neglect the lepton masses: The energies of the DY experiments are higher than the SIDIS ones, and the interference of electroweak (EW) bosons must be included. Approximating the interactions of leptons and hadrons by a single EW-gauge boson exchange one obtains the following expression for the differential crosssection where q = l +l , α em = e 2 /4π is the QED coupling constant and the index G runs over gauge bosons γ, Z. Here, we have approximated the exact flux factor with 1/s because the corrections of order M 2 /s are negligibly small for any considered data set. The function ∆ G (q) is defined as with M Z = 91.188GeV and Γ Z = 2.495GeV [44]. Finally, L µν GG and W µν GG are the leptonic and hadronic tensors that are defined as where e is the lepton charge, and J G µ is the current for the production of EW gauge boson G. Integrating the cross-section over a lepton momentum one finds where q is the momentum of the EW-gauge boson. The new lepton tensor is (2.39)

Factorization for hadronic tensor
The factorization for DY hadronic tensor is totally analogous to the SIDIS case. The vectors n and n are defined by hadrons, where on r.h.s. the small contributions ∼ M 2 /s are neglected. The inclusion of weak-boson exchange requires the consideration of a more general current. To this purpose we define with the EW coupling constants where e f is the electric charge of a particle (in units of e), T 3 is the third projection of weak isospin, Collecting all this, the unpolarized part of the factorized hadronic tensor reads where f runs through all quark flavours. The functions f 1,f ←h1 are the TMDPDF and the functions h ⊥ 1,f ←h1 are the Boer-Mulders functions, defined in eq. (2.11, 2.13). In eq. (2.43) we have omitted arguments of TMD distributions for brevity, however this can be included with e.g.
and similarly for all products of TMD distributions. The variables x 1 and x 2 measure the collinear fractions of parton momenta, (2.45) The flavor indices f runs though all flavors of quarks and anti-quarks. Here, the flavor indexf refers corresponding anti-parton. Note that, in the case of W-boson, the constants g W L/R mix the flavors of quarks.
In the factorized hadronic tensor in eq. (2.43) contributions are not equally important. In fact, the fifth line of eq. (2.43) vanishes identically due to the peculiar combination of g-constants that is null for any electro-weak channel. The forth line can contribute only to ZZ and Zγ-channels, that have an anti-symmetric part of the leptonic tensor. However, the resulting expression is antisymmetric in the rapidity parameter, and thus it vanishes when the rapidity is measured/integrated on symmetric intervals. In principle, this part can contribute to a cross-section when experiments perform very asymmetric kinematic cuts on the detected leptons (e.g. at LHCb). However, even in this case the resulting integral is suppressed as q 2 T /Q 2 e −2|y| and it is numerically very small, e.g in some bins it can give a 10 −6 − 10 −8 -size relative to the leading contribution. Thus, in the following we do not consider contributions of the last two lines in eq. (2.43).
Performing the integration over angles we obtain a result formally similar to the SIDIS case in eq. (2.15),

Kinematic variables
The relevant kinematic variable in DY read The transverse components are projected by a tensor g µν T , that is orthogonal to P µ 1 and P µ 2 , identically to the SIDIS case eq. (2.21), and we have dropped the negligible corrections of order of M 2 /s. In this limit, the factorization theorem is expressed in the center-of-mass frame, the components of momenta are P + 1 = P − 2 = s/2 and the variables x 1,2 in eq. (2.45) are (2.51) The differential phase-space element reads where ϕ is the azimuthal angle of the vector boson.

Lepton tensor and fiducial cuts
In experiments not all final state leptons are collected in the measurements and fiducial cuts are for instance performed at LHC. We use the same implementation of cuts as in [19,20]. However, here we give a more general discussion to see how they affect the power suppressed parts of the cross section.
The lepton tensor of unpolarized DY formally written in eq. (2.36) is where g R G (g L G ) are the couplings of right (left) components of a lepton field to EW current as in eq. (2.41). In the case of W boson, these couplings also carry flavor indices. As discussed in sec. 2.2.1, the anti-symmetric part does not contribute visibly to the unpolarized cross-section even in the presence of asymmetric fiducial cuts. The DY cross-section contains the lepton tensor integrated over the lepton momenta with l + l = q, in eq. (2.39), and this gives are formally of higher twist with respect to the others. While higher twist contributions are formally accompanied by q 2 T /Q 2 factors, the complex kinematics of the experiments (especially in the SIDIS case) makes it hard to distinguish purely non-perturbative higher-twist effects from the kinematical ones. For instance, the azimuthal angles measured in the lab frames and in the Breit frame for SIDIS are different and some non-perturbative QCD effects can be overlooked when we pass from one frame to the other. The only way to solve this problem would be a complete inclusion of higher power corrections to the cross section, which goes beyond the scope of the present work. For this reason, while we consider the exact kinematics, as described in the previous section, we also put The effect of this assumption must be very small at q 2 T Q 2 , and this justifies the conservative data sets used in the present fit (see sec. 3).
The Q dependence of W XY is dictated by the TMD evolution, and it is discussed in the next section 2.3.1. The asymptotic limit of high q T allows for a perturbative matching of TMD distributions to collinear ones and it is discussed in sec. 2.3.2. The non-perturbative inputs on top of the large-q T asymptotic limit are discussed in sec. 2.3.3. Finally, we summarize all theoretical inputs in sec. 2.4.

TMD evolution and optimal TMD distributions
While the differential evolution equations for TMD are fixed by the factorization theorem, the boundary conditions of their solution are a matter of choice. They clearly determine the convergence of the perturbative series and the success of the theoretical description of DY and SIDIS spectrum. In this paper, we work with the so-called ζ-prescription described in [11], and including the improvement found in [21]. The prescription consists in defining the TMD distribution on a null-evolution line. The null-evolution line has the defining property of keeping the evolution factor for a TMD distributions equal to one for all values of the impact parameter b. Because of this property, the ζ-prescription is conceptually different from other popular prescriptions, where the reference scales do not belong to a null-evolution line. In this case, the resulting (reference) TMD distribution has an admixture of a perturbative evolution factor between different values of b. Therefore, the ζ-prescription has an important advantage, that a TMD distribution is independent of any perturbative parameter, i.e. it is completely non-perturbative and one can freely parameterize a distribution without any respect to perturbation order. For a detailed description and analyses of TMD evolution and the ζ-prescription we refer to [11], whereas here we present only the final expressions without derivation.
The system of TMD evolution equations is where F is any TMD distribution (f 1 or D 1 in the present case). The TMD evolution equations are not sensitive to the flavor of a parton 1 and thus we omit flavor indices in this section for simplicity. The eq. (2.64) is a standard renormalization group equation, which comes from the renormalization of the ultraviolet divergences, with the function γ F (µ, ζ) being the anomalous dimension. The eq. (2.65) results from the factorization of rapidity divergences. The function D(µ, b) is called the rapidity anomalous dimension (RAD). The RAD is a generic non-perturbative function that can be computed at small values of b in the perturbation theory. The perturbative expression for the RAD and γ F can be found in the literature (e.g. see appendix of ref. [25]). In this work we use the resummed version of RAD [49]. The resummed expressions are also given in appendix B (see also appendix B in ref. [38]). The scales µ and ζ have independent origins, and this has important consequences. To start with, the TMD evolution takes place in the plane (µ, ζ). The solution of equations eq. (2.64, 2.65) for the evolution from a point (µ f , ζ f ) to a point (µ i , ζ i ) is where P is any path in (µ, ζ)-plane that connects initial (µ i , ζ i ) and final points (µ f , ζ f ). The value of evolution is (in principle) independent on the path, thanks to integrability condition (also known as Collins-Soper (CS) equation [40]) where Γ cusp (µ) is the cusp anomalous dimension. This equation dictates the logarithmic structure of anomalous dimensions. In particular, the TMD anomalous dimension is The formal path-independence of eq. (2.66) is violated at any fixed order of perturbation theory. The penalty term is proportional to the area surrounded by paths, and can be huge in the case of very separated scales. Nevertheless, the path dependence decreases with the increase of the perturbative order and it is numerically small at N 3 LO [11]. The final scales of the evolution are binded to the hard scale of factorization such that µ 2 f ∼ Q 2 and ζ 1f ζ 2f = Q 4 . In particular, we choose the symmetric point The TMD initial (or defining) scale is chosen with the ζ-prescription and deserves some explanation.
In the ζ-prescription the scales µ and ζ belong to a null-evolution line, that we parameterize as (µ, ζ µ (b)). To find the null-evolution line, we recall that the system of eq. (2.64, 2.65) is a twodimensional gradient equation ). Therefore, the null-evolution line is simply an equipotential line of the field E. It provides the equation that

70)
A TMD distribution does not evolve between scales belonging to the same equipotential line by definition.
Among equipotential lines there is a special line that passes though the saddle point (µ 0 , ζ 0 ) of the field E. The values (µ 0 , ζ 0 ) are defined as The special equipotential line is favored for the definition of defining TMD scales for two important reasons. First, there is only one saddle point in the evolution field, and thus, the special nullevolution line is unique. Second, the special null-evolution line is the only null-evolution line, which has finite ζ at all values of µ (bigger than Λ QCD ). These properties follow from its definition and they are very useful. In fig. 2 we show the force-lines of the evolution field E (in grey, with arrows), null-evolution lines, (thick grey lines, orthogonal to the force-lines), and the lines that cross at the saddle point (in red) at different values of b. In this figure the special line is the one that goes from left to right in each panel.
The concept of ζ prescription has been introduced in ref. [19] and elaborated in [11]. Presently we use a form slightly different from the original version of refs. [11,19]. Here we follow the updated realization introduced in ref. [21] that has been used for the description of the pion-induced DY process. In refs. [19,20] the ζ-lines has been taken perturbative for all ranges of b (with slight deformations due to the Landau pole). Notwithstanding, such definition introduces an undesired correlation between the non-perturbative parts of the TMD distribution and RAD. In ref. [21] a new simple solution has been found for the values of special null-evolution line at large b that accurately incorporates non-perturbative effects, without adding new parameters to the fit. In appendix C we present the expression for the special line as it is used in this fit.
A TMD distribution F (x, b; µ, ζ µ ) with ζ µ belonging to the special line is called optimal TMD distribution, and denoted by F (x, b) (without scale arguments), to emphasize its uniqueness and independence on scale µ. The exact independence of optimal TMD distribution on scale µ, allows us to select the simplest path for the evolution exponent in eq. (2.66), that is, the path at fixed value of µ = Q along ζ from the value ζ f = Q 2 down to any point of ζ i = ζ Q (b). In fig. 2 this path is visualized by black-dashed lines. The resulting expression for the evolved TMD distributions is exceptionally simple (2.72) We recall that this expression is same for all (quark) TMDPDFs and TMDFF. Substituting (2.72) into the definition of structure functions W we obtain, . These are the final expressions used to extract NP-functions.
The simplicity of expressions (2.73,2.74) is also accompanied by a good convergence of the cross section. In fig. 3 we show the comparison of curves for DY and SIDIS cross-section at typical energies. In the plot the TMD distributions and the NP part of the evolution are held fixed while the perturbative orders are changed. The perturbative series converges very well, and the difference between NNLO and N 3 LO factorization is of order of percents. This is an additional positive aspect of the ζ-prescription, which is due to fact that all perturbative series are evaluated at µ = Q.

Matching of TMD distribution to collinear distributions
The TMD are generic non-perturbative functions that depend on the parton fraction x and the impact parameter b. A fit of a two-variable function is a hopeless task due to the enormous parametric freedom. This freedom can be essentially reduced by the matching of a b → 0 boundary of a TMD distribution to the corresponding collinear distribution. In the asymptotic limit of small-b one has where f 1 (x, µ) and d 1 (x, µ) are collinear PDF and FF, the label f runs over all active quarks, anti-quarks and a gluon, and with γ E being the Euler constant and g being QCD coupling constant. The extra factor y −2 in eq. (2.76) is present due to the normalization difference of the TMD operator in eq. (2.12) and the collinear operator, see e.g. [5,24]. The coefficient functions C and C can be calculated with operator product expansion methods (for a general review see ref. [50]) and in the case of unpolarized distributions the coefficient functions are known up to NNLO [22,24,25,28]. The coefficient function C has the general form wherex = 1 − x, the symbol ⊗ denotes the Mellin convolution, and a summation over the intermediate flavour index k is implied. In eq. (2.78) we have omitted argument x of functions on left-hand-side for brevity. The functions P (n) (x) are the coefficients of the PDF evolution kernel P (x) = n a n s P (n) (x) (DGLAP kernel), which can be found f.i. in ref. [51]. The functions C (n,0) f ←f (x) are given in [22,24,25,28]. In particular, the NLO terms are The last term in the square brackets of eq. (2.78) is the consequence of the boundary condition of eq. (2.71), and it consists of some coefficients of the anomalous dimension defined in eq. (B.2, C.8).
In the case of TMDFF the matching coefficient C follows the same pattern as in eq. (2.78) with the replacement of the PDF DGLAP kernels P (n) f ←f (x) by the FF DGLAP kernels P (n) f ←f (z) (they can be found f.i. in ref. [52]), and C (n,0) [24,25]. In TMDFF case, the NLO terms are In the ζ-prescription, the scale µ OPE is entirely encapsulated inside the convolutions in eq. (2.75, 2.76) and it has no connection to the scales of the TMD evolution, as it happens in the case of b *prescription [5,41]. This fact gives an enormous advantage to achieve a complete decorrelation of RAD from TMD distributions (we will be more quantitative about this point in later sections). The scale µ OPE has to be selected such that on one hand, it minimizes the logarithm contributions at b → 0, and on another hand, it does not hit the Landau pole at large-b. We use the following value Let us not that in the ζ-prescription, the coefficient functions of small-b matching in eq. (2.78) does not contain a double-logarithm contribution. For that reason the perturbative convergence, as well as, the radius of convergence improves. Both these facts make the ζ-prescription highly advantageous.

Ansatzes for NP functions
In this work we deal with three independent non-perturbative functions in total. These are the unpolarized (optimal) TMDPDF, f 1 (x, b), the unpolarized (optimal) TMDFF, D 1 (x, b), and the RAD, D(b, µ). The amount of perturbative and non-perturbative contributions to each function depends on the value of the impact parameter b. Namely, at small values of b the perturbative approximation is good and the TMD distributions can be matched onto collinear functions as in eq. (2.75, 2.76). In the case of the RAD the small-b limit is given in appendix B. The smallb perturbative expressions gains power corrections in even powers b 2n [53]. Therefore, with the increase of b the perturbative approximation becomes less and less correct, and must be replaced by some generic function.
The phenomenological ansatzes for TMD distributions that satisfy this picture, can be written as following: where functions f NP and D NP are non-perturbative functions. Note, that in our ansatz we do not modify the value of b within the coefficient function. Therefore, at large-b the logarithm part of the coefficient function grows unrestrictedly. This growth is suppressed by the non-pertrubative functions.
Generally, the functions f NP and D NP depend also on parton flavor f and hadron type h. However, in the present work we use the approximation that f NP and D NP are flavor and hadrontype independent. All hadron-and flavor dependence is driven by the collinear PDFs and FFs (see also sec. 4.1). Given such an ansatz the only requirement for NP functions is that they are even-functions of b that turn to unity for b → 0. We use the following parameterizations and we extract λ i and η i from our fit. The functional form of f N P has been already used in [20]. It has five free parameters which grant a sufficient flexibility in x-space as needed for the description of the precise LHC data. The form of D NP has been suggested in [18] (albeit there are more parameters in [18]). In both cases the function has exponential or Gaussian form depending on the relative size of λ 1,2,5 /λ 3 , and η 1,2 /η 3 . There are natural restrictions on the parameter space The ansatz for the non-perturbative part of the RAD has different form, because one expects a different behavior at large-b. We use the following expression suggested in [20], The function D resum is the resummed perturbative expansion of RAD [11,49] reported in the appendix B. At LO it reads The higher order expressions (up to N 3 LO) are given in eq. (B.5). The parameters c 0 and B NP are free positive parameters, in principle totally uncorrelated from the rest of non-perturbative parameters. The resummed expression for RAD explicitly has the singularity in b (see e.g. eq. (2.88)). The singularity designates the convergence radius of the perturbative expression. Consequently, the perturbative behavior must be turned off well before b approaches the singularity. In the ansatz in eq. (2.86), this is achieved freezing the perturbative part at b ∼ B NP . The singularity is located at β 0 a s (µ)L µ = 1 and thus, the value of B NP is restricted from above by: At large-b the non-perturbative expression for RAD in eq. (2.86) is asymptotically linear in b, D ∼ c 0 B NP b. Such a behavior is different from the typical quadratic form. The linear behavior is suggested by model calculations of the RAD [54,55], and the coefficient c 0 can be related to the QCD string tension.
The special null-evolution line can be incorporated both at perturbative and non-perturbative level. In [19] and [20] the special null-evolution line included only its perturbative part for simplicity. This part is the most important one because it guarantees the cancellation of double-logarithms in the matching coefficient. However, at large-b, the non-perturbative corrections to the RAD are large and cannot be ignored: in [19] they can be seen as a part of the non-perturbative model, at the price of introducing an undesired correlation between f N P and D. In order to adjust the null-evolution curve with a non-perturbative RAD one has to solve eq. (2.70) including the RAD in the full generality. Such solution can be found in principle, but its numerical implementation is problematic at very small-b, because it is very difficult to obtain the exact numerical cancellation of the perturbative series of logarithms with an exact solution. To by-pass this problem we use the perturbative solution at very small b, (and hence cancel all logarithm exactly) and turn it to an exact solution at larger b. This is realized by we have the perturbative solution, and one turns to the exact for larger b. Since the RAD is entirely perturbative at small-b, the numerical difference between eq. (2.89) and ζ exact µ (b) is negligibly small.

Summary on theory input
In this subsection we summarize the relevant points needed for a description of the DY and SIDIS differential cross section in the TMD formalism. The cross-section of DY and SIDIS are given by eq. (2.59, 2.31). They contain four structure functions, from which in the present work, we consider only W f1D1 and W f1f1 . We drop structure functions W h ⊥ 1 since mostly probably they are a part of the power correction to TMD factorization. The functions W f1D1 and W f1f1 are considered instead without restrictions.
The eq. (2.59, 2.31) contain a variety of power suppressed contributions, which have different sources: • The power corrections to TMD factorization. These corrections appear during the factorization procedure for the hadronic tensors, see eq. (2.9, 2.43). One can distinguish two kinds of power corrections: The corrections that are proportional to the leading structure functions W XY , and mixes with the so-called Wandzura-Wilczek terms (in the case of SIDIS, this part of cross-section has been studied recently in [56]); And the corrections that involve genuine "twist-3" TMD distributions (some part of these corrections is discussed in ref. [57]).
• The mass and q 2 T dependence within the momentum fraction variables (x 1 , x 2 )(DY) and (x S , z S )(SIDIS), see eq. (2.45, 2.10). Despite the correction in the momentum fraction can be interpreted as a part of the power correction to TMD factorization (contributing to the Wandzura-Wilczek terms), we consider them on their own. These corrections come from the field-modes separation and the definition of the scattering plane, and they can be seen as the "Nachtmann-variable for TMD factorization".
• Fiducial cuts for DY. The cut factors for the DY lepton tensor in eq. (2.57, 2.58) are a source of power corrections and mixture between different structure functions. They are accumulated in a separate factors, and have totally auxiliary nature. They must be accounted for the proper description of LHC data.
• Mismatch between factorization and laboratory frames in SIDIS. The azimuthal angles and transverse planes are defined differently in the factorization and laboratory frames see eq. (2. 21, 2.19). This introduces target-mass, produced-mass, and q T -corrections. A good example is the p ⊥ -linear contribution to the structure function F cos φ U U (2.29, 2.30), which is a purely a frame-dependence effect.
• Cross-section phase-space volume in SIDIS. In the case of a non-negligible mass for the detected particle, the phase-volume contains power corrections. They are accumulated in a universal factor in eq. (2.23), and are a part of definition of the observable.
Some of the power corrections of this list can be accounted exactly (e.g. the corrections to the phase-space, the collinear momentum fractions, the relation between q 2 T and p 2 ⊥ ), while some are absolutely unknown (i.e. the power correction to the TMD factorization). The main problem is that currently the TMD factorization does not allow for a systematic separation of power corrections. The reason is the non-transversality of the hadron tensor in TMD factorization: This violation generates non-physical power-suppressed contributions q µ W µν ∼ q T , and mixes together kinematic and higher-twist corrections.
In the present work, we keep all power suppressed factors as in eq.
. Simultaneously, we apply conservative cuts on the data such that q T is small. In sec. 5.2 and 6.2, we test the influence of the power corrections to the fit quality. In some sense, this test provide us an estimation of the systematic error due to the presence of unknown power correction.
The structure functions W f1D1 and W f1f1 are evaluated according to eq. (2.73, 2.74). The phenomenological ansatzes for the optimal unpolarized TMDPDF and TMDFF are defined in eq. (2.82, 2.83, 2.84, 2.85). At small-b TMD distributions are matched to corresponding collinear distributions. We have found that the result of our fit highly depends on the set of collinear distributions. The study of this effect are in sec. 5.1,6.1. The phenomenological ansatz for the RAD is given in eq. (2.86). In table 1 we list the perturbative orders used in each factor of the cross section. The N 3 LO perturbative composition used here is equivalent to the one used in [38,39] on the resummation side. A total of 11 phenomenological parameters are determined by the fit procedure. Two of these parameters describe the RAD, 5 are for the unpolarized TMDPDF, and 4 are for the unpolarized TMDFF.

Evolution
Acronym in Table 1. Summary of the perturbative orders used for each part of the factorized cross section. The evolution of αs is provided by the LHAPDF library and comes together with PDF set (uniformly nnlo). In brackets we write the last included term of corresponding perturbative expansion (B.4, B.5, C.4, C.8, C.12).

Data overview
In the present work, we consider the extraction of unpolarized TMD in DY and SIDIS data, extending so the analysis of ref. [20] and including the theoretical improvements described in the previous sections. The selection of data is crucial for a proper TMD extraction, because of the limits imposed by the factorization theorem. These constraints are here discussed for both type of reactions.

SIDIS data
In the current literature, one can find several measurements of the unpolarized SIDIS [58][59][60][61][62][63] and a total of some thousands of data points. We restrict out attention only to those data whose kinematical features are compatible with the energy scaling of the TMD factorization theorem. The first constraint comes from the di-lepton invariant mass (Q) and in general from the energy scale of the processes. Most of SIDIS reactions have been measured at fixed target experiments, that are typically run at low energies. Unfortunately much of these data do not accomplish the Total 582 Table 2. Summary of the SIDIS data included in the fit. For each data set we report reference, reaction, kinematic region, and number of points that are left after the application of consistency cuts in eq. (3.1, 3.2).
QCD factorization request of a high Q to separate field modes. To secure our analysis (but still leave some data) we have used a restriction on the average Q of a data point, namely Here, Q is the value of Q averaged over the multiplicity value in a bin, see fig. 4. The restriction in eq. (3.1) quite reduces the pool of data. In particular, eq. (3.1) completely discards the JLAB measurement published in [62], and cuts out the most part of HERMES data in ref. [58]. The second constraint comes from the TMD factorization assumptions. Namely, the TMD factorization regime is fully consistent only for low values of q T /Q and receives quadratic power corrections of order (q T /Q) 2 , see eq. (2.15) and eq. (2.31). We consider data such that where the value 0.25 was deduced in [19]. From the q T interval of eq. (3.2), one can expect a ∼ 4 − 6% influence of the power corrections, which is well inside the uncertainties of the data. In sec. 6.3, we have tested the cutting condition eq. (3.2) considering the data at different δ, and found eq. (3.2) sufficient. It should not pass unobserved that eq. (3.2) is written in terms of q 2 T , that is the natural variable of the TMD factorization approach, whereas the data are presented in terms of p 2 ⊥ . These variables are related by q 2 T p 2 ⊥ /z 2 , see eq. (2.22). Thus, the cut in eq. (3.2) puts also a restriction on z.
Altogether it makes the allowed values of p 2 ⊥ even smaller, p ⊥ 0.25zQ. In particular, we have to completely discard the measurements of H1 and ZEUS collaborations [60,61] that are made at very small values of z, despite the relatively high values of Q.
After the application of eq. (3.1, 3.2) we are left with the data taken by HERMES and COM-PASS 2 collaborations [58,59]. For HERMES we have selected the zxpt-3D-binning set due to the finer bins in p T . The COMPASS data includes the subtraction of vector-boson channel, and thus we also select the subtracted HERMES data (.vmsub set). In total we have 582 points that cover the region of 1.5 Q 9 GeV, 10 −2 x < 0.6, 0.2 < z < 0.8. The summary of the considered data is reported in table 2.

DY data
The DY data are selected following the same principles as the SIDIS data, eq. (3.2) (the rule (3.1) makes no sense now, because DY processes are measured at sufficiently high-energies) with only small modifications. The changes consist in the cutting some extra higher-q T data points for several specific data sets (this concerns mainly ATLAS measurements of Z-boson production). The reason for it is that the estimated size of power corrections at q T /Q ∼ 0.25 is of order of 5%, however, some highly precise data are measured with much better accuracy. So, given a data point p ± σ, with p being the central value and σ its uncorrelated relative uncertainty, corresponding to some values of q T and Q, we include it in the fit only if In other words, if the (uncorrelated) experimental uncertainty of a given data point is smaller than the theoretical uncertainty associated to the expected size of power corrections, we drop this point from the fit. This is the origin of the second condition in eq. (3.3). The resulting data set contains 457 data points, and spans a wide range in energy, from Q = 4 GeV to Q = 150 GeV, and in x, from x ∼ 0.5 · 10 −4 to x ∼ 1. Table 3 reports a summary of the full data set included in our fit. This selection of the data is the same as the one considered in our earlier work [20]. In the current fit, we compare the absolute values of the cross-section, whenever they are available. The only data set that require normalization factors are all CMS data, ATLAS at 7 TeV, and DO run2 measurements. For these sets we have normalized the integral of the theory prediction to corresponding integral over the data (see explicit expression in ref. [19]).

Summary of the data set
In total for the extraction of unpolarized TMD distribution we analyze 1039 data points that are almost equally distributes between SIDIS (582 points) and DY (457 points) processes. All these points contribute to the determination of the TMD evolution kernel D and unpolarized TMDPDF f 1 . The determination of unpolarized TMDFF is based only on SIDIS data. In addition, we recall LHCb (13TeV) [75] 13000 60 -120 2 < y < 4.5 pT > 20 GeV 2 < η < 4.5 9 Total 457 *Bins with 9 Q 11 are omitted due to the Υ resonance. Table 3. Summary table for the data included in the fit.. For each data set we report: the reference publication, the centre-of-mass energy, the coverage in Q and y or xF , possible cuts on the fiducial region, and the number of data points that survive the cut in eq. (3.3).
that a single DY data point is simultaneously sensitive to a larger and a smaller value of x. This is because the cross section is given by a pair of TMDPDFs, eq. (2.47), computed at x 1 and x 2 such that x 1 x 2 Q 2 /s. So, the statistical weight of a DY point in the determination of TMDPDF is effectively doubled. The kinematic region in x and Q covered by the data set and thus contributing to the determination of TMDPDF is shown in fig. 5. The boxes enclose the sub-regions covered by the single data sets. Looking at fig. 5, it is possible to distinguish two main clusters of data: the "low-energy experiments", i.e. E288, E605, E772, PHENIX, COMPASS and HERMES that place themselves at invariant-mass energies between 1 and 18 GeV, and the "high-energy experiments", i.e. all those from Tevatron and LHC, that are instead distributed around the Z-peak region. From this plot we observe that, kinematic ranges of SIDIS and DY data do not overlap.
As a final comment of this section let us mention that our data selection is particularly conservative because it drops points that could potentially be described by TMD factorization (see e.g. ref. [18] where a less conservative choice of cuts is used). However, our fitted data set guarantees that we operate well within the range of validity of TMD factorization. In sec. 7 we show that unexpectedly our extraction can describe a larger set of data as well.

Fit procedure
The experimental data are usually provided in a form specific for each setup. In order to extract valuable information for the TMD extraction, one has to detail the methodology that has been followed, and this is the purpose of this section. Finally, we also provide a suitable definition of the χ 2 that allows for a correct exploitation of experimental uncertainties.

Treatment of nuclear targets and charged hadrons
The data from E288, E605 (Cu), E772, COMPASS, (part of) HERMES (isoscalar targets) come from nuclear target processes. In these cases, we perform the iso-spin rotation of the corresponding TMDPDF that simulates the nuclear-target effects. For example, we replace u-, and d-quark distributions by where A(Z) is atomic number(charge) of a nuclear target. In principle, for E288, E605 data extracted from very heavy targets one should also incorporate the nuclear modification factor that depends on x. However, we found it unnecessary since the systematic errors of these experiments are much larger than a possible nuclear effect. The measurements of SIDIS are made in a number of different channels. The HERMES data include π ± and K ± , and COMPASS data are for charged hadrons, h ± . Pions and kaons are described by an individual TMDFFs. However, charged hadron are a composition of different TMDFFs. According eq. (2.12) the TMDFF for charged hadrons is a direct sum of TMDFFs for individual hadrons: where dots denote the higher-mass hadron states. At COMPASS energies, this sum is dominated by the pion (65 − 75%), and the kaon (15 − 20%) contributions. The residual term is lead by proton/antiproton contribution (2 − 5%). The contribution of other particles is smaller (for discussion and references see [76,77]). Thus, in our study we use the first two terms of eq. (4.3) to simulate the charged hadron fragmentation. The SIDIS measurements in refs. [58,59] are given in form of multiplicities. The SIDIS multiplicity is defined as where dσ DIS is the differential cross-section for DIS. It reads where F 1 and F 2 are DIS structure functions. The DIS cross-section cannot be computed starting from TMD factorization, but it is described by the collinear factorization theorem. In order to evaluate the multiplicity we have pre-computed the DIS cross-section (integrated over the given bin) by the APFEL-library [78], and then divided the TMD prediction according to eq. (4.4).

Bin integration in SIDIS and DY
The majority of SIDIS data is measured at relatively low-Q and in large bins. The cross-section value changes greatly within a bin, and so, binning effects are known to be strong. For a measured cross-section dσ/dxdzdQ 2 dp 2 ⊥ , a bin is specified by {x min , x max }, {z min , z max }, {Q min , Q max } and {p min , p max }. The binning constraints impose certain cuts on the measured phase space. Typically, these cuts are given as intervals of the variable y and of the invariant mass of photon-target system W 2 = (P + q) 2 , which belong to ranges {y min , y max } and {W 2 min , W 2 max }. Both these variables are connected to x and Q 2 , .
where s is the Mandelshtam variable s = (P + l) 2 . So, in the presence of fiducial cuts in SIDIS the bin boundaries arê An example of effects of cuts in the bins is shown in fig. 4. In the case of multiplicity measurements the bin effects are taken into account with the cross-section where the expression in the first line is the volume of (z, p 2 ⊥ )-bin. In the case of DY the binning effects are also extremely important. The difference in the value of the cross section between center-of-bin and the averaged/integrated value can reach tenth of percents, especially, for very low-energy bins (where the change in Q is rapid), and for very wide bins (such as Z-boson measurement). We have used the definition

Definition of χ 2 -test function and estimation of uncertainties
To test the theory prediction against the experimental measurement we compute the χ 2 -test function where m i is the central value of i'th measurement, t i is the theory prediction for this measurement and V ij is the covariance matrix. An accurate definition of the covariance matrix is essential for a correct exploitation of experimental uncertainties. In order to build the covariance matrix we distinguish, uncorrelated and correlated uncertainties. For example, a typical data point has the structure m i ± σ i,stat ± σ i,unc ± σ (1) i,corr ± · · · ± σ (k) i,corr , (4.14) where m i the reported central value, σ i,stat is (uncorrelated) statistical uncertainty, σ i,unc is uncorrelated systematic uncertainty, and σ (k) i,corr are correlated systematic uncertainties. Uncorrelated uncertainties give an estimate of the degree of knowledge of a particular data point irrespective of the other measurements of the data set. Instead, correlated uncertainties provide an estimate of the correlation between the statistical fluctuations of two separate data points of the same data set.
With this information at hand, one can construct the covariance matrix V ij as follows (for more detailed discussion on this definition see refs. [79,80]): Equipped by this definition of covariance matrix the χ 2 -test in eq. (4.13) takes into account the nature of the experimental uncertainties leading to a faithful estimate of the agreement between data and theoretical predictions.
To estimate the error propagation from the experimental data to the extracted values of TMD distributions we have used the replica method. This method is described in details in ref. [79]. It consists in the generation of N replicas of the pseudo data, and minimization of the χ 2 on each replica. The resulting set of N vectors of NP parameters is distributed in accordance to the distribution law of the data. And thus, it represents a Monte Carlo sample that is used to evaluate mean values, standard deviation and correlations of the NP parameters. For the estimation of error propagation we consider N = 100 replicas. The procedure of χ 2 -minimization for each replica is the most computationally heavy part of the fit.
The proper treatment of correlated uncertainties is essential in global analysis. The presence of sizable correlated uncertanties could result into a misleading visual disagreement between theory prediction and the (central values of) data points. Namely, the theory prediction for a data set could be globally shifted by significant amount, that is nonetheless in agreement with correlated experimental uncertainty. To quantify the effects of correlated shifts we use the nuisance parameter method presented in [79,80]. Within the nuisance parameter method one is able to determine the shift d i of a theory prediction t i for the i'th data point, such thatt i = t i + d i contributes only to the uncorrelated part of the χ 2 -value. The value d i is interpreted as a shift caused by the correlated uncertainties. It is computed as where It also instructive to check the average systematic shift, which we define as It shows a general deficit/excess of the theory with respect to the data for a given data set. Let us note that the multiplicities of the SIDIS are convenient as an experimental object due to the fact that systematic uncertainties related to measurement efficiency and the beam luminosity cancel in the ratio. However, theoretically multiplicities are worse defined, since the denominator and the numerator of multiplicity ratio (4.4) need a completely different theoretical treatment. In order to account this effect, we have computed the uncertainty of theory prediction for DIS crosssection for each bin and added it as a fully correlated error for each data set. We should admit that the theory uncertainty for DIS cross-section is negligibly small (typically, 0.1 − 2.0%) in comparison to systematic uncertainties of experiment. Thus, this addition affects the resulting values of χ 2 very little on the level of ±10 −2 per point.

Artemide
The computation of the cross-section is made with the code artemide that is developed by us. Artemide is organized as a package of Fortran 95 modules, each devoted to evaluation of a single theory construct, such as the TMD evolution factor, a TMD distribution, or their combinations such as structure functions W and cross-sections. The artemide also evaluates all necessary procedures needed for the comparison with the experimental data, such as bin-integration routines and cut factors. For simplicity of data analysis artemide is equipped by a python interface, called harpy. The artemide package together with the harpy is available in the repository [42].
The module organization of artemide allows for flexible use. In particular, it gives to a user a full access to non-perturbative ansatzes and models. Although artemide is based on the ζprescription, it also includes other strategies for TMD evolution, such as CSS evolution [41], γimproved evolution [11] and their derivatives. The user has full control on the perturbative orders, and can set each individual part to a particular (known) order. Currently, artemide can evaluate unpolarized TMD distributions, and linearly polarized gluon distributions together with the related cross-sections, such as DY, SIDIS, Higgs-production (for application see [81]), etc. In future, we plan to include more processes and distributions.
The evaluation of a single cross-section point that is to be compared with the experimental one, implies the evaluation of a number of integrals: two Mellin convolution for small-b matching eq. (2.82, 2.83), the Hankel-type integral for the structure function W eq. (2.74, 2.73), and 3(in DY case)/4(in SIDIS case) bin-integrations. Note, that in the ζ-prescription one does not need to evaluate integrations for TMD evolution, which is its additional positive point. Altogether, it makes the evaluation of TMD cross-section rather expansive in terms of computing time. Artemide uses adaptive integration routines to ensure the required computation accuracy. To speed-up the evaluation, artemide precomputes the tables of Mellin convolutions for TMD distributions that are the most difficult integrations. Nonetheless, the computation cost is rather high. In particular, it takes about 4.5 (3.2) minutes to evaluate a single χ 2 value for the full data set of DY and SIDIS given in sec. 3 on an average 8-core (12-core) processor (2.5GHz) depending on the NP-values. Therefore, the minimization χ 2 and especially the computation of error-propagation are especially long. Due to that we are restricted in certain important directions of studies (e.g. error-propagation of PDF sets, and flavour dependence).

Fit of DY
The data-set and the functional input for the DY fit is inherited from our earlier study [20]. The only modification is the update of the functional form of the special null-evolution line in eq. (2.89), which in the present case matches the exact solution at large-b. This update leads relatively minor formal changes, while some values of the model parameter are changed as a result of the fit. The value of χ 2 (per 457 points) is reduced from 1.174 [20] → 1.168 (this work). The main impact takes place at low-energies. In particular, the typical deficit in the cross-section for low-energy experiments is reduced by 5-6% (compare table 3 in [20] with table 8), which however does not significantly affects the χ 2 values due to the large correlated uncertainties of fixed-target DY measurements.
In this section, we present the fit of DY data-set only. Since the general picture is similar to ref. [20], we concentrate on the sources of systematic uncertainties of our approach. We discuss the dependence on the collinear PDF, that serves as a boundary for TMDPDF, and the effects of q T corrections in the definitions of x 1,2 .

Dependence on PDF
The collinear PDF is an important part of our model for TMDPDF, e.g. eq. (2.82). The smallb matching essentially reduces the number of NP parameters for TMDPDF and guarantees the asymptotic agreement of the TMDPDF with the collinear observables. The small-b part of the Hankel integral gives the dominant contribution to the cross-section, especially for q T ∼ 10-20 GeV. Therefore, the quality of our fit and the values of the extracted NP parameter are robustly correlated with the collinear PDF set. This observation has been made earlier, e.g. see discussion in [15,18,20], but it has not been systematically studied. Ideally, the PDF set and TMDPDF are to be coherently extracted in a global fit of collinear and TMD observables. Meanwhile, we treat a collinear input as an independent parameter which we cannot control, thus test various values available from the literature.
There is an enormous amount of available PDF sets. We have tested some of the most popular sets that are recently extracted at NNLO accuracy, see table 4. All sets have LHAPDF interface [87]. For each PDF set we have performed the full fit procedure with the estimation of the errorpropagation. In the fit, the central value of PDFs are used, see sec. 5.3 for the discussion on the possible impact of PDF uncertainties. The values of the χ 2 /(N pt = 457) and the NP parameters are reported in table 5 for each PDF set in table 4. The visual comparison of the parameter values is shown in fig. 6 and 7. The parameters of RAD (B NP and c 0 ) are rather stable with respect to input PDF, and in agreement with each other (note, that B NP and c 0 are anti-correlated, see sec. 8.1).
Contrary to the RAD, the parameters λ i show a significant dependence on the collinear PDF (see fig. 7). It is expected, since a different collinear PDF dictates a different shape in x, while the Q-dependence is not changed. The parameters λ 1 and λ 2 do not change significantly with different PDFs, while a bigger change is provided by the parameters λ 3,4,5 . This is because parameters λ 1,2 dictate the main shape of f N P at middle values of b, whereas other parameters are responsible for the large-b tale (λ 3,4 ) or fine-tuning of x-shape (λ 5 ).
In table 5, fits are ordered according to the χ 2 /N pt value obtained in the DY fit. The distribution of the values of χ 2 between experiments changes for different PDFs. For example, NNPDF31 demonstrates some tension between ATLAS and LHCb subsets (see table 3 in ref. [20], and also table 8). In the case of HERA20 this tension reduces. The value of χ 2 /N pt for ATLAS measurements is practically the same in both cases, we find 2.02 NNPDF vs. 1.99 HERA for N pt = 55 (note, that the bin-by-bin distribution of χ 2 changes between the sets). On contrary, the value of χ 2 /N pt for LHCb measurement undoubtedly differ in the two sets of PDF, as we find 2.93 NNPDF vs. 1.24 HERA for N pt = 24. The main part of the improvement happens due to the general normalization, that is lower by 3-5% in NNPDF case, and almost exact in HERA case.
The TMD distributions with NNPDF31 and HERA20 show a χ 2 value better than all the other, e.g. table 5. These PDFs have also less tension between high-and low-energy data. For this reason, in the next sections we will consider only PDFs from these extractions. Nonetheless, we preferably select NNPDF31 set in the global SIDIS and DY analysis. The reason is that NNPDF31 distribution is extracted from the global pool of data, whereas HERA20 uses exclusively data from HERA. At the same time, we must admit that HERA20 distribution provides a spectacularly low values of χ 2 in our global fit.

Impact of exact values for x 1,2 and power corrections
As discussed in sec. 2.4, the factorization formula eq. (2.59) for DY contains three types of power corrections. The corrections related to TMD factorization cannot be tested, without extra modeling. The corrections due to fiducial cuts must be included without restrictions. Thus it is possible to test only the power correction due to presence of q T /Q in the exact values of x 1,2 , eq. (2.45). The amount of this correction is obtained comparing the fits of the DY data with The approximate values for x 1,2 lead to higher values of χ 2 . In particular, with the approximate x 1,2 for the NNPDF31 set we have obtained χ 2 /N pt = 1.35 and 1.27 at NNLO and N 3 LO respectively.
In the case of HERA20 set, we obtain χ 2 /N pt = 1.03 and 1.03. Comparing these values to the ones reported in table 5 (1.17 and 1.01; 0.95 and 0.96, respectively), we conclude that the quality of fit is worse. The deterioration of the fit quality takes place in both high-and low-energy parts of the data. In the ATLAS experiment (that is the most precise set at our disposal, with N pt = 55), we observe the changes in χ 2 /N pt : 1.82 → 2.83 for NNPDF31 and 1.90 → 2.27 for HERA20. For the fixed target experiments we have χ 2 /N pt : 0.91 → 1.31 for NNPDF31 and 0.71 → 0.97 for HERA30 (here N pt = 260). We have also observed that the value of χ 2 worsens mainly due the change in the shape of cross-section, whereas the normalization part slightly reduces the χ 2 . The values NP parameters varies within the error-bands and the change in the central values is not significant.
Therefore, we conclude that exact values of x 1,2 (2.45) considerably improve the quality of the fit. This conclusion is in agreement with the theory expectations presented in sec. 2.4.

Uncertainties due to collinear PDFs
The model in eq. (2.82) is not sensitive to changes of the NP parameters at small-b. For this reason, the error-band on the TMD distribution vanishes for b 0.5GeV −1 . The only way to modify the TMD distribution in this region is to vary the values of collinear PDF. In sec. 5.1 we have demonstrated that the quality of the fit, as well as the values of extracted NP parameters, essentially depend on the collinear PDF and in our extraction we have used the central values of PDF sets, ignoring the uncertainties of PDF determination. These uncertainties are however large and could cover the gap among different TMD fits if taken into account. Unfortunately, the incorporation of the PDF uncertainties into the analysis is extremely demanding in terms of computer time, especially for the full data set. In order to provide a quantitative estimate in this section we consider only the NNPDF31 data set with NNLO TMD evolution for the fit of DY data. Thus, we have performed a fit for each one of the 100 replicas of the NNPDF31 collinear distributions. The minimization of the χ 2 is done with a simplified procedure in order to speed up the computation, because for many replicas the search of χ 2 -minimum took much longer time in comparison to the central value minimization. It appears that the data is very demanding on the collinear PDF input. So, for some (distant from the central) replicas the fit does not converge (yielding χ 2 /N pt > 5) or produces extreme values of NP parameters (e.g. B NP < 0.7GeV). The values of NP parameters that run into the boundary of the allowed phase space region were discarded (almost 30% of total replicas). The resulting distribution of NP parameters gives an estimate of the sensitivity for PDF distribution. The NP parameters and their uncertainties that we have obtained are the following In fig. 8 we show the comparison of error-bands on the TMDPDF, obtained from the errorpropagation from the experiment to NP parameters (blue band), and from the PDF uncertainty (red band), as described above. The main difference in these bands is that the PDF-uncertainty band is sizable already at b = 0, and for larger b these bands expand in the similar amount. The PDF-uncertainty band is different for different flavors, and larger for non-valence partons. The resulting estimation for the (predicted) cross-section is shown in fig. 9. For the high energy case, the uncertainty is of order of 1%, while at low energies it reaches 20-40%.
The bands that we show here certainly do not accurately represent the uncertainties of TMD-PDF, since many of PDF replicas do not fit the data. It implies that the TMD distributions can be used as a tool for the restriction of collinear PDFs together with the standard collinear observables. At the current stage, we can only conclude that the uncertainties of TMDPDF at small-b (that are out of control in the current model) are sizable. For an accurate estimation of these errors one has to apply more sophisticated techniques, such as reweighing of PDF values [88] by TMD extraction, or even joint fits of TMD distributions and collinear distributions, which are beyond of the present work.

Fit of SIDIS
In this section, we use the unpolarized TMDPDF and TMD evolution, extracted in the fit of DY data, to fit the SIDIS data. The main aim is to test the universality of the TMD evolution, and TMDPDF. Namely, the SIDIS data should be easily fitted adjusting only the parameters of TMDFF. Indeed, we have find that the TMDFF in eq. (2.83) (with a 4-parameter ansatz) together with the TMDPDF and D (extracted from DY data) provide a very good description of the available SIDIS data. It is one of the main results of the present work that demonstrates the complete universality of TMD factorization functions. Its universality has been already tested in [21], where it was applied to the fit of pion-induced DY, and it has been used in studies of the TMD distributions with jets [89][90][91]. To our best knowledge, the test presented here is made for the first time, because in the previous studies DY and SIDIS cases were considered or independently or simultaneously [18]. Also we discuss the dependence on the collinear unpolarized FF, and the impact of power corrections.

Dependence on FF
In contrast to collinear PDFs, there are not too many extraction of collinear FFs. We have considered three sets of collinear FFs. Namely, DSS set 3 (that is a composition of pion FFs from [92] (DSS14) and kaon FFs from [93] (DSS17)), the JAM19 set [94] and the NNFF10 set [77]. All these extractions are made with NLO collinear evolution (a 2 s ). The comparison of fits with different FFs, and some of TMDPDF (together with TMD evolution) extracted in the previous section are presented in table 6. The NNFF set is not presented in the table due to the low quality of the predictions, as it is described below. As it is seen from table 6, the TMD factorization perfectly describes the low-q T SIDIS data with TMDPDF and TMD evolution fixed by DY measurements. It is one of the main result of the present analysis.
The values of χ 2 /N pt are rather small (e.g. 0.76 for combination of HERA20 & DSS), which may indicate an over-fit problem. However, this is not the case for the following reason. The main source of low-χ 2 is the COMPASS data set. The COMPASS data have very large uncorrelated systematic uncertainty for a great amount of points. Here, the systematic uncertainty is (much) larger than the statistical uncertainty, and therefore, the COMPASS data points form smooth lines with huge uncorrelated uncertainty band. As a result, the contribution of each point to the χ 2 -value is small.
The values of χ 2 depend on the input TMDPDF and TMD evolution (compare NNLO and N 3 LO cases) in a reasonable amount. This is mainly due to the different values of c 0 constant in these cases. We recall that the SIDIS measurements are made at much lower energy in comparison to DY, and thus they are more sensitive to D at large-b. Later in sec. 7 we show that in the joint fit of SIDIS and DY data, the uncertainty of the evolution factor at large-b is reduced.
The difference between DSS and JAM collinear FFs sets is of minor importance. It is due to the fact low-energy data are less sensitive to the small-b part of the TMD distributions (and thus to collinear distributions). Given in addition that the data are not very precise, the uncertainty in FF sets are compensated by the NP function D N P . The effect of compensation is clear from the very different values of η i constants for DSS and JAM19 set. Note, that in all cases we obtain a positive and sizable b 2 -term in D N P (parameter η 4 ). It could indicate a hidden issue in the values 3 We are thankful to R. Sassot for providing us the actual grids for DSS FFs.  of collinear FFs. However, we conclude that contrary to DY case, the SIDIS TMD data are not very restrictive on the values of collinear FFs. The NNFF distributions are not able to fit the data with a χ 2 /N pt better than ∼ 6.8. The reason of such an enormous discrepancy is obvious. The NNFF1.0 extraction is made from the eeannihilation data only [77], and thus is sensitive only to particular combinations of quark-flavors. The flavour separation is thus made a posteriori assuming exact iso-spin symmetry. As a result, the FF for sea quarks have very small (and even negative) values. In the processes where the production of a hadron is dominated by the sea-quark channel, the cross-section obtained with NNFF10 collinear FF is much smaller then the experimental one. A crystal clear example is the process p → K − , where both valence quarks of K − ,ūs, are sea-quarks for the proton, and thus the dominant channel is the production of K − from u and d quarks. However, FF for u and d-quarks in K − are negative in NNFF extraction, and the resulting cross-section appears to be negative as well. The situation improves, if we select only the processes with dominant valence channel, e.g. d → π ± , in this case we obtain χ 2 /N pt ∼ 2.2. The COMPASS measurement can be also considered separately with the NNFF1.1 set of FF for charged hadrons [76], in this case we obtain χ 2 /N pt ∼ 1.6. In any case, we have found that NNFF sets of FF are not suitable for the description of SIDIS data.
The uncertainties on NP parameters presented in table 6 are unrealistically small. Given the fact that the data is not very accurate, it indicates a significant underestimation of the uncertainty for TMDFF. We guess that the underestimation of uncertainties is caused mainly by the function bias of D N P . To resolve the situation one could use a more flexible ansatz, e.g. by inclusion of more NP-parameters. Unfortunately, this strategy is not very efficient. Already with the current set of parameters we have very low χ 2 , and the increase of the number of parameters could lead to an over-fit problem. Also, the computation time with a bigger number of parameters increases.

Impact of power corrections
Considering the expression for the SIDIS cross-section eq. (2.31) we distinguish four types of power corrections: (m/Q) the corrections due to non-zero produced mass, (M/Q) the corrections due to non-zero target mass, (q T /Q) the q T /Q-terms in the expression for cross-section and (x S z S ) the q T /Q-terms in the expressions for x S and z S , eq. (2.10). In order to test the impact of these corrections, we have performed the (central value) fits including corrections in different combinations. The resulting values of χ 2 /N pt are reported in table 7.
Let us summarize the observations: • Produced mass corrections. The produced mass-corrections are not necessary extremely small, as it is typically assumed. These corrections appear in the ratio with other kinematic variables through the variable ς 2 , eq. (2.20). In most part of data bins the value of ς 2 is negligible, ς 2 ∼ 10 −3 , but for some low-energy and low-z bins it can reach ς 2 ∼ 10 −2 . For example, the HERMES bin with 0.2 < z < 0.4, 0.2 < x < 0.35 with produced kaon has ς 2 ∼ 0.04. As it is clear from table 7, current data are not sensitive to these corrections. The difference in χ 2 /N pt is of the order 10 −3 .
• Target mass corrections. The target mass corrections appear through the variable γ 2 in eq. (2.20) and at low Q it has a rather significant size, e.g. for some bins in HERMES data γ 2 ∼ 0.13, for some bins in COMPASS data γ 2 ∼ 0.06. Therefore, one can expect up 10% impact of γ 2 for certain bins. Note, that the dependence on γ 2 is non-linear and is different in different edges of the bin. Checking the values in table 7, we observe that the target mass correction produces a small but visible effect on the fit quality especially for HERMES data where the change in χ 2 /N pt is 1.09 → 1.24.
• q T /Q correction in kinematics. This correction cannot be large due to the cuts on the data sets that we have performed. For q T ∼ 0.25Q which is the highest value of q T that we have considered, we can have (q T /Q) 2 ∼ 0.06. In addition, the first correction of this type to the cross section is linear in (q T /Q) 2 and it can be easily compensated by a change of the non-perturbative parameters in D NP . Indeed, we observe that the impact on the χ 2 is small.
• q T /Q correction in x S and z S . For q T ∼ 0.25Q (which is the maximum considered q T ), the difference between exact x S and x is ∼ 0.06, and much smaller between z S and z. Nonetheless, this correction changes the shape of the cross-section in a way that is difficult to compensate by NP parameters. Thus, the inclusion of this correction visibly improves the agreement. Let us note that the same conclusion has been made for the DY case, in sec. 5.2.
We conclude that the impact of each individual correction is rather small, but the inclusion of any of them improves the agreement between theory and data. Most relevant effects are the target mass correction and the ones due to x S and z S . Accounting of all effects simultaneously leads to a qualitative improvement in χ 2 -values. We also admit that the inclusion of power correction considerably affect the values of parameters η (especially η 1,4 ). The values of parameters η 1,4 varies in the range (−10, +25)%. The values of parameters η 2,3 varies in the range (−4, +8)%. It shows that our estimation of uncertainties on parameters presented in table 6 are extremely underestimated. Possibly, the main source of underestimation is the bias of our model, which is not surprising since we have only 4 parameters for all partons flavors and particle kinds. The tests of power corrections suggest that the real errorband on the extracted TMDFF is an order of magnitude larger.

Limits of TMD factorization for SIDIS
In ref. [19] we tested the limits for TMD factorization using the DY data, showing that the natural limit of the leading power TMD factorization is δ 0.2 − 0.25, where δ = q max T /Q and q max T is the maximum value of the transverse momentum in the data sets included in the fit. We have tested the same boundary using the SIDIS data and the result of the global fit (presented in the next section) evaluating the value of χ 2 (without minimization) for different selections of SIDIS data. We have considered two possible cuts on data selection Q > 1 and Q > 2 , eq. (3.1), and the result is shown in fig. 11.
The values of χ 2 /N pt grow when δ > 0.25. The same effect has been observed in ref. [19] for DY. Therefore, we conclude that our earlier estimation for limits of TMD factorization as δ 0.2 − 0.25 holds also in the SIDIS case. It is interesting to observe that the channel with the fastest grow of χ 2 /N pt is d → K − (and the next is p → π + ), which could indicate a possible tension in the description of this reaction.
The inclusion of data at Q < 2GeV almost doubles the values of χ 2 /N pt (e.g. χ 2 /N pt = 1.19 for δ = 0.25). Taking into account the large uncertainties of the COMPASS measurement, it shows that the factorization is broken down at such low values of Q. This is an expected result, since in this region the power corrections dominate the cross-section. In sec. 7.1, we show data and our predictions including the low-Q bins and up to δ = 0.4.

Global fit of DY and SIDIS data
The fit of SIDIS data shows the perfect universality of the TMD distributions and a good agreement between theory and experiment. Performing a global fit of DY and SIDIS data we essentially reduce the uncertainty for D. The resulting sets of TMDPDF, TMDFF and RAD extracted from the global fit represent the SV19 TMD distributions (at NNLO and N 3 LO). As an input for this set we have used NNPDF31 and DSS collinear distributions, because these sets are in good agreement with the global set of collinear observables, and show the best values of χ 2 (see discussions in the previous sections). Table 8 shows the distribution of the χ 2 -values per individual experiments. In total we have considered 1039 points, 457 for DY and 582 for SIDIS. They form three large subsets: DY at high energy, DY at low energy, and SIDIS (at low energy). The worst χ 2 values are concentrated in the high energy DY subset, because of the very high precision of Z-boson production data measured at LHC. Simultaneously, these data robustly restrict the values of TMD distributions (TMDPDF and RAD) at b 1GeV −1 . The lowest χ 2 is for SIDIS data and especially for COMPASS data (χ 2 /N pt = 0.65, with N pt = 390 that is more than the third part of the total data), due to large uncorrelated systematic uncertainty (see discussion in sec. 6.1).

Agreement between theory and data
Altogether we obtain the global value of χ 2 /N pt = 0.95 and 1.06 for NNLO and N 3 LO respectively. These values can be compared to 1.55 (for N pt,total = 8059) and 1.02 (for N pt;SIDIS = 477 that is close to our data selection) obtained in the global fit of DY and SIDIS in ref. [18]. The increase of χ 2 /N pt between NNLO and N 3 LO cases does not indicate a reduction of the fit quality. This change in χ 2 happens mainly because of COMPASS data, for which the χ 2 -value increase 0.65 → 0.85. On contrary the χ 2 -value for ATLAS data reduces 2.12 → 1.82 (mostly due to the improvement in the total normalization). Therefore, we conclude that both NNLO and N 3 LO fits are in agreement, although N 3 LO shows a better agreement with high-energy data.
In table 8 we also present the values of the difference in the normalization between theory and data due to the correlated shift (see definition in (4.18)). The measurements in the this value (e.g. CMS) are normalized to the total cross-section. Note, that the shift value is common to the full data subset (e.g. for all 195 point of COMPASS d → h + ). Some final consideration on each data sets are: • The high energy DY data have a common deficit of 2-5% in the normalization, which has been already observed in [20]. It can be caused by different sources, being the main ones the collinear PDF (e.g. in the case of HERA20 PDF the deficit is much smaller, 0-3%). Another source is the presence of corrections linear in q T due to fiducial cuts, discussed in sec. 2.2.3. This deficit is responsible for a larger value of χ 2 for this sub-set. The nuisance parameter decomposition for high energy DY is 1.51 = 1.28 + 0.23, where the last number is the penalty contribution to χ 2 due correlated uncertainties.
• The low energy DY data is significantly underestimated by the TMD factorization formula. However, this underestimation is within the expected correlated systematic uncertainties of the data. This is a known issue of fixed target experiments. The underestimation has been also observed for the pion-induced DY [21] (E615 and E537 experiments), and for the same low-energy DY experiments (E228 and E605) in ref. [95]. Note, that in ref. [95] the high-q T part of the measurements has been considered (in collinear factorization), and the observed discrepancy is an order of magnitude larger. Also, the present fit has somewhat lesser deficit in the normalization (by 5 − 6%) in comparison to previous one [20]. We connect it to the corrected shape of ζ-line at large-b.
• SIDIS data do not show any problem with the total normalization. This statement is in some contradiction to the literature. In [18] the authors report a significant contribution of normalization to χ 2 from the HERMES data (the COMPASS data was normalized exactly). In ref. [96] an enormous discrepancy between theory and data in the collinear factorization limit has been observed too.
In figures 12-21 we present all the data points used in the fit together with the theory prediction lines. In these figures we also show the data points that were not included in the fit due to the cutting conditions eq. (3.1, 3.2, 3.3) in order to demonstrate the behavior of TMD factorization beyond its limits.

Values of NP parameters
The extracted values of NP parameters are in the table 9. The central values of parameters do not shift much with respect to individual fits of DY and SIDIS data. The main effect of the global fit is the reduction of uncertainties for RAD and TMDPDF by ∼ 40−50%. In figures 6, 7 and 10 we show the values of NP parameters obtained in all fits of this work. Generally, the parameters obtained in different fits are in agreement, except λ 3,4,5 that mainly serve for the fine-tune of TMDPDF to LHC data.
All NP parameters are correlated. The correlation matrices for NP parameters are shown in fig. 22. Ideally, one would expect the complete independence of NP parameters contributing to RAD, TMDPDF and TMDFF. In this case the correlation matrices would have a block-diagonal form. In reality, we observe the correlations between the blocks related to independent functions. In the case of NNLO these correlations are not large, and the block-diagonal structure is evident. The biggest (anti-)correlation is between c 0 and λ 1 , with the correlation matrix element −0.67, with the rest being much smaller. The source of this correlation is evident -it is due to the precise Z-boson production measurements by LHC. In the N 3 LO case the correlation are much stronger. The biggest (anti-)correlation again is between c 0 and λ 1 , with correlation matrix element −0.82, with some other elements reaching ±0.5 and it indicates a possible tension in our description of the data at N 3 LO.

Comments on the extracted TMD distributions
The non-perturbative distribution extracted in this work show several features that are interesting for theory investigations. For instance, the RAD that measures the properties of the soft gluon exchanges and that is inclusively sensitive to the QCD vacuum structure. The factorization theorem ensures that the values of B NP and c 0 are totally uncorrelated from the rest of TMD parameters, because they are of complete different origins. As we have an extraction of these parameters from data we can expect that a certain correlation is re-introduced in the fitting process. In fig. 22 we check this statement on the present global fit and we find that it is qualitatively verified in our DY+SIDIS fit. In the figure the only non-perturbative parameters which show a higher (anti)correlation with the RAD are to c 0 and λ 1 in the TMDPDF. Apart from this, the independence of the RAD parameters from the rest of TMD is certainly a success of the ζ-prescription, which allows a clear separation of all these effects. In the rest of this section we report some considerations specifically for each of the functions that we have extracted.  Table 9. Values of χ 2 and NP parameters obtained obtained in the global fit of DY and SIDIS data. The collinear distributions are NNPDF31 and DSS.

Non-perturbative RAD
In fig. 23 (left) we plot the RAD as a function of b with its uncertainty band. We present only the RAD extracted with NNPDF31 fits, but the picture does not change significantly for all other PDF sets. In this figure we can test the universality of the RAD looking at its extraction in DY and DY+SIDIS. At small b the perturbative structure of the RAD dominates and we find practically no difference in its behavior as coming from different fits. The difference between these two cases happens at large b and it is at most of 10%. The 1σ-uncertainty bands of DY and global fit do not strictly overlap, which possibility indicates their underestimation.
In the same fig. 23 (left) we also compare our RAD with the one obtained in [18] and [19]. In refs. [18,19] a different shape of NP ansatz for RAD has been used, with a quadratic behavior at large-b. Such an ansatz has been used often, and (as we have also checked) it is able to describe the data. Nonetheless we disregard it because the global χ 2 /N pt is worse (1.11 and 1.34 at NNLO and N 3 LO, correspondingly), with much larger correlation between parameters. Additionally, the linear asymptotic behavior used in our ansatz is supported by non-perturbative models. Possibly, the uncertainty band is biased by this model, and the realistic band is larger by a factor two at most.
In fig. 23 (right) we show the scattering of replicas in (B NP , c 0 )-plane collected from all fits. It is clear that the parameters B NP and c 0 are strongly anti-correlated (see also fig. 22) and this is a consequence of the non-perturbative model, since the variation of c 0 can be compensated by a variation of B NP up to b 4 -corrections. The replicas of the global fit (orange points) are scattered in a much smaller area and this provides a ∼ 40% smaller error-bands on parameters. Generally, the inclusion of the SIDIS data drastically constraints the values of B NP , and for that reason they are very important for the determination of RAD. We conclude that the RAD extracted in the global fit is more reliable, in comparison to the one done using DY data only.
The RAD that we have extracted is valid for all distributions and it has been used also to describe the pion-induced DY [21]. For further reduction of the uncertainty of the RAD one should   Unpolarized SIDIS multiplicities (multiplied by z 2 ) for production of positively charged hadrons off deuteron measured by COMPASS in different bins of x, z, Q and pT . Solid (dashed) lines show the theory prediction at NNLO (N 3 LO). Filled (empty) point were (not) included in the fit of NP parameters. For clarity each pT bin is shifted by an offset indicated in the legend. The continuation of the picture is in fig. 21. consider more precise low-and intermediate-energy processes, such as up-coming JLab12 measurements, and the future EIC.

TMD distributions
The quark TMDPDF and TMDFF are extracted simultaneously including high QCD perturbative orders for the first time to our knowledge. The non-perturbative parameters obtained using the PDF set NNPDF31 and the fragmentation set DSS are reported in table 9. Within one set of PDF the error induced from the PDF replicas dominates the experimental error of TMD. Thus, the error that we have reported on TMD parameters is certainly underestimated. To determine a realistic uncertainty band , one must invent a flexible ansatz for NP-part of TMD distributions that does not contradict the known theory. It appears to be a non-trivial task, which we leave for a future study. Unpolarized SIDIS multiplicities (multiplied by z 2 ) for production of negatively charged hadrons off deuteron measured by COMPASS in different bins of x, z, Q and pT . Solid (dashed) lines show the theory prediction at NNLO (N 3 LO). Filled (empty) points were (not) included in the fit of NP parameters. For clarity each pT bin is shifted by an offset indicated in the legend. The continuation of the picture is in fig. 21.
The TMD distributions show a non-trivial intrinsic structure. An example of distributions in (x, b)-plane is presented in fig. 24. Depending on x the b−behavior apparently changes. We observe (the same observation has been made in ref. [18]) that the unpolarized TMDFF gain a large b 2 -term in the NP part. It could indicate a non-trivial hadronisation physics, or a tension between colinear and TMD distributions. The study of its origin should be addressed by future studies.

Conclusion
Standing the TMD factorization of DY and SIDIS cross-section, one identifies at least three nonperturbative QCD distributions in each cross-section -two TMD parton distributions and a nonperturbative rapidity anomalous dimension (RAD). These functions should be extracted from the experimental data. Given such a large number of phenomenological functions, their universality     Example of extracted (optimal) unpolarized TMD distributions. The color indicates the relative size of the uncertainty band plays a crucial role. In this work, we have shown that the TMD distributions and RAD are indeed universal functions.
In order to confirm the universality statement, we have firstly extracted the RAD (D) and the unpolarized TMDPDF (f 1 ) from the DY data, and secondly we have used them to describe the SIDIS data (extracting in addition the unpolarized TMDFF, D 1 ). To our best knowledge, this is the first clear-cut demonstration of the universality of the TMD non-perturbative components. This demonstration is the main result of this work. The subsidiary results are the values of extracted unpolarized TMD distributions and RAD, that could be used to predict and describe the low-q T spectrum of current (LHC, COMPASS, RHIC) and future (EIC, LHeC) experiments.
The sets of data included in this analysis contain in total 1039 points (almost equally distributed between SIDIS, 582 points, and DY, 457 points). We have the data from fixed target DY measurements, Tevatron, RHIC, LHC, COMPASS, and HERMES. Unfortunately, only low-energy measurements are available for SIDIS data. At the moment, we have not included any data from HERA multiplicities because they do not accomplish the kinematical requirements for the TMD factorization. Contrary to some observations in the literature [14,18], we have not found any problem with the normalization of HERMES and COMPASS data, although the systematic experimental errors quit precision to the final result.
The data analysis is made with the current theory state-of-art, including all known perturbative QCD orders, i.e. N 3 LO for the hard part and the evolution, and NNLO for the collinear matching. The NNLO and N 3 LO predictions are very close to each other, which is a good signal indicating that the perturbative part of the cross-section is saturated. We have also collected all recent modifications and updates of the TMD factorization approach, such as target-mass corrections, frame-corrections, and exact evolution solution at large-b. Individually these aspects are subtle, however, cumulatively, they are sizable. In sec. 2 we have presented a comprehensive collection of theory expressions used in this work. Let us also mention that the N 3 LO evolution, as well as a non-trivial QCD matching for TMDFF (NNLO vs. LO) is used here for the first time.
The scales definition and the evolution/modeling separation is done according to ζ-prescription. The ζ-prescription is equivalent to the popular CSS-scheme since it satisfies the same set of differential equations. Nonetheless, this equivalence is strict only within an all-order perturbation theory and it is numerically violated for any truncated series. The origin for this discrepancy is well-understood [11] -it comes from spurious contributions in the CSS formalism that vanish in the exact perturbation theory. At LO and NLO, the numerical value of spurious contributions is large, but it is tiny at N 3 LO [11]. Therefore, the ζ-prescription provides a faster convergence and the better stability of the perturbative series that is shown in fig. 3. Additionally, but not less important, the ζ-prescription grants a strict separation of perturbative and non-perturbative pieces and thus allows a stronger universality of the phenomenological functions, fig. 22. In particular, the RAD extracted here can be used in the analysis of the jet-production [89][90][91]. Preliminary lattice results are also in qualitative agreement with the RAD in ζ-prescription [97]. The success of the present global fit confirms the reliability of the ζ-prescription.
Many points of the TMD phenomenology are discussed quantitatively for the first time (to our best knowledge). We critically consider each detail of the factorization that have a disputable nature, f.i. power corrections to collinear variables. We demonstrated that the inclusion of these details improves the agreement between theory and the data. A particularly important check made here for the first time is the test of the limit of the TMD factorization approximation for SIDIS. In the DY case, the phenomenological limit of TMD factorization is q T 0.25Q, as it has been shown in ref. [19]. We have found that SIDIS also obeys this rule. It is important information since it opens the door for reliable predictions for SIDIS cross-section.
The estimation of the uncertainty for extracted distributions is made by the replica method that gives a reliable error-propagation of experimental errors. On top of it one should include the uncertainty of other theoretical ingredients, and in particular the collinear PDF error. We have checked that the prediction of the TMD factorization is crucially sensitive to the values of collinear PDFs. It indicates that our extraction has a considerable additional uncertainty due to the uncertainty of the collinear input. However, we were not able to accurately quantify the size of this uncertainty band, due to the high computational costs of such analysis. We leave this study for the future. program under grant agreement No 824093 (STRONG-2020). The research was supported by DFG grant N.430824754 as a part of the Research Unit FOR 2926.

A Expression for |C V |
The hard coefficient for TMD factorization formula is a product of hard matching coefficient for quark current |C V | 2 . At NLO hard matching coefficient reads C V (q, µ) = 1 + a s C F − ln 2 −q 2 µ 2 + 3 ln −q 2 µ 2 − 8 + π 2 6 + O(a 2 s ). (A.1) The expression for NNLO can be found f.i. in ref. [98,99], and at N 3 LO in [29]. The hard coefficient for SIDIS and DY kinematics are different only by the sign of q 2 that is space-like (times-like) for DY (SIDIS). Since in our work µ 2 = Q 2 the expression simplifies. In particular, in the DY kinematics the logarithms turns to ln(−1) = ±iπ, whereas in the case of SIDIS logarithm vanish. The NNLO expression for DY hard coefficient is where C F (= 4/3) and C A (= 3) are quadratic Casimir eigenvalues of fundamental and adjoint representation of SU (3). N f is the number of active quark flavors. The NNLO expression for SIDIS hard coefficient is One can see that the difference between SIDIS and DY hard coefficients is +π 2 a 2 s C F C F 32 − These corrections are known as (π 2 a s ) n -corrections. They could be resummed to all orders [100]. However, in the case of vector-boson production the correction coming from (π 2 a s ) n is not significant (of order of next-to-given order correction [100]).

B Perturbative expression for D
The rapidity anomalous dimension (RAD) D(µ, b) is generally non-perturbative function, which can be computed perturbatively only at small-b, see e.g. [23,27] for NNLO and N 3 LO computations. RAD satisfies the integrability condition (2.67) which can be seen as renormalization group equation, Consequently, the a n s -order of RAD contains logarithms up to order L n µ . We define D pert (µ, b) = ∞ n=1 a n s (µ) where d (n,k) are numbers, and The values for d (n,k) for k > 0 can be computed from (B.1) in the terms of Γ i , β i and d (n,0) . Here, and in the following we define beta-function, and coefficients of Γ cusp as The leading terms are β 0 = 11 3 C A − 2 3 N f , Γ 0 = 4C F , and d (1,0) = 0. The values of d (2,0) and d (3,0) were computed in [4,23] and [26,27] correspondingly. The values of β-function coefficient are known up to β 4 [30], the values of Γ i are known up to Γ 3 -order for the quark case, see [31][32][33] and references within.
The series (B.1) has a small convergence radius since the expansion variable (a s L µ ) is fastly became bigger then 1 with the increase of b. To improve the convergence properties of RAD we use the resummed expression [11,38,49]. In this case we write D resum (µ, b) = ∞ n=0 a n s (µ)d n (X), X ≡ β 0 a s (µ)L µ . (B.5)

C Expression for ζ µ
The concept of the special null-evolution line plays the central role in ζ-prescription. The ζprescription, the double evolution and properties of TMD evolution have been elaborated in ref. [11].
In this appendix, we present the expressions for the special null-evolution line ζ pert and ζ NP that were used in the fit. The definition of the special null-evolution line is discussed in the sec. 2.3.1. Parameterizing an equipotential line as (µ, ζ µ (b)), one finds the following equation The special null-evolution line is the line that passes thorough the saddle point (µ 0 , ζ 0 ) of the evolution field. The saddle point is defined as Note, that this boundary condition guarantees the finiteness of ζ µ (b) at all non-singular values of µ.
Originally the ζ-prescription has been implemented in the perturbative regime only [19]. This part is the most important since it gives the cancellation of double-logarithms in the matching coefficient. However, at large-b, non-perturbative corrections to RAD become large and could not be ignored (although they can be seen as a part of NP model, but it introduces an undesired correlation between f N P and D). Therefore, one have to solve equation (C.1) with a generic non-perturbative RAD. Such solution can be found but its numerical implementation is problematic at very small-b. The problem is that it is very difficult to obtain the cancellation of perturbative logarithms at exact solution because at b → 0 the numerical values of logarithms are huge. Therefore, a good practice is to use the perturbative solution at very small-b, (and hence cancel all logarithm exactly) and turn to the exact solution at larger b. This is implemented in the ansatz eq. (2.89).
In the following sections we provide expressions for ζ exact µ and ζ pert µ that were used in the fit procedure.

C.1 Perturbative expression
The perturbative solution eq. (C.1) is conveniently written as where v(µ, b) = ∞ n=0 a n s (µ)v n (L µ ), (C. 4) with L µ defined in eq. (B.3). The general expression for v n can be found in [11] (see eq. (5.14)). We apply the boundary condition eq. (C.2), which in the perturbative regime turns into requirement of finiteness of v n at L µ → 0. The values of v n up to NNLO are The definition of perturbative coefficients is given in eq. (B.2, B.4) and γ V (µ) = ∞ n=1 a n s (µ)γ n . (C.8) Similarly, to RAD the ζ pert µ (b) can be resummed in terms of a s L µ (see appendix A in [11]). However, it is not necessary in our ansatz eq. (2.89) because the perturbative expression turns into exact much before the problems with convergence occur.

C.2 Exact expression
The evolution field, the equipotential line ζ µ and the position of the saddle point (µ 0 , ζ 0 ), depends on values of b, which is treated as a free parameter. It causes certain problems in the implementation of the ζ-prescription exactly. The lesser problem is that additional numerical computations are required to determine the position of saddle-point and the values of the line for different nonperturbative models of D. The greater problem is that at larger b the value of µ 0 decreases and at some large value of b (typically b ∼ 3GeV −1 ) µ 0 is smaller than Λ QCD . Due to this behavior, it is impossible to determine the special null-evolution line at large-b numerically. However, the special null-evolution line is still uniquely defined by the continuation from smaller values of b.
In ref. [21] a simple solution of this problem has been found. The central idea is to use the non-perturbative RAD as a generalized coordinates (a s , D) instead of the natural arguments (µ, b). We introduce the function g as The equation (C.10) can be solved exactly, but the application of boundary condition eq. (C.11) requires the solution of functional equation with transcendental functions.
On another hand, the values of a s used in the ζ-prescription are always small, since they are evaluated at the hard scale Q, see (2.72). Therefore, it is natural and numerically accurate to consider the expansion in a s . Note, that such an expansion already incorporates the nonperturbative corrections exactly. Denoting g(a s , D) = 1 a s 2β 2 0 Γ 0 ∞ n=0 a n s g n (D), (C.12)