Non-perturbative structure of semi-inclusive deep-inelastic and Drell-Yan scattering at small transverse momentum

We consider semi-inclusive deep inelastic scattering (SIDIS) and Drell-Yan events within transverse momentum dependent (TMD) factorization. Based on the simultaneous fit of multiple data points, we extract the unpolarized TMD distributions and the non-perturbative evolution kernel. The high quality of the fit confirms a complete universality of TMD non-perturbative distributions. The extraction is supplemented by phenomenological analyses of various parts of the TMD factorization, such as sensitivity to non-perturbative parameterizations, perturbative orders, collinear distributions, correlations between parameters, and others.

In this article, we consider the unpolarized observables that have the simplest structure and are accessible in a relatively large number of experiments. They allow us to extract the quark unpolarized TMD distributions and the non-perturbative part of TMD evolution. In the literature one can find many extractions of these elements within various schemes [13][14][15][16][17][18][19][20][21][22]. The distinctive feature of this work is the simultaneous study of two kinds of reactions: DY and SIDIS. Previously, a global fit of both processes has been attempted only in ref. [18]. We demonstrate that the global description of both processes is straightforward and does not meet any obstacle. The description is based on the latest theory developments, such as next-to-next-to-leading order (NNLO) and N 3 LO perturbative parts together with ζprescription. In addition, we make a special emphasis on some topics not so often discussed in the literature, that is, universality and theory uncertainties of the TMD.
The factorization theorem declares that the TMD non-perturbative parts have a certain degree of universality, as explained in the following: a) the evolution kernel is the same for all processes where the TMD factorization theorem is valid; b) the TMD parton JHEP06(2020)137 distribution functions (TMDPDF) are the same in DY and SIDIS experiments. Testing universality needs an analysis of different types of experiments at the same time. Although the universality is a cornerstone of the approach, we have not found any dedicated phenomenological study in the literature. In order to check and proof universality properties of the TMD approach, we perform an analysis in three steps: I. Firstly, we consider only the DY measurements, and analyze TMDPDF f 1 (x, b) and rapidity anomalous dimension (RAD), D(µ, b). The DY data sets have a vast span in x and Q, therefore, it is possible to extract f 1 (that dictates the x-dependence of the cross-section) and D (that dictates the Q-dependence of the cross-section) without a significant correlation between these functions. This analysis is conceptually similar to the previous work [20], albeit some improvements.
II. Using the outcome of the previous step (D and f 1 ), we consider the SIDIS measurements and extract the TMDFF, D 1 . Assuming the universality of TMD distributions, one should be able to describe the SIDIS cross-section with a single extra function D 1 . This is a non-trivial statement since the SIDIS cross-section has 4-degrees of freedom, and only two of them are affected by D 1 . Additionally, the present SIDIS data are concentrated in a range of small-Q that is unreachable for DY experiments.
III. Finally, we perform a simultaneous fit of DY and SIDIS data. Given the excellent quality of the separate DY and SIDIS fits, this stage provides only a fine-tune of non-perturbative parameters as well as a consistency check of previous step II.
These three independent analyses provide a consistent and congruent picture of the TMD factorization and allow the extraction of three non-perturbative functions (unpolarized quark TMDPDF, TMDFF and quark evolution kernel). We find that our results are in full agreement with the depicted scenario, which gives a solid confirmation of the declared universality. On top of the described test of universality and the extraction of TMD distributions, in this work we perform many additional studies of the TMD approach, some of which should be better addressed elsewhere: we test the phenomenological limits of the TMD factorization for SIDIS; we check the dependence of the TMD prediction on the collinear inputs; we perform an overall test of the impact power suppressed contributions to the TMD factorization; we check the impact of experimental constraints on the final phase space configurations (like fiducial cross sections and lepton cuts at LHC, bin shapes in HERMES kinematics). Altogether the tests can form a comprehensive picture of TMD factorization and its accuracy. We have observed that the impact of some input uncertainties, f.i. the ones from collinear PDF, to the prediction is unlucky large. Still, we restrict ourself to the indication of problematic issues, leaving it as an invitation for the further developments in the future.
The theoretical work done in recent years for the development of the elements of TMD factorization has been noticeable. Significant efforts have been committed in the perturbative calculations for TMD distributions at small-b [23][24][25][26][27][28][29]. Together with the N 3 LO results for universal QCD anomalous dimensions [30][31][32][33][34], it leads to an extremely JHEP06(2020)137 accurate perturbative input. The consistent composition of all elements is made employing the ζ-prescription [11,21]. The ζ-prescription is essential for current and future TMD phenomenology because it grants a unified approach to observables irrespectively of the order of perturbative matching. So, the collinear matching procedure that is fundamental for resummation approaches (such as in refs. [4,[35][36][37][38][39][40]) or b * -like prescriptions (such as in refs. [5,18,41,42]), is considered just as part of the model for a TMD distribution in the ζ-prescription. Therefore, unpolarized TMD distributions (extracted in this work with NNLO matching) and the TMD evolution (extracted in this work with NNLO and N 3 LO matching) are entirely universal and could be used for the description of other processes, where the matching is not known at such a high order.
Given the number of details needed for the presentation of this work, we split the discussions into almost independent parts. The first part, section 2, contains the description of the TMD factorization theorem for unpolarized DY and SIDIS cases. In this section, we articulate all relevant formulas, including a lot of small corrections and details that we have not found mentioned in previous literature. This part provides a comprehensive collection of theory results, which can be useful for comparison with other works and future tests, and it can be seen as a theory review. Some of the issues reported here are expected to be addressed in separate works. Section 3 is devoted to the review of the available SIDIS and DY data suitable for unpolarized TMD phenomenology. Section 4 presents the details of the comparison of the theory expression with the experimental data. It contains the definition of χ 2 -test, the interpretation of the experimental environment, and some details of the numerical implementation that is made by artemide package [43,44]. The following sections 5, 6 and 7 describe the fit program outlined earlier, and they are devoted to DY(only), SIDIS(only), and DY and SIDIS(together) fits. Each of these sections contains several subsections describing the specific impact of each process on TMD extraction. Finally, we collect the information on the resulting NP functions in section 8.

Cross sections in TMD factorization
In this section, we present in detail the cross sections of SIDIS and DY processes in TMD factorization, skipping their derivation that can be found in refs. [1][2][3][4][5][6][7][8][9][10][11]. The main purpose of this section is to collect all pieces of information about theoretical approximations and models that are used in the fit procedure.

SIDIS cross-section
The (semi-inclusive) deep-inelastic scattering (SIDIS) is defined by the reaction (l) + H(P ) → (l ) + h(p h ) + X, (2.1) where is a lepton, H and h are respectively the target and the fragmenting hadrons, and X is the undetected final state. The vectors in brackets denote the momenta of each particle. The masses of the particles are P 2 = M 2 , p 2 h = m 2 , l 2 = l 2 = m 2 l 0. (2.2) In the following, we neglect the lepton masses, but keep the effects of the hadron masses.

JHEP06(2020)137
Approximating the interaction of a lepton and a hadron by a single photon exchange, one obtains the differential cross-section with q = l − l being the momentum of the intermediate photon.

Kinematic variables for SIDIS
The formulation of the factorization theorem in SIDIS is done in the hadronic Breit frame (alternatively, we can call it "the factorization frame"), where the momenta of hadrons are almost light-like and back-to-back. The light-like direction to which the hadrons are aligned defines the decomposition of their momenta, with n 2 =n 2 = 0, (nn) = 1. Here, we have also introduced the common notation of a vector decomposition v µ = v +nµ + v − n µ + v µ T , v + = (nv), v − = (nv), (nv T ) = (nv T ) = 0. (2.6) The transverse component of a vector is extracted with the projector v µ T = g µν T v ν , g µν T = g µν − n µnν −n µ n ν . (2.7) We also use the convention that the bold font denotes vectors that have only transverse components. So, they obey Euclidian scalar product: For the SIDIS cross-section one introduces the following scalar variables: , y = (P q) (P l) , z = (P p h ) (P q) . (2.9) In the experimental environment one typically measures the transverse momentum defined as the one of the produced hadron with respect to the plane formed by vectors q and P .

JHEP06(2020)137
The projector corresponding to these transverse components is given by the tensor g µν ⊥ defined as In what follows, we denote the transverse components of v µ in the factorization frame as v µ T , see eq. (2.7), while transverse components projected by g ⊥ are v µ ⊥ . In order to describe the target-and produced-mass corrections, it is convenient to use the following combinations The definition of ς 2 ⊥ in eq. (2.11) contains p 2 h⊥ = p hµ p hν g µν ⊥ . The measured transverse momentum p ⊥ is different from the one defined in TMD factorization. In fact, the transverse momentum used in the factorization theorem, q T , is defined with respect to the hadron-hadron-plane and the corresponding transverse components are extracted by the tensor g µν T in eq. (2.7). In terms of hadron momenta the tensor g T reads Using the projectors in eq. (2.10) and eq. (2.12), it is straightforward to derive the relation between q 2 T = q µ q ν g µν T and p 2 ⊥ = p µ h p ν h g ⊥,µν : Using these definition we can rewrite the elements of the SIDIS cross-section formula in terms of observable variables. The differential volumes of the phase space are where ψ is the azimuthal angle of scattered lepton, and ϕ is the azimuthal angle of the produced hadron. In the following we find important to introduce the variables x S and z S , that are the collinear fractions of parton momentum which include kinematic power corrections, where we have used the variable q 2 T (2.13) for simplicity. The kinematic corrections presented above are usually small when Q M, m. In this case the relation between observed and factorization variables simplifies Notice that the data of SIDIS at our disposal are taken at energies comparable with hadron masses and thus target mass correction could be significant. The contributions dependent on hadron masses could in principle be classified as power corrections. However we consider more appropriate to distinguish these corrections from others of different origin. Thus we will not use the approximate formulas in eq. (2.18). The phenomenological test of this assumption is given in section 6.2.

Factorization for the hadronic tensor in SIDIS
In this work we consider the transverse momentum dependence of the cross section which is factorizable in terms of transverse momentum dependent (TMD) distributions in the limit of q T Q, where q T is defined in eq. (2.13) and Q is the di-lepton invariant mass. We refer to the literature about the proof of factorization of the processes related to this work [1,[4][5][6][7][8][9][10]. In order to specify the properties of the TMD distributions and the factorized hadronic tensor, we start fixing the basic notation.
For unpolarized hadrons, the factorized hadronic tensor and in its complete form reads where the index f in the sum runs through all quark flavours (including anti-quarks), e f is a charge of a quark measured in units of e. The function C V is the matching coefficient for vector current to collinear/anti-collinear vector current and the factorization (µ) and rapidity (ζ) scales typical of the TMD factorization are shown explicitly.

JHEP06(2020)137
The unpolarized TMDPDF and TMDFF from partons of flavor f are defined as Here, W v (x) are Wilson lines rooted at x and pointing along vector v to infinity. In the case of SIDIS, the Wilson lines in TMDPDF(TMDFF) points to future (past) infinity. The functions h ⊥ 1 and H ⊥ 1 are Boer-Mulders and Collins functions respectively and they are defined as where µν T = +−µν . In formulas (2.20)-(2.23) we have omitted for brevity the obvious details of operator definitions, such as T (T )-ordering, color and spinor indices, rapidity and ultraviolet renormalization factors.
Boer-Mulders and Collins functions in eq. (2.22), (2.23) do not contribute to the angle averaged cross-section, but they can appear when cuts on phase space distributions of final particles are introduced by the experimental setup. In this work we will not consider these effects, and leave their study for the future (see discussion in section 2.3).
The TMD distributions depend on b 2 only. Therefore, the angular dependence can be integrated explicitly with the result where

JHEP06(2020)137
The functions W f ab are dimensionless and scale-independent functions. The experimental configurations are not usually provided in the factorization frame, and the correspondence between the measured quantities and the ones that appear in the factorization theorem is often non-trivial. It happens in fact, that a Lorentz transformation affects the power corrections to the cross section presented here. We detail this in the next sections.

Leptonic tensor in SIDIS
The leptonic tensor for unpolarized SIDIS is (2.27) In order to express the convolution of the leptonic tensor with a hadronic tensor we define the azimuthal angle of a produced hadron as [2]: and we define As the result we obtain The kinematical rearrangements of the variables produce the appearance of the cos φ and cos 2φ terms in the second line of eq. (2.29), that is, there are contributions to the structure functions F cos φ U U and F cos 2φ U U , see also [45]. Similarly, the convolution of lepton tensor with the spin-1 part produces also contribution to the cos φ and cos 2φ parts. The terms ∼ p 2 ⊥ /Q 2 in eqs. (2.29), (2.30) can be modified by power corrections to TMD factorization, see discussion in section 2.3.

SIDIS cross-section in TMD factorization
Combining together the expressions for the cross-section in eq. (2.3), the differential phase-space volume in eq. (2.14), the hadronic tensor in eq. (2.24), the leptonic tensor in eq. (2.29), (2.30), and integrating over the azimuthal angles we obtain dσ dxdzdQ 2 dp 2 where x S , z S and q 2 T are the functions of p 2 ⊥ , x, and z defined in eq. (2.16), (2.17) and eq. (2.13), correspondingly. The functions W f ab are defined in eq. (2.25), (2.26). The final expression for the cross section in eq. (2.31) explicitly shows that part of power corrections has a kinematical origin, and therefore, it is independent of the factorization theorem and it can be taken into account in the present formalism without contradictions. As an example one can consider the factor 1 − ς 2 ⊥ that is a part of the phase-space element, and the difference between z S and z that is a consequence of the TMDFF definition. The separation between kinematical power corrections and higher orders in power expansion of the cross-section is however not neat, because a detailed study of the factorization theorem correction is still not complete. The admixture of these effects can be seen in the second line of eq. (2.31), which is the present status of our understanding. In the fit we omit the contribution W h ⊥ 1 H ⊥ 1 in eq. (2.31) and perform a check of the importance of mass-corrections for the agreement with experimental data in section 6.2. We discuss power corrections also in section 2.3.

DY cross-section
The Drell-Yan pair production (or DY for shortness) is defined by the process h 1 (P 1 ) + h 2 (P 2 ) → l(l) + l (l ) + X, (2.32) where l, l are the lepton pair, h 1 , h 2 are the colliding hadrons, and the symbols in brackets denote the momentum of each particle. In the following, we include hadron masses and we neglect lepton masses: The energies of the DY experiments are higher than the SIDIS ones, and the interference of electro-weak (EW) bosons must be included. Approximating the interactions of leptons and hadrons by a single EW-gauge boson exchange one obtains the following expression for the differential cross-section

JHEP06(2020)137
where q = l + l , α em = e 2 /4π is the QED coupling constant and the index G runs over gauge bosons γ, Z. Here, we have approximated the exact flux factor [(s − (M 1 − M 2 ) 2 )(s − (M 1 + M 2 ) 2 )] −1/2 with 1/s because the corrections of order M 2 /s are negligibly small for any considered data set. The function ∆ G (q) is defined as with M Z = 91.188GeV and Γ Z = 2.495GeV [46]. Finally, L µν GG and W µν GG are the leptonic and hadronic tensors that are defined as where e is the lepton charge, and J G µ is the current for the production of EW gauge boson G. Integrating the cross-section over a lepton momentum one finds where q is the momentum of the EW-gauge boson. The new lepton tensor is (2.39)

Kinematic variables for DY
The relevant kinematic variable in DY read The transverse components are projected by a tensor g µν T , that is orthogonal to P µ 1 and P µ 2 , identically to the SIDIS case eq. (2.12), and we have dropped the negligible corrections of order of M 2 /s. In this limit, the factorization theorem is expressed in the center-of-mass frame, the components of momenta are P + 1 = P − 2 = s/2 and the variables x 1,2 in eq. (2.49) are (2.42) The differential phase-space element reads where ϕ is the azimuthal angle of the vector boson. JHEP06(2020)137

Factorization for hadronic tensor in DY
The factorization for DY hadronic tensor is totally analogous to the SIDIS case. The vectors n andn are defined by hadrons, where on r.h.s. the small contributions ∼ M 2 /s are neglected. The inclusion of weak-boson exchange requires the consideration of a more general current. To this purpose we define with the EW coupling constants where e f is the electric charge of a particle (in units of e), T 3 is the third projection of weak isospin, s W = sin(θ W ), c W = cos(θ W ). Collecting all this, the unpolarized part of the factorized hadronic tensor reads where f runs through all quark flavours. The functions f 1,f ←h 1 are the TMDPDF and the functions h ⊥ 1,f ←h 1 are the Boer-Mulders functions, defined in eq. (2.20), (2.22). In eq. (2.47) we have omitted the arguments of the TMD distributions for brevity, however they can be included with substitutions like e.g. (2.48) and proceed similarly for all products of TMD distributions. The variables x 1 and x 2 measure the collinear fractions of parton momenta, (2.49)

JHEP06(2020)137
The flavor indices f ,f run through all flavors of quarks and anti-quarks respectively. Here, the flavor indexf refers to the anti-parton of f . Note that, in the case of W-boson, the constants g W L/R mix the flavors of quarks. In the factorized hadronic tensor in eq. (2.47), different terms are not equally important. In fact, the fifth line of eq. (2.47) vanishes identically due to the peculiar combination of g-constants that is null for any electro-weak channel. The forth line can contribute only to ZZ and Zγ-channels, that have an anti-symmetric part of the leptonic tensor. However, the resulting expression is anti-symmetric in the rapidity parameter, and thus it vanishes when the rapidity is measured/integrated on symmetric intervals. In principle, this part can contribute to a cross-section when experiments perform very asymmetric kinematic cuts on the detected leptons (e.g. at LHCb). However, even in this case the resulting integral is suppressed as q 2 T /Q 2 e −2|y| and it is numerically very small, e.g in some bins it can give a 10 −6 − 10 −8 -size relative to the leading contribution. Thus, in the following we do not consider contributions of the last two lines in eq. (2.47).
Performing the integration over angles we obtain a result formally similar to the SIDIS case in eq. (2.24), where g µν T is now given in eq. (2. 41) and

Lepton tensor and fiducial cuts in DY
In experiments not all final state leptons are collected in the measurements and fiducial cuts are for instance performed at LHC. We use the same implementation of cuts as in [19,20]. However, here we give a more general discussion to see how they affect power suppressed parts of the cross section. The lepton tensor of unpolarized DY formally written in eq. (2.36) is where g R G (g L G ) are the couplings of right (left) components of a lepton field to EW current as in eq. (2.45). In the case of W boson, these couplings also carry flavor indices. As discussed in section 2.2.2, the anti-symmetric part does not contribute visibly to the unpolarized cross-section even in the presence of asymmetric fiducial cuts. The DY cross-section contains the lepton tensor integrated over the lepton momenta with l + l = q, in eq. (2.39), and this gives The cuts on the lepton pair at LHC are usually reported as where η and η are pseudo-rapidity of the leptons. In the presence of these cuts the integration volume of the leptonic tensor can be done only numerically. To account this effect we introduce cut factors as , (2.57) These factors are equal to one in the absence of cuts. The impact of these cuts at LHC is extremely important and depends on the rapidity interval and the value of the vector boson transverse momentum. We show P 1,2 for ATLAS experiment in figure 1. One can see that the factor P 2 is enhanced at smaller q T and in general these factors are very different from 1.

DY cross-section in TMD factorization
Collecting the expressions for the differential phase-space element in eq. (2.43), the hadronic tensor eq. (2.50), the leptonic tensor (2.54), (2.55) with the fiducial cuts in eq. (2.57), (2.58), we obtain the final cross-section in the TMD factorization. For the case of neutral vector boson (i.e. Z-and γ-bosons) it reads where functions W f are defined in (2.51), (2.52), M Z and Γ Z are mass and width of Z-boson. The factors z are the combinations of couplings g R,L for quarks and for leptons (2.46): describes the contributions of the Boer-Mulders functions and we omit this term in the rest of the fit as motivated in section 2.3.

Power corrections and higher twist structure functions
The cross-section of SIDIS and DY given by eq. (2.31), (2.59) contains a variety of power suppressed contributions, which have different origin, as listed in the following: • Power corrections to TMD factorization. These corrections appear during the factorization procedure for the hadronic tensors, see eq. (2.19), (2.47). One can distinguish two kinds of power corrections: corrections that are proportional to the leading structure functions W ab , which arise through the so-called Wandzura-Wilczek terms (in the case of SIDIS, this part of cross-section has been studied recently in [51]); corrections that involve genuine "twist-3" TMD distributions (some part of these corrections is discussed in [52]); • Mass and q 2 T dependence within the momentum fraction variables (x S , z S ) (SIDIS), (x 1 , x 2 ) (DY), see eq. (2.15), (2.49). Despite the fact that the corrections in the momentum fraction can be interpreted as part of power corrections to TMD factorization (contributing to the Wandzura-Wilczek terms), we consider them on their own. These corrections come from the field-modes separation and the definition of the scattering plane, and they can be seen as the "Nachtmann-variable for TMD factorization". The usage of these variables is also in agreement with expected large-q T structure of cross-section, which has different form, but uses similar variables, e.g. see [53].

JHEP06(2020)137
• Fiducial cuts for DY. The cut factors for the DY lepton tensor in eq. (2.57), (2.58) are a source of power corrections and they can mix different structure functions. They are accumulated in separate factors, and have totally auxiliary nature. They must be accounted for the proper description of LHC data.
• Mismatch between factorization and laboratory frames in SIDIS. The azimuthal angles and transverse planes are defined differently in the factorization and laboratory frames see eq. (2.10), (2.12). This introduces target-mass, produced-mass, and q Tcorrections. A good example is the p ⊥ -linear contribution to the structure function F cos φ U U (2.29), (2.30), which is a purely a frame-dependent effect.
• Cross-section phase-space volume in SIDIS. In the case of a non-negligible mass for the detected particle, the phase-volume contains power corrections. They are accumulated in a universal factor in eq. (2.14), and are part of the definition of the observable.
Some of the power corrections of this list can be accounted exactly (e.g. the corrections to the phase-space, the collinear momentum fractions, the relation between q 2 T and p 2 ⊥ ), while some are absolutely unknown (i.e. the power correction to the TMD factorization).
The problem of power corrections to TMD factorization is unsolved and should be addressed in future studies. We resume it here for the interested readers in the DY case. The hadronic tensor defined in eq. (2.50) is expressed in terms of the tensor g µν T defined in eq. (2.41) and it is transverse to a plane containing hadrons. The appearance of the tensor g µν T is the consequence of the TMD factorization approach. This tensor is not transverse to the vector boson momentum, and as a result whenever one uses the leading term of factorized formula for the cross section one finds which demonstrates the violation of QED Ward identity. The violation can be accounted for as a power-suppressed contribution, since q µ W µν ∼ q T . Accounting of the linear power correction (∼ q T /Q) would correct the QED Ward identity to this order (i.e. one would obtain q µ W µν ∼ q 2 T = 0). In order to get a hadron tensor completely transverse to q µ one has to account for the full chain of power corrections. This problem is well known and it has been addressed several times in the literature in DY and SIDIS cases [2,5,[54][55][56][57]. All the suggested solutions extend the TMD factorization in some model-dependent way and they provide different expressions for the cross-section. A systematic solution is still not available. It is also often assumed that the resummation of Sudakov logarithms and the matching to the perturbative expansion of the cross section can interpolate between the TMD factorization region and the perturbative region. This method however presents its own limitations because in practice not all sources of power corrections listed above are usually taken into account and a more systematic work in this sense is still missing.
In the present work we adopt a different strategy. We first observe that power suppressed terms have not a single origin and that part of them are calculable, so that they can be included in our computations. The TMD factorization provides the cross section for DY

JHEP06(2020)137
and SIDIS in terms of 4 structure functions W ab defined in eq. (2.25), (2.26), (2.51), (2.52) and each of them is a Hankel convolution of two TMD distributions times a hard coefficient function. We remark that the TMD include all the non-perturbative information of the process, and it is different from the one contained in a collinear PDF. The unknown parts in eq. (2.31), (2.59) come from higher twist matrix elements are formally of higher dynamical twist with respect to the others. While higher twist contributions are in principle accompanied by q 2 T /Q 2 factors, the complex kinematics of the experiments (especially in the SIDIS case) makes it hard to distinguish purely non-perturbative higher-twist effects from the kinematical ones. For instance, the azimuthal angles measured in the lab frames and in the Breit frame for SIDIS are different and some non-perturbative QCD effects can be overlooked when we pass from one frame to the other. The only way to solve this problem would be a complete inclusion of higher power corrections to the cross section, which goes beyond the scope of the present work. For this reason, while we consider the exact kinematics, as described in the previous section, we also put The effect of this assumption must be very small at q 2 T Q 2 , and this justifies the conservative data sets used in the present fit (see section 3).
The Q dependence of W f ab is dictated by the TMD evolution, and it is discussed in the next section 2.4. The asymptotic limit of high q T allows for a perturbative matching of TMD distributions to collinear ones and it is discussed in section 2.4.1. The nonperturbative inputs on top of the large-q T asymptotic limit are discussed in section 2.4.2. Finally, we summarize all theoretical inputs in section 2.5.
In section 5.2 and 6.2, we test the influence of the power corrections to the fit quality. This test provides us an estimation of the systematic error due to the presence of unknown power correction.

TMD evolution and optimal TMD distributions
While the differential evolution equations for TMD are fixed by the factorization theorem, the boundary conditions of their solution are a matter of choice. They clearly determine the convergence of the perturbative series and the success of the theoretical description of DY and SIDIS spectrum. In this paper, we work with the so-called ζ-prescription described in [11], and including the improvement found in [21]. The prescription consists in defining the TMD distribution on a null-evolution line. The null-evolution line has the defining property of keeping the evolution factor for TMD distributions is equal to one for all values of the impact parameter b. Because of this property, the ζ-prescription is conceptually different from other popular prescriptions, where the reference scales do not belong to a null-evolution line. In this case, the resulting (reference) TMD distribution includes an admixture with the perturbative evolution factor evaluated at different values of b. Thus it appears that the ζ-prescription has an important advantage that the resulting TMD distribution is independent of any perturbative parameter, i.e. it is completely non-perturbative JHEP06(2020)137 and one can freely parameterize a distribution without any reference to perturbative order. For a detailed description and analyses of TMD evolution and the ζ-prescription we refer to [11], whereas here we present only the final expressions without derivation.
The system of TMD evolution equations is where F is any TMD distribution (f 1 or D 1 in the present case). The TMD evolution equations are not sensitive to the flavor of a parton 1 and thus we omit flavor indices in this section for simplicity. The eq. (2.65) is a standard renormalization group equation, which comes from the renormalization of the ultraviolet divergences, with the function γ F (µ, ζ) being the anomalous dimension. The eq. (2.66) results from the factorization of rapidity divergences. The function D(µ, b) is called the rapidity anomalous dimension (RAD). The RAD is a generic non-perturbative function that can be computed at small values of b in perturbation theory. The perturbative expression for the RAD and γ F can be found in the literature (e.g. see appendix of ref. [26]). In this work we use the resummed version of RAD [58]. The resummed expressions are also given in appendix B (see also appendix B in ref. [39]). The scales µ and ζ have an independent origin, and this has important consequences. To start with, the TMD evolution takes place in the plane (µ, ζ). The solution of equations eq. (2.65), (2.66) for the evolution from a point ( where P is any path in (µ, ζ)-plane that connects initial (µ i , ζ i ) and final points (µ f , ζ f ). The value of evolution is (in principle) independent on the path, thanks to integrability condition (also known as Collins-Soper (CS) equation [41]) where Γ cusp (µ) is the cusp anomalous dimension. This equation dictates the logarithmic structure of anomalous dimensions. In particular, the TMD anomalous dimension is The formal path-independence of eq. (2.67) is violated at any fixed order of perturbation theory. The penalty term is proportional to the area surrounded by paths, and can be huge in the case of very separated scales. Nevertheless, the path dependence decreases with the increase of the perturbative order and it is numerically small at N 3 LO [11].

JHEP06(2020)137
The final scales of the evolution are binded to the hard scale of factorization such that µ 2 f ∼ Q 2 and ζ 1f ζ 2f = Q 4 . In particular, we choose the symmetric point The TMD initial (or defining) scale is chosen with the ζ-prescription and deserves some explanation. In the ζ-prescription the scales µ and ζ belong to a null-evolution line, that we parameterize as (µ, ζ µ (b)). To find the null-evolution line, we recall that the system of eq. (2.65), . Therefore, the null-evolution line is simply an equipotential line of the field E. It provides the equation that define ζ µ (b) such that

71)
A TMD distribution does not evolve between scales belonging to the same equipotential line by definition. Among equipotential lines there is a special line that passes through the saddle point (µ 0 , ζ 0 ) of the field E. The values (µ 0 , ζ 0 ) are defined as The special equipotential line is preferable for the definition of TMD scales for two important reasons. First, there is only one saddle point in the evolution field, and thus, the special null-evolution line is unique. Second, the special null-evolution line is the only nullevolution line, which has finite ζ at all values of µ (bigger than Λ QCD ). These properties follow from its definition and they are very useful. In figure 2 we show the force-lines of the evolution field E (in grey, with arrows), null-evolution lines, (thick grey lines, orthogonal to the force-lines), and the lines that cross at the saddle point (in red) at different values of b. In this figure the special line is the one that goes from left to right in each panel.
The concept of ζ prescription has been introduced in ref. [19] and elaborated in [11]. Presently we use a form slightly different from the original version of refs. [11,19]. Here we follow the updated realization introduced in ref. [21] that has been used for the description of the pion-induced DY process. In refs. [19,20] the ζ-lines has been taken perturbative for all ranges of b (with slight deformations due to the Landau pole). Notwithstanding, such definition introduces an undesired correlation between the non-perturbative parts of the TMD distribution and RAD. In ref. [21] a new simple solution has been found for the values of special null-evolution line at large b that accurately incorporates non-perturbative effects, without adding new parameters to the fit. In appendix C we present the expression for the special line as it is used in this fit.
A TMD distribution F (x, b; µ, ζ µ ) with ζ µ belonging to the special line is called optimal TMD distribution, and denoted by F (x, b) (without scale arguments), to emphasize its uniqueness and independence on scale µ. The exact independence of optimal TMD distribution on scale µ, allows us to select the simplest path for the evolution exponent in eq. (2.67), that is, the path at fixed value of µ = Q along ζ from the value ζ f = Q 2 down JHEP06(2020)137 to any point of ζ i = ζ Q (b). In figure 2 this path is visualized by black-dashed lines. The resulting expression for the evolved TMD distributions is exceptionally simple We recall that this expression is same for all (quark) TMDPDFs and TMDFF. Substituting (2.73) into the definition of structure functions W we obtain, These are the final expressions used to extract the NP functions.
The simplicity of expressions (2.74), (2.75) is also accompanied by a good convergence of the cross section. In figure 3 we show the comparison of curves for DY and SIDIS cross-section at typical energies. In the plot the TMD distributions and the NP part of the evolution are held fixed while the perturbative orders are changed. The perturbative series converges very well, and the difference between NNLO and N 3 LO factorization is of order of percents. This is an additional positive aspect of the ζ-prescription, which is due to fact that all perturbative series are evaluated at µ = Q.

Matching of TMD distribution to collinear distributions
The TMD are generic non-perturbative functions that depend on the parton fraction x and the impact parameter b. A fit of a two-variable function is a hopeless task due to the enormous parametric freedom. This freedom can be essentially reduced by the matching of a b → 0 boundary of a TMD distribution to the corresponding collinear distribution. In the asymptotic limit of small-b one has where f 1 (x, µ) and d 1 (x, µ) are collinear PDF and FF, the label f runs over all active quarks, anti-quarks and a gluon, and with γ E being the Euler constant and g being QCD coupling constant. The extra factor y −2 in eq. (2.77) is present due to the normalization difference of the TMD operator in eq. (2.21) and the collinear operator, see e.g. [5,25]. The coefficient functions C and C can be calculated with operator product expansion methods (for a general review see ref. [59]) and in the case of unpolarized distributions the coefficient functions are known up to NNLO [23,25,26,29]. The coefficient function C has the general form

JHEP06(2020)137
wherex = 1 − x, the symbol ⊗ denotes the Mellin convolution, and a summation over the intermediate flavour index k is implied. In eq. (2.79) we have omitted argument x of functions on left-hand-side for brevity. The functions P (n) (x) are the coefficients of the PDF evolution kernel P (x) = n a n s P (n) (x) (DGLAP kernel), which can be found f.i. in ref. [60]. The functions C (n,0) f ←f (x) are given in [23,25,26,29]. In particular, the NLO terms are The last term in the square brackets of eq. (2.79) is the consequence of the boundary condition of eq. (2.72), and it consists of some coefficients of the anomalous dimension defined in eq. (B.2), (C.8).
In the case of TMDFF the matching coefficient C follows the same pattern as in eq. (2.79) with the replacement of the PDF DGLAP kernels P [25,26]. In TMDFF case, the NLO terms are As a consequence of the ζ-prescription the scale of operator product expansion µ OPE is independent on external parameters. In particular, it has no connection to the scales of the TMD evolution, as it happens f.i. in the case of b * -prescription [5,42]. In other words, in the ζ-prescription, the scale µ OPE is entirely encapsulated inside the convolutions in eq. (2.76), (2.77). This fact gives an enormous advantage to achieve a complete decorrelation of RAD from TMD distributions (we will be more quantitative about this point in later sections). The optimal TMD distributions as any scale-less observables, are formally, independent on the value of µ OPE given the good convergence of perturbative series. So, the scale µ OPE has to be selected such that on one hand, it minimizes the logarithm contributions at b → 0, and on another hand, it does not hit the Landau pole at large-b. For TMDPDF, we use the following value whereas for TMDFF we use The extra factor z in (2.83) effectively compensates ln z terms in the matching coefficient, and in this way improve the convergence of the series (e.g. it completely neglects ln z terms in NLO expressions (2.81)). The choice of the large-b offset of µ OPE as 2 GeV is arbitrary, with the only motivation that it is a typical reference scale for PDFs (and lattice JHEP06(2020)137 calculations). In the ζ-prescription, this scale is intrinsic to the model of TMD distribution, and thus, any modifications in it would be absorbed by NP parameters discussed in the next section. Let us note that in the ζ-prescription, the coefficient functions of small-b matching in eq. (2.79) do not contain a double-logarithm contribution. For that reason the perturbative convergence, as well as the radius of convergence improves. Both these facts make the ζprescription highly advantageous.

Ansatzes for NP functions
In this work we deal with three independent non-perturbative functions in total. These are the unpolarized (optimal) TMDPDF, f 1 (x, b), the unpolarized (optimal) TMDFF, D 1 (x, b), and the RAD, D(b, µ). The amount of perturbative and non-perturbative contributions to each function depends on the value of the impact parameter b. Namely, at small values of b the perturbative approximation is good and the TMD distributions can be matched onto collinear functions as in eq. (2.76), (2.77). In the case of the RAD the small-b limit is given in appendix B. The small-b perturbative expressions gains power corrections in even powers b 2n [62]. Therefore, with the increase of b the perturbative approximation becomes less and less correct, and must be replaced by some generic function.
The phenomenological ansatzes for TMD distributions that satisfy this picture, can be written as following: where functions f NP and D NP are non-perturbative functions. Note, that in our ansatz we do not modify the value of b within the coefficient function. Therefore, at large-b the logarithm part of the coefficient function grows unrestrictedly. This growth is suppressed by the non-perturbative functions.
Generally, the functions f NP and D NP depend also on parton flavor f and hadron type h. However, in the present work we use the approximation that f NP and D NP are flavor and hadron-type independent. All hadron-and flavor dependence is driven by the collinear PDFs and FFs (see also section 4.1). Given such an ansatz the only requirement for NP functions is that they are even-functions of b that turn to unity for b → 0 (see ref. [62] for an analysis of these processes using renormalons). We use the following parameterizations

JHEP06(2020)137
and we extract λ i and η i from our fit. The functional form of f N P has been already used in [20]. It has five free parameters which grant a sufficient flexibility in x-space as needed for the description of the precise LHC data. The form of D NP has been suggested in [18] (albeit there are more parameters in [18]). In both cases the function has exponential or Gaussian form depending on the relative size of λ 1,2,5 /λ 3 , and η 1,2 /η 3 . There are natural restrictions on the parameter space λ 1,2,3 > 0, η 1,2,3 > 0, λ 5 −2(λ 1 + λ 2 ), due to the request that TMD distribution is null for b → ∞.
We use the following ansatz for the NP RAD, The the term c 0 bb * (b) dictates the large-b behavior of the RAD and its form is suggested in [20].
The linear behavior is suggested by model calculations of the RAD [63,64]. Generally, the asymptotic behavior of RAD could vary from constant to linear [64][65][66].
The function D resum is the resummed perturbative expansion of RAD [11,58] reported in the appendix B. At LO it reads The higher order expressions (up to N 3 LO) are given in eq. (B.5). The parameters c 0 and B NP are free positive parameters, in principle totally uncorrelated from the rest of non-perturbative parameters.
The resummed expression for RAD shows explicitly a singularity in b (see e.g. eq. (2.90)). The singularity designates the convergence radius of the perturbative expression. Consequently, the perturbative behavior must be turned off well before b approaches the singularity. In the ansatz in eq. (2.88), this is achieved freezing the perturbative part at b ∼ B NP . The singularity is located at β 0 a s (µ)L µ = 1 and thus, the value of B NP is restricted from above by: The special null-evolution line can be incorporated both at perturbative and nonperturbative level. In [19] and [20] the special null-evolution line included only its perturbative part for simplicity. This part is the most important one because it guarantees the cancellation of double-logarithms in the matching coefficient. However, at large-b, the non-perturbative corrections to the RAD are large and cannot be ignored: in [19] they can be seen as a part of the non-perturbative model, at the price of introducing an undesired correlation between f N P and D. In order to adjust the null-evolution curve with a non-perturbative RAD one has to solve eq. (2.71) including the RAD in the full generality. Such solution can be found in principle, but its numerical implementation is problematic at very small-b, because it is very difficult to obtain the exact numerical cancellation of JHEP06(2020)137 Table 1. Summary of the perturbative orders used for each part of the factorized cross section. The evolution of α s is provided by the LHAPDF library and comes together with PDF set (uniformly nnlo). In brackets we write the last included term of corresponding perturbative expansion (B.4), (B.5), (C.4), (C.8), (C.12).
the perturbative series of logarithms with an exact solution. To by-pass this problem we use the perturbative solution at very small b, (and hence cancel all logarithm exactly) and turn it to an exact solution at larger b. This is realized by we have the perturbative solution, and one turns to the exact for larger b. Since the RAD is entirely perturbative at small-b, the numerical difference between eq. (2.91) and ζ exact µ (b) is negligibly small.

Summary on theory input
The structure functions W f 1 D 1 and W f 1 f 1 are evaluated according to eq. (2.74), (2.75). The phenomenological ansatzes for the optimal unpolarized TMDPDF and TMDFF are defined in eq. (2.84), (2.85), (2.86), (2.87). At small-b TMD distributions are matched to corresponding collinear distributions. The phenomenological ansatz for the RAD is given in eq. (2.88). In table 1 we list the perturbative orders used in each factor of the cross section. The N 3 LO perturbative composition used here is equivalent to the one used in [39,40] on the resummation side. A total of 11 phenomenological parameters are determined by the fit procedure. Two of these parameters describe the RAD, 5 are for the unpolarized TMDPDF, and 4 are for the unpolarized TMDFF. Additionally, TMDPDFs and TMDFF depend on collinear distributions. Thus collinear distributions can be seen as parameters of our model that we take from others fits. We have found that the quality of fit highly depends on the choice of collinear distributions (we can address this fact as the "PDF-bias" problem). The study of this issue is in section 5.1, 6.1.

Data overview
In the present work, we consider the extraction of unpolarized TMD in DY and SIDIS data, extending so the analysis of ref. [20] and including the theoretical improvements described in the previous sections. The selection of data is crucial for a proper TMD extraction, because of the limits imposed by the factorization theorem. These constraints are here discussed for both type of reactions.
Kinematics N pt after cuts HERMES p → π + [67] 0.023 < x < 0.6 (6 bins) 0.2 < z < 0.8 (6 bins) Total 582 Table 2. Summary of the SIDIS data included in the fit. For each data set we report reference, reaction, kinematic region, and number of points that are left after the application of consistency cuts in eq. (3.1), (3.2).

SIDIS data
In the current literature, one can find several measurements of the unpolarized SIDIS [67][68][69][70][71][72] and a total of some thousands of data points. We restrict our attention only to those data whose kinematical features are compatible with the energy scaling of TMD factorization theorem. The first constraint comes from the di-lepton invariant mass (Q) and in general from the energy scale of the processes. Most of SIDIS reactions have been measured at fixed target experiments, that are typically run at low energies. Unfortunately much of these data do not accomplish the QCD factorization request of a high Q to separate field modes. To secure our analysis (but still leave some data) we have used a restriction on the average Q of a data point, namely Here, Q is the value of Q averaged over the multiplicity value in a bin, see figure 4. The restriction in eq. (3.1) quite reduces the pool of data. In particular, eq. (3.1) completely discards the JLAB measurement published in [71], and cuts out the most part of HERMES data in ref. [67]. The second constraint comes from the TMD factorization assumptions. Namely, the TMD factorization regime is fully consistent only for low values of q T /Q and receives quadratic power corrections of order (q T /Q) 2 , see eq. (2.24) and eq. (2.31). We consider data such that where the value 0.25 was deduced in [19]. From the q T interval of eq. (3.2), one can expect a ∼ 4 − 6% influence of the power corrections, which is well inside the uncertainties of the JHEP06(2020)137 data. In section 6.3, we have tested cutting the condition in eq. (3.2) considering the data at different δ, and found eq. (3.2) sufficient. It should not pass unobserved that eq. (3.2) is written in terms of q 2 T , that is the natural variable of TMD factorization approach, whereas the data are presented in terms of p 2 ⊥ . These variables are related by q 2 T p 2 ⊥ /z 2 , see eq. (2.13). Thus, the cut in eq. (3.2) puts also a restriction on z. Altogether it makes the allowed values of p 2 ⊥ even smaller, p ⊥ 0.25zQ. In particular, we have to completely discard the measurements of H1 and ZEUS collaborations [69,70] that are made at very small values of z, despite the relatively high values of Q.
After the application of eq. (3.1), (3.2) we are left with the data taken by HERMES and COMPASS 2 collaborations [67,68]. For HERMES we have selected the zxpt-3Dbinning set due to the finer bins in p T . The COMPASS data includes the subtraction of vector-boson channel, and thus we also select the subtracted HERMES data (.vmsub set). In total we have 582 points that cover the region of 1.5 Q 9 GeV, 10 −2 x < 0.6, 0.2 < z < 0.8. The summary of the considered data is reported in table 2.

DY data
The DY data are selected following the same principles as the SIDIS data, eq. (3.2) (the rule (3.1) makes no sense now, because DY processes are measured at sufficiently highenergies) with only small modifications. The changes consist in cutting some extra higherq T data points for several specific data sets (this concerns mainly ATLAS measurements of Z-boson production). The reason for it is that the estimated size of power corrections at q T /Q ∼ 0.25 is of order of 5%, however, some highly precise data are measured with much better accuracy. So, given a data point p ± σ, with p being the central value and σ JHEP06(2020)137 Total 457 * : Bins with 9 Q 11 are omitted due to the Υ resonance. Table 3. Summary table for the data included in the fit.. For each data set we report: the reference publication, the centre-of-mass energy, the coverage in Q and y or x F , possible cuts on the fiducial region, and the number of data points that survive the cut in eq. (3.3).

JHEP06(2020)137
its uncorrelated relative uncertainty, corresponding to some values of q T and Q, we include it in the fit only if In other words, if the (uncorrelated) experimental uncertainty of a given data point is smaller than the theoretical uncertainty associated to the expected size of power corrections, we drop this point from the fit. This is the origin of the second condition in eq. (3.3).
The resulting data set contains 457 data points, and spans a wide range in energy, from Q = 4 GeV to Q = 150 GeV, and in x, from x ∼ 0.5 · 10 −4 to x ∼ 1. Table 3 reports a summary of the full data set included in our fit. This selection of the data is the same as the one considered in our earlier work [20]. In the current fit, we compare the absolute values of the cross-section, whenever they are available. The only data set that require normalization factors are all CMS data, ATLAS at 7 TeV, and D0 run2 measurements. For these sets we have normalized the integral of the theory prediction to the corresponding integral over the data (see explicit expression in ref. [19]).

Summary of the data set
In total for the extraction of unpolarized TMD distribution we analyze 1039 data points that are almost equally distributes between SIDIS (582 points) and DY (457 points) processes. All these points contribute to the determination of the TMD evolution kernel D and unpolarized TMDPDF f 1 . The determination of unpolarized TMDFF is based only on SIDIS data. In addition, we recall that a single DY data point is simultaneously sensitive to a larger and a smaller value of x. This is because the cross section is given by a pair of TMDPDFs, eq. (2.51), computed at x 1 and x 2 such that x 1 x 2 Q 2 /s. So, the statistical weight of a DY point in the determination of TMDPDF is effectively doubled.
The kinematic region in x and Q covered by the data set and thus contributing to the determination of TMDPDF is shown in figure 5. The boxes enclose the sub-regions covered by the single data sets. Looking at figure 5, it is possible to distinguish two main clusters of data: the "low-energy experiments", i.e. E288, E605, E772, PHENIX, COMPASS and HERMES that place themselves at invariant-mass energies between 1 and 18 GeV, and the "high-energy experiments", i.e. all those from Tevatron and LHC, that are instead distributed around the Z-peak region. From this plot we observe that, kinematic ranges of SIDIS and DY data do not overlap.
As a final comment of this section let us mention that our data selection is particularly conservative because it drops points that could potentially be described by TMD factorization (see e.g. ref. [18] where a less conservative choice of cuts is used). However, our fitted data set guarantees that we operate well within the range of validity of TMD factorization. In section 7 we show that unexpectedly our extraction can describe a larger set of data as well.

Fit procedure
The experimental data are usually provided in a form specific for each setup. In order to extract valuable information for the TMD extraction, one has to detail the methodology that has been followed, and this is the purpose of this section. Finally, we also provide a suitable definition of the χ 2 that allows for a correct exploitation of experimental uncertainties.

Treatment of nuclear targets and charged hadrons
The data from E288, E605 (Cu), E772, COMPASS, part of HERMES (isoscalar targets) come from nuclear target processes. In these cases, we perform the iso-spin rotation of the corresponding TMDPDF that simulates the nuclear-target effects. For example, we replace u-, and d-quark distributions by where A(Z) is atomic number(charge) of a nuclear target. In principle, for E288, E605 data extracted from very heavy targets one should also incorporate the nuclear modification factor that depends on x. In the given kinematics the nuclear modification factor produces effects of order 5-10% in the normalization of the cross-section. The shape of cross-section is changed in much smaller amount, about 1% in a point, as it is shown in f.i. [21,85]. Simultaneously, the systematic (correlated) errors of these experiments are large 25% and 20%, correspondingly, as well as the uncorrelated error (typically 2-5%). Therefore, we are not sensitive to nuclear modification effect. The measurements of SIDIS are made in a number of different channels. The HER-MES data include π ± and K ± , and COMPASS data are for charged hadrons, h ± . Pions and kaons are described by an individual TMDFFs. However, charged hadrons are a com-JHEP06(2020)137 position of different TMDFFs. According eq. (2.21) the TMDFF for charged hadrons is a direct sum of TMDFFs for individual hadrons: where dots denote the higher-mass hadron states. At COMPASS energies, this sum is dominated by the pion (65 − 75%), and the kaon (15 − 20%) contributions. The residual term is lead by proton/antiproton contribution (2−5%). The contribution of other particles is smaller (for discussion and references see [86,87]). Thus, in our study we use the first two terms of eq. (4.3) to simulate the charged hadron fragmentation. The SIDIS measurements in refs. [67,68] are given in form of multiplicities. The SIDIS multiplicity is defined as where dσ DIS is the differential cross-section for DIS. It reads where F 1 and F 2 are DIS structure functions. The DIS cross-section cannot be computed starting from TMD factorization, but it is described by the collinear factorization theorem. In order to evaluate the multiplicity we have pre-computed the DIS cross-section (integrated over the given bin) by the APFEL-library [88], and then divided the TMD prediction according to eq. (4.4).

Bin integration in SIDIS and DY
The majority of SIDIS data is measured at relatively low-Q and in large bins. The cross-section value changes greatly within a bin, and so, binning effects are known to be strong. For a measured cross-section dσ/dxdzdQ 2 dp 2 ⊥ , a bin is specified by {x min , x max }, {z min , z max }, {Q min , Q max } and {p min , p max }. The binning constraints impose certain cuts on the measured phase space. Typically, these cuts are given as intervals of the variable y and of the invariant mass of photon-target system W 2 = (P + q) 2 , which belong to ranges {y min , y max } and {W 2 min , W 2 max }. Both these variables are connected to x and Q 2 , where s is the Mandelshtam variable s = (P + l) 2 . So, in the presence of fiducial cuts in SIDIS the bin boundaries arê x max (Q) = min x max , (4.10)

JHEP06(2020)137
An example of effects of cuts in the bins is shown in figure 4. In the case of multiplicity measurements the bin effects are taken into account with the cross-section where the expression in the first line is the volume of (z, p 2 ⊥ )-bin. In the case of DY the binning effects are also extremely important. The difference in the value of the cross section between center-of-bin and the averaged/integrated value can reach tenth of percents, especially, for very low-energy bins (where the change in Q is rapid), and for very wide bins (such as Z-boson measurement). We have used the definition

Definition of χ 2 -test function and estimation of uncertainties
To test the theory prediction against the experimental measurement we compute the χ 2 -test function where m i is the central value of i'th measurement, t i is the theory prediction for this measurement and V ij is the covariance matrix. An accurate definition of the covariance matrix is essential for a correct exploitation of experimental uncertainties. In order to build the covariance matrix we distinguish, uncorrelated and correlated uncertainties. For example, a typical data point has the structure i,corr ± · · · ± σ (k) i,corr , (4.14) where m i the reported central value, σ i,stat is (uncorrelated) statistical uncertainty, σ i,unc is uncorrelated systematic uncertainty, and σ (k) i,corr are correlated systematic uncertainties. Uncorrelated uncertainties give an estimate of the degree of knowledge of a particular data point irrespective of the other measurements of the data set. Instead, correlated uncertainties provide an estimate of the correlation between the statistical fluctuations of two separate data points of the same data set. With this information at hand, one can construct the covariance matrix V ij as follows (for more detailed discussion on this definition see refs. [89,90]): j,corr .

JHEP06(2020)137
Equipped with this definition of covariance matrix the χ 2 -test in eq. (4.13) takes into account the nature of the experimental uncertainties leading to a faithful estimate of the agreement between data and theoretical predictions.
To estimate the error propagation from the experimental data to the extracted values of TMD distributions we have used the replica method. This method is described in details in ref. [89]. It consists in the generation of N replicas of pseudo-data, and the minimization of the χ 2 on each replica. The resulting set of N vectors of NP parameters is distributed in accordance to the distribution law of the data. And thus, it represents a Monte Carlo sample that is used to evaluate mean values, standard deviation and correlations of the NP parameters. For the estimation of error propagation we consider N = 100 replicas. The procedure of χ 2 -minimization for each replica is the most computationally heavy part of the fit.
The proper treatment of correlated uncertainties is essential in global analysis. The presence of sizable correlated uncertanties could result into a misleading visual disagreement between theory prediction and the (central values of) data points. Namely, the theory prediction for a data set could be globally shifted by significant amount, that is nonetheless in agreement with correlated experimental uncertainty. To quantify the effects of correlated shifts we use the nuisance parameter method presented in [89,90]. Within the nuisance parameter method one is able to determine the shift d i of a theory prediction t i for the i'th data point, such thatt i = t i + d i contributes only to the uncorrelated part of the χ 2 -value. The value d i is interpreted as a shift caused by the correlated uncertainties. It is computed as where It also instructive to check the average systematic shift, which we define as It shows a general deficit/excess of the theory with respect to the data for a given data set. Let us note that the multiplicities in SIDIS are experimentally convenient because the systematic uncertainties related to the measurement efficiency and the beam luminosity cancel in the ratio. However, theoretically, the multiplicities are not so well defined, since the denominator and the numerator of multiplicity ratio (4.4) need a completely different theoretical treatment. In order to account this effect, we have computed the uncertainty of theory prediction for DIS cross-section for each bin and added it as a fully correlated error for each data set. We should admit that the theory uncertainty for DIS cross-section is negligibly small (typically, 0.1 − 2.0%) in comparison to systematic uncertainties of experiment. As a result the values of χ 2 change very little on the level of ±10 −2 per point.

Artemide
The computation of the cross-section is made with the code artemide that is developed by us. Artemide is organized as a package of Fortran 95 modules, each devoted to evaluation of a single theory construct, such as the TMD evolution factor, a TMD distribution, or their combinations such as structure functions W and cross-sections. The artemide also evaluates all necessary procedures needed for the comparison with the experimental data, such as bin-integration routines and cut factors. For simplicity of data analysis artemide is equipped by a python interface, called harpy. The artemide package together with the harpy is available in the repository [43,44].
The module organization of artemide allows for flexible use. In particular, it gives to a user a full access to non-perturbative ansatzes and models. Although artemide is based on the ζ-prescription, it also includes other strategies for TMD evolution, such as CSS evolution [42], γ-improved evolution [11] and their derivatives. The user has full control on the perturbative orders, and can set each individual part to a particular (known) order. Currently, artemide can evaluate unpolarized TMD distributions, and linearly polarized gluon distributions together with the related cross-sections, such as DY, SIDIS, Higgsproduction (for application see [91]), etc. In future, we plan to include more processes and distributions.
The evaluation of a single cross-section point that is to be compared with the experimental one, implies the evaluation of a number of integrals: two Mellin convolutions for small-b matching eq. (2.84), (2.85), the Hankel-type integral for the structure function W eq. (2.75), (2.74), and 3(in DY case)/4(in SIDIS case) bin-integrations. Note, that in the ζ-prescription one does not need to evaluate integrations for TMD evolution, which is its additional positive point. Altogether, it makes the evaluation of TMD cross-section rather expensive in terms of computing time. Artemide uses adaptive integration routines to ensure the required computation accuracy. To speed-up the evaluation, artemide precomputes the tables of Mellin convolutions for TMD distributions that are the most time-consuming integrations. The code presently takes about 4.5 (3.2) minutes to evaluate a single χ 2 value for the full data set of DY and SIDIS given in section 3 on an average 8-core (12-core) processor (2.5GHz) depending on the NP-values. Therefore, the minimization χ 2 and especially the computation of error-propagation are especially long. Due to that we are restricted in certain important directions of studies (e.g. error-propagation of PDF sets, and flavour dependence).

Fit of DY
The data-set and the functional input for the DY fit is inherited from our earlier study [20]. The only modification is the update of the functional form of the special null-evolution line in eq. (2.91), which in the present case matches the exact solution at large-b. This update leads relatively minor formal changes, while some values of the model parameter are changed as a result of the fit. The value of χ 2 (per 457 points) is reduced from 1.174 [20] → 1.168 (this work). The main impact takes place at low-energies. In particular, the typical deficit in the cross-section for low-energy experiments is reduced by 5  In this section, we present the fit of DY data-set only. Since the general picture is similar to ref. [20], we concentrate on the sources of systematic uncertainties of our approach. We discuss the dependence on the collinear PDF, that serves as a boundary for TMDPDF, and the effects of q T corrections in the definitions of x 1,2 .

Dependence on PDF
The collinear PDF is an important part of our model for TMDPDF, e.g. eq. (2.84). The issue of PDF-bias of our result can be stated in the following terms. The small-b matching essentially reduces the number of NP parameters for TMDPDF and guarantees the asymptotic agreement of the TMDPDF with the collinear observables. The small-b part of the Hankel integral gives a sizable contribution to the cross-section, especially for q T ∼ 10-20 GeV. Therefore, the quality of our fit and the values of the extracted NP parameter are robustly correlated with the collinear PDF set. This observation has been made earlier, e.g. see discussion in [15,18,20], but it has not been systematically studied. Ideally, the PDF set and TMDPDF are to be coherently extracted in a global fit of collinear and TMD observables. Meanwhile, we treat the collinear inputs as independent parameters that we cannot control and we test various sets available in the literature.
There is an enormous amount of available PDF sets. We have tested some of the most popular sets that are recently extracted at NNLO accuracy, see table 4. All sets have LHAPDF interface [97]. For each PDF set we have performed the full fit procedure with the estimation of the error-propagation. In the fit, the central value of PDFs are used, see section 5. Contrary to the RAD, the parameters λ i show a significant dependence on the collinear PDF (see figure 7). This fact is expected, since a different collinear PDF dictates a different shape in x, while the Q-dependence is not changed. The parameters λ 1 and λ 2 do not change significantly with different PDFs, while a bigger change is provided by the parameters λ 3,4,5 . This is because the parameters λ 1,2 dictate the main shape of f N P at JHEP06(2020)137   3 in ref. [20], and also table 8). In the case of HERA20 this tension reduces. The value of χ 2 /N pt for ATLAS measurements is practically the same in both cases, we find 2.02 NNPDF vs. 1.99 HERA for N pt = 55 (note, that the bin-by-bin distribution of χ 2 changes between the sets). On contrary, the value of χ 2 /N pt for LHCb measurement undoubtedly differ in the two sets of PDF, as we find 2.93 NNPDF vs. 1.24 HERA for N pt = 24. The main part of the improvement happens due to the general normalization, that is lower by 3-5% in NNPDF case, and almost exact in HERA case.  Table 5. Values of χ 2 and NP parameters obtained in the fit of DY set of the data with different PDF inputs. Each set of PDF provide the corresponding value of α s (M Z ).
The TMD distributions with NNPDF31 and HERA20 show a χ 2 value better than all the other, e.g. table 5. These PDFs have also less tension between high-and lowenergy data. For this reason, in the next sections we will consider only PDFs from these extractions. Nonetheless, we preferably select NNPDF31 set in the global SIDIS and DY analysis. The reason is that NNPDF31 distribution is extracted from the global pool of data, whereas HERA20 uses exclusively data from HERA. At the same time, we must admit that HERA20 distribution provides a spectacularly low values of χ 2 in our global fit.

Impact of exact values for x 1,2 and power corrections
As discussed in section 2.5, the factorization formula eq. (2.59) for DY contains three types of power corrections. The corrections related to TMD factorization cannot be tested, without extra modeling. The corrections due to fiducial cuts must be included without restrictions. Thus it is possible to test only power corrections due to the presence of q T /Q terms in the exact definition of x 1,2 , eq. (2.49). The amount of this correction is obtained comparing the fits of the DY data with The approximate values for x 1,2 lead to higher values of χ 2 . In particular, with the approximate x 1,2 for the NNPDF31 set we have obtained χ 2 /N pt = 1.35 and 1.27 at NNLO
Comparing these values to the ones reported in table 5 (1.14 and 1.13; 0.95 and 0.06, respectively), we conclude that the quality of fit is worse. The deterioration of the fit quality takes place in both high-and low-energy parts of the data. In the ATLAS experiment (that is the most precise set at our disposal, with N pt = 55), we observe the changes in χ 2 /N pt : 1.82 → 2.83 for NNPDF31 and 1.90 → 2.27 for HERA20. For the fixed target experiments we have χ 2 /N pt : 0.91 → 1.31 for NNPDF31 and 0.71 → 0.97 for HERA30 (here N pt = 260). We have also observed that the value of χ 2 worsens mainly due to the change in the shape of cross-section, whereas the normalization part slightly reduces the χ 2 . The values NP parameters varies within the error-bands and the change in the central values is not significant.
Therefore, we conclude that exact values of x 1,2 (2.49) considerably improve the quality of the fit. This conclusion is in agreement with the theory expectations presented in section 2.5.

Uncertainties due to collinear PDFs
The model in eq. (2.84) is not sensitive to changes of the NP parameters at small-b. For this reason, the error-band on the TMD distribution vanishes for b 0.5GeV −1 . The only way to modify the TMD distribution in this region is to vary the values of collinear PDF. In section 5.1 we have demonstrated that the quality of the fit, as well as the values of extracted NP parameters, essentially depend on the collinear PDF and in our extraction we have used the central values of PDF sets, ignoring the uncertainties of PDF determination. These uncertainties are however large and could cover the gap among different TMD fits if taken into account. Unfortunately, the incorporation of the PDF uncertainties into the analysis is extremely demanding in terms of computer time, especially for the full data set. In order to provide a quantitative estimate of the PDF-bias, in this section we consider only the NNPDF31 data set with NNLO TMD evolution for the fit of DY data. We postpone to future work a similar analysis for the other PDF sets.
Thus, we have performed a fit for each one of the 100 replicas of the NNPDF31 collinear distributions. The minimization of the χ 2 is done with a simplified procedure in order to speed up the computation, because for many replicas the search of χ 2 -minimum took much longer time in comparison to the central value minimization. It appears that the data is very demanding on the collinear PDF input. So, for some (distant from the central) replicas the fit does not converge (yielding χ 2 /N pt > 5) or produces extreme values of NP parameters (e.g. B NP < 0.7GeV). The values of NP parameters that run into the boundary of the allowed phase space region were discarded (almost 30% of total replicas). The resulting distribution of NP parameters gives an estimate of the sensitivity for PDF distribution. The NP parameters and their uncertainties that we have obtained are the following In figure 8 we show the comparison of error-bands on the TMDPDF, obtained from the error-propagation from the experiment to NP parameters (blue band), and from the PDF uncertainty (red band), as described above. The main difference in these bands is that the PDF-uncertainty band is sizable already at b = 0, and for larger b these bands expand similarly. The PDF-uncertainty band is different for different flavors, and larger for non-valence partons. The resulting estimation for the (predicted) cross-section is shown in figure 9. For the high energy case, the uncertainty is of order of 1%, while at low energies it reaches 20-40%.
The bands that we show here certainly do not accurately represent the uncertainties of TMDPDF, since many of PDF replicas do not fit the data. It implies that the TMD distributions can be used as a tool for the restriction of collinear PDFs together with the standard collinear observables. At the current stage, we can only conclude that the uncertainties of TMDPDF at small-b (that are out of control in the current model) are sizable. For an accurate estimation of these errors one has to apply more sophisticated techniques, such as reweighing of PDF values [98] by TMD extraction, or even joint fits of TMD distributions and collinear distributions, which are beyond the scope of the present work.

Fit of SIDIS
In this section, we use the unpolarized TMDPDF and TMD evolution, extracted in the fit of DY data, to fit the SIDIS data. The main aim is to test the universality of the TMD evolution, and TMDPDF. Namely, the SIDIS data should be easily fitted adjusting only the parameters of TMDFF. Indeed, we have found that the TMDFF in eq. (2.85) (with a 4-parameter ansatz) together with the TMDPDF and D (extracted from DY data) provide a very good description of the available SIDIS data. This is one of the main JHEP06(2020)137  Table 6. Values of χ 2 and NP parameters obtained in the fit of SIDIS data with different FF inputs. The TMD evolution parameters and TMDPDF parameters are fixed from the fit of DY data (see table 5), and labeled by the PDF set. The visual presentation of this table is given in figure 10.
results of the present work that demonstrates the complete universality of TMD factorization functions. Another test of the TMD universality has been provided in [21], that is in the fit of pion-induced DY, and it has been used in studies of the TMD distributions with jets [99][100][101]. To our best knowledge, the test presented here is made for the first time, because in the previous studies DY and SIDIS cases were considered or independently or simultaneously [18]. Also we discuss the dependence on the collinear unpolarized FF, and the impact of power corrections.

Dependence on FF
In contrast to collinear PDFs, there are not too many extraction of collinear FFs. We have considered three sets of collinear FFs. Namely, DSS set 3 (that is a composition of pion FFs from [102] (DSS14) and kaon FFs from [103] (DSS17)), the JAM19 set [104] and the NNFF10 set [87]. All these extractions are made with NLO collinear evolution (a 2 s ). The comparison of fits with different FFs, and some of TMDPDF (together with TMD evolution) extracted in the previous section are presented in table 6. The NNFF set is not presented in the table due to the low quality of the predictions, as it is described below. As it is seen from table 6, the TMD factorization perfectly describes the low-q T SIDIS data with TMDPDF and TMD evolution fixed by DY measurements. It is one of the main result of the present analysis.
The values of χ 2 /N pt are rather small (e.g. 0.76 for combination of HERA20 & DSS), which may indicate an over-fit problem. However, this is not the case for the following reason. The main source of low-χ 2 is the COMPASS data set. The COMPASS data have very large uncorrelated systematic uncertainty for a great amount of points. Here, the systematic uncertainty is (much) larger than the statistical uncertainty, and therefore, the COMPASS data points form smooth lines with huge uncorrelated uncertainty band. As a result, the contribution of each point to the χ 2 -value is small.
The values of χ 2 depend on the input TMDPDF and TMD evolution (compare NNLO and N 3 LO cases) in a reasonable amount. This is mainly due to the different values of c 0 constant in these cases. We recall that the SIDIS measurements are made at much lower energy in comparison to DY, and thus they are more sensitive to D at large-b. Later in section 7 we show that in the joint fit of SIDIS and DY data, the uncertainty of the evolution factor at large-b is reduced.
The difference between DSS and JAM collinear FFs sets is of minor importance. It is due to the fact low-energy data are less sensitive to the small-b part of the TMD distributions (and thus to collinear distributions). Given in addition that the data are not very precise, the uncertainty in FF sets are compensated by the NP function D N P . The effect of compensation is clear from the very different values of η i constants for DSS and JAM19 set. Note, that in all cases we obtain a positive and sizable b 2 -term in D N P (parameter η 4 ). It could indicate a hidden issue in the values of collinear FFs. However, we conclude JHEP06(2020)137 that contrary to DY case, the SIDIS TMD data are not very restrictive on the values of collinear FFs. The NNFF distributions are not able to fit the data with a χ 2 /N pt better than ∼ 6.8. The reason of such an enormous discrepancy is obvious. The NNFF1.0 extraction is made from the ee-annihilation data only [87], and thus is sensitive only to particular combinations of quark-flavors. The flavour separation is thus made a posteriori assuming exact iso-spin symmetry. As a result, the FF for sea quarks have very small (and even negative) values. In the processes where the production of a hadron is dominated by the sea-quark channel, the cross-section obtained with NNFF10 collinear FF is much smaller then the experimental one. A crystal clear example is the process p → K − , where both valence quarks of K − ,ūs, are sea-quarks for the proton, and thus the dominant channel is the production of K − from u and d quarks. However, FF for u and d-quarks in K − are negative in NNFF extraction, and the resulting cross-section appears to be negative as well. The situation improves, if we select only the processes with dominant valence channel, e.g. d → π ± , in this case we obtain χ 2 /N pt ∼ 2.2. The COMPASS measurement can be also considered separately with the NNFF1.1 set of FF for charged hadrons [86], in this case we obtain χ 2 /N pt ∼ 1.6. In any case, we have found that NNFF sets of FF are not suitable for the description of SIDIS data.
The uncertainties on NP parameters presented in table 6 are unrealistically small. Given the fact that the data is not very accurate, it indicates a significant underestimation of the uncertainty for TMDFF. We guess that the underestimation of uncertainties is caused mainly by the function bias of D N P . To resolve the situation one could use a more flexible ansatz, e.g. by inclusion of more NP-parameters. Unfortunately, this strategy is not very efficient. Already with the current set of parameters we have very low χ 2 , and the increase of the number of parameters could lead to an over-fit problem. Also, the computation time with a bigger number of parameters increases.

Impact of power corrections
Considering the expression for the SIDIS cross-section eq. (2.31) we distinguish four types of power corrections: (m/Q) the corrections due to non-zero produced mass, (M/Q) the corrections due to non-zero target mass, (q T /Q) the q T /Q-terms in the expression for cross-section and (x S z S ) the q T /Q-terms in the expressions for x S and z S , eq. (2.15). In order to test the impact of these corrections, we have performed the (central value) fits including corrections in different combinations. The resulting values of χ 2 /N pt are reported in table 7.
Let us summarize the observations: • Produced mass corrections. The produced mass-corrections are not necessary extremely small, as it is typically assumed. These corrections appear in the ratio with other kinematic variables through the variable ς 2 , eq. (2.11). In most part of data bins the value of ς 2 is negligible, ς 2 ∼ 10 −3 , but for some low-energy and low-z bins it can reach ς 2 ∼ 10 −2 . For example, the HERMES bin with 0.  with produced kaon has ς 2 ∼ 0.04. As it is clear from table 7, current data are not sensitive to these corrections. The difference in χ 2 /N pt is of the order 10 −3 .
• Target mass corrections. The target mass corrections appear through the variable γ 2 in eq. (2.11) and at low Q it has a rather significant size, e.g. for some bins in HERMES data γ 2 ∼ 0.13, for some bins in COMPASS data γ 2 ∼ 0.06. Therefore, one can expect up 10% impact of γ 2 for certain bins. Note, that the dependence on γ 2 is non-linear and is different in different edges of the bin. Checking the values in table 7, we observe that the target mass correction produces a small but visible effect on the fit quality especially for HERMES data where the change in χ 2 /N pt is 1.09 → 1.24.
• q T /Q correction in kinematics. This correction cannot be large due to the cuts on the data sets that we have performed. For q T ∼ 0.25Q which is the highest value of q T that we have considered, we can have (q T /Q) 2 ∼ 0.06. In addition, the first correction of this type to the cross section is linear in (q T /Q) 2 and it can be easily compensated by a change of the non-perturbative parameters in D NP . Indeed, we observe that the impact on the χ 2 is small.
• q T /Q correction in x S and z S . For q T ∼ 0.25Q (which is the maximum considered q T ), the difference between exact x S and x is ∼ 0.06, and much smaller between z S and z. Nonetheless, this correction changes the shape of the cross-section in a way that is difficult to compensate by NP parameters. Thus, the inclusion of this correction visibly improves the agreement. Let us note that the same conclusion has been made for the DY case, in section 5.2.
We conclude that the impact of each individual correction is rather small, but the inclusion of any of them improves the agreement between theory and data. Most relevant effects are the target mass correction and the ones due to x S and z S . Accounting of all effects simultaneously leads to a qualitative improvement in χ 2 -values. We also admit that the inclusion of power corrections considerably affect the values of parameters η (especially η 1,4 ). The values of parameters η 1,4 varies in the range (−10, +25)%. The values of parameters η 2,3 varies in the range (−4, +8)%. It shows that our estimation of uncertainties on parameters presented in table 6 are extremely underestimated. Possibly, the main source of underestimation is the bias of our model, which is JHEP06(2020)137 not surprising since we have only 4 parameters for all partons flavors and particle kinds.
The tests of power corrections suggest that the real error-band on the extracted TMDFF is an order of magnitude larger.

Limits of TMD factorization for SIDIS
In ref. [19] we tested the limits for TMD factorization using the DY data, showing that the natural limit of the leading power TMD factorization is δ 0.2 − 0.25, where δ = q max T /Q and q max T is the maximum value of the transverse momentum in the data sets included in the fit. We have tested the same boundary using the SIDIS data and the result of the global fit (presented in the next section) evaluating the χ 2 (without minimization) for different selections of SIDIS data. We have considered two possible cuts on data selection Q > 1 and Q > 2 , eq. (3.1), and the result is shown in figure 11.
The values of χ 2 /N pt grow when δ > 0.25. The same effect has been observed in ref. [19] for DY. Therefore, we conclude that our earlier estimation of the validity interval of TMD factorization as δ 0.2 − 0.25 holds also in the SIDIS case. It is interesting to observe that the channel with the fastest growth of χ 2 /N pt is d → K − (and the next is p → π + ), which could indicate a possible tension in the description of this reaction.
The inclusion of data at Q < 2GeV almost doubles the values of χ 2 /N pt (e.g. χ 2 /N pt = 1.19 for δ = 0.25). Taking into account the large uncertainties of the COM-PASS measurement, it shows that the factorization is broken down at such low values of Q. This is an expected result, since in this region the power corrections dominate the cross-section. In section 7.1, we show data and our predictions including the low-Q bins and up to δ = 0.4.

Global fit of DY and SIDIS data
The fit of SIDIS data shows perfectly the universality of the TMD distributions and a good agreement between theory and experiment. Performing a global fit of DY and SIDIS data we essentially reduce the uncertainty for D. The resulting sets of TMDPDF, TMDFF and JHEP06(2020)137 RAD extracted from the global fit represent the SV19 TMD distributions (at NNLO and N 3 LO). As an input for this set we have used NNPDF31 and DSS collinear distributions, because these sets are in good agreement with the global set of collinear observables, and show the best values of χ 2 (see discussions in the previous sections). Table 8 shows the distribution of the χ 2 -values per individual experiments. In total we have considered 1039 points, 457 for DY and 582 for SIDIS. They form three large subsets: DY at high energy, DY at low energy, and SIDIS (at low energy). The worst χ 2 values are concentrated in the high energy DY subset, because of the very high precision of Z-boson production data measured at LHC. Simultaneously, these data robustly restrict the values of TMD distributions (TMDPDF and RAD) at b 1GeV −1 . The lowest χ 2 is for SIDIS data and especially for COMPASS data (χ 2 /N pt = 0.65, with N pt = 390 that is more than the third part of the total data), due to large uncorrelated systematic uncertainty (see discussion in section 6.1).

Agreement between theory and data
Altogether we obtain the global value of χ 2 /N pt = 0.95 and 1.06 for NNLO and N 3 LO respectively. These values can be compared to 1.55 (for N pt,total = 8059) and 1.02 (for N pt;SIDIS = 477 that is close to our data selection) obtained in the global fit of DY and SIDIS in ref. [18]. The increase of χ 2 /N pt between NNLO and N 3 LO cases does not indicate a reduction of the fit quality. This change in χ 2 happens mainly because of COMPASS data, for which the χ 2 -value increase 0.65 → 0.85. On the contrary, the χ 2 -value for ATLAS data reduces 2.12 → 1.82 (mostly due to the improvement in the total normalization). Therefore, we conclude that both NNLO and N 3 LO fits are in agreement, although N 3 LO shows a better agreement with high-energy data.
In table 8 we also present the values of the difference in the normalization between theory and data due to the correlated shift (see definition in (4.18)). The measurements in the table 8 without this value (e.g. CMS) are normalized to the total cross-section. Note, that the shift value is common to the full data subset (e.g. for all 195 point of COMPASS d → h + ).
Finally, we have some more considerations on each data set: • The high energy DY data have a common deficit of 2-5% in the normalization, which has been already observed in [20]. It can be caused by different sources, being the main ones the collinear PDF (e.g. in the case of HERA20 PDF the deficit is much smaller, 0-3%). Another source is the presence of corrections due to fiducial cuts that are linear in q T , as discussed in section 2.2.3. This deficit is responsible for a larger value of χ 2 for this sub-set. The nuisance parameter decomposition for high energy DY is 1.51 = 1.28 + 0.23, where the last number is the penalty contribution to χ 2 due correlated uncertainties.
• The low energy DY data are significantly underestimated by the TMD factorization formula. However, this underestimation is within the expected correlated systematic uncertainties of the data. This is a known issue of fixed target experiments. The underestimation has been also observed for the pion-induced DY [21]  JHEP06(2020)137 in ref. [85]. Note, that in ref. [85] the high-q T part of the measurements has been considered (in collinear factorization), and the observed discrepancy is an order of magnitude larger. Also, the present fit has somewhat lesser deficit in the normalization (by 5 − 6%) in comparison to previous one [20]. We connect it to the corrected shape of ζ-line at large-b.
• SIDIS data do not show any problem with the total normalization. This statement is in some contradiction to the literature. In [18] the authors report a significant contribution of normalization to χ 2 from the HERMES data (the COMPASS data was normalized exactly). In ref. [105] an enormous discrepancy between theory and data in the collinear factorization limit has been observed too.
In figures 12-21 we present all the data points used in the fit together with the theory prediction lines. In these figures we also show the data points that were not included in the fit due to the cutting conditions eq. (3.1), (3.2), (3.3) in order to demonstrate the behavior of TMD factorization beyond its limits.

Values of NP parameters
The extracted values of NP parameters are in the table 9. The central values of parameters do not shift much with respect to individual fits of DY and SIDIS data. The main effect of the global fit is the reduction of uncertainties for RAD and TMDPDF by ∼ 40 − 50%. In figures 6, 7 and 10 we show the values of NP parameters obtained in all fits of this work. Generally, the parameters obtained in different fits are in agreement, except λ 3,4,5 that mainly serve for the fine-tune of TMDPDF to LHC data.
All NP parameters are correlated. The correlation matrices for NP parameters are shown in figure 22. The explicit numeric expression for correlation matrices is given in appendix D. Ideally, one would expect the complete independence of NP parameters contributing to RAD, TMDPDF and TMDFF. In this case the correlation matrices would have a block-diagonal form. In reality, we observe correlations among the blocks related to independent functions. In the case of NNLO these correlations are not large, and the block-diagonal structure is evident. The biggest (anti-)correlation is between c 0 and λ 1 , with the correlation matrix element −0.67, with the rest being much smaller. The source of this correlation is evident -it is due to the precise Z-boson production measurements by JHEP06(2020)137    LHC. In the N 3 LO case the correlation are much stronger. The biggest (anti-)correlation is between c 0 and λ 1 , with correlation matrix element −0.82, with some other elements reaching ±0.5 and it indicates a possible tension in our description of the data at N 3 LO.

Comments on the extracted TMD distributions
The non-perturbative distributions extracted in this work show several features that are interesting for theory investigations. For instance, the RAD that measures the properties of the soft gluon exchanges and that is inclusively sensitive to the QCD vacuum structure. The factorization theorem ensures that the values of B NP and c 0 are totally uncorrelated from the rest of TMD parameters, because they are of complete different origin. As we have an extraction of these parameters from data we can expect that a certain correlation is reintroduced in the fitting process. In figure 22 (see also appendix D) we check this statement in the present global fit and we find that it is qualitatively verified in our DY+SIDIS fit.
JHEP06(2020)137 In the figure the only non-perturbative parameters which show a higher (anti)correlation with the RAD are c 0 and λ 1 in the TMDPDF. Apart from this, the independence of the RAD parameters from the rest of TMD is certainly a success of the ζ-prescription, which allows a clear separation of all these effects. In the rest of this section we report some specific comment for each of the functions that we have extracted.

Non-perturbative RAD
In figure 23 (left) we plot the RAD as a function of b with its uncertainty band. We present only the RAD extracted with NNPDF31 fits, but the picture does not change significantly for all other PDF sets. In this figure we can test the universality of the RAD looking at its extraction in DY and DY+SIDIS. At small b the perturbative structure of the RAD dominates and we find practically no difference in its behavior as coming from different fits. The difference between these two cases happens at large b and it is at most of 10%. The 1σ-uncertainty bands of DY and global fit do not strictly overlap, which possibility indicates their underestimation. In the same figure 23 (left) we also compare our RAD with the one obtained in [18] and [19]. In refs. [18,19] a different shape of NP ansatz for RAD has been used, with a quadratic behavior at large-b. Such an ansatz has been used often, and (as we have also checked) it is able to describe the data. Nonetheless we disregard it because the global χ 2 /N pt is worse (1.11 and 1.34 at NNLO and N 3 LO, correspondingly), with much larger correlation among parameters. Additionally, the linear asymptotic behavior used in our ansatz is supported by non-perturbative models, e.g. [64]. Possibly, the uncertainty band is biased by this model, and the realistic band is larger by a factor two at most.
In figure 23 (right) we show the scattering of replicas in (B NP , c 0 )-plane collected from all fits. It is clear that the parameters B NP and c 0 are strongly anti-correlated (see also figure 22) and this is a consequence of the non-perturbative model, since the variation of c 0 can be compensated by a variation of B NP up to b 4 -corrections. The replicas of JHEP06(2020)137 the global fit (orange points) are scattered in a much smaller area and this provides a ∼ 40% smaller error-bands on parameters. Generally, the inclusion of the SIDIS data drastically constraints the values of B NP , and for that reason they are very important for the determination of RAD. We conclude that the RAD extracted in the global fit is more reliable, in comparison to the one done using DY data only.
The RAD that we have extracted is valid for all distributions and it has been used also to describe the pion-induced DY [21]. For further reduction of the uncertainty of the RAD one should consider more precise low-and intermediate-energy processes, such as up-coming JLab12 measurements, and the future EIC.

TMD distributions
The quark TMDPDF and TMDFF are extracted simultaneously including high QCD perturbative orders for the first time to our knowledge. The non-perturbative parameters obtained using the PDF set NNPDF31 and the fragmentation set DSS are reported in table 9. Within one set of PDF the error induced from the PDF replicas dominates the experimental error of TMD. Thus, the error that we have reported on TMD parameters is certainly underestimated. To determine a realistic uncertainty band, one must invent a flexible ansatz for NP-part of TMD distributions that does not contradict the known theory. It appears to be a non-trivial task, which we leave for a future study.
The TMD distributions show a non-trivial intrinsic structure. An example of distributions in (x, b)-plane is presented in figure 24. Depending on x the b−behavior apparently changes. We observe (the same observation has been made in ref. [18]) that the unpolarized TMDFF gains a large b 2 -term in the NP part. It could indicate a non-trivial hadronisation physics, or a tension between colinear and TMD distributions. The study of its origin should be addressed by future studies.

Conclusion
Standing the TMD factorization of DY and SIDIS cross-section, one identifies at least three non-perturbative QCD distributions in each cross-section -two TMD parton distributions JHEP06(2020)137 and a non-perturbative rapidity anomalous dimension (RAD). These functions should be extracted from the experimental data. Given such a large number of phenomenological functions, their universality plays a crucial role. In this work, we have shown that the TMD distributions and RAD are indeed universal functions.
In order to confirm the universality statement, we have firstly extracted the RAD (D) and the unpolarized TMDPDF (f 1 ) from the DY data, and secondly we have used them to describe the SIDIS data (extracting in addition the unpolarized TMDFF, D 1 ). To our best knowledge, this is the first clear-cut demonstration of the universality of the TMD non-perturbative components. This demonstration is the main result of this work. The subsidiary results are the values of extracted unpolarized TMD distributions and RAD, that could be used to predict and describe the low-q T spectrum of current (LHC, COMPASS, RHIC) and future (EIC, LHeC) experiments.
The sets of data included in this analysis contain in total 1039 points (almost equally distributed between SIDIS, 582 points, and DY, 457 points). We have the data from fixed target DY measurements, Tevatron, RHIC, LHC, COMPASS, and HERMES. Unfortunately, only low-energy measurements are available for SIDIS data. At the moment, we have not included any data from HERA multiplicities because they do not accomplish the kinematical requirements for the TMD factorization. Contrary to some observations in the literature [14,18], we have not found any problem with the normalization of HERMES and COMPASS data, although the systematic experimental errors quit precision to the final result.
The data analysis is made with the current theory state-of-art, including all known perturbative QCD orders, i.e. N 3 LO for the hard part and the evolution, and NNLO for the collinear matching. The NNLO and N 3 LO predictions are very close to each other, which is a good signal indicating that the perturbative part of the cross-section is saturated. We have also collected all recent modifications and updates of the TMD factorization approach, such as target-mass corrections, frame-corrections, and exact evolution solution at large-b. Individually these aspects are subtle, however, cumulatively, they are sizable. In section 2 we have presented a comprehensive collection of theory expressions used in this work. Let us also mention that the N 3 LO evolution, as well as a non-trivial QCD matching for TMDFF (NNLO vs. LO) is used here for the first time. An open issue is represented by power corrections, which, given the current status of the art can be included only partially, as discussed in section 2. More work concerning this problem should be addressed in the future.
The definition of matching scales and the evolution/modeling separation is done according to ζ-prescription. The ζ-prescription is equivalent to the popular CSS-scheme since it satisfies the same set of differential equations. Nonetheless, this equivalence is strict only within an all-order perturbation theory and it is numerically violated for any truncated series. The origin for this discrepancy is well-understood [11] -it comes from spurious contributions in the CSS formalism that vanish in the exact perturbation theory. At LO and NLO, the numerical value of spurious contributions is large, but it is tiny at N 3 LO [11]. Therefore, the ζ-prescription provides a faster convergence and an improved stability of the perturbative series, as shown in figure 3. Additionally, but not less impor-JHEP06(2020)137 tantly, the ζ-prescription grants a strict separation of perturbative and non-perturbative pieces and thus it allows a stronger universality of the phenomenological functions, figure 22 and appendix D. In particular, the RAD extracted here can be used in the analysis of jet-production [99][100][101]. The success of the present global fit confirms the reliability of the ζ-prescription.
Many points of the TMD phenomenology are discussed quantitatively for the first time (to our best knowledge). We critically consider each detail of the factorization that have a disputable nature, f.i. power corrections to collinear variables. We demonstrated that the inclusion of these details improves the agreement between theory and the data. A particularly important check made here for the first time is the test of the limit of the TMD factorization approximation for SIDIS. In the DY case, the phenomenological limit of TMD factorization is q T 0.25Q, as it has been shown in ref. [19]. We have found that SIDIS also obeys this rule. This piece of information is important because it opens the door to reliable predictions of SIDIS experiments.
The estimation of the uncertainty for extracted distributions is made by the replica method that gives a reliable error-propagation of experimental errors. On top of it one should include the uncertainty of other theoretical ingredients, and in particular the collinear PDF error. We have checked that the prediction of the TMD factorization is crucially sensitive to the values of collinear PDFs. It indicates that our extraction has a considerable additional uncertainty due to the uncertainty of the collinear input. However, we were not able to accurately quantify the size of this uncertainty band, due to the high computational costs of such analysis. We leave this study for the future.

JHEP06(2020)137
In particular, in the DY kinematics the logarithms turns to ln(−1) = ±iπ, whereas in the case of SIDIS logarithm vanish. The NNLO expression for DY hard coefficient is where C F (= 4/3) and C A (= 3) are quadratic Casimir eigenvalues of fundamental and adjoint representation of SU (3). N f is the number of active quark flavors. The NNLO expression for SIDIS hard coefficient is One can see that the difference between SIDIS and DY hard coefficients is These corrections are known as (π 2 a s ) n -corrections. They could be resummed to all orders [108]. However, in the case of vector-boson production the correction coming from (π 2 a s ) n is not significant (of order of next-to-given order correction [108]).

B Perturbative expression for D
The rapidity anomalous dimension (RAD) D(µ, b) is generally non-perturbative function, which can be computed perturbatively only at small-b, see e.g. [24,28] for NNLO and N 3 LO computations. RAD satisfies the integrability condition (2.68) which can be seen as renormalization group equation, Consequently, the a n s -order of RAD contains logarithms up to order L n µ . We define D pert (µ, b) = ∞ n=1 a n s (µ) where d (n,k) are numbers, and The values for d (n,k) for k > 0 can be computed from (B.1) in the terms of Γ i , β i and d (n,0) . Here, and in the following we define beta-function, and coefficients of Γ cusp as The leading terms are β 0 = 11 3 C A − 2 3 N f , Γ 0 = 4C F , and d (1,0) = 0. The values of d (2,0) and d (3,0) were computed in [4,24] and [27,28] respectively. The β-function coefficients are known up to β 4 [31] and the Γ i are known up to Γ 3 -order for the quark case, see [32][33][34] and references within.
The series (B.1) has a small convergence radius since the expansion variable (a s L µ ) gets fastly bigger then 1 with the increase of b. To improve the convergence properties of RAD we use the resummed expression [11,39,58]. In this case we write D resum (µ, b) = ∞ n=0 a n s (µ)d n (X), X ≡ β 0 a s (µ)L µ . (B.5)

JHEP06(2020)137
At X → 1 this expression has a pole, that is equivalent to Landau pole. This show the convergence radius of this expansion b 2e −γ E Λ −1 QCD ≈ 4GeV −1 .

C Expression for ζ µ
The concept of the special null-evolution line plays a central role in ζ-prescription. The ζ-prescription, the double evolution and properties of TMD evolution have been elaborated in ref. [11]. In this appendix, we present the expressions for the special null-evolution line ζ pert and ζ NP that were used in the fit. The definition of the special null-evolution line is discussed in the section 2.4. Parameterizing an equipotential line as (µ, ζ µ (b)), one finds the following equation The special null-evolution line is the line that passes thorough the saddle point (µ 0 , ζ 0 ) of the evolution field. The saddle point is defined as Note, that this boundary condition guarantees the finiteness of ζ µ (b) for all non-singular values of µ.
Originally the ζ-prescription has been implemented in the perturbative regime only [19]. This part is the most important since it gives the cancellation of doublelogarithms in the matching coefficient. However, at large-b, non-perturbative corrections to RAD become large and can not be ignored (although they can be seen as a part of NP model, but it introduces an undesired correlation between f N P and D). Therefore, one have to solve equation (C.1) with a generic non-perturbative RAD. Such solution can be found but its numerical implementation is problematic at very small-b. The problem is that it is very difficult to obtain the cancellation of perturbative logarithms for the exact solution because at b → 0, because the numerical values of logarithms are huge. Therefore, a good practice is to use the perturbative solution at very small-b, (and hence cancel all logarithm exactly) and turn to the exact solution at larger b. This is implemented in the ansatz eq. (2.91).
In the following sections we provide expressions for ζ exact µ and ζ pert µ that were used in the fit procedure.

JHEP06(2020)137
with L µ defined in eq. (B.3). The general expression for v n can be found in [11] (see eq. (5.14)). We apply the boundary condition eq. (C.2), which in the perturbative regime turns into requirement of finiteness of v n at L µ → 0. The values of v n up to NNLO are The definition of perturbative coefficients is given in eq. (B.2), (B.4) and γ V (µ) = ∞ n=1 a n s (µ)γ n . (C.8) Similarly, to RAD the ζ pert µ (b) can be resummed in terms of a s L µ (see appendix A in [11]). However, this is not necessary when using our ansatz eq. (2.91), because the perturbative expression turns into exact much before the problems with convergence occur.

C.2 Exact expression
The evolution field, the equipotential line ζ µ and the position of the saddle point (µ 0 , ζ 0 ), depend on values of b, which is treated as a free parameter. This fact calls for some attention when implementating of the ζ-prescription exactly. The lesser problem is that additional numerical computations are required to determine the position of saddle-point and the values of the line for different non-perturbative models of D. The greater problem is that at larger b the value of µ 0 decreases and at some large value of b (typically b ∼ 3GeV −1 ) µ 0 is smaller than Λ QCD . Due to this behavior, it is impossible to determine the special null-evolution line at large-b numerically. However, the special null-evolution line is still uniquely defined by the continuation from smaller values of b.
In ref. [21] a simple solution of this problem has been found. The central idea is to use the non-perturbative RAD as generalized coordinates (a s , D) instead of (µ, b). We introduce the function g as Here we present the explicit expression for correlation matrices shown in figure 22. The NNLO fit yields the following matrix The enumeration of rows and columns in matrices corresponds to the NP parameters ordered as {B NP , c 0 , λ 1 , λ 2 , λ 3 , λ 4 , λ 5 , η 1 , η 2 , η 3 , η 4 }.
Open Access. This article is distributed under the terms of the Creative Commons Attribution License (CC-BY 4.0), which permits any use, distribution and reproduction in any medium, provided the original author(s) and source are credited.