Extraction of unpolarized quark transverse momentum dependent parton distributions from Drell-Yan/Z-boson production

We present the extraction of unpolarized quark transverse momentum dependent parton distribution functions (TMDPDFs) and the non-perturbative part of TMD evolution kernel from the global analysis of Drell-Yan and $Z$-boson production data. The analysis is performed at the next-to-next-to-leading order (NNLO) in perturbative QCD, using the $\zeta$-prescription. The estimation of the error-propagation from the experimental uncertainties to non-perturbative function is made by the replica method. The importance of the inclusion of the precise LHC data and its influence on the determination of non-perturbative functions is discussed.


Introduction
The description of the hadron structure is one of the major challenges for the comprehension of strong interactions. Transverse momentum dependent parton distribution functions (TMDPDFs) depict parton momenta in 3-dimensions and provide more detailed information on hadrons than the one-dimensional collinear parton distribution functions (PDFs). In this work, we present the extraction of unpolarized TMDPDF and non-perturbative part of TMD evolution from the fit of Drell-Yan and Z-boson production measurements.
At hadron colliders, in the regime of the small transverse momentum of the produced vector/scalarboson, the cross-section is factorizable in terms of universal TMDPDFs [1][2][3]. The phenomenological analysis of Drell-Yan and Z-boson production processes (we refer to them as Drell-Yan (DY) processes, for simplicity) within the TMD factorization has a long history, see e.g. ref. [4][5][6][7][8][9][10][11][12][13]. However, many of these works have been produced before a rigorous formulation of the TMD factorization and TMD evolution and for that reason are outdated. These articles differ, among the others, in the phenomenological construction of the factorized cross-section (which is relevant for the theoretical precision that can be achieved), the composition of perturbative and non-perturbative contributions and the inspected data sets. Also, the majority of the fits included in this list operates only at perturbative leading order (LO) and do not include the highly precise measurements made at LHC. In the present work, we aim to cover this gap and to obtain precise values of the TMDPDFs and of the non-perturbative part of the TMD evolution consistently with modern theory and data.
Over the past few years the theory of TMD factorization has developed consistently. In particular, nowadays its perturbative structure is completely understood, which is confirmed by multiple next-to-next-to-leading order (NNLO) perturbative calculations [14][15][16][17][18][19][20][21][22]. Also there was a critical progress in the understanding of the structure of TMD evolution [1,3,[23][24][25], and the relation between different components of TMD scaling [26,27]. For a recent review of the state-of-the-art, we refer to [28,29]. The present extraction is founded on these theory achievements and uses the highest perturbative input available nowadays, that is, the complete NNLO (two-loop coefficient functions together with three-loop evolution).
The extraction of the TMDPDF requires an articulated consideration of the scale settings, which is performed here using the ζ-prescription. Since the approach is novel in the TMD factorization studies, we explain its origin and importance in a few words, and we refer to the original paper [27] for the details. The ζ-prescription consists of a particular choice of renormalization and rapidity evolution scales for TMD distributions. The double scale dependence is characteristic of the TMD distinctions, and it can be traced in perturbative calculation due to the different origin of divergences. The presence of two scales results in a non-elementary problem of the scale-fixation choice for TMD distributions. Within ζ-prescription the TMD evolution is made effectively onedimensional, which allows selecting the best values for the scale parameters (this choice is known as an optimal TMD distribution) that guarantee the perturbative stability. As a major outcome, the ζ-prescription consistently separates the non-perturbative part of the evolution kernel from the non-perturbative parton distribution. For this reason, the values of the non-perturbative evolution extracted in this work are universal and can be used directly in other applications, e.g., the analysis of polarized TMD distributions [30,31] or jet productions [32,33].
Beyond the modern state-of-the-art implementation of TMD factorization, here we reconsider the extraction of TMDPDF including a larger set of experimental data and we provide a solid statistical analysis of error-propagation. Comparing this fit with the most recent and complete extractions made in refs. [12,13], the number of analyzed data points is significantly bigger (457 points against 293 in [12] and 309 in [13], which is the biggest amount of DY data ever considered, to our knowledge). This number of data has been achieved by including the results from PHENIX [34], E772 [35] experiments, differential rapidity bins from ATLAS [36] and the measurement of the Drell-Yan cross-section in the muon channel at D0 [37]. These data points are included in the analysis of TMD cross-section for the first time 1 . For the determination of the extraction uncertainties we apply the replica method [41][42][43][44], routinely used for the extraction of collinear PDFs. We have found that the inclusion of the LHC data essentially reduces the uncertainty band for non-perturbative functions. Nonetheless, the available data leave uncovered a large portion of the energy/momentum phase space that should be filled by experiments in the future.
As a result, we obtain a consistent and complete picture of the unpolarized TMDPDFs and their evolution kernel supporting it with a well established statistical treatment. We think that such screening is fundamental to provide clear indications to experimentalists and theorists about the validity of the TMD factorization theorem, and it represents a notable improvement in the understanding of transverse momentum structure of a hadron. The results of this work are available as a part of artemide-package for TMD phenomenology [45]. The library contains the routines for the evaluation of TMDPDFs and their evolution (mean values and distribution of replicas) and the routines for the evaluation of the related cross-section.
The paper is organized as follows. In sec. 2 we review the TMD factorization and the necessary 1 Let us mention, that the LHC data has also been analyzed in the resummation approach [38][39][40] with the same level of perturbative input. However, the resummation approach should not be confused with the TMD factorization, although they have several common points. The resummation approach is founded on collinear factorization, and it has theoretically no access to a non-perturbatively generated transverse momentum. For that reason, the resummation approach is only applicable at high-energy and at larger values of q T . elements of the theory, such as TMD evolution and ζ-prescription in sec. 2.1, fundamental requirements on the model building and collinear matching in sec. 2.2. We formulate the non-perturbative models for rapidity anomalous dimension in sec. 2.3, and for TMDPDF in sec. 2.4. The selection of the data set is discussed in sec. 3, while the details of the statistical analysis can be found in sec. 4 and in the appendices. Finally, we present the results in sec. 5. In particular, the quality of the fit is discussed in sec. 5.1 and the extracted non-perturbative functions are discussed in sec. 5.2.

Drell-Yan cross section in TMD factorization
The leading term of the TMD-factorized cross section for the DY process (h 1 + h 2 → Z/γ * (→ ll ) + X) has the following structure [1,2,46] dσ where Q 2 = (l + l ) 2 , q T and y are transverse component and rapidity of the lepton pair momentum with respect to collision axis, and the variables x 1,2 are defined as The function F f →h is the unpolarized TMDPDF 2 of the parton flavor f in hadron h in impact parameter space b. The function H is the hard-scattering coefficient function and σ 0 is a kinematic factor. For a more detailed definition, we refer the reader to ref. [9,13,47]. The factorization formula in eq. (2.1) is accurate to leading power in q 2 T /Q 2 , while power-suppressed corrections are presently unknown (see ref. [48,49] for recent developments).
The scales µ and ζ 1,2 are the renormalization and rapidity scales, respectively [1,3,23,24]. In order to minimize the logarithms in hard coefficient function H, we set the renormalization scale µ equal to the hard scale Q. Moreover, the rapidity scales must obey the relation ζ 1 ζ 2 = Q 4 : we make the symmetric choice ζ 1 = ζ 2 = Q 2 .
In the following of this section, we briefly review the relevant ingredients of eq. (2.1), discussing the TMD evolution and the separation between perturbative and non-perturbative components. Then we describe the models used to parametrize the non-perturbative input. Finally, we give the final expression for the cross section and discuss the perturbative input used for the fits.

TMD evolution
In order to consistently combine the perturbative and non-perturbative parts of the TMD factorization formula (2.1), and to separate the matching and evolution effects within TMDPDFs, we use the ζ-prescription. It is based on the notion of double-scale evolution, and consists in a special definition evolution scale. We refer to ref. [27] for a detailed description of the double-scale evolution and its properties. In this section, we present minimal introduction to ζ-prescription and formulas that are used in the fit.
The TMD evolution in the (µ, ζ)-plane is governed by the pair of differential equations whose kernels define a bi-dimensional scalar potential. The logarithm of the TMD evolution factor R is given by the difference between potentials at different points of (µ, ζ)-plane, and for that reason, TMD distribution evaluated on two points with the same value of potentials are equal. Within the ζ-prescription, a TMD distribution is defined by an equipotential line, instead of the scales (µ, ζ), and it evolution is given by a transition between equipotential lines.
The line that goes through the saddle point of the potential is special, since it is a uniquely and non-perturbatively defined, and spans the whole range in µ and ζ. This line provides a natural starting point for the definition of the non-perturbative component of TMD distributions. Given ζ = ζ µ (b) belonging to the special line 3 , we define the optimal TMD distribution as where in the r.h.s. we have emphasized its "naive scale-independence". The evolution of the optimal TMD distribution to a generic set of scales (µ, ζ) is then simply given by where R f is the TMD evolution factor whose expression is Note that the r.h.s. of eq. (2.4) is effectively independent on µ 0 . The anomalous dimension γ F and rapidity anomalous dimension D are universal for all TMD distributions and their perturbative expressions are currently known up to three-loop [20,21,50,51]. Importantly, the rapidity anomalous dimension has a non-perturbative component that is usually extracted from data along with the non-perturbative component of TMD distributions. The integration path P in eq. (2.5), that connects the points (µ 1 , ζ 1 ) and (µ 2 , ζ 2 ) in the evolution plane, is in principle arbitrary. In practice, the evolution factor R f is independent on the path P only if all terms in the perturbation expansion of the anomalous dimensions are included. This property is violated by the truncation of perturbative expansion. However, one can define a scheme for the evolution that preserves the conservativeness of the potential. Clearly, the difference between schemes tends to vanish as more and more terms are included in the perturbative expansions. In this work, we use the so-called improved-γ scheme defined in ref. [27]. For the numerical implementation of the evolution factor we use the simplest possible path, i.e. a straight line that connects ζ to ζ µ (b) at fixed µ. By doing this, the evolution factor takes the form . (2.6) Remarkably, this expression does not involve any integration. This entails a great simplification of the numerical implementation of the TMD evolution.

General requirements for the TMD distributions
The non-perturbative parts of the TMDPDF F and the rapidity anomalous dimension are to be extracted from data. However, a number of theoretically justified constraints can be enforced.
• For b → 0, the non-perturbative component of both TMD distributions and rapidity anomalous dimension is expected to be suppressed. In particular, in this regime TMDPDF can be computed as where f f ←h is the collinear PDF for the parton flavor f . The coefficient functions C are currently known up to two-loop order [18,19].
• The leading power correction to the small-b is of order b 2 . This follows from the operator product expansion and it has been confirmed by the explicit evaluation of the renormalon contributions [25]. In general, power corrections to the small-b must scale as b 2n , i.e. only even powers of b are allowed in the Taylor expansion around b = 0.
• The asymptotic for b → ∞ is mostly unknown. A reasonable restriction is that both TMDs and evolution factor should tend to zero in this limit. However, the decay law is unknown. Typical choices are a gaussian or an exponential falloff.
These restrictions significantly constrain the behavior of the non-perturbative components, particularly at small b. At large b, instead, theoretical constraints are milder. Based of these considerations, in the following we propose models for the rapidity anomalous dimension and the intrinsic part of TMDPDFs.

Model for rapidity anomalous dimension
The non-perturbative rapidity anomalous dimension D f is modeled by the following function The resummed anomalous dimension D f res can be expanded as The leading term reads where β 0 is the leading-order (LO) coefficient of the expansion of the QCD β-function and Γ f 0 is LO cusp anomalous dimension (β 0 = (11C A − 2N f )/3 and Γ 0 = 4C F , respectively). For our studies we have used eq. (2.10) at NNLO (i.e. up to d f 2 ). The NNLO expression incorporates the three-loop anomalous dimension and can be found in ref. [27,52].
Due the definition of d f 0 in eq. (2.11), the resummed rapidity anomalous dimension is singular In order to avoid the singularity, we replace b with b * defined in eq. (2.9) in the resummed part of the anomalous dimension. Since b * never exceeds B NP , the value of D res approach D res (µ, B NP ) at large b. The function g(b) in eq. (2.8) represents the non-perturbative contribution to the anomalous dimension. Based on general considerations, the Taylor expansion around b = 0 of this function contains only even powers of b, starting from b 2 . Therefore, generally, the model (2.8) satisfies all requirements listed in sec.2.2.
In our research we have tested different models for g(b). We have found that, the current data do not allow for an accurate extraction of the function g at large-b. Practically, only the leading term ∼ b 2 could be rigorously fixed, and it should be small enough, so it does not affect the smallb part (that is fixed by perturbation theory). Finally, we have adopted a simple one-parameter exponential model (2.12) At small b this model behaves as g(b) ∼ λ 0 b 2 , whereas, at large b, instead, it behaves as g(b) ∼ λ 0 bB NP . The other candidate for the final model of non-perturbative evolution was a more traditional Gaussian model, g(b) ∼ c 0 b 2 (see ref. [53] for a recent review). However, since exponential and Gaussian models provide a similar description of the experimental data, we find preferable to use the exponential model in eq. (2.12). The reason is that it appears to extend the validity of the perturbative series to higher values of b.

Model for TMDPDF
In our fits, the model that parametrizes the intrinsic non-perturbative component of the TMDPDFs is implemented by means of the following general form where f NP is a function to be fitted to data. Eq. (2.13) is not the most general ansatz that satisfies the requirements discussed in the previous section.
In particular, f NP may depend on the flavor and also on the convolution variable y, but we have found that this ansatz is sufficient to describe the data at the current level of precision. The factorization scale µ in the r.h.s. of eq. (2.13) is chosen to be This choice allows the impact parameter |b| not to reach the Landau pole. In any case it was found that the dependence on the exact value of the scale is not very large [27]. Concerning the input collinear PDFs f f ←h , we have tried different publicly available sets and found that there is a marked dependence on the particular choice. It implies that the TMD physics is sensitive to the x-dependence at small-b, which is totally dictated by choice of PDF set by contraction of our model (2.13). We leave a detailed study of this dependence for a future publication. For the current fit, we have used the central replica of the NNPDF3.1 NNLO set [43] through the LHAPDF library [54]. This set provides the best description of the data. The LHAPDF library also provides the strong running coupling α s consistently with the PDF set 4 . The flavor number N f is so consistently and automatically fixed at the correct scale through α s and ultimately the PDF sets. The shape of the function f NP significantly influences the value of the cross section. Therefore, in order to avoid possible parametric biases, it should be chosen to be as flexible as possible taking into account the following theoretical constraints. First, f NP has to be such that lim b→0 f (x, b) = 1. Second, it should be an even function of b, i.e. the Taylor expansion around b = 0 should only contain even powers of b. We have found that a suitable parametrization of f NP has the form The function r 1 gives dominant behavior at small-b, whereas the function r 2 controls the large-b region. The Padé-like form of the exponent guaranties that the higher powers of b do not give a large contribution. Therefore the functions r 1 and r 2 can be expanded around b = 0 and truncated after the first few terms. We have performed numerous tests and found that the current data do not resolve the higher modes of the x-dependence, and thus the functions r 1 and r 2 can be simple polynomials in x. Specifically, we use the following model where λ 1,..,5 > 0. This parametrization, with five free parameters, is able to accommodate a range of different behaviors, such as the exponential and the Gaussian one, with some degree of redundancy. Specifically, we have found that the number of free parameters can be reduced to three or four without a significant deterioration in the description of the data.

Summary on theory input
The final formula to compare to the DY experimental data is The explicit form of the TMDPDFs F is given in eq. (2.13) with the non-perturbative input given in eq. (2.16). The expression for the TMD evolution factor is given in eq. (2.6). The model used for D anomalous dimension is given in eq. (2.8) with the non-perturbative input given in eq. (2.12). In conclusion, in our fit there is a total of seven free parameters (two for the evolution and five for the TMDPDFs). The summary of the perturbative input used for the computation of the observables is presented in Table 1. Table 1. Summary of perturbative orders used in the fit for each part of the cross section.

Data selection
The TMD factorization of the cross section is valid only in small transverse-momentum (q T ) regime. Therefore, we need to impose a cut on the experimental data set that limits the kinematics of the data points to this region. In our fit we have selected the data according to the following rule: given a data point p ± σ, with p being the central value and σ its uncorrelated relative uncertainty, corresponding to some values of q T and Q (which are taken to be the center of the bin), we include it in the fit only if These conditions are chosen for the following reasons. In ref. [13] it has been demonstrated that, within the experimental accuracy of the data set included in the fit, TMD factorization is valid in the range δ < that scale as q 2 T /Q 2 = δ 2 , should be taken into account. Specifically, in the TMD framework, these corrections can be regarded as a theoretical uncertainty. Based on this consideration, if the (uncorrelated) experimental uncertainty of a given data point is smaller than the theoretical uncertainty associated to the expected size of power corrections, we drop this point from the fit. This is the origin of the second condition in eq. (3.1). This data selection is particularly conservative because it drops points that could potentially be described by TMD factorization (see e.g. ref. [12] where less conservative cuts are used). However, this choice guarantees that we operate well within the range of validity TMD factorization. Table 2 reports a summary of the full data set included in our fit. Remarkably, after imposing the cut in eq. (3.1), the number of data points included in our fit is 457. Despite the conservative cut, this is the largest set of DY data considered so far within a TMD fit. Our data set spans a wide range in energy, from Q = 4 GeV to Q = 150 GeV, and in x, from x ∼ 0.5 · 10 −4 to x ∼ 1. We recall that a single DY data point is simultaneously sensitive to a larger and a smaller value of x. This is because the cross section is given by a pair of TMDPDFs, eq. (2.1), computed in x 1 and x 2 such that x 1 x 2 Q 2 /s, see eq. (2.2).
In our fit we have compared absolute values of cross-section, whenever they are available. The only data set that require normalization factors are all CMS data, ATLAS at 7 TeV, and DO electron-pair measurements. For these sets we have normalized the integral of the theory prediction to corresponding integral over the data (see explicit expression in ref. [13]). To our best knowledge, it is the first fit of TMD factorization to absolute values of cross-section in the modern time, compare e.g to the latest and most advanced fits in [11][12][13].
The kinematic region in x and Q covered by the data set considered for our fit is shown in fig. 1. The boxes enclose the sub-regions covered by the single data sets. Looking at fig. 1, it is possible to distinguish two main clusters of data: the "low-energy experiments", i.e. E288, E605, Total 457 *Bins with 9 Q 11 are omitted due to the Υ resonance. Table 2. Summary table for the data included in the fit.. For each data set we report: the reference publication, the centre-of-mass energy, the coverage in Q and y or xF , possible cuts on the fiducial region, and the number of data points that survive the cut in eq. (3.1).
E772 5 and PHENIX, that place themselves at invariant-mass energies between 4 and 18 GeV, and the "high-energy experiments", i.e. all those from Tevatron and LHC, that are instead distributed around the Z-peak region. From this plot we observe that, while the high-energy experiments span a wide range in x, the coverage in x of the low-energy ones is more limited. This is a consequence of the fact all the low-energy experiments but PHENIX are fixed-target experiments. On the other hand, the number of data points belonging to the low-energy and high-energy experiments is of the same order ensuring a balanced distribution of data in Q.

Statistical analysis
In this section we discuss the treatment of the experimental information within our fit. The final purpose is to provide a suitable definition of the χ 2 that allows for a correct exploitation of experimental uncertainties. A proper treatment of uncorrelated and correlated uncertainties is fundamental to obtain a faithful extraction of the TMDPDFs.
Let us consider an ensemble of n measurements having the following structure where m i , with i = 1, . . . , n, is the central value of the i-th measurement, σ i,stat its (uncorrelated) statistical uncertainty, σ i,unc its uncorrelated systematic uncertainty 6 , and σ (l) i,corr , with l = 1, . . . , k, its correlated systematic uncertainties. Uncorrelated uncertainties give an estimate of the degree of knowledge of a particular data point irrespective of the other measurements of the data set. A typical example of uncorrelated uncertainty is the statistical one but also other systematic sources are possible. Correlated uncertainties, instead, provide an estimate of the correlation between the statistical fluctuations of two separate data points of the same data set. Typically, correlated uncertainties are of systematic origin, e.g. they are connected with the apparatus used to perform the measurements.
With this information at hand, one can construct the experimental covariance matrix V ij as follows (see for example ref. [41,67]): Given a set of predictions t i corresponding to the n measurements of the ensemble, the χ 2 takes the form where in the second equality we have used the matrix notation and defined the residuals y i = m i −t i . The χ 2 in eq. (4.3) takes into account the possible different nature of the experimental uncertainties leading to a faithful estimate of the agreement between data and theoretical predictions. An efficient way to compute the χ 2 in eq. (4.3) is discussed in Appendix A. As we will show below, the presence of sizable correlated uncertainties may give rise to significant shifts such that a visual comparison between central experimental values and theoretical predictions is misleading. Specifically, an apparent visual disagreement may still be compatible with an acceptable value of the χ 2 . However, it is possible to quantify the effect of the correlated uncertainties on the single data points by computing the so-called systematic shifts d i . In this approach the χ 2 -value (4.3) is presented by a sum of two terms [67] where χ 2 D is the uncorrelated contribution and χ 2 λ is a penalty term. Loosely speaking, χ 2 D (χ 2 λ ) demonstrates the agreement in the shape(normalization) between theory and measurement. Applying these shifts to the theoretical predictions 7 should produce a more trustful visual comparison. The explicit computation of the systematic shifts is presented in Appendix B.

Results
In this section we present the results of our analysis. We start commenting the quality of the fit and comparing the input data set to the theoretical predictions. Then we turn to consider the outcome for TMDPDFs and the numerical values of the parameters extracted from the fit. We detail our study on error propagation from experimental data that is handled by a Monte Carlo sampling, known also as the replica method. To this end, we have generated 100 pseudodata replicas according the rules described in ref. [41], and we performed the χ 2 -minimization for each pseudodata set. The central values are the mean of the obtained 100 fits.

Agreement between theory and experiment
In tab. 3 we report the values of the χ 2 (for central values), normalized to the number of data points N pt , for the individual experiments, for some relevant subsets of experiments, and for the global data set included in this analysis. Specifically, tab. 3 displays, along the number of data points N pt , the uncorrelated contribution to the χ 2 (χ 2 D ), the penalty term (χ 2 λ ), and the sum of the two, i.e. the total χ 2 referring to eq. (4.4) (see also eq. (B.7)). The last column, instead, reports the average (over the data set) systematic shift d i (as defined in eq. (B.5)), over the cross-section value in percentage.
The first observation is that the value of the global χ 2 is particularly good (χ 2 /N pt = 1.18). This means that the fit has achieved a satisfactory description of the entire data set. We also observe that the description of the low-energy subset is substantially better (χ 2 /N pt = 0.93) than the high-energy one (χ 2 /N pt = 1.52). This is not surprising because the high-energy experiments from Tevatron and LHC are much more accurate than the low-energy ones. In addition, amongst the high-energy experiments, LHCb has the largest χ 2 , while ATLAS, CMS, and the Tevatron experiments are fairly described. Dropping the best (PHENIX) and the worst (LHCb 8TeV) set (in total 10 points), we get χ 2 /N pt = 1.12.
In order to achieve a visual assessment of the agreement between data and theory, in fig. 2, 3, 4 we display the ratio between theoretical predictions (red dashed lines) and experimental data points along with their uncorrelated uncertainty (blue bands) for some representative data sets included in the fit. In particular, we show plots for the LHC and one of the E288 data sets. An example of cross-section values without systematic shifts is given in appendix B in fig. 9. The theoretical predictions have been corrected including the systematic shifts computed as described in Appendix B (see eq. (B.6)).
From fig. 2-3, we see that, despite the small experimental uncorrelated uncertainties at the percent level or below, our fit is able to describe the LHC data sets fairly well. However, the 8 TeV data set of LHCb presents a pronounced shape discrepancy that causes the large value of the χ 2 reported in tab. 3. A similar tension between data and theory seems to be present also in the most forward rapidity bin (2 < |y| < 2.4) of the ATLAS data set at 8 TeV. We ascribe the origin of the discrepancy to the insufficient shape of collinear PDFs at very large x (x 0.7). In this region, collinear PDFs are poorly known. The fact that TMDPDF is sensitive to the shape of collinear PDF could be used to constrain the behavior of PDF. Such a study is certainly interesting but  Table 3. Distribution of values of χ 2 over the data set. Decomposition of χ 2 to uncorrelated part χ 2 D and shift part χ 2 λ is made with nuisance parameter. The average shift is (resulted from the nuisance parameters) is shown relative to the value of cross section.
goes beyond the scope of this paper. Note, that the LHCb set could also be affected by the poor knowledge of PDFs at small-x, since for this set x reaches values down to ∼ 10 −4 .
In fig. 4, the data-theory comparison for one of the E288 data sets shows that the uncorrelated experimental uncertainties range between 5% and a few tens of percent. Such large uncertainties make the agreement with the theoretical predictions easier to achieve, giving rise to small χ 2 's. Similar comments apply to all low energy experiments. We note the systematic underestimation for the cross-section for experiments E288, E605 and E772, which is of the order of 25% on average. Nonetheless, such a large difference between data and the theory does not produce large χ 2 -values,  due to large systematic uncertainties for this data. The reported correlated systematic error for E288(E605, E772) experiments is 25%(15%, 10%) [35,55,56]. This systematic discrepancy has been recently discussed in [68], where it was connected to the fixed-target nature of these experiments.

Extracted values of TMDPDF and rapidity anomalous dimension
We now turn to the values of the TMDPDFs and rapidity anomalous dimension as extracted from the fit. Our results for the non-perturbative parameters are presented in tab. 4  and the uncertainty band correspond to the mean and standard deviation of parameter distributions obtained by χ 2 -minimization of 300 pseudodata replicas. One should take into account that the uncertainties presented here take into account the correlation among parameters. Analyzing the result of the fit one should keep in mind that high-energy and low-energy experiments unequally contribute to the χ 2 -value. Because the data from LHC have tiny errors (especially the data measured at ATLAS), they contribute decisively to the value of χ 2 . For this reason, the minimum of χ 2 is shifted towards the local minimum of the LHC data set, especially for smaller x values (say x 0.05). To determine the effect of the LHC data set we have additionally performed a fit without the LHC data (100 pseudodata replicas) and we show the results in the second part of tab. 4. One can see that the values obtained in both fits nicely agree with each other, apart from for B NP (and we discuss this fact later in the text). It is clear that inclusion of the LHC data affects very strongly the uncertainty in the parameter determination.
The plot of the extracted rapidity anomalous dimension (together with 1σ band) is shown in fig. 5 at µ = 4 GeV and µ = 91 GeV. One can see that the fitted value of B NP is pretty large. This reflects the fact that high-energy experiments (which dominate our χ 2 ) prefer the entirely perturbative rapidity anomalous dimension. This was already pointed out in previous works [11,13,52]. The value of the parameter c 0 extracted from the fit is compatible with the  renormalon approximation discussed in ref. [25]. In the absence of LHC measurements the fitted value of B NP = 1.2, which is very close to values obtained in previous LHC-less data fits (compare to b max ∼ 1.1 in ref. [6,12]). We have observed that the values of global χ 2 (for the full data set) are practically the same for the values of B NP in a wide region. Fixing B NP = {1., 2., 3., 4.} GeV −1 we have obtained the minimal values of χ 2 /N pt = {1.27, 1.18, 1.17, 1.18}. At larger B NP , the fit becomes unstable due to influence of the Landau pole (the actual position of the singularity in the resummed expression depends on the realization of the strong coupling values at very-low energies, and typically located at b = 5. − 8. GeV −1 .). We admit that the distribution of the χ 2 between experiments is different. In particular, for the very large B NP small-value of χ 2 is achieved by better agreement with LHCb experiment, whereas the agreement with the majority of the data is worsen. Considering this picture, we conclude that the obtained error-band on B NP , presented in table 4, does not reflect the realistic state. It is probably due to strong correlation between B NP and other parameters, and due to the theory-data tension for some particular data subsets. To support the extraction presented here, and to show that it is not strongly affected by this freedom, we have also performed the fit of the data at fixed B NP = 2.5 GeV −1 . The results are presented in table 4. Clearly, all parameter of f NP are in agreement within uncertainty band, while the value of c 0 (which is anti-correlated to B NP is tends to compensate its change. In fig. 6 we show the intrinsic non-perturbative part of TMDPDF, f N P , as a function of b at different values of x. We present f N P extracted respectively from the full (blue band) and from LHC-less (red band) data sets. Notably and for all the values of x, the inclusion of LHC data reduces the error-band. The reduction is not so significative at x ∼ 0.1, but it is an order of magnitude at x ∼ 10 −3 . One should also take into account that this picture is somewhat model-biased. The high-energy experiments (and thus LHC data) are sensitive to small-b values (say b 2 GeV −1 ) and they are practically insensitive to large-b values. On the contrary for the low-energy experiments one finds that the values of b ∼ 5-6 GeV −1 give a sizable contribution to the cross-section. Given the small number of parameters in our model, one cannot entirely decorrelate large and small b behavior, and thus the error-band at large-b is particularly underestimated.
An important feature of our extraction is the essential dependence of f N P on x. Indeed, in the overwhelming part of previous studies (see e.g. [6,11,13]) the x dependence of f N P was absent (an exception is the x-dependent f N P in ref. [12]). In our case, the x-dependence is strong and it has been uncovered due to presence of high-precision high-energy experiments. We have checked, that we are not able to fit LHC data with x-independent f N P , whereas the rest of data could equally-well be described by a simpler x-independent f N P . We have found that the present data set prefers a wide exponential-like f N P at larger x (x ∼ 0.1 − 0.5) and narrower Gaussian-like f N P at smaller x. In order to quantify this behavior we consider b-moments of f N P defined as The values of f N P (x) and b 2 N P (x) are shown in fig. 7. Unfortunately, these functions have no direct physical meaning, but they show clearly that at x 0.05 the non-perturbative behavior of the unpolarized TMDPDF changes to become wider and exponential-like. In k T -space it would correspond to a narrower k T distribution for larger x. Such behavior has been already observed in ref. [12]. Still observing fig. 7, it is clear that the data without LHC points have no restricting power for x 10 −2 .
Finally, in fig. 8 we present the three-dimensional illustration for the unpolarized TMDPDF f 1 in position and momentum spaces. The TMDPDF in momentum space is defined as The 1σ-uncertainty level is presented by color since the absolute value of the band is visually unresolved. For demonstration purposes we present the combination of the d-andd-flavor distributions. Note, that generally, f N P is flavor dependent, although we omit its flavor dependence in the present work. Nonetheless, the extracted TMDPDFs have a flavor dependence and it is driven solely by the collinear PDF.

Conclusions
We have extracted the unpolarized transverse momentum dependent parton distribution function (TMDPDF) and rapidity anomalous dimension (also known as Collins-Soper kernel) from Drell-Yan data. The analysis has been performed in the ζ-prescription with NNLO perturbative inputs. We have also provided an estimation of the errors on the extracted functions with the replica method. The values of TMDPDF and rapidity anomalous dimension, together with the code that evaluates the cross-section, are available at [45], as a part of the artemide package. We plan to release grids for TMDPDFs extracted in this work also through the TMDlib [69].
Theoretical predictions are based on the newly developed concepts of ζ-prescription and optimal TMD proposed in ref. [27]. This combination provides a clear separation between the nonperturbative effects in the evolution factor and the intrinsic transverse momentum dependence. Additionally, the ζ-prescription permits the usage of different perturbative orders in the collinear matching and TMD evolution. For that reasons, the precise values of the rapidity anomalous dimension (±1%(4%, 6%) accuracy at b = 1(3, 5) GeV −1 ) are relevant for any observable that obeys TMD evolution.
In our analysis, we have included a large set of data points, which spans a wide range of energies (4 < Q < 150 GeV) and x (x > 10 −4 ), see fig. 1. The data set can be roughly split into the low-energy data, which includes experiments E288, E605, E772 and PHENIX at RHIC, and the high-energy data from Tevatron (CDF and D0) and LHC (ATLAS, CMS, LHCb) in similar proportion. To exclude the influence of power corrections to TMD factorization we consider only the low-q T part of the data set, as described in sec. 3. A good portion of data is included in the fit of TMD distributions for the first time, that is the data from E772, PHENIX, some parts of ATLAS and D0 data. For the first time, the data from LHC have been included without restrictions (the only previous attempt to include LHC data in a TMDPDF fit is [13], where systematic uncertainties and normalization has been treated in a simplified manner). We have shown that the inclusion of LHC data greatly restricts the non-perturbative models at smaller b (b 2 GeV −1 ) and smaller x (x 0.05), and therefore they are highly relevant for studies of the intrinsic structure of hadrons. A detailed comparison of fits with and without LHC data has been discussed in sec. 5.
The extracted TMDPDF shows a non-trivial x-dependence that is not dictated only by the collinear asymptotic limit of PDFs. In particular, we find that the unpolarized TMDPDF is bigger (in impact parameter space) at larger x, see fig. 7. This indirectly implies a smaller value of the typical parton transverse momentum k T for larger x. A similar behavior has been also observed in [12]. We also find a strong dependence on the PDF set. The PDFs play the role of a "modelindependent" input at small values of b, and largely determines the x-dependence of TMDPDF. In particular, we have used the NNPDF3.1(nnlo) set [43], since it provides the best agreement with data. We think that the reason for the better agreement with this PDF set is that it has been fitted to the modern LHC data. The fact that TMD observables are so sensitive to the collinear input can be used to put extra restrictions to PDFs. A detailed study of this possibility is left for the future.

A Efficient computation of χ 2
The evaluation of χ 2 values (4.3) involves the inversion of voluminous covariance matrix. A convenient way to compute the χ 2 relies on the Cholesky decomposition of the covariance matrix V, which is presented in this appendix.
The Cholesky decomposition can be applied for any symmetric and positive definite matrix, such as the covariance matrix V, defined in eq. (4.2). The decomposition has the form where L is a lower triangular matrix whose entries are related recursively to those of V as follows: It is then easy to see that the χ 2 can be written as Now, the vector x ≡ L −1 · y is the solution of the lower-diagonal linear system: that can be efficiently solved by forward substitution, so that: Following this procedure, one does not need to compute explicitly the inverse of the covariance matrix V, simplifying significantly the computation of the χ 2 . qT (GeV) qT (GeV) Figure 9. Ratio of theoretical and experimental points as a function of the binned di-lepton transverse momentum for the measured at ATLAS in the range 66 < Q < 116 GeV. Black lines corresponds to the values ti predicted by the theory, whereas red dashed lines corresponds toti (B.6). The experimental points (blue dots) are surrounded by a box describing their error. For this data set, the correlated systematic uncertainty is mainly given by luminocity uncertainty is ∼ 2.8% [36].

B Determining the systematic shifts
In this appendix we present the decomposition of the χ 2 -value to the uncorrelated and penalty parts with the help of the so-called "nuisance parameters". This representation is helpful for visualization of the effect of systematic uncertainties, and allows to compute the systematic shifts. Our presentation follows refs. [41,67]. In order to quantify the effect of systematic uncertainties, we write the χ 2 in terms of the socalled "nuisance parameters" λ α . It is possible to show [67] that the definition of the χ 2 in eq. (4.3) is equivalent to where s 2 i = σ 2 i,stat + σ 2 i,unc . The optimal value of the nuisance parameters can then be determined by minimizing the χ 2 with respect to them imposing that in eq. (B.1) can be interpreted as a shift caused by the correlated systematic uncertainties. As a matter of fact, defining the shifted predictions as the χ 2 reads Therefore, up to a penalty term χ 2 λ given by the sum of the square of the nuisance parameters, the χ 2 takes the form of the uncorrelated definition χ 2 D , i.e. with diagonal covariance matrix. In order to achieve a visual assessment of the agreement between data and theory, it appears natural to compare the central experimental values m i to the shifted theoretical predictions t i in units of the uncorrelated uncertainty s i . The example of comparison of shifted/unshifted data is given in fig. 9.