Collinear matching for Sivers function at next-to-leading order

We evaluate the light-cone operator product expansion for the unpolarized transverse momentum dependent (TMD) operator in the background-field technique up twist-3 inclusively. The next-to-leading order (NLO) matching coefficient for the Sivers function is derived. The method, as well as many details of the calculation, are presented.


Introduction
The exploration of the internal structure of nuclei is a fascinating task, which identifies transverse momentum dependent (TMD) distributions as one of its most powerful tools. Transverse momentum dependent factorization theorems present a consistent description of double-inclusive processes, such as Drell-Yan/Vector/Scalar boson production(DY) [1,2] and semi-inclusive deep inelastic scattering (SIDIS) [1,3,4] in the regime of small transverse momentum. Within the TMD factorization approach, the information on hadron structure is encoded in TMD parton distribution functions (TMDPDFs) and TMD fragmentation functions (TMDFFs). The presence of the transverse scale allows to resolve the internal structure of hadron with more details than collinear parton distributions. Many polarization phenomena, which are subleading in collinear factorization, are described by the leading order TMD factorization. In this work, we study the Sivers function [5,6], which describes the correlation of an unpolarized parton transverse momentum and a hadron polarization vector. The Sivers function is an essential part of the single-spin asymmetry (SSA) phenomenon. Experimentally, SSA has been measured in SIDIS at Hermes [7], COMPASS [8,9], JLab [10] and in Drell-Yan at RHIC [11][12][13]. Its measurement is planned also for the future Electron-Ion Collider (EIC) [14]. SSA has been also an object of intensive phenomenological analysis, see e.g. [15][16][17][18][19]. The resulting predictions differ substantially among these studies owing to TMD evolution [20], which shows the importance of a correct treatment of QCD perturbatively calculable parts. In the literature, there are several available calculations of the SSA in perturbative QCD. The leading order (LO) (and partially the next-to-leading order (NLO)) calculations for the SSA were performed in many works [21][22][23][24][25][26][27]. In principle, following these works it is possible to obtain the perturbative expression for Sivers function at NLO (however, different schemes are used for different parts of the calculation, see discussion in sec. 7.3). Therefore, the SSA and the Sivers function are probably one of the most reknown and intensively studied polarized TMD quantities.
Although the TMD distributions are genuine non-perturbative functions that should be extracted from data, they can be evaluated in a model-independent way in terms of collinear distributions in the limit of large-q T , or small-b in the position space.This procedure is called "matching", and typically, it serves as an initial input for the non-perturbative model of the TMD distributions, see e.g. [17,28,29]. The matching greatly increases the agreement with data [28]. From the theory side, the matching procedure consists in the selection of the leading term in the light-cone operator product expansion (OPE) for the TMD operators [30,31]. Alternatively, the matching can be obtained by taking the small-q T limit of collinear factorization [26,27], which, however, is not always possible [32].
Only a few TMD distributions of leading-dynamical twist match the twist-2 collinear distributions. These are the unpolarized, helicity and transversity TMDPDFs and TMDFFs. The matching coefficients for these distributions are known uniformly at the next-to-leading order (NLO) [1,2,31,33,34], and some are known at NNLO [30,35,36]. The remaining TMD distributions match twist-3 collinear distributions (apart of the pretzelosity which is apparently of twist-4 [36,37]). The knowledge of the matching for these distributions is very poor: the quark TMDPDFs are all known at LO [21,22,25,38,39], and only Sivers function is known at NLO [26,27] (however, see discussion in sec. 7.2). The matching for some of quark TMDFFs, such as Collins function, is known at LO [38]. The matching for the majority of gluon TMD distributions is unknown.
The importance of the computation of the perturbative part of a TMD distribution in order to meet an agreement between theory and experiment has been shown already in [28] for the unpolarized case. Depending on the experimental conditions, the measured data can be sensitive to various aspects of the theory such as power corrections in the evolution [40], power correction at small-x [41], small-x effects in the evolution [42], and many others. The full control of all of these sources of non-perturbative physics requires an accurate setting of the perturbative scales, as provided, for instance, by the ζ-prescription of [43].
In this work, we perform a complete NLO computation of the Sivers function starting from it operator definition and performing a light-cone OPE in background field [44]. To our best knowledge, this approach is used for the description of TMD operator for the first time, despite the fact that it is a standard tool in higher twist calculus, see e.g. [45,46]. This technique grants an unprecedented control of the operator structure and it allows a very general treatment for twist-3 distributions. Therefore, the result obtained in this work is also interesting for a broader study. For the first time, we demonstrate how the TMD renormalization (ultraviolet and rapidity renormalization [47]) is organized at the operator level. We also articulate the role of the gauge links and their direction, and show (at the level of operators) the famous sing-change in-between DY and SIDIS definitions of the Sivers function [48]. Motivated by these considerations, we provide a detailed and pedagogical explanation of the calculation method, which is a major target of this article. For that aim, the Sivers function represents an ideal case, because one can cross-check the calculation with other methods already used in the literature. We anticipate that our results agree with the results present in the literature only partially, however, the origin of the discrepancy is clear.
The article is organized as following. Sec. 2 is a general introduction to SSA in the TMD factorization approach. Here we collect the expressions for SSA structure functions, and describe the role of Sivers function and its collinear matching. In sec. 3.1 we introduce and describe in detail the operator that defines Sivers function. Its renormalization properties are discussed in sec. 3.2. Sec. 3.2 is devoted to the detailed derivation of OPE at LO. We discuss separately the evaluation in regular (sec. 4.1) and light-cone (sec. 4.2) gauges. The NLO evaluation is presented in sec. 5. We make a pedagogical introduction to the background field method in sec. 5.1-5.2. The details on the NLO evaluation of diagrams are given in sec. 5.3. In sec. 5.4-5.5 we discuss the appearance of rapidity divergences and their renormalization. The difference in the evaluation of DY and SIDIS operators is discussed in sec. 5.6. The extra details on the calculation are given in appendices B, where we present a step-by-step calculation of a diagram, and C.1, where we give the diagram-by-diagram expressions for OPE. The collinear distributions are defined in sec. 6. Additional details of the parametrization definition are given in appendix. A. The transition from operators to distributions is discussed in sec. 7.1, and the collection of diagram-by-diagram expressions can be found in appendix C.2. The final result of calculation is given in sec. 7.2. The discussion and comparison with earlier calculations is given in 7.3.

Sivers effect and TMD factorization
TMD distributions are defined by a large set of parameters: collinear momentum fraction x, transverse distance b (or transverse momentum p T ), polarization, parton flavor f , the type of hadron h, ultraviolet and rapidity renormalization scales (µ and ζ), and the defining process (DY or DIS). An explicit designation of all these parameters would lead to a heavy notation such as f ⊥ 1T,q←h;DY (x, b; µ, ζ), which should be read as the Sivers function for a quark q with momentum faction x at the transverse parameter b produced by hadron h in the DY kinematics, measured at scales µ and ζ. Most of this information is not needed in perturbative calculations and in the following we skip the unnecessary parts of the notation, e.g. the renormalization scales are usually dropped. We also distinguish the momentum and coordinate space TMD distributions only by their arguments. In the rest of this section we show how the Sivers function arises in SIDIS and DY cross sections.

Sivers function in SIDIS
The semi-inclusive deep inelastic scattering (SIDIS) is a common name for a set of processes where l(l ′ ) is a lepton, N is a nucleon target, and h is the produced hadron. The TMD factorization is applicable in the regime |P h | ≪ Q, where Q 2 = (l − l ′ ) 2 is a hard scale of the scattering, P h is the transverse component of the momentum P h . In the following, we use the bold fond notation for the transverse components of vectors.
In the case of unpolarized lepton beam, unpolarized produced hadron h and a transversely polarized target N , the cross-section for SIDIS contains three structures. The so-called Sivers effect (proportional to sin(φ h − φ s )), Collins effect (proportional to sin(φ h + φ s )) and the sin(3φ h − φ s ) asymmetry. The structure functions corresponding to these effects within TMD factorization can be found e.g. in [4,15,49]. The structure function for the Sivers effect is denoted by F where the variables x and z are the momentum fractions of partons and M is the hadron mass. The functions D 1 and f ⊥ 1T are unpolarized and Sivers TMD distributions. The factorization scale µ is typically chosen to be of order Q. The scales of soft exchanges (rapidity factorization) ζ 1,2 satisfy The TMD factorization is naturally formulated in position space, where the Fourier convolution in eq. (2.2) turns into a product of functions. In position space the structure function reads The functions D 1 and f ⊥ 1T depend only on the length of the vector b but not on its direction, and one can also simplify the angular dependence [39,50] where J 1 is the Bessel function of the first kind. The equation (2.4) is the usual starting point for the parametrization of the Sivers effect in TMD factorization.

Sivers function in DY
The Sivers effect also appears in the Drell-Yan/vector boson production process where one of the initial hadrons is polarized [49,[51][52][53]. In general one refers to structure functions F 1 UT when the hadron h a is polarized and F 1 T U when the hadron h b is polarized. The structure function F 1 T U in TMD factorization (i.e. for q T ≪ Q) reads [54] where Q 2 = (l + l ′ ) 2 is the hard scale of the process, x a,b are momentum fractions of partons, q T is the transverse component of q = l + l ′ relative to the scattering plane and f 1 is the unpolarized TMD distribution. The factorization scales are defined similarly to the SIDIS case, i.e. µ ∼ Q and ζ 1 ζ 2 = Q 4 . The transformation of the structure function under interchange of the polarized hadron The structure functions can be also written in the form where we have integrated out the angular dependence. The Sivers functions in SIDIS, eq. (2.2) and DY, eq. (2.6), have different labels that specify the processes. These functions have different operator definitions (see sec. 3.1). However, de facto, the process-dependence reduces to a simple sign change [21,48,55,56] In the following, we demonstrate the origin of the sign-change at the level of OPE.

TMD evolution and operator product power expansion
The practical application of TMD factorization relies on the concept of TMD evolution, which allows to relate structure functions at different values of Q. Here, we should stress that a TMD distribution is an involved non-perturbative function. In fact, in addition to the non-perturbative structure of TMD distribution (which involves the dependence on the variables (x, b)), the TMD factorization also contains a non-perturbative part of the evolution factor (which depends only on b). An efficient implementation of the TMD approach should be able to disentangle these nonperturbative contributions. The parametrization and extraction of three non-perturbative functions (two TMD distributions and the evolution kernel) of two variables would be a hopeless task if the TMD factorization would not allow us to separate the problem into pieces. First of all, the TMD evolution is regulated by two scales (µ, ζ) and it is process independent. It factors out the non-perturbative evolution effects into an evolution factor which is strictly universal for all structure functions and for all TMD factorizable processes. Nonetheless, the TMD evolution still non-trivially affects the (x, b) dependence of the distribution which should be modeled as a function of two variables. To simplify this procedure one can use any available information that restricts the functional form of the TMD. In particular, at small values of b a TMD distribution can be related to collinear distributions in a model-independent way in perturbation theory. Such a relation has the general form provided by OPE (2.10) where C i are perturbatively calculable Wilson coefficient functions which depend on b only logarithmically, f i are collinear distributions of increasing twist and ⊗ is an integral convolution in the variable x. This expansion is valid only in a certain range of b, say |b| < R, where R is some matching scale. For values of b larger than R TMD distribution is completely non-perturbative. In fact, as the value of b gets closer to R, the contribution of higher order terms in the small-b expansion becomes more important. However, our knowledge of the corresponding higher-twist distributions is very limited. Thus, it is of practical convenience to use only the first term of the small-b expansion in eq. (2.10) and replace the rest by a generic non-perturbative function, i.e. (2.11) The practical success of such an ansatz can be easily understood if we notice that the main contribution to the Fourier integrals in eqs. (2.4, 2.8) comes from the small-b region. Therefore, we can expect that the function f N P has a simple behavior in x and b, which is indeed confirmed by phenomenological applications of this formula. The details of the modeling procedure which is based on eq. (2.11) are different in different approaches, but the core picture described here remains unchanged.
The small-b matching is an essential part of the modern TMD phenomenology. In ref. [28] a comparison of different orders of the matching to experimental results has been performed. It has been shown that the NLO matching is essential for the predictive power of the approach. The NNLO matching provides further improvements and it can be necessary for the description of the most precise experiments.
The achievable precision can also be affected by the choice of scales in the matching. Let us also mention that in [43] the authors have proved the possibility to disentangle the procedure of small-b matching and TMD evolution using the ζ-prescription which is not entirely possible in other formulations. The ζ-prescription allows using different perturbative orders for TMD evolution and small-b matching. This means that the modeling of the TMD through eq. (2.11) is completely separated from the evolution part of the TMD (that is, the scale choice does not mix up non-perturbative pieces of different origin). This fact results to be extremely useful for phenomenology since it allows to use the highest allowed expressing for evolution [57] together with polarized observables where high orders of the perturbation theory are unknown. The universal non-perturbative part of evolution could be extracted from the precisest data (such as Z-boson production at LHC) [58].
Let us conclude this section recalling that the hard coefficient functions H DIS and H DY within TMD factorization are given by the quark form factor evaluated in the different analytical regions. At the NLO they differ only by a π 2 -term, where l Q 2 = ln(µ 2 /Q 2 ), and a s = g 2 /(4π) 2 . The NNLO and NNNLO expression can be found in [59].

Operator definitions for unpolarized and Sivers TMD distributions
In this section, we introduce and review the main properties of TMD distributions.

Definition of TMD distributions
Through the article we use the standard notation for the light-cone decomposition of a vector where The vectors n andn are light-like Their particular definition is related to the factorization frame of the scattering process. The transverse part (with respect to vectors n andn) of the metric and Levi-Civita tensors are where ǫ µνρσ is in the Bjorken convention (ǫ 0123 = −ǫ 0123 = 1). In four dimensions (with n andn localized in the plane (0, 3)) both tensors have only two non-zero components, g 11 T = g 22 T = −1, and ǫ 12 T = −ǫ 21 T = 1. Since the transverse subspace is Euclidian, the scalar product of transverse vectors is negative, v 2 T < 0. In the following, we adopt the bold font notation to designate the Euclidian scalar product of transverse vectors, i.e. b 2 = −b 2 > 0, when it is convenient.
Using this notation, the transverse momentum dependent parton distribution functions (TMD-PDFs) for unpolarized quark are defined by the matrix element [1,2,60] where [a, b] are Wilson lines defined in eq. (4.2). The notation ±∞n indicates different cases of TMD distributions, which appear in different processes. The TMD distributions that appear in SIDIS have Wilson lines pointing to +∞n, while in Drell-Yan they point to −∞n as in fig. 1. The Wilson lines within the TMD operator are along the light-like direction n. The matrix element in eq. (3.4) for the polarized hadron is parametrized by two independent functions [39,50] where M is the mass of the hadron, and s T is the transverse part of the hadron spin-vector S, i.e. s µ T = g µν T S ν . The function f 1 is the unpolarized TMDPDF, which measures the unpolarized quark distribution in an unpolarized hadron. The function f ⊥ 1T is known as the Sivers function, which measures the unpolarized quark distribution in a polarized hadron.
The parametrization of eq. (3.5) is given in position space. The distributions in momentum space are defined in the usual manner where the scalar product (bp) is Euclidian. Correspondingly, the momentum space parameterization reads [4,61] Φ Some explicit relations among particular TMDPDFs can be found in the appendix of ref. [39]. These relations are used to relate structure functions in momentum and coordinate representations in sec. 2.
The anti-quark TMD distribution is defined as (3.8) Using charge-conjugation, one can relate the quark and anti-quark TMD distributions [60], from which it follows Therefore, in the following we associate the anti-quark distributions with the negative values of x and we define the TMD distributions in the range −1 < x < 1 as The small-b expansion (often called small-b matching or collinear matching) presents a TMD distributions as a series of collinear distributions and Wilson coefficients in the vicinity of b = 0 as in eq. (2.10). For instance, the leading term of the small-b expansion for unpolarized TMD is expressed by the (unpolarized) collinear PDF f 1 (x) where the sum index f indicates gluons, quarks, and antiquarks of all flavors. The coefficient function C is the perturbative Wilson coefficient, which depends on b logarithmically. Its leading term is δ(1−y) and the perturbative corrections are known up to NNLO [35]. The power corrections (as in eq. (2.10)) contain collinear distributions of twist-2 and twist-4, and they are currently unknown.
The expression for the small-b matching of the Sivers function is where T are the collinear distributions of twist-3, to be defined in secs. 6.1, 6.2. The symbol ⊗ denotes an integral convolution in the variables x 1,2,3 . At leading order the expression for the coefficient function is known to be ±πδ( [21,22,25,39] (and we also re-derive it in the next section). The status of the NLO expressions is cumbersome. In principle, the quark-to-quark part can be found in [26], where it has been extracted from computation of the cross-section made in [22][23][24]. However, the computations made in [22][23][24] miss certain parts, and for this reason they are partially incorrect (see extended discussion in [62]). The quark-to-gluon part is evaluated in [27], however, the authors use a scheme which is different from the standard one for twist-2 computations. We return to this discussion in sec. 7.2.

Evolution and renormalization
The renormalized TMD, unlike usual parton distributions, depend on a pair of scales. This is a consequence of the TMD factorization procedure, which decouples the hard scattering factorization and the factorization of the soft-gluon exchanges [1,47,63,64]. As a result the evolution of TMD is given by a pair of equations where γ F and D are respectively the ultraviolet (UV) and rapidity anomalous dimensions. Eq. (3.16-3.17) are independent of polarization and TMD structure. The double-scale nature of factorization and evolution opens also unique possibilities for the phenomenological implementation of TMD. In particular, it allows a universal scale-independent definition of a TMD distribution [43]. At the operator level the double-scale nature of evolution is reflected by the presence of two types of divergences, namely UV and rapidity divergences. Both divergences are to be renormalized. The UV renormalization factor is known as TMD-renormalization factor Z TMD f , and it can be extracted from the UV renormalization of quark (or gluon) vertex attached to the (light-like) Wilson line. The rapidity renormalization is made through the rapidity renormalization factor R f (for the proof of multiplicativity of rapidity divergence renormalization see [47]). It is important that both renormalizations are made at the level of operator, and thus do not depend on the hadron states. The renormalized TMD operators U f that defines the physical TMD distribution, reads here Z i is the renormalization of the field wave functions (Z 2 for the quark field, and Z 3 for the gluon field). The TMD operators U relevant for this work are defined later in eq. (4.1, 4.3). Both renormalizations are scheme dependent. We use the conventional MS-scheme together with the dimensional regularization for the UV divergences. For the rapidity renormalization we use the conventional scheme [1,2,65] that is fixed by the requirement that no remnants of the soft factor contribute to the hard scattering [47]. In this scheme the rapidity renormalization factor is given by the inverse square root of the TMD soft factor R = 1/ √ S. The particular expression depends on the order of application of the renormalization factors. In this work, we fix the order as in eq. (3.18) and we use the δ-regularization, whose definition is given in sec. 5.4. Then the rapidity renormalization factor reads [66] where B = b 2 /4e −2γE , and a s = g 2 /(4π) 2 . The UV renormalization constant is [30] Here, we list only the renormalization constants for quark operators at one-loop, since they are the only required in the present calculation. The gluon case, as well as, two-loop expressions can be found in [30].

Light-cone OPE at leading order
In this section we present the operators that enter in the definition of the Sivers function and their LO limit for small-b, recovering the results of [39]. The notation for operators established in this section is the one used in the NLO computation.

Light-cone OPE in a regular gauge
Let us denote the operator that defines the TMD distributions in DY case as where the Wilson lines are defined as The operator that defines the TMD distributions in the SIDIS case reads Generally, the links which connect the end points of Wilson lines at a distant transverse plane must be added in both operators (for DY and for SIDIS) [67,68]. Here, we omit them for simplicity, assuming that some regular gauge (e.g. covariant gauge) is in use. In non-singular gauges the field nullifies at infinities, A µ (±∞n) = 0, and the contribution of distant gauge links vanishes. The case of singular gauges is discussed in the following section. The operators in eq. (4.1, 4.3) are defined for arbitrary light cone positions z 1 and z 2 , although the definition of a TMD distribution depends only on the difference of these points. Such a generalization does not complicate the calculation, moreover, it allows to cross-check certain results. Note, that the operators in eq. (4.1, 4.3) define the generalized transverse momentum distributions (GTMDs), and thus the obtained OPE can be applied for generalized TMD (GTMD) kinematics as well.
It is straightforward to check that the spatial separations between any pair of fields in the operators defined in eq. (4.1, 4.3) are space-like. For that reason we can replace the T -andTorderings by a single T -ordering. This significantly simplifies the calculation and in the following we do not explicitly show the symbol of T-ordering, but we suppose that each operator is T-ordered. The possibility to reorder the fields is not a general feature, e.g. TMD operators for fragmentation functions do not allow this simplification and thus, their consideration and properties are drastically different.
At LO in the perturbation theory one can treat the fields as classical fields, i.e. omit their interaction properties. In this approximation, the small-b expansion is just the Taylor expansion at b = 0. Expanding U in b up to linear terms we obtain The leading term is the same for DY and SIDIS cases Note that the half-infinite segments of Wilson lines compensate each other due to the unitarity of the Wilson line, and the resulting operator is spatially compact. The derivative term in eq. (4.4) is different for different kinematics Here, the derivative prevents the compensation of infinite segments of Wilson lines. Acting by derivative explicitly we obtain where the covariant derivative and the field-strength tensor are defined as usual (4.10) The operators which contribute to each order of the small-b expansion have different geometrical twists 1 . In particular, the first term in eq. (4.8) is a mixture of twist-2 and twist-3 operators, while the second term is a pure twist-3 operator (the same for eq. (4.9)). The procedure of separation of different twist contributions is explained in details in [39]. In the present paper, we skip this discussion because the Sivers function contains only contribution of geometrical twist-3 operator. Indeed, comparing the results for DY in eq. (4.8) and SIDIS in eq. (4.9) kinematics we observe that the first terms are the same, while the last terms differ. Therefore, already at this stage it is clear that the Sivers function is made of the operators from the last terms, i.e. pure twist-3 operator.

Light-cone OPE in the light-cone gauge
Before entering a detailed description of the background field method it is convenient to formulate the derivation of the small-b limit of the TMD functions at LO in the light-cone gauge. This gauge will then be used in the following to describe the background fields.
The definition of TMD operators is gauge invariant. In order to demonstrate this explicitly, let us restore the formal structure of gauge links in eq. (4.1, 4.3). We have Notice, that in order to write eq. (4.11, 4.12) we have explicitly used the fact that the T-ordering can be removed. In the absence of such assumption the finite distance transverse link must be replaced by two half-infinite links [67]. The light-cone gauge is defined by the condition The application of this condition removes the contribution of gauge links along vector n in the TMD operator, i.e. [zn + b, ±∞n However, the status of the transverse gauge links is unresolved. This reflects the known fact that the gauge fixing condition (4.13) does not fix the gauge dependence entirely but should be supplemented by an additional boundary condition. There are two convenient choices for boundary conditions in our case 2 Clearly, each of these boundary conditions is advantageous in some particular kinematics. As so, we apply the retarded boundary condition for the DY operator. That is, the transverse link at −∞n vanishes, Whereas for the SIDIS operator we apply the advanced boundary condition. That is, the transverse link at +∞n vanishes, Thus, the operators have the same expression in different gauges. In order to recover the structure of gauge links (and hence to obtain the explicitly gauge-invariant operators), we can make a gauge transformation of the operator, and subsequently replace each gauge-rotation factor by a Wilson line along the vector n to the selected boundary. The OPE in the light-cone gauge has a compact form. The leading term of eq. (4.4) is The expression for the derivative of the operator is also independent of the underlying kinematics (compare to eq. (4.6, 4.7)) 19) and in fact, it already gives the final expression of the correction linear in b in the light-cone gauge. Let us show how the results for LO OPE in eq. (4.8, 4.9) are recovered starting from eq. (4.19). One starts rewriting eq. (4.19) explicitly in a gauge-invariant form. With this purpose we replace the partial derivatives in eq. (4.19) with covariant derivatives, see eq. (4.10), by adding (and subtracting) appropriate gluon fields To proceed further, we have to recall the used boundary condition in the form in the advanced light-cone gauge, (4.22) where x is an arbitrary point. Substituting these expressions into eq. (4.20) we arrive to eq. (4.8, 4.9).

Light-cone OPE for the gluon TMD operator
The quark operators described in the previous sections mix with the respective gluon operators, which have a similar expansion at small-b. Since this expansion for gluons has never been considered in the literature we briefly describe it here. We define the gluon TMD operator as (compare to eq. (4.1, 4.3)) where the Wilson lines are in the adjoint representation, i.e. the contraction of the color indices 3 is . The parametrization of the corresponding TMD matrix elements can be found e.g. in [34]. The evaluation of the light-cone OPE for gluon operators is totally analogous to the one made in sec. 4.1. The only difference is that the quark fields are replaced by F +µ , and the covariant derivatives act in the adjoint representation. We obtain the following analog of eq. (4.8, 4.9) where the covariant derivatives are in the adjoint representation. Alike the quark case, the only operators which contribute to the Sivers function are given in the second lines of these equations.

Light-cone OPE at next-to-leading order
The object of this section is to introduce the calculation of OPE for U up to terms linear in b at NLO in perturbation theory. The OPE is realizable when b 2 ≪ Λ −2 and it looks like where C are the coefficient functions which depend on b 2 logarithmically, n enumerates all available operators at this order, and ⊗ is some integral convolution in variables z. Here, we also introduce the notation for the coupling constant a s = g 2 /(4π) 2 , and for the logarithm combination that typically enters in perturbative calculations The variable µ represents the scale of OPE. The complexity of the computations for OPE increases drastically passing from LO to NLO in perturbative QCD. In the latter case one cannot omit the field interactions, as it happens in ordinary Taylor expansion as in eq. (4.4). The propagation of fields between different points is responsible of the fact that eq. (4.4) is to be modified in the presence of interactions which can change flavor or pick up additional fields from the vacuum. Moreover, the OPE with interacting fields contains all possible operators with correct (as prescribed by the theory) quantum numbers.
An additional difficulty in the present calculation is that only a few computing methods have been tested on higher twist operators. For the twist-2 TMD operators the matching procedure works more easily because OPE is in a one-to-one relation with the matrix elements over freestates. In the case of higher twists instead the relation between matrix elements and operators becomes more difficult due to increased number of operators and non-trivial mixing. An additional difficulty is represented by the fact that the higher-twist operators can mix with each other via the QCD equations of motion. The better method to evaluate the coefficient functions at higher twists is the background-field method, which is very effective for the higher twist calculations. In the following we concentrate on this method, for which we provide a brief general introduction in sec. 5.1. The details of the calculation are given in sec. 5.2-5.3. The treatment of rapidity divergences and renormalization needs a special discussion which is provided in sec. 5.4-5.5. All the computation is done for the DY case, but the passage to the SIDIS case does not present particular difficulties and the comparison of the two cases is provided in sec. 5.6.

OPE in background field method
The background-field method is founded on the idea of mode separation. The operator matrix element between states S 1 and S 2 is defined as where the letter Φ represents any QCD field {q, q, A µ }, Ψ S is the wave function of the state S, and S is the action of QCD. Let us split the fields into the "fast" and "slow" (or "short-correlated" and "long-correlated" in position space terminology) components, as Here, the "fast" modes φ have momentum p > µ, while "slow" modes have momentum p < µ. The (factorization) scale µ is not explicitly defined but it is large enough to guarantee the convergence of the perturbative series. In the following we omit the argument µ for the fields. We postulate that physical states (hadrons) are built from the "slow" components, i.e. Ψ S [Φ] = Ψ S (ϕ) so that eq. (5.3) turns into In this expression the integral over "fast" components can be evaluated, and the expression for observables has the following effective form The mode separation then assumes that the "slow" fields can be treated as free-fields on distances x 2 . This hypothesis is typical for effective field theories (see for instance [70][71][72] for the application of similar concepts in soft collinear effective theory (SCET) or [46] for TMD factorization at small-x). One can interpret the construction (5.7) as an evaluation of the perturbative QCD fields in a general parton background, which gives the method its name. After the integration of the "fast" fields in eq. (5.7), the resulting effective operator is then expanded using free-theory twist expansion, as it was done in sec. 4. It is important to realize that in background calculation the result is gauge-invariant and satisfies QCD equations of motion at each step of the evaluation (even for each diagram). The result then is also universal, that is, it is valid for all states (we do not even specify them), and thus, we can operate only with fields ϕ. Essentially, the background field methods is concentrated in a single definition, eq. (5.7).
The background field method is an essential tool of the modern small-x calculations. In this case the separation of kinematic modes is based on the strong ordering in rapidity, which is a distinctive feature of the small-x kinematics. To define different modes one has to introduces a rapidity cutoff parameter σ, which separates "fast" (p + < σ) and "slow" (p + > σ) fields based on the value of the longitudinal component of the momenta p + . Instead of the twist expansion the calculation of the functional integral over "fast" fields (5.7) is now performed in the so-called shockwave approximation. Since the procedure of separation of modes is quite general, the method can incorporate different kinematic regimes, which has been recently employed in [42,46].

QCD in background field
The QCD Lagrangian reads where the covariant derivative and F µν are defined in eq. (4.10). Following the mode separation we split the fields as A µ → A µ + B µ and q → q + ψ, where ψ and B µ are "fast" fields, and q and A µ are "slow" (background) fields. The separation of modes in the main body of the Lagrangian is straightforward, but the gauge fixing term should be considered with caution. The ultimately convenient point of the background field method is the possibility to choose different classes of gauge fixing for different modes. The detailed discussion on gauge fixing in QCD with background method is given in [73,74].
(1) Figure 2. Example of diagrams that vanish in our scheme of calculation. Diagrams (1) and (2) vanish due to A+ = 0. Diagram (3) is proportional to 1 − α and vanish at α = 1. Diagrams (4) and (5)  We choose the most convenient combination of gauges for our task. For "fast" components we use the background-field gauge, which is the analog of covariant gauge fixing in the usual QCD perturbation theory. In particular, the propagator has the familiar form where α is a free parameter. For background fields we use light-cone gauge eq. (4.13) with retarded boundary condition eq. (4.21) for DY operators, and advanced boundary condition eq. (4.15) for SIDIS operators.
In background field formulation, the Lagrangian of QCD splits into three parts where the first two terms are usual QCD Lagrangians built for particular modes, and the last term is the "fast-slow" modes interaction, where δL ABB (δL ABBB ) is the interaction of a single field A µ with two (three) fields B µ , and δL AABB is the interaction of two fields A µ with two fields B µ . These terms depend on the gauge fixing condition. For our calculation we need only the δL ABB interaction. It reads The rest of the terms can be found in [73]. In the following, we consider the case α = 1, which corresponds to the "Feynman gauge version" of the background gauge.

Evaluation of diagrams
We would like to evaluate the effective operator in eq. (5.7) up to twist-3 corrections, at a s order. The computation proceeds expanding the interaction part of the exponent in eq. (5.7), and integrating the "fast" modes by the Gaussian integration formula. i.e we obtain the Feynman diagrams with background fields as the external sources. The divergences of loop-integrals are regularized as in [30,35,66], which allows us to use the TMD renormalization factor as in eq. (3.19, 3.20) [30].
• The UV and collinear divergences are regularized by the dimensional regularization with d = 4−2ǫ. We use the conventional MS scheme with (e −γE /4π) ǫ factor for each a s = g 2 /(4π) 2 .
Within this scheme many diagrams vanish. Some examples of null diagrams are shownin fig. 2. (i) and more specifically we have the following cases of vanishing diagrams: (i) The diagrams with the background field coupled directly, or through a sub-graph, to the Wilson lines, such as diagrams diagrams (1) and (2)  The rest of contributions are conveniently ordered with respect to the number of background fields. Since the number of fields in the operator is less or equal to the twist of the operator, only the diagrams with two or three background fields contribute at a specific power of OPE. There are 6 non-vanishing diagrams at this order (4 of them have charge conjugated diagrams). The diagrams with two quark fields are shown in fig. 3. The diagrams with two quark and gluon fields are shown in fig. 4. There are also diagrams (with two and three field) that mix the quark operator with the gluon operator, as in fig. 5. In principle, there could be also diagrams with more gluon insertions, which are to be combined with a single gluon insertion into a gauge invariant combination F µν (with both transverse indices). However, we recall that only F µ+ contributes to operators of twist-3, and in the light-cone gauge F µ+ = −∂ + A µ . Thus, such diagrams should not be considered at twist-3 accuracy.
The process of diagrams computation is almost elementary. Let us show here the evaluation of the simplest diagram, diagram A. A similar evaluation (with the only difference in the path of Wilson lines) is presented in [45], which allows an instructive comparison. Also, in ref. [75] the diagram A (and the diagram B) has been calculated in momentum space for all values of b, which allows to match the scheme factors. Importantly, the diagram A plays a special role in TMD physics, since it is the only diagram which has rapidity divergences as discussed in the next section.
In appendix B we also present a detailed explanation of the computation technique for one of the most difficult diagrams (diagram E). The diagram A comes from the following contraction of fields in eq. (5.7) where the factor in the square brackets is part of the Wilson line, and the factor in the round brackets is part of δL (see eq. (5.12)). Note, that here we consider the DY operator, which dictates the integration limits over σ. The propagators in dimensional regularization (with d = 4 − 2ǫ) are where the gluon propagator is taken with α = 1. Explicitly, the diagram reads where we have simplified gamma-and color-algebra.
In order to evaluate the integral over y, we recall that the background field is a classical field, and the expressions of the form eq. (5.18) should be understood as a generating function for the whole tower of twist-operators. Therefore, we are allowed make the twist-expansion under the sign of the loop-integration. In the considered case, we make the Taylor expansion at y µ = 0, q(y + x) = (1 + y µ ∂ µ + y µ y ν /2 ∂ µ ∂ ν + ...)q(x). The loop-integration can be taken for each term in the series. The necessary loop-integral reads where g s is a completely symmetric composition of metric tensors. For an odd number of indices the loop-integral is zero. Metric tensors produced by loop-integration can contract derivatives, vectors b µ and n µ . Each term in the series should be sorted with respect to its twist. The thumb rule is that each transverse derivative increases the twist of an operator, but the light-cone derivative does not. Thus, the higher derivative term could be dropped. Alternatively, one can count the power of the vector b. In our current calculation, we evaluate up to terms linear in b. Note, that strictly speaking we should also expand fields in the powers of b, but it does not affect the diagram evaluation and can be postponed until later stage.
The expression in eq. (5.18) has a very simple numerator, which is linear in y. So, only odd terms of Taylor series contribute. Moreover, already the second term in the expansion, the one with three derivatives ∼ y µ y ν y ρ ∂ µ ∂ ν ∂ ρ q/3!, vanishes after contraction. Indeed, it generates ∂ + ∂ 2 q, that is at least twist-4 (on top, this contributions is proportional to b 2 ). Therefore, we consider only the single-derivative term of the series and obtain where B = b 2 > 0. Charge-conjugated diagrams can be evaluated independently, or obtained from the direct diagrams by reversing the order of field arguments and with the replacement z 1 ↔ z 2 . I.e. the diagram A * reads These expressions contain rapidity divergences, which are discussed in the next section. All other diagrams are evaluated similarly. The expression for the diagram A in SIDIS kinematics is almost identical to DY case. The only modification is the lower limit for integration over σ iin eq. (5.14), which must be changed to (+∞) for the SIDIS case. Such a replacement does not affect the evaluation of the diagram and thus the analog of eq. (5.21) in the SIDIS kinematics is obtained replacing (−∞) by (+∞).

Treatment of rapidity divergences
The rapidity divergences appear due to the localization of a gluon field in the transverse plane at the light-cone infinity [47]. There are three diagrams that have interactions with a Wilson line, and thus, that are potentially rapidity divergent. These are diagrams A, C and E. However, according to the general counting rule [47], only the diagram A is rapidity divergent. In this section, we demonstrate how rapidity divergences arise in background field calculation.
The fact that diagram A is rapidity divergent is well-known. It has been calculated in numerous works, see e.g. the discussions in ref. [1,2,31,35,75]. In all these works, the diagrams have been calculated in momentum space, where the loop-integral is explicitly divergent. In our case the loopintegral in the diagram A has been evaluated without any problems, however, as we demonstrate shortly, the result of the integral in eq. (5.21) is ambiguous, and the resolution of this ambiguity gives rise to the rapidity divergence.
The ambiguity in diagram A is hidden in the argument of the quark field. Indeed, its value at point (α, σ) = (0, −∞) depends on the path used to approach this point. In particular, we find and the integration over σ and α does not commute in the vicinity of (0, −∞).
In order to resolve the ambiguity, the dependence on α and σ should be separated. Let us rewrite eq. (5.21) as where we set b in the arguments of the fields to 0, for demonstration purposes (the presence of b in the argument does not change the procedure of rapidity divergence elaboration and we restore it at the end of the section). In eq. (5.25) the ambiguity at (0, −∞) is enforced by the divergence of the integrand at α → 0. We isolate the ambiguous part of the diagram splitting the integration into two parts The regular part does not contain the problematic point, and thus the order of integration is irrelevant. Taking the integral over σ by parts, we obtain This expression is regular at α → 0 since z α=0 21 = z 2 , and it is a position representation form of the well-known "plus"-distribution.
To evaluate the singular part we introduce a regulator. Here, we use the δ-regularization, which consists in the following modification of the Wilson line where δ > 0. Such modification breaks gauge invariance by power corrections, and therefore, only the limit δ → 0 is gauge invariant. For the detailed discussion of this issue we refer to [66]. In the δ-regularization the integrand of eq. (5.28) receives a factor e σδ . With such a factor the ambiguity is resolved because the integrand is zero at σ → −∞ irrespectively of the value of α. In order to evaluate it, we make the change of variable τ = α(σ − z 2 ) and we obtain The integral over α can be taken explicitly and in the limit δ → 0 we obtain The logarithm of δ represents the rapidity singularity. The integral over τ in eq. (5.31) is then taken by parts, where the parameter ν + is built from γ E and z 2 . The parameter ν + is an infrared scheme-dependent parameter, which is the parameter of the rapidity renormalization procedure [47]. It is an analog of the familiar scale µ in the UV renormalization. In order to match our calculation to the commonly used scheme of rapidity renormalization, we compare eq. (5.33) with the calculations made in momentum space, such as in [35,75]. From this comparison it follows that ν + = p + with p + being the hadron momentum. However, the operator does not know the value of the hadron momentum, and therefore, we leave ν + as it is. The same method can be used when the position of fields is shifted by b. The result for the diagrams A can be written in the form where we have added a total shift ∼ αb to the first operators, to make the expression more compact. Including such a shift does not affect the expression for the TMD distribution, since it is proportional to the difference between the momenta of initial and final states. Notice that while in TMD distributions this difference is null, it is not the case for generalized TMD distributions (GTMD).

Renomalization
Performing the evaluation of all the other diagrams in a similar manner (see an explicit example for diagram E in the appendix B), we get the OPE for the bare TMD operator, which schematically can be written as where the indices i enumerate all operators that enter the expression and ⊗ is some integral convolution in the light cone positions variables z. Here, the coefficientsC depend on ǫ, δ and light-cone positions z 1,2 , the dependence b is concentrated entirely in the factors B ǫ . The explicit form of each term in eq. (5.36) is rather lengthy. We present it diagram-by-diagram (since there is practically no simplification in the diagram sum) in appendix C. The bare OPE eq. (5.36) requires renormalization as in eq. (3.18), i.e. both sides of eq. (5.36) are to be multiplied by Z −1 2 Z T MD q R q (b), whose LO expressions are given in eqs. (3.19) and (3.20). We recall that this renormalization is universal, in the sense that, it is common for all terms of the smallb expansion and for various Lorentz structures of TMD operator. An example of this universality is already provided by the diagram A, discussed in the previous section. Indeed, according to eqs. (5.34, 5.35) the rapidity divergence enters the expression multiplying the bare TMD operator U(z 1 , z 2 ; b). In other words, we can extract the rapidity divergent terms from eq. (5.36) and write it as

rapidity finite terms).(5.37)
Multiplying it by R q , given in eq. (3.19), the logarithm of δ cancels for all terms of the smallb expansion to all orders of ǫ. To our best knowledge this is the first explicit demonstration of rapidity divergences renormalization of TMD at higher twists. The renormalization of eq. (5.36) makes this expression finite. However, coefficientsC contain singularities in ǫ. These singularities are collinear singularities and are compensated by UV behavior of light-cone operators. To remove them explicitly we replace the bare operators on r.h.s. by the renormalized operators O bare = Z −1 ⊗ O R (µ). The factor Z −1 being convoluted with coefficient function removes the remaining poles in ǫ.
Concluding, the renormalized expression for small-b OPE has the form where the operators are renormalized at scales µ and ζ, and we have set the scale of renormalization for light-cone operators to be the same as for TMD operator for simplicity. The expression for the coefficient functions at NLO for any twist can be written as where we have used the explicit expressions for the renormalization factors, eq. (3. 19, 3.20). With this formula it is simple enough to obtain the coefficient functions for small-b OPE in the coordinate space. However, they are of little use, since in practice, one operates in terms of momentum fractions x and the corresponding collinear distributions. The transition to the distribution and the corresponding expressions are discussed in sec. 7.

Difference in the evaluation of DY and SIDIS operators
The operators for the DY and SIDIS initiated TMD distributions differ by the geometry of Wilson lines. This dependence influences the calculation in two aspects. The first one is the explicit expression for diagrams that have interaction with Wilson line, such as diagrams A, C and E. The second one is the preferred boundary conditions for the gauge fixing for the background field, the retarded for DY-type operators, eq. (4.14), and advanced one for SIDIS-type operators, eq. (4.15). Let us note, that boundary conditions do not influence the process of diagram evaluation, but rather the procedure of recompilation of the expressions in terms of gauge-invariant operators, see eq. ( Here, dots indicate various compositions of fields, functions and integrals that do not change. Such a structure is already evident at the tree level order, as one finds comparing eq. (4.8) and eq. (4.9). As we will see, in terms of distributions this difference will result into a different global sign of the coefficient functions.

Definition of collinear distributions
In order to proceed further we need to evaluate the hadronic matrix element of OPE. This procedure is scheme dependent in the following sense: We recall that our computation is made in dimensional regularization, and after the renormalization procedure the expressions are finite for ǫ → 0. Nonetheless, the finite part of the results depends on ǫ and moreover the expressions so obtained have a tensor structure which also depends on the number of dimensions. Thus, in order to completely define the scheme, we should specify the order of operations with respect to the limit ǫ → 0. There are two major options. The first one is to set ǫ → 0 at the level of operators, and define the distributions in 4 dimensions. The second one is to define distributions in d dimensions, and set ǫ → 0 after the evaluation of matrix elements. Both schemes have positive and negative sides. In fact, this problem has not been accurately addressed in the TMD-related literature. We have checked the traditional calculations of TMD matching at twist-2 [1,30,31,76], and conclude that in all these cases the second scheme is used. Therefore, to be consistent with earlier calculations, we use the second scheme. In the present work, i.e. for the Sivers function, the differences between schemes appear only in the quark-gluon mixing terms.
In the rest of this section we define the twist-2 and twist-3 matrix collinear distributions and evaluate the TMD matrix element over the small-b OPE obtained in the previous section.

Quark distributions
The forward matrix elements of the light-cone operators are parametrized by collinear distributions, or parton distribution functions (PDFs). For this work we need the forward matrix element of twist-2 and twist-3 operators only. We start discussing the required quark distributions, while the gluon distributions are treated in the next section.
There are three quark operators contributing to the OPE of the Sivers function, The operator in eq. (6.1) is twist-2, whereas the operators in eq. (6.2, 6.3) are twist-3. We emphasize that all indices appearing in eq. (6.2, 6.3) are transverse. The forward matrix element depends only on the distance between fields, but not on the absolute position. A shift of the common position can be written as a total derivative of the operator, which is a momentum transfer between initial and final states. It is the consequence of the quantummechanical definition of the momentum operator: where O is any operator. It allows to move each term of OPE to a convenient position, and to drop terms with total derivatives. Altogether it significantly simplifies the evaluation. To resolve the total derivative terms one should consider a non-forward kinematics, that defines GTMD distributions, and generalized parton distributions. In the following, we consider each operator in a convenient point. The standard unpolarized PDF comes from the forward matrix element of O γ + , The PDF is non-zero for −1 < x < 1, and where q(x) andq(x) are the quark and anti-quark parton densities in the infinite momentum frame. The definition of twist-3 PDFs is more cumbersome since they depend on two momentum fractions x i , and they have a different interpretation relative to a domain of variables. The notation simplifies considerably if one writes the twist-3 distributions as a functions of three momentum factions x 1,2,3 . Each momentum fraction is the Fourier conjugate of the corresponding coordinate z 1,2,3 . We define where M is the mass of the hadron, and the integral measure is defined as Such an integral measure automatically takes into account the independence of forward matrix element on the total shift, eq. (6.5).
The functions of three variables T (x 1 , x 2 , x 3 ) have several symmetry properties. It is natural to consider them as functions defined on the hyperplane x 1 + x 2 + x 3 = 0, since only this domain contributes to forward matrix element. The domain can be split into six regions, corresponding to different signs of the variables x i , see fig. 6. Each of these regions has a different interpretation in parton language: depending on the sign of x i the corresponding parton is either emitted (x i > 0) either absorbed by a hadron [77], as it is shown schematically in fig. 6.
The functions T and ∆T are not independent and mix under the evolution. In ref. [62] it is shown that there exist a combination of T and ∆T which evolve autonomously, but we do not use them in this work.
The definitions in eq. (6.8, 6.9) are understood in d-dimensions. That is, the vectors µ is some vector that turns intos µ = ǫ µν T s ν when ǫ → 0. The definition of the non-perturbative functions T and ∆T coincides 4 with the one made in [39]. Also it coincides (up to a factor M ) with the definition given in [62]. The articles [23][24][25][26]78] use a less convenient two-variable definition, which is related to the definition with three variables by (here we compare to [78]) Using time-reversal and hermiticity, one can show that the functions T and ∆T are real and obey the property These properties are central in the following calculation. They represent the simple statement that gluon is a neutral particle. In barycentric coordinates the time-reversal transformation turns the picture upside down as shown in fig. 7. Therefore, the function T (∆T ) is (anti)symmetric with respect to the horizontal line x 2 = 0 (given by red line in fig. 6). PDFs defined on these lines are known as Qui-Sterman distribution. They play a special role in TMD physics, since they provide the LO matching, as it shown in the next sections.

Gluon distributions
The gluon operators of twist-2 and twist-3 are (6.14) where f ABC and d ABC are symmetric and anti-symmetric structure constants of the gauge-group.
In the definitions (6.15) we have dropped the Wilson lines for simplicity 5 . The forward matrix element is parametrized by where λ is a hadron helicity, and a µν is an antisymmetric tensor such that lim ǫ→0 a µν = ǫ µν T . (6.18) The latter expression defines a scheme, but luckily in our calculation a µν enters linearly and we do not need its specification. The distributions g(x) and ∆g are conventional unpolarized and polarized gluon distributions. There is no standard parametrization for the twist-3 gluon operator. Here we introduce the parameterization that is convenient for our calculation. It is different (but equivalent) to other parameterizations used e.g. in [27,62,[78][79][80][81]. The main difference is that we use two distributions with different properties, instead of a single one. We have The overall minus sign is set in order to have a simple relation to the distributions defined in [62,78]. The foundation for this parameterization is discussed in appendix A. Despite its cumbersome appearance, this parameterization has some natural properties, that significantly simplify the calculation. Time-reversal and hermiticity imply that which reflects the fact that the gluon is a neutral particle, and thus, "anti-gluon" distribution is equal to the "gluon" one. Due to the permutation properties of the operator, the distributions are highly symmetric. Namely, the distribution G − (G + ) is (anti-)symmetric with respect to permutation of any pair of arguments The distribution Y − (Y + ) is (anti-)symmetric with respect to to permutation of x 1 and x 3 , Additionally, the distributions Y ± obey a cyclic rule The graphical representation of these transformation in barycentric coordinates is shown in fig. 7. The symmetry properties in eq. (6.20-6.23) significantly restrict the functional form of distributions. In particular, the functions G ± are entirely defined by its values in the region 0 < x 1 /2 < −x 2 < x 1 . Whereas the functions Y ± are defined by its values in the region 0 < x 1 /2 < −x 2 < 2x 1 . Graphically these relations are demonstrated in fig. 8.
The functions G and Y mix under evolution. In many aspects they are similar to the functions T and ∆T of the quark case. Nonetheless, the parametrization given here grants many simplification during calculation, because each of the structures in eq. (6.19) belongs to an irreducible representation of the Lorentz group. For that reason these structures enter the dimensionally regularized expression with different ǫ dependence.
The relation of the functions G ± and Y ± to the functions used in [62] is It is important to note that this comparison is made at ǫ = 0, because at ǫ = 0 the comparison is impossible. The inverse relation is Therefore, our basis is equivalent to a decomposition of a general 3-variable function into antisymmetric and cyclic components. The reduction of three-variable notation used here and in [62] to the two-variable notation used in [78,80,81] is the same as for quarks in eq. (6.11). In [27,79] a different notation is used, which again can be related to our functions at ǫ → 0. For a detailed comparison we refer to the discussion in [79].

Small-b expansion for unpolarized and Sivers distributions
Having at hand the parametrization of the matrix elements we can obtain the matching coefficient for TMD distributions to collinear distributions. The standard protocol to achieve this is the following. We derive the TMD distribution using the operators U (compare eq. (3.4) and eq. (4.1)), Next, we substitute the expression for OPE eq. (5.38) into the matrix element and we evaluate the Fourier transform using the parameterization for collinear matrix elements. In this way we obtain obtain the small-b expansion for the TMD distribution Φ [γ + ] . Collecting all terms with appropriate Lorentz structures, eq. (3.5), we obtain the small-b expansion for individual TMD distributions, in our case these are the unpolarized and Sivers distributions. The procedure is rather straightforward and it can be performed for each diagram independently. In sec. 7.1 we give several comments on the evaluation of it, while the final result is presented in sec. 7.2. The results for individual diagrams are presented in appendix C.2.

From operators to distributions and tree level results
The tree level order of OPE is given in eq. (4.4). Applying the transformation in eq. (7.1), and using the definitions in eq. (6.6, 6.8) we obtain 6 To evaluate the second line we use the following trick. We consider the two integrals over τ separately, and change the variables x 1,2,3 → −x 3,2,1 , τ → −τ in the second one. The integrand is invariant under such transformation, due to the property in eq. (6.12) while the limits of integration change to (−z, +∞). As a result the two integrals over τ can be combined into a single integral over τ from −∞ to +∞, Let us stress that the dependence on the intermediate gluon position τ disappears. This property holds for all diagrams, and allows to combine seemingly cumbersome expressions into simple ones. It is the result of time-reversal symmetry. Therefore, to observe such cancellation, one should collect a diagram with its conjugated. I.e. the dependence on the intermediate point cancels in combination of diagrams A and A * , C and C * , E and E * , D and D * . The rest diagrams are self-conjugated. The time-reversal symmetry is also responsible of the different relative sign in the matching of DY and SIDIS operators. Indeed, since the integrands are symmetric under time-reversal, the intermediate point cancels and the only thing that matters is a common global sign. This sign is necessarily different between DY and SIDIS expressions, due to different boundary conditions holding in two cases. In other words, all gluon fields in the DY case are tailed from −∞, and the corresponding integrals are −∞ . Whereas for SIDIS they are tailed to +∞ and corresponding integrals are +∞ = − +∞ . In this way, we observe the well-known relation i.e. the matching (Wilson coefficient) of the Sivers function has a different sign in DY and SIDIS. This observations agrees with the time-reversal property of the Sivers distribution observed a long ago [48]. Coming back to eq. (7.3), the integrals over τ and z decouple and both produce a δ-function. We obtain Using the delta-function in the definition of [dx] in eq. (6.10), the integrals over x's can be evaluated, This expression gives the leading order matching for unpolarized and Sivers TMD distributions in eq. (3.5) where + sign is for DY operator, and − sign is for SIDIS operator. The same procedure with minimal modifications could be done for each term of OPE also at higher orders. In appendix C.2, we present the expressions for each diagram at NLO and the corresponding final result is given in the next section. The T and ∆T distributions defined on the line x 2 = 0 are generally known as Efremov-Teryaev-Qui-Sterman (ETQS) distributions [82,83]. In the next section, we write explicitly the evolution equation for these functions in eq. (7.15). Here, we just remind that the ETQS functions are not autonomous, meaning that their evolution involves the values of these functions in a full domain of x 1,2,3 . However, we have found that the finite part of small-b matching involves only ETQS functions.
The line x 2 = 0 plays a special role in the matching of TMD distributions as shown in red in fig. 6. In the parton picture the distributions defined on this line can be interpreted as "gluonless". Indeed, while the quarks are normally emitted and absorbed by a hadron (as in usual twist-2 distribution), here the gluon is in an "intermediate state" nor emitted, nor absorbed, but smoothly distributed all-over the space. This picture also supports the interpretation of variables x, as the parton momenta measured as the fraction of the hadron momentum. In such a momentum picture, the line x 2 = 0 corresponds to 0-energy gluon.
The symmetry properties of the distributions allow some simplification along the line x 2 = 0. In particular, the ∆T function (which in principle appears when x 2 = 0) does not explicitly contribute to the matching due to eq. (6.13) ∆T (−x, 0, x) = 0, (7.10) but it will appear in the evolution of the ETQS functions, as we show in the next section. Due to the anti-symmetry property the function G ± when one of their arguments in 0, they can be expressed as ETQS distributions The functions Y ± at x i = 0 also can be expressed via ETQS distributions, but with a different rule The application of these rules significantly simplifies the calculation.

Results at NLO
The NLO matching of Sivers TMD distribution at small-b reads where on the right hand side all distributions are defined at the scale µ,ȳ = 1 − y and l ζ = ln µ 2 ζ . (7.14) Eq. (7.13 ) is written for the DY definition of the TMD distribution. In the case of the SIDIS definition the factor π in the first line should be replaced by −π. The symbol P ⊗ T represents the evolution kernel for the function T (x 1 , x 2 , x 3 ) on the x 2 = 0 line. It reads where the plus-distribution is defined as usual (f (y)) + = f (y) − δ(ȳ) 1 0 dy ′ f (y ′ ). (7.16) Note that the gluon part is regular for ξ → 0 since functions G ± and Y ± vanish at x 1,2,3 = 0. In eq. (7.13, 7.15) the integrals over y and ξ together with the δ(x − yξ) reproduce the Mellin convolution. This convolution naturally appears during the calculation and it is defined for the whole range of x, (−1 < x < 1) (and we recall that the anti-quark TMD distributions are given by values of x < 0, see definition in eq. (3.13)). It should be understood literally (7.17)

Discussion and comparison with earlier calculations
The evolution kernel in eq. (7.15) derived by us agrees with the known results in [62,84]. Also, the matching of the twist-2 part coincides with earlier works exactly i.e. as the whole function of ǫ. Altogether this provides a very strong check for the whole procedure and results derived by us.
It is instructive to compare eq. (7.13) to the small-b expansion of the unpolarized TMD distribution, which we have also reevaluated in this work to provide an additional cross-check. Following the notation of this work, it reads [1,2,30,75] where the evolution kernel is One can see that eq. (7.13) and eq. (7.18) have a very similar structure and, more precisely, the finite parts of these expressions have the same y-behavior. It is possible that this fact indicates some hidden correspondence which is to be understood in the future. Let us note that our calculation scheme (namely, the definition of distributions in d-dimensions, as it is discussed in sec. 6) affects only the quark-from-gluon mixing terms. Had we used the distributions definitions in 4-dimension these terms would have to be modified, e.g. in the twist-2 part (7.18) 2yȳ → −1. However, the current definition coincides with the conventional one, which is already seen from the agreement of (7.18) with earlier calculations.
The expressions for coefficient functions in eq. (7.13-7.18) are given for a general scale setting (µ, ζ). For practical applications, it is convenient to use the ζ-prescription [28,43], where a TMD distribution is defined at the line ζ = ζ(µ). This line depends on certain boundary conditions that can be uniquely fixed and which define the so-called optimal TMD distribution, see a detailed discussion in [43]. The line ζ µ is universal for all TMD distributions and on this line the expression for the coefficient function simplifies. Namely, in eq. (7.13, 7.18) one should set in ζ-prescription: It is easy to see that in ζ-prescription the TMD distribution is (naively-)independent on the scale µ.
The matching coefficient for Sivers function can be found in the literature scattered in different works: the quark-to-quark part has been deduced in [26], and the quark-to-gluon part has been evaluated in [27]. In both references the derivation of the matching coefficient has been made indirectly, refactorizing the factorized cross-section for SSA with the help of known matching for unpolarized TMD distribution. In our approach we evaluate the Sivers function directly, which grants us a better control over factors and schemes. Let us compare and comment on these works one-by-one.
In [26] the quark-from-quark part of the matching (the first term in square brackets in eq. (7.13)) is derived. A comparison with this work shows a disagreement in the logarithmic part, but an agreement in the finite part (i.e. compare eq. (7.15) with eq. (12) of [26]). The origin of this difference is clear. The calculation of ref. [26] is based on the fixed-order calculation of SSA made in [22,24]. The latter contains a known mistake which mainly consists of a missing contribution, which roughly corresponds to our diagrams D (see detailed discussion in [62,84]), which in turn, contributes only to the logarithmic part of matching coefficient (7.13)).
In [27] the quark-to-gluon matching has been calculated. The result is presented using the functions N (x 1 , x 2 ) and O(x 1 , x 2 ) which can be related to a combination of the functions G and Y , similar to eq. (6.25, 6.26) (for a comparison of the definitions of these functions see [79]). In particular, Using these relations and comparing with eq. (44) of [27] we find a complete agreement with the logarithmic part (which is expected since it is given by the evolution kernel), but disagreement in the finite part. We claim that this disagreement is the result of a different parametrization of the gluon PDF used in [27]. Indeed, according to eq. (39) of [27], the authors of [27] define PDF in d-dimensions, but they do not decompose the tensors to irreducible representations and therefore ǫ-dependent pre-factors of PDFs are different.
In fact, the method of ref. [27] could be inconsistent beyond LO. Indeed, the parameterization of the twist-3 matrix element used by [27] is based on the 4-dimensional relation (see also [79]) g µν ǫ αβρδ = g µα ǫ νβρδ + g µβ ǫ ανρδ + g µρ ǫ αβνδ + g µδ ǫ αβρν , (7.21) which is used to reduce the number of degrees of freedom. In d dimensions the relation in eq. (7.21) is not valid. Instead one has to use the decomposition to irreducible components (see discussion in appendix A), as it is made in this work. In order to consistently use the parameterization based on eq. (7.21), the limit ǫ → 0 must be taken prior to the application of the parameterization, i.e. the approach one, as it is discussed in the introduction to the sec. 6. Contrary, the authors of [27] have used a 4-dimensional parametrization within the d-dimensional calculation. There is no apparent contradiction at one-loop level, however, it can appear at higher perturbative orders.

Conclusion
We have derived the matching of the Sivers function to collinear distributions at NLO. The final result is given in eq. (7.13) both for quark-to-quark and quark-to-gluon channels. The final result can be compared to the known calculations piece by piece: the logarithmic part agrees with the evolution kernel derived in [62,84], the finite quark-to-quark part agrees with the one derived in [26] and the finite quark-to-gluon part is in disagreement with [27]. In sec. 7.3 we argue that the disagreement between our calculation and the calculation made in [27] is due to the difference in calculation schemes. The peculiarities of our calculation scheme are given in beginnings of sec. 5.3 and sec. 6. We also argue that our calculation scheme is equivalent to the scheme commonly used for twist-2 TMD matching, which we also confirm by comparing the twist-2 part of our calculation, eq. (7.18).
In contrast to all previous evaluations of Sivers function we do not consider any process but derive it directly from the definition of the TMD operator. The evaluation presented here is in many aspects novel, especially for the TMD community. Our calculation is made at the level of operators within the background field method which provides the most complete type of calculation and in the text we have described many details. In particular, for the first time, we explicitly demonstrate the appearance of rapidity divergences at the operator level, sec. 5.4, and explicitly demonstrate its renormalization at all twists of collinear OPE (sec. 5.5). We also demonstrate the appearance of the famous sign flip for Sivers functions defined for DY and SIDIS, eq. (2.9).
The method outlined in this work can be used also for the evaluation of the other leading order distributions which match on collinear twist-3 operators. All intermediate results of the calculation are presented in the appendix. Since the calculation is made at the level of operators, it contains the complete information on small-b OPE. In particular, it can be used to write down the matching of GTMD distributions to GPDs. Also, many diagrams can be used without recalculation for other polarizations. We expect that this line of research will give new results in the near future and before the advent of the Electron Ion Collider (EIC).

A Parametrization of twist-3 operators and decomposition of 3-tensors
The light-cone gluon operators that enter our calculation are T µνρ + (z 1 , z 2 , z 3 ) = igf ABC F A;µ+ (z 1 n)F B;ν+ (z 2 n)F C;ρ+ (z 3 n), (A.1) where f ABC and d ABC are structure constants of the gauge-group. Here we omit the Wilson lines, for simplicity. To find an appropriate parametrization of these operators in dimensional regularization, we proceed as the following. First of all, we decompose the V × V × V -tensor (with V being a 2 − 2ǫ dimensional vector) into irreducible components. There are 7 irreducible components, which can be selected by appropriate projectors. Explicitly the projectors read [85], where S µ1..µn;ν1..νn (A µ1..µn;ν1..νn ) are (anti)symmetric products of n g T 's, with normalization factor 1/n!. These projectors satisfy The dimension of corresponding irreducible sub-spaces are ǫ). So, one can see that 3'd, 5'th and 7'th subspaces vanishes at ǫ → 0. These subspaces represent evanescent components of operator.
In the next step we construct tensors that belong to particular subspaces, These tensors parameterize the forward matrix element, and thus can be built out of single s µ , a µν (a d-dimensional analog of ǫ µν T ) and g µν T . We found Note, that s µ a νµ =s ν . The tensor t µνλ 1 = 0, since it is not possible to build completely traceless tensor with a single entry of a vector.
Finally, we parametrize the matrix element as where the integral measure is defined in eq. (6.10). The evanescent distributions F 3,5,7 can be set to zero, since they do not mix with physical distributions. Therefore, we have three functions F 2,4,6 that survive in the limit ǫ → 0.
The operators T ± have the following property under permutation of arguments which put some constraints on the functions F 2,4,6 . The function F − 2 (F + 2 ) is completely (anti)symmetric, The functions F 4 andF 6 are related to each other. We find it convenient to use F 4 as independent, setting The function F ± 4 has the following symmetry properties For convenience of comparison we introduce additional ǫ-dependent factors and denote and we obtain the parametrization in eq. (6.19). Let us also make an analogy with the parameterization of quark operator. The general quark operator with positive parity has three indices (if we omit evanescent operators with anti-symmetric products of 4, 6, etc. indices). It reads where all indices are transverse. Here, we omit the Wilson lines, for simplicity. Therefore, it is parameterized by the same set of tensors, For the same reasons as for the gluon operator we drop all functions Q 3,5,7 . The remaining functions Q 2,4,6 are not independent, but can be related by time-reversal symmetry. In particular we get Q 2 = Q 4 . Comparing to the parameterizations in eq. (6.8, 6.9) we get Therefore, we can conclude that the function ∆T is the quark analog of F 6 gluon distribution.

B Example of evaluation: diagram E
In this appendix we give a detailed technical description of the evaluation of a diagram. For demonstration purposes we have selected the diagram E (see fig. 4) since it is the most involved diagram, which allows to demonstrate all particularities of the calculation. The remaining diagrams are obtained in a similar manner, albeit the evaluation is typically shorter. The diagram reads where the factors in the round brackets come from the expansion of the action exponent. Using the expressions for propagators in dimension regularization (with d = 4 − 2ǫ) we obtain and we have used that γ + γ + = 0.
The expression in eq. (B.3) should be understood as a generating function that contributes to all orders of small-b expansion. The typical task requires a consideration of terms with a particular counting only. For instance, in this work we need only the terms proportional to b µ . The most straightforward approach to extract particular contributions from such generating function is to Taylor expand all fields around a point (say 0), and evaluating the loop integral that decouples from the fields. In the resulting series, the desired contributions are to be sorted out and resummed back to the non-local form. However, this is a very algebraically heavy way. Here we use an equivalent, but much more efficient, strategy that requires the evaluation of only several terms. It is described in the following.

(B.4)
After these transformations the expression for the diagram is (αβγ) 1−ǫ ρ −ǭ q(u + r u )A µ (x + r x )γ ν ( u − 2 βγρ λ b)γ + ( x + 2 αγρ λ b)γ µ ( x− y − αβρ λ (2 b + z 12 γ + ))γ ν q(y + r y ) [−(β + γ)x 2 − (γ + ρ)y 2 − (α + ρ)u 2 + 2ρ(uy) + 2γ(xy) + 4 αβγρ λ b 2 + i0] 7−4ǫ . The next step is to expand the fields around the points r i . The resulting expression is a series of integrals with a given propagator and monomials built of x µ , y µ and u µ . The open indices of such integral can result only into the metric tensors g µν . The dimension of the loop-integral is carried entirely by b 2 , and can be easily computed. The loop-integral and the numerator is the only source of b. It also enters the argument of the fields, but this source is independent from the loop-computation and can be considered later. Thus we sort all terms in the expressions in powers of b and select the ones that are linearly proportional to b.
Note, that the terms with the same dimension do not necessary have the same b-counting. As so, all terms without z 12 has counting n + 1 (where n is the number of fields derivatives). Therefore, only terms without field derivative contribute in this case. The terms that contain factor z 12 has counting n + 0, and require the expansion of the fields up to one derivative. Let us note, that the expansion of fields in b rises the counting even more, and so it does not contribute at considered order. For that reason we can neglect b in the argument of fields (Such contributions can appear only in the diagrams that also contribute to twist-2, i.e. A,B, and L).

C Diagram-by-diagram expressions
In this appendix we collect the expressions for diagrams presented in figs. 3, 4 and 5.

C.1 Expressions for OPE
In this appendix we provide the full set of expressions obtained from the evaluation of diagrams in background field. The expressions are given in light-cone gauge for the Drell-Yan operator eq. (4.1) (i.e. with retarded eq. (4.14) boundary conditions). The analogous expressions for the SIDIS operator, eq. (4.3), are obtained by replacing −∞ with +∞ in the integration limits, as it is discussed in sec. 5.6. We stress that the calculation has been done for an operator with unrelated light cone positions of fields z 1 and z 2 . Therefore, the OPE presented here is also suitable for evaluating the matching of the GTMD distributions.

C.2 Expressions for TMD distributions
In this section, we present the results for the matrix element in eq. (7.1) of the OPE contributions, We collect all diagrams with their corresponding time-reversal and we have Combining these expressions with the renormalization constants and taking the limit ǫ → 0, as it is discussed in eq. (5.39) we find eq. (7.13, 7.15, 7.18, 7.19).