Resummation of Transverse Momentum Distributions in Distribution Space

Differential spectra in observables that resolve additional soft or collinear QCD emissions exhibit Sudakov double logarithms in the form of logarithmic plus distributions. Important examples are the total transverse momentum $q_T$ in color-singlet production, N-jettiness (with thrust or beam thrust as special cases), but also jet mass and more complicated jet substructure observables. The all-order logarithmic structure of such distributions is often fully encoded in differential equations, so-called (renormalization group) evolution equations. We introduce a well-defined technique of distributional scale setting, which allows one to treat logarithmic plus distributions like ordinary logarithms when solving these differential equations. In particular, this allows one (through canonical scale choices) to minimize logarithmic contributions in the boundary terms of the solution, and to obtain the full distributional logarithmic structure from the solution's evolution kernel directly in distribution space. We apply this technique to the $q_T$ distribution, where the two-dimensional nature of convolutions leads to additional difficulties (compared to one-dimensional cases like thrust), and for which the resummation in distribution (or momentum) space has been a long-standing open question. For the first time, we show how to perform the RG evolution fully in momentum space, thereby directly resumming the logarithms $[\ln^n(q_T^2/Q^2)/q_T^2]_+$ appearing in the physical $q_T$ distribution. The resummation accuracy is then solely determined by the perturbative expansion of the associated anomalous dimensions.


Introduction
An important class of differential observables at colliders are those that resolve additional soft or collinear QCD emissions on top of the underlying hard Born process. They typically exhibit Sudakov double logarithms of the form α n s ln m (Q/k) with m ≤ 2n, where Q is the relevant hard scale of the problem and k the differential observable. The bulk of the cross section is usually contained in the regime k Q, where the double logarithms can become large and eventually spoil the convergence of the fixed-order perturbative expansion. The resummation of the Sudakov logarithms to all orders in α s becomes necessary to obtain a stable and reliable prediction in this regime.
Resummation can be carried out either using Monte Carlo techniques such as parton showers, or analytically based on factorization theorems which can be derived diagrammatically or using effective field theories. All ingredients of the factorized cross section obey (renormalization group) evolution equations of the form where F represents an ingredient of the factorized cross section depending on the (momentum space) variable k and the (unphysical) scale µ, and γ F is its anomalous dimension. eq. (1.1) encodes the logarithmic structure of F to all orders. Solving it allows to exponentiate all large logarithms ln(k/µ), provided that µ 0 is chosen of the order of k, µ 0 ∼ k. In this case, F (k, µ 0 ) is free of large logarithms and can be reliably calculated in fixed-order perturbation theory, whereas all logarithms ln(µ/k) are explicitly exponentiated. While this is straightforward for ordinary functions F such as hard functions describing virtual corrections, it becomes more involved when F contains (plus) distributions. These naturally arise for many observables in order to properly cancel infrared divergences between the different functions. Well-known examples are the total transverse momentum q T in color-singlet production, N -jettiness (with thrust or beam thrust as special cases), but also jet mass and more complicated jet substructure observables. The simplest example to illustrate the complications in such cases is a pure δ-distribution, which typically arises as the LO boundary term. In this case, eq. (1.2) reads such that it is obviously not possible to choose µ 0 = k to fully resum all logarithms ln(k/µ). Hence one needs a technique that correctly turns this expression into a sum of plus distributions ln n (k/µ)/k µ + that are known to arise at higher orders. In this paper, we derive a well-defined technique to solve RGEs in distribution space by introducing a distributional scale setting µ 0 = k| + . It allows to treat evolution equations such as eq. (1.1) like ordinary differential equations, or equivalently treats distributional logarithms like ordinary logarithms. Hence this technique allows to minimize logarithmic terms in the boundary condition F (k, µ 0 ) through distributional canonical scale setting, thereby fully resumming all logarithms directly in distribution space. We also show that this is fundamentally different from performing the resummation in conjugate space, which also turns distributions into ordinary functions, but induces subleading terms to all orders.
We then apply this technique to the resummation of the transverse momentum ( q T ) distribution in color-singlet production. The transverse-momentum spectrum is a key observable for many processes at hadron colliders. In Higgs production, it is one of the primary variables describing the production kinematics [1][2][3][4][5][6][7][8]. In Drell-Yan production it is an important benchmark observable, which has been measured to very high precision [9][10][11][12][13].
It has been a long-standing open question whether a direct resummation of the q T distribution in momentum (or distribution) space is possible at all. Instead, the resummation is usually carried out in the conjugate Fourier ( b T ) space, which however fundamentally resums logarithms ln(b T Q) rather than the logarithms ln(Q/q T ) appearing in the physical q T spectrum (with Q being the hard scale of the process). Previous attempts so far have either encountered spurious divergences in the resummed q T spectrum, see e.g. refs. [14,15], or the evolution is still partially performed in Fourier space, see e.g. refs. [16][17][18][19][20], while refs. [21][22][23] tried to obtain a closed form of the Fourier-resummed spectrum in momentum space. Recently, ref. [24] obtained the resummation in momentum space based on the coherent branching formalism [25,26] rather than solving the evolution equations associated with a factorization theorem. They showed that the spurious divergences are avoided by expanding around the transverse momentum of the hardest emission instead of q T . We briefly compare our findings to those of ref. [24] in sec. 7.
In this paper, we derive the solution to perform the RG evolution entirely in distribution (momentum) space, thus allowing for the explicit resummation of all logarithmic contributions [ln n (q 2 T /Q 2 )/q 2 T ] + appearing in the physical q T distribution. We show that it intrinsically requires distributional scale setting due to the two-dimensional convolutions appearing for q T , which is not the case for one-dimensional observables such as thrust. In particular, we find that this is the origin of the divergences observed in previous attempts of momentum-space resummation.
An advantage of performing the resummation via the solution of the q T evolution equations is that the solution automatically applies to all orders in resummed perturbation theory, and the resummation accuracy is solely defined through the perturbative accuracy of the anomalous dimensions (as well as matching conditions). In particular, this allows one to completely avoid any discussions of how to consistently count explicit logarithms in the cross section (which has been part of the difficulties in previous attempts). We indeed find that apparent subleading terms in the spectrum become important to obtain a well-defined prediction. The resummation through RG evolution also provides a natural and convenient way to smoothly turn off the resummation and match to the fixed-order region at large q T , as well as to estimate perturbative uncertainties through the variation of all the appearing resummation scales. Importantly, by performing the evolution in momentum space, these steps can be carried out directly at the level of the physical q T spectrum. In this paper, our primary purpose will be to derive the all-order solution for the momentum-space evolution, while we leave its numerical implementation to future work.
The remainder of the paper is organized as follows. In sec. 2, we introduce the technique of distributional scale setting and discuss how it is used to solve distributional differential (RG) equations in distribution space. We also discuss why this is fundamentally different from performing the resummation in conjugate space. For simplicity, most of this general discussion is for one-dimensional observables, and then generalized to two-dimensional cases at the end. The remaining sections are then devoted to the q T resummation. In sec. 3, we briefly review the relevant factorization theorem and evolution equations, and discuss the difficulties associated with the two-dimensional nature of q T . In sec. 4, we discuss the resummation of the rapidity anomalous dimension via the solution of its RGE. We also briefly discuss how its intrinsic nonperturbative contributions arise and can be handled in momentum space. In sec. 5, we derive the solution for the momentum-space evolution for the soft and beam functions. We then discuss how all pieces get assembled into the final resummed q T spectrum in sec. 6 and provide some comparisons to existing approaches and implementations in the literature in sec. 7. We conclude in sec. 8. In the appendices we collect many relevant and useful definitions and relations for plus distributions.

Scale setting in distribution space
A key feature of observables k that resolve the IR structure of soft and collinear emissions, including the transverse momentum q T , is that the logarithmic structure of the differential spectrum in k is given in terms of plus and delta distributions, which encode the cancellation of the associated IR divergences between real and virtual contributions.
The correct scale setting to minimize logarithmic distributions is a crucial ingredient in the resummation of the differential distributions directly in distribution space. In particular, as we will see, for the q T spectrum this becomes a necessity to obtain well-defined predictions.
In this section, we discuss on general grounds several aspects of solving distributional differential equations. In particular, we introduce a distributional scale setting to correctly minimize distributional logarithms in the RGE boundary conditions. We first discuss the simpler case of one-dimensional distributions, before generalizing it to the two-dimensional case relevant for q T in sec. 2.7.
(2.1) (For the purposes of this toy example we neglect the running of α s for simplicity.) Solving eq. (2.1) yields the formal solution F (k, µ) = F (k, µ 0 ) U (µ 0 , µ) , U (µ 0 , µ) = exp α s ln µ 0 µ , (2.2) where U (µ 0 , µ) is the evolution kernel with U (µ 0 , µ 0 ) = 1 and F (k, µ 0 ) is the boundary condition. The evolution kernel resums logarithms ln(µ 0 /µ) and shifts the logarithms ln(k/µ 0 ) of F (k, µ 0 ) into the logarithms ln(k/µ) of F (k, µ). Hence, if F (k, µ 0 ) is known "exactly" at the starting scale µ 0 , eq. (2.2) gives F at the different scale µ with all its logarithms ln(k/µ) resummed. More precisely, eq. (2.2) determines F (k, µ) to the logarithmic accuracy to which both the boundary condition and the evolution kernel are known. In some cases, the boundary condition is known (or assumed to be known) exactly. For example, the nonperturbative parton distribution functions (PDFs) are extracted from the experimental data at some reference scale µ 0 and are then evolved to an arbitrary scale µ using the DGLAP evolution equations.
In contrast, if the boundary condition F (k, µ 0 ) is calculated perturbatively, it must be calculated including all logarithms (to the desired logarithmic accuracy). Typically, this is achieved by choosing a particular starting scale µ 0 for which F (k, µ 0 ) is free of logarithms.
In the simplest case, k is a scalar quantity such that F is a regular function and the RGE eq. (2.1) is multiplicative, as is the case for example for a hard matching coefficient in SCET. The logarithmic structure of F is then where the ellipses denote higher-order terms ∼ α n s ln n (k/µ 0 ) as well as possible constant (nonlogarithmic) terms. In this case, all logarithms vanish at µ 0 = k, and F (k, µ 0 = k) can be calculated in fixed-order perturbation theory, F (k, µ 0 = k) = 1 + · · · , (2.4) where the ellipses now only contain possible higher-order constant terms. Equation (2.2) then gives F (k, µ) at an arbitrary scale µ by F (k, µ) = (1 + · · · ) U (µ 0 = k, µ) , (2.5) where the logarithms ln(k/µ) in F (k, µ) are now directly and fully resummed by the evolution kernel U (k, µ). Hence, the crucial ingredient that allows the RGE to predict the logarithms ln(k/µ), as opposed to merely shifting them from one scale to another, is the choice of µ 0 that eliminates all logarithms in the boundary condition F (k, µ 0 ). In the more complicated cases we are interested in, F (k, µ) is a differential distribution involving plus and delta distributions in the observable k. Let us first consider a toy example where after transforming to an appropriate conjugate space, e.g. Fourier or Mellin space, the RG eq. (2.1) has an analogous form µ dF (y, µ) dµ = −α sF (y, µ) ,F (y, µ) =F (y, µ 0 )Ũ (µ 0 , µ) . (2.6) Here, y is the conjugate variable to k andF (y, µ) is the corresponding transformed function in conjugate space. We stress that as long as µ 0 is kept symbolic, the formal RGE solution is equivalent in any space, i.e. it does not actually matter in which space the evolution kernel is determined, it might just be easier to find a concrete solution in a particular space. The differences arise entirely in how the boundary condition is chosen. The above discussion for the multiplicative example now applies toF (y, µ), which has the simple logarithmic structureF (y, µ 0 ) = 1 − α s ln(yµ 0 ) + 1 2 α 2 s ln 2 (y µ 0 ) + · · · . (2.7) Choosing µ 0 = 1/y, the conjugate boundary conditionF (y, µ 0 = 1/y) is free of logarithms and can be calculated in fixed-order perturbation theory, while all logarithms ln(yµ) in the conjugate functionF (y, µ) are correctly predicted byŨ (µ 0 = 1/y, µ). This procedure of solving the RGE with setting the starting scale (or more generally determining the boundary condition) intrinsically in conjugate space fundamentally resums the conjugate logarithms and therefore we will refer to it as evolution or resummation in conjugate space.
For the reasons discussed before, we are interested in directly resuming the (distributional) momentum-space logarithms of k, which we will refer to as evolution/resummation in momentum (or distribution) space. The first attempt would be to use the canonical scale choice from momentum space, µ 0 = k, in the formal solution in conjugate space, eq. (2.6). Doing so, the required boundary condition is However, using fixed-order perturbation theory amounts to truncating this series at some low order in α s and neglecting all logarithmic terms at higher orders in α s . This would only be sufficient if it were guaranteed that the neglected logarithms in the boundary condition do not affect the desired logarithmic accuracy in the final momentum-space distribution, which is far from obvious. Although naively, one might think that the y-integration in the inverse transform should be dominated by y ∼ 1/k, where the neglected logarithms are small, the integration also includes the regions y → 0 and y → ∞, where the neglected logarithms get arbitrarily large. As we will see, this is the reason why this naive attempt in fact fails in case of q T (leading to the aforementioned spurious divergences). Instead, performing the evolution in momentum space requires to set the scales and determine the appropriate fixed-order boundary condition in momentum space. Before discussing the analogous exponential toy example in distribution space, we can illustrate the arising difficulties with the simple distribution and L 0 (x) = [θ(x)/x] + is the standard plus distribution (see appendix B for more details). Since L 0 (k, µ) encodes a logarithmic divergence, it counts as a single logarithm in logarithmic accuracy counting. The distribution D 0 (k, µ) fulfills the differential equation which has the general solution Using the full result for D 0 (k, µ 0 ) from eq. (2.9), together with the distributional identity [see eq. (B.5)] eq. (2.12) gives (2.14) Thus, the µ 0 -dependent terms cancel and we reproduce the correct result for D 0 (k, µ). This shows that the formal solution in eq. (2.12) is sufficient to shift the µ-dependence from µ 0 to µ. However, obtaining the correct result for D 0 (k, µ) from it like this crucially relies on the fact that we used the full result for the boundary condition D 0 (k, µ 0 ). In reality, calculating the boundary condition in fixed-order perturbation theory would amount to truncating D 0 (k, µ 0 ) = δ(k) and neglecting the higher-order α s L 0 (k, µ 0 ) term. 1 Naively setting µ 0 = k to eliminate this term is clearly ill defined. Since µ 0 appears in the boundary condition of the distribution, this would lead to an ill-defined expression δ(k) ln k [as can been seen e.g. from eq. (2.13)]. Instead, we need a distributional way of choosing "µ 0 = k" that behaves analogously to the multiplicative case above. Namely, it should set the logarithmic distribution L 0 (k, µ 0 ) in the boundary condition to zero so fixed-order perturbation theory can be used for the boundary condition without having to neglect any higher-order logarithmic terms, and such that the RGE directly predicts the correct L 0 (k, µ). One of the key results of this paper is a generic method to do so, which works for arbitrary distributions and will be discussed next.

Distributional scale setting
A general plus distributions of a function g(k) is defined through the conditions (see appendix B for details) Here and in the following we always keep the lower integration limit implicit in the support of the distribution. An important example are the logarithmic distributions For a general distribution D(k, µ) we define the distributional scale setting µ = k| + as Here, the derivative acts on everything inside the square brackets, and µ = k is set normally in the integrand, which is well defined since D is evaluated at the integration variable k . The idea behind eq. (2.18) is that it turns logarithmic distributions into ordinary logarithms, which can then be eliminated by a normal scale choice. For example, Similarly for the higher logarithmic distributions, Note that although L n (k, µ) = [ln n (k/µ)/k] µ + contains a pure logarithm ln(k/µ), the appearance of µ in the boundary formally prohibits to simply set "µ = k" to extract the boundary term. 2 It is also easy to see that this generalizes to any distribution [θ(k)g(k)] µ + , The cumulant integral exactly vanishes by definition of the distribution. On the other hand, any µ-independent constant terms (i.e. pure boundary terms) are unaffected since the operator d dk precisely inverts the cumulant k dk . In particular, Here it is crucial to keep track of any θ(k) appearing in the cumulant to properly recover the distributional structure. For future convenience we use the notation to denote the pure µ-independent constant term, which will typically be D[k] ∼ δ(k), but in general could also contain regular (integrable) functions of k.
Having a well-defined method for distributional scale setting allows us to easily solve distributional differential equations, as it allows us to essentialy treat distributions like ordinary logarithms. To see that eq. (2.18) has the desired properties in more nontrivial cases, consider the simple examples These relations are quite intuitive in that the distributional scale setting essentially moves any naively appearing ln k terms inside a suitably regulated logarithmic distribution. The appearing prefactors count the order of the distributional logarithm in each equation.

Integrating distributional differential equations
First, consider again the simple example in eq. (2.9). Setting µ 0 = k| + now has the desired effect of being able to predict the complete logarithmic distribution from the general solution eq. (2.12) by a scale choice, Similarly, we can reproduce a L n (k, µ) for n ≥ 1 from its µ dependence. Consider a distribution D n (k, µ), which satisfies the differential equation Integrating this from µ 0 to µ, we get the general solution Here we used eq. (B.5) to shift the boundary condition in the plus distribution from µ to µ 0 , such that the integral can be pulled inside the plus distribution and performed. (This step explicitly requires that µ 0 does not depend on k, as otherwise the boundary condition of the plus distribution would be changed.) Using eq. (2.24) to set µ 0 = k| + , we thus find In practice, we can also specialize eq. (2.18) to define an integral over an arbitrary distribution G with starting scale µ 0 = k| + as This allows us to write a generic integral solution as Although this looks like an ordinary integral solution, the important difference lies in the lower integration limit k| + which enforces the distributional scale setting.

Toy example in distribution space
We can now discuss the exponential toy example in distribution space. Consider a distribution with the logarithmic structure corresponding to an exponential in distribution space. It satisfies the differential equation whose general solution is easily seen to be Using the definition in eq. (2.18) to set µ 0 = k| + we obtain The k -integral in the first step only acts on F (k ) and eliminates all distributions leaving only constant boundary terms 1 + · · · . In the second step we used eq. (B.4) to take the derivative involving the θ(k), where ξ is arbitrary and cancels between the two terms. In the last step we chose ξ = µ. Expanding the exponential in α s , we can easily see that this reproduces the full distributional logarithmic structure of eq. (2.32) With the general properties in eqs. (2.24) and (2.25), we can also simply use the fact that setting µ 0 = k| + eliminates any contributions from distributional logarithms in the boundary condition F (k, µ 0 ). We can therefore directly plug in F (k, µ 0 = k| + ) = F [k] = (1 + · · · )δ(k) for the boundary condition to get (2.37) Using eq. (2.25) we can see that this is identical to what we got in eq. (2.35), (2.38) Alternative derivation Another way to derive the same solution without distributional scale setting is to start from the general Ansatz Plugging this back into eq. (2.33) and using eq. (B.7) to take the µ derivative, (2.40) yields a coupled system of differential equations for f 0 and f 1 , The advantage is that these are now two ordinary RGEs, which can be solved straightforwardly without having to be careful about producing distributions from taking derivatives of θ(k). Solving for f 1 in terms of f 0 , Plugging back into the Ansatz, we obtain the result which for f 0 = 1 + · · · confirms the earlier result eq. (2.35). This form suggests to interpret f 0 (µ) as the boundary term, as it completely predicts the (logarithmic) structure of F . Indeed, setting now µ = k| + yields F [k] = f 0 (ξ) δ(k) + [θ(k)f 0 (k)] ξ + , confirming that the boundary term can be obtained as the coefficient f 0 (µ) of δ(k). Note that the essential element in this derivation is the same as above, namely that with the distributional canonical scale setting F (k, µ = k| + ) = F [k] reduces to a pure boundary term that is free of logarithmic distributions. In actual applications, it would be calculated in a fixed-order expansion in α s (µ).
This simple toy example illustrates that with the distributional scale setting, using the canonical scale choice µ 0 = k| + as in eq. (2.37) becomes exactly analogous to using the ordinary canonical scale choice in the multiplicative case in eq. (2.5). In particular, the pure boundary condition F [k] can now be calculated in fixed-order perturbation theory, while the RGE solution predicts the complete distributional logarithmic structure. It should be evident that this is true generically also for more complicated cases. We will encounter two more complicated examples when discussing the RGE for the rapidity anomalous dimension in sec. 4 and for the soft function in sec. 5.

Comparison to evolution in conjugate space
It is interesting to compare the result of the momentum-space evolution to the evolution in appropriate conjugate spaces, i.e. solving the RGEs in conjugate space with scale setting therein. In general, using equivalent boundary conditions in different spaces can lead to different predictions in physical space. To illustrate this, we consider again the simple example distributions D n , satisfying 44) and assume that only the LO boundary term is known from a fixed-order calculation, In the following we compare this momentum-space result to the resummation in cumulant and Fourier space.

Cumulant space
Taking the cumulant, eqs. (2.44) and (2.45) become µ dD n (k cut , µ) dµ = −α s θ(k cut ) ln n k cut µ , (2.47) where the cumulant is defined as The solution to eq. (2.47) is easily obtained as The important point is that all distributions in momentum space correspond to logarithms ln(k cut /µ) in cumulant space, which can be fully resummed by choosing µ 0 = k cut . With this choice, all logarithms inD n (k cut , µ) are eliminated, and from eq. (2.48) it follows that the LO boundary condition isD n (k cut , k cut ) = θ(k cut ). The solution is thus given bȳ D n (k cut , µ) = θ(k cut ) + α s n + 1 θ(k cut ) ln n+1 k cut µ .
(2.51) -12 -Transforming back to momentum space by taking the derivative with respect to k cut yields which exactly reproduces the momentum-space solution in eq. (2.46). Hence, resummation in cumulant and momentum space to predict the logarithmic structure is equivalent for this example. This is of course not very surprising, considering the intimate relation between distribution and cumulant space.

Fourier space
Taking the Fourier transform of eqs. (2.44) and (2.45) using eq. (B.13), we find where the Fourier transform is defined as The general solution is given bỹ Since all logarithms depend on iyµe γ E , they are fully resummed by choosing µ 0 = −ie −γ E /y, 3 which eliminates all logarithms in the boundary term. This allows us to use the known fixed-order boundary term eq. (2.54),D n (y, µ 0 ) = 1, and gives Transforming back to momentum space thus yields While the plus distribution matches the one in eq. (2.46), which is of course necessary to provide the correct µ-dependence, the solution contains an additional term −δ(k)R This shows explicitly that the Fourier-space resummation, which fundamentally resums logarithms of the conjugate variable y, is not equivalent to the momentum-space resummation, as it induces an additional boundary term. This discrepancy is due to the fact that pure plus distributions L n (k, µ) do not correspond to pure powers of ln(iyµe γ E ) and vice versa.
In principle, the additional α s δ(k) term in eq. (2.59) would be compensated for if we were to include the boundary conditionD n (y, µ 0 ) in Fourier space to O(α s ). However, for a general distribution, such spurious terms are generated to all orders in α s . (We will see this explicitly in sec. 4 for the example of the rapidity anomalous dimension.) Thus in practice for the purposes of resummation, where the boundary condition is only calculated to some fixed order, the resummation in Fourier space intrinsically induces additional subleading terms to all orders in α s , which is one the main motivations for performing the resummation directly in momentum space.

Implementation of scale variations and profile scales
For phenomenological applications it is necessary to also use noncanonical scale choices. First, varying the scales away from their strict canonical values is a standard and convenient way to probe the size of higher-order logarithms and thereby estimate perturbative uncertainties. Furthermore, it is important to be able to dynamically turn off the resummation in the fixed-order region of the distribution. A standard way to achieve this is to employ profile scales µ 0 (k) [27,28], which smoothly interpolate as a function of k between the strict canonical scale choice in the resummation region and a common fixed scale µ FO in the fixed-order region. Profile scales are also used to implement dynamical scale variations to distinguish different sources of perturbative uncertainties, for example resummation and fixed-order uncertainties [29,31,32].
We discuss the implementation of scale variations and more generally profile scales within our framework using the exponential toy example in eq. (2.32) with the associated RGE solution in eq. (2.34): Here we have included the superscript "FO" to stress that the boundary condition is obtained from a fixed-order calculation, whereas U is the evolution kernel. In a typical application, one wants to perform the resummation using the canonical scale choice µ 0 = k| + for k Q, where the logarithms of k are large and should be resummed. On the other hand, for k ∼ Q, one wants to turn off the resummation and recover the exact fixed-order result by taking µ 0 = µ FO = µ. Both requirements are fulfilled by choosing µ 0 to be a profile function µ 0 (k) behaving as The profile furthermore smoothly interpolates between the two regimes to capture the turning-off of the resummation. Such a profile can be conveniently implemented by generalizing the distributional scale setting in eq. (2.18), µ 0 = k| + , to a generic function µ 0 = µ 0 (k)| + , In the resummation regime, this reproduces exactly the canonical distributional scale setting µ 0 = k| + . In the fixed-order regime, U = 1 and µ 0 is independent of k, such that the derivative simply inverts the integral to reproduce F (k, µ) = F FO (k, µ). Furthermore, the profile should allow scale variations to probe subleading logarithms. To see in more detail how this works, consider the NLO boundary term The solution with arbitrary profile µ 0 (k) is then The canonical scale choice to predict all logarithmic terms to this order is µ 0 (k) = k, such that we get which is exactly the expected structure. More general, we can vary the scale µ 0 around the canonical value k, µ 0 (k) = a · k, to probe subleading logarithms in the resummation regime. With this choice, we obtain F (k, µ) = 1 + α s (f 1 − ln a) δ(k) + α s L 0 + α 2 s L 1 + · · · e αs ln a . (2.67) This clearly probes subleading terms through the induced exponential. Note that the − ln a in the first term cancels the O(α s )-term of the exponential to correctly reproduce the fixed-order content. Lastly, we can test the fixed-order structure by varying µ 0 in the fixed-order regime. To fully turn off the evolution kernel in eq. (2.63), we choose µ 0 (k) = µ = a · µ FO , yielding The effect of such a scale variation is to induce the term α s ln(a)δ(k), which precisely probes the O(α s ) boundary term. For a = 1, this obviously restores the FO content.
For a more general scale variation, one can also vary µ 0 (k) and µ independently. We consider the example µ 0 (k) = a · µ FO , µ = µ FO , where we only vary µ 0 . This gives (2.69) The additional exponential probes higher order logarithms and is a remnant of the mismatch between the resummation scale µ 0 and the common scale µ. Distributional scale setting thus allows for a straightforward implementation of profile scales, which can be used to smoothly transition from resummation to fixed-order regime and allow one to probe the typical size of subleading terms and thereby estimate perturbative uncertainties in both regimes. Of course, in actual applications one has to choose suitable profile functions and variations such that they produces reasonable uncertainty estimates.

Distributional scale setting in 2D
In this section we collect the corresponding formulas for distributional scale setting and solving distributional differential equations for the two-dimensional case. The derivations are almost identical to the one-dimensional case, and are not repeated here.
We consider the example of transverse momentum dependent functions, which will be used throughout the rest of this paper. Such functions inherently contain divergences as 1/p 2 T in the limit of small transverse momenta p T , which are regulated through plus distributions. These are defined through the conditions which are the two-dimensional analog of the L n (k, µ) in eq. (2.17). As in the one-dimensional case, the boundary condition of the L n ( p T , µ) encodes a logarithmic dependence ln n (p T /µ), which can be seen by shifting their boundary condition [see eq. (C.4)], For a general two-dimensional distribution D( p T , µ) we define the distributional scale setting µ = p T | + as The derivative acts on everything inside square brackets, and µ = p T can be set normally in the integrand since the integral runs over the auxiliary vector k T . Like in the onedimensional case, eq. (2.74) sets purely distributional terms to zero, since the cumulant integral exactly vanishes by definition of the plus distribution, whereas any µ-independent constant (boundary) terms are not affected at all. For convenience, we denote the pure µ-independent boundary terms as Here, D[ p T ] can only depend on the quantity p T . The simplest case is D[ p T ] ∼ δ( p T ), but in general it can also contain regular (integrable) functions of p T . We will see examples of more general boundary terms in secs. 4 and 5.
The distributional scale setting can be readily applied to solve differential equations with two-dimensional distributions. The solution to the differential equation is given by where the integral over the distribution G( p T , µ) starting at the canonical scale µ = p T | + is given by A list of useful integrals is given in appendix C.4. We conclude this section by comparing the resummation in distribution space with resummation in Fourier space, analogous to sec. 2.5. Specifying to the example and ignoring α s running, the distributional solution for the simplest boundary term D n [ p T ] = δ( p T ) is given by The solution obtained in cumulant space is again equivalent. In contrast, solving the RGE with scale setting in Fourier space with the boundary where the constant is given by eq. (C.17), As for the one-dimensional case, we find that performing the resummation in Fourier space adds an additional boundary term compared to the momentum-space resummation. In this specific example, it could be compensated for by including the boundary conditioñ D n ( b T , µ) in Fourier space to O(α s ). However, for a general distribution, such spurious term are generated to all orders in α s . Thus in practice, the resummation in Fourier space induces additional subleading term to all orders in α s as well.

Overview and complications in q T resummation
Many of the difficulties in the resummation of the q T spectrum in momentum space are due to the intrinsic two-dimensional nature of q T and the involved convolutions, and are absent for one-dimensional variables like transverse energy or thrust. In this section, we explore in detail the underlying reason for this. After briefly reviewing q T factorization and the relevant associated RG equations in sec. 3.1, we argue in sec. 3.2 that the appearing two-dimensional convolutions requires very careful scale setting, which turns out to be the crucial complication of q T resummation in momentum space. In particular, we will see that setting the boundary scales to the overall q T does not correctly resum all logarithms, as one might naively expect. This will be illustrated in secs. 3.3 and 3.4 by reproducing a spurious divergence, which is well-known in the literature and as we will show originates from this incorrect scale setting, or more generally from an incorrect treatment of the boundary condition. In the following secs. 4 and 5 we then show how the q T resummation via RG evolution in momentum space can be carried out using the distributional scale setting of sec. 2.
We consider the measurement of the transverse momentum q T of a color-singlet system X with invariant mass Q and total rapidity Y . In the limit of very small transverse momentum, q T Q, the cross section is dominated by soft and collinear gluon emissions from the incoming partons that recoil against the hard system X. The emissions cause large logarithms ln m (q 2 T /Q 2 ), with a power m ≤ 2n − 1 at order α n s , which we like to resum to all orders to retain predictive power in the perturbative series at small q T . In the limit q T Q, the cross section can be factorized as [15] dσ where and for simplicitly we have kept the sum over partonic channels (and helicities in case of incoming gluons) implicit. Equation (3.1) is valid up to power corrections in q T /Q. We are only interested in the leading-power cross section, which contains all singular logarithms, and drop the power correction.
Here, σ 0 denotes the Born cross section and H(Q, µ) is the hard function containing virtual corrections to the partonic process. Following refs. [60,61], the bare beam functions B i are defined in SCET in terms of forward proton matrix elements of collinear quark and gluon operators, They encode the effects of collinear initial-state radiation and are equivalent to transversemomentum dependent parton distributions. They depend on the flavor i and light-cone momentum ω = xp − n of the parton that enters the hard interaction, where p n is the proton momentum. They also depend on the total transverse momentum p T of collinear initialstate radiation that was emitted prior to the hard interaction. The soft function S measures the total transverse momentum originating from soft emissions and is defined as the vacuum matrix element The B µ n,⊥ and χ n are collinear gluon and quark fields in SCET, P is the momentum label operator, and the S n,⊥ are soft Wilson lines along the light-cone direction n. For more details see refs. [15,62].
The bare hard, beam, and soft functions are divergent quantities and require renormalization. The UV divergences are regulated as usual by dimensional regularization. The resulting renormalized functions, which appear in eq. (3.1), satisfy the renormalization group equations where the anomalous dimensions have the all-order structure The µ-independence of the cross section implies the RG consistency relation The beam and soft functions furthermore depend on a rapidity scale ν, associated with an additional regulator required to regulate rapidity divergences. These arise because both soft and beam modes describe modes of virtuality µ 2 ∼ q 2 T , leading to an "overlap" of soft and collinear momentum regions which are not resolved by dimensional regularization There have been a variety of rapidity regulators suggested in the literature to deal with these divergences. In the original CSS approach a non-lightlike axial gauge was employed [38], whereas in recent work Wilson lines off the light cone are used [41,63]. In the context of SCET, the utilized regulators include the analytic regulator acting on eikonal propagators [64][65][66], the η-regulator inserted into Wilson lines [15,33], the δ-regulator adding a mass to eikonal propagators [56,67], and the exponential regulator acting on the phase space [68]. For all of these approaches, the beam and soft functions are currently known to NNLO [62,[69][70][71][72][73]. These fixed-order ingredients are not necessary for the purpose of this paper, which only relies on the RGE structure of beam and soft functions.
For concreteness we employ the η-regulator together with the rapidity renormalization group [15,33]. Our discussion can be applied similarly to other regulators. The crucial point is that the additional rapidity regulator induces an additional rapidity scale, here denoted as ν. In particular, ν is analogous to the scale ζ in the original CSS formulation ref. [38], see also ref. [59]. (In some formalisms or applications the rapidity scale is kept implicit and/or fixed to canonical values.) The rapidity renormalization group equations are given by 14) The overall ν-independence of the cross section implies the consistency relation which means that there is only one independent rapidity anomalous dimension, which we denote as γ ν . The crucial difference to the µ-RGE is that the ν-RGE is intrinsically a convolution. The commutativity of d/dν and d/dµ relates the µ and ν anomalous dimensions through We stress that the singular logarithmic structure of the q T spectrum is fully encoded in the RGE equations eqs. (3.6)- (3.16). While their precise definitions and derivation depend to some extent on the employed effective-field theory framework, equivalent evolution equations with the same momentum structure exist in all approaches. In particular, the original CSS formulation for q T resummation is based on analogous evolution equations [38][39][40]. Also, the corresponding system of differential equations in Fourier space, which is often considered, is completely equivalent. The SCET framework with rapidity renormalization we use is convenient in that it makes the structure maximally general and explicit.
The nontrivial task at hand is to solve the RG equations with appropriate momentumspace boundary conditions in order to perform the resummation in momentum space. In other words, we want to predict the all-order distributional logarithmic structure in q T from the RG evolution. As we will see, carrying out the RG evolution in momentum space is quite complicated due to the distributional nature coupled with the two-dimensional convolutions in k T . The correct solution for the momentum-space RG evolution is the main purpose of this paper.
An important comment concerns the definition of the formal resummation accuracy. In problems with Sudakov double logarithms, the cross section exponentiates into the form σ ∼ exp[α n s ln n+1 +α n s ln n +α n s ln n−1 + · · · ]. Including the first set of terms in the exponent ∼ α n s ln n+1 corresponds to the leading-logarithmic (LL) order, including the next set of terms ∼ α n s ln n corresponds to the next-to-leading logarithmic (NLL) order, and so forth. Alternatively, one can consider the logarithm of the cross section and count the corresponding terms in its α s expansion. One issue with this way of counting logarithms is that it is intrinsically ambiguous, i.e., it is always possible to have slightly different definitions which agree to a given order, but differ by contributions that in one or the other definition are formally subleading. As we will see the q T spectrum in momentum space does not exponentiate into a simple exponential, but rather it will have a generalized exponential structure in distribution space, which makes this counting of explicit logarithms even less well-defined. (We will see that this can even lead to producing spurious divergences in the resummed result from formally subleading terms.) The exponential structure of the resummed q T spectrum is fully encoded in its evolution equations, or equivalently the anomalous dimenions are in essence the generalized "logarithm" of the resummed cross section. Therefore, an easy, fully consistent, and unambiguous way to define the resummation order is strictly by the pure fixed-order expansions of the cusp and noncusp anomalous dimensions together with the fixed-order expansions of the pure boundary terms for the hard, beam, and soft functions. Since these fixedorder series are the fundamental input to the RG evolution, with everything else following from them, it makes sense to define the fundamental resummation accuracy solely by the perturbative accuracy at which these inputs are included. In particular, with this strict definition one is not allowed to disregard seemingly subleading logarithmic terms at intermediate stages. This also means that the various RG consistency relations should always be exactly fulfilled by the resummed result. Our goal is to derive the solution for the evolution and resummation in momentum space, i.e., with anomalous dimensions and boundary terms defined in momentum space, valid to in principle any order in this strict definition.
Note that given the resummed result at strict LL, NLL, etc. one can of course consider additional choices or further approximations to simplify the result and study their numerical impact, which however is not our concern here. Notation: In the remainder of this paper, we denote the final transverse momentum of the produced color-singlet state X by q T , while the transverse-momentum argument of a specific function is typically denoted by p T , and the integration momenta in the convolution integrals are usually denoted by k i . Convolutions are often abbreviated as where the ellipses denote possible additional variables. Multiple convolutions are abbreviated as These formulas are also summarized in appendix A.

Implications of two-dimensional convolutions
The transverse momentum spectrum is generically given in terms of plus distributions, which are necessary to regulate its 1/q 2 T divergences. Furthermore, the factorized cross section and its RGEs involve two-dimensional convolutions, see eqs. (3.1) and (3.14). While the issue of scale setting with distributions has been addressed in sec. 2, we now discuss the implications of the two-dimensional convolutions.
To illustrate the arising subtleties, it is sufficient to focus on the rapidity RGE of the soft function, as evolving it to the beam scale ν = ν B ∼ Q eliminates all rapidity logarithms in the beam functions entering eq. (3.1). A formal solution to eq. (3.14) is readily derived to be where the rapidity evolution kernel V is given by Here (γ ν ⊗ n ) denotes n convolutions of γ ν with itself, see eq. (3.18). V is an exponential in convolution space and eq. (3.20) can be derived equivalently in convolution space or Fourier space (see sec. 3.3 below). Taking the derivative with respect to ν, one can easily verify that it provides a solution for eq. (3.14).
The evolution kernel eq. (3.20) has a simple physical interpretation. Each factor of γ ν ( k i , µ) corresponds to a real soft emission with momentum k i . The convolution integrals are the remaining transverse phase-space integrals, which are constrained such that the transverse momenta of all emissions sum up to the total p T . Each emission is dressed with a rapidity logarithm ln(ν B /ν S ) to evolve in rapidity from the soft scale ν S to the beam scale ν B , which corresponds to the effective range in rapidity over which the soft emission has been integrated. The n emission term then scales with ln n (ν B /ν S ), which precisely builds up an exponential in convolution space.
To investigate the structure of V in more detail, we focus on the first nontrivial convolution. It involves integrating over two real emissions with momenta k 1,2 such that k 1 + k 2 = p T . Figure 1 illustrates the momentum regions in | k 1,2 | contributing to this integral. The region between the solid lines is allowed, while the gray region outside cannot fulfill the measurement constraint. The dashed lines correspond to a fixed angle between the two emissions of ∠( k 1 , k 2 ) = 90 • , 135 • . The larger this angle is, the larger the allowed magnitudes | k 1,2 | are. In the limit where the emissions are back-to-back, the magnitudes | k 1,2 | can become infinitely large, as long as their difference still gives p T . This is the limit given by the two blue lines. Hence, the convolution integrals in eq. (3.20) in principle receive contributions from infinitely large momenta. Physically, the limit of both emissions having large | k 1,2 | ∼ Q should be power-suppressed in q T /Q and hence not affect the singular logarithmic structure. However, in this limit the emissions are not correctly described anymore by the underlying soft expansion, which assumes | k 1,2 | ∼ p T Q. Therefore, we can in principle receive spurious contributions to the integral from this region. On the other hand, there can also be relevant physical contributions from any intermediate region p T | k 1,2 | Q that must be correctly taken into account. It was already argued in ref. [74] (see also ref. [24]) that the very small q T region can be influenced (or even be dominated) by such kinematic cancellations of harder emissions.
One might now be worried that the factorization theorem is intrinsically ill-defined, as it contains effects of arbitrarily hard emissions. This is unavoidable, as the soft approximation eliminates the phase-space constraints that would normally cut off such emissions, as already noticed in ref. [14]. However, all large rapidity logarithms should arise from the soft region where all emissions are of the order of the final q T , | k i | ∼ q T . From the above observations, it is clear that the rapidity evolution kernel eq. (3.20) violates this requirement. This is perfectly fine because a priori eqs. (3.19) and (3.20) only shift logarithms ln In particular, if S( p T , µ, ν S ) is already resummed, eqs. (3.19) and (3.20) are valid solutions of the rapidity RGE.
The situation changes if we want to use the rapidity RGE to predict all logarithms of the soft function. Then we would like to start from a boundary condition for the soft function without any real emissions, and subsequently let the evolution add real emissions by convolving with γ ν . Each such emission with momentum k i should scale with a rapidity logarithm ∼ ln(ν B /| k i |) to evolve in rapidity from its own emission scale to the beam scale. For the example of two emissions, we thus expect a contribution of the form Figure 1. Illustration of the transverse momentum regions contributing to the convolution (γ ν ⊗ γ ν )( p T , µ). The region between the solid blue and orange lines contributes to the convolution integral. The dashed lines correspond to a fixed angle θ = ∠( k 1 , k 2 ) = 90, 135 • between the two emissions. Canonical scaling | k 1 | ∼ | k 2 | ∼ p T is only fulfilled in the shaded orange region.
(For the moment we ignore that the rapidity logarithms need to be properly included into plus distributions, which will be taken care of in the final solution in sec. 5.) If the k 1,2 integrals were dominated by | k 1,2 | ∼ p T , indicated by the shaded orange region in figure 1, both rapidity logarithms could be approximated as ∼ ln(ν B /p T ), and a double logarithm ln 2 (ν B /p T ) could be pulled out of the convolution integrals. This is precisely what happens in eq. (3.20). However, as explained above, the convolution also gets contributions from energetic emissions | k 1,2 | p T , which happen to be back-to-back such that k 1 + k 2 = p T . In this region, the rapidity logarithms ln(ν B /| k 1,2 |) in eq. (3.21) get smaller and eventually become irrelevant for | k 1,2 | ∼ Q. Instead, the approximation | k 1,2 | ∼ p T would result in a spuriously large rapidity logarithm ln 2 (ν B /p T ), which would artificially enhance this otherwise suppressed phase-space region. To properly treat these emissions, it is necessary to keep the correct rapidity logarithms ∼ ln(ν B /| k i |) in the convolutions. This is obviously not possible by simply choosing a fixed value for ν S in eqs. (3.19) and (3.20). In particular, if we were to use the naive canonical choice ν S ∼ p T , we would artificially enhance unphysical contributions from energetic emissions. In sec. 3.4, this point will be stressed by showing that doing so actually produces a spurious divergence in the q T -spectrum. The correct RG evolution to predict the resummed soft function is significantly more complicated than eq. (3.20), and in particular will rely on distributional scale setting. It will be derived in sec. 5.
So far we have focused on the role of the rapidity scale. The two-dimensional convolution structure also has implications for the scale µ entering the rapidity anomalous dimension and soft function. For this purpose, it suffices to consider eqs. (3.19) and (3.20), which correctly evolve the (already resummed) soft function between two scales ν S and ν B . The final results in eqs. (3.19) and (3.20) after all integrations can only depend on p T and thus can only contain logarithms ln(p T /µ). One might therefore expect that when performing the rapidity evolution at µ ∼ p T , it should be sufficient to evaluate γ ν at fixed order. However, what is important is that the rapidity anomalous dimension contains logarithms ln(| k i |/µ) (in distributional form). These are only minimized by µ ∼ p T if the convolutions were dominated by | k i | ∼ p T . However, as argued above, since the convolution intrinsically probes momenta | k i | p T , setting µ ∼ p T can induce spurious logarithms ln(| k i |/p T ). Hence, the rapidity anomalous dimension γ ν entering the convolutions should always be resummed in order to correctly describe emissions at any | k i | Q. As we will see in sec. 4, the main effect of resumming γ ν ( k i , µ) is to evaluate it at α s (| k i |) rather than α s (µ). This suppresses the amplitude of energetic emissions, which is particularly important because there is no phase-space suppression due to the soft approximations, as discussed above.
At this point we can also discuss why these complications do not arise for scalar quantities like thrust or transverse energy. Taking transverse energy as the closest example 4 to q T , the measurement constraint (for two emissions) changes to This is clearly a much stronger constraint and forces both momenta to be of the order of the total transverse energy, | k i | ∼ E T . In figure 1, this corresponds to forcing the momenta to lie on the orange line, and it does not allow any contributions from the large momentum region. In particular, the rapidity logarithms ln(ν B /| k i |) are now well approximated by ∼ ln(ν B /E T ). As we will see in the next subsection, this is precisely the reason why in the one-dimensional case the convolution structure of the RGE does not lead to a spurious singularity.

Rapidity evolution in Fourier space
To illustrate that the problems observed in the previous section are not an artifact of the momentum space approach, we now consider the resummation of the soft function in Fourier space. There, the rapidity RGE in eq. (3.14) becomes multiplicative, where b T is Fourier conjugate to p T and the tilde denotes the Fourier transformed quantities.
from which we recover the momentum space solution For arbitrary ν 0 , this is exactly equal to eqs. (3.19) and (3.20), and allows one to correctly shift logarithms from ν 0 to ν. However, to instead predict the soft function including its logarithms, one needs to choose ν 0 suitably. Since the final S( p T , µ, ν) result is known to contain logarithms ln(p T /ν), one might be tempted to set ν 0 = p T and use the fixed-order boundary conditioñ S( b T , µ, ν 0 ) = 1 + · · · . However, we know thatS contains logarithms ln(ν 0 b T ), and these are of course not eliminated by ν 0 ∼ p T . Since the Fourier integral runs over all b T , they can in principle become relevant at small and large b T , in particular wherever the simple scaling b T ∼ p −1 T is violated. (This is the same situation as discussed in eq. (2.8).) This is analogous to setting ν S in eq. (3.19) to the overall q T and using the pure fixed-order boundary condition for S( k T , µ, ν S ).
We will explicitly see in the next section that the b T → 0 region causes troubles if one were to setS( b T , µ, ν 0 ) = 1. In Fourier space, the problem is easily overcome by choosing ν 0 ∼ 1/b T , which would allow one to evaluate the boundary condition in a pure fixed-order expansion. However, this corresponds to an intrinsic scale setting and thus resummation in Fourier space. The correct momentum-space analog corresponds to the discussion in eq. (3.21) and will be derived in sec. 5.

Illustration: effects from energetic emissions
In the sec. 3.2 we argued that the two-dimensional convolutions of γ ν are intrinsically sensitive to large transverse momenta. As a result the formal solution eq. (3.20) does not allow one to correctly predict the all-order soft function because these energetic emissions are artificially enhanced by large rapidity logarithms. We will now explicitly demonstrate that this incorrect treatment is what causes a well-known spurious singularity in the evolution kernel.
We first calculate the rapidity evolution kernel V in eq. (3.20) using the fixed leadingorder expression for γ ν , which is given by Inserting this into eq. (3.20), the evolution kernel can be calculated either through Fourier transformation or by explicitly calculating L 0 ⊗ n [see eq. (C. 24)]. Both techniques yield where 28) and the plus distribution is defined as [see eq. (C.8)] The result in eq. (3.27) contains an explicit divergence at ω s = 1, which has been encountered before [14][15][16][17]24]. Using what would seem to be the canonical scale choices, µ = ν 0 = q T and ν = Q, the divergence at ω s = 1 occurs when For illustration, for gluon-induced processes like gg → H, where Γ cusp ∼ C A , this happens at q T ≈ 8 GeV for Q = 125 GeV and q T ≈ 27 GeV for Q = 1 TeV. For quark-induced process like Drell-Yan, where Γ cusp ∼ C F , it occurs at q T ≈ 2 GeV for Q = m Z and q T ≈ 4 GeV for Q = 1 TeV. Clearly, the all-order q T spectrum cannot contain such a singularity, especially since for large Q it happens at purely perturbative q T . Its appearance for small enough q T /Q respectively large enough ln(q T /Q) indicates that the above naive attempt simply does not properly treat the resummation of logarithms. As we have shown, eq. (3.27) can be derived working entirely in momentum space and without any reference to Fourier space, which means that it is not related to a possibly ill-defined inverse Fourier transformation.
To show that the origin of this divergence is indeed due to contributions from large transverse momenta in the convolution, as argued earlier, we regulate the LO anomalous dimension in eq. (3.26) by introducing an explicit cut off Λ in momentum space, Since γ ν ( p T , µ) corresponds to a single real emission with momentum p T , this is equivalent to cutting off emissions with large transverse momentum | p T | > Λ. Using eq. (3.31) in eq. (3.20), the rapidity evolution kernel becomes rather complicated and is most easily evaluated in Fourier space, where the regulated anomalous dimension eq. (3.31) reads is a generalized hypergeometric function. The rapidity evolution kernel then is This is perfectly finite for ω s = 1. It shows that by cutting off energetic emissions the divergence disappears, and confirms our observation that it is precisely the large momentum region that is being treated incorrectly. Taking the Λ → ∞ limit in eq. (3.33) is slightly nontrivial but reproduces eq. (3.27). In particular, from the expression in the last line of eq. (3.33) it is easy to see that for ω s = 1 the prefactor diverges for Λ → ∞, while it does not for ω s < 1.
Interestingly, by choosing Λ = p T , meaning that no single emission is allowed to be harder than the total p T , one obtains where the p T -dependence is exactly the same as in the unregulated solution eq. (3.27), while the remaining integral is a function of ω s that is finite for any ω s (except for ω s = 0, for which the whole expression has to reproduce the boundary condition V ( p T , µ, ν, ν) = (2π) 2 δ( p T )). This stresses again that emissions much harder than the actual final state p T cause the divergence.
To see more explicitly how the divergence in eq. (3.27) arises, we can focus on its δ( p T ) term. Using the expression for L 0 ⊗ n from eq. (C.24), we get where the ellipses denote the remaining distributions L n that built up the L ωs term in eq. (3.27). The (−1) n R (n) 2 is exactly the nth derivative of the ω s -dependent factor in eq. (3.27), e −2γ E ωs Γ(1 − ω s )/Γ(1 + ω s ), see eq. (C.17). The last sum in the first line hence precisely leads to the δ( p T ) piece of eq. (3.27), including its divergence at ω s = 1. In the last step we inserted the asymptotic behaviour R (n) 2 ∼ (−1) n n! [see eq. (C. 21)]. Each factor ω s corresponds to a single γ ν and hence a single real emission. Therefore, the n! growth of the contribution from n emissions, which leads to the divergence for ω s = 1, simply reflects the number of combinations of n individual emissions k i to yield an overall p T = i k i . Hence it is of a kinematic origin and intrinsic to the two-dimensional nature of p T .
To further illustrate this, we can show that this divergence does not appear for the one-dimensional case, where the convolution momentum is strictly limited to k ≤ k. The analogous one-dimensional problem (which would be relevant for E T ) is given by The corresponding evolution kernel is given by, where the ellipses in the second line denote plus distributions L n (k, µ), which we omitted for simplicity to focus on the δ(k) term. As before, each power of ω s corresponds to a single emission. The n-emission term comes with a coefficient (−1) nR (n) 1 , which is the nth derivative of e −γ E ωs /Γ(1 + ω s ), see eq. (B.16). This function is well defined for all ω s , and in particular its derivatives do not show the factorial growth that is present in the two-dimensional case R (n) 2 ∼ (−1) n n!. Finally, for completeness we show how the divergence arises from the Fourier space calculation. In this case, the evolution kernel is given by the inverse Fourier transform The Fourier transform of the LO expression for γ ν in eq. (3.26) is given bỹ with ω s as in eq. (3.28). The integral converges at large b T . At small b T the integrand behaves as where we approximated J 0 (b T p T ) ≈ 1 for b T → 0. Hence we find a logarithmic divergence from the small b T limit when ω s → 1. This is consistent with the previous calculation, since roughly b T ∼ p −1 T , and thus we can again conclude that energetic emission cause the divergent term. To identify the factorial growth, we expand the exponential in the integrand and perform the integration in the small b T region, where J 0 (b T p T ) ∼ 1, Hence, the small-b T region of the Fourier integral reproduces the n! growth responsible for the first pole in ω s = 1 in eq. (3.27). Note that expanding the Bessel function J 0 to higher orders similarly produces the other poles at ω s = 2, 3, · · · of eq. (3.27).
In conclusion, the two-dimensional nature of the convolutions appearing in transverse momentum distributions pose a significant complication to performing the associated renormalization group evolution and resummation. The main problem is that these convolutions can intrinsically probe emissions at arbitrarily large momentum via kinematic cancellations.
In principle, such kinematic cancellations among several hard emissions to produce a small value of q T is a physical effect which will be present in the full all-order result for the spectrum, However, in the above naive approach, the large-momentum emissions get artificially enhanced by fake rapidity logarithms. By explicitly cutting off such emissions, we have shown that they indeed produces unphysical contributions and are the origin of a spurious singularity in the rapidity evolution kernel. As discussed in sec. 3.2, the appearance of these energetic emissions requires a careful scale setting when performing the RG evolution for q T -spectrum in momentum space, which will be the focus of the remainder of this paper.

Resummation of the rapidity anomalous dimension
In this section, we carry out the resummation of the rapidity anomalous dimension γ ν ( p T , µ) in momentum space by solving its differential equation [see eq. (3.16)], using the techniques introduced in sec. 2. Since eq. (4.1) encodes the consistency (i.e. exact path independence) between the µ and ν evolutions [15,33], the solution of eq. (4.1) is an important ingredient in the full momentum-space resummation. In particular, as was discussed in sec. 3, since the two-dimensional convolutions are intrinsically sensitive to emissions at all momentum scales, one cannot naively use a fixed-order approximation for γ ν even when the rapidity RGE is performed at µ ∼ q T . We will also use the resummed result for γ ν ( p T , µ) as an example to study the differences to carrying out the resummation in Fourier space, the appearance of nonperturbative effects in the q T spectrum, as well as the implementation of a profile scale and how it allows to probe subleading logarithms.

Resummation of γ ν in closed form
We can solve eq. (4.1) distributionally by integrating it from arbitrary µ 0 to µ and then setting µ 0 = p T | + using eq. (2.74), Here we used that the boundary term can only depend on α s (p T ) and that by virtue of the cumulant must be proportional to θ(p T ), to write it as θ(p T )γ ν [α s (p T )]. Evaluating the derivative [see eq. (C.7)] yields The µ dependence is carried by the first distribution, which can easily be seen to fulfill eq. (4.1). The last two terms arise from the derivative acting on θ(p T )γ ν [α s (p T )] and are hence independent of any scale except p T . The ξ dependence exactly cancels between the two terms, but is necessary to introduce a plus distribution to regulate the 1/p 2 T divergence. The same result is also obtained by directly solving eq. (4.1) in cumulant space.
Let us briefly discuss the form of eq. (4.3). Recall that γ ν ( p T , µ) corresponds to a single real emission with transverse momentum p T . The factor 1/p 2 T inside the plus distributions corresponds to the propagator associated with such an emission. The associated infrared singularity at p T → 0 has to cancel against virtual corrections, which is encoded by the plus distribution that regulates the divergence. The effect of solving the RGE of γ ν is to evaluate the anomalous dimension at α s (p T ) rather than α s (µ). This is not very surprising, as we would expect the µ-RGE to resum virtual corrections to this single emission, which naturally pushes α s to be evaluated at the emission scale rather than the overall scale µ.
The boundary term γ ν [α s (ξ)] can be extracted by integrating eq. (4.3) up to p T ≤ µ, It corresponds precisely to the noncusp piece of the anomalous dimensions. We expand it as 5) and the constants γ ν n are the coefficients of δ( p T ) in the fixed-order calculation of γ ν ( p T , µ).

Iterative resummation of γ ν
It is instructive to see how the resummed result for γ ν ( p T , µ) arises order-by-order in perturbation theory by using a recurrence relation. To obtain the recurrence relation, we expand Differentiating this with respect to µ and plugging it back into eq. (4.1) gives the relation where Γ n and β n are the coefficients of the cusp anomalous dimensions and beta function, see eq. (A.11). Applying the integration rule in eq. (2.79) yields the solution (4.8) The first two terms follow from integrating the right hand side of eq. (4.7), leaving only a pure number times δ( p T ) as possible boundary term. This shows explicitly that the noncusp coefficients γ ν n determine the boundary condition, which can (and must) be determined in fixed-order perturbation theory, while all plus distributions arise from the µ-evolution. Note that the remaining integral has to be evaluated with distributional scale setting according to eq. (2.79). The first few terms following from eq. (4.8) are given by (4.9) Here we already used γ ν 0 = 0 to simplify the expressions. One can easily check that the same result is obtained by expanding the full resummed result in eq. (4.3).

Comparison to resummation in Fourier space
The transverse momentum resummation is typically performed in Fourier (impact parameter) space. The rapidity anomalous dimension in Fourier space satisfies the corresponding differential equation (4.10) Here,γ ν ( b T , µ) is the Fourier transform of γ ν ( p T , µ). It naturally depends on ln(b 2 T µ 2 /b 2 0 ) with b 0 = 2e −γ E , such that the resummation with canonical scale choice whereγ ν [α s (b 0 /b T )] is the boundary term. To compare eq. (4.11) to the momentum space solution in eq. (4.3), we need to take the inverse Fourier transform, which involves integratingγ ν ( b T , µ) over the nonperturbative region 1/b T Λ QCD . It is hence easier to compare at the level of the perturbative reexpansion of γ ν . A recurrence relation similar to eq. (4.8) is easily derived in Fourier space, Calculating the first few terms and transforming them to momentum space, we obtain The first two terms agree with eq. (4.9), and hence to this order we find identical noncusp constants,γ ν 1 = γ ν 1 . The differences compared to eq. (4.9) start at O(α 3 s ), where the b-space resummation induces additional terms. This means that the 3-loop and 4-loop noncusp constants are related by (4.14) In addition, the L 0 term in γ ν differs by a contribution 8ζ 3 β 3 0 Γ 0 , which is induced by the different boundary termγ ν 2 , which feeds into the logarithmic distribution at higher orders.
Similar additional terms are induced at each higher order. The reason is that pure b-space logarithms ln n (b 2 µ 2 /b 2 0 ) do not correspond to pure plus distributions L n−1 ( p T , µ), but also induce δ( p T ) terms when Fourier transformed back to momentum space. The implications of this were discussed already in secs. 2.5 and 2.7. If γ ν is calculated in full fixed order to α n s , the boundary terms can be extracted up to this order, which also takes into account the differences between γ ν n andγ ν n up to this order. However, for the unknown higher order terms one would setγ ν n = 0 in the b-space resummation (i.e. by setting the boundary condition in b-space) and γ ν n = 0 in the momentum-space resummation (i.e. by setting the boundary condition in momentum space). This means if one were to use the b-space resummation to obtain the momentum-space result this would induce additional boundary terms which then lead to additional subleading terms to all orders compared to the momentum-space resummation, and vice versa.
Even though the inverse Fourier transformation over the full b-space result is not possible, we can express the differences in closed form by relating eqs. (4.3) and (4.11) through Here we used thatγ ν only depends on the magnitude of b T ,γ ν ( b T , µ) ≡γ ν (| b T |, µ). The relation is derived by relating the pure δ( p T ) pieces via eq. (4.4) to each other. It holds in the sense that the fixed-order expansions of both sides match to all orders. The ∆γ ν contains the differences in the boundary condition and the second term in eq. (4.15) explicitly shows the subleading logarithmic terms contained in the inverse Fourier transform of the b-space solution.

Turning off resummation using profiles
Having obtained the resummed rapidity anomalous dimension, we can use it as an illustrative example for the implementation of a profile scale to smoothly transition between resummation and fixed-order regimes, as discussed in sec. 2.6. Starting from the formal solution the aim is to choose a profile scale µ 0 (p T ) that smoothly transitions from the canonical scale choice µ 0 = p T | + in the resummation regime to the fixed scale µ 0 = µ that turns off the resummation. Note that the profile function can also depend on further variables such as Q or the final q T . Since these dependences do not require distributional scale setting, we suppress them. As discussed in sec. 2.6, the p T -dependence of µ 0 (p T ) requires distributional scale setting using eq. (2.74) generalized to a function µ 0 (p T ), (4.18) Here the superscript "FO" makes explicit that the boundary term is obtained from a fixed-order calculation. Focusing for the moment on the second term, which predicts the logarithmic terms at higher orders in α s , the derivative can be evaluated [see eq. (C.7)], This has to be compared to the corresponding term in the canonical solution in eq. (4.3), This result is recovered using the canonical scale µ 0 (p T ) = p T , for which the integral in eq. (4.19) vanishes. On the other hand, the resummation is turned off by choosing µ 0 (p T ) = µ, for which both terms in eq. (4.19) vanish and only the fixed-order term in eq. (4.18) survives. In between, the shape of the derivative d ln µ 0 (p T ) d ln p T = 1 entering in the plus distribution controls how the plus distribution, which is entirely generated by the resummation, is being turned off until it vanishes in the fixed-order regime.
Another purpose of the profile scale is to probe the size of higher-order terms through scale variations. To show concretely how this works, consider choosing µ 0 (p T ) = 2p T , for which eq. (4.19) becomes We observe two effects. On the one hand, the plus distribution now has a different shape in p T due to Γ cusp [α s (2p T )]. This is reasonable, as all higher-order logarithms in γ ν are resummed into the scale entering the cusp anomalous dimension Γ cusp , so we expect a variation there. On the other hand, the δ( p T ) term does not vanish anymore, and hence probes the constant fixed-order boundary terms.
In conclusion, the distributional scale setting allows one to use profile scales as usual, allowing to smoothly connect the resummation and fixed-order regimes as well as to implement profile scale variations.
Example at lowest order As an illustrative example, we consider resumming γ ν at lowest order (i.e. LL including β 0 and Γ 0 ) with a profile scale in conjunction with the full O(α s ) boundary term, (This combination corresponds to a partial NLL result and would typically not arise in practice, but it is useful for illustration.) From eq. (4.18), we obtain The first term in each of the curly brackets is from the fixed-order boundary condition and the second term is from the LL µ evolution. Expanding the result in α s (µ), we obtain ν is exactly reproduced independently of the choice of µ 0 (p T ). The two-loop term has to be compared to the full fixed-order result from eq. (4.9), With the fixed-order scale µ 0 (p T ) = µ, the two-loop term γ (N)LL(0) ν completely vanishes, leaving only the fixed-order term γ (0) ν . With the canonical resummation scale µ 0 (p T ) = p T , the first term in γ (N)LL(0) ν exactly reproduces the leading-logarithmic two-loop term ∼ β 0 Γ 0 L 1 , while the other terms vanish. Varying µ 0 (p T ) induces higher-order terms with precisely the structure of the formally higher-order Γ 1 and γ ν 1 terms. Furthermore, the p T -dependence of the variation allows one to distinguish and separately probe the size of the higher-order logarithmic and higher fixed-order boundary pieces.

Nonperturbative modeling with the moment expansion
It is well known from the resummation in Fourier space that the rapidity evolution kernel becomes intrinsically nonperturbative at 1/b Λ QCD [38][39][40]. In momentum space this corresponds to the fact that the resummed result for γ ν ( p T , µ) in eq. (4.3) explicitly depends on α s (p T ), which means that it becomes nonperturbative for p T Λ QCD . This nonperturbative region is necessarily probed in the rapidity RGE of the soft function, see e.g. eqs. (3.14) and (3.20), since γ ν ( k T , µ) is integrated over all k T .
Although we will show in sec. 5.3 that the final result for the momentum convolutions is actually perturbative up to power corrections O(Λ 2 QCD /p 2 T ), it is still important to properly handle and isolate the nonperturbative contributions to γ ν . To do so, we can write the true all-order result for γ ν as Here, we have replaced the fixed canonical scale µ 0 = p T | + in the resummed result in eq. (4.3) by a cutoff function µ 0 (p T ), which is equal to µ 0 (p T ) = p T for p T Λ QCD but cuts off and approaches some constant (perturbative) value for p T Λ QCD . The difference to the true result is absorbed into the nonperturbative contribution γ (np) ν ( p T ). Since the resummed perturbative result is now only ever evaluated at perturbative scales, all nonperturbative contributions must be contained in γ (np) ν ( p T ). The precise choice of µ 0 (p T ) here corresponds to a perturbative scheme dependence in the definition of γ (np) ν ( p T ), which cancels on the right-hand sice in eq. (4.26) (and which we suppress in our notation). At the same time, γ (np) ν ( p T ) is µ independent because the (perturbative) µ dependence is by definition carried by the resummed perturbative contributions.
Letting µ 0 (p T ) = p T in the perturbative regime, γ (np) ν ( p T ) will have support of order Λ QCD . Hence we can expand it in its moments, where ∆ p T = ∂ 2 /∂ 2 p x +∂ 2 /∂ 2 p y is the Laplace operator. Since γ ν is known to be azimuthaly symmetric, no other operators, such as e.g. linear operators, may arise. The associated Ω n coefficients are nonperturbative parameters, which have to be extracted from data. Like γ (np) ν ( p T ) itself, they are µ independent but scheme dependent. This is reminiscent of the moment expansions to treat the nonperturbative corrections in one-dimensional soft functions [27,76].
The moment expansion eq. (4.27) becomes more intuitive upon Fourier transformation. In impact parameter space, it turns into a simple polynomial in −b 2 T , The analysis of ref. [77] found that the perturbative result forγ ν ( b T , µ) has a leading renormalon contribution which scales as b 2 T . This is consistent with the fact that the first nonperturbative moment Ω 1 scales with ∆ p T or b 2 T , and the renormalon in the perturbative series should be precisely cancelled by a corresponding renormalon in Ω 1 . We expect that the above cutoff definition can be used to provide a renormalon-free scheme definition for Ω 1 in momentum space and presumably γ (np) ν ( p T ) as a whole. It would be interesting to investigate this in more detail in the future.
To compare the above moment expansion to the literature, consider the formal solution of the rapidity RGE in eq. (3.14) in impact parameter space, Including both the perturbative and nonperturbative contributions toγ ν , this yields is the Fourier transform of the resummed perturbative contribution. Hence, the momentum-space resummation reproduces and confirms the often-used procedure in the literature to model nonperturbative effects using a Gaussian factor in impact parameter space, which is also motivated by renormalon analyses [77,78]. In practice, a variety of nonperturbative models have been suggested, see e.g. refs. [79,80] for recent studies. The above also confirms that the nonperturbative correction scales with a rapidity logarithm [39,81].

Resummation of soft and beam functions
In the previous section we have solved the RGE for the rapidity anomalous dimension γ ν , which already illustrated some of the key features of the distributional scale setting in a nontrivial setting. In this section, we will solve the RGEs for both beam and soft function and derive the correct momentum-space evolution to predict their complete distributional logarithmic structure. The implications on the full q T spectrum are discussed in sec. 6.

Soft function
The soft function obeys the coupled system of renormalization group equations, eqs. (3.8) and (3.14), where the ν-RGE involves a two-dimensional convolution, which is the main source of complications, as discussed in sec. 3.2.

Iterative solution
We first derive an iterative solution order by order in α s . This requires the expansions where the γ (n) ν were derived in sec. 4.2. The two RGEs then become These equations allow to determine the α n s coefficient S (n) from the lower order terms S (m) with m < n. Solving first the ν-RGE in eq. (5.7), we obtain the formal solution The missing boundary term S (n) ( p T , µ, ν 0 ) is deduced from the µ-RGE in eq. (5.6), The remaining boundary term S (n) ( p T , µ 0 , ν 0 ) can only contain logarithms ln(µ 0 /p T ) or ln(ν 0 /p T ) or any combination of these. In particular, these logarithms can be hidden inside the boundary condition of plus distributions L n ( p T , µ), L n ( p T , ν). To eliminate them, we apply eq. (2.74) to set µ 0 = ν 0 = p T | + . The remaining pure fixed-order boundary term must then be proportional to δ( p T ), and we obtain In the µ-integral, both distributional scale settings should be performed in one step, µ 0 = ν 0 = p T | + . By construction, this solution fulfills both RGEs and fully resums all logarithmic distributions. We have explicitly verified this solution by predicting the structure of the soft function through O(α 6 s ) and comparing with the results obtained from impact parameter space, where the distributions become simple functions and hence the complication of distributional scale setting does not arise. As for γ ν , the constant pieces differ in general, i.e. S n =S n , thereby also inducing different logarithms at higher orders. Upon making the appropriate identifications betweenS n and S n , exact agreement between both iterative solutions is obtained. This shows that the distributional scale setting is well-defined for the soft function, even though it involves two distinct scales. For illustration, the fixed-order expansion of the soft function is given through O(α 3 s ) in appendix D, including the relation between the boundary terms S n andS n .
For completeness, we also give the iterative solution when first solving the µ-RGE and then the ν-RGE: Equations (5.10) and (5.11) are useful to illustrate again the correct scale setting for distributions. Firstly, the lower order terms S (m) feeding into S (n) are themselves distributions with boundaries µ or ν. Hence integrating over µ and ν with starting scale p T requires distributional scale setting, which we apply according to eq. (2.74). Secondly, we see that it is crucial to set ν 0 = p T | + at each order rather than keeping ν 0 arbitrary and setting ν 0 = p T only at the very end. The reason lies in the convolution term in eqs. (5.10) and (5.11), (γ (n−m−1) ν ⊗ S (m) )( p T , µ, ν ): The convolution depends on whether the integrand contains a formal scale ν 0 or whether it is set to the convolution variable k T . In contrast, multiplicative RGEs are not sensitive to this, such that the scale can be set at the very end.

Solution in closed form
We now proceed to derive a closed form of the solution. We start by solving the µ-RGE in eqs. (3.8) and (3.11) at ν 0 = p T | + . The solution is straightforwardly obtained using the technique of sec. 2.3. The formal solution only requires to set µ 0 = ν 0 = p T | + , yielding 5 . 5 To allow scale variations, one could of course choose profile functions µ0 = µ0(pT )|+, ν0 = ν0(pT )|+.
The plus distribution contains the exponentiated Sudakov double logarithm at ν 0 = p T | + as dictated by the RGE. The pure boundary term, which can only depend on α s , is defined as and has the perturbative expansion The constants S n can be obtained as coefficients of δ( p T ) in the fixed-order calculation. More generally, the soft function at the canonical distributional scales is given in terms of this boundary term by where the ξ dependence exactly cancels.
In the next step, the ν-RGE in eq. (3.14) has to be solved, which is more complicated. Inspired by the iterative solution, we expand the soft function as where the S [n] correspond to the soft function including n contributions from γ ν . More intuitively, S [n] is the piece of the soft function originating from n real emissions. Since γ ν describes a single real emission, the rapidity RGE becomes Hence the S [n] are determined through the recursive RGE The solution at order n is where it is now quite intuitive that ν 0 should be set to p T | + and not to the overall q T . Iterating this, the only leftover boundary term is precisely S( p T , µ, p T | + ) = n S [n] ( p T , µ, p T | + ), and we can write the formal all-order solution as where in the last line k 0 ≡ p T and ν 0 ≡ ν. All ν i integrals have to be understood according to eq. (2.79).
From the explicit form of the first few terms, it is easy to see that eq. (5.21) fulfills the rapity RGE and that by choosing the overall ν = p T | + reproduces the (µ-evolved) boundary condition S( p T , µ, p T | + ). Note that although both γ ν and S will be probed at nonperturbative α s in the convolutions of eq. (5.21), the result is actually perturbative up to corrections O(Λ 2 QCD /p 2 T ), as will be discussed in sec. 5.3. For practical purposes it is nevertheless necessary to use a nonperturbative modelling, as already discussed for γ ν in sec. 4.5. To allow scale variations, one would replace all k i | + by a profile functions ν 0 (k i )| + .
Together, eqs. (5.13) and (5.21) completely predict the logarithmic structure of the soft function to all orders. Crucially, the canonical-scale boundary condition for the solution is S( p T , p T | + , p T | + ), which can be reliably calculated in a pure fixed-order calculation. The iterative structure of the rapidity evolution eq. (5.21) ensures that the correct rapidity logarithms ln(ν/| k i |) evolving each emission to the overall rapidity scale are resummed.
Once the soft function has been evolved to some scale ν S using this method, it can be evolved further to a different overall scale ν using the simple kernel of eq. (3.20), where we introduce the notation V S for the evolution kernel when used in this way for continuing the evolution. This simple structure immediately emerges from eq. (5.21) when replacing all ν 0 = k i | + by a common scale ν S , such that all ν-integrals factor out of the convolutions. This shows that eq. (5.22) is indeed sufficient to continue the ν evolution in eq. (5.21) from the overall scale ν S to a new overall scale ν.

Comparison to "naive" scale setting
Having obtained a solution correctly minimizing the boundary term distributionally, we can turn back to the discussion of sec. 3.2. There we had argued that the naive solution, eqs. (3.19) and (3.20), leads to wrong predictions, because energetic emissions are artificially enhanced by unphysical logarithms ln(Q/q T ). Instead, each convolution should be dressed with its proper rapidity logarithm ln(Q/| k i |), which is precisely what is happening in eq. (5.21). To see in more detail the different treatment of these rapidity logarithms, we now compare the iterative solution to the naive evolution kernel eq. (3.20). To do this, we simply assume the trivial boundary condition S( k T , µ, k T | + ) = δ( k T ). This of course means that the following result is not the properly resummed soft function, but it allows to disentangle effects from rapidity and µ evolution, since the latter are now neglected in the boundary term. Furthermore, we neglect running of α s for simplicity to keep the following formula as compact as possible, and work only to LL accuracy. In this toy model, the rapidity anomalous dimension is given by where at LL we can ignore all constant pieces. To keep track of only the first two emissions, we evaluate the resummed toy soft function only up to O(Γ 2 cusp ). We then obtain from eq. (5.21) For comparison, we calculate the soft function using the naive resummation eqs. (3.19) and (3.20), where the starting scale ν 0 is kept arbitrary. Similarly to above, we set the boundary term to S( p T , µ, ν 0 ) = δ( p T ), and evaluate the resummed soft function only at LL without α s running. We find Comparing the two results eqs. (5.24) and (5.25), we observe a very different logarithmic structure. In particular, in the solution eq. (5.24) derived from the known exact solution, all rapidity logs are necessarily sensitive to the soft function p T . This becomes crucial when the soft function is inserted into further convolutions, for example to predict the higher order terms of S (toy) itself.
The main difference to the soft function is that the canonical rapidity scale is ν B = ω rather than ν B = p T | + , and hence does not require distributional scale setting. The solution of the µ-RGE at the canonical ν-scale is The boundary term is defined as and is expanded as It is more complicated than for the soft function, because the beam functions are further matched onto PDFs by an operator product expansion (see ref. [62] for more details), Here B a is the beam function for flavor a and f i is the PDF for flavor i. (We ignore for the moment that gluon beam functions furthermore have a helicity structure.) While applying eq. (5.29) to eq. (5.31) eliminates all distributions in p T in the I ai matching kernels, the boundary term still involves a convolution in z with the PDF. The rapidity evolution kernel is obtained from eq. (5.21) by replacing all ν 0 = k i | + by ν 0 = ν B ∼ ω. Since this scale is independent of p T , the ν-integrals can be pulled out as where the −1/2 arises from γ ν,B = −γ ν /2. Hence, for the beam function we recover the simple exponential in convolution space with the fixed-order boundary condition B(ω, p T , µ, ν B ). This allows for a simple scale variation by choosing ν B to be a suitable profile.

Perturbativity of convolutions
The solution eq. (5.21) of the rapidity RGE contains multiple convolutions of the form (γ ν ⊗ S)( p T ) or (γ ν ⊗B)( p T ). These convolutions naturally probe α s (| k T |) in the nonperturbative regime | k T | Λ QCD . Fortunately, nonperturbative effects turn out to be suppressed by O(Λ 2 QCD /p 2 T ), which ultimately means that the q T spectrum is also perturbative up to corrections O(Λ 2 QCD /q 2 T ), as one would naively expect. To show this, we consider the convolution (f 1 ⊗ f 2 )( p T , µ) of two generic distributions which only depend on the magnitude | k T |, which is the typical case we are interested in. In general, the distributions can depend on further parameters which we suppress, as they do not affect the following calculation. We also do not include δ( k T ) terms, as they trivially factor out of the convolution f 1 ⊗ f 2 . Assuming perturbative | p T |, µ Λ QCD , we introduce an auxiliary parameter Λ fulfilling Λ QCD Λ p T to split the integration range of the convolution into three pieces (see fig. 2), two discs around 0 and p T of radius Λ, and the rest of the plane. The convolution integral then splits into three pieces, where B Λ ( p T ) denotes the ball of radius Λ around the origin p T . In the first integral, the range is restricted to be close the origin, and hence the plus prescription of f 1 can be dropped, and vice versa for the second integral. The third integral is separated from both poles by Λ Λ QCD such that both distributions can be dropped. Furthermore, it also avoids both Landau poles at | k T | = Λ QCD and | p T − k T | = Λ QCD , and hence is completely perturbative. This isolates all the nonperturbative contributions into the first two integrals.
We now focus on the first integral. Since by construction | k T | ≤ Λ | p T | , we can Assuming that the functions only depend on the squared magnitudes, we find

(5.36)
Here we used that the plus distribution by definition removes all contributions for | k T | ≤ µ and therefore its integral is only sensitive to the upper integration limit Λ. Hence, we find that the leading nonperturbative effect is suppressed as O(Λ 2 / p 2 T ). The same approximation can be applied to the second integral in eq. (5.35), such that eq. (5.35) becomes In conclusion, we find that convolutions (f 1 ⊗ f 2 )( p T , µ) are well behaved with nonperturbative contributions suppressed as O(Λ 2 / p 2 T ), as long as both | p T | and µ themselves are perturbative, | p T |, µ Λ QCD .

Illustration:
The above result can be illustrated by an explicit example, which also shows a different strategy to evaluate such convolutions. We consider the convolution which is the prototype of nonperturbative convolutions appearing in the q T spectrum. Even without an explicit calculation, its generic form is known from its µ-dependence. From it follows that Integrating this according to the prescription eq. (2.79), we find The plus distribution is perturbative as long as both | p T |, µ Λ QCD , which is precisely the regime we have considered in the above general derivation. The structure of the boundary term is not known, but its most generic structure to be µ 0 -independent is where the dependence on the arbitrary parameter ξ cancels between the two terms. Since (f ⊗f ) scales as 1/ p 2 T , F must be a scalar function. It can hence only depend on Λ QCD /| p T |, as no other scales can be combined into a scaleless number. To be any physically reasonable function, F should vanish (or at most become constant) for Λ QCD → 0. Furthermore we expect it to only depend on the magnitude p 2 T , and hence the boundary term should scale as O(Λ 2 QCD / p 2 T ), just as expected from the general calculation.

The resummed transverse-momentum spectrum
We now discuss in more detail the final result for the resummed transverse momentum spectrum eq. (3.1), where ω a,b = Q ±Y . All distributional logarithms in the q T spectrum are fully resummed in momentum space by using the RG-evolved hard, beam, and soft functions in eq. (6.1).
To simplify the complicated structure of the ν-evolution, we use eq. (5.32) to evolve the beam functions B a,b from their natural rapidity scales ν a,b ∼ ω a,b to ν. Since both kernels are simple exponentials in convolution space, they can be combined into a single evolution kernel, which due to 2γ ν,B = −γ ν,S = −γ ν precisely yields the kernel used to shift the soft function, eq. (5.22), (ω a , k a , µ, ν a ) where ν B = √ ν 1 ν 2 . Since beam functions inside the final convolution are evaluated at their natural scales ν a,b ∼ ω a,b , they are free of large rapidity logarithms.
In addition to soft and beam evolution, we also need the RG evolution of the hard function, which is straightforwardly obtained from eq. (3.6), where µ H ∼ Q ensures that H(Q, µ H ) is free of large logarithms. To assemble the cross section eq. (6.1), we set the arbitrary scale µ to the total transverse momentum q T . As usual, this requires distributional scale setting, µ = q T | + ≡ µ T | + . 6 We keep the symbolic notation µ T rather than q T to make the origin of all factors of q T and µ T in the final formula clear. We then find The first line contains the µ T -independent pieces and the cumulant integral from the distributional scale setting µ = µ T | + . The second line makes explicit that the µ-evolution resums logarithms ln(µ T /µ H ) ∼ ln(q T /Q). The third line contains the ν-evolution kernel, eq. (5.21), to evolve the soft function to ν 0 ≡ ν B = √ ν a ν b . The last line contains beam and soft functions evaluated at their natural ν-scales, and hence all rapidity logarithms in eq. (6.4) are fully resummed. Note that eq. (6.4) could be the starting point for a numerical evaluation. Although the infinite number of convolutions cannot be calculated in closed form, one could for example evaluate the result iteratively and truncate the sum in eq. (6.4) once a desired numerical accuracy is reached.
One of the original motivations to carry out the momentum-space resummation (see refs. [14,21,82]) was to avoid the intrinsic nonperturbative sensitivity at large q T Λ QCD arising in the Fourier-space resummation (see sec. 7.1.1 for more details). From eq. (6.4) it is clear that the analogous sensitivity is still present in the momentum-space resummation, namely the cross section is intrinsically sensitive to the nonperturbative contributions from the rapidity anomalous dimension, since the convolutions probe | k i | Λ QCD . Fortunately, in either case these effects turn out to be suppressed by Λ 2 QCD /q 2 T , as shown in sec. 5.3. The last line of eq. (6.4) contains the µ-evolved beam and soft functions given by eqs. (5.13) and (5.28). To investigate its structure in more detail, we omit the pure δ-terms in eqs. (5.13) and (5.28), which trivially factor out of the convolution, and for simplicity write everything into a single plus distribution. This gives The first line of the plus distribution contains the pure fixed-order boundary terms, which are free of any logarithmic distributions. The exponential explicitly contains µ-evolutions from the convolution momenta to the scale µ T . These are required to properly satisfy the exact path independence of the µ-and ν-evolution. This can be checked explicitly by verifying the µ T -independence of eq. (6.4). For practical purposes, it might be sufficient to use the fixed-order expansion for eq. (6.5), but since this would deviate from the strict resummation order, it would have to be verified numerically.

Illustration at LL
To illustrate eq. (6.4), we evaluate it to leading-logarithmic (LL) order, which is strictly defined by keeping the LO boundary terms for hard, beam, and soft function, keeping Γ 0 and β 0 in the anomalous dimensions, and dropping all noncusp anomalous dimensions. In this limit, the µ-evolved beam function boundary term becomes where the µ-evolution drops out because γ B (µ, ν = ω) vanishes at LL. We can also drop the PDF evolution, since it is a subleading logarithmic effect (i.e. the PDF anomalous dimensions are counted as noncusp contributions). The µ-evolved soft function boundary term at LL is (6.7) With strict canonical scales, ν B = Q and µ T = q T , eq. (6.4) is then given by In this form, it is somewhat reminiscent of the form suggested by Dokshitzer, Dyakonov, and Troyan for the LL cross section in refs. [83,84], (6.9) However, comparing this to eq. (6.8), e S would be given by the full cumulant integral of eq. (6.8), which has an exponential structure but is by no means a simple exponential. Only if one were to neglect all convolutions, i.e. keep only the first δ( p T − k s ) term in the rapidity evolution in the second line of eq. (6.8), one would find the expected Sudakov factor S = q T Q dµ µ γ H (Q, µ ). In a full LL resummation, the rapidity evolution forbids such a simple relation, and as a consequence the simple DDT form eq. (6.9) cannot hold for the q T spectrum. (At higher orders, the effect of PDF running neglected in eq. (6.6) would similarly spoil this.) In fact, the DDT formula was never derived as LL solution of a factorization theorem, but from a summation of ladder diagrams relevant at LL [83].
An interesting feature of eq. (6.8) is that after carrying out all convolutions, the result can always be written in the form δ( p T ) + c n L n ( p T , µ T ) + d m L m ( p T , ν B ). With strict canonical scale setting, µ T = q T | + , the first sum vanishes due to L n ( q T , q T | + ) = 0. In contrast, the second sum is converted into L m ( p T , ν B ) → L m ( q T , ν B ). Since ν B ∼ Q, this precisely yields the rapidity logarithms ln(Q/q T ), which however do not have a simple exponential structure any more. For noncanonical scales, the L n ( p T , µ T ) would not exactly vanish, but yield small corrections that probe the all-order logarithmic structure.
As an illustrative example, we carry out the first few convolutions, but ignore α srunning for simplicity. This is analogous to the simple study in sec. 5.1.3. We find . (6.10) We have not evaluated the derivative, as in this form the terms in square bracket are exactly the result of the rapidity evolution. It clearly induces apparent subleading terms in the cross section, which however are part of the strict LL order that is based on the expansion of the anomalous dimensions. For comparison, the Fourier-resummed spectrum (see sec. 5.1.3 for more details) with its canonical scale choices ν S = µ = b 0 /b T , neglecting again α s -running, is given by As before, we neglect running in the PDFs for simplicity to evaluate them at the random scale µ. The Sudakov double logarithm is clearly a well-behaved function that drops fast for both b T → 0 and b T → ∞, and hence the inverse Fourier integral must be fine as well.
To compare this expression to eq. (6.10), we split ln(Qb T /b 0 ) = ln(Q/µ) + ln(µb T /b 0 ), with µ arbitrary. Expanding the integrand in Γ cusp , we only need to Fourier-transform powers of logarithms ln n (µb T /b 0 ), which is well known (see appendix C.2). Setting µ = q T | + gives (6.12) The result agrees with eq. (6.10) up to the constant terms −10/3 ζ 2 3 Γ 3 cusp and 28ζ 3 ζ 5 Γ 4 cusp . These nonlogarithmic terms are examples of different boundary terms induced by using the Fourier-space boundary conditions. The important observation is that the logarithmic structure exactly matches in both attempts.
Lastly, we also compare this to the result from using the naive rapidity evolution of the soft function. In this case, eq. (6.1) is evaluated at µ = µ T ∼ q T , and the soft function is taken from eqs. (3.19) and (3.20), Assuming the LO boundary conditions for the hard, beam, and soft functions, we obtain (6.14) Setting µ T = ν S = q T | + , we get The obtained highest logarithmic term Γ 3 cusp ln 3 (Q 2 /q 2 T ) matches the one in eqs. (6.10) and (6.12), but all other terms are missing. The second line of eq. (6.15) is nothing but the familiar factor e −2γ E ωs Γ(1 − ω s )/Γ(1 + ω s ) from eq. (3.27), which diverges at ω s = 2Γ cusp ln(Q/q T ) = 1. The fact that this term also appears in the correctly resummed results in eqs. (6.10) and (6.12) is quite surprising as one might expect it to get modified in order to alleviate the spurious divergence. On the other hand, since the naive rapidity evolution kernel eq. (3.27) does correctly shift the rapidity logarithms, it must in fact also appear in eqs. (6.10) and (6.12), and so it seems that both the p T -space and b Tspace resummation still contain this divergence. However, we know that eq. (6.11) is well behaved, and hence all the additional logarithmic terms in eqs. (6.10) and (6.12) that are not present in eq. (6.15) must conspire to cancel the divergence in the spectrum. Hence, we find the peculiar feature that apparent-NNLL and higher terms cancel the divergence caused by the apparent-NLL terms in the strict LL spectrum. A detailed numerical study of this effect would be an interesting extension of this work.
This simple exercise clearly shows that counting logarithms ln(Q/q T ) in the q T spectrum is an intrinsically ill-defined notion. Instead, the resummation order should be defined strictly and unambiguously through the perturbative order of anomalous dimensions and pure boundary terms entering all RGEs. This is done in both our momentum-space and the Fourier-space resummation, which both lead to well-behaved results.

Comparison to the literature
In this section, we discuss the relation of our results to the Fourier-space resummation in the original CSS formulation and various existing implementations (sec. 7.1), as well as other approaches for a direct momentum-space resummation (secs. 7.2 and 7.3).

Comparison to CSS formalism
The equivalence of the Fourier-space resummation in the original CSS formulation and performing the RG evolution in Fourier space in the context of SCET has been discussed several times before [16,85] (see e.g. refs. [30,[86][87][88] for one-dimensional cases like thrust or threshold resummation). It is not unexpected that the two formulations are equivalent, since both are based on the same underlying factorization in the soft-collinear approximation.
Interestingly, in their first paper [38] in the context of back-to-back jets in e + e −collisions, Collins and Soper formulated an evolution equation in transverse momentum space that precisely corresponds to the rapidity RGE in our framework; compare eq. (6.4) in ref. [38] with eq. (3.14), where the rapidity scale ν essentially corresponds to the scale ζ in the CSS formulation. However, due to the much simpler formulation in Fourier space, as first noted by Parisi and Petronzio [74], they carried out their analysis in Fourier space. Studying in detail the properties of the inverse transform, with particular focus on the effect of the (nonperturbative) large b-region [39], then paved the road to resumming the q T -spectrum in hadronic collisions [40].
The CSS formula of ref. [40] can be easily related to the SCET formalism by Fourier transforming the beam and soft functions and their RGEs given in sec. 3.1. For example, for the soft function we have and most importantly the rapidity RGE becomes multiplicative. Since logarithms in impact parameter space always depend on µb T /b 0 or νb T /b 0 , all logarithms are minimized with boundary scale choice µ 0 = ν 0 = b 0 /b T . With this choice, the above RGEs are solved bỹ Note that by first solving the ν-RGE at the low scale and then the µ-RGE,γ ν enters evaluated at its natural scale µ 0 = b 0 /b T for which it reduces to the pure fixed-order boundaryγ ν [α s (b 0 /b T )]. In this way, the resummation ofγ ν becomes trivial in the Fourier space evolution. The beam functions are similarly resummed with boundary scales and can furthermore be matched onto PDFs using an operator product expansion, can be further split into a process-dependent hard virtual and a process-independent softcollinear piece, as was noticed also in refs. [46,50]. The evolution kernels are related by The important observation that A(α s ) receives a contribution from the rapidity anomalous dimension in addition to the cusp anomalous dimension was first made in ref. [16].
To compare to our canonical resummation in momentum space, first note that setting ν 0 = b 0 /b T inside the Fourier integral is the analog of choosing the scale ν 0 = k T | + inside the convolutions building up the ν-evolution kernel. Similarly, choosing µ 0 = b 0 /b T for both beam and soft function inside the Fourier integral corresponds to choosing µ 0 = p T | + when solving the µ-RGE for S( p T ) and B( p T ), which then accordingly enters the convolutions B ⊗ B ⊗ S. In practice, the Fourier-space resummation is of course more easily implemented because the measurement constraint q T = i k i is translated into a common impact parameter b T for all convolved functions, and hence the iteratively defined convolution exponential in eq. (5.21) turns into a standard exponential function in Fourier space. As discussed on general grounds in sec. 2.5 and in detail for γ ν in sec. 4.3, the formal difference between the different spaces arises from the fact that one uses different boundary conditions which induce different subleading terms to all orders in α s . Hence with a robust estimate of theoretical uncertainties at any given order, one would expect that both techniques yield results compatible within their uncertainties, but a direct comparison of both results would provide another interesting way to assess theory uncertainties, in particular of nonperturbative effects.

Practical implementations
In the following we briefly comment on several implementations (without claiming to be exhaustive), which perform the resummation fully or partially in b space. As reference to compare to we take either the canonical b-space or the canonical momentum-space evolution we have derived, as both techniques reproduce all logarithms in the fixed-order reexpansion in their respective space. We only focus on the effects of deviating from these canonical scale choices and the strict resummation order. A detailed discussion regarding the phenomenologically important aspects of matching to the full fixed-order and assessing theory uncertainties can be found e.g. in ref. [85].
The original CSS formula is the basis of refs. [42][43][44][45] as well as refs. [46][47][48][49][50][51][52][53][54][55]. For the latter, the canonical Fourier-space logarithms are replaced by The benefit of shifting the argument of the logarithm is that it suppresses the region p T ∼ 1/b T Q. This should effectively suppress the contributions from energetic emissions, and it would be interesting to study its effect on the small q T region compared to the strict canonical resummation. Furthermore it ensures that integrating over the q T -spectrum restores the inclusive cross section. The Landau pole intrinsic to the calculation is treated using the minimal prescription [89,90] by deforming the integration contour around the Landau pole, which is valid as long as q T is sufficiently perturbative.
The first SCET-based calculation 7 was carried out in refs. [16][17][18] in the so-called collinear anomaly framework using an analytic regulator [64][65][66]. The resummation of rapidity logarithms there corresponds to performing the ν-evolution in Fourier space with fixed canonical scale choice ν 0 = b 0 /b T . However, the low scale µ 0 is set to q T , and the rapidity anomalous dimension γ ν (corresponding to the anomaly coefficient F in their framework) is not resummed but expanded to a certain fixed order (referred to as -expansion). This looses some of the strict resummation accuracy, and in light of our discussion, one might expect that energetic emissions are incorrectly treated, which could affect the region of very small q T . The naive divergence in the cross section (see sec. 3.4) is further avoided by choosing µ 0 = q T + q * . The offset q * ≈ 2 GeV for Drell-Yan and q * ≈ 8 GeV for Higgs production is chosen large enough to explicitly avoid the divergence. However, it does so by essentially turning off the resummation below q T q * , which then also drops the sensitivity to nonperturbative effects in the rapidity evolution kernel.
In refs. [56][57][58], a factorization theorem has been derived using the δ-regulator [56,67], and has been applied e.g. in refs. [19,20]. Here, the rapidity evolution is also performed in Fourier space, while the µ-RGEs are solved in momentum space, but in contrast to the above the rapidity anomalous dimension (in their notation the D function) is fully resummed. The low scale µ 0 is chosen as µ 0 = q T + Q 0 , where Q 0 ≈ 2 GeV for both Drell-Yan and Higgs production serves as a cutoff for the nonperturbative region. While at very small q T Q 0 deviations from the canonical resummation are expected, these can be effectively absorbed into the nonperturbative contributions that become relevant in this region. Indeed ref. [19] takes great care to assess nonperturbative effects in the Drell-Yan spectrum.
Lastly, the factorization theorem of refs. [15,33] has been applied to Higgs production in ref. [85]. They employ a canonical resummation fully in b-space, similar to the CSS approach, but instead of shifting the arguments of the b-space logarithms, the resummation is turned off with profile scales in b-space whose form is based on the final value of q T . The nonperturbative large b T -region is avoided by explicitly cutting off the Fourier integration at b T ≤ 2 GeV −1 , while verifying that changing the cut in 1.5 − 3 GeV −1 only produces a negligible variation. Because of the generic suppression of nonperturbative effects by Λ 2 QCD /q 2 T , this can be expected to hold as long as q T is sufficiently perturbative.

Early approaches for direct q T -space resummation
There have been several attempts in the past to carry out the resummation directly in momentum space. They typically attempt to explicitly count logarithms ln(Q/q T ) in the q T spectrum. As we discussed before, this is dangerous as it can easily lead one to discard apparent subleading contributions that are seemingly unimportant but are actually relevant. The original DDT approach [83,84] introduced the LL cross section in the form of eq. (6.9), (7.9) Its connection to our full LL solution was already discussed in sec. 6. Ref. [14] tried to extend eq. (7.9) to NLL, counting logarithms L = ln(Q/q T ) in the Sudakov exponent S. In this counting, ref. [14] finds the cross section which diverges for h = −2. This can be directly related to our results, where the logcounting in the exponent corresponds to using the naive solution of the rapidity RGE in eq. (3.20) and keeping the rapidity anomalous dimension γ ν at a fixed order rather than fully resuming it. As we saw in sec. 3.4, the result for the rapidity evolution kernel in this approximation [see eq. (3.27)] contains exactly the same spurious divergence at ω s ≡ −h/2 = 1 as eq. (7.10). As we have argued, this is caused by the incorrect treatment of energetic emissions, which become increasingly important for small q T . This agrees with ref. [14], where it is remarked that energy conservation constraints are not correctly implemented in the resummation formula. This is remedied by basing the logarithmic order counting on the anomalous dimensions, and hence is not a flaw of the factorization theorem itself. The second DDT-based approach, ref. [82], counts logarithms L = ln(Q/q T ) directly in the cross section, i.e. only counting α s L 2 ∼ 1. Interestingly, this actually happens to lead to a numerically well-defined prediction of the cross section. To see this, consider again the naive rapidity evolution kernel eq. (3.20), Counting logarithms in the cross section is equivalent to truncating the n-fold convolution γ ν ⊗ n at the desired accuracy. For example, at LL σ where γ ν ( p T , µ) = 2Γ cusp [α s (µ)]L 0 ( p T , µ), one would only keep the first term of the sum (γ ν ⊗ n )( k T , µ) = (2Γ cusp ) n n L n−1 ( k T , µ) + 4ζ 3 n − 1 n − 4 L n−4 ( k T , µ) + · · · , (7.12) but treat the first subleading term L n−4 as a N 3 LL σ correction. The evolution kernel would then be given by where ω s = 2Γ cusp [α s (µ)] ln(ν/ν 0 ). Comparing to eqs. (3.27) and (7.10), where all terms in eq. (7.12) are kept, only keeping the first term in eq. (7.12) completely removes the divergence in the kernel. However, this cross-section counting is of course only applicable in an intermediate q T range and only includes a small subset of logarithms compared to the full resummation.
A different approach was adopted in refs. [21][22][23], which attempt to obtain an explicit momentum-space expression of the inverse Fourier transform of the b-space result. The resummed b-space result dσ/db 2 T is then expanded in terms of logarithms L b = ln(b 2 T µ 2 /b 2 0 ), whose inverse Fourier transform is known for arbitrary powers L n b , see appendix C.2. This allows one to construct a series in momentum space that approximates the Fourier transform to an in principle arbitrary precision. However, the resummation itself is effectively still performed canonically in b space.

Coherent branching formalism
Very recently, a different approach using the coherent branching formalism of refs. [25,26] has been proposed in ref. [24], which is not based on a factorization theorem or solving the q T evolution equations.
While a detailed comparison to the NNLL result given in ref. [24] would be very interesting, we leave it for future work and in the following concentrate on comparing to their NLL result, which is already instructive. The starting point of their derivation is the cumulative cross section obtained by summing over any number of independent emissions where the phase-space measure is dk T = dk T k T dφ 2π . The differential spectrum follows to be where we use a notation resembling our convolution notation. Here, the hardest emission k 1 has been singled out. The parameter 1 reflects that emissions with k i < k 1 are unresolved. Correspondingly, the exponential e −R( k 1 ) encodes the Sudakov suppression of having no emission between scales k 1 and Q. The radiator R is given at NLL by where we converted from the Catani-Marchesini-Webber scheme [94] used in ref. [24] to the MS-scheme. Its derivative R evaluates to 8 The results in eqs. (7.14) and (7.15) are technically only NLL accurate up to the fact that the PDFs in σ0 are evaluated at fixed µF rather than k1, which however is irrelevant for the present discussion. We have dropped the constant β0αs/π here, which would be canceled if the PDFs were evaluated at k1.
The structure of eq. (7.15) is closely related to our results, as it essentially contains an infinite number of convolutions, with the exception that the hardest emission is explicitly singled out. Furthermore the unresolved regions 0 < k i < k 1 that are cut out of the k i integrals (i ≥ 2) are already captured in the Sudakov exponent. Letting → 0 is then equivalent to the cancellation of IR singularities encoded in the plus distributions in our framework. Each factor R corresponds to a ν-anomalous dimension times its rapidity logarithm, where we assumed k i > 0 to drop the plus prescription in γ ν . In summary, eq. (7.15) agrees very well with our momentum-space resummed cross section that follows from the factorization theorem, up to the different treatment of the cancellation of IR divergences. Differences arise when the resummed spectrum is further expanded to obtain NLL accuracy, defined by counting logarithms in the cumulative cross section. Expanding the individual emission momenta around k i ∼ q T and counting logarithms ln(Q/q T ), they also reproduce the spurious divergence in the cross section. This is precisely equivalent to using the naive rapidity resummation, which does not treat energetic emissions correctly, as discussed in secs. 3.2 and 3.4.
To circumvent this, ref. [24] instead expands all radiators appearing in eq. (7.14) around the hardest emission k 1 . At NLL, the necessary expansions are A simplification is that all radiators R(k i ) and R (k i ) are now evaluated at α s (k 1 ) rather than α s (k i ), thereby removing the nonperturbative effects that would otherwise be present in the rapidity evolution. This procedure hence fundamentally resums logarithms of ln(Q/k 1 ) to NLL, where the resummation accuracy is defined by explicitly counting logarithms in the cumulative cross section (using exponent counting). Ref. [24] then argues that the formal accuracy in terms of counting logarithms ln(Q/q T ) will be the same, and only differ by subleading terms from the naive result. Compared to our exact solution, this procedure effectively corresponds to approximating the rapidity logarithms ln(ν/k i ) in the exact rapidity evolution kernel by logarithms ln(ν/k 1 ), which allows to pull them out of the convolutions. Since this avoids the spurious singularity, one might expect that this approximation is safer than the naive one of taking ln(ν/k i ) ∼ ln(ν/q T ). On the other hand, these differences are all of apparent subleading nature, and it would be interesting to study in more detail to what extent this approach reproduces the subleading terms in the q T spectrum that are included in the strict LL and NLL evolution in either momentum or Fourier space.

Conclusion
We have investigated solving differential equations for arbitrary distributions. These arise naturally in differential spectra of observables resolving additional soft or collinear QCD radiation, where the cancellation of IR singularities is encoded through plus distributions, and the all-order logarithmic structure is fully encoded in (renormalization group) evolution equations. Solving these equations allows the resummation of the logarithmic distributions to all orders, provided that the boundary term in the solution is free of logarithms, such that it can be reliably calculated in fixed-order perturbation theory.
We have introduced a technique for distributional scale setting µ = k| + that allows one to treat logarithmic distributions like ordinary logarithms. In particular, it eliminates any logarithms contained in the boundary condition of plus distributions such as [µ/k] µ + . It can be straightforwardly applied to solve distributional differential equations for both one-dimensional and two-dimensional distributions, where it ensures that the appearing boundary term is free of any logarithmic distributions. This allows one to perform the RG evolution and resummation directly in distribution (momentum) space. It thus enables the implementation of profile scales, the transition and matching to the full fixed-order distribution, and the estimation of perturbative uncertainties through scale variations directly in distribution space.
The technique has been applied to obtain the resummation of the transverse momentum (q T ) spectrum for the first time by solving the associated evolution equations in momentum space. We showed that a well-known spurious singularity in the spectrum arises from wrong scale setting, i.e. wrong boundary terms in the RG evolution, causing an incorrect treatment of energetic emissions, and which is cured by a proper distributional scale setting. This yields a well-defined resummation of the q T spectrum, whose resummation accuracy is strictly defined by the perturbative expansion of the associated anomalous dimensions (and boundary terms) without requiring to count explicit powers of logarithms. We indeed find that trying to specify the logarithmic accuracy by explicitly counting logarithms ln(Q/q T ) in the spectrum can be ill defined, and partly is the reason for the spurious divergences encountered in previous attempts to perform resummation in momentum space.
Previous attempts at a momentum-space resummation were partially motivated by trying to avoid nonperturbative effects in the q T spectrum, which are unavoidable in the Fourier space resummation due to integrating α s (1/b T ) over its Landau pole. We find that analogous nonperturbative effects also appear in the rapidity evolution kernel in the strict momentum-space resummation, because real emissions with momentum k i naturally scale with α s (k i ), which is necessary to suppress energetic emissions. We discussed how the nonperturbative contributions to the rapidity anomalous dimension can be isolated in momentum space, which closely reproduces the common treatment in Fourier space. We also showed that nonperturbative effects arising from integrating over the k i → 0 region inside convolutions are generically suppressed as Λ 2 QCD /q 2 T , such that they do not spoil the predictivity of the resummation for perturbative q T .
The correct momentum-space rapidity evolution involves an intricate iterative convolution structure. Its numerical implementation is nontrivial, which we plan to address in future work. The rapidity evolution has so far been performed in Fourier b T space, where logarithms ln(Qb T ) rather than ln(Q/q T ) are resummed. While both approaches are formally equivalent, our general analysis shows that the boundary conditions employed in the evolution intrinsically differ to all orders. In the future, it would be interesting to compare them numerically, as they probe different subleading terms to all orders, and this could also provide new insight into nonperturbative effects.

A.2 Convolutions
One-dimensional convolutions are defined as where the dots stand for possible additional arguments of the functions. Multiple convolutions are abbreviated as Two-dimensional convolutions are defined as where the dots stand again for possible additional arguments of the functions. Multiple convolutions are abbreviated as The soft function, and analogously other functions, are expanded as (A.13)

B One-dimensional plus distributions
For completeness we collect and extend the definitions and formulas for one-dimensional plus distributions from refs. [27,95].

B.2 Fourier transformation
The Fourier transformation of a plus function L a (k, µ) with respect to y = y − i0 is given by which holds for a > −1 through analytic continuation. In the second step we introduced the abbreviations Exact value Numerical value   where the quantities R The factorial behavior of R (n) 1 reflects that the Taylor series of R 1 (a) = e γ E a Γ(1 + a) only converges for |a| < 1. For R (n) 1 there is no such simple asymptotic behavior, but due to the infinite radius of convergence ofR 1 (a) = e γ E a /Γ(1 − a), it is expected to grow much less severely. This is confirmed by the first values shown in 4.
For illustration, table 1 shows the Fourier transforms of L n (k, µ) for n ≤ 5, table 2 the Fourier transforms of L n y for n ≤ 6. The first few R

B.3 Convolutions
The convolution of two plus distributions is given by where V 1 (a, b) is defined by which satisfies V 1 (0, 0) = 0. Taking derivatives with respect to a and b we can get the corresponding formulas for convolutions of the form L n ⊗ L a and L m ⊗ L n . The explicit results can be found in Appendix B of ref. [27]. An important special case are multiple convolutions of L n , which is given by where g( p T ) diverges at most as 1/p 2 T for p T → 0. The boundary condition can be shifted using where the − sign holds for ξ < µ and the + sign for ξ > µ. For azimuthally symmetric functions, g( p T ) ≡ g(| p T |), this simplifies to g(| p T |) It is important to keep in mind that it is nevertheless defined as a two-dimensional distribution, even though the distribution effectively contains a scalar function. It follows that the derivative with respect to the boundary condition is given by For a scalar input function f (p T ), the azimuthally symmetric two-dimensional plus distribution can be defined by the derivative 1 2πp T d dp T θ(p T ) = δ( p T ) , Note that the appearance of the θ-function is crucial to render the derivative well defined in form of a plus distribution. We define the frequently occurring logarithmic plus distributions as where the L a (x) and L n (x) on the right-hand side are the standard ones for dimensionless arguments with boundary condition x 0 = 1 as defined in eq. (B.8). They are related by L n ( p T , µ) = d n da n L a ( p T , µ) a=0 . (C.10) Their cumulant and inverse cumulants are Finally note that our definition of L n ( p T , µ) is related to that in ref. [15] by L n ( p T , µ) = 2(−1) n L 0 n (µ, p T ; µ) , (C.12) where L 0 n (µ, p T ; µ) is defined in eq. (F.1) in ref. [15].

C.2 Fourier transformation
The Fourier transform of L a ( p T , µ) is given by It is convenient to express the Fourier transform of L n ( p T , µ) as polynomials of 14) The Fourier transform of L n ( p T , µ) then follows from eq. (C.13) using eq. (C.10), The inverse transformation is given by Approximating the zeta-function with its asymptotic value ζ(n) ≈ 1, it is easy to see from this equation that R For illustration, we list in table 5 the Fourier transforms of L n ( p T , µ) for n ≤ 5, and in table 6 the inverse Fourier transforms of L n b for n ≤ 6. The first few R (n) 2 are given in table 7.

C.3 Convolutions
Convolutions of plus functions can be conveniently calculated by transforming to impact parameter space with eqs. (C.13) and (C.15), where convolutions become simple products, and then Fourier transforming back using eq. (C.16).
The convolution of two L a is given by [15] ( Table 5. Fourier transform of L n ( p T , µ) to b T -space for n ≤ 5. Results are expressed in terms of L b = ln(b 2 T µ 2 e 2γ E /4). See eq. (C.15).

C.4 Integral relations
Here we collect a few important integrals with distributional scale setting according to eq. (2.79), 25) which are useful to resum the soft function iteratively using eqs. The coefficients through O(α 3 s ) are expressed in terms of L ν ≡ ln(µ/ν) and L n ≡ L n ( p T , µ), The results through O(α 2 s ) agree with the explicit calculation using the η-regulator in ref. [62]. The S n denote the constants as derived using distributional setting, see sec. 5. They are related toS n , the correspond constants with scale setting in Fourier space, by