CGC factorization for forward particle production in proton-nucleus collisions at next-to-leading order

Within the Color Glass Condensate effective theory, we reconsider the next-to-leading order (NLO) calculation of the single inclusive particle production at forward rapidities in proton-nucleus collisions at high energy. Focusing on quark production for definiteness, we establish a new factorization scheme, perturbatively correct through NLO, in which there is no `rapidity subtraction'. That is, the NLO correction to the impact factor is not explicitly separated from the high-energy evolution. Our construction exploits the skeleton structure of the (NLO) Balitsky-Kovchegov equation, in which the first step of the evolution is explicitly singled out. The NLO impact factor is included by computing this first emission with the exact kinematics for the emitted gluon, rather than by using the eikonal approximation. This particular calculation has already been presented in the literature [1,2], but the reorganization of the perturbation theory that we propose is new. As compared to the proposal in [1,2], our scheme is free of the fine-tuning inherent in the rapidity subtraction, which might be the origin of the negativity of the NLO cross-section observed in previous studies.


Introduction
Using perturbative QCD, we would like to study particle production in high-energy proton-nucleus (pA) collisions in the kinematical regime where the produced particle is semi-hard to hard (meaning that its transverse momenta can be larger than the nuclear saturation momentum Q s , but not much larger) and it propagates at forward rapidity in the proton fragmentation region (that is, it makes a very small angle w.r.t. the collision axis) [3][4][5][6][7][8][9][10][11][12][13][14][15].
What is special about this kinematics is that the scattering probes the small-x part of the nuclear wavefunction, but the large-x part of the proton wavefunction, so it acts as a clean probe of the nuclear gluon distribution in the interesting regime where one expects large gluon occupation numbers and strong non-linear phenomena, like gluon saturation. This probe is 'clean' since the large-x part of the proton wavefunction is very dilute and hence well described by the standard QCD parton picture and the associated collinear factorization. Accordingly, the overall process can be depicted as follows: a collinear parton from the proton undergoes multiple scattering off the dense gluon system in the nuclear target and hence acquires some transverse momentum k ⊥ , before eventually fragmenting into the hadrons that are measured in the final state.
The above physical picture naturally lends itself to a hybrid factorization scheme [5,11] for the calculation of the single-inclusive hadron multiplicity, which combines the collinear factorization for the parton distribution of the incoming proton and also for the fragmentation of the produced quark or gluon [16], with the CGC factorization for the high-energy scattering between the collinear parton and the nucleus. The 'CGC' refers to the Color Glass Condensate effective theory, which is the appropriate pQCD framework to address the problem of high-energy scattering in the presence of high gluon densities [17][18][19][20]. This is essentially a theory for the gauge-invariant correlations of Wilson lines and their evolution with increasing energy. A Wilson line (a unitary matrix in the color group SU(N c )) is the S-matrix of an energetic parton which undergoes multiple scattering off a strong color field representing the gluon distribution of the target. The CGC factorization 1 for 'dilute-dense' scattering associates one such a Wilson line to each of the partons partaking in the collision, separately in the direct amplitude and the complex conjugate amplitude. Cross-sections are obtained by averaging over all the configurations of the color fields in the target, a procedure which generates the Wilson-line correlators aforementioned. In the simplest case, that is for single-inclusive particle production at leading order, this correlator involves the trace of the product of two Wilson lines 2 , which can be identified with the elastic S-matrix of a color dipole which scatters off the nuclear target.
Whereas this hybrid factorization may indeed look natural, in view of the underlying physical picture, its foundation in pQCD is not obvious, nor easy to establish. In order to make sense beyond tree-level, this scheme must be consistent with the QCD radiative corrections and notably with the collinear and high-energy evolutions. As we shall shortly explain, this issue is already non-trivial at leading-order (LO) and it becomes even more so at next-to-leading order (NLO) and beyond.
The LO version of the hybrid factorization for single-inclusive hadron production [5,11] includes the LO DGLAP evolution for the parton distribution in the proton and for the parton fragmentation in the final state. It furthermore includes the LO B-JIMWLK evolution of the dipole S-matrix. The B-JIMWLK (from Balitsky, Jalilian-Marian, Iancu, McLerran, Weigert, Leonidov and Kovner) equations [23][24][25][26][27][28][29] form an infinite hierarchy of coupled equations which describes the non-linear evolution of the n-point correlations of the Wilson lines. (Operators with different number of Wilson lines couple under the evolution due to multiple scattering.) This hierarchy drastically simplifies in the limit of a large number of colors N c 1, in which expectation values of gauge-invariant operators factorize from each other. In that limit, the evolution of the dipole S-matrix is governed by a closed non-linear equation, known as the Balitsky-Kovchegov (BK) equation [23,30].
A first subtle point, which arises already at LO, refers to the relation between the cross-section for parton-nucleus scattering on one hand, and the dipole scattering amplitude on the other hand. In general, cross-sections and amplitudes are different quantities (e.g. they have different analytic properties) and it is only due to the high-energy approximations -notably, due to the fact that the high-energy amplitudes are purely absorptive -that such an identification becomes possible in the problem at hand. Yet, the consistency between this relation and the high-energy evolution is far from 1 The CGC factorization can be viewed as the generalization to high gluon density of the k ⊥ -factorization [21,22], which deals with the 'unintegrated gluon distribution' and the associated BFKL evolution [20]. The k ⊥ -factorization applies so long as the gluon density is moderately low and non-linear effects like gluon saturation and multiple scattering can be still neglected. 2 The Wilson lines are in the fundamental representation of SU(Nc) if the colliding parton is a quark and in the adjoint representation if this parton is a gluon. Accordingly, the color dipole is either a quark-antiquark pair, or a pair of two gluons, in an overall color singlet state.
which exploits the integral representation for the solution to the BK equation. Any approximation in solving this equation, as well as the subsequent approximations which are in practice needed to derive the 'plus' prescription, will lead to an imbalance between the large, 'added' and 'subtracted', contributions and thus possibly to unphysical results.
(ii) The 'plus' prescription is actually not needed: the would-be 'rapidity divergence' is truly cut off by physical mechanisms, namely by energy conservation for the 'real' corrections and by probability conservation for the 'virtual' ones. The role of the energy conservation in constraining the longitudinal phase-space for the primary emission is in fact well appreciated [50,53]. This constraint has been used to cut off the soft divergence in Ref. [50] and to alleviate the negativity problem in the context of the 'plus' prescription in Ref. [57]. The corresponding constraint on the 'virtual' corrections has not been discussed to our knowledge, so we shall devote an appendix to an explicit NLO calculation which demonstrates this. Specifically, in App. A we show that the 'virtual' corrections with very short lifetimes -outside the physical range for the 'real' corrections -mutually cancel each other. This result is in fact natural: the 'real' and 'virtual' corrections must have the same support in longitudinal phase-space, since they should combine with each other to ensure probability conservation.
(iii) To NLO accuracy, the calculation of the single-inclusive forward particle production in dilute-dense collisions can be given a different factorization, cf. Eq. (3.7), in which the small-x logarithm associated with the primary gluon emission is not included in the high-energy evolution, but is implicitly kept within the impact factor 4 . The latter includes the incoming quark and its (not necessarily soft) primary gluon, which together scatter off the gluon distribution in the nucleus and thus measure its high-energy evolution. This is the same picture as for the CGC calculation of di-hadron production [63][64][65][66], except that the kinematics of the primary gluon is now integrated over and one must add the 'virtual' corrections.
(iv) In the limit where the primary gluon is soft and treated in the eikonal approximation, our general formula in Eq. (3.7) reduces to the integral representation (3.17) of the solution to rcBK. This is indeed the correct result for the quark multiplicity at LO. Vice-versa, our formula may be viewed as a generalization of the LO BK evolution in which the very first gluon emission (and that emission only) is treated beyond the eikonal approximation. In view of that, we expect our result for the cross-section to be positive semi-definite, albeit we have not been able to prove this explicitly.
(v) Our general formula (3.7) is probably too cumbersome to be used in practice, due to the complicated structure of the transverse and longitudinal integrations, which are entangled with each other. Fortunately though, the problem can be considerably simplified in the interesting regime where the transverse momentum k ⊥ of the produced quark is sufficiently hard, k ⊥ Q s . In that regime, the primary gluon is relatively hard as well, with a transverse momentum p ⊥ ∼ k ⊥ (see the discussion in Sect. 2.4). This allows us to replace p ⊥ ∼ k ⊥ within the rapidity variables in Eq. (3.7) and thus deduce a much simpler result, Eq. (3.12), which is of the same degree of difficulty as the formulae used within previous numerical simulations [51,53,56,57,67], while at the same time avoiding the rapidity subtraction and the associated fine-tuning.
(vi) To ensure the desired NLO accuracy of the overall scheme, the high-energy evolution of the color dipoles must be computed to NLO as well. In Sect. 3.4 we complete our factorization scheme by 4 At this level, we use the notion of 'impact factor' in a rather informal way, as a proxy for the dilute projectile made with the incoming quark and its primary gluon. The more conventional impact factor which is defined order-by-order in perturbation theory and involves no high-energy evolution (i.e. no small-x logarithms), will be computed to NLO in Sect. 3.5, after separating our general result into leading-order plus next-to-leading order contributions. specifying the NLO corrections associated with the high-energy evolution. As we also explain there, the inclusion of the NLO evolution in the problem at hand is a priori problematic, for two reasons: (a) the evolution of a dense wavefunction, like a nucleus, is not known beyond leading-order, and (b) the strict NLO approximation is expected to be unstable, due to large corrections enhanced by collinear logarithms. We provide solutions to these problems by relating the target evolution to that of the dilute projectile, which is indeed known to NLO accuracy [32], including the all-order resummation of the collinear logarithms [37,45]. This is furthermore discussed in Appendices B and C.
(vii) To make contact with the formalism in [1,2], we consider in Sect. 3.5 the decomposition of our general result (3.12) between NLO dipole evolution and NLO corrections to the impact factor. This decomposition, shown in schematic notations in Eq. (3.29), relies in an essential way on the fact that the dipole S-matrix obeys a specific evolution equation -either the LO BK equation with running-coupling (rcBK) [68][69][70], or the NLO BK equation [32] with collinear improvement [37,45], depending upon the desired accuracy. Indeed, the integral version of the evolution equation is used to reshuffle the largest contribution to the cross-section, that associated with the LO evolution.
Eq. (3.29) is very similar, but not fully identical, to the factorization scheme proposed in [1,2], which is schematically shown in Eq. (3.30). As explained in Sect. 3.5, the differences between Eqs. (3.29) and (3.30) are irrelevant to NLO accuracy, but they can be nevertheless important in practice, as they introduce an imbalance between the terms included in the dipole evolution and those subtracted via the 'plus' prescription. As already mentioned at point (i), we believe that this imbalance is responsible for the problem of the negativity of the cross-section.
To summarize, all the potential difficulties with the subtraction method can be avoided by computing the cross-section directly from our formula (3.12), which involves no subtraction at all. This formula can be evaluated with a suitable approximation for the high-energy evolution, like rcBK or the more elaborated approximations described in Sect. 3

.4 and in Appendix B.
This paper includes two major sections, devoted to the LO and the NLO calculations respectively, and three appendices. Sect. 2 starts with a discussion of the kinematics and of the importance of the choice of a Lorentz frame for building a physical picture. Such a picture is first developed in the target infinite momentum frame (in Sects. 2.1 and 2.2), then extended to a mixed frame, where one of the evolution gluons (the 'primary gluon') is viewed as an emission by the incoming quark, whereas all the subsequent ones are included in the evolution of the nuclear target (in Sect. 2.3). In Sect. 2.4, we discuss the di-jet configurations which control the final state in the regime where the produced quark has a large transverse momentum k ⊥ Q s . The first subsection of Sect. 3 summarizes the result for the NLO impact factor obtained in [1,2] and also extends that result by specifying the rapidity variables for the evolution of the various dipole S-matrices. This discussion motivates our main result in this paper, that is, the NLO factorization displayed in Eq. (3.7). This general but rather cumbersome expression is rendered more tractable and also more explicit in Sects. 3.3, 3.4 and in Appendix B, where we simplify the kinematics via approximations appropriate at large k ⊥ Q s and we replace the unknown NLO evolution of the nuclear target by that of the dilute quark-gluon projectile (with collinear improvement). In Sect. 3.5, we isolate LO from NLO contributions, as described at point (vii) above. Sect. 4 contains our conclusions. In Appendix A we demonstrate the mutual cancellation of the 'virtual' fluctuations whose lifetime is shorter than the longitudinal extent of the target. Finally, Appendices B and C give more details on the NLO evolution of color dipoles.

The leading order calculation
In this section, we shall briefly review the leading-order (LO) calculation of single-inclusive quark production in high-energy proton-nucleus (pA) at forward rapidities (i.e. in the proton fragmentation region). This calculation relies on a hybrid factorization scheme [11] which involves collinear factorization at the level of the proton wavefunction together with the dipole picture for the scattering between a collinear quark from the proton and the nuclear gluon distribution.

General picture and kinematics
To LO in perturbative QCD and in a suitable Lorentz frame, the forward production of a quark in pA collisions proceeds via the transverse momentum broadening of one of the quarks from the incoming proton: the quark, which was originally collinear with the proton, acquires a transverse momentum k via scattering off the small-x gluons in the nuclear wavefunction and thus emerges at a small angle θ k ⊥ /k + w.r.t. the collision axis. The typical situation is such that the quark undergoes multiple soft scattering and thus accumulates a transverse momentum of order Q s -the target saturation momentum at the longitudinal resolution probed by the scattering (see below). But the k ⊥ -distribution of the produced quark also features a power-like tail at high momenta k ⊥ Q s , which is the result of a single, relatively hard, Coulomb scattering off the color sources in the target.
The physical picture actually depends upon the choice of a Lorentz frame. The picture that we have just described only holds in a 'target infinite momentum frame', where the nuclear target carries most of the total energy, so the high-energy evolution via the successive emissions of soft gluons is fully encoded in the nuclear gluon distribution. On the other hand, the picture would be different in a frame where the projectile proton carries most of the total energy; in that case, the wavefunction of the incoming quark is highly evolved, in the sense that it contains many soft gluons, which can be put on-shell by their scattering off the (un-evolved) nucleus. The final transverse momentum k acquired by the quark is then the result of the recoil from this induced gluon radiation.
To transform these pictures into actual calculations, we need to better specify the kinematics. We work in a Lorentz frame where the proton is a right mover, with longitudinal momentum Q + , while the nuclear target is a left mover, with longitudinal momentum P − per nucleon. The highenergy regime corresponds to the situation where the center-of-mass energy √ s, with s = 2Q + P − , is much larger than any of the transverse momentum (or virtuality) scales in the problem; in particular, s k 2 ⊥ and s Q 2 s . For our purposes, the longitudinal momentum k + of the produced quark is most conveniently parametrized in terms of the boost-invariant ratio x p ≡ k + /Q + ≤ 1 (the quark longitudinal momentum fraction w.r.t. the incoming proton).
Let us start by choosing a target infinite-momentum frame, where the scattering involves a bare quark from the proton and the highly-evolved gluon distribution of the nucleus. Prior to the collision, the quark has only a 'plus' momentum q + 0 . After the scattering, which can involve one or several gluon exchanges with color sources from the target, the quark emerges with the same longitudinal momentum, k + = q + 0 (since gluons from the target have negligible 'plus' momenta), but it acquires a transverse momentum k and also a 'minus' component k − , which is needed for the produced quark to be on-shell: k − = k 2 ⊥ /2k + . This condition fixes the total longitudinal momentum fraction carried by the gluons from the target that were involved in the collision 5 : To relate to the experimental situation, it is customary to express the longitudinal fractions x p and X g in terms of the rapidity η ≡ (1/2) ln(k + /k − ) of the produced quark in the center-of-mass frame (where Q + = P − = s/2). Using x p = k + /Q + and k + = (k ⊥ / √ 2)e η , one finds The forward kinematics corresponds to the situation where η is positive and large. Then Eq. (2.2) makes it clear that X g x p < 1, thus confirming that forward particle production explores the small-X g part of the nuclear wavefunction, as anticipated in the Introduction.

Dipole picture
To LO in the CGC effective theory, the 'quark multiplicity' (i.e. the distribution of the produced quarks in transverse momentum k and COM rapidity η) is computed as follows where the kinematic variables k, η, x p , and X g have already been introduced, x p q(x p ) is the quark distribution in the proton for a collinear quark with longitudinal momentum fraction x p , and S(k, X g ) is the Fourier transform of the elastic S-matrix for the scattering between a color dipole in the fundamental representation and the nucleus: The 'color dipole' is a quark-antiquark pair in a color-singlet state. In the present context, this appears as merely a mathematical representation for the cross-section for the scattering between the produced quark and the nucleus: the 'quark' component of the dipole is the colliding quark viewed in the direct amplitude (DA) and the 'antiquark' is the same physical quark, but viewed in the complex conjugate amplitude (CCA).
To the accuracy of interest, the dipole-nucleus scattering can be computed in the eikonal approximation, i.e. the transverse coordinates of the quark (x) and the antiquark (y) can be treated as fixed during the collision. Then the only effect of the collision are color rotations of the two fermions, as described by Wilson lines extending along their trajectories: (2.5) 5 Clearly, in the case of multiple scattering, the light-cone energy q − which is individually carried by the exchanged gluons can be smaller than this overall value k − = k 2 ⊥ /2k + , as emphasized in Ref. [50]. However, we disagree with the conclusion there that the longitudinal fraction Xg which counts for the target evolution can be parametrically different from the estimate (2.1) (and in particular independent of the quark transverse momentum k ⊥ ). Indeed, the relevant value of X is the one which controls the energy dependence of the target saturation momentum Qs(X). As well known, the latter is fully determined by the condition that the amplitude for a single scattering become of order one [60,61].
Here, U (x) and U † (y) are Wilson lines in the fundamental representation, e.g., and A − a (x) is (the relevant component of) the color field representing the gluons from the target with longitudinal momentum fraction X ≡ q − /P − . In general, this field is strong (corresponding to large gluon occupation numbers) and the path-ordered phase in Eq. (2.6) resums multiple scattering to all orders. In the Fourier transform in Eq. (2.4), the transverse momentum k of the produced quark is conjugated to the dipole size r ≡ x − y. Both the l.h.s. and the r.h.s. of Eq. (2.4) depend upon the impact parameter b ≡ (x + y)/2, but this dependence is unessential for what follows and will be omitted: for our purposes, the target can be treated as quasi-homogeneous in the transverse plane.
The brackets in the r.h.s. of Eq. (2.5) denote the target average over the color field A − a (x), as computed with the CGC weight functional [17][18][19]. By using the JIMWLK equation for the latter, or directly the Balitsky equations for the color dipole operator, one finds an equation for the evolution of the dipole S-matrix with decreasing X. In general, this is just the first equation from an infinite hierarchy, but the situation simplifies in the limit of a large number of colors N c 1, where the dipole S-matrix obeys a closed, non-linear, equation, known as the Balitsky-Kovchegov (BK) equation [23,30]. This equation will be later needed, so let us display it here: whereᾱ s ≡ α s N c /π and Y ≡ ln(1/X). The integration variable z in Eq. (2.7) represents the transverse coordinate of a soft gluon with longitudinal fraction X = q − /P − 1, which is emitted by 'fast' color sources from the target (valence quarks and gluons from the previous generations, with momentum fractions X X) and absorbed by the projectile dipole. Eq. (2.7) must be integrated from some lower value Y 0 ≡ ln(1/X 0 ), where one can use a low-energy model for the nuclear gluon distribution, up to Y g ≡ ln(1/X g ) = ln(ŝ/k 2 ⊥ ), where we compute the quark production. Typically, X 0 ∼ 1 X g . For instance, if one uses a valence-quark model for the nucleus, like the McLerran-Venugopalan (MV) model [71,72], then X 0 must satisfyᾱ s ln(1/X 0 ) 1 and the whole gluon distribution is built up via evolution.
In the above discussion, we have privileged the viewpoint of target evolution, that is, we have described Eq. (2.7) as the result of a change in the gluon distribution of the target. The complementary point of view, that of projectile evolution, will be useful too for what follows and will be introduced in the next subsection.
Also, we implicitly assumed that the relation (2.3) between the quark transverse momentum broadening and the dipole S-matrix remains valid in the presence of the high-energy evolution; that is, both sides of Eq. (2.3) evolve in exactly the same way with increasing energy (i.e. with increasing η, or decreasing X g ). This is known to be true, at least, up to NLO accuracy, as demonstrated in Ref. [4] for the LO BK evolution and in Ref. [31] for the NLO one. However, Eq. (2.3) is not complete beyond leading order: to this 'dipole' piece, one must add the 'corrections to the impact factor', that is, the contributions from partonic configurations which do not reduce to either the dipole, or its high-energy evolution. Such corrections will represent a main topic of the NLO discussion in Sect. 3.

Target versus projectile evolution
Previously, we have insisted that the physical picture of the high-energy evolution depends upon the choice of a frame, but as a matter of facts the BK equation (2.7) holds exactly as written in any frame that is obtained from the COM frame via a boost. This includes the infinite momentum frame of the nucleus that we have considered so far, but also the corresponding frame for the projectile, where the incoming proton carries most of the total energy and the high-energy evolution of interest refers to the emission of soft gluons in the wavefunction of the colliding dipole (or quark). By 'soft gluons' in this case, one means gluons which carry small fractions x ≡ p + /q + 0 1 of the longitudinal momentum q + 0 of the parent quark.
This 'boost invariance' of the LO BK equation is not an automatic consequence of the underlying Lorentz symmetry of the problem -after all, the respective evolution variables are different: Y = ln(1/X) for the target evolution and y ≡ ln(1/x) for that of the projectile. Rather, it reflects approximations specific to the LLA at hand, whose effect is to identify these two variables Y and y to the accuracy of interest (up to a change of sign). In other terms, at LLA, the fact of decreasing Y is indeed equivalent with increasing y, meaning that the evolution can be progressively transferred from the projectile to the target, and back. This point will play an important role in our subsequent discussion of the NLO contribution to particle production. In preparation for that, let us briefly remind here the kinematical assumptions underlying the LLA and thus expose their limitations.
To that aim, we consider the situation where only one of the soft gluons has been emitted by the quark, while all the other ones belong to the wavefunction of the target (see Fig. 1). The gluon that has been singled out in this way is the one to be closest in rapidity (y) to the incoming quark; we shall refer to it as the primary gluon and write its longitudinal and transverse momenta as p + = xq + 0 and respectively p. Consider a 'real' graph in which the primary gluon, albeit unmeasured, is released in the final state 6 . Then longitudinal momentum conservation implies k + = (1−x)q + 0 = x p Q + and therefore q + 0 /Q + = x p /(1−x). [Recall that x p is defined as the boost-invariant ratio x p ≡ k + /Q + , which in the COM frame takes the form shown in Eq. (2.2).] Both the quark and the primary gluon must be on mass-shell in the final state. Then, light-cone energy conservation implies that the scattering off the nuclear target must transfer a total 'minus' component q − = X(x, p ⊥ )P − with (recall thatŝ = x p s) The LLA essentially relies on the two following kinematical assumptions: (i) The longitudinal fraction of the emitted gluon is small: x 1. This allows one to simplify the calculation, notably by computing the quark-gluon vertex in Fig. 1 in the eikonal approximation.
(ii) The transverse momenta of the successive emissions are parametrically of the same order: k ⊥ ∼ p ⊥ or, more precisely, ln(1/x) | ln(k 2 ⊥ /p 2 ⊥ )|. This condition is necessary to simplify the energy denominators (by neglecting k − ≡ k 2 ⊥ /2k + compared to p − ≡ p 2 ⊥ /2p + ) in the study of soft successive emissions and thus obtain the LO BK (or BFKL) equation. Figure 1: An illustration of the LO high-energy evolution of the 'cross-section for quark production' S(k, X g ) (the Fourier transform of the dipole S-matrix). The 'primary gluon' (the first gluon emission by the quark, which carries a longitudinal momentum fraction x 1 is) viewed as a part of the wavefunction of the incoming quark (a right mover). The subsequent emissions are rather associated with the evolution of the gluon distribution in the nuclear target (a left mover), within the range X(x) < X < X 0 , with X(x) = X g /x and X 0 ∼ 1. Alternatively, and equivalently at LLA, they can be associated with the evolution of the wavefunction of the primary quark-gluon pair (a right mover) within the 'plus' momentum range x g < x < x.
Under these assumptions, the second term in the r.h.s. of Eq. (2.8) (the light-cone energy of the primary gluon) dominates over the first one and, moreover, one can ignore the difference between p ⊥ and k ⊥ when computing the evolution variables Y = ln(1/X) and y = ln(1/x).
The above argument can be immediately extended to an arbitrary separation of the LO highenergy evolution between the quark projectile and the nuclear target: successive emissions in the projectile are strongly ordered in x but have comparable transverse momenta, hence both energy conservation and the energy denominators are controlled by the last emitted gluon -the one with the smallest value of x and a transverse momentum of order k ⊥ . Accordingly, instead of Eq. (2.8), one can use the following, simpler, relation, in order to connect the LO evolution of the projectile to that of the target. The differences between (2.8) and (2.9) become however important starting with NLO, as we shall see. The above discussion can be summarized by the following integral representation of the solution to the BK equation, illustrated in Fig. 1, in which the total evolution is explicitly split between exactly one soft gluon (x 1) in the quark wavefunction and an arbitrary number of soft gluons (X 1) in the wavefunction of the target: In this equation, S 0 is the initial condition at X 0 1 and the lower limit x g ≡ k 2 ⊥ /ŝ for the integral over x corresponds via (2.9) to X = 1, i.e. to the situation where the soft gluon from the projectile probes bare nucleons from the target. Eq. (2.10) can also be viewed as purely target evolution provided one changes the integration variable from x to X ≡ X(x). Then it becomes obvious that this is the same as Eq. (2.7) integrated over Y , from Y 0 = 0 up to Y g .
In Fig. 2, we present some Feynman graphs which contribute to the integral term in Eq. (2.10). We do not use the dipole picture, rather we show graphs which enter the cross-section for quark production (so, in particular, we use the transverse momentum representation). The nuclear target evolved up to X(x) is represented as a shockwave (recall that we are still in a frame where the target is ultrarelativistic). There are two types of graphs: 'real', where the primary gluon appears in the final state -it is emitted in the direct amplitude (DA) and reabsorbed in the complex conjugate amplitude (CCA) -and 'virtual', where the gluon is both emitted and reabsorbed on the same side of the cut (either in the DA, or in the CCA). It is instructive to notice how such graphs are generated from the BK equation (2.10): decomposing the dipole kernel there as one can check that the 'virtual' terms are generated by the first 2 terms in the r.h.s. of Eq. (2.11), whereas the 'real' terms come from the third one. Note that, for the particular 'real' term where the gluon crosses the shockwave twice, cf. Fig. 2.a, the gluon interactions cancel between the DA and the CCA, by unitarity, hence the respective contribution to the r.h.s. of Eq. (2.10) involves only the scattering of the original quark (as described by the dipole S-matrix S x, y; X(x) ). Given the 'boost-invariance' of the (LO) BK equation alluded to above, one may wonder what can be the utility of dividing the evolution between target and projectile, as we did above. As a matter of facts, there are several advantages for doing that. First, one should keep in mind that the laboratory frame for 'dilute-dense' (d+Au or p+Pb) collisions at RHIC and the LHC coincides with the COM frame (at RHIC), or is close to it (at the LHC). Hence the picture of the high-energy evolution which is directly visible in the experiments is that of an evolution shared by the two incoming hadrons. Second, we shall shortly argue that the first gluon emission by the incoming quark (the 'primary gluon') plays in fact a special role, at least for relatively large k ⊥ Q s . Because of that, it is preferable (and even compulsory, starting with NLO) to view this gluon as a part of the quark evolution, like in Eq. (2.10). Still beyond LO, it is conceptually simpler to associate the high-energy evolution with the target wavefunction. As we shall see, the complete result for the quark multiplicity to NLO, to be presented in Sect. 3, can be viewed as a natural generalization of Eq. (2.10).

Hard transverse momentum and di-jet events
In this subsection, we shall discuss the physical picture of forward quark production in the IMF of the projectile, or, more generally, in any 'mixed' frame, like that illustrated in Fig. 1, where the quark wavefunction contains at least one soft gluon. We would like to show that, in any such a frame, the tail of the quark distribution at relatively high k ⊥ comes from the recoil in the emission of the primary gluon (the first gluon emitted by the quark). That is, a forward quark with large transverse momentum k ⊥ Q s is produced in a di-jet event where the quark is accompanied by a recoil gluon and the two particles propagate back-to-back in the transverse plane. This point is important in that it will affect the NLO calculation of the quark production at relatively high k ⊥ , where the negativity problem in the cross-section has been observed.
At a first sight, the prominence of the di-jet configuration at large k ⊥ might look rather obvious, as an immediate consequence of transverse momentum conservation at the emission vertex. But the situation is a bit more subtle, since a large transverse momentum can also be transferred by the target, via a sufficiently hard scattering. As a matter of facts, in the classical approximation at low energy (i.e. in the absence of any evolution), a power-like tail ∝ 1/k 4 ⊥ in the quark distribution at high k ⊥ is generated via Coulomb scattering (see below). The same physical picture would also hold at high energy (within the limits of the LLA), but only in the target IMF, where there is no gluon emission by the quark. But in a frame where the quark itself is allowed to radiate, the phase-space for high-energy evolution at high k ⊥ favors configurations where the momentum ⊥ transferred from the target to the projectile (the quark together with its small-x radiation) is relatively low, ⊥ k ⊥ . Because of that, the only way to produce a quark with very large k ⊥ is via a di-jet event, as anticipated.
Consider first the semi-classical approximation (no evolution), that we shall treat within the MV model. Since we are interested in a relatively hard quark with k ⊥ Q s , we can limit ourselves to the single-scattering approximation, as obtained by expanding the Wilson lines in Eq. (2.5) up to second order in the target color fields A − (see Fig. 3). Writing S = 1 − T , one finds the dipole scattering amplitude in the 2-gluon exchange approximation as (recall that r = x − y) where in the second line we have used the MV model expression for the 2-point correlator of the color fields in a dense nucleus (with atomic number A and transverse area πR 2 A ), namely The quantity µ 2 represents the color charge squared of the AN c valence quarks (treated as uncorrelated color sources) per unit transverse area. The variable q that is integrated over in Eq. (2.12) is the transverse momentum transferred from the target to the dipole and Λ is an infrared cutoff (say, the confinement scale). The unit term within the square brackets corresponds to the case where the two exchanged gluons are attached to a same quark leg within the dipole, while the exponential e iq·r refers to attachments to both legs (see Fig. 3). For relatively small dipole sizes r 1/Λ, the integral over q ⊥ develops a transverse logarithm which can be isolated by expanding out the exponential to second order. This yields the final result shown in Eq. (2.12). When this result becomes of O(1), multiple scattering becomes important and the above approximation breaks down. This condition defines the target saturation momentum Q 0 at low energy: This simple calculation makes it clear that the scattering of a small dipole (1/r Q 0 ) is controlled by relatively soft gluon exchanges (Λ q ⊥ 1/r) with the target. Let us similarly compute the quark production, for a quark with large transverse momentum k ⊥ Q 0 . When taking the Fourier transform of T 0 (r), the unit term within the square brackets in Eq. (2.12) does not matter (this would describe an elastic scattering without net momentum transfer; see Fig. 4), whereas the exponential term there selects q = k. This is simply the expression of momentum conservation and confirms that one needs a hard (inelastic) scattering in order to produce a hight-k ⊥ quark. One thus finds 7 T 0 (k ⊥ ) = g 2 C F µ 2 /k 4 ⊥ , which is recognized as the Rutherford cross-section for the Coulomb scattering between the quark and the nucleus; therefore, We shall now study the high-energy evolution of the above results, in the double logarithmic approximation (DLA) which is appropriate for sufficiently small dipole sizes, or large k ⊥ . For the present purposes, it is convenient to work in a frame where this is viewed as projectile evolution; that is, the soft gluons belong to the wavefunction of the quark and they are all right movers.
In transverse coordinate space, the DLA corresponds to the splitting of the original dipole (x, y) into two daughter dipoles, (x, z) and (z, y), whose transverse sizes are much larger, but still small enough to undergo only single scattering: r z 1/Q 0 , withz ≡ |x − z| |z − y|. The respective evolution equation is obtained from the general BK equation (2.7) by (i) linearizing w.r.t. T = 1 − S (by itself, this step yields the BFKL equation), then (ii) approximating the dipole kernel as M xyz r 2 /z 4 , and (iii) keeping only the scattering amplitudes T (x, z) + T (z, y) 2T (z) for the two daughter dipoles, whose scattering is stronger (since T (r) ∝ r 2 in this physical regime). One thus finds (as before, we use y = ln(1/x) for the evolution 'time' of the projectile) Since T (z) ∝z 2 , the integral in the r.h.s. is clearly logarithmic. The first iteration of this equation, as obtained by evaluating its r.h.s. with the amplitude T 0 from Eq. (2.12), describes the first gluon emission by the parent dipole. The physical picture of this emission follows from the previous discussion: the original dipole with size r emits a relatively soft gluon with transverse momentum p ⊥ ∼ 1/z within the range Q 0 p ⊥ 1/r, which then suffers an even softer scattering off the nuclear target, with transferred momentum Λ q ⊥ p ⊥ (see Fig. 5 left). This picture extends to the whole gluon cascade generated by iterating Eq. (2.15): successive gluon emissions are strongly ordered not only in x but also in transverse momenta, and the final exchange with the target is even softer.
We now turn to the corresponding picture in transverse momentum space, that is, to the problem 1/Q s . The daughter gluon is typical soft and thus emitted at a large distance |z − x| |z − y| r from the parent dipole. The gluon exchange q with the nuclear target is even softer. Right: one step in the DLA evolution of the cross-section for quark production, at large transverse momentum k ⊥ Q s . The primary gluon is as hard as the produced quark and they are both much harder than the gluon exchanged with the target: k ⊥ p ⊥ q ⊥ = |k + p|. The other diagrams contributing to this process at the level of the amplitude are shown in Fig. 6. of quark production (see Fig. 5 right and also Fig. 6). The momentum-space DLA equation reads where the integral in the r.h.s. is indeed logarithmic, since T (q ⊥ , y) ∝ 1/q 4 ⊥ . Within this integral, T (q ⊥ , y) should be interpreted as the cross-section for a single scattering, with transferred momentum q ⊥ , between partons in the quark wavefunction and the target. The factorᾱ s /k 4 ⊥ in front of the integral does not represent anymore a t-channel exchange with the target, as in Eq. (2.14), but rather it comes from the propagator of the intermediate quark, or gluon, in the s-channel (see Fig. 6). Hence, the physical picture of the first emission is now as follows: the original quark with zero transverse momentum emits a gluon with momentum p and turns into a final quark with momentum k, while at the same time receiving a momentum transfer q from the target (via a scattering that can occur either before, or after the splitting). Transverse momentum conservation requires q = k + p. But the overall cross-section, as described by Eq. (2.16), favors soft scattering, with transferred momenta q ⊥ k ⊥ . Accordingly, the first emitted gluon must be hard, p ⊥ k ⊥ , to balance the momentum of the produced quark.
As for the subsequent gluon emissions, starting with the second one, they follow the standard DLA ordering, in both x and p ⊥ , as in the respective calculation in coordinate space, cf. Eq. (2.15). This argument too shows that, when computing particle production, it is quite natural to associate the primary gluon with the wavefunction of the produced particle, whereas the other gluons are more conveniently included in the gluon distribution of the target, as measured by the hard splitting process. While natural already at LLA, this viewpoint becomes almost unavoidable when moving to the nextto-leading order calculation, where the primary gluon is also allowed to have a large longitudinal momentum p + ∼ q + 0 . The NLO calculation will be discussed in the next section.

Next-to-leading order
In order to move on to next-to-leading order (NLO) accuracy, one must relax some of the previous approximations and add new contributions which start at NLO. By inspection of the LO result (2.3), it is clear that one ingredient required in that sense is the NLO version of the B-JIMWLK (or BK) equations [32][33][34], together with their all-order 'collinear' resummations [37,[44][45][46]. This in particular means that some gluon emissions must be computed beyond the eikonal approximation: besides the effect of order α s Y , which dominates at high energy, one must also keep, for each such an emission, the 'pure-α s ' corrections which are not enhanced by the rapidity logarithm Y (but may be accompanied by transverse logarithms). So long as these NLO corrections refer to generic gluons inside the cascade, they can be absorbed into a renormalization of the kernel of the evolution equation. The same is true for the quark-antiquark loop which at NLO can be inserted within any of the gluon lines. But the NLO corrections associated with the 'primary gluon' (the very first emission by the leading quark) must rather be used to renormalize the 'impact factor', i.e. the value of the cross-section in the absence of high-energy evolution. At LO, the impact factor is the cross-section for the inelastic scattering between the leading quark and the low energy nucleus (say, as described by the MV model). Equivalently (to the accuracy of interest), it can be written as the S-matrix for the elastic scattering of a qq dipole. At NLO, one must add the impact factor encoding the inelastic scattering of the quark-gluon pair made with the leading quark and the primary gluon. Unlike the emission of the primary gluon, which must be computed exactly, the scattering between the quark-gluon pair and the target can still be computed in the eikonal approximation and thus related to elastic scattering amplitudes for color multipoles [2,63,65].
So, it may look like, in order to compute quark production at NLO, one must dress the two contributions to the impact factor aforementioned with the high-energy evolutions of the respective scattering amplitudes (themselves computed at NLO) and then add the results. But a moment of thinking reveals that the two pieces of the impact factor mix with each other under the high-energy evolution: a part of the primary gluon emission that we have explicitly included in the NLO impact factor is also included (within the limits of the eikonal approximation) as the first small-x gluon in the evolution of the dipole S-matrix from the LO cross-section (2.3). This is the problem of overcounting. Previous papers in the literature [1,2] proposed a solution to this problem, in the form of a 'plus' prescription which subtracts the LO evolution from the NLO impact factor. This prescription however appears to be responsible for the problem with the negativity of the cross-section discussed in the Introduction. In what follows, we shall propose a different way to organize the calculation, which avoids the over-counting without performing any subtraction. Our strategy will naturally exploit the structure of perturbation theory at high energy. As we shall see, the contribution to the cross-section which includes the NLO correction to the impact factor does also encode, completely and faithfully, the LO evolution of the dipole S-matrix. Hence, by computing this contribution as it stands, one can simultaneously include both effects without any ambiguity, or over-counting. On top of that, there is a NLO correction to the evolution of the color dipole; this will be clearly identified and related to recent results concerning the NLO version of the BK equation [32] and its collinear resummations [37,45].

Revisiting the NLO calculation by Chirilli, Xiao, and Yuan
In this subsection, we shall exhibit, discuss, and adapt to our present purposes the result of the NLO calculation of the impact factor by Chirilli, Xiao, and Yuan [1,2]. First, we shall display their 'bare', or 'unsubtracted', result, where the soft divergence 8 at x → 0 is explicit. Then we shall briefly mention the 'plus' prescription advocated in Refs. [1,2] in order to subtract the rapidity divergence. (We shall return to this point in Sect. 3.5.) Finally, we shall explain our strategy to deal with this problem, which is to use kinematical constraints like energy conservation in order to cut off the soft divergence and at the same time fix the rapidity variables for the evolution of the dipole S-matrices. The only subtle point here is the treatment of the virtual corrections, where the phase-space for the emission of the primary gluon is not directly constrained by the kinematics. Yet, as we shall demonstrate via explicit calculations (in Appendix A), the same lower limit on x applies in that case too, albeit its emergence is now dynamical.
The NLO result in Refs. [1,2] has been obtained by evaluating Feynman graphs like that illustrated in Fig. 7 in which the emission of the primary gluon is treated exactly. There is a similar graph where the gluon emission occurs after the scattering between the quark and the target. And there are of 8 We recall that x = p + /q + 0 is the longitudinal momentum fraction of the primary gluon relative to the incoming quark. In Refs. [1,2], one has rather used the variable ξ ≡ 1 − x, hence our 'soft divergence' at x = 0 appears there as the 'rapidity divergence' at ξ = 1. To facilitate the comparison, in this section we shall use both notations, x and ξ. course virtual graphs, whose evaluation is somewhat subtle as just mentioned and that we shall deal with in some detail. (See Figs. 8 and 9 below for more examples of Feynman graphs.) After the scattering, both the quark and the gluon will fragment into hadrons and thus contribute to singleinclusive hadron production. There is also another channel where the original collinear parton is a gluon, which splits into a pair of gluons, or into a quark-antiquark pair, in the process of scattering. As before, we shall omit the discussion of the fragmentation process and concentrate on quark production alone (see Refs. [1,2] for a complete discussion and also [50] for an alternative calculation, whose precise relation to the original results in [1,2] is still unclear). That is, the primary gluon is not measured, so one needs to integrate out its kinematics -the longitudinal momentum fraction x = p + /q + 0 and the transverse momentum p.
The NLO result in Refs. [1,2] can be conveniently written as the sum of 2 pieces 9 : (A) A piece proportional to the quark Casimir C F which develops no logarithm at small x (the respective integrand vanishes as x → 0), but has collinear divergences in the transverse momentum integrations. In [1,2], these divergences have been isolated with the help of dimensional regularization and reabsorbed into the leading-order DGLAP evolution of the quark distribution function q(x p ) (if the primary emission occurs prior to scattering) and of the quark-to-hadrons fragmentation function (if the emission occurs after the scattering). This prescription leaves a finite remainder of NLO order whose explicit evaluation poses no special problem.
(B) A piece proportional to the gluon Casimir N c which is free of collinear problems but develops a logarithm at small x (the respective integral over x exhibits a logarithmic divergence at x → 0 in the absence of any physical regulator). The proper way to deal with this 'rapidity divergence' at small x represents our main concern in this paper. To better focus on this problem while avoiding cumbersome notations, we shall omit the piece proportional to C F in what follows. (This piece can be easily added to our main result shown in Eq. (3.7) below.) As for the second piece, proportional to N c , we start by displaying the original result, as presented in Refs. [1,2] : where ξ ≡ 1−x and the two functions J (k, ξ) and J v (k, ξ) correspond to real and virtual contributions to the process illustrated in Fig. 7. They read (our present notations are slightly different from the original ones Refs. [1,2], but follow closely the recent paper [56]) and respectively As before, the dipole S-matrices like S(k) or S(q) refer to dipoles in the fundamental representation (cf. footnote 9). To simplify writing, we have considered the large N c limit, in which the scattering  of a system of two dipoles factorizes as the product of two individual dipole S-matrices, but this limit is not essential for what follows. The variables q and which appear in the above integrations represent transverse momenta exchanged between the target and the quark-gluon pair. For what follows, it is important to understand their precise meaning and notably their relation with the transverse momentum p taken by the primary gluon. By following the derivation of these results in Refs. [1,2], one can check that q = p + k whereas is independent of p. For more clarity, let us briefly discuss the physical interpretation of the various terms in Eqs. (3.2) and (3.3).
The 'real' terms in Eq. (3.2) represent processes where the primary gluon, albeit not measured, is released in the final state (see Fig. 8). For such processes, longitudinal momentum conservation implies ξ = k + /q + 0 . The first term in Eq. (3.2), which is linear in S(q), represents situations where the hard splitting occurs either after the collision, or prior to it, in both the DA and the CCA. In these cases, the gluon either does not interact with the target at all (emissions after the collision), or the effects of its interaction cancel out from the final result, by unitarity, because the gluon is not measured (emissions before the collision). Accordingly, there is only one dipole S-matrix, S(q), which physically describes the inelastic scattering of the quark. This scattering transfers a non-zero transverse momentum q to the quark; then momentum conservation implies q = p + k, as aforementioned.
The second term in Eq. (3.2), bilinear in the dipole S-matrix, corresponds to interference processes, where the primary gluon is emitted prior to scattering in the direct amplitude (DA) and after the scattering in the complex conjugate amplitude (CCA), or vice-versa. In such processes, both the quark and the gluon can participate in the collision. At large N c , this yields 2 dipole S-matrices: one made with the quark in the DA and the antiquark piece of the gluon, the other one with the quark piece of the gluon and the antiquark in CCA. One of these S-matrices, denoted as S( ) in (3.2), describes the elastic scattering of a physical dipole -i.e. a dipole whose both fermion legs exist on the same side of the cut (either in the DA, or in the CCA). For this elastic scattering, there is no net transfer of transverse momentum; e.g., if S( ) is computed in the 2-gluon exchange approximation, then the momentum transferred by the first exchanged gluon towards the dipole is subsequently taken back by the second exchanged gluon. The other dipole S-matrix, S(q), describes an inelastic scattering with net momentum transfer q = p + k.
Consider similarly the 'virtual' contributions encoded in (3.3) (see Fig. 9). In that case, the primary gluon is both emitted and reabsorbed on the same side of the cut, hence the momentum k of the produced quark fully comes via inelastic scattering (and k + = q + 0 ). In the first term in (3.3), the gluon fluctuation has no overlap with the target, hence the (inelastic) scattering refers to the quark alone. In the second term, the gluon can scatter too. Accordingly, this term involve 2 dipole S-matrices, one describing an elastic scattering (S( )), the other one an inelastic one (S(k)).
The following observations will be useful for the subsequent arguments: (i) In Eq. (3.1) one recognizes the full LO DGLAP quark-to-quark splitting function P qq (ξ), in line with the fact that the gluon emission has been treated exactly, and not in the eikonal approximation.
(ii) In Eqs. (3.2) and (3.3), the splitting fraction ξ is visible only in the various kernels describing the transverse momentum structure of the hard splitting, which in turn have been generated by combining the light-cone energy denominator with factors coming from the splitting vertex.
(iii) The various dipole S-matrices in Eqs. (3.2) and (3.3) are supposed to describe scattering off the nuclear gluon distribution evolved up to the right 'rapidity' (Y = ln(1/X)) scale, but this scale is left unspecified in the above equations. For the 'real' contributions at least, we know by now what is the typical longitudinal momentum fraction X of the gluons from the target which are probed by this scattering: this is the value X(x, p ⊥ ) given by Eq. (2.8). Hence, the dipole S-matrices in Eq. (3.2) must be evaluated at X X(x, p ⊥ ), where it is understood that p = q −k. We shall later demonstrate that X(x, p ⊥ ) with p = q − k is also the appropriate choice for the rapidity argument of S-matrices which enter the 'virtual' terms in Eq. (3.3). This means that, strictly speaking, one cannot factorize the S-matrix S(k) in front of the integrals in Eq. (3.3), in contrast to the results in [1,2].
(iv) The integral over ξ in Eq. (3.1) seems to develop a logarithmic singularity at ξ = 1, meaning an infrared divergence associated with the emission of very soft (x → 0) gluons. (This is the meaning of the upper label 'unsub' in the l.h.s. of Eq. (3.1).) As already mentioned, Refs. [1,2] proposed to eliminate this divergence via the 'plus' prescription, defined as (for a generic function F (ξ)) After this subtraction, the result in Eqs. (3.1)-(3.3) is supposed to represent a purely NLO correction, to be added to the respective LO result in Eq. (2.3). We shall further discuss this particular prescription in Sect. 3.5, but already at this level it should be clear that, as a matter of facts, there is no physical singularity in Eq. (3.1): for the 'real' terms at least, the integral over ξ is cut off near ξ = 1 by energy conservation, cf. Eq. (2.8). Specifically, by using Eq. (2.8) together with the kinematical limit X(x, p ⊥ ) ≤ 1, one finds the following lower limit on x = 1 − ξ: where we also used x m 1. Still for the 'real' terms, there is also an upper limit x ≤ 1 − x p , coming from the condition x p /(1 − x) ≤ 1 on the longitudinal fraction of the incoming quark.
This lower limit x x m , can be recognized as the condition that the lifetime ∆x + ∼ 2p + /p 2 ⊥ of the softest primary gluon emission be at least as large as the longitudinal width 1/P − of the target (a necessary condition for having significant scattering). This condition has been previously emphasized in [50] (the 'Ioffe time') and numerically implemented in [57] (where however the dependence of the various S-matrices upon the target rapidity X(x, p ⊥ ) has not been taken into account).
The existence of a physical lower limit on x is indeed crucial for our subsequent construction, which will not involve the 'plus' prescription, or any other infrared regularization of the integral over x. It is therefore important to demonstrate that such a limit exists also for the 'virtual' terms in Eq. (3.3), for which the previous argument on energy conservation does not apply. We shall do that in Appendix A, where we demonstrate that the same lower limit x x m holds also for the 'virtual' terms, as a consequence of fine cancellations among the virtual gluon graphs which occur in the complementary region at x < x m . Whereas mathematically subtle, the occurrence of such cancellations has a clear physical interpretation: gluon fluctuations with x < x m cannot interact with the target, since their lifetime is too short. Accordingly the emissions of such short-lived gluons cannot modify the S-matrix of the projectile. Since 'real' emissions with x < x m are anyway forbidden by energy conservation, it follows that the respective 'virtual' graphs must cancel among themselves. The precise way how such cancellations occur is demonstrated in Appendix A (for the somewhat simpler, but general enough, situation where the target itself is a quark).
(v) If one takes the limit ξ → 1 (i.e. x → 0) in the transverse kernels in Eqs. (3.2) and (3.3) (but not necessarily also in their implicit dependence upon x via the rapidity cutoff X(x, p ⊥ )), then the combination of these two terms which enters Eq.

CGC factorization at NLO
After this preparation, we are now in a position to present our master formula for the single-inclusive quark multiplicity valid through next-to-leading order (i.e. which includes both the LO and the NLO contributions). The relation between this formula and the factorization scheme proposed in Refs. [1,2] will be discussed later, in Sect. 3.5. As already mentioned, we systematically omit the NLO corrections proportional to the quark Casimir C F , which play no special role for the high-energy evolution. These corrections can be taken over from Refs. [1,2] and simply added to our master formula, which reads This formula is illustrated in Fig. 10. As before, S 0 (k) denotes the tree-level contribution to the dipole S-matrix, say as given by the MV model (see e.g. [17][18][19][20] for an explicit expression). The quantity ∆S(k, X g ) within the square brackets denotes a NLO correction to the dipole S-matrix, to be specified in Sect. 3.4. The last term in Eq. (3.7), which involves a double integration -over the longitudinal fraction x = 1 − ξ and the transverse momentum p of the primary gluon -is the main term for our present purposes. It encodes the impact factor to NLO accuracy, the LO evolution of the dipole S-matrix and also a part of the respective NLO contribution (namely, the part which is not included in ∆S(k, X g ); see Sect. 3.4 for details).
The new functionsJ k, x; p, X(x, p ⊥ ) andJ v k, x; p, X(x, p ⊥ ) are essentially the integrands in Eqs. (3.2) and respectively (3.3), in which we replaced ξ → 1 − x and q → p + k. (As compared to (3.2) and (3.3), we now use p and as integration variables; the integral over p is explicit in Eq. (3.7), while that over is included in the definitions ofJ andJ v .) The notation emphasizes that the various dipole S-matrices implicit in these functions should be evaluated at a target rapidity Y = ln(1/X) with X = X(x, p ⊥ ), cf. Eq. (2.8). The lower limit x m (p ⊥ ) is shown in Eq. (3.5). In the real term, it is understood that the support of the quark distribution limits the integration to x < 1 − x p .
For more clarity, let us exhibit here the functionJ k, x; p, X(x, p ⊥ ) which enters the 'real contribution (the corresponding expression forJ v can be similarly written): It is perhaps interesting to notice that the linear combination P ≡ (1 − x)p − xk which appears in the above integrand is the momentum conjugated to the transverse separation x − z between the quark and the primary gluon. Similarly, the total momentum q ≡ k + p is conjugated to the center-of-mass The LLA limit of Eq. (3.7) is obtained by making approximations appropriate at small x, that is, by treating the emission of the primary gluon in the eikonal approximation and by replacing p ⊥ ∼ k ⊥ in the kinematical limits and the various rapidity variables; that is, one approximates x m (p ⊥ ) x g = k 2 ⊥ /ŝ and X(x, p ⊥ ) X(x) = X g /x, with X g = x g (cf. Eq. (2.9)). Also, all dipoles S-matrices are now understood to obey the LO BK evolution, from X 0 1 down to X(x). Under these assumptions, one can first commute the integrations over p and x in Eq. (3.7) and then use the identity (3.6) to rewrite the simplified version of this equation in the following, suggestive, form The above integral runs over the longitudinal momentum fraction x = p + /k + of the right-moving gluon, whereas the evolution of the dipole S-matrix has been rather performed w.r.t. the longitudinal fraction X of the gluons in the target. However, to LLA, x and X are related via X(x) = X g /x, so one can change the integration variable from x to Y ≡ ln(1/X) and thus identify a total derivative  .7) can be viewed as the generalization of that integral representation to NLO and to the exact kinematics for the primary gluon emission. (This will be confirmed by the discussion in Sect. 3.3.) The explicit separation of the first gluon emission from the remaining evolution, as operated by this representation, has allowed us to promote the calculation of the impact factor to NLO accuracy, while at the same time avoiding over-counting.
At this point, it is important to more precisely specify the perturbative content of Eq. (3.7). As just explained, the integral term there fully encodes the LO evolution of the dipole S-matrix, that is, it resums corrections of the type (ᾱ s Y g ) n to all orders. It obviously encodes NLO corrections due to the fact that the emission of the primary gluon is treated exactly; that is, the integral over x also generates corrections of O(ᾱ s ) besides the dominant contribution of O(ᾱ s Y g ), which counts for the LO evolution. To ensure NLO accuracy, the evolution of the various dipole S-matrices must be computed to NLO as well. Indeed, the NLO BK kernel includes corrections of O(ᾱ s ); hence, the solution to the NLO BK equation involves corrections proportional toᾱ s (ᾱ s Y g ) n , which count to NLO.
By a similar argument, one must include the (one-loop) running coupling corrections within the QCD couplingᾱ s associated with the primary gluon vertex. This is why, in writing Eq. (3.7), we have inserted the factor α s inside the double integral over p and x: after including the running coupling corrections, this factor will depend upon the transverse momenta k and p which enter the emission vertex and possibly also upon x (via the gluon kinematics). Specifying this dependence requires a prescription, which is most conveniently formulated in the transverse coordinate representation (since this is the representation in which the BK equation is generally solved in practice). Such prescriptions will be discussed in Sect. 3.4 (see e.g. Eq. (3.16) there), together with the other NLO corrections to the dipole evolution. Notice however that, in order to evaluate Eq. (3.7), one also needs a prescription for the running coupling which is directly formulated in momentum space. In general such a prescription will be different from the one in coordinate space. This mismatch could have consequences for the fine-tinning issue to be discussed in Sect. 3.5.
The above arguments show that, strictly speaking, the integral term in Eq. (3.7) also includes terms of NNLO, as generated by the product between the NLO correction to the impact factor and the NLO effects in the high-energy evolution, or in the running coupling. As we shall explain in Sect. 3.5, these various types of NLO effects can be disentangled from each other via a reorganization of the perturbation theory which involves a 'rapidity subtraction', as in Refs. [1,2]. Yet, this procedure has some inconveniences, as we shall see (notably, it introduces the 'fine-tuning' issue anticipated in the Introduction). So it is important to stress here that, although going beyond a strict NLO approximation, the result (3.7) is in fact the natural outcome of perturbative QCD -that is, the direct result of evaluating Feynman graphs at the loop-order of interest, before performing additional manipulations like the 'rapidity subtraction'.
We conclude this subsection with a discussion of potential difficulties with using Eq. (3.7) in practice. All these issues will be addressed in more detail in the next two subsections, where we shall provide solutions to them, at least at the expense of further approximations.
First, Eq. (3.7) looks very cumbersome, notably due the intricacy of the multiple integrations, over both transverse (p, ) and longitudinal (x) momenta, which are entangled with each other. It is not clear to us whether these integrations can be computed as such, not even numerically. The calculations might be further complicated by the need to compute the Fourier transform of the dipole S-matrix, as numerically obtained by solving the BK equation in coordinate space.
Second, Eq. (3.7) is somewhat formal, in that the dipole S-matrices implicit there are supposed to encode the evolution of the target gluon distribution at NLO. However, the high-energy evolution of a dense nucleus has not been explicitly computed beyond LO. (All NLO calculations to date refer to the evolution of a dilute projectile, like a dipole [32][33][34].) Besides, a purely NLO approximation to the high-energy evolution is likely to become unstable (and thus require resummations) in the 'collinear' regime where the transverse momentum k ⊥ is relatively large (k ⊥ Q s ). Finally, the NLO calculation based on Eq. (3.7) is formally sensitive to the physics of the nuclear wavefunction at large values of X (recall that x ∼ x g corresponds to X ∼ 1), which is not really under control within the present, high-energy approximations. This cannot be a serious difficulty in the physical context at hand: the NLO corrections to the impact factor are controlled by relatively hard primary gluons with x ∼ O(1) and hence X(x) 1; such a hard emission by the projectile should be well separated from the valence structure of the target at X ∼ 1. At the end of the next subsection we shall describe an explicit procedure which implements this separation.

Simplifying the kinematics
In this subsection, we shall propose strategies to deal with some of the problems alluded to at the end of the previous subsection. First, we shall argue that one can approximate p ⊥ ∼ k ⊥ within the rapidity variables for the high-energy evolution and thus greatly simplify the structure of the transverse and longitudinal integrations in Eq. (3.7). Second, we shall discuss the prescription for the running of the coupling in the emission of the primary gluon. Third, we shall reformulate the initial condition at low energy in such a way to reduce the sensitivity to the large-X region in the target wavefunction.
Concerning the first point -the dependence of the kinematical constraints on the high-energy evolution upon the transverse momenta of the primary quark-gluon pair -, we note that there are two interesting physical regimes: (i) Hard quark production and di-jet configurations: k ⊥ Q s (X). This is the regime which is primarily concerned by the negativity problem discussed in Refs. [50,51,[53][54][55][56][57]. In this case, we have already argued in Sect. 2.4 that the transverse momentum of the primary gluon and that of the produced quark must balance each other : p ⊥ k ⊥ .
(ii) Semi-hard quark production: k ⊥ Q s (X). This regime includes the 'geometric scaling' window [59], where the scattering is weak, but the S-matrix is still influenced by non-linear effects, via the 'saturation boundary' at Q s (X) [60][61][62]. In this regime, all the relevant transverse momenta -the k ⊥ of the produced quark, the p ⊥ of the primary gluon, and the q ⊥ transferred by the nuclear target -take typical values of O(Q s ), since this is the value naturally acquired via rescattering off the saturated gluon distribution of the target.
We see that in both cases the quantities p ⊥ and k ⊥ cannot be very different from each other, so one can approximate p ⊥ k ⊥ when evaluating the rapidity variables (cf. Eqs. (2.8) and (3.5)) : Remarkably, thanks to this approximation, we have returned to the same expressions for the rapidity variables x g and X(x) as at LO, cf. Eq. (2.8). This greatly simplifies Eq. (3.7) since the transverse momentum integrations can now be performed prior to the integral over x. Then Eq. (3.7) takes a form closer to that in Eq. (3.1), namely, where the functions J k, x; X(x) and J v k, x; X(x) have the same formal expressions as in Eqs. (3.2)-(3.3) [with ξ = 1−x, of course], except for the fact that the rapidity variable for the evolution of the dipole S-matrices is now clearly identified, namely Y = ln 1/X(x) . Once again, it is understood that the integral over x in the real term is restricted to x < 1 − x p . In writing Eq. (3.12), we have also identified the argument of the running coupling for the primary vertex as k 2 ⊥ . This is unambiguous under the present assumptions, since k ⊥ ∼ p ⊥ is the only hard scale involved in that splitting 10 . Eq. (3.12) also shows that the natural 'LO approximation' for the highenergy evolution in the problem at hand is the LO BK equation with running coupling (rcBK). Indeed, when evaluating the second term in Eq. (3.12) within the eikonal approximation, as appropriate for x 1, the r.h.s. of this equation becomes proportional to the integral version of rcBK; that is, this is tantamount to evaluating the LO formula (2.3) with the solution S rcBK (k, X g ) to rcBK.
The considerably simpler structure of Eq. (3.12) also allows us to reformulate the initial condition at low-energy, in such a way to avoid the 'dangerous' region at X ∼ 1. To that aim, let us introduce a 'better' separation scale X 0 for the rapidity evolution of the target, which is such that the high-energy approximations are indeed justified for any X ≤ X 0 . (For instance the value X 0 = 0.01 is often used in the fits to the HERA data for deep inelastic scattering; see e.g. [45,73].) In particular, this scale should obey x g X 0 1. We then separate the integral over x into two regions, x g < x < x g /X 0 and x g /X 0 < x < 1, which in terms of X = X g /x correspond to 1 > X > X 0 and respectively X 0 > X > X g . (We recall that X g = k 2 ⊥ /ŝ = x g .) Within the first region, one has x 1, hence one can replace x → 0 within the transverse kernels which enter the functions J and J v [e.g., J k, x; X(x) → J k, 0; X(x) ] and also within the quark distribution.
Under these assumptions, the sum between the 'initial condition' S 0 in Eq. (3.12) and the part of the integral there which runs over the small-x interval at x g < x < x g /X 0 is formally the same as the r.h.s. of the LO BK equation with running coupling (rcBK) integrated from X = 1 down to X 0 (recall Eq. (3.6)). It might be tempting to interpret this sum as the solution to rcBK evaluated at X = X 0 , but we shall not adopt this point of view: after all, this 'rcBK evolution' refers to the large-X interval at 1 > X > X 0 , where the high-energy approximations are not valid. We shall rather replace the result of this fictitious 'rcBK evolution' with a new initial condition, denoted as S(k, X 0 ), which is formulated directly at X 0 . That is, we replace Eq. (3.12) by 11 where it is now understood that all the S-matrices implicit in the second term in the r.h.s. are obtained by solving appropriate evolution equations with the initial condition formulated at X = X 0 . The evolution equations to be used in this context will be discussed in the next subsection. 10 Recall that the momentum conjugated to the transverse separation between the quark and the primary gluon is P = (1 − x)p − xk ; when p ⊥ ∼ k ⊥ , we have P ⊥ ∼ k ⊥ for any x, hence the transverse separation is of order 1/k ⊥ . 11 Strictly speaking, the integral over x implicit within the NLO quantity ∆S(k, Xg), as visible e.g. in Eq.

The evolution of the color dipole beyond leading order
In this subsection, we shall describe the NLO evolution of the dipole S-matrices to be used in conjunction with the factorization scheme in Eq. (3.7) or (3.12). In particular, we shall present an explicit expression for the NLO correction ∆S which enters this scheme but so far has not been specified. To simplify the arguments and the notations, we shall work at the level of the kinematic approximations introduced in Sect. 3.3, that is, we shall built on top of Eq. (3.12). There are two aspects which are rather subtle and that we shall try to clarify in what follows. The first refers one to the proper inclusion of NLO corrections within ∆S without any over-counting: this quantity must contain only those corrections to the high-energy evolution that are not already included in the integral term in Eq. (3.12). The second aspect refers to the relation between the viewpoint of target evolution, that was explicitly used in our previous arguments (for instance, when specifying the rapidity arguments of the various S-matrices), and that of the evolution of the projectile (a color dipole), for which the evolution equation is currently known at NLO accuracy [32], including the collinear improvement [37,[44][45][46]. Indeed, the two evolutions refer to different variables ('evolution times') -x = q + /q + 0 for the right-moving projectile and respectively X = q − /P − for the leftmoving target -, so the corresponding equations cannot be identical beyond leading order, when the differences between various transverse momenta and also the off-shell effects start to play a role.
As mentioned after Eq. (3.7), the unknown quantity ∆S(k, X g ) represents a part of the NLO corrections to the evolution of the dipole S-matrix. Since these corrections are fully known by now [32], the simplest way to obtain ∆S(k, X g ) is by clarifying its relation to the NLO calculation in Ref. [32]. We first observe that, by definition, the full NLO result for the quark multiplicity must involve two types of NLO corrections: those associated with the impact factor and those related to the high-energy evolution of the color dipoles. Hence, if one 'switches off' the first type of corrections (which one can do by computing the emission of the primary gluon in the eikonal approximation), then the r.h.s. of Eq. (3.12) should be proportional to S(k, X g ) -the dipole S-matrix computed at NLO accuracy and for the kinematics of interest. This argument implies where the function J k, 0; X(x) is obtained from Eq. (3.2) by letting x ≡ 1−ξ → 0 and by evaluating the S-matrices there to NLO accuracy and for a rapidity argument Y = ln(1/X(x)) [and similarly for the function J v k, 0; X(x) ]. Using this condition together with the NLO result for the dipole S-matrix which emerges from Ref. [32], it is possible to identify the quantity ∆S(k, X g ). By the 'NLO result for S', we mean the integral representation for the NLO S-matrix, as obtained by formally solving the respective evolution equation -that is, the generalization of Eq. (2.10) to NLO. Before we proceed, it is useful to, first, clarify what we precisely mean by the 'LO evolution' and, second, introduce some simplified notations, which focus on the essential and help making the subsequent arguments more transparent.
As already mentioned after Eq. (3.12), our LO approximation to the dipole S-matrix is the solution S rcBK (k, X) to the LO BK equation with running coupling (rcBK) [68][69][70]. This choice deserves some comment, in that it already includes a subset of the NLO (and higher-order) corrections, via the running of the coupling. But precisely because of that, rcBK offers a better starting point for a perturbative expansion than the strict leading-order approximation -the LO BK equation with fixed coupling. This is related to the poor convergence of the perturbative expansion at high energy: the LO BK equation with fixed coupling is well known to predict an unrealistically fast evolution with increasing energy, meaning a too large value for the saturation exponent. Hence, for sufficiently high energies, the strict LO estimate (2.3) for the multiplicity becomes exponentially larger (in the sense of an exponential in Y = ln(1/X)) than the actual result at NLO. This problem is considerably alleviated if one instead uses rcBK as the 'leading order' evolution: this approximation predicts a significantly slower evolution [61,74] and offers a reasonably good description for the phenomenology [12,73,75].
The all-order resummation of running coupling corrections requires a prescription. Here, we shall mention two popular such prescriptions, both built with the one-loop approximation for the running coupling and which are roughly equivalent to each other (see Ref. [45] for a recent discussion). Consider the splitting of the parent dipole (x, y) into two daughter dipoles (x, z) and (z, y), as described by the LO BK equation (2.7). The smallest dipole prescription consists in replacing 12ᾱ s →ᾱ s (r min ), where r min ≡ min |x−y|, |x−z|, |y−z| and The other prescription, known as fastest apparent convergence (fac), amounts toᾱ s →ᾱ fac , with This last prescription is particularly useful for what follows, in that it simplifies the expression that we shall obtain for ∆S. After a Fourier transform to momentum space, the solution to rcBK can be given the following integral representation: where the factorization of the running couplingᾱ s (k 2 ⊥ ) was possible because of our assumption that k ⊥ is sufficiently hard (recall the discussion after Eq. (3.12)). The functions J rcBK k, 0; X(x) and J v,rcBK k, 0; X(x) are obtained from the respective functions in Eq. (3.14) after replacing S → S rcBK .
We now introduce more schematic notations, as anticipated. Specifically, let us ignore (just in our notations) the transverse momentum convolutions, the inessential numerical factors, and the nonlinear structure of functions like J w.r.t. the dipole S-matrix. Also, in writing the cross-section, we shall omit the quark distribution function. That is, we shall rewrite Eq. (3.12) simply as where the kernel K(x) encodes all the momentum space variables and convolutions (but no dipole S-matrix) and S X(x) succinctly denotes all the factors involving the dipole S-matrix, which could be either linear, or bi-linear, in S. Both 'real' and 'virtual' contributions are implicitly added in the above integral. With these new notations, Eqs. (3.14) and (3.17) become   and respectively where the kernel is now evaluated at x = 0 (or ξ = 1), that is, in the eikonal approximation. Clearly, the compact notation K(0) stays for the LO BK (or dipole) kernel. In particular, the rcBK equation (3.20) is graphically illustrated in Fig. 11. These simpler notations hopefully make clear that the quantity denoted as ∆S must encode all NLO corrections to the BK kernel except for those expressing the running of the coupling. These corrections should be computed in the large-N c limit, for consistency with our previous approximations. They can be inferred by inspection of the NLO version of the BK equation shown in Eq. (5) of Ref. [32]. For convenience, we display the large-N c version of this equation in Appendix C, where we also discuss its collinear improvement, along the lines of Refs. [37,45]. For simplicity, we shall stick here to our schematic notations and refer to Appendix C for more explicit formulae.
The integral version of the NLO BK equation reads (in momentum space and adapted to the kinematics at hand) where K 2 (0) is a compact notation for the NLO piece of the BK kernel alluded to above: its action on S generates all the NLO terms visible in the r.h.s. of Eq. (C.1) except for the running coupling corrections, which are explicitly included in the middle term of Eq. (3.21). More precisely, if one uses the prescription (3.16) for the running coupling, then in constructing K 2 (0) one should omit the NLO terms in Eq. (C.1) which are proportional tob (but keep all the other ones). Besides, the large NLO corrections enhanced by double or single collinear logarithms require all-order resummations ('collinear improvement'), to be shortly described (see also the discussion in Appendix C). By comparing Eqs. (3.19) and (3.21), it is now obvious that ∆S(X g ) must be identified with the third term in the r.h.s. of Eq. (3.21), that we now describe in some detail. As a general rule, the NLO corrections to the BK kernel are obtained by evaluating all the 2-loop graphs which yield a contribution of order α 2 s Y and hence count for a single step in the high-energy evolution (see Ref. [32] for an exhaustive list of diagrams and explicit calculations). A typical such a graph involves a sequence of two gluon emissions, whose longitudinal fractions x 1 and x 2 obey x 1 ∼ x 2 1; that is, the two gluons have similar rapidities, but they are both soft compared to the projectile. The integration variable x in the last term in Eq. (3.21) can be interpreted as x ≡ x 1 + x 2 . The other independent rapidity integration, say over the variable u ≡ x 2 /(x 1 +x 2 ) with 0 < u ≤ 1, is implicitly included in the structure of the NLO kernel K 2 (0). This is possible since the scattering between the 2-gluon system and the target is independent of u to the accuracy of interest 13 . A rather schematic, but intuitive, graphical illustration of Eq. (3.21) is shown in Fig. 12.
At large N c , two successive gluon emissions from the original qq dipole can generate up to three dipoles in the fundamental representation. Accordingly, the third term in Eq. (3.21) involves contributions which are cubic in the dipole S-matrix, together with quadratic and linear terms. All these contributions are visible in Eq. (C.1). To the accuracy of interest, the kinematics of the primary gluon (with energy fraction x = x 1 + x 2 ) can be treated in the LLA. This explains why we were able to use the same lower limit x g in the integral over x and also the same rapidity variable X(x) for the relevant S-matrices as in the middle term in Eq. (3.21), which encodes the LO evolution.
To conclude, the result for the quark multiplicity which is complete to NLO accuracy can be compactly, but schematically, written as 22) and is illustrated in Fig. 13. To deduce a more explicit expression, one should use Eq. (3.12) together with the formula for ∆S shown in Appendix C. Although the above factorization scheme is formally correct to NLO, it might be still too hard, if not impossible, to achieve a full NLO accuracy in a practical calculation. Indeed, as already mentioned, the target evolution is not known to NLO and the relation between the function S X(x) which enters the above factorization and the solution S(x) to the NLO BK equation for the projectile is not known 13 The double integral over x and u also generates contributions of order (αsY ) 2 , which count for 2 steps in the LO evolution. Such contributions are exlicitly subtracted in Ref. [32], via the 'plus' prescription, since already generated via 2 iterations of the middle term in Eq. (3.21). Figure 13: A graphical illustration of the factorization of quark production at NLO, as schematically encoded in Eq. (3.22). The primary gluon emission, with kernel K(x), is included with exact kinematics. The evolution step depicted as two-gluon emission with kernel K 2 (0) succinctly represents the NLO piece of the BK kernel (but without the running-coupling corrections, which were already included in the middle term). Non-linear effects in S, corresponding to multiple scattering, are implicitly included but not explicitly shown.
at the accuracy of interest. Besides, numerical calculations might be hindered by the complexity of the NLO BK equation and of the transverse convolutions implicit in the two integral terms in Eq. (3.22). In view of that, we would like to propose two approximation schemes which we believe have more chances to be transposed in practice. In both schemes, the approximations refer to the dipole evolution, whereas the NLO impact factor, as represented by the kernel K(x) in the middle term in Eq. (3.22), should be treated exactly.
The simplest approximation which is still physically meaningful is the LO approximation (in the sense of rcBK) to the dipole evolution. This is obtained by neglecting the third term in Eq. (3.22) and replacing S S rcBK within the integrand of the middle term. A similar strategy has been used [51,53,57] in relation with the 'plus' prescription [1,2] ; the respective numerical calculations, albeit very complex, turned out to be tractable [67].
The second approximation, which is more ambitious, refers to the use of the collinearly-improved version of the BK equation, as proposed in Refs. [37,45]. We recall that, besides the running coupling corrections, this equation also resums double-collinear logarithms together with a subset of the single collinear logarithms (which includes the respective contribution at NLO). The corresponding approximation to Eq. (3.22) involves two aspects. On the one hand, one must relate the function S X(x) , which encodes the evolution of the target, to the solution to the collinearly-improved BK equation (which refers to the evolution of the projectile). This aspect is particularly important for the middle term in Eq. (3.22) and will be discussed in Appendix B. On the other hand, one must use a simplified version of the NLO BK kernel K 2 (0) which keeps only those corrections to the LO kernel which refer to the collinear improvement. This will be described in what follows.
To that aim, it is convenient to use the coordinate representation: the last term in Eq. (3.22), which we recall corresponds to the piece ∆S(k, X g ) in Eq. (3.12), will be written as ∆S(k, X g ) = d 2 r e −ik·r ∆S(r, x T ) , (3.23) where r = x − y and x T x S z, y; x T x − S x, y; x T x . (3.24) In the above equation, we recognize the dipole kernel M xyz and the running couplingᾱ fac that were introduced before, together with two multiplicative corrections to the kernel, K DLA and K SL , which encode the resummations of double and respectively single collinear logarithms, as alluded to above. Physicswise, K DLA implements the condition of time-ordering for the successive soft gluon emissions by the projectile, whereas K SL resums a subset of the DGLAP logarithms (see Refs. [37,45] for details).
The new rapidity argument x T which appears too in Eq. (3.24) will be later explained. Specifically, K DLA is defined as the function If the double logarithm L xzr L yzr is negative, then one uses its absolute value and the Bessel function J 1 gets replaced by the modified Bessel function I 1 . Note however that if, e.g., (x − z) 2 r 2 , so that L xzr < 0, then (y − z) 2 r 2 and hence L yzr 0. Accordingly, the relatively small daughter dipoles bring no significant contributions to the difference K DLA − 1. Furthermore, where A 1 is the 'gluonic anomalous dimension' of the DGLAP evolution (see e.g. [40] for details). Note that the difference K DLA K SL − 1 starts at O(ᾱ s ), as expected. However, keeping only that lowest-order term in the expansion would artificially enhance the importance of the 'collinear' regions in phasespace -the regions where the successive gluon emissions (or dipole splittings) are strongly ordered in transverse sizes, or momenta. This is the origin of the instability of the strict NLO approximation to the high-energy evolution, as previously mentioned. Vice-versa, the all-order resummation of such corrections within the factor K DLA K SL suppresses the contributions from the 'collinear' regions and thus restores the convergence of perturbation theory. This is rather obvious for the second factor K SL , which exponentially cuts off the configurations where the daughter dipoles are either much smaller, or much larger, than the parent dipole. But this is also true for the other factor K DLA , which, as already mentioned, becomes important only when the daughter dipoles are sufficiently large, such thatᾱ s ρ 2 1. In that case, the Bessel function J 1 2 ᾱ s ρ 2 is rapidly oscillating when varying the position z of the emitted gluon, hence the integral over the regions in space where 'z is large' (in the sense that |z − x| ∼ |z − y| r) averages to zero. To conclude, for gluon emissions which are strongly ordered in transverse momenta, we have K DLA K SL 0 and then the overall kernel in Eq. (3.24) reduces to minus the LO kernel −K(0). In turn, the latter subtracts the soft and collinear contributions to the middle term in Eq. (3.22), that is, it implements the collinear improvement for the LO evolution, as it should.
There is one more aspect of Eq. (3.24) which requires a few words of explanation: the rapidity arguments of the various S-matrices and, related to them, the lower limit x T for the integral over x.
If the evolution of these S-matrices is computed to LO, i.e. according to rcBK, then one can identify x T x g = k 2 ⊥ /ŝ and there is no distinction between projectile and target evolutions. Albeit such an approximation would be formally justified to NLO accuracy, it is still preferable to use the collinearlyimproved evolution, i.e. the BK equation with the resummed kernel M xyz K DLA K SL . Indeed, this would ensure a smooth matching with the middle term in Eq. (3.22), where the collinearly-improved version of the evolution becomes compulsory. Since the latter has been formulated for projectile evolution alone [37,45], one must understand what is the longitudinal phase-space for the evolution in x = q + /q + 0 which corresponds to the physical range for the evolution in X = q − /P − . In coordinate space and to the accuracy of interest, one can write (see e.g. [37]) where we have temporarily introduced the subscripts T ('target') and P ('projectile'), to make the discussion more transparent and we recall thatŝ = 2q + 0 P − . In this equation, Q = 1/r is the dipole resolution scale in the transverse plane and can be also identified with the transverse momentum k ⊥ of the produced quark, via the Fourier transform (3.23); hence, X X g = x g . Furthermore, Q 0 is the saturation scale in the nuclear target at low energy (say, within the MV model). The interesting situation is such that k ⊥ Q s (X g ) Q 0 and therefore x T x g , as indicated too in Eq. (3.27). The relation (3.27) can be understood as follows. The scattering between the small dipole and the nuclear target probes the evolution of the latter down to values of X such that the longitudinal extent ∆x + = 1/(XP − ) of the softest target fluctuations is still smaller than the lifetime 2q + 0 /Q 2 of the dipole projectile. This argument, which selects X = Q 2 /ŝ as anticipated, is merely a variation of our earlier derivation of Eq. (2.1) in Sect. 2.1. If on the other hand the evolution is encoded in the wavefunction of the projectile, then one should allow for the small-x fluctuations with transverse momenta q within the range Q 2 > q 2 ⊥ > Q 2 0 and with large enough lifetimes 2xq + 0 /q 2 ⊥ 1/P − (indeed, 1/P − is the longitudinal extent of the un-evolved target). These conditions imply x q 2 ⊥ /ŝ ≥ Q 2 0 /ŝ, or x ≥ x T , which explains the lower limit in the integral over x in Eq. (3.24).
One may find surprising that the rapidity interval ∆y = ln(1/x T ) available for the evolution of the projectile is much larger than that, ∆Y = ln(1/X g ), allowed for the evolution of the target. But one should keep in mind that the projectile evolution with decreasing x is strongly constrained by the condition of time-ordering, which limits the corresponding transverse phase-space (via the multiplicative correction K DLA to the kernel) and thus reduces the evolution speed. By contrast, there is no similar constraint for the evolution of the target with decreasing X, since in that case the physical condition of time-ordering is automatically satisfied.

Subtracting the leading order evolution: why is this subtle
The factorization scheme that we have constructed in the previous sections, cf. Eqs. (3.12) or (3.22), does not involve any subtraction: there is no over-counting of the relevant perturbative contributions and hence no need for a subtraction. We therefore expect that calculations based on this factorization should yield a positive result for the quark multiplicity. More explicit arguments in that sense will be presented later in this section. This represents a significant improvement over the previous proposal in Refs. [1,2], so it is interesting to better understand the relation between these two schemes.
From the discussion in Sect. 3.1, we recall that in the approach [1,2] on explicitly subtracts the LO evolution of the dipole S-matrix from the NLO correction to the impact factor. Such a subtraction can also be performed within our present approach. To synthetically describe that, we shall again use the schematic notations introduced in the previous subsection. Using Eq. (3.21), we can express the last term in the NLO cross-section (3.22) (a NLO correction to the dipole evolution) as the difference between the dipole S-matrix at NLO and its LO evolution: The r.h.s. of this equation is not exactly the same as the difference S − S rcBK , but it is very close to it. (The expression between the square brackets would reduce to S rcBK if the S-matrix under the integral would be itself approximated by rcBK; recall Eq. (3.31).) So, clearly, Eq. (3.28) expresses a NLO correction which is a priori small as the difference between two quantities which are individually large: each of them includes the LO contribution S rcBK . Inserting Eq. (3.28) into Eq. (3.22), one deduces an alternative expression for the NLO cross-section, 29) in which the LO evolution (as represented by the kernel K(0)) is explicitly subtracted from the impact factor. Since the difference K(x) − K(0) vanishes as x → 0, the above integral is controlled by large values x ∼ 1 and it does not generate a small-x logarithm anymore. That is, the second term in the r.h.s. of Eq. (3.29) is a pureᾱ s correction, that can be viewed as the NLO contribution to the impact factor. The NLO correction to the dipole evolution is now fully encoded in the first term S(X g ). Indeed, to the NLO accuracy of interest, the S-matrix within the integral can be evaluated by using the LO approximation, S S rcBK . Eq. (3.29) is very similar to the proposal in Refs. [1,2], which in our schematic notations reads The main difference w.r.t. Eq. (3.29) is that the S-matrix within the above integral over x is not evaluated at the x-dependent rapidity argument X(x), but rather at its endpoint value X g = X(1). In other terms, Eq. (3.30) is local in the target rapidity X g , as standard for the k ⊥ -factorization. In spite of this difference, the results in Eqs. (3.30) and (3.29) are perturbatively equivalent to NLO accuracy. Indeed, since the integral over x in (3.29) is controlled by x ∼ 1, there is no loss of NLO accuracy if one replaces S X(x) → S(X g ). This can be easily checked by using the dominant energybehavior of the BK solution in the weak scattering regime, namely S(X) ∝ 1/X λ with λ = O(ᾱ s ).
This being said, one should keep in mind that the limiting value S(X g ) is strictly larger than S X(x) for any x < 1, because the function S(X) increases when decreasing X. Hence, albeit formally allowed to the accuracy of interest, the replacement S X(x) → S(X g ) could still be troublesome, in that it might result in an over-subtraction. Indeed, as discussed in [56], the difference K(x) − K(0) is strictly negative for sufficiently large k ⊥ Q s . Hence, the negative correction to the impact factor at large k ⊥ is over included in (3.30) as compared to (3.29), a feature which might contribute to the 'negativity' problem under consideration.
In order to discuss this problem -the fact that the cross-section computed according to Eq. (3.30) becomes negative at sufficiently large (but still semi-hard) k ⊥ -, it is useful to keep in mind that previous numerical evaluations of Eq. (3.30) used approximate versions of the dipole S-matrix, like S rcBK [51,53,57]. Hence, it would be interesting to understand why Eq. (3.30) with S → S rcBK can lead to a negative cross-section at large transverse momenta.
To that aim, let us consider the rcBK approximation to the 'subtracted' version of our factorization scheme, cf. (3.29). This reads Via the rcBK equation (3.20), this is equivalent to which is of course the same as our 'un-subtracted' result in Eq. (3.18) with ∆S = 0 and S → S rcBK . We believe that the r.h.s. of Eq. (3.32) should be positive. Indeed, when the primary gluon emission is treated in the eikonal approximation (K(x) → K(0)), Eq. (3.32) is the same as the integral representation of S rcBK , which is well known to be positive definite (at least in coordinate space). With the actual kernel K(x), the value of the integral term should be somewhat reduced (since the correct phase-space for the primary gluon emission is smaller than its eikonal estimate), but this should remain positive, as it describes the growth of the dipole S-matrix via gluon emissions.
Hence, the 'subtracted' result in Eq. (3.31) should be positive as well. In spite of that, we believe that explicit numerical calculations based on Eq. (3.31) may run into difficulties (in particular, yield a negative result) because of the high degree of 'fine-tuning' inherent in the subtraction method: in going from Eq. (3.32) to Eq. (3.31), we have added and subtracted the same quantity (S rcBK ), but we have done that in a very peculiar way: we have added the l.h.s. of Eq. (3.20), that is, S rcBK itself, but we have subtracted the r.h.s. of Eq. (3.20), that is, the integral representation of S rcBK . This procedure leaves the result unchanged if and only if the function S rcBK is an exact solution to the integral equation (3.20). But any numerical approximation in solving rcBK, or in the Fourier transform S(r, X) → S(k, X), may lead to an imbalance between the terms that have been added and respectively subtracted.
For instance, such an imbalance could be introduced by the treatment of the running coupling corrections, which in general is not fully coherent between the coordinate-space and the momentumspace representations. In practice, the momentum-space version of the rcBK equation, as shown in (3.20), is not the exact Fourier transform of the respective equation in coordinate space: the Fourier transform is not also applied to the running of the coupling. So, strictly speaking, there is some mismatch between Eqs. (3.31) and (3.32) even when using the exact solution to the rcBK equation, as obtained in coordinate space. When computing the cross-section according to Eq. (3.30), this mismatch can be further enhanced by the fact that the function S rcBK X(x) gets replaced by its maximal value S rcBK (X g ) (cf. the discussion after Eq. (3.30)).
From the above discussion, we see that the rcBK equation plays the role of a 'self-consistency condition' for rewriting the cross-section in a 'subtracted' form. To illustrate the importance of having 'good' solutions to this equation, let us consider a somewhat extreme example, where this condition is strongly violated. (A similar discussion can be found in [56].) Namely, we consider the popular approximation in which the dipole S-matrix is taken from the GBW model [76], which is a Gaussian: where Q 2 s (X) = Q 2 0 (X 0 /X) λ with λ 0.3. Clearly, this is a very poor approximation at high k ⊥ Q s , where it decays exponentially, in sharp contrast with the power law tail 1/k 4 ⊥ predicted by pQCD. Using this particular model within Eq. (3.29), one finds The 'LO' piece S GBW exponentially vanishes at large k ⊥ Q s , whereas the 'subtracted' piece shows a tail ∝ 1/k 4 ⊥ , since obtained by iterating once the LO BK equation. So, the 'subtracted' piece is not only larger than the 'LO' one, but it is the only one to survive at large k ⊥ . Hence, at k ⊥ Q s the overall result reduces to the second, 'NLO', piece, which is negative in that particular region of phase-space [56], as already mentioned. This is in agreement with the numerical findings in [56,57].
To summarize, albeit Eqs. (3.31) and (3.32) are in principle equivalent with each other, the second equation is probably safer to use in practice. A similar discussion applies to the full NLO crosssection: the 'unsubtracted' factorization (3.22) should provide a meaningful result which is positive when evaluated with either the NLO S-matrix S, or with its collinearly-improved version S collBK . On the other hand, the 'subtracted' version (3.29), which may look appealing since structurally simpler, is likely to be more tricky to use in practice.

Summary and conclusions
In this paper, we have established a factorization scheme allowing the computation of single-inclusive particle production at forward rapidities in proton-nucleus collisions at next-to-leading order in pQCD, in the presence of the non-linear effects associated with the high gluon density in the nuclear target. The main difference with respect to the previous proposal in [1,2] is that our result involves no rapidity subtraction, meaning that it is free of the fine-tuning problem inherent in any such a subtraction scheme. The 'fine-tuning' refers to the fact that the numerical solution to the evolution equation for the dipole S-matrix must be precisely known to ensure that the quantity which is included as 'LO evolution' is properly subtracted from the 'NLO correction to the impact factor'. Further approximations within the subtraction scheme, which are formally allowed to NLO accuracy, or even small numerical errors in the associated calculations, may spoil such a fine cancellation and lead to unphysical results. In our opinion, this is the reason why the cross-section computed within this scheme appears to turn negative at sufficiently large transverse momenta [51,53,56,57]. By contrast, our scheme is more robust and should converge to physical, positive-definite, results with considerably less numerical efforts.
Our factorization scheme relies on the skeleton structure of perturbative QCD up to two-loop order, which is the relevant loop-order for the NLO calculation. The most general version of our result is exhibited in Eq. (3.7). This version however is probably too complicated to be used in practice. Fortunately, important simplifications become possible in the interesting situation where the transverse momentum k ⊥ of the produced quark is relatively large, k ⊥ Q s . In that case, the primary gluon is hard as well, p ⊥ ∼ k ⊥ , and the general formula (3.7) can be then replaced by Eq. (3.12) [or (3.13)], which is considerably simpler. The latter shows the same degree of complexity, in terms of transverse integrations, like the formulae already used in practice in relation with the subtraction method. So, we are confident that Eq. (3.12) can indeed by explicitly evaluated, via the numerical tools developed in [51,53,56,57,67]. As discussed in Sect. 2.4, the fact that we can approximate p ⊥ ∼ k ⊥ is specific to the problem of particle production. This would not apply, say, to deep inelastic scattering, where the dipole evolution must be computed in transverse coordinate space. In that case, if the parent dipole is sufficiently small (r 1/Q s ) -as appropriate for DIS at relatively high virtuality Q 2 Q 2 s -, then the first gluon emission in the high-energy evolution is considerably softer (it typically carries a transverse momentum p ⊥ 1/r) and the dependence upon its transverse kinematics cannot be simplified at NLO. This is visible in the NLO calculations of the DIS impact factor [77][78][79][80].
To explicitly evaluate the quark multiplicity according to Eq. (3.12), one also needs a suitable approximation for the high-energy evolution of the dipole S-matrix. The simplest such an approximation which is still meaningful for phenomenology is the LO BK equation with running coupling (rcBK) [68][69][70]. A more accurate treatment of the evolution can be obtained by using the collinearly-improved BK equation [37,[44][45][46][47][48], which resums the double collinear logarithms inherent in the 'hard-to-soft' evolution of the projectile together with a subset of single collinear logs. The inclusion of the collinear improvement in the problem at hand is quite subtle, due to the need to properly identify the rapidity phase-space for the evolution of the dilute projectile (the incoming quark, or the pair made with this quark and the hard primary gluon). This is clarified in Sect. 3.4 and Appendix B.
An important lesson emerging from our analysis is that one should not always insist in writing the result of a NLO calculation in pQCD at high-energy in a 'k ⊥ -factorized form', which is local in rapidity. This can be best appreciated by comparing our results in Eqs. (3.22), (3.29), and (3.30). Eq. (3.22) is obtained via a direct evaluation of the relevant, one-loop and two-loop, diagrams. As such, it does not involve any rapidity subtraction at one-loop level: the would-be NLO correction to the impact factor and the first step in the high-energy evolution of the color dipole are both encoded in the middle term in Eq. (3.22). In Eq. (3.29), the two contributions that we just mentioned are explicitly separated from each other, via a rapidity subtraction, but the result is still not 'factorized': the O(α s )-correction to the impact factor and the dipole S-matrix are still entangled by the rapidity integral over x (besides the transverse momentum convolutions which are implicit in our schematic notations). Finally, Eq. (3.30) expresses a genuine k ⊥ -factorization: the r.h.s. is local in the target rapidity X g and the NLO impact factor is explicitly factorized (in so far as the rapidity dependence is concerned) from the dipole S-matrices, which encode the high-energy evolution 14 . These three representations for the NLO cross-section, (3.22), (3.29), and (3.30), are all consistent with each other to NLO accuracy. Yet, the two representations involving a rapidity subtraction, Eqs. (3.29) and (3.30), are potentially affected by the issue of fine-tuning that we have identified in this paper. By contrast, the original result in Eq. (3.22) is free from this problem and therefore it should provide a positive-definite estimate for the cross-section.
We would like to conclude this section with some prospects for the extension of this factorization program beyond NLO. Our factorization scheme suggests an interesting pattern which is likely to survive beyond the present approximations. The distinguished feature of this pattern is the fact that the higher-order corrections to the impact factor are not explicitly separated from the relevant corrections to the high-energy evolution. To illustrate this point, let us consider quark production at NNLO. Without any claim to completeness and leaving aside the possibility of new contributions which go beyond the dipole picture, let us here indicate the generic structure that we expect in view of our previous analysis in this paper. Namely, we expect the NNLO cross-section to include the following four terms (in schematic notations, similar to those introduced in Sect. 3.4) where it is now understood that the dipole S-matrices evolve according to the (presently unknown) NNLO version of the BK equation. The argument of the running couplings is taken as k 2 ⊥ , for illustration, but this should be the right choice only in the limit where the transverse momentum of the produced quark is larger than any other transverse scale in the problem. Also, the lower limit x g on the integrals over x is purely illustrative: the actual limit should depend upon the exact kinematics. The first two terms in the r.h.s. of Eq. (4.1) have the same structure as the respective terms in Eq. (3.22); in particular, the second term encodes the NLO impact factor and the LO piece of the BK kernel. The third term is a generalization of the last term in Eq. (3.22): it encodes the NLO piece of the BK kernel together with the NNLO correction to the impact factor. This term is generated by 2-loop graphs where the 2 emitted partons (say, gluons) can have arbitrarily energy fractions x 1 and x 2 (with x 2 ≤ x 1 for definiteness), so their emissions must be computed with exact kinematics. The sum of all such graphs is schematically represented by the kernel K 2 (x 1 , x 2 /x 1 ). The subtraction of K 2 (x 1 , 0) is needed to avoid the over inclusion of the first two steps in the high energy evolution, that were already included in the second term, with kernel K(x). Finally, the fourth term in Eq. (4.1) represents the NNLO correction to the BK kernel, as generated by 3-loop graphs in which the 3 gluons are all soft, but close in rapidity to each other: x 1 ∼ x 2 ∼ x 3 1. Clearly, in the limit where the very first emission by the projectile is soft and computed in the eikonal approximation, the r.h.s. of Eq. (4.1) becomes proportional to the integral representation of the NNLO BK equation.

A Cancellation of shortly lived virtual fluctuations
As we have seen, the minus longitudinal momentum p − = p 2 ⊥ /2p + of the primary gluon is constrained by light-cone energy conservation, that is, by Eq. (2.8). This constraint however exists only for the gluons which are crossing the cut (albeit eventually non measured), but not also for those which are emitted and reabsorbed on the same side of the cut. In principle, such 'virtual' gluons are allowed to have very short lifetimes 2p + /p 2 ⊥ < 1/P − , however we shall show here that their respective contributions cancel exactly. To that aim, we shall separately consider the two possible regimes in terms of the ratio p ⊥ /k ⊥ . We recall that k ⊥ is the transverse momentum of the measured quark and for the virtual diagrams it coincides with the momentum transferred by the target via scattering.
(i) p ⊥ k ⊥ . This is the simplest case, as the virtual fluctuation cannot be resolved by the scattering (the transverse separation ∆x ⊥ ∼ 1/p ⊥ between the virtual gluon and its parent quark is much smaller than the transverse resolution ∼ 1/k ⊥ of the exchanged gluon). Accordingly, all the diagrams shown in Fig. 14 are equally weighted by the S-matrix, so their sum must vanish by probability conservation. In practice, this occurs because the two self-energy graphs in figures (a) and (c) have a different sign as compared to the vertex correction in figure (b) and also an additional factor 1/2.
(ii) p ⊥ k ⊥ . This case is much less trivial than (i), since now one has to also consider the diagrams in which the virtual gluon with momentum p is emitted and/or absorbed by the exchanged gluon. For the calculation that follows, it is convenient to change our apporach and use Light Cone Perturbation Theory (LCPT); see e.g. [20] for an introduction.
To this end, both the virtual and the 'exchanged' gluon are viewed as part of the projectile wavefunction and all the respective graphs are shown in Fig. 15. For simplicity, but without any loss of generality, we have taken the target to be a single quark. The four-momenta involved in the process are q 0 = (q + 0 , 0, 0) for the incoming projectile quark, P = (0, P − , 0) for the incoming quark from the target, = ( + , − , −k) for the 'exchanged' gluon and p = (p + , p − , p) for the fluctuation. According to the rules of LCPT, all internal lines carry a positive plus longitudinal momentum and all particles are on-shell, meaning that the respective minus components are fixed by the on-shell condition; e.g. − = k 2 ⊥ /2 + and p − = p 2 ⊥ /2p + . Our convention in drawing Fig. 15 is that momenta flow from the left to the right (since this is the natural flow direction for the plus components). In particular, the four-momentum of the -gluon has the opposite flow from the one in the main text, but we have set ⊥ = −k ⊥ so that the respective transverse momenta are still the same.
The target quark must remain on-shell after the scattering, i.e. in the final state crossing the cut.
This condition fixes its minus longitudinal momentum, that is, Now we invoke light-cone energy conservation (between the initial and final states), which implies Combining Eqs. (A.1) and (A.2) and using once more + q + 0 , we see that (P + ) − P − − and more importantly we can determine the plus component of the -gluon which reads Looking deeply into the regime 2p + /p 2 ⊥ 1/P − , which means that the p-gluon is a fluctuation which has by far the shortest lifetime, and using p ⊥ k ⊥ , we arrive at the strong ordering condition p + + .
We are not going to give a detailed calculation for all the diagrams appearing in Fig. 15, but only deal with some representative cases. Therefore, let us start by considering the diagram B2, which is decomposed in terms of the basic tree-level graph and the loop correction as shown in Fig. 16. In LCPT, we can write this loop correction as where t a are the SU(3) generators in the fundamental representation and λ (p) are the polarization vectors in the projectile light-cone gauge A + = 0, given by with λ two complex orthonormal two-dimensional vectors. Notice that since p + is very small, λ− is the dominant component of the polarization vector. Using q 0 = 0, q + 0 + p + and the condition in Eq. (A.3) which fixes + , one readily finds that the dominant term within both energy denominators and in the regime of interest (i.e. for p ⊥ k ⊥ and 2p + /p 2 ⊥ 1/P − ) is the light-cone energy p 2 /2p + of the virtual gluon. This was to be expected since, as already said, the p-line is by far the one with the shortest lifetime. Regarding the inner products, we have γ · (q 0 − p − ) γ · (q 0 − p) γ − q + 0 and γ · λ (p) γ + λ · p/p + . Then, by using t a t c t a = −(C F − N c /2)t c and putting aside a factor t c γ + which is part of the tree-level graph, we eventually arrive at One can calculate in a similar fashion the diagrams A, B1, D and E to find can be now attached to the gluon "exchange" -line. Four-momenta are shown in the graphs and flow from the left to the right. As in the text, q 0 = (q + 0 , 0, 0), P = (0, P − , 0), = ( + , − , −k) and p = (p + , p − , p), with − = k 2 ⊥ /2 + and p − = p 2 ⊥ /2p + . Just for C1, and for our convenience in the calculation, we let p → −p.
The factor −1/2 for diagrams A and D emerges from the calculation and it is the usual factor associated with the wavefunction renormalization. A similar argument applies to diagram E, however in that case we have put the factor 1/2 by hand, since half of the full result should be attached to the lower vertex of the -line. Now let us move to the C diagrams, for which it is not hard to see that they will have the same color factor N c /2 with diagram B1. We have where each term in the square bracket represents the respective contribution from C1, C2 and C3. The 4-vector n, appearing in the instantaneous term, is such that n · υ = υ + for any vector υ α . In Eq. (A.8) we have already simplified the energy denominators (in particular we have neglected the terms suppressed by 1/q + 0 ) and kept only the terms that will contribute to the final result (notice also that in C1 we have let p → −p in order to combine them in an elegant way). Since λ− is the large component of the polarization vector, we need to keep only the −− component of the tensor structure in the square bracket (notice that the indices here have been raised), which becomes which can be associated with the quark wavefunction renormalization factor, more precisely with where the r.h.s. follows since Z 2 − 1 ∼ α s 1. Similarly graph A gives another factor of 1/2(Z 2 − 1), while graphs B can be identified with the quark-gluon vertex renormalization factor Z 1 to order α s : Therefore, it is instructive to rewrite the sum of the one-loop virtual corrections as follows so that the first factor in the square bracket can be identified with 1 + A| loop , the second with 1 + (B 1 + B 2 )| loop and the last with 1 + (C + D + E)| loop . The fact that our calculation gives Z 2 − 1 = −(1/Z 1 − 1) at order α s , making the right hand side in Eq. (A.17) equal to zero, can be interpreted as the condition Z 1 = Z 2 valid in light cone gauge. At a first glance, it may seem strange to identify our calculation with an evaluation of renormalization constants when p ⊥ is not large, however one can check that the Slavnov-Taylor-Ward identities indeed require 2A + B = 0, as explicitly shown above.

B Target versus projectile evolution beyond LO
The NLO expressions for the quark multiplicity in Eqs. (3.12) or (3.13) involve dipole S-matrices like S q, X(x) , which in the main text have been assumed to follow the target evolution with decreasing X = q − /P − , from X = X 0 down to X(x) 1. The viewpoint of target evolution was indeed more convenient at a conceptual level, for developing a physical picture and also for formulating kinematical constraints like the energy conservation (2.8). This is however less convenient in practice, because the high-energy evolution of a dense nucleus is not known beyond LO (i.e. beyond rcBK). Yet, as we shall now explain, this unknown target evolution can be replaced for the present purposes with the evolution of the dilute projectile, which is indeed known to the accuracy of interest.
More precisely, what is known is the NLO version of the BK equation in coordinate space [32][33][34] together with its 'collinear improvement' [37,[44][45][46]. When adapting these equations to the problem at hand, the only subtlety refers to the proper identification of the longitudinal phase-space for the evolution of the projectile -that is, the interval in p + which is spanned by the evolution gluons. This identification was straightforward at LO, where the 'plus' and 'minus' rapidities can be identified with each other, as discussed in Sect. 2.3, but it is less trivial beyond LO, where the exact kinematics becomes important (recall also the discussion towards the end of Sect. 3.4).
When discussing projectile evolution in what follows, one should keep in mind that the relevant 'projectile' is the right-moving partonic pair made with the incoming quark and its primary gluon. The kinematics of that pair, in particular the 3-momentum (p + = xq + 0 , p) of the primary gluon, must be considered as fixed for the purposes of the evolution -this matter only for the boundaries of the evolution phase-space. The evolution rather refers to the three dipoles S-matrices S(q), S( ), and S(k) which appear in Eqs. (3.2) and (3.3) -and hence implicitly in equations like (3.7) or (3.12) -and describe the scattering between the primary quark-gluon pair (the 'projectile') and the nuclear target (whose wavefunction is not evolving anymore, as we now work in the infinite momentum frame of the projectile). These dipoles, that we referred to as 'daughter dipoles' when discussing the primary emission in Sect. 3.1, will now act as parent dipoles for the high-energy evolution. That is, the 'evolution gluons' will be soft gluons which belong to the wavefunctions of these three dipoles and whose longitudinal momenta p + i = x i q + 0 are necessarily smaller than the corresponding momentum p + = xq + 0 of the primary gluon. In what follows, we shall pick one of them, say S(q), and study its high-energy evolution beyond LO. Previously, we have (formally) expressed the result in terms of the evolution of the target, like S k, X(x) . In what follows, we would like to compute the same quantity from the evolution of the dipole S(q) itself, which is known beyond NLO.
Let us first recall, from the discussion in Sect. 2.3, that at LO one simply has S T q, X(x) = S P q, x g /x , where we have temporarily introduced the subscripts T ('target') and P ('projectile'), to make the discussion more transparent. The rapidity argument x g /x of S P can be understood as follows: the evolution variable is the longitudinal fraction z i ≡ x i /x of a generic evolution gluon w.r.t. the parent dipole. At LO, this is bounded by x g /x < z i < 1, where the lower limit comes from the kinematical limit X(x i ) = x g /x i ≤ 1. In other terms, the function S P q, x g /x encodes the probability to find a gluon with longitudinal fraction x g within the quark-gluon projectile (see also Fig. 1).
In what follows, we shall argue that beyond LO, the above relation should be extended to where Q 0 is the 'infrared cutoff' introduced by the initial condition at x 0 ∼ 1, e.g. the target saturation momentum at low energy (and the lowest transverse momentum scale in the problem). As before, x T refers to the softest gluon in the wavefunction of the projectile which is involved in the collision, whereas x P to the hardest gluon that can be emitted by the primary quark-gluon pair. The differences between x T and x g and, respectively, between x P and x, are consequences of time-ordering, as we shall shortly explain. It is understood that the function S P obeys a suitable 'beyond LO' version of the BK equation, which includes 'collinear improvement' [37,[44][45][46][47][48] -that is, which includes an all-order resummation of the radiative corrections enhanced by double collinear logarithms. This equation, conveniently written as a differential equation in the plus rapidity Y ≡ ln(x/x T ), must be integrated from Y 0 0 (where one can use the initial condition S 0 from the MV model) up to Y P ≡ ln(x P /x T ). Full NLO accuracy can be achieved by using the NLO version of the BK equation [32] amended by collinear improvement [37,45] (the feasibility of such a calculation has been demonstrated in [48]). To justify Eq. (B.1), consider the evolution of the original quark-gluon pair via successive emissions of soft gluons (see Fig. 17). We focus on the 'collinear regime' at k ⊥ Q 0 , where the collinear resummations are actually needed. Then the projectile evolution looks rather similar as at LO, in the sense that it is dominated by gluon emissions which are strongly ordered in both longitudinal (x x 1 x 2 · · · x T ) and transverse (p ⊥ p ⊥1 p ⊥2 · · · Q 0 ) momenta, but which beyond LO must be also ordered in time : ∆x + > ∆x + 1 > ∆x + 2 · · · > 1/P − . In the above, x = p + /q + 0 and p ⊥ = |p| refer to the kinematics of the primary gluon, as before, while x i = p + i /q + 0 and p ⊥i = |p i |, with i = 1, 2, . . . n, refer to the subsequent gluons in the cascade, which are softer. The 'lifetime' 15 of the ith generation, as given by the uncertainty principle or by the respective energy denominator, is roughly ∆x + i 2p + i /p 2 ⊥i . The 'lifetime' ∆x + of the primary Figure 17: A sequence of emissions contributing to the evolution of the quark-gluon projectile in the collinear regime. Both longitudinal and transverse momenta are strongly ordered down the cascade. The transferred momenta q ⊥i are relatively soft, q ⊥i p ⊥i , hence we also have q ⊥ p ⊥1 , q ⊥1 p ⊥2 , etc.
gluon can be similarly estimated as ∆x + 2p + /p 2 ⊥ or, more precisely (after using the complete energy denominator for the quark-gluon fluctuation), where in the final equality we have used p ⊥ ∼ k ⊥ , in line with the arguments leading to Eq. (3.12). Notice that, since both longitudinal (p + i ) and transverse (p ⊥i ) momenta are simultaneously decreasing during the evolution, the condition of time-ordering, ∆x + i > ∆x + i+1 , is not automatically satisfied: the LLA includes unphysical configurations for which this condition is in fact violated. This is corrected when going beyond LO, but the respective corrections are large (since they have to 'subtract' for double-logarithmic regions in phase-space) and hence must be resummed to all orders. This is achieved by the 'collinear resummations' aforementioned [37,[44][45][46], which essentially amount to enforcing time-ordering within the projectile evolution.
Specifically, for the first emission, the condition ∆x + 1 < ∆x + immediately implies x 1 x P with x P as defined in Eq. (B.1). (To obtain this, we have also used the fact that p ⊥1 q ⊥ k ⊥ , cf. Fig. 17.) That is, the longitudinal momentum of the first gluon which counts for the evolution of S P (q) is not limited (from the above) by the respective momentum p + = xq + 0 of the primary quark, but by the generally smaller value x P q + 0 . Similarly, the condition that the lifetime ∆x + n of the last (i.e. softest) emitted gluon be larger than the longitudinal extent 1/P − of the target, implies where we have also used q + 0 /Q + = x p /(1 − x) for the 'real' terms. These considerations confirm that the longitudinal phase-space for the high-energy evolution of the quark-gluon 'projectile' is given by x T < x i < x P , in agreement with Eq. (B.1).
Note that rapidity interval Y P = ln(x P /x T ) available for the evolution of the projectile is larger (when q ⊥ Q 0 ) than the corresponding interval Y = ln 1/X(x) for the evolution of the target: This difference is compensated by time-ordering, which effectively reduces the phase-space for the evolution of the projectile. emitted and reabsorbed either before, or after, the scattering. This 'virtual' term ensures that the potential 'ultraviolet' singularity due to the 1/(u − z) 4 factor in the kernel is in fact harmless. The same discussion applies to the second DI term, proportional to N f , which represents the case that the additional partons at the time of scattering are a quark and an antiquark, and thus the number of dipoles involved in the scattering is just two. The issue with the NLO BK equation given in (C.1) is that there are terms in the kernels which can get large in certain kinematic regimes, thus invalidating the strictᾱ s -expansion. The first class of such terms contain the corrections proportional tob in the SI term in Eq. (C.1), and has already been discussed in Sect. 3.4. The scale µ should be chosen in such way, that the logarithms of transverse dipole sizes proportional tob (and only those) become innocuous in any kinematic regime. It is trivial to see that both choices suggested in Sect. 3.4 will cancel any potentially large logarithmic contribution. In fact, our fac prescription, cf. Eq. (3.16), is by definition the one in which the sum of all terms proportional tob vanishes identically. Now we wish to discuss NLO corrections enhanced by logarithms associated with large separations in transverse sizes (or momenta) between successive emissions. These 'collinear' corrections become large only in the weak-scattering regime where all the dipoles are small compared to the target saturation scale 1/Q s , so that we can linearize w.r.t. to the scattering amplitude T . At this level, we can drop the term proportional to N f /N c in Eq. (C.1) since it vanishes after linearization, as one can check by using the symmetry of the kernel under the interchange u ↔ z. More precisely, we consider the strongly ordered regime 1/Q s |z − x| |z − y| |z − u| |u − x| |u − y| |x − y|, (C. 2) which means that the parent dipole is the smallest one, a gluon is emitted far away at u, a second one even further at z, but all the possible dipole sizes remain smaller than the inverse saturation momentum. Denoting by r,ū andz the size of the parent dipole, the size of the dipoles involving u and the size of the dipoles involving z, respectively, we have r 2 ū 2 z 2 . Inspecting the SI piece in the NLO BK equation (C.1), it is obvious that the term which involves the double transverse logarithm (DTL) is the dominant one in the kinematic region defined in Eq. (C.2). In this regime we are also allowed to write S xz S zy − S xy −T xz − T zy + T xy −2T (z), with the second approximate equality arriving from the fact that the dipole amplitude for a small dipole is roughly proportional to the dipole size squared. Thus, the net result comes from the 'real' term, i.e. the one which involves the large daughter dipoles.
A single transverse logarithm (STL) is hidden in the DI term. To see that, let us isolate the kernel which in the collinear regime can be successively written as where φ is the angle between r and any of the two dipoles involving u. To arrive at (C.4), we first set all dipole sizes which include z equal to each other, since any subleading term would be highly suppressed by inverse powers ofz. This simplifies significantly the expansion in the regime of interest, since the only z dependence left is the one explicit in the prefactor. Then we have taken the limit r ū and finally we have performed an average over the angle φ between the parent dipole and those involving u. As evident in Eq. (C.4), the would-be leading term of order 1/z 4 has cancelled and the first non-vanishing term is suppressed by the power factor r 2 /ū 2 . Now by linearizing the S-matrices multiplying M STL and realizing, as in the DTL case, that the 'real' term dominates, we can approximate S xu S uz S zy − S xu S uy −T uz − T zy + T uy −2T (z). Since this is independent of the intermediate dipole sizeū, the integration over the latter, within the range limited by r andz, leads to the anticipated STL.
Thus, putting together the LO term and the two NLO ones enhanced by the STL and the DTL, one finds that the NLO BK equation in the collinear regime (C.2) reduces to δT (r; x T ) =ᾱ s Clearly, for sufficiently large daughter dipoles, the NLO contributions are enhanced by large transverse logarithms and become comparable to, or larger than, the LO one. Then, the present perturbative expansion of the kernel, which is of fixed order inᾱ s , cannot be trusted anymore. Eventually this problematic behavior is transmitted to the solution of equations like (C.5) and (C.1) which becomes unstable, as indeed seen in [35][36][37]. Therefore, in order to have a meaningful evolution equation, we need to identify the physical origin of the large transverse logarithms and subsequently resum them to all orders in the couplinḡ α s . It should be obvious by now, that such higher order terms become more important than the purē α 2 s NLO terms (that is, the O(ᾱ 2 s ) corrections not enhanced by any transverse logarithm), so that the latter will be discarded in what follows. This is the approach taken in [37,45] leading to the collinearly improved BK evolution equation which admits a stable solution. Before giving this equation, we shall first write its collinear limit, i.e. we shall first resum the kernel appearing in Eq. (C.5).
The origin of the DTLs is in the kinematics [37,44]. Our main observation in [37] was that these large corrections arise from LCPT Feynman graphs in which the successive gluon emissions are not only strongly ordered in both longitudinal momenta and transverse momenta (or 'dipole sizes'), but are also ordered in lifetimes (or, equivalently, in light-cone energies [44]). The all-order resummation of the double collinear logarithms [ᾱ s ln 2 (z 2 /r 2 )] n , with n = 1, 2, 3, . . . , eventually leads to a modification of the kernel by a multiplicative factor given in terms of the Bessel function J 1 (see below).
The STLs find their origin in the DGLAP dynamics [45]. Since each power ofᾱ s is accompanied by a collinear logarithm, it is intuitively clear that such terms must represent DGLAP corrections to the BFKL dynamics. This is further confirmed by the fact that the coefficient −11/12 in front of the single logarithm in Eq. (C.5) can be recognized as the second-order term in the small-ω expansion of the largest eigenvalue P(ω) of the DGLAP anomalous dimension matrix in the large-N c limit, more precisely where the second term of A 1 can be dropped when working in the large-N c limit. Like the factor 1/ω generates a small-x gluon, the piece −A 1 corresponds to the most dominant sub-leading gluon emission (the term linear in ω would correspond to the next-to-most dominant one and so on). It is not hard to follow the combinatorics for an arbitrary number of such gluon emissions associated with A 1 , and one finds that the resummation of these [ᾱ s ln(z 2 /r 2 )] n terms leads to the exponentiation of the NLO correction. Of course this is only a partial resummation of STLs, but it is well-defined and in the spirit of the ω-expansion introduced and developed in [39,40,42]. Higher orders in ω would be related to next-to-most dominant gluons and beyond, and the corresponding STLs would start at higher orders in perturbation theory (NNLO or higher). Putting together the two types of resummation, one finds that the collinear kernel in Eq. (C.5) should be replaced by [37,45] r 2 z 4 J 1 2 ᾱ s ln 2 (z 2 /r 2 ) ᾱ s ln 2 (z 2 /r 2 ) r 2 z 2 ᾱsA 1 . (C.7) The last step in our construction consists of matching the above kernel with the LO BFKL/BK one in a consistent way and reinserting the virtual and non-linear terms in the respective evolution equation.
Then one arrives at the local equation where the precise form of the kernels K DLA and K SL has been given in Eqs. (3.25) and (3.26).