Single Inclusive Jet Production in $pA$ Collisions at NLO in the small-$x$ regime

We present the first complete NLO prediction with full jet algorithm implementation for the single inclusive jet production in $pA$ collisions within the CGC effective theory. Our prediction is fully differential over the final state physical kinematics, which allows the implementation of any IR safe observable including the jet clustering procedure. The NLO calculation is organized with the aid of the power counting proposed in [1] which gives rise to the novel soft contributions in the CGC factorization. We achieve the fully-differential calculation by constructing suitable subtraction terms to handle the singularities in the real corrections. The subtraction contributions can be exactly integrated analytically. We present the NLO cross section with the jets constructed using the anti-$k_T$ algorithm. The NLO calculation demonstrates explicitly the validity of the CGC factorization in jet production. Furthermore, as a byproduct of the subtraction method, we also derive the fully analytic cross section for the forward jet production in the small-$R$ limit. We show that in the small-$R$ approximation, the forward jet cross section can be factorized into a semi-hard cross section that produces a parton and the semi-inclusive jet functions. We argue that this feature holds for generic jet production and jet substructure observables in the CGC framework. Last, we show numerical analyses of the derived formula to validate our calculations. We justify when the small-$R$ approximation is appropriate. Like forward hadron production, the obtained NLO result also exhibits the negativity of the cross section in the large jet transverse regime, which signals the need for the threshold resummation. A sketch of the threshold resummation in the CGC framework is presented based on the multiple emission picture.

B Contributions from the other channels 51 B.1 g → g channel 51 B.2 g → q channel 54 B.3 q → g channel 55 C The small-R limit for other channels channel 56 C.1 g → g channel 56 C.2 g → q(q) channel 57 C.3 q → g channel 58

Introduction
Hard probes are among the most fundamental tools to understand the internal structure of hadrons. The partonic fields bound within the hadron can fluctuate and radiate virtual quanta that could live for a short time period, therefore probes such as (virtual) photons, will snapshot the hadron interior at different time resolution scales [3,4]. In the highenergy limit, the probe gets dramatically boosted, revealing smaller and smaller values of the Bjorken-x and observing a substantially growing gluon density. In the sufficiently small-x region, the gluons are so close to each other, initiating the recombination in which two gluons annihilate into one. In the small-x domain, the gluon recombination is equally important to the splitting process and the evolution of the partonic fields enters the regime where the nonlinear B-JIMWLK equation (or its infinite color approximation, the BK equation) [5][6][7][8][9][10][11][12] takes over the linear parton evolution equations [13]. The non-linear evolution predicts gluon saturation [14,15] as a consequence of the balance between recombination and gluon bremsstrahlung. The proper framework to describe the small-x dynamics is the color glass condensate (CGC) effective theory [16][17][18][19], in which the gluon saturation is featured by the saturation scale Q s which grows as x decreases. When Q s Λ QCD , perturbative QCD can be applied. Tremendous efforts have been devoted in searching for the emergent phenomena of the gluon saturation in the ultra-dense regime [20][21][22]. There exist experimental hints [23][24][25][26][27][28][29] that are compatible with saturation-model predictions, but we still lack deterministic evidence to claim a discovery. In the future, dedicated measurements at the Electron-Ion Collider (EIC) will provide further insight on the regime of gluon saturation [3,4,[30][31][32].
One piece of the hints of the gluon saturation comes from the suppression of the nuclear modification factor R dAu [24,26] in single forward hadron production in deuteron-nucleus collisions, d + A → h(y , p ⊥ ) + X  where tagging a hadron h with intermediate transverse momentum p ⊥ in the forward large rapidity y forces the projectile deuteron to probe the small-x component of the nuclear target [34,42,43,55,56]. The theoretical prediction is made out of the hybrid factorization theorem, in which the incoming nucleon (proton or deuteron) is regarded as a dilute system described by the collinear parton distribution functions (PDFs) while the nuclear target is treated as a strong classical field described by the CGC effective theory. The leading-order (LO) predictions have been presented in [36-39, 45, 46, 57-59], and the approximations to the next-to-leading order (NLO) corrections have been studied in [41,60]. The complete analytical NLO predictions were first carried out in [42,43] and later confirmed by [1,61]. The full calculation demonstrates the validity of the hybrid factorization at O(α s ). Meanwhile the threshold resummation was also carried out [2,62] to resolve the negativity of the cross section in which the NLO p ⊥ spectrum turns negative promptly when the hadron p ⊥ slightly exceeds the saturation scale Q s [47].
Despite the fact that the hadron production is conceptually simple, the extraction of the CGC nucleus distribution out of this process could be more involved due to the additional dependence on the non-perturbative fragmentation functions. On the other hand, in order to perform a global extraction of the gluon saturation phenomenon, we need other processes that could probe into the small-x regime. Among the probes of the small-x gluon saturation, jets have drawn tremendous attention in recent years. In comparison with hadron production, jets have the advantage that their behaviour can be predicted perturbatively and therefore one could have better theoretical control than with hadrons. Initially, two forward jets with a large rapidity separation, known as the Mueller-Navelet jet [63][64][65][66], was introduced to probe the linear BFKL eovlution in pp collisions. The diffractive dijet production in deep inelastic scattering (DIS) was later proposed and its phenomenology has been intensively discussed in the small-x physics [67][68][69][70][71][72][73], which has shown the feasibility in the study of the small-x gluon tomography [74]. The semi-inclusive dijet processes in DIS were also investigated in the scenario of extraction of the Weizsäcker-Williams gluon distribution [75,76] and the quadrupole correlator of Wilson lines at small x [77,78].
Modern jets are usually constructed out of recursive jet clustering algorithms with radius parameter R, such as the anti-k T jet algorithm [79] which is the most frequently used at the LHC and RHIC. However, the recursive clustering procedure imposes non-trivial constraints on the phase space and dramatically complicates the theoretical calculations. So far, no calculation with actual jet algorithm dependence is known within the CGC framework. Recently, there have been pioneered attempts to obtain the CGC jet or photon-jet processes at the NLO [80][81][82] using the small-cone approximation [83][84][85]. The main focus of these attempts is to demonstrate the infrared finiteness of the jet cross section. However, to have an apple-to-apple comparison of the CGC theory with the experimental results, it is necessary to include in the calculation the jet clustering procedure that strictly follows the experimental analyses. On the other hand, a complete NLO calculation with full jet algorithm dependence can be used to justify the reliability of the small-cone approximation to the full cross section.
In this work, for the first time, we present the NLO calculation with the complete jet algorithm implemented. We walk through the detailed calculation of the forward inclusive jet production in pA collisions to introduce our computational framework. The LO jet production in pA collisions is already presented in [34] and the recent phenomenological analysis based on the LO formalism is investigated in [86]. Meanwhile, efforts to go beyond the LO small-x eikonal approximation are also carried through in [87,88]. Remarkable efforts to march toward NLO predictions are set out to compute the real corrections for dijet production [89] and LO trijets [90] in pA collisions. A missing component of these calculations is to yet take into account real jet algorithms. The jet clustering procedure that decides when the partons are clustered together and when they form jets separately dramatically complicates the phase space integration compared to the hadron production. Further complications come from the experimental threshold used to select the jet events, for instance, the cuts made to the jet rapidities and transverse momenta. However, it is these procedures that are used experimentally to determine the jet multiplicities and to generate the distributions. Therefore, it is of utmost importance to have a NLO computational scheme to allow for a restricted jet procedure in order to perform the reliable revelation of the CGC phenomena out of the jet data.
We applied the recently developed computational scheme [1,2] within the CGC effective theory to compute the full NLO predictions for the single inclusive jet production in pA collisions. In order to implement the full jet clustering procedure, we developed a "subtraction scheme" 1 for the NLO real corrections. We present sufficient details on how the calculation is set up. We also examine the small-jet cone approximation in CGC and build up an interesting connection to the small-jet cone approximation in collinear factorization [92,93] through the small-R factorization theorem derived in this work.
1.0.0.1 Outline of this manuscript The manuscript is organized as the follows: In Section 2, we briefly review the CGC effective theory. We go through in detail the theoretical set-ups in Section 3, which cover the power counting, the soft and collinear modes as well as the Feynman rules in Section 3.2, the rapidity regulator in Section 3.3 and the color charge operator in Section 3.4. Examples on applying the set-ups are given in Section 3.5.
The LO jet production results are summarized in Section 4, followed by the detailed descriptions of the NLO calculations in Section 5, where we review the anti-k T jet algorithm [79] in Section 5.5. We show how the phase space is parameterized for the real emission in Section 5.6 and discuss various kinematics limits. The subtraction strategy will be elaborated in Section 5.7.2 and the explicit subtraction term will be presented in Section 5.7.3 and Section 5.8 for both the collinear and the soft contribution, respectively. The full NLO results are given in Section 5.9. The results are fully differential and can be adapted to other experimental observables.
In Section 6, we present the full analytic NLO cross section for the jet production in the small-R limit. A factorized form was found in the result, and we argue that the factorization will hold for any other jet/jet-like observables in their small value limits. A sketch of the threshold resummation for the jet production is given in Section 7. Except for additional Sudakov logarithms to be resummed in the jet case, the resummation structure is similar to the hadron production [2], which we reproduced in the work using a different approach.
We use q → q channel to highlight the calculations, all the other channels can be found in the appendix B. Numerical analyses are presented in Section 8. We conclude in Section 9.
We note that all the results presented in this work are in the large N C limit while our methodology is not limited to the approximation but can be applied to more general cases with finite N C .

A Brief Review of the CGC effective theory
We briefly go through the basic ideas of the color glass condensate (CGC) effective theory in the section. We refer the readers to Ref. [19,94,95] for comprehensive reviews.

Basic Picture
In high energy collisions, when nuclei get boosted, the phase space for bremsstrahlung radiations grows. The radiations generate more gluons with small longitudinal momentum fraction x. As we keep on probing smaller values of the x, the gluon density will continue growing, until it reaches a sufficiently small x region, where the gluon density is so high that the wave functions of 2 gluons start to overlap and the 2 gluons can recombine into one. The gluon bremsstrahlung and the recombination will eventually balance with each other to reach equilibrium, which is known as the gluon saturation. The gluon saturation happens typically at the saturation scale Q s , which features the order of the gluon transverse momentum within the nucleus. The saturation scale Q s grows as x → 0.
When Q s Λ QCD , the small-x constituents of the nucleus could be studied perturbatively, while the partons with large momentum fraction remain non-perturbative. The CGC effective theory is able to disentangle the perturbative physics and the non-perturbative effects, by introducing a cut-off scale ν (or Λ as in many CGC literature [19]) in the longitudinal momentum k − g along the nuclei propagating direction 2 , to separate the small (k − g < ν) and the large x (k − g > ν) degrees of freedom. During a collision, the large x partons at the scale Λ QCD will not participate directly the semi-hard interaction at O(Q s ), but instead they will gradually reduce the x and elevate the scale through the radiation and recombination. As k − g < ν and the scale reaches Q s , the gluon enters the semi-hard interaction, as illustrated in fig. 1. The CGC integrates out those gluon fields with k − g > ν, left with a random color source remainder for the active small-x gluons that take part in the semi-hard interaction. One can also defines the cut-off momentum fraction X f for the 2 We assume the nucleus are moving along the directionn = (1, 0, 0, −1) with negative z-component momentum, and we define k − g ≡ n · kg = k 0 g − k 3 g , with the light-like vector n = (1, 0, 0, 1). Note that for the gluon from the nucleus k 3 Figure 1. A small-x gluon with k − g < ν enters the semi-hard interaction (above the dashed line), originated from the radiation and recombination of a large-x parton (as the solid line) from the nucleus.
gluon in the nucleus through ν = X f p A where p A is the momentum of the highly boosted nucleus.

The CGC Wilson Line (the Interaction with the Shock Wave)
In a real collision, instead of scattering just once with the small-x gluon as in fig. 1, a probe parton (from the proton for instance) will experience multi-interactions coherently with the CGC small-x fields. One thus needs to include the multiple scattering effects, which in the CGC, is written as a sum of the small-x gluons  Figure 2. This figure shows the multi-interaction between the parton and the CGC field line, which represents the interaction between the projectile parton and the nucleus field shock wave.
where P denotes the path ordering. A − c (x + , x ⊥ ) is the small-x gluonic field with color c. The color generator T c αβ encodes the color message of a single interaction between the probe and the gluon. For a ferminoic probe, T c αβ = t c αβ the fundamental representation of the SU (N C ) group, while T c αβ = if αcβ the adjoint representation, if a gluon acts as the probe. The relation between the fundamental and adjoint color generators suggests the following connection between the Wilson lines in different representations, (2.3)

Theoretical Set-ups
In this section, we introduce the theoretical framework we developed for the NLO calculation and the resummation.

Light-cone Coordinate
We work in the frame where the proton is moving forward along the +z-axis while the nucleus in the −z direction. It is then natural to introduce the light-cone coordinates n µ = (1, 0, 0, 1) andn µ = (1, 0, 0, −1), which satisfy n ·n = 2. Any four vector p µ can be decomposed as known as the Sudakov decomposition, where and p µ ⊥ = (0, p 1 , p 2 , 0) is the transverse momentum with respect to the beam. In lightcone coordinates, we may also denote a vector p µ by its light-cone components as p µ = (p + , p − , p µ ⊥ ).

Power counting, Modes, and Feynman Rules
In this work, we are interested in forward inclusive jet production in proton-nucleus collisions, p(p P )A → J(p J )X, as illustrated in fig. 3. We assume the jet J is constructed by the anti-k T jet algorithm [79], but we emphasize that our calculation is fully differential and is applicable to any jet algorithms. Given that the proton is moving along the direction n µ = (1, 0, 0, 1) while the nucleus alongn µ = (1, 0, 0, −1), the forward scattering is then defined by the momentum hierarchy where we have introduce the power counting parameter λ and p i is the momentum of the collinear parton coming out of the proton and p J⊥ the transverse momentum of the jet with respect to the beam. We measure the jet rapidity y J and transverse momentum p J⊥ to select the forward jet events where y J is very large. We note that to probe the gluon saturation, p J⊥ ∼ Q s . The existing hierarchy λ 1 in the observables and the kinematics yields several different modes with different momentum scaling that simultaneously contribute to the leading region of the cross section 3,4 • the collinear mode whose momentum scales as k c = (k + c , k − c , k c⊥ ) ∼n · p i (1, λ 2 , λ), and • the soft mode with k s ∼n · p i (λ, λ, λ).
Both modes will couple to the CGC Wilson line made up of the • glauber gluons with k G ∼n · p i (0, 0, λ) coming from the nucleus, which provide the transverse kick through the potential 1/k 2 G ∼ −1/k 2 G,⊥ . We note that the virtuality of all the modes are O(p J⊥ ) ∼ O(Q s ) and they all contribute at leading power in λ. The collinear and the CGC Wilson line exist in all known CGC calculations, while in CGC, the soft mode was first identified in [1], which is found crucial to correctly produce the rapidity poles to cancel with the nucleus distribution and to automatically and systematically give rise to the kinematic constraint [1] which was introduced by hand in [50]. The soft mode compensates the relieved phase space restrictions due to the power expansion in λ. Its appearance should not be too surprising given that it already arises and plays an important role under similar circumstances such as the transverse momentum dependent (TMD) studies [96]. The necessity of the soft mode in the perturbative calculation can be seen as a direct consequence of the method of regions (or strategy of regions) technique [97].
The collinear and the soft modes are governed by their own Lagrangian which is obtained by appropriate power expansion of the QCD Lagrangian in λ. The procedure is exactly how the soft collinear effective theory (SCET) Lagrangian [98][99][100][101][102] is constructed. The difference is that now we need to include in the Lagrangian the interaction between the CGC shock wave and the collinear and the soft fields through the CGC Wilson line. The detailed derivation of the Lagrangian and its comparison with the conventional CGC approach will be given in another work [103]. Here we go through the Feynman rules that will be directly used in our calculation to get the NLO jet cross section. Throughout the work, we stick to the light-cone gauge in whichn · A = A + = 0.
The collinear Feynman rules are given by for the collinear quark propagator moving along the n direction, where the power expansion has been performed and only the leading contribution in λ is kept [99]. Here i (j) is the color index and p is the momentum carried by the quark. The collinear gluon is represented by a spring line with a straight line going through where a (b) is the color index and q is the gluon momentum, and the form of the gluon polarization sum arises from the fact that we work in the light-cone gaugen · A = 0 as we have mentioned above. The interaction between the colliner quarks and the collinear gluon was derived within the SCET framework before [99] and is given by where we have used the gauge conditionn · A = 0. One can recognize that the collinear interaction also exists in the light-front perturbation theory, when one expresses the socalled bad component of a spinor field in terms of the good component, see for instance [104]. Now we include the interactions between the CGC shock wave and the collinear modes via the CGC Wilson line. We note that these vertices are identical to the ones in the conventional CGC calculations [7,18,80,[105][106][107], but now derived via a different approach [103].
For the collinear quark, we can derive where the interaction is represented by the green box and W ji is the corresponding Wilson line in the fundamental representation. In comparison with [80], we have a factor of 4π instead of 2π in Eq. (3.7) above. This is because we use p ± = p 0 ± p 3 as our convention for the ±-component of a 4-vector, while p ± = (p 0 ± p 3 )/ √ 2 in [80]. We note that the form of the interaction indicates the 3-momentum conservation in the '+' and the '⊥' components, while the '−' component is not conserved which is a consequence of the homogeneous power expansion in λ [103] as required by CGC. The gluon-shock-wave interaction is where W ba is the Wilson line in the adjoint representation. The soft Feynman rule for emitting a soft gluon is given by which is nothing but the eikonal approximation. T a αβ = t a αβ if the soft gluon is radiated from a quark with α and β the color indices for the quark after and before the radiation, otherwise T a αβ = if αaβ if it is from a gluon. There is also the interaction between the soft gluon and the shock wave through the CGC Wilson line, which is found identical to the collinear gluon interaction in Eq. (3.8)

Rapidity Regulator
In this work, dimensional regularization 4 → D = 4 − 2 will be used to deal with the collinear and infrared divergences. However, in the effective framework, we could encounter divergences which are not regularized by dimensional regularization. For instance we could have integrals of the form which is divergent when k + → 0 and is known as the rapidity divergence. The rapidity divergence has been extensively discussed in the scenario of the TMD physics [108][109][110][111][112][113][114][115][116][117], where the divergence occurs due to the homogeneous power expansion in the phase space kinematics, see for instance [113,115]. Various rapidity regulators have been proposed to regulate the divergence [96,[108][109][110][111][112][113][114][115][116][117]. In CGC, one will face the same rapidity divergence [1,2,50,52,61,80,104,118,119], which needs to be properly regulated. In general, an introduced regulator better not destroy the properties necessary to establish all order factorization theorems, for example, the eikonal identities needed for exponentiation of the soft emissions into Wilson lines [96,113]. Moreover a regulator should not break the power counting, otherwise one could lose control over the non-perturbative power corrections to make the perturbatively calculable part ambiguous and thus sacrifices the predictive power of a perturbative calculation. We refer readers to Ref. [112,113,115] for more discussions. On the computational side, we better utilize a unified and conceptually well-defined regulator for both real and virtual contributions and could be easily applied to higher order corrections. For these purposes, throughout the work, we implement the regulator proposed in [115]. Consequently, • for the O(α s ) soft eikonal current in Eq. (3.9), we add the rapidity regulator factor as where the factor is the regulator introduced for the rapidity divergence, where ν is a rapidity scale and η is a small parameter to be expanded by the end of a calculation. We will show that ν is associated to the cut-off scale in the CGC. We note that since the rapidity divergence only arises when k + → ∞ k − or k − → ∞ k + , in our case, we could simplify the calculation if we make the following change [120] where η k = 1 2 ln (k + /k − ) is the rapidity of k µ and we have used k ± = k ⊥ e ±η k . To see this, we note that (3.14) and • For the collinear contributions, we apply the following replacement where k + is the momentum of the gluon.
At the end of a calculation, one should carry out the η expansion before the expansion [115]. The rapidity regulator will then turn the rapidity divergence to η-poles.

Color Charge Operator
Throughout the work, we will use the color charge operator T a i introduced by Catani, et al. in [121]. Although the notation is not necessary for fixed order calculations, we find it quite helpful in unifying the notation in different channels [2]. Meanwhile the notation helps to sort out the origins of different types of logarithms in the cross section and is thus beneficial for deriving the threshold resummation in Section 7.
In order to carry out the threshold resummation in CGC, we need to deal with multiple emissions that interact with the shock wave, which dramatically complicates the color flow and mixes the LO and higher order color structures. For instance, we can see that for the jet production (as in the hadron case), the LO matrix element squared possesses the color structure of the form for the color structures at NLO, and the NNLO correction will involve contributions such as where W ac and W bd are the CGC Wilson lines in the adjoint representation. We see that • the NLO and NNLO color structures are completely entangled with the structures, • the NLO and NNLO digrams showed above exhibit some kind of common features, however the color structures seem quite different.
However, resummation usually requires the factorization of the higher order corrections from the LO structures to some extent. Also, realizing a resummation actually means that one can identify in the higher order corrections a certain pattern as the repetition of the lower order results. Therefore, one can resum the lower order calculations to obtain the all order results, see for instance Ref. [122]. For these purposes, instead of tracing over color indices, we need to keep track of the color correlation when squaring the matrix element 5 . The color charge operator offers a neat way to handle the color correlation and allows us to achieve the factorization in which the higher order corrections can be seen as the color currents acting on LO matrix element, and in turn has been shown to help establish a compact formalism for the resummation [2].
To understand the notation, let us consider processes that involve m final-state QCD partons at the tree-level, i = 1, 2, . . . , m, each carrying a momentum p i . The LO matrix element takes the general structure   5 Similar situation happens to the non-global logarithms (NGLs), where one also maintains the color correlation information in order to achieve the resummation [123]. Given the similarities between NGLs and small-x evolution, it is perhaps not too surprise that in CGC certain resummations will also rely on the color correlation.
where |M(p 1⊥ , . . . , p m,⊥ ) p + i is a vector in the spin-color space. We will suppress the subscript to denote it as |M(p 1⊥ , . . . , p m,⊥ ) . And the matrix element squared after summing over all the color and the spin indices is then At higher orders, once a gluon with color index a is emitted from parton i, as illustrated in fig. 4, it will generate a color charge T a i which acts on the i-th parton color index and rotates the color basis in a way that where T a cb ≡ if cab if the emitting particle i is a gluon and T a αβ ≡ t a αβ if the emitting parton i is a quark (in the case of an antiquark T a αβ ≡t a αβ = −t a βα ). For the processes with initial QCD partons, the color operator rules can be obtained by crossing, such that if i is an initial gluon, T a cb = if cab ; while for intial quark we have T a αβ =t a αβ = −t a βα and anti-quark with T a αβ = t a αβ . The dot product of the color charge operator is given by where we note that the subscripts i and j denote the i-and j-th partons that the color charge operators are acting on. The color operator rule immediately leads to the color algebra that where C i = N C if i is a gluon and C i = C F = (N 2 C − 1)/2N C if i is a quark or antiquark, which reduces to C i = N C 2 in the large N C limit. Also for any QCD process, we have the color charge conservation when acting on the vector |M .
Using this notation, the square of color-correlated amplitudes is given by Again, the i and k here denote the i-th and the k-th parton, respectively. For instance, in the jet production, we will have where we have labeled the incoming quark as the i-th particle and the outgoing quark the j-th parton and again we have only paid attention to the colors but ignored the kinematic details. We have used |M 0 ∝ W (b ⊥ ), as in Eqs. (3.7) and (3.17) and will also derive in detail in Eq. (4.1). Now apparently the higher orders are obtained by color charge operator acting on the LO vector |M 0 and it is also clear that this specific NNLO term is the square of the NLO in terms of the colors. 6 This feature could explain the non-linearity of the BK evolution in Eq. (3.42) and the threshold resummation in the CGC framework [2]. It is also crucial for the threshold resummation for the jet production in section 7.

Examples
As an example of the application of the Feynman rules and the color operator T a , we derive the initial state radiation (ISR) collinear current, which will be used in the following sections to derive the collinear cross section. By ISR collinear current, we mean a collinear gluon is 6 We note the difference between the "trace-after-square" in Eq. (3.28), and the "square-after-trace" The former would reduce to the color structure in Eq. (3.19) while the latters are the square of the traced NLO in Eq. (3.18) before or after the r ⊥ -integration, respectively. Neither of the latters corresponds to the color structure arises at NNLO. radiated before the incoming parton interacts with the shock wave, which is described by Here we have carried out the l − -integration by contour integral, see Eq. (A.2).
Similar steps lead to the ISR soft current which can also be derived directly by power counting the above result assuming k + ∼ p + q λ, which gives where the rapidity regulator follows Eq. (3.12). We have made the change in Eq. (3.13) and use the fact that −l µ ⊥ = k ⊥ µ and for very large rapidities, η k → η k . To see this, we note |η k |, and η k → η k for very large rapidities. It is therefore justified to replace e −η|η k | by e −η|η k | to regulate the singular behavior at very large rapidities.

Multiple point correlator
In CGC, the incoming nucleus is described by the correlators of the Wilson lines. The nucleus dipole distribution is defined as The subscript X f represents the scale to evaluate the dipole, and is given by the typical momentum fraction of the nucleus carried by the gluon. We will also encounter the triple-pole distribution which is where W ab is the Wilson line in the adjoint representation. The triple-pole distribution can be further related to the dipole by the Fiertz identity in Eq. (A.11) as which in the large N C limit reduces to the product of 2 dipoles In the jet production, we will also need the 6-point correlator, defined as The 6-point correlator occurs at NLO when we have 2 distinguished jets. In the large N C limit, the S X f can be written as a product of the 4-point function and the dipole such that which is originated from the color identity in Eq. (A.12), while the quadru-pole distribution is given by For later use, we also introduce the nucleus distributions in the momentum space which are for the dipoles, where S ⊥ is the transverse area of the nucleus and F F is the distribution in the momentum space. Here we have applied the translational invariance of the dipole distribution. In the large N C limit, the momentum space bilinear distribution for the 3-point function is given by The dipole distribution satisfies the BK-evolution, which at LO gives where we have introduced the '+'-prescription for a distribution D(x) defined as The BK-equation is non-linear in the color structures, which can be seen by rewriting the equation using the color charge operator notation such that where we add the subscript X f to remind the rapidity scale dependence of the matrix. The equation can be formally solved to find The exponentiation of the color charge operator T a manifests the non-linearity of the BKevolution in the color structure. The non-linearity is originated from the multiple emission picture in Eq. (3.28) where for each emission, an additional Wilson line W with respect to the lowest order is generated. Eqs. (3.44) and (3.45) also hold for the gluon case.

Leading Order Cross Section
We start with the LO cross section. Throughout the work, we take the quark channel as an example to walk through our calculation in detail. Other channels can be obtained in a similar way and the results are given in the Appendix B. At LO, there is only one diagram that contributes to the jet production in the quark channel, where q(p q ) → q(q j ), as shown in fig. 5, which gives the amplitude Square the matrix element and sum over all the spins and average over the initial quark color, one finds The cross section is then given by where we have averaged over the proton spin which gives the overall factor 1/2 and including the partonic flux factor 1/(p + q ). Here f (x) is the quark parton distribution function (PDF) and the momentum fraction satisfies p + q = xp + p with p + p the proton momentum. We have worked out the p j on-shell condition and thus p − j = p 2 j⊥ /p + j . The unity d 4 p J δ (4) (p J − p j ) is inserted to define the jet momentum. At LO, since we only have one parton in the final state, the jet algorithm acts trivially on the p j phase space. The situation is different at NLO, when there is one additional radiated parton and the jet clustering procedure imposes non-trivial restriction on the two-parton phase space, which dramatically complicates the calculation. We have suppressed possible additional experimental cuts and the distribution bin separation placed on the jet momentum p J in Eq. (4.3).
Plugging the M 0 |M 0 into Eq. (4.3), we find where τ = p + J /p + p . As we have mentioned above, the subscript X f in the dipole S (2) denotes the scale to evaluate the dipole distribution. At LO, the scale choice is arbitrary, however the CGC framework demands that X f ∼ O(X A ) to separate the fast and the slow moving partons, where X A is the momentum fraction carried by the gluon from the nucleus. We will see that the NLO calculation will determine the X f by minimizing the logarithms in the NLO corrections, whose value produces the CGC requirement.
In the dipole momentum space, by using Eq. (3.40), the differential cross section at LO can also be written as where we have used dη J = dp + J /p + J . We note that in practice we still use p + J to generate the phase space instead of the rapidity η J . It is merely a matter of choice at LO, but crucial in order to completely determine the phase space of the real emission at NLO, as described in Section 5.6. For later convenience, for the rest of this work, we introduce dσ through 5 Next-to-Leading Order Correction

Virtual Corrections
Now we move on to evaluate the NLO corrections to the inclusive jet cross section. We start with the virtual corrections, which receive contributions from both the collinear virtual correction and the soft virtual correction, respectively elaborated in 5.2 and in 5.3. Since the virtual corrections share the LO kinematics, the jet algorithm plays no effect to the calculation, and the partnoic results are identical to the forward hadron production [1,42,43,61]. Figure 6. Feynman diagrams for collinear virtual corrections: (a) initial quark self energy correction (b) final quark self energy correction (c) the only non-vanishing contribution.

The collinear virtual correction
Three loop configurations can occur at NLO, as shown in fig. 6, while the first two vanish due to scaleless integrals and only the last one survives, which gives where we have let z ≡ l + /p + j = 1−k + /p + j and implemented the rapidity regulator following The loop integration over k − can be performed by contour integral, see Eq. (A.2), which simplifies the virtual matrix element to We note that the rapidity η-regulator naturally regulates the divergence at z = 1, and there is no need for any additional cut-off in the loop integral in our approach. We see this as one advantage of the rapidity regulator we are advocating.
Interfering the virtual amplitude with the LO matrix element, one finds where we have summed over the spin and averaged over the color. We have done the variable change k ⊥ → k ⊥ − (1 − z)p j⊥ and used the color identities in Eq. (3.34).
The integral over the loop-momenta k ⊥ and l ⊥ can be carried out straightforwardly, by using the integral formula Eq. (A.4) in the Appendix. Plugging the matrix element into the phase space integral in Eq. (4.3), and performing the η and expansions consecutively, we find that the virtual correction to the jet cross section is given by where the '+'-prescription for a distribution D + (x) has been defined in Eq. (3.43). We note that the virtual correction is identical to the single hadron production [42,61].

The soft virtual correction
Now we turn to the virtual correction from the soft contribution, induced by the soft currents discussed in previous sections. The only non-vanishing contribution is given in fig. 7, which is Again, the loop integration can be worked out explicitly. When interfering with the LO amplitude, the soft virtual correction to the squared matrix element is found to be Integrating over the phase space in Eq. (4.3) and utilizing the color identity in Eq. (A.12), we find the soft correction reads where c 0 = 2e −γ E with γ E the Euler constant.

Real Corrections
In this section, we go over the real corrections to the inclusive forward jet production in great detail. Unlike the LO and virtual corrections, now we have 2 partons in the final state and we need to decide when these 2 particles are clustered into one single jet and when Figure 7. The Feynman diagram for the non-vanishing soft virtual correction.
they form distinct jets separately. Therefore, the jet algorithm now imposes non-trivial constraints on the real correction phase space. We will briefly review the anti-k T jet algorithm and discuss its restrictions on the kinematics of the 2-body phase space. Meanwhile, we will study various kinematic limits which are relevant to our NLO calculation. Specifically, we will focus on the collinear, soft and rapidity limits, which will lead to singular behaviors when evaluating the real phase space integrals.
We will explain the subtraction strategy that handles the singularities in these kinematic limits and present the explicit form of the subtraction terms.

The anti-k T jet algorithm
Within the anti-k T jet algorithm, given a list of particles with momenta {k µ i }, the distance metrics ρ iB between particle i and the beam B, and ρ ij between particles i and j are introduced, in terms of their transverse momenta k i,⊥ , the azimuthal angles φ i and the rapidities η i , where [79] Here R is the jet radius parameter. The geometric distance ∆R 2 ij between particles i and j is given by If ρ ij is the smallest of all the ρ iB and ρ ij , particles i and j are clustered to form a pseudojet, whereas if ρ iB is the smallest, the i-th particle/pseudo-jet is promoted to a jet and removed from the list. The procedure continues until the list is empty and all particles in the list will fall into one of the jets constructed. Additional criteria on the jet transverse momentum p J,⊥ or rapidity η J are usually imposed and if the constructed jet passes the criterion thresholds, it is an observed jet, otherwise it is not counted. As for the real corrections to the single inclusive jet, the jet clustering procedure will impose kinematic constraints on the phase space of the 2 real emissions, which can be written as Such constraints can be understood as follows. When the separation between two particles p j and p k , ∆R jk is greater than R, then j and k can either be the signal jet and the jet momentum is given by the momentum of the individual particle. On the other hand, if ∆R jk < R, the 2 particles are clustered into the single jet and the jet momentum p J is the sum of the particle momenta. We should bear in mind that there are additional possible restrictions on the jet momentum p J (e.g. requirements from experimental cuts) which are suppressed in Eq. (5.10).

The phase space measure and the kinematic limits
For the real emission, we are dealing with 2 radiations in the final state. The phase space is given by where in our example q → q + g we denote the out-going quark momentum as p j and the gluon momentum p k , respectively. We have included the spin average factor, the flux, the PDF f (x) and the Bjorken-x integration as well as the phase space restriction due to the jet clustering procedure in Eq. (5.10). Here the in-coming quark momentum p + q = xp + p , with p p the momentum of the incoming proton in pA collisions.
, we can manipulate the phase space to get , and Here we have chosen to parameterize the phase space using the jet momentum p + J , p J⊥ and the gluon momentum p k . The gluon can be un-resolvable when it is soft or collinear to either the out-going or in-coming quarks. The momentum for the outgoing quark momentum can be generated via for 1 jet and one demands that the constructed p µ j must satisfy Θ 1 . While for the 2-jet case, p + j = p + J , p µ j⊥ = p µ J⊥ and Θ 2 should be fulfilled. Once we have p µ j and p µ k , all the other physical variables such as the jet energy E J and the rapidity η J can be easily constructed in a straightforward way.
For later use, let us study several kinematic limits here.
• ξ = 1 rapidity limit. In the rapidity limit, if the gluon transverse momentum p k⊥ = 0, the gluon is actually moving backwards with very large p − k due to the fact that Hence the backward gluon and the out-going forward quark are highly unlikely to be clustered into 1 jet and therefore Θ 1 → 0 while Θ 2 → 1 in this situation. More specifically, we have and we can see that as ξ = 1 while p k⊥ = 0, ∆η jk R to prevent j and k to be clustered together, as expected. We note that this backward-moving contribution will eventually be subtracted automatically when appropriate rapidity schemes are implemented. On the other hand, if p k⊥ = 0, we will have Θ 1 + Θ 2 = 1 in the rapidity limit. All in all, as ξ → 1, Θ 1 + Θ 2 = 1. This features will help us to construct the counter term in the following sections.
• p k ||p j final-final collinear limit. When p k and p j are collinear to each other, they will always be put into one jet to find Θ 1 = 1 and Θ 2 = 0. Also from Eq. (5.14), we find that in this limit since ∆η jk = 0, we have ξp k ⊥ = (1 − ξ)p j⊥ or p k⊥ = (1 − ξ)p J⊥ .
• p k ||p q intial-final collinear limit. In this limit p k⊥ = 0 and as long as ξ = 1, which means the gluon is propagating along the beam and can no way be clustered with the out-going quark, then one has Θ 2 = 1 and Θ 1 = 0; meanwhile if ξ = 1, Θ 1 + Θ 2 = 1, as one can see from Eq. (5.13).
• The soft gluon p µ k ∼ p J⊥ p + p limit. In the soft limit, as the gluon momentum p µ k → 0, the out-going quark momentum p + j → p + q = p + J and x → τ . By homogeneous power expansion, the p + k component will be completely dropped from the δ-function or the Θ's in the phase space. As a consequence, in the soft limit, there will be no restrictions on p + k or equivalently the gluon rapidity η k . Therefore, the phase space will become dΦ J,sof t = 1 8π where x = τ , and Also we note that, if p k → 0, Θ 1,sof t + Θ 2,sof t = 1. And in the rapidity limit when |η k | → ∞, for a jet with finite η J , the partons j and k are unlikely to be grouped into one jet and therefore Θ 2 → 1 and Θ 1 → 0.
• Small R limit. In this work, besides the predictions with the full jet algorithm dependence, we will also present analytic results for the jet production in the small-R limit, R 1. It is ready to show that in the small-R limit, we have up to O(R 2 ) corrections [120]. Here we have defined k ⊥ = ξp k⊥ − (1 − ξ)p j⊥ in the last step, and we immediately finds that in the R 1 limit where we have used that for 1 jet, p j⊥ = ξp J⊥ and p k⊥ = (1 − ξ)p J⊥ in the small-R limit, as can be seen from Eq. (5.14), while for 2 jets, p j⊥ = p J⊥ and p k⊥ = 1−ξ ξ p J⊥ . Apparently, in the limit ξ → 1, we have We note that the 1 jet phase space area scales as R 2 and could be highly suppressed in the small R limit, unless there is R −2 compensation from the matrix element, which is the case when p k ||p j and the matrix element becomes singular.

The collinear contribution
We now detail the calculation of the collinear real corrections below.

The collinear matrix element
At the NLO, two configurations contribute to the real correction of the quark channel, which we denote as the final state radiation (FSR) if the gluon is emitted after interacting with the shock wave (left figure) and the intial state radiation (ISR) if the gluon is radiated before the interaction (right figure), see fig. 8. The matrix elements are (5.20) for the FSR diagram and for the ISR diagram, respectively. Square the sum of both amplitudes and carry out the phase space integral in Eq. (5.12), we find that the collinear real correction can be written as the sum of three different pieces dσ R,coll. = dσ f sr + dσ isr + dσ inter. , where which is the contribution from the FSR in which the clustering conditions Θ 1 and Θ 2 are given in Eq. (5.13). On the other hand, represents the ISR contribution where we have integrated over the loop momenta l ⊥ 's and let x ⊥ = b ⊥ − r ⊥ and z ⊥ = r ⊥ − r ⊥ . Here, the 6-point correlator was introduced previously in Eq. (3.37) and Eq. (A.12). The σ inter. in Eq. (5.22) denotes the billinear term coming from the interference between the ISR and the FSR, which is We remind that in all contributions, when the matrix elements hit Θ 1 and Θ 2 , the Bjorken x and the p j will be replaced by p J and p k following the replacement rules in Eq. (5.13) accordingly. We note that unlike the hadron case, for the ISR contribution, the p k⊥ integral can not be performed inclusively and thus analytically to reduce the quadrupole structure to dipole, due to the existence of the jet algorithm.

The idea of subtraction
Now we are at the stage to evaluate the real correction in Eq. (5.22), which contains collinear and rapidity singularities. For instance, Eq. (5.22) involves contributions such as where the rapidity and the final-final collinear poles arise when ξ → 1 and/or p k⊥ → 0 and when ξp k⊥ → (1 − ξ)p j⊥ , respectively. Here we have applied the Fourier transformation in Eq. (3.40). Different from the forward hadron production, where the integration can be performed analytically and the singularities can thus be extracted in a straightforward way, here in the jet production, the integration is complicated by the existence of the jet clustering algorithm, represented by the Θ 1 -and Θ 2 -functions. The clustering hinders the analytic computation of the integration and in turn makes the pole isolation less obvious.
In order to deal with this complication, we appeal to the subtraction method and construct counter terms to isolate the poles. The idea can be highlighted by the following simple example. Suppose that we are dealing with the integral is regular but takes a complicated form involving the jet clustering algorithm. The integral can barely be done analytically. Meanwhile the x → 0 singularity also discourages the direct numerical integration.
We can however manipulate the integral to find where we subtracted a counter term x −1− f (0) and then added it back. The counter term is built in such a way that it contains all the infrared, collinear and rapidity (here, denoted as x → 0) singular behaviour of the original integrand x −1− f (x). As a result, in the first term of the first equation, the combination is everywhere regular in the domain [0, 1]. We can thus safely set = 0 to get the second equation. Although f (x) − f (0) remains complicated, since the integral is now finite, we can simply evaluate this term numerically.
All the x → 0 singularities are now contained only in the counter term. The good thing about the counter term is that it is constructed as the infrared and collinear limit of a full theory. As a consequence, f (0) is usually significantly simplified when compared to f (x). Especially, due to the infrared-collinear-safety (IRC safe) feature of the jet clustering algorithms, all the jet algorithm dependence will become trivial and reduce to 1 within the counter term f (0). This eventually allows us to perform the integration over the counter term analytically to extract the poles.

The collinear contribution
Back to Eq. (5.22), let us see how the subtraction term is constructed. To do so, we need to analyze the infrared behavior of the integrand, which as guaranteed by QCD, only possesses singularities arising from the soft and collinear limits, plus the ξ → 1 rapidity pole due to the power expansion performed in the kinematic constraints. Now we take a look at the behaviour of the integrand of dσ f sr in those limits one by one: • When approaching the rapidity limit, as ξ → 1, the gluon goes backwards, therefore Θ 2 → 1 and Θ 1 → 0 (see prevsious analysis in section 5.6). The integrand will behave as where we have explicitly replaced p j⊥ by p J⊥ since the matrix element hits Θ 2 .
• In dσ f sr , there is only collinear singularity when ξp k⊥ → (1 − ξ)p j⊥ in the denominator, which indicates that p k ||p j . The collinear singular behavior only arises in the 1 jet case, since when the 2 final state partons are so close to each other, the jet algorithm will always cluster them into one single jet and therefore Θ 2 → 0. Since the 2-jet condition will never be satisfied, we have Θ 1 → 1. Therefore the integrand approaches where we have replaced p j⊥ by p J⊥ − p k⊥ since the matrix element now hits Θ 1 .
• The soft singularity arises when the gluon becomes soft and thus both ξ → 1 and p k⊥ → 0. When this happens, Θ 1 + Θ 2 → 1 (again, see section 5.6 for details) and dσ f sr becomes (5.29) The demanded subtraction term should satisfy all the singular behaviours, and one option is to construct where the superscript c represents the counter-term. This term furnishes the point-wise approximation to the dσ f sr in all the singular regions and therefore renders the subtracted combination finite. The regulators have been removed by setting = 0 and η = 0 given that the combination is finite and integrable in 4-dimension. The integral can be safely evaluated numerically. One note on the counter term is that the counter term dσ c f sr is fully inclusive over p k , in the sense that within the counter term, any jet algorithm and any additional cuts should be placed on the momentum p J but never on p k . It is effectively a one jet configuration and the gluon in the counter contribution is essentially un-resolvable, just like the virtual gluon in the loop. The dσ c f sr in Eq. (5.30) can be evaluated analytically, which gives Once we add up Eq. (5.32) and Eq. (5.31), we obtain the contribution from dσ f sr .
Similar analysis can be applied to the terms which involve the initial-final collinear singularities, which leads to where the −xf (x) term is the subtraction and we note that for the subtraction, we always have p j = p J and again within the subtraction term, all kinematic cuts will be applied on the p J only and p k is un-resolvable just like the loop momentum. These hold for the rest of the section. To see the combination is finite we notice that, as ξ → 1, Θ 1 → 0 and Θ 2 → 1, therefore the rapidity singularity is regulated by the vanishing value of the combination in the bracket in the rapidity limit. There is collinear singularity in dσ isr , arising when the p k⊥ is integrated to ∞ and restricts z ⊥ = 0. The collinear pole is regulated by the fact that will restrict the p k integration to be within the jet cone to remove the singularity. More physically, in the ISR contribution, the pole arises when the emitted gluon is collinear to the in-coming quark. However once the gluon is restricted within the jet cone, the configuration is then forbidden and therefore the integral is regular. Also we note that since it is free of final-final collinear singularities in dσ isr as well as in Eq. (5.33), Eq. (5.33) will be vanishing as R → 0, since the phase space area allowed by Eq. (5.34) scales as R 2 as R → 0, and there is no R −2 compensation from the matrix element. The property will be used to derive the analytical cross section in the small R limit.
On the other hand, the integrated counter term for the ISR can again be evaluated analytically and we find dσ c,int isr = α s 2π Lastly we apply the strategy to the terms which come from the interference between the initial and final state radiations and have the bilinear structure of the dipole. These terms contain no collinear divergence, and all what we need is to subtract the rapidity divergence, which can be achieved by the subtraction as the follow Here we have introduced the bilinear local counter term which leads to the integrated counter term found to be Add up all contributions, including the collinear virtual corrections in Eq. (5.4), we find that the NLO collinear contribution to the quark channel is given by where P (1) is the q → q splitting function. And we have defined and

The soft contribution
√ŝ . Similar to the collinear radiations, there are 2 soft contributions to the real amplitudes, one ISR (right figure) and one FSR (left figure), as shown in fig. 9. The amplitudes are for the FSR which is nothing but the eikonal current acting on the LO matrix element and for the ISR contribution.
The soft cross section can be obtained by squaring the soft currents and integrating over the soft phase space in Eq. (5.15), which gives where Θ 1,sof t and Θ 2,sof t are given previsouly in Eq. (5.16). The integrand in the last two lines are regular, following the same reason as Eq. (5.33), although the last two lines are solely manipulated out of the ISR and the combination of the FSR and part of the interference contributions. The last 2 lines vanish as R → 0. To isolate the poles in the first two lines, we construct the counter term as The subtracted contribution is then free of any singularities and is integrable in 4-dimension. Therefore we can again remove the regulators to find which can be readily evaluated numerically. The integrated counter term is found to be When combined with the soft virtual correction in Eq. (5.7), we arrive at the contribution dσ un-resolv. R+V,sof t = α s 2π The transverse coordinates are defined as One may recognize that this unresolved term is nothing but the soft contribution to the single hadron inclusive production and the finite part is the contribution arising from the so-called kinematic constraint [1,50]. The total soft contribution is therefore given by where the superscript indicates that the gluons in the virtual and the counter term are unresolvable and the second term can be regarded as the additional kinematic restriction due to the jet clustering.

Full NLO Corrections
Gluing all pieces from both the collinear and the soft sectors, we find the NLO corrections where the collinear -pole in the first line will be absorbed by the proton PDF. We can immediately realize that the coefficient of the η-pole is nothing but the BK-kernel to be cancelled by the nucleus dipole distribution. The pole structures explicitly demonstrate the validity of the CGC hybrid scheme when applying to the jet production at the NLO. The finite NLO corrections to the jet production are then found to be which suggests the scale choice for the rapidity scale ν. By noting that the logarithmic term is proportional to the LO kinematics, it is thus found that ν = p 2 J⊥ /p + q = X f p A , where X f is the momentum fraction carried by the gluon from the nucleus. The scale choice naturally gives rise to the scale for the nucleus distribution required by the CGC framework. The kinematic constrain term is given by The final NLO predictions for the inclusive jet production will then be Once we have dσ up to NLO, with full information of momenta p j and p k , we can generate any observable distributions by histograms. We take the jet energy E J distribution as an example to highlight how this works: where the boundary of each bin could follow exactly the experimental setups.
2. We generate the momenta for the quark p µ j and the gluon p µ k in a 1-jet or 2-jet event out of the free variables p + J and p J⊥ following Eq. (5.13), accordingly. The 1-jet and the 2-jet event should satisfy the clustering condition in Eq. (5.13), respectively. The event is kept if the condition is fulfilled, otherwise vetoed. 3. The E J is constructed by p j and p k depending on whether it is an 1-jet or 2-jet event. If E J ∈ (E J,i , E J,i+1 ), the event will be filled into this bin with weight dσ. Consistent power counting has to be taken care of here. For instance, if E J,i ∼ √ s Q s ∼ λ √ s, then in the soft contribution the soft gluon energy E k ∼ Q s ∼ λ √ s will never contribute to the non-zoro jet energy bins, since by power counting we will have either δ( for two jets, where we have expanded the δ-functions in terms λ and only keep λ 0 terms. In the former case, we will have a one-jet event with the jet energy given by the energetic quark energy E J = E j . In the latter case, the gluon jet (contribution associated with δ(E J )) will never pass the jet energy lower bound E J = 0 < E J,i and therefore will not contribute to the jet energy spectrum, but only the quark jet does.
4. We repeat the steps 2) to 3) to build up the E J histogram until we have sufficient statistics.
Obviously, the procedure fits well to other observables and their spectra can be obtained similarly.

An alternative subtraction and the small-R limit
The subtraction terms to regulate the real contribution are not uniquely determined. An alternative is to construct the counter term for dσ f sr by simply replacing the full anti-k T jet clustering constraint with its small-R approximation. The other subtraction terms for dσ isr and dσ inter. as well as the one for the soft contribution remain the same as discussed in the previous sections. This leads to the new subtraction term for dσ f sr such that where Θ 1,R and Θ 2,R are the jet clustering conditions in the small jet radius limit and are given in Eq. (5.18). To see how it can render the subtracted combination finite we first note that in the collinear limit in which ξp k⊥ = (1 − ξ)p j⊥ , we have Θ 2 = Θ 2,R = 0 and Θ 1 = Θ 1,R = 1. Therefore the divergent behaviour in the collinear limit is removed by [Θ 1 − Θ 1,R ] = 0. As for the rapidity divergence when ξ → 1, we notice that when ξ = 1, we will have Θ 1 + Θ 2 = 1, Θ 1,R + Θ 2,R = 1 and x = τ , and hence the integrand vanishes and the integral is again finite in this limit. We work out the part proportional to Θ 2 and Θ 1 of this new integrated counter term respectively, and find dσ c,int. r,Θ 2 = α s 2π for the Θ 2 term, and for the Θ 1 term. Adding up these two contributions, we find and therefore the dσ f sr contribution is given by where the second combination can be evaluated numerically. There are several interesting aspects to notice • The dσ c,int. r is composed of 2 parts.
1. The first part, as the first 2 lines of Eq. (6.6), is identical to the NLO calculations of producing a parton in the single hadron production.
2. The jet clustering information are encoded in the last 2 lines, which reproduces exactly the un-renormalized semi-inclusive jet function (siJF) derived in [92,93], which consists of the 1-jet configuration in which both partons are clustered into a single jet (6.8) and the 2-jet case which is The -poles in these 2 parts cancel exactly with each other when the corresponding virtual corrections are added into the first part. The cancellation of the -poles is guaranteed by the Kinoshita-Lee-Nauenberg (KLN) theorem [124,125], since unlike the hadron production, the jets are inclusive over the final states. The cancellation of the poles serves as a strong check of our calculation.
• The combination dσ f sr − dσ c r is of order O(R 2 ) and is negligible when R 2 1. Hence when R is small, we can approximate which is practically a very good approximation since usually R is chosen around 0.4. It is also true for dσ isr − dσ c isr , dσ inter. − dσ c inter. and dσ R,sof t − dσ c R,sof t , and therefore they are all negligible in the small-R limit.
• As a consequence, in the small-R limit, we can derive the cross section fully analytically. With some manipulations, we find that the cross section can be written as where x = τ /ξζ, while dσ (0) q→q (p J⊥ /ζ) and dσ (1) q→q (ξ, p J⊥ /ζ) are the LO and NLO cross section to produce a quark with momentum p j⊥ = p J⊥ /ζ, respectively. Their explicit results are given by and (6.14) We note that dσ q→q is nothing but the partonic cross section for the single hadron production in pA collisions in the CGC formalism.
Obviously, at NLO, we can work out one of the ξ and ζ integrations, to find the more familiar form of Eq. (6.15) dσ (1) • It is very interesting to point out that the structure of the foward jet production in Eq. (6.11) is exactly the same as the central jet production using collinear factorization [92], i.e., the cross section is factorized into a cross section that produces a parton and a part encoding the jet formation in the vacuum. One difference between the CGC formalism Eq. (6.15) and the collinear factorization lies in the cross section dσ q→q due to the different mechanisms to produce a parton. While the form of the term inside the bracket in Eq. (6.15) is identical to the NLO quark jet function, obtained before within the collinear factorization [92], with an additional ξ −2 in the argument of the first logarithm. The difference is due to the fact that in the central region one looks at the high transverse momentum of order E J much larger than the angular separation of the splitting partons, while in the forward scattering, the jet transverse momentum is about the same order of the splitting, which introduces the additional ξ after evaluating the phase space integration. Apparently the small radius jet is totally ignorant of the interaction with the CGC shock wave.
This feature can be understood by noting that when R 1, p J⊥ R p J⊥ ∼ Q s where p J⊥ R sets the typical scale of the parton transverse momentum with respect to the jet axis inside the jet. As sketched in fig. 10, any parton will be knocked out of the jet if interacts with the shock wave, since it will require an additional transverse momentum p ⊥ ∼ Q s p J⊥ R, and therefore does not contribute to the jet function. As a consequence, any parton inside the jet can not experience the shock wave and the jet function remains the same in the CGC formalism as the collinear counter part.
Another way to understand the feature is that the typical time scale for the jet formation is of order O(1/p J⊥ R) which occurs much later than the semi-hard interaction that happens within the time period of order O(1/Q s ) for R 1, and therefore the formation of the jet feels no shock waves.
• Following the previous argument, using the power counting proposed in [1], we can derive that in the small-R limit, to all orders, the jet cross section can be written as the factorized form that 7 where x = τ /ξζ and J q (ζ) is the quark siJF [92,93] in the large N C limit, which is where we have included the contribution from the gluon as the signal jet and P (1) . Again compare with the jet function in the central region with large transverse momentum, there is an additional ζ −2 in the first logarithm. The factorization is illustrated diagrammatically in fig. 11. The small-R approximation for the gluon channel can be found in the Appendix C.1.
• We note that the same argument holds for any other jet substructure observable v, such as the jet mass, the angularities, the soft drop, as long as v p J,⊥ , which indicates that for any jet substructure studies in the small-x region, the cross section can always be factorized into the product of a dσ that produces partons, similar to the one in Eq. (6.16), and a jet function for the observable v, occurs also in the collinear factorization. Given that for many interesting jet substructures, the corresponding jet functions have already be obtained up to next-to-next-to-next-to-leading order (N 3 LO), see for instance [126][127][128][129][130][131][132][133][134][135][136], this factorization feature can thus be used to dramatically simplify and realize future calculations of the jet substructure distributions in the small-x regime. Last but not the least, the factorized form lays down the foundation for combining the existing parton shower with the CGC calculations, in which one manage to calculate the fully differential cross section within the CGC framework just like what we have presented in this work, and then shower the physical states in vacuum due to the factorization we pointed out.

Threshold resummation
Just like the hadron production, there exist large threshold logarithms in the jet production cross section, represented by ln n (1−ξ) 1−ξ + . When the jet p J⊥ becomes large, we will have τ = 7 We note that when we derive the factorization theorem, we default to the energetic jet assumption in Figure 11. The factorization of the jet cross section in the small-R limit p + J /p + p = p J⊥ e η J /p + p quickly approaches x and thus ξ → 1. This means that the radiations outside the jet cone are restricted to be soft. In the threshold limit, the ln n (1−ξ) 1−ξ + becomes dominantly large. The threshold logarithms are the most obviously seen by applying the Mellin transformation 1 0 dξ ξ N −1 f (ξ) to the small-R limit cross section dσ q→q in Eq. (6.15) where we have applied the Mellin transformation to Eqs. (6.15) and (6.13) in the threshold limit ξ → 1, following the threshold Mellin transform rules in Eq. (A.10). Here,N = N e γ E . The threshold ξ → 1 limit maps to the largeN limit after taking the Mellin moment, and we have only kept the threshold contribution, i.e., the non-vanishing terms asN → ∞.
Here to clearly identify the types of the logarithms, we unfold back the color constant N C in the NLO cross section to the color operators T and the Wilson line W a a to keep track of the color structures. We have assigned i to the incoming parton and j to the out-going quark.
The logarithms can be resummed to all orders. We will leave the details to another work [137]. Here let us emphasize several points associated with the threshold logarithms • Some of the logarithmic terms in Eqs. (7.1) and (7.2) can be reduced by the scale choice µ = p J⊥ and X f = X A in dσ q→q . The scale choice characterizes the scale where the semi-hard interaction happens. Note that X f = X A is the typical scale choice for the CGC framework. The scale evolution associated with X f gives the BK equation. This set of scale choice is good for eliminating large logarithms when being away from the threshold.
• However when p J⊥ is approaching the threshold, the remaining logarithms in 1 − ξ (or inN in Mellin space) become overwhelmingly large and draw the cross section negative [2], see also fig. 13 in the numerical analyses Section 8. These logarithms should be resummed to all orders to secure the perturbative predictive power.
• Thanks to the factorization theorem [137], the remaining logarithms in the jet function and dσ q→q,thr. can be treated independently. The lnN ln R 2 and ln 2N terms within the jet function are Sudakov logarithms and have been studied extensively before [84,120,[138][139][140][141][142][143]. Their resummation can be achieved by re-factorizing the siJF into a exclusive jet function and a collinear-soft function [137,143]. The procedure is standard in the language of SCET and we will leave the derivation to future work [137]. At the leading logarithmic accuracy, the resummed jet function is which is the direct exponentiation of the Sudakov logarithms.
• It is more involved to resum the logarithmic term from the interference of the ISR and FSR which calls for the threshold resummation technique different from the standard ones in the collinear factorization [143][144][145]. One way to see this is to notice that this term shares the same color structure T a i W aa (r ⊥ )T a j as the non-linear BK evolution, see Eq. (3.45). Indeed, as we have seen in Section 3.4, at higher orders, extra Wilson lines W 's arise, which suggests the need for a non-linear evolution to resum this contribution. The standard Sudakov resummation techniques only turn the color charge T into a resummed form, see for instance [143,[146][147][148], while will leave the CGC Wilson line W untouched.
• A systematic method to resum the logs proportional to T a i W aa (r ⊥ )T a j in Eq. (7.1) has been developed recently in [2], which will be detailed in [137] for the jet production. However, we note that for the leading log resummation it is sufficient to consider the n independent soft gluon emissions at N (n) LO, with strong ordering q − . . p − n−m , where q i and p i are the ISR and FSR momenta, respectively. It can be shown that the matrix could be written as where integration over the phase space is implicitly understood and we have used the property of the color charge operator in Eq. (3.28). When sum up all order contributions, the above equation leads to the exponentiation of the logarithms, which reads One realizes that the above equation reproduce what was found in [2] based on factorization and the rapidity renormalization group equations. The resummed formula suggests that when approaching the threshold, instead of choosing the CGC scale X f = X A , one should use a dynamic scale such that X f minimizes the exponent [2].
We briefly highlight how we derive the resummation form by going to the soft strong ordering limit. We start with NLO. At NLO, we can have for the matrix element which is the eikonal approximation, presented by the soft eikonal current J acting on |M 0 . We also have the ISR contribution, which is where we introduced d αβ (l) ≡ −g αβ +n αlβ +n β lᾱ n·l and have used the light-cone gauge condition n · = 0. We performed the contour integration on l − similar to Eq. (A.2) to get the final result. Here we have suppressed the rapidity regulators.
The LL approximation to the NLO cross section will then be dσ (1) At NNLO, to produce LLs, we only need to consider 2 independent soft gluon emissions with momenta p 1 and p 2 . We require p − 1 p − 2 . For the double FSR, the eiknoal current in this strongly order limit is well known, which gives α (q 2 )|M 0 , (7.11) in the strongly ordered limit, where to get the approximation we have used the strongly ordered limit that l − 2 l − 1 . It is easy to get the matrx element with one ISR and one FSR which gives Here since there is no strong ordering requirements in q 1 and p 1 , we have splitted the contribution to 2 terms with p 1 > q 1 and q 1 > p 1 , respectively.
Therefore, one could find the LL approximation to the NNLO matrix element be and thus dσ (2) Here the factor 1/2! accounts for symmetry factor of 2 indistinguishable gluons in the final state. The calculation is extendable to arbitrary n-th order to find dσ (n) which is exactly Eq. (7.5). Once one sums up all orders, one realizes the LL resummation which is exactly Eq. (7.6) when going to the Mellin space.

Numerical Analyses
In this section, we perform numerical analyses. For the time being, we pay attention to the numerical validation of the theoretical framework developed previously, while we leave the comparison of the phenomenological predictions with the LHC inclusive jet data to another work [137].
For the numerical calculation, we use the NLO MSTW2008 PDF sets [149] for the proton PDFs. We applied the LL BK evolution with α s running [150][151][152] to the nucleus dipole distributions. As for the initial condition, the MV-like model "γ 1119 " are used and we stick to the parameter settings in [152]. As for the S ⊥ , we follow the choice in [50]. In the calculation, we implement the kinematic cuts following strictly the CMS experimental setups in [153], in which the proton and the Pb nucleus are colliding at the center of mass energy with the radius parameter R = 0.5 and required to have a minimum transverse momentum p J⊥ > 3 GeV. We select the jets at very forward pseudo-rapidities 5.2 < η J < 6.6 in the laboratory frame, the boost between the laboratory frame and center-of-mass frame is δη J = 0.465. Throughout the analyses, we set the factorization scale µ = p J⊥ and the rapidity scale X f = X A where X A is the momentum fraction carried by the gluon from the nucleus.
In fig. 12, we first display the comparison between the NLO single inclusive jet cross section with full jet algorithm dependence and its analytical small-R approximation. We stick to the quark channel for this study. The NLO distributions are obtained by the histogram procedure described in Section 5.9. The error bars reflect the numerical uncertainties. We have performed the comparisons for R = 0.2, 0.4, 0.6 and 1.0.
In the left top panel of fig. 12, we plot the inclusive jet p T ⊥ distributions from the full NLO predictions (in open dots) and the small-R approximations (in solid lines). From the plot, we can observe that for larger values of R, jet spectrum spans to larger transverse momentum. This is expected since with a larger jet radius, one clusters more particles into one jet and thus more effortlessly generates larger jet transverse momentum to pass the 3 GeV threshold than the small jet radius case. We also observe that just like the single hadron production in pA collisions, the inclusive jet p J⊥ spectrum manifests the negative cross section problem for large p J⊥ . The issue can be resolved by the threshold resummation [2,137] as sketched in Section 7. As we go to the smaller values of the jet radius R, the better the small-R approximation becomes. This can be demonstrated from the ratio ρ R,p J⊥ ≡ dσ full /dp J,⊥ dσ small R /dp J,⊥ , where the numerator and denominator represent the jet cross section with the full jet algorithm and its small-R limit, respectively. This ratio is shown in the lower part of the left panel, which shows strongly that as R becomes smaller, the ratio ρ R,p J⊥ approaches 1. The comparison serves as a strong validation of the analytic small-R approximation and the factorization we derived. From the ratio, we can see that for small p J⊥ , the small-R approximation makes up ≥ 90% of the complete NLO result for narrow jets with R = 0.2, 0.4 and 0.6, while the percentage drops to 80% for fat jets with R = 1.0. The approximation is even better for large p J⊥ . The small-R approximation could break down when strong cancellation kicks in between different contributions, and therefore one should be careful when invoking the approximation for phenomenology studies. In the present case, it happens to be when the cross section becomes negative. We plot the comparison using the inclusive jet E J spectrum in the right panel of fig. 12. The division of the energy bins follows the CMS [153]. The plot demonstrates the feasibility of our NLO calculation for various different observables and experimental set-ups. Similar behaviours are observed in the E J distribution as the p J⊥ spectrum. The NLO cross section eventually turns negative and the threshold resummation comes to rescue [137]. The small-R approximation did a better job in the E J prediction than the p J⊥ case, which can be understood that it is mostly the low p J⊥ jet events that fill up the E J spectrum. This also explains why the NLO E J spectrum can stay positive for a wider range. Now we turn to study the threshold limit in fig. 13. Here we include all partonic channels which contain threshold logs (the q → q and g → g channels). The upper panel displays the jet p J⊥ spectrum predicted by the complete NLO calculation (in red dots) and its threshold approximation (in green triangles). The NLO cross section becomes negative when p J⊥ > 20 GeV. To investigate the size of the threshold contributions, we plot the ratio of the threshold limit to the full NLO prediction ρ thr. = dσ thr. /dp J⊥ dσ NLO /dp J⊥ as a function of the jet transverse momentum p J⊥ . To manifest the effect of the threshold logarithms ln i (1−ξ) we have removed from the ratio the common δ(1 − ξ) terms shared by both dσ thr. and dσ NLO . We see that when we crank up p J⊥ , the threshold logarithms are overwhelmingly dominant and the ratio ρ thr. approaches 1.

Conclusions
In this work, we applied the recently proposed computational techniques [1,2] to derive the complete NLO corrections to the single inclusive jet production in proton-nucleus (pA) collisions at forward rapidities, using the CGC effective theory. Within the framework, the cross section is factorized into the product of the proton PDFs, the small-x nucleus multi- pole distributions and the perturbatively calculable partonic cross sections. Our result clearly shows that the CGC factorization is valid at the NLO level for jet production. The root of the factorization is the consistent and homogeneous power expansion in the power counting parameter λ ∼ O −t/s . The power expansion relaxes the phase space kinematic constraint which in turn is compensated by the existence of the soft contribution. Our explicit NLO calculation showed that the soft mode is required in order to generate the kinematic restrictions due to the jet clustering in addition to the constraint in the hadron production case [50].
Our O(α s ) calculation of the jet cross sections is fully differential over the final state physical kinematics. The calculation is thus not limited to the jet spectra predictions but is able to predict any distribution of infra-red safe quantities. The full NLO jet cross section is given in Eq. (5.51) and Appendix B. To perform this computation, we have discussed a subtraction method to single out the singular contributions from the phase space integration. The singular terms have been evaluated analytically. The remaining non-singular part contains the jet algorithm and the experimental cuts. Since the remaining integration is finite and integrable in 4-dimension, it was obtained numerically.
We further investigated the small jet radius limit of jet production. Given that in practice the jet radius R is usually chosen as O(0.4), the small-R limit in general is believed to be a very good approximation to the full R result and has been applied in several small-x jet studies. In this work, we are able to carry out the fully analytic small-R jet cross section in Eq. (6.15) and compare it with the full jet algorithm dependent cross section to validate the approximation. We showed that in the small jet radius limit, the single inclusive jet cross section can be further factorized into the same short-distance cross section as the single inclusive hadron production, with only the fragmentation functions replaced by the semiinclusive jet functions (siJFs). The siJFs share the same form as the ones that show up in the jet production in the central region within the collinear factorization with slightly different ζ dependence in the logarithm (see Eq. (6.17)) due to the different jet transverse momentum range. We argued that this factorization feature of the small-R jet cross section holds for generic jet processes and jet substructure observables in the CGC framework. We carefully examined numerically when the small-R limit reliably approximates the full result. The obtained analytic result serves as a guide to appropriately using the small-R approximation in the forward scatterings, meanwhile the observed factorization can facilitate high order and resummation calculations of the jet cross sections and open up the opportunities to realize the computations of event shapes in other forward scatterings. Furthermore, the observed factorized form lays down the foundation for combining the CGC fixed order calculations with the existing parton shower techniques.
Like the forward hadron production, the obtained NLO jet spectrum also exhibits the negative cross section feature when the jet transverse momentum becomes large. Following the suggestions in [2], we resolved the negative cross section problem by the threshold resummation, where additional Sudakov logarithms arising in the jet production have also been resummed. Different from [2], in this manuscript, we accomplish the leading logarithmic threshold resummation using a different approach by considering the strongly ordered independent emissions to all orders. The achieved resummation agrees with the one obtained by the renormalizaton group equation [2,137], which suggests the plausibility of both approaches.
We look forward to comparing the NLO and resummation predictions against the LHC pA jet production data [137] in the future. Meanwhile, we expect the computational setups developed in the present work and in [1,2] lay down the theoretical foundation of the perturbative predictions for the CGC phenomenology involving jet clustering procedures and any other jet substructure and event shape observables.

A.1 Contour Integral
The integral of the form is frequently encountered in this work. The k-loop integration can be performed by contour, which gives We first did the k − -integral. The integral is only non-vanishing when 0 < k + < p + j otherwise all the poles occur at the same side of the complex k − -plain and one can choose the contour in the other side of the plain to enclose no poles. To get the 3rd line, we have performed the k − -contour integral around the upper plane which picks the pole and we let p − j = p 2 ⊥ /p + j to reach the final result.

A.2 Fourier Transformation
We list some integral formulae for deriving the NLO results.

A.3 Mellin Transformation
The Mellin transformation or the N th-moment of a function is given by If we apply the Mellin transform to the convolution of 2 functions In the resummation part of this work, the transformation we used are where H n is the harmonic number, defined as The threshold limit ξ → 1 will map onto the large N limit, which leads to the limit

A.4 Color identities
We provide several color identities used in the CGC multi-point correlators.
Through the Fiertz identity, one can derive that which relates the 3-point functions to dipole distributions. Also one needs the following relation that (A.12)

B Contributions from the other channels
In this section, we list all the remaining channels that contribute to the inclusive jet production, which are the g → g,q → g and g → q sub-processes.
The LO g → g is given by whose form in the momentum space is where τ = p + J /p + p , G(x) denotes the gluon PDF of the proton and F A (k ⊥ ; X f ) represents the momentum space nucleus dipole distribution in the adjoint representation, whose definition is The relation between the dipole in the adjoint representation F A and the fundamental representation F F is given by The NLO correction of the g → g channel to the jet cross section is found to be X (r ⊥ , b ⊥ ) +(dσ g,f sr − dσ c g,f sr ) + (dσ g,isr − dσ c g,isr ) + (dσ g,inter. − dσ c g,inter. ) + (dσ gR,sof t − dσ c gR,sof t ), (B.5) where x = τ /ξ and ξ = p + jg p + g . Here p g is the momentum of the initial state gluon, p jg denotes the momentum of one of the final state gluons that will become the signal jet in the 2-jets case.
The hard factor H g,2 , H qq,2 and H g,3 are found as the following, in Eq. B.9, T R = 1 2 , N f denotes the number of flavors in the quark loop, which is taken to be 3 in our calculation.
The ν-related logarithmic term is where the factor 2 before Θ 2 accounts for the contribution from filling the second jet to the histogram if it passes the jet threshold, where we have used the fact that the filling is symmetric between the 2 gluon jets. The interference term is given by where a factor 2 is again included in the 2-jet case to account for filling both of the jets in the final state. In this case, the counter contribution will also be filled twice. The same situation also happens to the ISR contribution which gives The soft contribution gives dσ gR,sof t − dσ c gR,sof t = α s π 2 where x ⊥ = b ⊥ − r ⊥ and z ⊥ = r ⊥ − r ⊥ and S (8) (B.17) Here p k and η k denote the momentum and the rapidity of the final state soft gluon, respectively. We note that in this work, we assume all the signal jets are energetic with E J Q s ∼ p J⊥ , therefore the soft gluon itself could not form a jet that passes the jet criteria and therefore the soft gluon jet will not contribute to the jet spectrum and the Θ 2 term will only be filled once to the histogram. However, if the condition is relieved, the soft jet could contribute to the p J⊥ spectrum.
The 8-point correlator is defined as X f , in the large N C limit, the S X f can be written as products of the 4-point function and the dipoles such that S (8) which is originated from the color identity that We can see that the g → g channel is very similar to the result of the q → q channel except for the splitting function and the color structure. In QCD, for the collinear limit, the only difference between cross sections of different channels is with respect to the splitting functions [154] . And the shock-wave introduced in CGC, which is the origin of the difference in dipole structure, will not change this property. For the soft limit, according to the conclusion of the soft theorem [155], the only difference between these two channels is the color factor.

B.2 g → q channel
For the g → q +q channel case, we will suppose that the final state quark becomes the jet in 2-jets case, and then we can get the NLO correction of this channel to the jet cross section as where the factor of 2 before Θ 2 accounts for the contribution from theq jet, which at NLO is identical to the q jet case. The interference term is found to be 26) and the contribution from the ISR gives dσ gq,isr − dσ c gq,isr = X f (r ⊥ , r ⊥ ) .

(B.27)
In both cases, the contribution from theq jet is included through the factor of 2 before Θ 2 and the counter terms.

B.3 q → g channel
For the q → g channel which means the gluon becomes the jet in 2-jets case of q → q + g process, the NLO correction to the jet cross section is found to be and dσ qg,isr − dσ c qg,isr = Here p k denotes the momentum of the final state quark. Under this definition Θ 2 has the same form as the q → q channel cases in terms of p k ,ξ and p J .

C The small-R limit for other channels channel
In this section, we list all the remaining channels that contribute to the result of jet production in the small-R limit, which are the g → g,q → g and g → q sub-processes.

C.1 g → g channel
In the small-R limit, by performing subtractions similar to the method in Section 6 and omitting O(R 2 ) contributions, we can get the factorized form for the g → g channel as dσ g,R = dξ dζ ζ 2 xG(x)dσ g→g (ξ, p J /ζ) J g (ζ) , (C.1) where x = τ /ξζ and J g (ζ) is the gluon siJF [92,93] in the large N C limit, which is and we have included the contribution from the g → qq channel, P