QCD Shear Viscosity at (almost) NLO

We compute the shear viscosity of QCD with matter, including almost all next-to-leading order corrections -- that is, corrections suppressed by one power of $g$ relative to leading order. We argue that the still missing terms are small. The next-to-leading order corrections are large and bring $\eta/s$ down by more than a factor of 3 at physically relevant couplings. The perturbative expansion is problematic even at $T \simeq 100$ GeV. The largest next-to-leading order correction to $\eta/s$ arises from modifications to the qhat parameter, which determines the rate of transverse momentum diffusion. We also explore quark number diffusion, and shear viscosity in pure-glue QCD and in QED.


Introduction
The original idea of the Quark-Gluon Plasma phase [1][2][3] was that it would consist of weakly-interacting, nearly-free quarks and gluons (this assumption is implicit, for instance, in treatments of the cosmological QCD phase transition [4]). This picture was naive, since the QCD coupling varies only logarithmically with scale [5,6], so the coupling is in fact quite large at any achievable temperature. Although thermodynamical quantities approach the expected weak-coupling values rather quickly [7][8][9][10], this does not necessarily indicate weak coupling; even in the limit of infinite coupling, analogue theories display 3/4 of the free theory value for the pressure, for instance [11].
Weak coupling would imply large transport coefficients, characterized for instance by a large ratio of the shear viscosity to the entropy density, η/s 1. In fact, leading-order (LO) perturbative calculations of η [12,13] find η/s ∼ 0.5 for coupling values of physical relevance for achievable temperatures. Unfortunately, the presentation in Ref. [13] has led to frequent misinterpretation of the results, such as using the next-to-leading-log (NLL) pocket formulae in regimes where the paper cautions that they are not applicable.
But in any case, experimental results at RHIC [14,15] and the LHC [16][17][18][19] indicate that the shear viscosity is even smaller: numerous authors have found that the experimental data on angular correlations and other experimental measurables are fit very well by relativistic, viscous hydrodynamics, but only if the shear viscosity to entropy ratio is quite small, η/s ∼ 0.1-0.3 (see [20,21] for reviews). This would indicate that η/s is quite close to the value in extremely strongly coupled theories with holographic duals [22][23][24]. First attempts at non-perturbative QCD determinations from the lattice, which require a highly non-trivial analytical continuation, also point towards small values [25][26][27][28][29][30][31]. So do FRG analyses, which also require analytical continuation and other truncations [32,33].
So how should we understand this discrepancy with the weak-coupling calculations? The best way to address this question is to compute the next-to-leading order (NLO) corrections to the shear viscosity. We finally have the technology to do so. One key breakthrough, due to Caron-Huot, was the development of a technique to understand how particles are "kicked" transversely as they move through the plasma, at next-to-leading order [34]. Then there was the development of sum-rule tools for next-to-leading order longitudinal momentum diffusion, identity change, and collinear emission, developed to study photon production [35,36] and recently extended to treat jet energy modification at subleading order [37,38]. With rather modest modifications, we can apply this technology to perform an "almost" next-to-leading order calculation of η, and of quark number diffusion D q , in a hot QCD plasma.
In the following sections, we will give a rather detailed explanation of how one computes η/s perturbatively in QCD, and of what is and is not included in our "almost" NLO calculation. But for the impatient reader, we will give a short summary of the procedure, of what is included and what is missing, why we think the remaining "missing" parts should give only a small correction, and of our final results.
Shear viscosity describes the persistence of any anisotropy in the stress tensor T µν . When a fluid flows in a nonuniform way, such anisotropy constantly develops from the fluid flow, and constantly disappears due to dissipative physics. Shear viscosity measures the inefficiency of that dissipation. It can also be studied by using random thermal fluctuations, through which T µν accidentally becomes anisotropic. The fluctuation-dissipation theorem says that the persistence of these fluctuations also determines the shear viscosity. These concepts are well defined in any theory with well defined thermodynamics, whether or not the stress tensor can be understood in terms of some "particle" degrees of freedom.
The perturbative picture is that the plasma is made up primarily of quasiparticle excitations with momenta of the order of the temperature, and these are responsible for carrying the stress tensor T µν of the plasma. An anisotropic T µν arises when the quasiparticles are distributed anisotropically in momentum space. Their scattering relaxes T µν towards its equilibrium value. This description is sufficient at both leading order and O(g) NLO order. The challenge is to determine the exact form of the collision operator which relaxes the particles towards equilibrium. The LO calculation [13] requires two sorts of scattering process, the 2 ↔ 2 scatterings with all hard (O(T )) external particles and 1 ↔ 2 effective splitting processes between hard participants. There are two features in the calculation. First, there is the momentum a particle carries into a scattering and the momentum it carries out. Second, there is the effect of the momentum which it "dumps" into the other particle in the scattering process. While the first effect always makes the momentum distribution more isotropic, the second effect can make it more or less isotropic, depending on the relative angles of the participants.
To treat the problem at NLO, we need to find all new scattering processes, and corrections to the already-considered processes, which are suppressed by a single power of g. No other corrections are needed because the quasiparticle picture first needs amending at O(g 2 ) or higher. As we shall show in detail, there are only a few such O(g) subleading effects. First, the rate of soft 2 ↔ 2 scattering is modified; this can be described as an additional momentum-diffusion coefficient δq. This modification, and an O(g) correction to the medium corrections to dispersion, also provide an O(g) shift in the 1 ↔ 2 splitting rate. Next, the 1 ↔ 2 splitting rate must be corrected wherever one participant becomes "soft" (p ∼ gT ) or when the opening angle becomes less collinear. And finally, the numerical implementation of the LO scattering kernel [13] already resums a small amount of these NLO effects, requiring a subtraction (or "counterterm") to δq and δq L (longitudinal momentum diffusion).
We are able to give a relatively simple determination of these effects by the use of lightcone techniques. Unfortunately, these methods typically keep track of the incoming and outgoing momentum of a particle, but lose track of the momentum which it transfers to the other participants. This momentum transfer also affects the departure from equilibrium of the other particle or particles which receive the momentum, an effect which we will fail to account for at NLO. Therefore our treatment is only "almost" NLO. However we compute the importance of this effect in the leading-order case and use it to make an estimate for this incomplete treatment. The associated errors turn out to be small, much smaller than the difference between LO and NLO, and therefore presumably smaller than still-uncomputed NNLO effects.
Our main results are presented in Section 5, but we will present one "summary" result right away in Figure 1. The figure shows the ratio of the shear viscosity to the entropy density, computed at LO and NLO. The temperature enters in the choice of renormalized coupling and the number of quark species (there are slight discontinuities where we cross quark-number "thresholds"). The solid thinner band represents our "best estimate" based on 2-loop renormalization group flow from the Z-pole and the coupling fixed via the EQCD choice of Laine and Schröder [39]. The renormalization uncertainty is estimated by varying the scale µ EQCD over the range µ EQCD = (2.7 ↔ 4π)T . The wider bands represent fixing g 2 from the scale πT to 4πT with the standard MS approach, to indicate the importance of the renormalization uncertainty. The plot shows that next-to-leading order corrections lower the shear viscosity by a factor of two at high temperatures T ∼ 1000 GeV, and by a factor of four for physically relevant temperatures, T ∼ 250 MeV. This large change is suggestive that the true value of η/s is smaller than the leading-order perturbative estimate, but it also signals severe convergence problems in the perturbative expansion, even for surprisingly large temperatures or, equivalently, small values of g. The figure also shows the analogous result for the (light) quark diffusion coefficient, which displays very similar coupling dependence. We present more results and discussion in later sections, but we point out now that the largest NLO correction arises from NLO modifications ofqsee Fig. 6. Accurate fits of or NLO results for η/s as a function of coupling are provided in an appendix.
Having finished a quick summary of the problem, our approach, and our main con-clusions, we now summarize the content of the remainder of the paper. In Section 2 we review the definition of transport coefficients and their calculation within the kinetic theory of Arnold, Moore, and Yaffe [40]. In Section 3 we show how to interpret parts of the leading-order calculation in terms of transverse and longitudinal momentum diffusion and of identity-changing processes. The NLO effects take the form of these three effects, plus a shift in the rate of 1 ↔ 2 splittings, and can therefore be efficiently included once we express the problem in terms of these pieces. We do this in Section 4, with special attention to the "overlap" regions between these processes. With all pieces available, we present the main results in Section 5. We also decompose the NLO correction into the respective pieces to see which are most influential. Some technical details, together with fits for our NLO results as a function of the coupling, are postponed to the appendices.

Ingredients
Let us start by briefly summarizing how the transport coefficients we investigate are defined and how they have been computed to leading order in the Effective Kinetic Theory (EKT). Transport coefficients characterize a system's response to weak, slowly varying inhomogeneities or external forces. In the case of the viscosity, if the flow velocity of the plasma is nonuniform, then the stress-energy tensor (which defines the flux of momentum density) departs from its perfect fluid form. In the local (Landau-Lifshitz) fluid rest frame at a point x, the stress tensor, to first order in the velocity gradient, has the form where the metric is the "mostly-plus" one, P is the equilibrium pressure associated with the energy density T 00 (x) = , and the coefficients η and ζ are known as the shear and bulk viscosities, respectively. The flow velocity u equals the momentum density divided by the enthalpy density + P = sT . We will only be concerned with the shear viscosity in this paper; the bulk viscosity requires a more complicated analysis, which has been carried out at leading order in [41,42] for a scalar theory and in [43] for a gauge theory with massless quarks. The charm contribution has been computed in [44]. Additional coefficients such as τ π would appear at higher order in the gradient expansion [45][46][47][48], but we leave their evaluation for a future investigation.
In the presence of further conserved global charges beyond four-momentum, such as baryon or lepton number, the associated charge density n ≡ j 0 and current density j satisfy a diffusion equation, in the local (Landau-Lifshitz) rest frame of the medium. The coefficient D is called the diffusion constant. When (some of) the diffusing species of excitations carry electric charge, as is the case for baryon and lepton number, the diffusion constants for these charged species determine the electric conductivity σ through an Einstein relation (see Refs. [12,13] where the sum runs over the different species or flavors of excitations with e a , D a and µ a the corresponding electric charge, diffusion constant and chemical potential, respectively. These transport coefficients all find a field-theoretical definition through Kubo-type formulae relating them to the zero-frequency (transport) limit of the spectral functions of two-point correlators of the appropriate operators (the stress-energy tensor or other conserved currents). However, for a leading-and next-to-leading-order perturbative evaluation, the diagrammatic approach that would result from a direct application of these Kubo formulae would require cumbersome resummations to all orders of many classes of sub-diagrams. It is thus more convenient to use the linearized version [13] of the EKT developed in [40]. Solving the linearized theory automatically accounts for the needed resummations. The leading-order equivalence between the diagrammatic and kinetic approaches has been proven in [41] for a scalar theory and in [49][50][51][52][53] for gauge theories.
The leading-order EKT introduced in [40] is given by this Boltzmann equation where f a (p, x, t) = dN a /d 3 xd 3 p is the phase space distribution function for the excitation (gluon, quark, antiquark ) of index a. The leading-order collision operator encodes the contribution of tree-level 2 ↔ 2 scattering processes, with Hard-Loop resummed propagators in the soft-sensitive channels, as well as collinear, effective 1 ↔ 2 processes resumming the effect of an infinite number of soft scatterings. Both processes contribute to order g 4 T to the collision operator; a subset of C 2↔2 a [f ] is logarithmically enhanced, g 4 T ln(1/g), due to the aforementioned sensitivity to the soft scale gT . C 2↔2 a [f ] and C 1↔2 a [f ] are described in detail in [13,40].
We now proceed to the gradient expansion of Eq. (2.4), following the notation of [12].
where f a 0 is the equilibrium distribution 1 , f a 0 = (exp(−βu µ P µ − q a α βµ) ∓ 1) −1 , which is determined by the Boltzmann equation at zeroth order in the gradients, C a [f 0 ] = 0. The inverse temperature β, the chemical potential µ, and the flow velocity u µ are functions of (t, x) and obey the equations of ideal hydrodynamics. We will consider u i (t, x) and µ(t, x) to be small perturbations on top of an approximately homogeneous background. The charge of species a under conserved charge Q α is q a α , where α is a label for the flavor symmetry of interest (i.e. quark number in our case).
Substituting f a 0 into the lefthand side of the Boltzmann equation, Eq. (2.4), yields a source for the first dissipative correction, which (after using the hydrodynamic equations 1 We use capital letters for four-vectors, bold lowercase ones for three-vectors and italic lowercase for the modulus of the latter. We work in the "mostly plus" metric, so that P 2 = −p 2 0 + p 2 . The upper sign is for bosons, and the lower sign is for fermions. The full collision operator is Ca; the collision operator linearized in the departure from equilibrium is Ca. of motion) is proportional to the strains [54] depending on whether we are considering chemical potential fluctuations (diffusion) or velocity fluctuations (shear viscosity). The source takes the form Here q a denotes the relevant conserved charge carried by species a associated with the transport coefficient of interest, q a ≡ q a α , (diffusion), |p| , (shear viscosity). (2.7) I i···j is the unique = 1 or = 2 rotationally covariant tensor depending only onp, (2.8) The normalization on I i···j was chosen so that and more generally, where P (x) is the 'th Legendre polynomial. The linearized kinetic equation may be written compactly as where C is a linearized collision operator defined below. To linear order the first dissipative correction must be proportional to the driving term X, allowing us to define the proportionality coefficient χ i···j (p), where we have also used rotational invariance of the collision operator (in the rest frame) to define a scalar proportionality coefficient χ(p), which describes how the departure from equilibrium varies as a function of the magnitude of the momentum. The first-order transport coefficients are then obtained from the kinetic-theory expressions for T µν and for the conserved current J µ α associated to the conserved charge Q α , i.e., where ν a is the spin and color degeneracy of the excitation a (2N c for quarks and antiquarks, 2(N 2 c − 1) for gluons). Upon inserting the first-order deviation f 1 in these equations, the first-order coefficients are easily recovered. Hence, the solution of the first-order, linear Eq. (2.11) yields η and D.
As in [12,13], we will solve Eq. (2.11) and its NLO extension by means of a variational method. To this end, we introduce the inner product (2.14) The linearized collision operator C is symmetric with respect to this inner product, and is given by variation of the quadratic form C is a positive semi-definite operator, and is strictly positive definite in the =1 and =2 channels relevant for diffusion or shear viscosity. As we will show, some NLO contributions are negative, so some care will be needed in defining a positive definite C NLO . Once that is taken care of, the linearized Boltzmann equation (2.11) at LO and NLO is precisely the condition for maximizing the functional Note that the maximized Q determines the rate per volume at which work is dissipated into heat; βQ then gives the rate per volume of entropy production. This structure is valid at LO and NLO, so we have not explicitly labeled C and f 1 in that respect. The strains X i···j may be pulled out of the integrals, and then rotational invariance of the measure and collision operator guarantees that The explicit forms of the source and LO collision parts of this quadratic functional are and The 2 ↔ 2 contribution reads, after symmetrization of the departures from equilibrium [13] where p is shorthand for d 3 p/(2π) 3 and M ab cd (p, k; p , k ) 2 is the matrix element squared for the ab ↔ cd process, summed over all spins polarizations and colors and Hard Thermal Loop (HTL) resummed [55,56] in the IR-sensitive cases. A complete list of these matrix elements appears in [13,40]. The 1 ↔ 2 contribution reads instead [13] The splitting rate is given by where h = p × k is a transverse, two-dimensional vector related to the transverse momentum picked up during the splitting process. d R b and C R b are the dimension and quadratic Casimir operator of the representation R of the particle b. A complete leading-order treatment of collinear radiation must consistently resum the effect of the many soft, transverse collisions to account for the Landau-Pomeranchuk-Migdal (LPM) effect [57][58][59][60][61]. This is achieved through the following integral equation for F b (h): For the case of g → qq, The equation depends on two inputs,C(q ⊥ ) and δE(h, p, k). The former is the leading-order transverse scattering kernel C(q ⊥ ) in units of the Casimir factor, [62,63] .

(2.25)
δE is the energy difference between the initial and final collinear particles. It reads where m 2 ∞ a is the asymptotic mass of the particle a. For gluons m 2 which, when substituted into Eq. (2.16), transforms its extremization into a matrix algebra problem [12]. Our choice of functional basis is motivated by the need [13] to allow infrared behavior of form ∝ p −1 and ultraviolet behavior of form ∝ p −1/2 . The variational procedure is only guaranteed to converge to the right answer as the functional basis becomes complete, but in practice we see good convergence above 4 functions. Later, in our numerical results, we will use 6 basis functions, but our results shift by less than 10 −4 when we make the basis still larger.

Reorganization of the LO quadratic functional
The effective kinetic theory introduced in [40] has been extended to next-to-leading order in [38] for the case where one follows the evolution of a dilute set of high-energy particles of typical energy E interacting with an equilibrated medium at a temperature T such that exp(−E/T ) 1, which is a sensible approximation for the evolution of the leading partons in a jet. There, we found that a reorganization of the form of the LO collision operator was necessary to systematically compute NLO corrections. Specifically, O(g) NLO corrections can only occur where one or more lines carry a soft O(gT ) momentum, because only there do statistical functions give rise to a 1/g enhancement of loop level effects. But transport coefficients are only sensitive to hard O(T ) momenta. So NLO corrections only occur when there is a momentum hierarchy within a diagram. In such cases one can always re-express the diagram as an effective process. When the soft particle is a gluon and does not change particle identity, the process can be understood as giving rise to momentum diffusion; when the exchanged particle is a quark and therefore changes quantum numbers, it is a conversion process. One can already isolate such processes at the leading order. Doing so will make it easier to see how to incorporate NLO processes.
In the diffusion case, the action of the soft gluon exchange is to randomize (diffuse) the momentum of the hard particles by small, O(g) amounts that can be described by a Fokker-Planck equation. The drag, longitudinal and transverse momentum diffusion coefficients appearing in the Fokker-Planck equation can be defined field-theoretically in terms of Wilson-line operators supported on light fronts, which can in turn be evaluated in analytical form using the light-cone techniques mentioned in the introduction. In the conversion case, the soft quark exchange converts a hard quark (gluon) into a gluon (quark) of the same momentum, up to O(gT ). Again, a light-front Wilson line definition for the conversion rate was introduced in [38], leading to a simple closed form expression. At NLO, the diffusion and conversion rates receive O(g) corrections, which were computed in [34,35,38]. These, together with corrections to the collinear 1 ↔ 2 rate and a new, semi-collinear process (which only contributes starting from NLO), constitute the entirety of the NLO corrections to the EKT in the "dilute-hard" approximation appropriate for energy loss.
For computing the transport coefficients we will first show (again) how the effect of soft-gluon exchange can be reorganized into a Fokker-Planck equation in Section 3.1. However, in order to conserve energy and momentum, the Fokker-Planck equation must be supplemented by gain terms, which describe precisely how the momentum lost by a parton in the bath is redistributed. This redistribution of energy and momentum is unimportant for determining the energy loss, but plays an essential role in determining the transport coefficients. The computation of these gain terms is not amenable to an evaluation using light-cone techniques since more than one light-like particle is involved, and therefore computing the gain terms constitutes a major obstacle to computing transport coefficients at NLO. We will use the LO analysis in this section to motivate a NLO Ansatz for the gain terms in Section 4. The treatment of soft fermion exchange and conversions is analogous and will be discussed in Section 3.2.

Soft gluon exchange
We will now analyze soft gluon exchange shown in Fig. 2. Intuitively, the effect of soft gluon exchange on the evolution of the system can be summarized by a Fokker-Planck equation. Anticipating the results of this section, the Fokker-Planck collision kernel can be written records the momentum diffusion parallel and perpendicular to the particle's momentum throughq L andq respectively. The gain terms are necessary to conserve energy an momentum, and record how the energy lost by a parton with momentum k is redistributed to particles with momentum p. The gain terms will take the following form: Here the angular function C ij ab (p ·k) determinesq ij and its explicit form given in Eq. (3.11). It is easily verified that energy and momentum are conserved under the time evolution (∂ t + v p ∂ x )f (p) = −(Cf 1,diff )(p). A simulation and discussion of a similar Fokker-Planck equation (with the gain terms) is given in [64]. Now we will derive these equations by analyzing the 2 ↔ 2 collision integral with soft gluon exchange recorded in Eq. (2.21) and illustrated Fig. 2. The relevant processes are gg ↔ gg, q 1 q 2 ↔ q 1 q 2 (and similar ones where q 1 and/or q 2 are replaced by their antiquarks), q 1 q 1 ↔ q 1 q 1 andq 1q1 ↔q 1q1 , q 1q1 ↔ q 1q1 and finally q 1 g ↔ q 1 g andq 1 g ↔ q 1 g. The LO contribution from soft gluon exchange is obtained by expanding Eq. (2.21) for ω, q ∼ gT , p, k ∼ T , where Q = (ω, q) = P − P is the momentum exchange shown in Fig. 2. In more detail, the phase space integration in Eq. (2.21) is approximated by (see Appendix B.1) where v µ p = (1,p) and v µ k = (1,k) are light-like vectors in the direction ofp andk. The t-channel matrix element in the soft approximation reads [13,40] |M ab ab | 2 is the retarded, HTL-resummed propagator [55,56] in Coulomb gauge (see App. A). For process with identical particles in the initial or final state, the u-channel exchange is equivalent in the soft limit. Finally, in a soft (or diffusive) expansion we may approximate the departures from equilibrium appearing in Eq. (2.21) (3.7) With these approximations the 2 ↔ 2 collision operator in a diffusive approximation takes the form 9) and the gain terms take the form 10) where the angular function is andq ij a is given by Eq. At a technical level, the loss terms arise when the deviations from equilibrium are on the same side of the gluon exchange diagram, and their contribution to the quadratic functional therefore involves (f 1 (p) − f 1 (p )) 2 ∼ q 2 (f 1 (p)) 2 . This is illustrated by the black dots in Fig. 2 (left). The gain terms describe the correlation between the momenta across the exchange diagram (illustrated by the dots in Fig. 2 (right)), and the quadratic functional . From the point of view of the Fokker-Planck equations this term gives rise to a gain term. But from the point of view of the original Boltzmann equation, this is a cross-correlation between the departures from equilibrium of the two particles. Therefore we will refer to these contributions both a gain terms and as cross terms, depending on the context. Examining the expression for (f 1 , C 2↔2 diff f 1 ) loss , we see that it involves one light-like vector, v p . Indeed, the expression forq ij can be rewritten as the Wightman correlator of soft thermal gauge fields along this light-like direction. Using the causality and KMS properties of such light-like correlators [34], these soft contributions toq andq L can be evaluated in closed form [37,38,63], Here gT µ ⊥ T is a cutoff on the q ⊥ ≡ q 2 − ω 2 integration separating the soft from the hard scale 2 . The dependence on this cutoff cancels against the region where ω, q > ∼ T , where the bare matrix elements can be used to evaluate the hard contribution to C 2↔2 . The simple form ofq andq L is a consequence of the fact that light-like separated points are effectively causally disconnected as far as the soft gauge fields are concerned. Using the explicit form of the angular dependence of f a 1 (p) = β 2 χ i···j (p)X i···j and Eq. (2.17), straightforward analysis shows that the loss term reduces to which is the most useful form for evaluating the transport coefficients numerically. The gain terms (Eq. (3.10)) intrinsically involve two light-like momenta v p and v k associated with f 1 (p) and f 1 (k). The points on these two light-like rays are causally connected by soft gauge fields, thus the analyticity techniques used forq cannot be expected to work. All attempts to extend these techniques to two light-like rays have met with frustration, and C ij ab (p ·k) and its moments must be computed numerically. To this end, using Eq. (2.17) and the angular dependence of f 1 (p) = β 2 χ(p) I i···j (p) X i···j , we may 2 We have performed the change of integration variables rewrite the gain terms as where are coefficients which must be evaluated numerically. The complicated weights involving P (p·k) multiplying the matrix elements reflect the angular structure of the collision kernel. When computing the diffusion coefficient ( = 1), the gluon-mediated gain terms described by Eq. (3.14) actually vanish. This is because the gluons carry no charge and quarks and antiquarks have opposite charges, so that χ g (p) = 0, χ q (p) = −χq(p). Thus, the only processes that can give rise to gluon-mediated gain terms are qq ↔ qq,qq ↔qq and qq ↔ qq. However, due to their opposite signs, the quark and antiquark contributions end up canceling in the sum of these processes in Eq. (3.14) (see [64] for further discussion).
When computing the shear viscosity ( = 2), these integrals are convergent, and (c 1 , c 2 , c 3 ) may be evaluated directly (see Appendix B.2 for further details). The UV finiteness of the gain terms was discussed previously in [12,64] where it was noted that in a leading-log approximation (where the HTL propagator in Eq. (3.15) is replaced with the bare propagator) the coefficients c 1 , c 2 , c 3 vanish for ≥ 2.
As discussed in Section 4, we expect that the functional form of the gain terms in Eq. (3.14) will remain valid at NLO but the coefficients c 1 , c 2 and c 3 will be modified by order g corrections. We will only be able to estimate these modifications and their associated (numerically small) contributions to the NLO shear viscosity.

Soft quark exchange
We will now analyze soft fermion exchange shown in Fig. 3, which parallels the soft gluon exchange described in the previous section. In this case, a hard quark with momentum p is converted into a hard gluon with approximately the same momentum through the soft fermion exchange. (The reverse process is also possible, and the set of matrix elements involved in this process are qq ↔ gg, qg ↔ qg, andqg ↔qg.) Q Q Figure 3. Diagrammatic representation of the conversion and gain terms in soft quark exchange processes. The graphical notation is the same as in Fig. 2 and the intermediate quark is soft (single line). The diagram on the left is a conversion term, entering in (f q 1 (p) − f g 1 (p)) 2 , whereas the one on the right is a gain term.
The dynamics of the conversion process are summarized by a set of rate equations [38,64] The conversion rates Γ conv q→g (p) at leading and next-to-leading order are given by Eq. (3.29) and Eq. (4.3) respectively [38], and the gluon conversion rate is The gain term is necessary to conserve baryon number under time evolution. Indeed, the gain term records how the baryon charge associated with conversion of a quark of momentum k to a gluon is balanced by an increase of quarks (or decrease of anti-quarks) of momentum p. We will show that the gain term takes the form where C conv q→g (p·k) (which is given explicitly below in Eq. (3.28)) is a squared matrix element which specifies the angular structure of the conversion process. The angular average of C conv q→g (p ·k) determines the conversion coefficient Γ conv q→g , It is straightforward to show that with the gain and loss terms the total baryon number is conserved under the evolution specified by Eq. (3.16).
To derive these results we return to the 2 ↔ 2 collision integrals with soft fermion exchange. The phase space integral and soft approximations are given in the previous section, Eq. (3.5). The relevant processes are Compton scattering and pair annihilation, qg ↔ gq,qg ↔ qg, and gg ↔ qq. The HTL-resummed matrix elements are [13,40] M qq where S R (Q) is the retarded HTL-resummed quark propagator (see App. A). At leading order in the soft approximation the two become equal: Tr Neglecting the small momentum exchange in evaluating the statistical functions, the contributions to the quadratic functional from these two processes are, for each light flavor 3 The quadratic functional for the conversion process is obtained by adding the Compton and pair annihilation contributions, and sorting the terms into direct (e.g. ). Minor manipulations lead to the final form of the conversion functional 4 3 Both processes occur four times for each light fermion flavor in the sum over species abcd . The pair annihilation process receives an extra factor of 2 (which we place just in front of |M| 2 ) to account for soft u-channel exchange [12]. 4 These manipulations include employing the identity f q 0 (p) , symmetrizing the integrand over p, k, and using the definition νq ≡ 2dF .
Here the loss part stems from the direct terms while the gain part stems from the cross terms The conversion coefficient Γ conv q→g is given by Eq. (3.19), and the conversion kernel C conv q→g (p·k) is given by Varying the conversion functional according to Eq. (2.15) yields the kinetic equations given by Eq. (3.16). At a technical level, the loss terms arises when the deviations from equilibrium are on the same side of the fermion exchange diagram, (f q 1 (p)−f g 1 (p)) 2 , as illustrated by the black dots on Fig. 3 (left). The gain term, which records the correlations between the scattered particles, arises through an exchange of quantum numbers across the fermion exchange diagram, Fig. 3 (right).
Examining the expression for (f 1 , C 2↔2 conv f 1 ) loss , we see it involves one light-like vector v µ p . Indeed, the conversion coefficient, Γ conv q→g , can be rewritten as a light-like Wightman correlator of soft fermion fields [37,38]. As shown in [38], this correlator can also be evaluated in closed form using light-cone techniques (see App. D.2 in [38]), yielding As in the previous section, the dependence on the cutoff µ ⊥ cancels against the hard region, ω, q ∼ T , where bare matrix elements may be used. In practice, for the shear viscosity ( = 2) we solve for the fermion sum (f q 1 + fq 1 )/2 and set the fermion difference (f q 1 − fq 1 )/2 to zero, while for the diffusion coefficient ( = 1) we solve for the difference and set the sum to zero. Thus, the fermion gain term only enters when calculating the diffusion coefficient. For the numerical evaluation of the loss term, we substitute the angular form f a 1 (p) = β 2 χ a i···j (p)X i···j into the quadratic functional (Eq. (3.26)), use Eq. (2.17), and sum over flavors to find For the gain terms (which are only relevant for = 1), we substitute f a 1 (p) = β 2 χ a i (p)X i into Eq. (3.27) and find Similarly to the momentum diffusion case, the gain coefficient must be evaluated numerically as worked out in Appendix B.2. At NLO we expect the form of the quadratic functional (Eq. (3.30) and Eq. (3.31)) to remain valid, but we have been unable to evaluate the gain coefficient c 1 beyond leading order. We will estimate the NLO modifications of this coefficient in the next section, and evaluate its (numerically small) contribution to the NLO diffusion coefficient.

Diffusion and identity in collinear processes
Consider the collinear process introduced in Eq. (2.22). Although it is unnecessary to do so in a leading-order calculation, one can interpret the k p and (p − k) p parts of the integration in Eq. (2.22) as representing diffusion and identity-changing processes respectively for the case q → qg, as each representing identity changing processes for the case g → qq, and as each representing diffusion processes for the case g → gg. Specifically, for the case of q → qg, one can estimate Eq. (2.23) with Eq. (2.24)-Eq. (2.26) for k p or (p − k) p as [38] (see App. C.1 for details) Therefore the small (p − k) ≡ p region of the integration in Eq. (2.22) is parametrically of form piece represents an identity-changing process. There is also a gain term due to f q 1 (p ) in the = 1 case (see the considerations on the IR behavior of f 1 in App. C.1). So the small p region of the integral can be understood as identity-change. However it is not necessary to do so, since there is no dp /p enhancement of this region, so p T gives rise to a suppressed contribution.
Similarly, for k p, the integral is approximated by (3.35) We can approximate (f q 2 , canceling the f 0 (k)/k ∼ T /k 2 behavior to give a nonzero contribution to the longitudinal diffusion coefficient. But again, because the integration is then only 0 dk, the k T region is suppressed. In particular, if we take k (or p ) to be O(gT ), in each case we find a contribution which is O(g) suppressed. Therefore we do not technically need to consider these regions as identity change or longitudinal momentum diffusion in a leading-order treatment. But it will be important in an NLO investigation that these limiting regions can be described in this way.

NLO corrections
Here we show how to incorporate next-to-leading order corrections into the leading-order treatment discussed in the previous section. We begin by showing how to do so in a strict expansion in g. Then we show the problem with this method; the resulting collision integral is not manifestly positive. Arnold, Moore, and Yaffe already encountered this problem in their leading-order treatment [13], which they then avoided by not using momentum cutoffs, instead applying screening corrections at all momentum transfer scales. This led to a manifestly positive collision operator which agreed to O(g) corrections with the strict leading-order form when g is held small. We show how to make a similar treatment of the O(g) corrections, which leads to a stable numerical treatment.

Strict NLO treatment
In the last section we saw how to reorganize the leading-order treatment of Arnold Moore and Yaffe [13] into a contribution from generic momenta without screening, cut off at a transverse scale µ ⊥ , and effective diffusion and identity changing processes. The scale µ ⊥ cancels when summing the two contributions, providing that we choose this scale to be sufficiently small. This leads to a self-consistent definition of the leading-order scattering operator. Furthermore, under this definition the linearized collision operator C is structured strictly as a g 4 object times a log plus constant, and therefore contains no formally subleading in g content. Our goal in this subsection is to extend this treatment, capturing all O(g) corrections.
The only way O(g) corrections can arise is if the physics of q ∼ gT degrees of freedom features in a calculation. These are highly occupied, and loop corrections are of order g 2 f 0 (q) ∼ g when bosons propagate at this energy scale. Furthermore, the HTL effects which are essential at this momentum receive the first non-HTL corrections at O(g).
Among 2 ↔ 2 processes, the gT scale appears only when the exchange momentum becomes small -in which case the process degenerates into a diffusion or identity change process -or when an external particle becomes soft, p ∼ O(gT ). In the latter case the other states are nearly collinear, and this possibility will be part of what we call semi-collinear processes below. Among 1 ↔ 2 processes, the gT scale appears in the transverse exchange momentum q ⊥ and the screening mass m ∞ appearing in Eq. (2.24) and Eq. (2.26). Each will receive an O(g) correction [35]. Furthermore, our treatment in Eq. (2.22) involved a collinear approximation which breaks down when one of the splitting daughters becomes soft, k ∼ gT , p−k ∼ gT , or when the transverse momentum becomes larger, h ∼ √ gT 2 .
We already showed that the case of a soft splitting daughter can be treated as a correction to the diffusion and identity change rates. The large-h region is what we call semi-collinear processes. We showed in [38] how to handle each sort of O(g) correction, except for the gain terms which we discussed above. In summary, to perform an almost-NLO treatment (in the sense of only missing these gain terms), we include the following: • We shift the transverse momentum diffusion coefficientq a by [34] • We shift the longitudinal momentum diffusion coefficientq a L by [38] δq a L = − where µ NLO ⊥ is a new separation scale between NLO soft and hard (semi-collinear) processes.
• We correct the conversion process rate Γ conv q→g to [38] δΓ conv • We correct collinear 1 ↔ 2 processes via the incorporation of O(g) corrections tō C(q ⊥ ) and m ∞ , appearing in Eq. (2.25) and Eq. (2.26). The procedure is to modify the splitting rate γ a bc (p; p − k, k) precisely as is described in Appendix E of Ref. [38]: • We include corrections to the collinear approximation in C 1↔2 by incorporating the first non-collinear corrections. We postpone the details to subsection 4.3. The short version is that, in Eq. (2.24), we have made approximations which only hold when h is sufficiently small, h ∼ gT 2 . When it becomes larger, h ∼ T 2 √ g, the approximations break down and we must be more careful. In treating this region we need an IR cutoff on h, which exactly compensates the (UV) cutoff µ NLO ⊥ we need for the longitudinal momentum diffusion and identity change processes at NLO.
Adding these contributions to the strict leading-order contributions of the previous section produces a collision operator which is fully NLO except for NLO contributions to the gain terms. Furthermore, it again exists strictly as an O(g 4 ) piece and an O(g 5 ) piece, each containing logs of the coupling, but with no formally higher order content.

Problem with strict order-by-order
Except for quite small coupling, the approach of the last subsection fails in practice. We already see why by considering its application at leading order. How small does the separation scale µ ⊥ need to be to find a result which is µ ⊥ -independent? The answer is that we need µ ⊥ T , since T is the natural scale for f 0 (p) and χ(p) to vary. However, once µ ⊥ < m D , the momentum diffusion and identity change contributions of Eq. (3.12) and Eq. (3.29) become negative. But whenq andq L are negative, the collision operator is not strictly positive. Within a finite basis of relatively smooth functions, this nonpositivity may not manifest itself, if the strictly positive contributions from C 2↔2 and C 1↔2 are large enough. But as we consider functions with large p-derivatives, the importance ofq L grows relative to other terms. So too doesq for functions which peak at very small p. So for a sufficiently large basis of test functions in Eq. (2.27), the collision operator will operate nonpositively within our Ansatz subspace.
This problem was already recognized in Ref. [13]. The solution there was to abandon the strict leading-order methodology. Rather than introducing a separation scale and replacing the screened IR piece with a differential operator, they incorporated screening corrections into the scattering matrix elements responsible for diffusion and number change, at finite momentum exchange. At weak coupling this procedure is equivalent to the strict leading-order treatment up to corrections which begin at O(g), and which are dependent on the exact methodology used for incorporating the screening corrections (see [13] section IIIB and Figure 4). Here we will adopt the precise prescription detailed in Appendix A of the reference.
We can certainly use this methodology for the leading-order collision term C 2↔2 . However, we must check whether the resulting leading-order collision operators then already incorporate formally O(g) subleading corrections, and if so, we must make a subtraction to avoid a double counting.
To see how each approach works in practice, and to illustrate how nonpositivity arises in the strict case and O(g) corrections arise in the AMY procedure, we will delve a little into the details of the soft region at leading order. The most convenient choice of phase space integration variables for evaluating the leading-order 2 ↔ 2 process in the t channel (suppressing particle-species labels and an overall factor of 1/(2 8 π 5 )) is [13] (see also App. B.1 for details) (4.5) Here (p, k) are the incoming particle energies, (p = p+ω, k = k −ω) are the outgoing energies, and we frequently reorganize the first two integrals, 0 dq q −q dω = dω 0 dq ⊥ (q ⊥ /q), with q 2 = q 2 ⊥ + ω 2 . Reducing the final line to a scalar expression requires evaluating the angles between the momenta p, k, p , k ; the relevant angles are listed in Eq. (A21) of Ref. [13].
For small q ⊥ one must modify the matrix element to reflect (HTL) screening effects. The strict leading-order procedure is to introduce an intermediate scale µ ⊥ . Above this scale we neglect changes to the matrix element. Below this scale, we systematically expand in (ω, q ⊥ ) (p, k) to obtain the diffusion and conversion expressions of the previous section. This affects the integration limits, with the (p, k) integrals extending to 0, and it affects the matrix element and the departures from equilibrium, which can be expanded in gradients for gluon exchange or replaced with their p = p , k = k limits for quark exchange: where the last square bracket means either the last line of Eq. (3.13) and Eq. (3.14) or of Eq. (3.30) and Eq. (3.31). In the second and third lines the q ⊥ , ω integrals and p, k integrals factorize separately. The ω, q ⊥ integrals are computed using sum rules, giving rise to a logarithm ln(µ ⊥ /m D ). In practice when m D is not small this is where lack of positivity enters.
On the other hand, the AMY procedure [13] is to retain the integration measure and distribution functions of Eq. (4.5), and to replace the tree level |M| 2 with an HTL form, detailed at some length in the reference, at all q, ω values. The p, k integration limits are not changed and the statistical functions are not simplified. That is, (4.7) To identify O(g) differences between these procedures, we must understand where the differences occur when g is genuinely small (so m D /T 1). For generic p, k ∼ T , the regime q ⊥ ∼ T differs by an O(g 2 ) amount, because M AMY = M free + O(g 2 ) in this regime. For q ⊥ ∼ gT , we have M AMY = M HTL + O(g 2 ) and the small-ω, q approximations to statistical functions and angles are O(g 2 ) after we symmetrize over positive and negative ω.
On the other hand, there is the region where one external particle becomes soft and the exchanged four-momentum is soft. (The region where both external particles are soft is highly suppressed.) When k ∼ gT , the two treatments differ in several respects. Clearly the phase space treatment is different; Eq. (4.6) integrates to k = 0, while Eq. (4.5) integrates to 2k = q+ω, an important difference if k ∼ gT . Also the approximations to the statistical functions and matrix element are no longer reliable. Therefore the two approaches have O(1) differences in this region. Provided that k is a gluon, this region is only suppressed by O(g). Therefore this region represents a source of O(g) differences between the strict and AMY methodologies.
Fortunately, when p is hard but k is soft, the process is well described by either momentum diffusion or identity change from the point of view of the high-energy particle. Therefore the treatments differ at O(g), but only because of the region where one external particle is soft, and this region can be captured in terms of diffusion and identity change processes.
In a leading-order calculation we are free to use either approach. The AMY approach is preferred because it gives a positive collision operator. In an NLO treatment, we have calculated the NLO corrections to diffusion and identity changing processes assuming that the strict leading-order treatment is to be used. If we use instead the AMY approach, as we do, we need a calculation at O(g) of the difference between the two leading-order approaches, written in terms of diffusion and identity-change rates. These can then be included as "counterterms" in our NLO treatment. We compute these counterterms in detail in Appendix B.3.

Semi-collinear contributions and reorganization
Returning to the NLO corrections we introduced in Eq. This scale must again be chosen in such a way that its influence is small. Furthermore, even at small g, the δq L and δΓ conv corrections and the semi-collinear ones tend to represent a negative contribution to the NLO collision operator, as the proper O(g) evaluation of these regions is smaller than the naive one included at LO in e.g. Eq. (3.34) and Eq. (3.35). We will thus lose positivity of the collision operator when we incorporate these corrections at not-so-small values of g. So again we need to find a reorganization which reproduces these contributions in the sense of a strict NLO expansion in g. To do this we need to return to the semi-collinear process in more detail.  The relevant kinematics are summarized in Figure 4. Collinear processes correspond to a particle making a large change in energy but a small change in transverse momentum. Elastic scattering is a large change in both -or, for soft processes, a small change in both. The semi-collinear region is where the exchanged transverse momentum is intermediate between these two cases. Therefore it requires subtractions from each. It also requires a subtraction of its soft-exchange tail. We implement this as a cutoff on q ⊥ at the scale µ NLO ⊥ , but physically one could also see this as a way to cut off small energy (really q + ) exchanges.
To understand this region better, consider Eq. (2.23) and Eq. (2.24). In deriving the equation we assumed that p, k ∼ T and h ∼ gT 2 . This allowed a collinear expansion; in particular we could equate q = ω in exchange processes (q − = 0). But this breaks down as we consider larger h values. Fortunately in this regime there is a new simplifying approximation; the integral equation, Eq. (2.24), can be solved iteratively in large δE: This is the same as treating the emission in the Bethe-Heitler limit, ignoring LPM corrections. We will make this approximation in the following. We can also assume that the h 2 term dominates in the expression for δE, Eq. (2.26), so δE(h) = h 2 /(2pk(p−k)). However, as noted above, it is no longer sufficient to neglect q − relative to q ⊥ , because q − = δE ∼ gT ∼ q ⊥ . Therefore the kinematics of scattering is changed andC(q ⊥ ) must be recomputed. A more accurate form forC(q ⊥ ) in this regime, replacing Eq. (2.25), is [35,38] 5C .

(4.9)
Physically this arises from two types of processes, in which the splitting is either induced by an elastic scattering or by the absorption of a soft on-shell particle, see Figure 5. Therefore we need to make two subtractions, corresponding to the already-computed LO 1 ↔ 2 contribution (the small δE limit), and the already-included LO 2 ↔ 2 contribution (the small m D limit) [35]: .

(4.10)
The second term is the LO collinear form forC (the small δE limit of Eq. (4.9)). The other subtraction, of the m D → 0 limit, precisely removed the second term appearing in Eq. (4.9). The semi-collinear contribution is found by substituting Eq. (4.10) into Eq. (4.8) and using it to evaluate Eq. (2.23) and hence Eq. (2.22). But one further simplification can be made. For generic p, k ∼ T , we have δE ∼ h 2 /T 3 . The two terms in Eq. (4.10) cancel up to small corrections unless δE ∼ m D ∼ gT , which then requires the semi-collinear value h ∼ T 2 √ g. On the other hand, q ⊥ ∼ gT ; for larger values, q 2 ⊥ δE 2 and the two terms again cancel. Therefore, we can make a systematic expansion in pq ⊥ h in Eq. (4.8). And to get a strict NLO result, we also need to make such an expansion. Averaging over directions for q ⊥ , we have which we combine with the definition (see footnote 5) When this result is inserted into Eq. (2.22) it leads to logarithmic small k and (p−k) divergences unless we impose an IR cutoff on the allowed value of h, namely h ≥ pµ NLO ⊥ . The cutoff dependence cancels that in the NLO longitudinal momentum and identity change rates [38].
The problem with this procedure is the same as the problem with the strict LO rate. We need to insert a regulator scale which separates regions with finite k-momentum exchange from regions which are treated diffusively. Neither side is necessarily positive and the collision operator can have serious positivity problems when the coupling is not small. This necessitates a rewriting of the NLO contributions along the lines of the AMY method at LO. The problem arises because we made a strict h pq ⊥ expansion in Eq. (4.11). Without this approximation, that is, by using the full expression for δC, Eq. (4.10), in Eq. (4.8), we obtain instead the manifestly finite result γ a bc semi (4.14) with (C R − C A /2) appearing on the h + pq ⊥ term for g ↔ qq processes. This is then inserted into Eq. (2.22), resulting in (4.15) In the small g limit, this single expression reduces to the sum of the previous NLO semicollinear, δq L , and δΓ conv contributions, as we show at some length in Appendix C. The appendix also explains how this result is related to the light-cone sum rules. We conclude the illustration of this region with a remark. Currently, we treat the collinear region with Eq. (2.22) and the semi-collinear region, including a subtraction due to the collinear one, using Eq. (4.15). But we could combine them into a single calculation by adding δC(q ⊥ , δE) toC(q ⊥ ) in Eq. (2.24). In this way one would perform LPM resummation with the δE-dependent kernel and thus would no longer need to subtract the strictly collinear one. The contribution would also be manifestly positive. However, this would be extremely impractical. BecauseC(q ⊥ ) is quite simple, we can Fourier transform it analytically and solve Eq. (2.24) in impact-parameter space as a differential equation. But δC does not have a simple Fourier expression, so Eq. (2.24) would need to be solved as an integral equation in q ⊥ space, making a numerical solution very intricate. Treating the parts separately as we do, with the expansion Eq. (4.8) used for the semi-collinear but not the collinear case, does not lead to a manifestly positive result. But the sum nevertheless tends to remain positive, even for large m D /T , because δC(q ⊥ , δE) becomes smaller in that limit.

Estimate of NLO gain terms
We have now presented the NLO contributions except for possible gain terms, as explained in Section 3. Although we will not be able to compute these, we can at least estimate their size, which allows us to assign a systematic error budget for their exclusion. NLO effects arise from soft momenta and from corrections to collinear and semi-collinear physics. For the latter, a small momentum exchange induces a large change in the particle which splits, and since we capture all aspects of this large change, only the soft exchange partner is mistreated. This is a subleading effect. Therefore we only need concern ourselves about NLO gain terms due to momentum diffusion and identity change.
To get an estimate for their magnitude, we compute the soft contribution from gain terms in the LO calculation, where we know how to compute and include them. Then we estimate that the missing NLO gain terms are of order the same size, times a factor reflecting how much smaller NLO effects are relative to the leading order. In the = 2 case, where, as we've seen in the previous section, gain terms are only mediated by soft gluon exchange, we shall use where C =2 is a constant that we vary to incorporate our ignorance of the actual size (and sign!) of the NLO gain terms. We are thus making the Ansatz that the NLO corrections to the gain terms take the exact same form as at leading order, but rescaled by m D /T ∼ g times an arbitrary constant. Similarly, for the = 1 case, where only fermionic gain terms contribute, we assume We evaluate each expression, as given in Eqs. (3.14) and (3.31), in Appendix B.2. Our results will show that the impact of these gain terms is very modest.

Summary
We conclude this section by summarizing the form of the collisional part of the quadratic functional. At LO it is given by Eq. (2.20). At NLO we have where Here the first contribution is found by inserting δq a from Eq. (4.1) in place ofq a in Eq. (3.13). In the form most suited for numerical evaluation it reads The second term in Eq.

Results
As we have mentioned in Sec.
We have however chosen not to pursue this avenue because δC can be large and positive, and the above expression becomes negative for intermediate g values. Hence, we instead define the NLO transport coefficients as the expression before the =⇒ in Eq. (5.1), rather than the expression after the arrow. One way to think of this is that we are computing 1/η at NLO and then inverting it, similar to resumming self-energy insertions into a Dyson sum so they appear linearly in the denominator of the propagator. We will plot the ratio η/s using the leading-order, Stephan-Boltzmann value for the entropy density with N f massless quarks, since the first perturbative corrections to the partition function, from which all thermodynamic observables are obtained, are of order g 2 . This is self-consistent within our kinetic approach, and to do otherwise would be treating particles' contribution to the entropy and to the stress tensor differently and inconsistently.
We will start by presenting results in full QCD with fermions. Later on, in Sec. 5.2, we will also present results for the pure Yang-Mills theory and in Sec. 5.3 for QED. Accurate fits for the NLO results will be presented in App. D.

Results in full QCD
In Fig. 6 we show our results for the shear viscosity over entropy η/s and quark number diffusion D q as a function of m D /T for QCD with N f = 3 flavors. We plot the LO results [13] in solid blue, and for NLO we plot our result in solid green (for η/s) and red (for D q ). To estimate the uncertainty from the undetermined NLO gain terms we provide bands in the same color around these central C = 0 NLO values. The bands are obtained from taking C in the range [−2, 2]. This apparently arbitrary choice is motivated by comparing LO and NLO momentum diffusion rates; the NLOq to LOq ratio is δq a /(q a m D /T / ln(µ ⊥ /m D )), which we can read off from Eqs. (3.12) and (4.1); it ranges from ∼ 1 for N f = 6 to ∼ 2.2 in the pure Yang-Mills case. Therefore |C | = 2 appears to be a conservative value in estimating resulting errors. As we point out in App. B.2, we have made another conservative choice there in the evaluation of the gain terms. The uncertainty arising from the gain terms is smaller for D q than for the shear viscosity; in the former case we are dealing with the soft-fermion, = 1 term given by Eq. (4.17), whose LO value (Eqs. (3.31) and (B.8)) is numerically smaller than its = 2 counterpart (Eqs. (3.14) and (3.15)). In both cases the main difference between LO and NLO results arises from δq. This is reinforced by the dashed lines in Fig. 6, which shows results obtained from a collision operator containing, beyond leading order, only the NLO corrections toq: with the pertinent "counterterm" (f 1 , C 2↔2 O(g) finiteq f 1 ) given by Eq. (B.18). We see that this curve lies quite close to the (C = 0) full NLO result, indicating that other corrections are small or largely cancel each other. But the δq contribution is so large that it starts to overtake the leading-order collision operator before m D = 1T and represents a factor-5 modification for α s = 0.3. We present an accurate fit for the NLO curves In order to study more quantitatively the observed similar trend between the NLO η and D q , compared to their respective leading orders, we plot the NLO/LO ratios in Fig. 7, complete with gain uncertainty bands, as a function of m D /T for QCD with N f = 3. As the plot shows, the two central values fall within the uncertainty bands. Each transport coefficient is dominated by elastic scattering and in each case the ratio of δq to the leadingorder elastic effect is about the same; therefore the trend with coupling is very similar.
True vacuum renormalization effects will first arrive at NNLO (at O(g 2 )), so we do not yet see effects of coupling renormalization. This makes it difficult to use any internal consistency to set the scale in our calculations. Nevertheless, we are clearly very interested in plotting the temperature dependence of the LO and NLO transport coefficients, which requires picking a prescription for g(T ) and for the quark mass thresholds, with the understanding that the different choices might differ starting parametrically from NNLO. Various choices are commonly employed in the literature. One widely used prescription is to simply take the MS coupling at n loops, with threshold matching at n − 1 loops, and choose the renormalization scale µ to be a multiple of the Matsubara frequency 2πT (usually a set of values such as µ = {0.5, 1, 2, 4}πT is employed to estimate the scale setting uncertainty). Another choice is to use the "effective QCD coupling", introduced in [39] as the matching coefficient appearing in the dimensionally reduced effective theory EQCD (Electrostatic QCD, [65][66][67][68][69]). The two-loop expression for this matching coefficient, as computed in [39], is better suited to describe the coupling in settings where contributions from the soft scales play a major role, as the computation of the spatial string tension and comparison with lattice data in [39] display. Since the LO results are dominated by the logarithmically enhanced diffusion and conversion processes [13], which are very sensitive to the soft scale, and the NLO results are dominated by the large corrections toq, which are in turn determined from EQCD, we argue that the EQCD coupling is the best choice for these transport coefficients. Hence we will mostly use the EQCD coupling from [39] in our plots. We start by plotting the coupling itself, as shown in Fig. 8. where a quark contributes half as much (Stephan-Boltzmann) entropy as a massless quark, which we therefore pick as our criterion for the quark's decoupling temperature (for instance, the b quark decouples at T b ≈ 1.55 GeV under this choice). As we remark in App. E, the matching to the EQCD coupling cancels the leading renormalization point dependence, which is why the EQCD curves are nearly identical. Fig. 9 shows the LO (blue) and NLO results for η/s (green) and D q (red) with the effective EQCD coupling, set at the entropy-motivated prescription µ = 2.7T . At the quark mass thresholds we switch from describing a system with N f + 1 massless quark flavors to describing a system with N f massless flavors, leading to a discontinuity in the coupling, the entropy density and the transport coefficients, and therefore in each curve. Our treatment is insufficient near each threshold because we have not developed an η (or D q ) calculation which correctly treats massive quarks. We show the uncertainty bands corresponding to is shown by the shaded green and red bands respectively. This uncertainty estimate is described in Fig. 6.
the previous values for the arbitrary constant in Eq. (4.16): C = ±2. As the plot shows, in the η/s case the uncertainty band due to missing NLO (gain) contributions grows larger as N f increases with increasing temperature. This is because the LO gain term, which multiplies C =2 in Eq. (4.16), has terms proportional to N f and to N 2 f , as can be inferred from Eq. (3.14). We remark that, as expected from Fig. 6, the NLO results are much smaller than the leading order: at temperatures of the order of the QCD transition the NLO η/s is smaller by a factor of 5, which becomes a factor of two for T ∼ 1 TeV. 7 The gain uncertainty band, on the other hand, represents a +30%, −20% correction to the NLO result. In terms of the strong-coupling results [22,70,71], the NLO results for η/s (D q ) can get smaller than 2/(4π) (2/(2πT )) at the lowest temperatures, corresponding to couplings of the order of α s ∼ 0.35.
In Fig. 1 we analyzed another source of theoretical uncertainty, arising from a different scheme for the running coupling. Besides the LO and NLO results with the EQCD effective coupling, already presented in Fig. 9, we also show results obtained from the two-loop QCD MS coupling discussed above. As the plot shows, the LO and NLO uncertainty bands introduced by the different choices adopted for the renormalization scale are well separated (except at the lowest temperatures, where, as Fig. 8 shows, the MS coupling for µ = πT is O(1)). This is consistent with the expectation that the running coupling is an NNLO effect and should thus be smaller than NLO corrections. 7 We present these high-temperature results only to analyze the convergence of the perturbative series.
They do not apply to the early universe at these temperatures, where electroweak and leptonic degrees of freedom, absent from this calculation, would play a major role. In fact, for early universe applications, electroweak degrees of freedom will always play a dominant role.

Results in pure Yang-Mills
Pure Yang-Mills theory is only of interest for academic reasons. Nevertheless, since it is straightforward, and since most lattice results for the viscosity [25][26][27][28][29][30][31], as well as analytical studies [72], are actually for pure Yang-Mills theory and not full QCD, we will present results for this case. In Fig. 10 we show the η/s ratio in pure Yang Mills for N c = 3. The general trends are the same as in full QCD but, interestingly, the NLO/LO ratio is smaller as a function of m D /T than it is for full QCD. When examined in terms of g (see the upper scale in α s ) they are however similar. It is also worth noting that the absolute values for η/s are larger than for N f = 3. In Fig. 11 we plot the η/s ratio in N c = 3 Yang-Mills theory as a function of the temperature. The coupling is fixed as follows: • At a sufficiently high scale we impose the two-loop asymptotics α s (µ)/π = −8/(β 0 t)− 16β 1 ln(t)/(β 3 0 t 2 ), with t = ln(µ 2 /Λ 2 MS ), β i as given by Eq. (E.3) and Λ MS = 1.24T c [73], with T c the critical temperature.
• For the two-loop QCD MS coupling, this asymptotic value is then evolved down to lower scales using the two-loop β-function in Eq. (E.1). We present LO and NLO results as wide blue and green bands respectively, reflecting the renormalization scale uncertainty. Uncertainties arising from the Λ MS /T c ratio or from the two-loop 0.1 1 10 1 10 100 1000 truncation of the β-function should be smaller than the large bands arising from the variation of the renormalization scale.
• For the effective EQCD coupling we use Eq. (E.4) as before. The displayed darker blue (LO) and green (NLO) bands, for the the same µ interval as in the MS case, are much narrower, given that the dependence on µ is very small in the absence of the discontinuities at the quark mass thresholds.
In this case one observes again two non-overlapping bands for the LO and NLO shear viscosity. Due to the smaller values of the couplings 8 the shear viscosity over entropy density is larger than in full QCD in the transition region.

Results in QED
We have also obtained the shear viscosity for QED. In this theory the large NLOq contribution is absent, due to its non-abelian nature, and the coupling is small. We have taken α = 0.0072973525664 and one massless Dirac fermion, so as to describe an electronpositron-photon plasma at m µ T m e . In QED m 2 D = e 2 T 2 /3 and hence m D /T ≈ 0.17. (5.4) Hence, the NLO central value (C =2 = 0) corresponds to a 1.4% increase over the LO shear viscosity, while the upper and lower values correspond to a 2% and a 0.8% increase respectively. We can thus conclude that the abelian NLO corrections tend to decrease the collision operator and that in the case of QED perturbation theory works very well.

Conclusions
The main aim of this paper has been to compute the shear viscosity and quark diffusion coefficient of QCD at "almost" NLO in g. This involved partially resumming some O(g) effects in the leading-order treatment, and some O(g 2 ) effects in the NLO treatment, in order to maintain positivity of the collision operator. Also we invert the full C +δC (leading plus next-to-leading order) collision operator, rather than expanding in δC as suggested in Eq. (5.1). It also involved neglecting gain terms 9 which we could not compute at NLO, but which proved to be small at leading order. We have estimated the possible effects of these missing contributions and found they are likely quite small. From a technical standpoint, the most important result of this paper is the methodology introduced in Sec. 4.3, where we introduce a 1 ↔ 2 rate, Eq. (4.14), which smoothly extends into regions of soft or semi-collinear (less collinear) radiation, without the need for intermediate regulators. In this paper we have only needed to treat this new equation in the single-scattering (Bethe-Heitler) regime, but it would be interesting to try to solve it as an integral equation, thereby incorporating LPM interference when needed. We leave this, together with applications of this approach to thermalization or jet quenching, to future studies.
The qualitative trend observed for the shear viscosity and the light quark diffusion coefficients as a function of the coupling m D /T , both in QCD with three light fermions and in the pure gauge theory, is as follows (see Figs. 6, 7, 10): the NLO curves in green (η/s) or red (D q ) start to diverge significantly from the LO ones in blue for m D /T ∼ 0.5, with the NLO transport coefficients becoming as small as one fifth of the LO for values of m D /T corresponding to α s ∼ 0.3. Furthermore, the uncertainty band introduced by considering a rather large value for the gain terms at NLO only modify the NLO transport coefficients by 30% at most. In Figs. 1, 9 and 11, we plot instead the transport coefficients as functions of the temperature, which requires picking a prescription for the coupling as a function of the temperature and for the decoupling of heavy quarks. The LO and NLO curves do not overlap, even accounting for the uncertainties arising from the choice of the running prescription, renormalization scale and decoupling point. Therefore the limitations of perturbation theory are much more severe than simply the question of what to choose for the renormalization point. Indeed, even at temperatures of order one TeV, where perturbation theory would be expected to work well, the NLO transport coefficients are smaller than the LO value by about a factor of 2.
The dashed curves in Figs. 6 and 10 show that by far the dominant NLO effect is the large NLO correction toq, first derived in Ref. [34]. This should perhaps not be too surprising. The corrections to splitting rates are not small, but they tend to be compensated by the semi-collinear ones [35]. And as emphasized in [13], elastic scattering is the principal contributor to shear viscosity and number diffusion, with splitting processes amounting to 10-20% effects. Further, the NLO contributions toq represent new physical processes not included at leading order; the inclusion of additional soft emissions in the course of scattering and interference between different scattering processes. Unfortunately the Euclidean methods used to compute δq do not allow us to evaluate these contributions separately. In order to test a theory that is not sensitive to this large δq contribution, we examined QED in Sec. 5.3, finding that the remaining abelian NLO contributions are a percent-level correction.
One important question to be addressed is what should we make of a perturbative expansion that does not converge above m D /T ∼ 0.5, or equivalently below temperatures well above the TeV scale. Taken at face value, the results plotted in Sec. 5 would perhaps suggest a grim answer to this question. However, one could optimistically think that, if we were to correctly identify the physics responsible for these large corrections, and rearrange the perturbative expansion by resumming it, possibly in the form of an Effective Field Theory, then the outlook on convergence would be quite different; a redefined LO somehow incorporating most of the NLO corrections toq would not look so different from the dashed lines in Fig. 6, so that the deviation from the NLO in solid green/red would be much less pronounced. Of course, much action is needed to move this scenario from the realm of wishful thinking into physically motivated perturbative schemes. One possible direction would be to treat the problematic soft sector non-perturbatively. The mapping to the Euclidean 3D theory makes a lattice determination of the soft contribution to C(q ⊥ ) andq possible, with first results reported in [74]. Refinements of this measurement, together with calculations of the shift in the dispersion relation δm 2 ∞ and ofq(δE), seem within reach, due to their Euclidean nature. We also need a better understanding of how such Euclidean measurements can be systematically included into transport calculations within a rigorous Effective Field Theory framework. Other needed ingredients, such as the longitudinal momentum broadening, conversion rates and gain terms, on the other hand, cannot be mapped to the 3D Euclidean theory and cannot thus be currently determined on the lattice. Therefore we should view it as good news that these effects appear to be much smaller than δq. One might hope that, with enough nonperturbative Euclidean contributions, the perturbative approach might work down closer to experimentally realizable temperatures.

Acknowledgments
JG would like to thank Aleksi Kurkela, Marco Panero and Péter Petreczky for useful conversations. GM would like to acknowledge support by the Deutsche Forschungsgemeinschaft (DFG) through the grant CRC-TR 211 "Strong-interaction matter under extreme conditions." DT would like to acknowledge support by the U.S. Department of Energy through the grant DE-FG02-88ER40388.

A Hard Thermal Loop propagators
In the next appendices we will look at matrix elements with soft exchange momenta in more detail. Therefore we need to specify the hard thermal loops, which appear in the expressions for these soft, screened matrix elements. We start with the fermionic HTLs, which are most easily written in terms of components with positive and negative chiralityto-helicity ratio. The retarded fermion propagator reads where the upper (lower) sign refers to the positive (negative) chirality-to-helicity component. The projectors are h ± p ≡ (γ 0 ∓ γ ·p)/2. Here m 2 ∞ is the fermionic asymptotic mass squared, defined such that the large-momentum dispersion relation for helicity=chirality fermions is p 2 0 = p 2 +m 2 ∞ . We similarly define the asymptotic gluonic mass M 2 ∞ . At leading order, their values are where we have also shown the relations to the more commonly used Debye mass m D and quark "mass" m q . Gluons are described in the strict Coulomb gauge by (A.5)

B Gain terms and finite order-g subtractions in 2 to 2 processes
In this section we first provide some details on the phase space integration coordinates in Sec. B.1. We then evaluate numerically the gain terms at leading order in Sec. B.2. In Sec. B.3 we will instead address the O(g) contributions in the 2 ↔ 2 collision operator that need to be subtracted, i.e. (χ i···j , C 2↔2 O(g) finite χ i···j ) in Eq. (4.19).

B.1 Phase space
In Sec. 3.1 we provided the phase space integration in the soft approximation in Eq. (3.5).
We now set out to briefly justify that equation and provide more elements for the evaluations that will be performed in Sec. B.2 and B.3. One starts by eliminating a variable through the three-momentum δ-function, In the t channel, p can then be shifted to q = p − p and an extra ω integral is introduced, i.e.
where we have introduced the four-vector Q = (ω, q) and expanded the arguments of the δfunctions for ω, q ∼ gT p, k, recovering Eq. (3.5). Using the coordinate parameterization of [13], Eq. (B.2) becomes where φ is the angle between the p, q and k, q planes and in going from the first to the second line we have used the change of variables discussed in footnote 2, with q = ω 2 + q 2 ⊥ . For future convenience we recall that in these coordinates and in the soft approximationp

B.2 LO gain terms
Let us begin with the gluon exchange contribution at leading order. We recall that it only contributes for = 2. Starting from Eq. (3.14) and Eq.
where the q ⊥ integration has been extended to infinity, 10 and has been performed numerically, together with the ω integration. The other coefficients in Eq. The fermion exchange contribution only arises at = 1. Starting with Eq. (3.32) and using Eqs. (3.28) and (3.22), together with the same techniques as the gluonic case, we have where again the ω and q ⊥ integrations have been carried out numerically.

B.3 Order-g terms
In Eq. (4. 19) we have introduced (f 1 , C 2↔2 O(g) finite f 1 ) as the O(g) region of the 2 ↔ 2 processes that needs to be subtracted. As we have argued, both gluon and quark exchange processes contribute to it. Let us then write it in terms of χ i···j as The integral converges without q ⊥ cutoff because the result of the ω integration vanishes faster than 1/q 2 ⊥ . However, this happens due to cancellations; the absolute convergence of the ω integral is slower. Therefore the result is actually dependent on our integration choice (like many convergent but not absolutely convergent integrals). If we integrate µ 0 dq q −q dω then we also get a valid µ → ∞ limit but with a different answer. The difference between integration choices is the integral over a region lying between a sphere and the superscribed cylinder; for instance, for c1 and = 2, and performing the integrals from innermost to outermost, we find where we have used bare matrix elements for this UV integral. This is an ambiguity in the soft part of the leading-order gain term, which is canceled by a matching ambiguity in the hard part. The gain term found with the q ⊥ coordinates is larger, leading to a more conservative estimate for the gain-term uncertainty.
where the g and q labels stand for gluon and quark (and antiquark) exchange contributions. Let us then begin by evaluating the gluon exchange contribution. As we have stated in Sec. 4.2, we need to consider the region where ω and q and an external gluon line (p or k) are soft. Let us then take these assumptions in Eq. (4.5) and simplify accordingly: (B.10) We have already switched to ω, q ⊥ coordinates; µ ⊥ and µ k are cutoffs separating the soft and hard scales. The three soft integrations in q ⊥ , ω and k contribute to a factor of g 3 , the soft expansion of the Bose-Einstein distributions contributes a factor of 1/g 2 , which is compensated by the g 2 behavior of the matrix element squared (see Eq. (3.6)) and finally the departure from equilibrium contributes another g 2 , bringing the total to g 5 . The last line becomes where the (χ g (0) ) 2 arises due to the infrared nature of the = 2 departure from equilibrium, as illustrated in App. C.1. We have not included gain terms, where contributions proportional to χ(p)χ (0) or χ (p)χ (0) would arise. Since we do not know the NLO corrections to the gain terms, it makes little sense to subtract this contribution: in our current Ansatz, Eq. (4.16), it just amounts to picking a different arbitrary constant C =2 . Finally, as we argue in Sec. 4.2, the matrix element in this scaling can be obtained from App. A of [13]. It reads (B.12) which correctly reduces to Eq. (3.6) for k ω, q. For what concerns the symmetry factors, the gg ↔ gg process receives a factor of 2 from the identical u-channel contribution and a factor of 2 from the p ∼ gT , k ∼ T region. The qg ↔ qg process receives a factor of 4 from the initial and final state symmetries, so that the full contribution is where we have not used the "finite" label, as this equation contains also power-law UV divergences. Indeed, performing the k integral with µ k gT yields a linear-in-µ k divergent term plus a finite part, the latter responsible for the genuine, double-counted O(g) contribution. Keeping only the aforementioned finite contribution and dropping the odd-in-ω terms we have 14) The terms on the second and third line contribute to both = 1 and = 2 (and to any in general), whereas those on the final two lines, are specific for the = 2 case. We recall that in the diffusion case gluons are in equilibrium, so that χ g (p) = 0. This expression is moreover still not UV finite. Indeed, by using the bare propagators G ⊥ and performing the ω integrations we obtain As a consistency check, let us remark that the form of δq we have written in Eq. (4.1) includes the finite part only. In its evaluation [34], Caron-Huot found also a linearlydivergent part in µ ⊥ , which cancels against a corresponding term in the IR expansion of the hard gluon exchange at NLO. Including such a term, Eq. (4.1) turns into When plugging its UV-divergent part in Eq. (4.21) this agrees with the transverse diffusion part of Eq. (B.15). The calculation of δq L in [38], on the other hand, does not contain linear divergences in µ ⊥ , which also agrees with Eq. (B.15). 11 We can then subtract the bare, UV-divergent contribution Eq. (B.15) from (B.14). The resulting dω integrations do not seem doable by means of analyticity techniques. 12 Upon numerical integration 13 we obtain In Sec. 5 we needed the (χ(p)) 2 part of Eq. (B.17), i.e.
Let us now look at the fermion exchange processes, i.e. Compton scattering and qq annihilation. We start again from Eq. (4.5) and we need to expand for ω, q, p ∼ gT , k ∼ T . In both cases there will also be an equivalent contribution for p ∼ T , k ∼ gT . The deviation from equilibrium for Compton processes becomes (B.19) The annihilation case is equivalent. As we shall see in App. C.1, in the = 1 case the quark departure from equilibrium approaches a constant at LO in the IR, due to the action of the 1 ↔ 2 processes, while it vanishes for = 2, so that the (χ q (0)) 2 term needs to be considered only when computing quark number diffusion. We have also neglected gain terms of the 11 That calculation contains a linear divergence in an analogue of µ k . These are related to the discussion of App. C. 12 It is possible to do some manipulations so that some terms become amenable to analytical methods, but others remain non-analytical due to branch cuts on the imaginary axis starting at ω = ±iq ⊥ . 13 In this case there is no coordinate ambiguity, contrary to what was encountered in footnote 10.
form χ q (k)χ q (0). Given thep ·k-(and hence φ-) independence of that expression, we can directly compute the φ-averaged expansion of Eqs. (3.21) and (3.20), which is so that the O(g) contribution from soft p becomes, summing the Compton and annihilation contributions where we have included the factor of 8N f to account for the initial and final state symmetries, the antiquark contribution in the Compton case and the u-channel contribution in the annihilation case. In the = 2 case we have used the fact that χ q = χq, whereas in the = 1 case we have used the fact that χ g = 0 to sum the quark and antiquark contributions. We can perform the dp integration with a UV cutoff and discard linearly divergent terms as in the gluon exchange case. Keeping only the terms that are even in ω we get to The two-dimensional ω, q ⊥ integration is finite, as expected, since there would be nothing to absorb UV divergences otherwise, given that the O(g) correction to the conversion rates is free of linear UV divergences in the transverse integrals. 14 The numerical integration yields This was just the contribution from having p soft and k hard. The opposite case gives the same contribution, as can be inferred from the symmetries of the integrand, so that the total double-counted contribution amounts to 2 times Eq. (B.23), i.e.

C Equivalence of semi-collinear implementations
In subsection 4.3 we argued that the semi-collinear regions and NLO contributions to longitudinal diffusion and identity change could all be treated simultaneously by evaluating the semi-collinear corrections without approximating pq h and without IR regulation. Here we verify this claim. We also analyze in greater detail the IR form of the 1 ↔ 2 processes and its consequences on the departures from equilibrium in Sec. C.1.

C.1 IR limits
Let us start from examining the IR behavior of the the 1 ↔ 2 rate given by Eq. (2.23), which determines the IR tail of the departures from equilibrium. To do so, let us start from the single soft scattering (Bethe-Heitler) limit of the 1 ↔ 2 rate. It can be easily obtained by solving Eq. (2.24) by substitution, as shown in Eq. (4.8), under the assumption that δE is much larger than the effect of collisions. We then have where we remind that the g ↔ qq process has C F − C A /2 multiplying the second, rather than the first, term in square brackets. Let us first remark that for generic p, k, p − k ∼ T Eq.
where h ≡ h/p and the two transverse integrations are finite, as shown. When plugged in the relevant corners of Eq. (2.22), it turns it into where we have introduced a factor of 2 to accounts for the k ∼ gT and p − k ∼ gT corners in the g ↔ gg process and for the final state symmetry in the q ↔ qg process. We have furthermore assumed χ g (k → 0) = kχ g (0) . That is because, even though the k integration (with gT µ k T ) might seem finite, since the soft k expansion of the departures from equilibrium in Eq. (2.22) yields a factor of k 2 which compensates the 1/k in the rate and the 1/k from the Bose distribution, one should however recall that in writing the quadratic functional in the form of Eq. (2.22) we have performed a symmetrization by shifting some integrations, which is allowed only as long as these integrations are finite. If one were to work with (C 1↔2 χ ij ) a (p), entering in Eq. (2.11), in the soft gluon limit, i.e. (C 1↔2 χ ij ) g (p → 0), one would see a fixed point arising, enforcing i.e. giving rise to a boundary term which indeed forces a linear behavior for the gluonic departure from equilibrium in the IR (see also [64] (C.5) The analysis of (C 1↔2 χ i···j ) q (p → 0) then shows that in the = 2 case ∞ 0 dk k f q 0 (k)[1 + f g 0 (k)] χ g (k) + χ q (k) − 2χ q (0) = 0 , (C.6) so that a linear behavior is enforced for χ q (p → 0). In the = 1 case one has instead which enforces a constant behavior. Again, these constraints are approximately satisfied by the LO variational solution.
Let us now look at the semi-collinear implementation in Eq. (4.14). In the soft gluon radiation limit one has to replaceC(q ⊥ ) with δC(q ⊥ , δE) in Eq. (C.2), yielding 15 where the negative 1/k contribution arises from the subtracted collinear limit in δC(q ⊥ , δE). It would seem that, once Eq. (C.8) is plugged into the quadratic functional, it would generate a contribution opposite to Eq. (C.3), canceling it and removing the linear behavior for the ( = 2) infrared gluonic departure from equilibrium. However, Eq. (C.8) is valid when k ∼ g 2 T . It is easy to see from Eqs. (2.13) that the LO contribution comes from k ∼ T , χ a (p ∼ T ) ∼ T −1 /g 4 . The soft region (k < ∼ gT ) has a large phase-space suppression, so that, even accounting for the Bose enhancement of soft gluons and the linear or constant LO form of χ a (p → 0), it always contributes beyond NLO for all transport coefficients under consideration. Hence, we only need to know the functional form of the deviations from equilibrium no further down than T k gT , and one can show that the semicollinear implementation in Eq. (4.14) does not alter the linear behavior found at NLO in that region, so that we may keep the functional form given in Eq. (2.27) for the test functions.
For quarks one has instead which is equally valid only for k ∼ g 2 T . The behavior at the interface T k gT remains unaltered in this case too.

C.2 Equivalence
Let's look at Eq. (4.15). The leading-order contribution to it would naively come from the strictly collinear scaling, i.e. q ⊥ ∼ gT , h ∼ gT 2 , p, k, (p − k) ∼ T . This however implies that δE(h) ∼ g 2 T and that it can thus safely be dropped from the denominators in the collision kernel in Eq. (4.10), as we have argued in Sec. 4.3, i.e. δC(q ⊥ , δE) = g 2 T m 2 (C.10) which, when plugged into Eq. (4.14), makes it of order g 6 and hence beyond NLO.
At relative O(g), three regions contribute. These are 1. The diffusion region, where a final-state gluon becomes soft. There, assuming k is the gluon's momentum, p ∼ T , k ∼ gT and h/T ∼ q ⊥ ∼ gT .
2. The analogous conversion region, where a final-state quark (or antiquark) becomes soft. Assuming now p−k is the quark's momentum, the scaling is the same: p, k ∼ T , p − k ∼ gT and h/T ∼ q ⊥ ∼ gT .
g ↔ gg and q ↔ qg processes contribute to the diffusion region. Upon accounting for the k and p − k soft regions in the all-glue case and for the final state symmetry in the q ↔ qg χ g (0) in the = 2 case, it can be shown that the coefficient c 1 in Eq. (3.14) contains a term proportional toq L (just rewrite P (p ·k) as 1 + (P (p ·k) − 1) and compare with Eq. (3.4)). It reads When substitutingq L with δq L , as given by Eq. (4.2), and using Eq. (C.4), this can be brought into agreement with the terms proportional to to χ g (0) in Eq. (C.14).
In the conversion region the relevant processes are the q ↔ qg and g ↔ qq ones. In the latter one there is an identical contribution from k ∼ gT . It is easy to see that the resulting contribution is altogether similar to what we just found for the diffusion limit, yielding (C. 16) where we have relabeled k to be the soft quark's momentum and δE c (h ) ≡ (h 2 +m 2 ∞ )/(2k) is the conversion limit of Eq. (2.26). This then results in −χ q (0)] 2 , (C. 17) which agrees with the contribution that would arise form inserting δΓ conv , as given by Eq. (4.3), in Eq. (3.30). For what concerns the terms proportional to χ q (0) in the = 1 case, they match those obtained from Eq. (3.31) through Eq. (C.7): as in the case of Eq. (C.15), we can rewrite P 1 (p ·k) as 1 + (P 1 (p ·k) − 1) in Eq. (3.32). Finally, as we mentioned in Sec. 4.3 in the main text, in the semi-collinear region, where p, k, (p − k) ∼ T , q ⊥ ∼ gT and h ∼ √ gT 2 , Eq. (4.14) can be expanded back yielding, at O(g 5 ), Eq. (4.13). Subleading terms in the expansion contribute at higher orders. This completes the proof of equivalence of the two approaches. We conclude by commenting on the relation of this new approach with the contour sum rules used to obtain δq L and δΓ conv in [35,38]. Take for instance the computation of δq L described in App. F of [38]. There we used the analytical properties of the light-cone amplitudes to deform the contour of the k integration (it is called q + there). In doing so, we encounter poles in the q − variable (called k − there) that can be pinched or not, of the form 1/(k − − δE) (see for instance (F.6) in [38]). Upon deforming the integration contour and expanding for large, complex k (q+), the non-pinched poles contribute (see (F.7)) to a δ(k − ) + δE/(k − ) 2 structure (in the variables of [38]). The first term is responsible for the linear-in-µ k term on the first line of Eq. (C.12), whereas the second gives rise to the µ k independent term (recall that δE ∝ 1/q + in the variables of [38]). The same reasoning applies to the subtracted terms on the second line of Eq. (C.12). In obtaining Eq. (C. 12) we have essentially inverted the order of the q − (k − in [38]) and k (q + ) integrations. As a consequence, it is important to note that, between the new and old approaches, the rates themselves are different as functions of p and k (in the coordinates of this paper), it is only their integral, the collision operator, which agrees (at LO and NLO).

D Fits of the NLO results
In this section we will present fits that reproduce the NLO results by smoothly interpolating between the NLL behavior at small values of m D /T and the δq-dominated one at the opposite end. The former is given by [13]  where η 1 and D 1 are the leading-log coefficients [12] and µ * is the next-to-leading-log one [13].
To obtain the δq-dominated behavior at large m D /T we first briefly show that the collision operator composed uniquely by Eq. (4.21) can be inverted analytically. Let us define which is simply the limiting form of Eq. (2.16) under the assumption that at large enough values of m D /T it becomes dominated by Eq. (4.21). Figures 6 and 10 already provide a motivation for this assumption, which will be reinforced later on. The maximization of Q δq [χ] can be done by functional differentiation with respect to the χ g (p) and χ q (p) (see Eq. (2.15)), which, since the source term is linear in these (see Eq. (2.19)) and the collision operator is quadratic, leads to a simple solution: χ a (p) = 4p +1 T ( + 1)δq a , (D .3) where δq a , the O(g) correction toq, is given by Eq. (4.1). Upon plugging this in Eq. (D.2) and recalling that η = 2Q max /15, D q = 2Q max /(N c T 2 ), we obtain η δq = 16π 4 T 6 945 2d A δq g + 31 32 where the factor of 31/32 arises from the fermionic, rather than bosonic, integrations, similarly to the factor of 7/8 in the Stephan-Boltzmann contribution of fermions to the pressure.
With these ingredients we can obtain simple fits for the NLO curves of Figures 6 and 10 at C = 0. We fit the shear viscosity as where a, b, c and d are fit parameters. As one can see, for small m D /T the curve approaches the NLL approximation (D.1), while at large m D /T they approach Eq. (D.4). Using the latter, the numerical values for η 1 and µ * from [13] and fitting the parameters we obtain, for N c = 3 The fits are accurate to below 0.5% for m D /T < 5. We have tested that they remain below 2% up to m D /T < 10.
With the same philosophy we can fit the NLO curve for D q (Fig. 6) as The fit is accurate to 0.5% for m D /T < 4. We have tested that it remains below 4% up to m D /T < 10.