Systematics of quark/gluon tagging

By measuring the substructure of a jet, one can assign it a"quark"or"gluon"tag. In the eikonal (double-logarithmic) limit, quark/gluon discrimination is determined solely by the color factor of the initiating parton (C_F versus C_A). In this paper, we confront the challenges faced when going beyond this leading-order understanding, using both parton-shower generators and first-principles calculations to assess the impact of higher-order perturbative and nonperturbative physics. Working in the idealized context of electron-positron collisions, where one can define a proxy for quark and gluon jets based on the Lorentz structure of the production vertex, we find a fascinating interplay between perturbative shower effects and nonperturbative hadronization effects. Turning to proton-proton collisions, we highlight a core set of measurements that would constrain current uncertainties in quark/gluon tagging and improve the overall modeling of jets at the Large Hadron Collider.

In order to achieve robust quark/gluon tagging, though, one needs theoretical and experimental control over quark/gluon radiation patterns. At the level of eikonal partons, a hard quark radiates soft gluons proportional to its C F = 4/3 color factor while a hard gluon radiates soft gluons proportional to C A = 3, and quark/gluon tagging performance is simply a function of C A /C F . As we will see, quark/gluon discrimination performance is highly sensitive to perturbative effects beyond the eikonal limit, such as g → qq splittings and color coherence, as well as to nonperturbative effects such as color reconnection and hadronization. While these effects are modeled (to differing degrees) in parton-shower generators, they are relatively unconstrained by existing collider measurements, especially in the gluon channel.
The goal of this paper is to highlight these uncertainties in quark/gluon tagging, using both parton-shower generators and first-principles calculations. We start in the idealized context of electron-positron collisions, where one can study final-state quark/gluon radiation patterns in the absence of initial-state complications. Here, we find modest differences in the predicted distributions for quark/gluon discriminants, which then translate to large differences in the predicted quark/gluon separation power. Motivated by these uncertainties, we propose a set of LHC measurements that should help improve the modeling of jets in general and quark/gluon tagging in particular. A summary and outline of this paper follows.
A common misconception about quark/gluon tagging is that it is an intrinsically illdefined problem. Of course, quark and gluon partons carry color while jets are composed of color-singlet hadrons, so the labels "quark" and "gluon" are fundamentally ambiguous. But this is philosophically no different from the fact that a "jet" is fundamentally ambiguous and one must therefore always specify a concrete jet finding procedure. As discussed in Sec. 2, one can indeed create a well-defined quark/gluon tagging procedure based on unambiguous hadron-level measurements. In this way, even if what one means by "quark" or "gluon" is based on a naive or ambiguous concept (like Born-level cross sections or eikonal limits), quark/gluon discrimination is still a well-defined technique for enhancing desired signals over unwanted backgrounds.
In order to quantify quark/gluon discrimination power, there is a wide range of possible quark/gluon discriminants and a similarly large range of performance metrics, both discussed in Sec. 3. As a concrete set of discriminants, we consider the generalized angularities λ κ β [14] (see also [26][27][28][29]), with the notation to be explained in Sec. 3.1. We consider five different (κ, β) working points, which roughly map onto five variables in common use in the literature: (0, 0) (2, 0) (1, 0.5) (1, 1) (1, 2) multiplicity p D T LHA width mass (1.2) Here, multiplicity is the hadron multiplicity within the jet, p D T was defined in Refs. [11,12], LHA refers to the "Les Houches Angularity" (named after the workshop venue where this study was initiated [1]), width is closely related to jet broadening [30][31][32], and mass is closely related to jet thrust [33]. To quantify discrimination performance, we focus on classifier separation (a default output of the TMVA package [34]): , (1.3) where p q (p g ) is the probability distribution for λ in a generated quark jet (gluon jet) sample. This and other potential performance metrics are discussed in Sec. 3.2.
To gain a baseline analytic understanding, we use resummed calculations in Sec. 4 to provide a first-order approximation for quark/gluon radiation patterns. For κ = 1, the generalized angularities are infrared and collinear (IRC) safe, and therefore calculable in (resummed) perturbation theory. At leading-logarithmic (LL) accuracy, the IRC-safe angularities satisfy a property called Casimir scaling, and the resulting classifier separation ∆ is a universal function of C A /C F , independent of the value of β. At present, the distributions for generalized angularities are known to next-to-leading-logarithmic (NLL) accuracy [13,14]. Here, we include the resummation of the leading-color nonglobal logarithms [35], though we neglect the resummation of pure jet-radius logarithms [36], and soft single-logarithmic corrections proportional to powers of the jet radius. These NLL calculations are effectively at parton-level, so to obtain hadron-level distributions, we estimate the impact of nonperturbative effects using shape functions [37,38].
To gain a more realistic understanding with a full hadronization model, we use partonshower generators in Sec. 5 to predict quark/gluon separation power. In an idealized setup with e + e − collisions, we can use the following processes as proxies for quark and gluon jets: "quark jets" : e + e − → (γ/Z) * → uu, (1.4) "gluon jets" : e + e − → h * → gg, (1.5) where h is the Higgs boson. These processes are physically distinguishable by the quantum numbers of the associated color-singlet production operator, giving a way to define truth-level quarks and gluons labels without reference to the final state. 1 We compare seven different parton-shower generators both before hadronization ("parton level") and after hadronization ("hadron level"): • Pythia 8.215 [39], • Herwig++ 2.7.1 [40,41], 2 • Sherpa 2.2.1 [45], • Vincia 2.001 [46], • Deductor 1.0.2 [47] (with hadronization performed by Pythia 8.212), 3 1 Of course, the quantum numbers of the color singlet operator are not measurable event by event. The idea here is to have a fundamental definition of "quark" and "gluon" that does not reference QCD partons directly. 2 We use the default angular-ordered shower for these studies. Subsequent to this paper, Ref. [42] performed a study to improve quark/gluon modeling in Herwig 7.1 [43,44]. 3 Note that this Deductor plus Pythia combination has not yet been tuned to data.
To test other generators, the analysis code used for this study is available as a Rivet routine [51], which can be downloaded from https://github.com/gsoyez/lh2015-qg. As we will see, the differences between these generators arise from physics at the interface between perturbative showering and nonperturbative fragmentation. One might think that the largest differences between generators would appear for IRC-unsafe observables like multiplicity and p D T , where nonperturbative hadronization plays an important role. Surprisingly, comparably-sized differences are also seen for the IRC-safe angularities, indicating that these generators have different behavior even at the level of the perturbative final-state shower. In Sec. 5.2, we study these differences as a function of the collision energy Q, the jet radius R, and the strong coupling constant α s , showing that the generators have somewhat different discrimination trends. In Sec. 5.3, we compare the default parton shower configurations to physically-motivated changes, showing that modest changes to the shower/hadronization parameters can give rather large differences in quark/gluon separation power.
At the end of the day, most of the disagreement between generators is due to gluon radiation patterns. This is not so surprising, since most of these generators have been tuned to reproduce distributions from e + e − colliders, and quark (but less so gluon) radiation patterns are highly constrained by event shape measurements at LEP [52][53][54][55]. In Sec. 6, we suggest a possible analysis strategy at the LHC to specifically constrain gluon radiation patterns. At a hadron collider, the distinction between quark jets and gluon jets is rather subtle, since radiation patterns depend on color connections between the measured final-state jets and the unmeasured initial-state partons. That said, we find that one can already learn a lot from hadron-level measurements, without trying to isolate "pure" quark or gluon samples. In particular, we advocate measuring the generalized angularities on quark/gluon enriched samples: "quark enriched" : pp → Z + jet, (1.6) "gluon enriched" : pp → dijets, (1.7) where "enriched" means that the Born-level process contributing to these channels is dominated by the corresponding jet flavor. By making judicious kinematic cuts, we could further flavor-enrich these samples [56], though we will not pursue that in this paper for simplicity. We present our final recommendations and conclusions in Sec. 7. The main take home message from this study is that, contrary to the standard lore, the e + e − measurements currently used for tuning are insufficient to constrain uncertainties in the final state shower. There are alternative e + e − measurements, however, that can play an important role in constraining gluon radiation patterns. Ultimately, gluon-enriched measurements at the LHC will be crucial to achieve robust quark/gluon discrimination.
What is a Quark Jet?  2 What is a quark/gluon jet?
As part of the 2015 Les Houches workshop on "Physics at TeV Colliders" [1], an attempt was made to define exactly what is meant by a "quark jet" or "gluon jet" (see Fig. 1). Here are some suggested options for defining a quark jet, in (approximate) order from most ill-defined to most well-defined. Related statement can be made for gluon jets.
A quark jet is...
• A quark parton. This definition (incorrectly) assumes that there is a one-to-one map between a jet and its initiating parton. Because it neglects the important role of additional radiation in determining the structure of a jet, we immediately dismiss this definition.
• A Born-level quark parton. This definition at least acknowledges the importance of radiative corrections to jet production, but it leaves open the question of how exactly to define the underlying Born-level process from an observed final state.
(For one answer valid at the parton level, see flavored jet algorithms below.) • An initiating quark parton in a final state parton shower. We suspect that this is the definition most LHC experimentalists have in mind. This definition assumes that the parton-shower history is meaningful, though, which may not be the case beyond the strongly-ordered or LL approximations. Because the parton shower is semi-classical, this definition neglects the impact of genuinely quantum radiative corrections as well as nonperturbative hadronization.
• A maximum-p T quark parton within a jet in a final state parton shower. This definition uses the hardest parton within the active jet area encountered at any stage of the shower evolution, including the initial hard scattering process. This "maxp T " prescription is a variant on the initiating parton prescription above (see further discussion in Ref. [57]). It differs from the initiating parton by a calculable amount in a LL shower [36] and is based on the same (naive) assumption that the parton-shower history is meaningful.
• An eikonal line with baryon number 1/3 and carrying triplet color charge. This is another semi-classical definition that attempts to use a well-defined limit of QCD to define quarks in terms of light-like Wilson lines. Philosophically, this is similar to the parton-shower picture, with a similar concern about how to extrapolate this definition away from the strict eikonal limit.
• A parton-level jet object that has been quark-tagged using an IRC-safe flavored jet algorithm. This is the strategy adopted in Ref. [58]. While this definition neglects the impact of hadronization, it does allow for the calculation of quark jet cross sections at all perturbative orders, including quantum corrections.
The unifying theme in the above definitions is that they try to identify a quark as an object unto itself, without reference to the specific final state of interest. However, it is well-known that a "quark" in one process may not look like a "quark" in other process, due to color correlations with the rest of the event, especially the initial state in pp collisions. The next definition attempts to deal with the process dependence in defining quarks.
• A quark operator appearing in a hard matrix element in the context of a factorization theorem. This is similar to the attitude taken in Ref. [56]. In the context of a well-defined cross section measurement, one can (sometimes) go to a limit of phase space where the hard production of short-distance quarks and gluons factorizes from the subsequent long-distance fragmentation. This yields a nice (gauge-covariant) operator definition of a quark jet, which can be made precise for observables based on jet grooming [59,60]. That said, even if a factorization theorem does exist for the measurement of interest, this definition is potentially ambiguous beyond leading power.
The definition we adopt for this study is inspired by the idea that one should think about quark/gluon tagging in the context of a specific measurement, regardless of whether the observable in question has a rigorous factorization theorem.
• A phase space region (as defined by an unambiguous hadronic fiducial cross section measurement) that yields an enriched sample of quarks (as interpreted by some suitable, though fundamentally ambiguous, criterion). Here, the goal is to tag a phase space region as being quark-like, rather than try to determine a truth definition of a quark. This definition has the advantage of being explicitly tied to hadronic final states and to the discriminant variables of interest. The main challenge with this definition is how to determine the criterion that corresponds to successful quark enrichment. For that, we have to rely to some degree on the other less well-defined notions of what a quark jet is.
To better understand this last definition, consider a quark/gluon discriminant λ. Since λ can be measured on any jet, one can unambiguously determine the cross section dσ/dλ for any jet sample of interest. But measuring λ does not directly lead to the probability that the jet is a quark jet, nor to the probability distribution p q (λ) for λ within a quark jet sample. Rather, the process of measuring λ must be followed by a separate process of interpreting how the value of λ should be used as part of an analysis.
For example, the user could choose that small λ jets should be tagged as "quark-like" while large λ jets should be tagged as "gluon-like". Alternatively, the user might combine λ with other discriminant variables as part of a more sophisticated classification scheme.
The key point is that one first measures hadron-level discriminant variables on a final state of interest, and only later does one interpret exactly what those discriminants accomplish (which could be different depending on the physics goals of a specific analysis). Typically, one might use a Born-level or eikonal analysis to define which regions of phase space should be associated with "quarks" or "gluons", but even if these phase space regions are based on naive or ambiguous logic, λ itself is a well-defined discriminant variable.
In Sec. 5, we will consider the generalized angularities λ κ β as our discriminant variables and we will assess the degree to which the measured values of λ κ β agree with a quark/gluon interpretation based on Born-level production modes. This is clearly an idealization, though one that makes some sense in the context of e + e − collisions, since truth-level "quark" and "gluon" labels can be defined by the Lorentz structure of the production vertex. In Sec. 6, we will recommend that the LHC experiments perform measurements of λ κ β in well-defined hadron-level final states, without necessarily attempting to determine separate p q (λ κ β ) and p g (λ κ β ) distributions. Eventually, one would want to use these hadron-level measurements to infer something about parton-level quark/gluon radiation patterns. Even without that interpretation step, though, direct measurements of dσ/dλ κ β would provide valuable information for parton-shower tuning. This in turn would help λ κ β become a more robust and powerful discriminant in searches for new physics beyond the standard model.

Generalized angularities
A wide variety of quark/gluon discriminants have been proposed (see Ref. [9] for an extensive catalog), but here we limit ourselves to a two-parameter family of generalized angularities . Two-parameter family of generalized angularities, adapted from Ref. [14]. The dots correspond to the five benchmark angularities used in this study, with "LHA" referring to the Les Houches Angularity. The horizontal line at κ = 1 corresponds to the IRC-safe angularities, e β = λ 1 β . [14], shown in Fig. 2. These are defined as (repeating Eq. (1.1) for convenience) where i runs over the jet constituents, z i ∈ [0, 1] is a momentum fraction, and θ i ∈ [0, 1] is a (normalized) angle to the jet axis. The parameters κ ≥ 0 and β ≥ 0 determine the momentum and angle weighting, respectively. For κ = 1, the generalized angularities are IRC safe and hence calculable in perturbation theory [29] (see also [28,[61][62][63][64]), and we will sometimes use the shorthand e β ≡ λ 1 β .

(3.2)
For general κ = 1, there are quasi-perturbative techniques based on generalized fragmentation functions [14] (see also [10,[65][66][67]). In our parton-shower studies, we determine λ κ β using all constituents of a jet, though one could also consider using charged-particle-only angularities to improve robustness to pileup (at the expense of losing some particle-level information).
For our e + e − study, we cluster jets with FastJet 3.2.1 [68,69] using the ee-variant of the anti-k t algorithm [70], with | p|-ordered winner-take-all recombination [29,71,72] to determine the jet axisn. Unlike standard E-scheme recombination [73], the winner-take-all scheme yields a jet axisn that does not necessarily align with the jet three-momentum p; this turns out to be a desirable feature for avoiding soft recoil effects [13,29,30,74,75]. We define where E J is the jet energy, E i is the particle energy, Ω in is the opening angle to the jet axis, and R is the jet radius (taken to be R = 0.6 by default, unless explicitly mentioned otherwise).
For our pp study, we use the standard pp version of anti-k t with E-scheme recombination, defining where p T i is the particle transverse momentum and R in is the rapidity-azimuth distance to the jet axis. To define a recoil-free axis, we recluster the jet using the Cambridge/Aachen (C/A) algorithm [76,77] with p T -ordered winner-take-all recombination. Note that with this choice of recombination scheme, the p T of the recoil-free axis is effectively the scalar sum j∈jet p T j used in Eq. (3.4). In addition to directly measuring the angularities, we also want to test the impact of jet grooming (see e.g. [78][79][80][81]). As one grooming example, we use the modified mass drop tagger (mMDT) with µ = 1 [78,82] (equivalently, soft drop declustering with β = 0 [83]). This grooming procedure starts from the C/A-reclustered jet, which yields an angular-ordered clustering tree. This tree is then declustered, removing the softer branch until where 1 and 2 label the two branches of a splitting. By applying the mMDT procedure, we can test how quark/gluon discrimination performance and robustness is affected by removing soft radiation from a jet. For concreteness, we always set z cut = 0.1 to match the studies in Refs. [83][84][85][86]. By adjusting κ and β in the angularities, one can probe different aspects of the jet fragmentation. We consider five benchmark values for (κ, β) indicated by the black dots in  Except for the LHA, these angularities (or their close cousins) have already been used in quark/gluon discrimination studies. The LHA has been included to have an IRC safe angularity that weights energies more heavily than angles, similar in spirit to the β = 0.2 value advocated in Ref. [13] for energy correlation functions. Most of the results in this paper are shown in terms of the LHA; results for the other four benchmark values are available in the source files for the arXiv preprint [87], where each figure in this paper corresponds to multipage file. For the IRC-safe case of κ = 1, there is an alternative version of the angularities based on energy correlation functions [13] (see also [75,88,89]), where equality holds in the extreme eikonal limit. 5 For the e + e − case, the pairwise angle θ ij is typically normalized to the jet radius as θ ij ≡ Ω ij /R. To avoid a proliferation of curves, we will not show any results for ecf β . We will also neglect quark/gluon discriminants that take into account azimuthal asymmetries within the jet, though observables like the covariance tensor [9] and 2-subjettiness [90][91][92] can improve quark/gluon discrimination. See Ref. [16] for a related study of quark/gluon systematics for shower deconstruction [93][94][95] and energy correlation functions [13].

Classifier separation
Since we will be testing many parton-shower variants, we need a way to quantify quark/gluon separation power in a robust way that can easily be summarized by a single number. For that purpose we use classifier separation (repeating and reorganizing Eq. (1.3) for convenience), , (3.8) where p q (p g ) is the probability distribution for the quark jet (gluon jet) sample as a function of the classifier λ. Here, ∆ = 0 corresponds to no discrimination power and ∆ = 1 corresponds to perfect discrimination power. A more common way to talk about discrimination power is in terms of receiver operating characteristic (ROC) curves, shown in Fig. 3a. At a point (q,g) on the ROC curve, where q, g ∈ [0, 1], one can define a selection that yields q efficiency for quarks and g mistag rate for gluons, or equivalently, a (1 − g) efficiency for gluons for a (1 − q) mistag rate for quarks. There are various ways to turn the ROC curve into a single number, and in the arXiv preprint source files [87], every figure for ∆ is part of a multipage file that also has results for Since we are more interested in understanding the relative performance between parton showers rather than the absolute performance, we will not show full ROC curves in this paper, though they can be easily derived from the p q and p g distributions. If one observable has an everywhere better ROC curve than another (i.e. it is Pareto optimal), then it will also have a larger ∆ value. The converse is not true, however, since depending on the desired working point, a "bad" discriminant as measured by ∆ might still be "good" by another metric. In that sense, ∆ contains less information than the full ROC curve. An alternative way to quantify discrimination power is through mutual information, which counts the number of "bits" of information gained from measuring a discriminant  and other information-theoretic quantities. The mutual information at f = 1/2 (i.e. I 1 2 ) is also available from the arXiv preprint source files [87].
variable (see Ref. [14]). Given a sample with quark fraction f ∈ [0, 1] and gluon fraction (1 − f ), the mutual information with the truth (a.k.a. the truth overlap) is where T = {q, g} is the set of truth labels, Λ = {λ} is the (continuous) set of discriminant values, and 13) The choice f = 1 2 was used in Ref. [14] and is also available in the arXiv preprint source files [87], though other f choices are plausible. Though we will not use mutual information in this study, it is amusing to note that the second derivative of I(T ; Λ) with respect to f is related to classifier separation as − log 2 4 More broadly, the dependence of I(T ; A) on f can be related to other concepts in statistics, as visualized in Fig. 3b. At f = 0 and f = 1, the mutual information itself is zero, but the derivatives are: The first derivative is sometimes called relative entropy and the second derivative is sometimes called discrimination significance. Unlike classifier separation, these later metrics do not treat quark and gluon distributions symmetrically. One advantage of ∆ over I(T ; Λ) is that the integrand in Eq. (3.8) is easier to interpret, since it tracks the fractional difference between the signal and background at a given value of λ. 6 Specifically, by plotting one can easily identify which regions of phase space contribute the most to quark/gluon discrimination. One can then ask whether or not the regions exhibiting the most separation power are under sufficient theoretical control, including both the size of perturbative uncertainties and the impact of nonperturbative corrections.

Analytic quark/gluon predictions
For the IRC-safe angularities with κ = 1 (namely e 0.5 , e 1 , and e 2 from Eq. (3.6)), we can use analytic calculations to get a baseline expectation for the degree of quark/gluon separation. At LL accuracy, a jet effectively consists of a single soft gluon emission from a hard quark/gluon, with a suitable Sudakov form factor coming from vetoing additional radiation. At this order, the strong coupling constant is fixed and only the leading splitting function is used. In particular, the IRC-safe angularities at LL order satisfy a property called Casimir scaling (reviewed below), such that the discrimination power is independent of β. At NLL order, the jet is described by multiple gluon emissions from a hard quark/gluon, including the effects of α s running and subleading terms in the splitting function, but neglecting matrix element corrections and energy-momentum conservation. For the IRC-safe angularities, the NLL-accurate distributions were calculated in Refs. [13,14] (see also [26,28,29,75,96]). For the generalized angularities, generalized fragmentation functions [10,14,[65][66][67] were used to extend the NLL calculation beyond the IRC-safe regime, though we will not use that technique in the present paper. To date, the impact of soft nonperturbative physics has not been included in the distributions for the IRC-safe angularities, but we do so below.

Casimir scaling at LL
As shown in Refs. [13,14], the IRC-safe angularities satisfy Casimir scaling at LL accuracy, which implies that the quark/gluon discrimination performance only depends on the color factor ratio C A /C F . To see this, let us first introduce the notation for the cumulative distribution Σ(λ), which is defined by If an observable satisfies Casimir scaling, then the quark and gluon cumulative distributions can be written as where r(λ) is a monotonically decreasing function of λ. Here, the only difference between the quark and gluon distributions is in the color factors C F = 4/3 versus C A = 3. At LL accuracy, the distributions for the IRC-safe angularities take precisely this form [13,14] where i labels the jet flavor. This LL result can be understood from the fact that quarks and gluons have the same leading splitting function up to an overall multiplicative color factor and therefore the Sudakov form factor (which is what appears in Eq. (4.3)) differs only by the color factor in the exponent. Observables that satisfy Casimir scaling have universal ROC curves [13] and universal truth overlaps [14], which are independent of the precise functional form of r(λ) and only depends on the ratio C A /C F . We can derive the same universality for classifier separation. Using and the change of variables u ≡ e −C F r(λ) , we have Strictly speaking, Eq. where 2 F 1 [a, b; c; z] is the hypergeometric function. 8 For the case of QCD with C A /C F = 9/4, We mark this benchmark value with an arrow on the subsequent plots for reference. 9 Going beyond LL accuracy, Casimir scaling is typically violated, and ∆ depends on the precise observable in question. Thus, all of the differences we see in our subsequent studies are effects that are truly higher-order or nonperturbative.

NLL resummation
Going to NLL accuracy is straightforward for global logarithms, using the formalism of Ref. [75]. Nonglobal logarithms have not been included in previous angularity calculations, but we include them here using their numerical extraction in the large N c limit [35]. We always assume that the jet radius R is order 1 so we can ignore log R resummation [36].
The cumulative distribution for an IRC safe angularity e β takes the form [35,75] Σ where R(e β ) is known as the radiator function, f NGL (e β ) encodes nonglobal logarithms, γ E is the Euler-Mascheroni constant, Γ is the gamma function, and primes indicate logarithmic derivatives: The e −R(e β ) factor in Eq. (4.10) is just the Sudakov form factor, which exhibits Casimir scaling at LL accuracy, and the prefactor containing R (e β ) captures the effect of multiple emissions on the e β distribution. The e −f NGL (e β ) factor comes from Eq. (18) of Ref. [35], where it is called S. Note that f NGL is proportional to C i C A , so it effectively preserves Casimir scaling; for this reason, the inclusion of nonglobal logarithms is not expected to have a large impact on quark/gluon separation power. For the IRC-safe angularities, the radiator function is [13,14] R(e β ) = C i 12) and the strong coupling is evaluated with two-loop running at the k t emission scale The reduced splitting functions (i.e. splitting functions summed over all allowed 1 → 2 processes) are where T R = 1/2 and we take the number of light quark flavors to be n f = 5. Following Ref. [14], both R(e β ) and R (e β ) are truncated to only keep terms that are formally of NLL accuracy.

Nonperturbative shape function
The quark/gluon studies in Refs. [13,14] used solely the distributions as calculated in Eq. (4.10) (with f NGL = 0). As we will see, nonperturbative hadronization has a big effect in our parton-shower studies, so we would like to include the corresponding effect in our analytic results. The IRC-safe angularities are additive observables, meaning that at leading power in the small e β limit, one can decompose them into separate contributions from perturbative and nonperturbative modes: One can then convolve the perturbative distribution forê β ≡ e (pert) β with a nonperturbative shape function F that describes the distribution of ≡ e (NP) β [37,38] (see also [97][98][99][100][101]), whereσ ≡ dσ/dê β refers to the perturbative result. The shape function prescription gives sensible results in the small e β limit, but it breaks down at large values of e β , since the convolution in Eq. (4.16) can yield e β values that extend beyond the physical range. To address this, we need to smoothly turn off the nonperturbative shift asê β approaches the physical endpoint e max β . There is no unique way to do this, but we find sensible results using , (4.17) which ensures that the cross section normalization is not modified even when the impact of the shape function is suppressed. For simplicity, we take e max β = 1 for all of our distributions, though in practice the perturbative distributions do not extend out that far.
The shape function F has to be extracted from data, but we can use simple parametrizations that account for some aspects of its known behavior: Both of these functions go linearly to zero at → 0, fall exponentially as → ∞, are normalized by d F ( ) = 1, and have an expectation value The parameter 0 can therefore be interpreted as the average shift of the perturbative distribution from nonperturbative effects. The second form in Eq. (4.18) was used in Ref. [102], but we take the first form as our default since it has a less pronounced high-side tail. Following Ref. [61] (see also [75,97]), one can estimate 0 as a function of β. Nonperturbative modes have k t Λ QCD , so Eq. (4.13) implies the relationship Appealing to local parton-hadron duality [103], we can estimate the average contribution to z θ β from nonperturbative soft gluon emissions as where we are using the soft (and collinear) gluon matrix element to determine the phase space integration, and Ω 0 and Ξ 0 are nonperturbative parameters that are both of order Λ QCD . This estimate of 0 can also be understood by considering two types of nonperturbative modes: These contribute to the angularities as We see that soft modes dominate for β > 1, collinear modes dominate for β < 1, and soft and collinear modes are equally important for β = 1. This behavior is indeed encoded in Eq. (4.21), which smoothly interpolates between these regimes, yielding a logarithmic structure of Ω 0 R E J log R E J Ξ 0 for β = 1 exactly. Comparing quarks and gluons, we expect the overall size of the nonperturbative shift Ω 0 should scale proportional to the Casimir factors, as in Eq. (4.4). The scaling of Ξ 0 is less clear, since it controls nonperturbative collinear radiation, which is less well studied than nonperturbative soft radiation. For our baseline distributions, we assume that Ξ 0 also obeys Casimir scaling: By tying Ω 0 and Ξ 0 together, this has the effect of reducing the phase space for nonperturbative emissions from gluons, which is particularly important for β < 1. Ideally, one would  want a more rigorous justification for the assumptions in Eq. (4.25) (as well as the convolution structure in Eq. (4.17)), though that is beyond the scope of the present work.
To test whether Eq. (4.21) is a plausible estimate for nonperturbative corrections, we take the parton-shower generators studied in the next section and study how the average value of e β shifts as hadronization is turned off and on, and use that to estimate 0 . We emphasize that a hadronization model used with a parton shower is not the same as a shape function in an analytic calculation, so one has to be careful drawing conclusions about the size of 0 from a study like this. In particular, effects that are captured by F ( ) in an analytic calculation could either be part of the perturbative showering or nonperturbative hadronization in a generator. That said, we expect that the scaling of the e β shift as a function of E J , R, and β should be roughly the same.
In Fig. 4, we show the size of the e β shift as a function of R E J for the three benchmark β values, where the band indicates the minimum and maximum shifts seen among Pythia, Herwig, Sherpa, Vincia, Deductor, Ariadne, and Dire. 10 We then compare to the expected shift from Eq. (4.21) with We also consider two alternative scaling behaviors for Ω 0 and Ξ 0 . The first alternative is motivated by the observation that, as far as the perturbative soft gluon matrix element is concerned, the Casimir factor affects the rate of soft gluon emissions but not the associated kinematics. Thus, one might expect the overall Ω 0 factor in Eq. (4.21) to respect Casimir scaling, but not the Ξ 0 factor inside the delta function. We therefore test a variant with where again these values are estimated by fitting to the parton shower e β shifts. As shown in Fig. 15a below, Eq. (4.27) leads to a dramatic increase in the predicted quark/gluon separation power for β < 1. The second alternative is motivated by the absence of any evidence of Casimir scaling in the parton showers from Fig. 4. We therefore try taking both nonperturbative parameters to be independent of C i , with No C i in 0 : which leads to a corresponding decrease in separation power, since the nonperturbative shape function now has the same behavior for quarks and gluons.

Idealized quark/gluon discrimination
We now turn to parton-shower studies of quark/gluon discrimination, starting with the idealized case of e + e − collisions. While far less complicated than quark/gluon tagging in the LHC environment, this e + e − case study demonstrates the importance of final-state evolution for quark/gluon discrimination, independent from initial-state complications arising in pp collisions. A Rivet routine [51] for this analysis can be downloaded from https: //github.com/gsoyez/lh2015-qg under MC_LHQG_EE.cc. To define the truth-level jet flavor, we use a simple definition: a quark jet is a jet produced by a parton-shower event generator in e + e − → (γ/Z) * → uū hard scattering, while a gluon jet is a jet produced in e + e − → h * → gg. Of course, an e + e − → uū event can become a e + e − → uūg event after one step of shower evolution, just as e + e − → gg can become e + e − → guū. This illustrates the inescapable ambiguity in defining jet flavor. 12 To partially mitigate the effect of wide-angle emissions, we restrict our analysis to jets that satisfy This conclusion is not simply an artifact of the fitting procedure, as none of the individual generators show evidence for Casimir scaling in the nonperturbative shift either. 12 In an e + e − context, our definition at least respects the Lorentz structure of the production vertex, so in that sense it is a fundamental definition that does not reference (ambiguous) quark or gluon partons directly.
where Q is the center-of-mass collision energy, allowing for up to two jets studied per event.
Note that this condition acts as a restriction on out-of-jet radiation, which already suppresses to some extent non-global effects [35]. 13 There is also the ambiguity of which parton shower to use, so we investigate quark/gluon radiation patterns in several event generators: Pythia (with cluster hadronization performed by Sherpa).

Baseline analysis
In Fig. 5, we show hadron-level distributions of the LHA (i.e. e 0.5 = λ 1 0.5 ) in the quark sample (p q ) and gluon sample (p g ), comparing the baseline settings of seven different parton-shower generators with a center-of-mass collision energy of Q = 200 GeV and jet radius R = 0.6. In the quark sample in Fig. 5a, there is relatively little variation between the generators, which is not surprising since most of these programs have been tuned to match LEP data (though LEP never measured the LHA itself). Turning to the gluon sample in Fig. 5b, we see somewhat larger variations between the generators; this is expected since there is no data to directly constrain e + e − → gg (though there are indirect tests from LEP; see Sec. 7). It is satisfying that for both the quark and gluon samples, the analytic NLL results from Sec. 4 peak at roughly the same locations as the parton showers. In the arXiv preprint source files [87], one can see comparable levels of agreement for the two other IRC-safe angularities (e 1 and e 2 ).
In Fig. 5c, we plot the integrand of classifier separation, d∆/dλ from Eq. (3.18). This shows where in the LHA phase space the actual discrimination power lies, with large values of the integrand corresponding to places where the quark and gluon distributions are most dissimilar. Now we see considerable differences between the generators, reproducing the well-known fact that Pythia is more optimistic about quark/gluon separation compared to Herwig [20]. The predicted discrimination power from the other five generators and the NLL calculation are intermediate between these extremes.
One might expect that the differences between generators are due simply to their having different hadronization models. It seems, however, that the differences already appear at the parton level prior to hadronization. We should say at the outset that it is nearly impossible to do a true apples-to-apples comparison of parton-level results, since these generators are interfaced to different hadronization models, and only the hadron-level comparison is physically meaningful. In particular, the crossover between the perturbative and nonperturbative regions is ambiguous and each of these showers has a different effective shower cutoff scale, resulting in different amounts of radiation being generated in the showering versus hadronization steps. 14 Similarly, for the parton-level NLL results, small values of the angularities are 13 Note that we have not included the effect of Eq. (5.1) in our analytic calculation, which in principle affects the functional form of fNGL for non-global logarithms.
14 In general, generators based on string hadronization tend to use a lower shower cutoff scale (∼ 0.5 GeV) compared to those based on cluster hadronization (∼ 1 GeV).    artificially suppressed by the α s → ∞ Landau pole, which enhances the Sudakov exponent. 15 With that caveat in mind, we show parton-level results in Fig. 6. One immediately notices that three of the generators-Herwig, Sherpa, and Deductor-yield a large population of events where the perturbative shower generates no emissions, even in the gluon sample. This gives λ 1 0.5 = 0 such that non-zero values of the LHA are generated only by the hadronization model. By contrast, Pythia and Vincia give overall larger values of the LHA from the perturbative shower alone, with Ariadne and Dire yielding intermediate results. As mentioned above, some of this difference can be explained simply by the different shower cutoff scales used in each generator, but it probably also reflects a difference in how semi-perturbative gluon splittings are treated. Since Fig. 5a shows that all generators give similar distributions for quark jets after hadronization, we conclude that understanding quark/gluon discrimination is a challenge at the interface between perturbative showering and nonperturbative hadronization.
To summarize the overall discrimination power, we integrate Eq. (3.18) to obtain the value of classifier separation ∆ for the LHA. This is shown in Fig. 7, which also includes the four other benchmark angularities from Eq. (3.6). There is a rather large spread in predicted discrimination power between the generators, especially at hadron level in Fig. 7a. 15 An alternative approach would be to freeze αs at scales below ΛQCD or extend it into the nonperturbative region as suggested in Refs. [104][105][106]. Either way, this region of phase space is dominated by the nonperturbative shape function, which is absent from the "parton-level" distributions. While such differences might be expected for IRC-unsafe angularities (multiplicity and p D T ) which depend on nonperturbative modeling, these differences persist even for the IRC-safe angularities at parton level (see Fig. 7b). 16 This suggests a more fundamental difference between the generators that is already present in the perturbative shower.
For the IRC-safe angularities with κ = 1, there is a generic trend seen by all of the hadron-level generators that discrimination power decreases as β increases. This trend agrees with the study performed in Ref. [13] and our NLL calculation here, but disagrees with the ATLAS study in Ref. [20], which found flat (or even increasing) discrimination power with increasing β. Understanding this β trend will therefore be crucial for understanding quark/gluon radiation patterns.

Parameter dependence
Given the large absolute differences in discrimination power seen above, we next want to check if the parton-shower generators exhibit similar or dissimilar trends as parameters are varied. We perform three parameter sweeps, using the boldface values below as defaults: where α s0 is the default value of the strong coupling, which is different between the generators (and sometimes different between different aspects of the same generator).
The resulting values of ∆ for the LHA are shown in Fig. 8, at both the hadron level and parton level. There are number of surprising features in these plots. Perhaps the most obvious (and seen already in Fig. 7) is that even for the IRC-safe angularities, the effect of hadronization is rather large, both on the absolute scale of discrimination and the trends. The main exception to this is Herwig, which does not exhibit as much of a shift from hadronization, though an effect is still present.
The next surprising feature is that the parton-level trends for sweeping α s do not necessarily correspond to those for sweeping Q and R. According to the perturbative NLL logic in Ref. [13], quark/gluon discrimination should depend on α s evaluated at the scale QR/2, with larger values of α s (QR/2) leading to improved discrimination power. This is indeed seen in the parton-level curves obtained from the analytic NLL calculation in Sec. 4, and parton-level Pythia, Herwig, Vincia, Ariadne, and Dire also show improved performance with larger α s . However, larger values of Q and R correspond to smaller values of α s , so the NLL logic would predict that increasing Q or R should lead to worse discrimination power. Instead, at parton-level, all of the generators show the opposite Q and R trend from the analytic NLL result.   One reason to expect quark/gluon discrimination to improve at higher energies is that the phase space available for shower evolution increases as Q increases. The scale µ of the shower splitting is µ 2 0 < µ 2 < Q 2 , where µ 0 = O(GeV) is the shower cutoff scale. With more range for shower evolution at higher Q, there is a greater possibility to see that a quark jet is different from a gluon jet. Similarly, larger values of R allow for more emissions within a jet, and from scaling symmetry, one expects that parton-level discrimination power should depend on the combination QR. 17 By contrast, the NLL logic says that quark/gluon discrimination should be dominated by the leading emission(s) in a jet, and since α s is smaller at higher values of QR, those leading emissions are more similar between quarks and gluons. Given these two different but equally plausible logics, both of which undoubtably play some role in the complete story, this motivates experimental tests of quark/gluon separation as a function of Q and R.
For many of the generators, going from parton-level to hadron-level reverses or flattens the Q and α s trends, though the R trends are more stable. For the NLL results, including the shape function from Sec. 4.3 leads to an overall improvement in discrimination power and a slight flattening of the Q and R trends, though the difference between parton-level and hadron-level is not nearly as dramatic as for the parton showers. This is further evidence that the boundary between perturbative and nonperturbative physics is ambiguous, and hadronlevel comparisons are the most meaningful.

Impact of generator settings
Formally, parton-shower generators are only accurate to modified leading-logarithmic (MLL) accuracy, though they include physically important effects like energy/momentum conservation and matrix element corrections that go beyond MLL. We can assess the impact of these higher-order effects by changing the baseline parameter settings in each parton-shower generator. We will also explore similar kinds of changes for the NLL analytic calculation.
Because each generator is different, we cannot always make the same changes for each generator. Similarly, the spread in discrimination power shown below should not be seen as representing the intrinsic uncertainties in the shower, since many of these changes we explore are not physically plausible. The goal of these plots is to demonstrate possible areas where small parameter changes could have a large impact on quark/gluon discrimination. Ultimately, collider data and higher-order calculations will be essential for understanding the origin of quark/gluon differences. In all cases, we show both hadron-level and parton-level results, even if a setting is only expected to have an impact at the hadron level.
Our Pythia baseline is based on the Monash 2013 tune, with parameters described in Ref. [108]. In Fig. 9, we consider the following Pythia variations:  • Pythia: no g → qq. While the dominant gluon splitting in the parton shower is g → gg, Pythia-and every other shower in this study-also generates the subleading g → qq splittings by default. This variation turns off g → qq, which makes gluon jets look more gluon-like, thereby increasing the separation power.
• Pythia: no ME. The first emission in Pythia is improved by applying a matrix element correction [109], but this variation turns those corrections off, showing the impact of non-singular terms. No matrix element correction is available for h * → gg, though, so the true impact of these corrections might be larger than the relatively small effect seen for this variation.
• Pythia: 2-loop α s . The default Pythia setting is to use 1-loop running for α s . This variation turns on 2-loop running for α s , which has a small (beneficial) effect at parton level which is washed out at hadron level.
• Pythia: CR1. Often, one thinks of color reconnection as being primarily important for hadron colliders, but even at a lepton collider, color reconnection will change the Lund strings used for hadronization. Compared to the baseline, this variation uses an alternative "SU(3)"-based color reconnection model [110] (i.e. ColourReconnection:mode = 1). No attempts were made to retune any of the other hadronization parameters (as would normally be mandated in a tuning context), so this change simply illustrates the effect of switching on this reconnection model with default parameters, leaving all other parameters unchanged. At parton level, this variation has no effect as expected. At hadron level, this variation considerably degrades quark/gluon separation compared to the baseline.  Figure 10. Same as Fig. 9, but for Herwig 2.7.1.
The most surprising Pythia effect is the large potential impact of the color reconnection model, which is also important for the Herwig generator described next. Our Herwig baseline uses version 2.7.1, with improved modeling of underlying event [111] and the most recent UE-EE-5-MRST tune [112], which is able to describe the double-parton scattering cross section [113] and underlying event data from √ s = 300 GeV to √ s = 7 TeV.
In Fig. 10, we consider the following Herwig variations: • Herwig: no g → qq. Turning off g → qq splittings in Herwig has the reverse behavior as seen in Pythia, leading to slightly worse discrimination power, though the effect is modest.
• Herwig: no CR. The variation turns off color reconnections in Herwig. This has no effect at parton level, as expected. At hadron level, this variation for Herwig gives a rather dramatic improvement in quark/gluon discrimination power. We think this arises since color reconnection in Herwig allows any color-anticolor pair to reconnect, even if they arose from an initially color octet configuration. By turning off color reconnection, the gluons look more octet-like, explaining the improvement seen.
The importance of color reconnections in Herwig is a big surprise from this study, motivating future detailed studies into which color reconnection models are most realistic when compared to data. In the future, we also plan to test the default angular-ordered Herwig shower against an alternative dipole shower [114]. Our Sherpa baseline uses matrix element corrections for the first two emissions (N jet = 2) with CKKW-style matching [115]. In Fig. 11, we consider the following Sherpa variations: • Sherpa: No g → qq. Turning off g → qq splittings in Sherpa has a negligible effect at parton level, but it leads to a large jump in discrimination power at hadron level, again  Figure 11. Same as Fig. 9, but for Sherpa 2.2.1.
due to an interplay between the perturbative shower and nonperturbative hadronization.
• Sherpa: N jet = 1. This variation only performs CKKW matching for the first emission, leading to negligible changes in the discrimination performance.
• Sherpa: N jet = 0. Turning off all matrix element corrections in Sherpa slightly decreases the predicted quark/gluon discrimination power, in agreement with the behavior of Pythia.
Within Sherpa, matrix element corrections appear to have a very small effect at parton level. The large changes seen at hadron level from turning off g → qq splittings motivates further investigations into the shower/hadronization interface. Our Vincia baseline uses the default setup for version 2.001 [46], which includes "smooth ordering" and LO matrix-element corrections [116] up to O(α 3 s ) for both e + e − → qq and e + e − → gg. The coupling α s is evaluated with 2-loop running defined by α s (M Z ) = 0.118 (reinterpreted according to the CMW scheme [117]) with µ R = 0.6p ⊥ as the renormalization scale for gluon emissions and µ R = 0.5m qq for g → qq branchings. In Fig. 12, we consider the following Vincia variations: • Vincia: no g → qq. This variation turns off g → qq, leading to the expected increase in separation power as seen in Pythia.
• Vincia: no ME. By default, each 2 → 3 antenna in Vincia has an associated matrix element correction factor. Since the antennae are already rather close to the true matrix elements, turning off these matrix elements has a modest effect on quark/gluon discrimination power.  Figure 12. Same as Fig. 9, but for Vincia 2.001.
• Vincia: 1-loop α s . This variation switches from 2-loop to 1-loop α s running, yielding a parton-level difference which goes in the same direction as the equivalent Pythia variation (note the baseline in Pythia is 1-loop running) and a modest hadron-level difference, again in agreement with the observation for Pythia.
• Vincia: alt Q E . By default, Vincia uses a transverse-momentum scale (the same as in Ariadne) as the evolution variable for gluon emissions. This variation instead uses a virtuality-like quantity. This changes the Sudakov factors to slightly enhance wideangle emissions over collinear ones (see e.g. [118]). The resulting increase in separation power is mainly due to increased activity in the H → gg shower.
Since Vincia and Pythia share the same hadronisation model and both have dipole-style showers, it is not surprising that they exhibit similar behaviors as parameters are changed. The biggest surprise is the significant change observed when using an alternative shower evolution variable ("alt Q E "), which persists at hadron level. Although this variation is theoretically disfavored (the default p ⊥ evolution variable has been shown to reproduce the logarithmic structure of the qq → qgq antenna function to second order in α s [119]), formal control of the ambiguity would depend on one-loop corrections. It would therefore be interesting to determine the extent to which multi-leg NLO merging techniques (such as UN-LOPS [120]) would reduce it, and/or whether second-order corrections to the shower kernels are required (for which only a proof of concept currently exists [121]). Our Deductor baseline uses leading color plus (LC+) showering, which includes some subleading color structures. We find that switching from LC+ to LC showering at parton level has a negligible impact on quark/gluon discrimination power. When Deductor interfaces with the default tune of Pythia 8.212 for hadronization, only leading color is used in the showering, such that partons with their LC color information can be directly passed to the  Figure 13. Same as Fig. 9, but for Ariadne 5.0.β.

Lund string model. No
Deductor variations are shown here, though it would be interesting to study the effect of g → qq splitting in future work. Our Ariadne baseline is based on a beta release of version 5. In Fig. 13, we consider the following Ariadne variation: • Ariadne: no g → qq. This variation turns off g → qq, leading to modest improvement in separation power, similar in magnitude to Herwig though in the opposite direction.
• Ariadne: no swing. Swing refers to color reconnections performed during the perturbative cascade, where dipoles in the same color state are allowed to reconnect in a way which prefers low-mass dipoles [48,122]. Turning off swing has an effect already at parton level, which is amplified at hadron level, leading to improved quark/gluon separation.
Like for Pythia and Herwig, color reconnections play a surprisingly important role in

Ariadne.
Our Dire baseline is based on the initial release, interfaced with Sherpa for cluster hadronization. In Fig. 14, we consider the following Dire variations: • Dire: no g → qq. This variation turns off g → qq, yielding an improvement in separation power at both the parton level and hadron level, intermediate between Ariadne and Vincia.
• Dire: MC@NLO. This variation uses MC@NLO [123] as implemented in Sherpa to provide a matrix element correction. The discrimination power slightly improves at both parton and hadron level, though not that much, since the Dire shower already is very close to capturing the matrix element for the first emission.  Figure 14. Same as Fig. 9, but for Dire 1.0.0.
• Dire: 1-loop α s . The default within Dire is to perform 2-loop α s running. This variation uses just 1-loop running, with a slight degradation of discrimination power.
• Dire: 3-loop α s . Using 3-loop running also degrades performance, but by a very small amount.
• Dire: string had. This variation uses Pythia for Lund string fragmentation, which only has an effect at hadron level. This leads to a modest improvement in discrimination power, suggesting that long-range color connections can play an important role in quark/gluon discrimination. Note that the shower cutoff scale is the same for cluster and string fragmentation in Dire.
Of the generators we tested, Dire is the only one that interfaces with two different hadronization routines, motivating further studies into the differences between cluster and string fragmentation. Finally in Fig. 15, we consider the analytic NLL calculation from Sec. 4. Here, we can only study the IRC-safe angularities with κ = 1.
• Analytic NLL: no g → qq. To turn off gluon splitting to quarks, we set n f = 0 in Eq. (4.14), without adjusting the running of α s . This effectively decreases the number of emissions from gluons, making them look more quark-like. The resulting decrease in discrimination power is the opposite of the behavior seen in the parton-shower generators (except Herwig), suggesting that at higher perturbative orders, the effect of g → qq will go beyond just changing the reduced splitting functions.
• Analytic NLL: no NGLs. Here, we set f NGL = 0 in Eq. (4.10). Since nonglobal logarithms obey Casimir scaling in the N c → ∞ limit, this is expected to have a mild impact on quark/gluon separation power, which is indeed the case.  Figure 15. Same as Fig. 9, but for the analytic NLL calculation from Sec. 4.
• Analytic NLL: 1-loop α s . The analytic NLL calculation uses 2-loop α s running by default. This option uses only 1-loop running, which has a relatively small (beneficial) impact.
• Analytic NLL: no C i in Ξ 0 . The default choice for the average nonperturbative shift 0 assumes Casimir scaling as in Eq. (4.26). This option uses instead the shift in Eq. (4.27), which only has Casimir scaling for Ω 0 and not for Ξ 0 . This makes a dramatic impact for β < 1 at hadron-level, since Ξ 0 dominantly controls the impact of nonperturbative collinear emissions. Specifically, the default 0 scales like C β i for β 1 whereas this option has linear scaling with C i , leading to increased discrimination power.
• Analytic NLL: no C i in 0 . This option uses the nonperturbative shift in Eq. (4.28), which is the same for quarks and gluons. As expected, this reduces the difference between quark and gluon jets at hadron-level, leading to a large reduction in discrimination power.
• Analytic NLL: alt F ( ). Here, we change the functional form of the shape function in Eq. (4.18) from F ( ) to F alt ( ), keeping the same value of 0 . Since F alt has a larger high-side tail, there is more overlap of the quark and gluon distributions, reducing somewhat the discrimination power.
The key lesson from this analytic study is that the form of the nonperturbative shape function has a large effect on quark/gluon discrimination power, especially the assumed dependence of 0 on the Casimir factor. So while higher-order perturbative calculations of quark/gluon radiation patterns are essential, quantitative control over nonperturbative physics will be required to make robust statements about the predicted discrimination power. 6 Quark/gluon tagging at the LHC It is clear from our e + e − study that quark/gluon radiation patterns face considerable theoretical uncertainties, as seen from the differing behaviors of parton-shower generators and from the importance of the shape function in the analytic calculation. This is true even accounting only for final-state physics effects, so additional initial-state complications can only increase the uncertainties faced in pp collisions at the LHC. Beyond just the application to quark/gluon tagging, this is an important challenge for any analysis that uses jets. For example, a proper experimental determination of jet energy scale corrections requires robust parton-shower tools that correctly model effects like out-of-cone radiation.
Eventually, one would like to perform improved analytic calculations to address these radiation pattern uncertainties. In the near term, though, measurements from the LHC will be essential for improving the parton-shower modeling of jets. In this section, we perform an example LHC analysis that highlights the kind of information one can gain about quark/gluon radiation patterns, despite the additional complications faced by hadronic collisions.

Defining enriched samples
As discussed in Sec. 2, there is no way to isolate pure samples of quark or gluon jets at the LHC, but one can isolate quark/gluon-enriched samples, as defined by the flavor label of the jet in the corresponding Born-level partonic process. As shown in Fig. 16, the Born-level jet in W/Z/γ + jet is more than 70% quark enriched over the entire jet p T range of interest. For jets softer than around 200 GeV, the Born-level jet in dijets or H + jet is more than 60% gluon enriched, with that fraction decreasing as the jet p T increases. More sophisticated enrichment procedures are described in Ref. [56].
In principle, one could try to "diagonalize" some combination of vector boson plus jet and dijet samples in order to define separate quark or gluon samples (see e.g. [20]). In the spirit of Sec. 2, though, we think it is more beneficial for the LHC experiments to perform process-specific measurements without trying to directly determine their quark and gluon composition. Instead of quark/gluon separation, here we ask the more well-defined question of whether one can tell "the jet in Z plus jets" (quark-enriched) apart from "the jet in dijets" (gluon-enriched). 18 In a similar spirit, one could test for differences within a single jet sample, such as comparing central jets versus forward jets in dijet production. This process-based strategy can help sidestep the known process dependance of defining quarks and gluons at the LHC, where color correlations have an important impact on observed jet radiation patterns.
For this study, we study proton-proton collisions at the 13 TeV LHC. We consider four different hadron-level generators-Pythia 8.215 [39], Herwig 2.7.1 [40,41], Sherpa 2.2.1 [45], and Vincia 2.001 [46]-using Z → µ + µ − plus jets as our quark-enriched sample and dijets as our gluon-enriched sample. All of these generators are used with their default settings, including underlying event modeling and hadronization. We set R = 0.4 as the default jet radius, with jets defined by the anti-k t algorithm, in keeping with current jet studies at the LHC, exploring other values in Sec. 6.3. Hadrons with rapidity |y| < 2.5 are used for jet clustering, and the resulting jets are restricted to have |y jet | < 1.5. We apply a minimum p T cut with default value p min T = 100 GeV, similar in spirit to the Q/2 value used in the e + e − study, though the precise meaning of p min T differs between the two samples. The specific analysis routines used for this pp study are available from https://github. com/gsoyez/lh2015-qg. For the Z plus jets analysis (with Rivet routine MC_LHQG_Zjet.cc), the selection criteria for the reconstructed Z boson and jet are: In addition, we apply a p T > 5 GeV cut on each muon. For the dijet analysis (with Rivet routine MC_LHQG_dijet.cc), our selection is based on the two hardest jets (labeled 1 and 2), both of which are used for analysis if they satisfy: pp → 2j ("gluon-enriched") : We study the same five benchmark angularities from Eq. (3.6), but we also test the impact of soft radiation removal using mMDT grooming (µ = 1 and z cut = 0.1), with the grooming condition given in Eq. (3.5). Prior to both the computation of λ κ β and the application of the mMDT procedure, the jet constituents are reclustered with the C/A algorithm, using the winner-take-all recombination scheme.

Baseline analysis
In Fig. 17, we show LHA distributions for the quark-enriched and gluon-enriched samples, using the default values p min T = 100 GeV and R = 0.4. For the quark-enriched Z plus jets 18 See, however, Ref. [124] for a machine-learning approach to handle mixed quark/gluon samples.
Z+j, Hadron-level, mMDT jet  Figure 18. Same as Fig. 17 but after mMDT jet grooming with µ = 1 and z cut = 0.1.   Figure 19. Classifier separation ∆ for the five benchmark angularities in our LHC study, determined from the various generators (a) using all jet constituents and (b) after mMDT grooming. Note that the plotted range is different from Fig. 7, reflecting the decreased discrimination power in the realistic pp case compared to the idealized e + e − study with pure quark/gluon samples.
sample, all of the generators except Herwig yield similar distributions, as expected from the e + e − study where the various generators broadly agreed on quark radiation patterns. For the gluon-enriched dijet sample, the difference between generators grows noticeably, yielding disagreements that are even larger than the e + e − study. This difference is apparent also in the d∆/dλ distribution, where Pythia and Vincia predict substantially larger separation power than Herwig, again in agreement with the e + e − study. Results from Sherpa appear to be intermediate between these extremes, though the integrated ∆ value turns out to be similar to Herwig (see Fig. 19a). Already from these raw distributions, we see that LHC jet shape measurements would help constrain parton-shower uncertainties, especially for gluon-enriched jets. We next turn to the impact of jet grooming. Often jet grooming is described as a strategy to mitigate jet contamination from pileup, underlying event, and initial-state radiation [78][79][80][81]. Even at the level of final-state radiation, though, grooming modifies the observed jet radiation patterns in ways that are interesting from the quark/gluon discrimination perspective [82,83]. The impact of grooming is shown for the LHA after mMDT in Fig. 18. In general, grooming pushes jet shapes to smaller values, since the effect of grooming is to remove soft peripheral radiation from a jet. If the parton showers differed primarily in their treatment of wide-angle soft radiation, then one would expect grooming to bring the distributions into closer agreement. Instead, we see that the generator differences persist even after grooming, suggesting that the parton showers differ already in their treatment of collinear radiation, despite using the same underlying collinear splitting kernels. This motivates LHC measurements of groomed jet shapes to better understand the description of collinear physics.
In Fig. 19, we plot the classifier separation ∆ for all five benchmark angularities, with and without jet grooming. Compared to the e + e − study, the overall degree of discrimination power is reduced, as expected because the Z + j and dijet processes do not yield pure quark/gluon samples. The spread between the generators is fairly large, with the expected trend that Pythia is more optimistic about quark/gluon separation than Herwig. We see that Vincia has somewhat smaller predicted separation power than Pythia. Though their raw distributions differ, the discrimination power in Sherpa is comparable to Herwig, with the ordering roughly flipped for unsafe versus safe observables.
One surprising outcome of this study is the relatively modest impact of grooming on discrimination power. From the calculations in Refs. [82,83], one generically expects quark/gluon discrimination power to degrade after jet grooming, since the soft radiation that is being removed carries information about the color structure of the jet. This predicted degradation, however, is only seen modestly here, possibly because soft correlations with the initial state already blurred the distributions in the ungroomed case. One advantage of working with groomed samples is that jet grooming reduces the process dependence in quark/gluon radiation patterns [59,60]. In this way, groomed angularities should yield a more robust theoretical definition for quark and gluon jets, with only a small performance penalty.

Parameter dependence
To test parameter dependence, we now consider five different minimum p T values and five different jet radii, with the boldface values corresponding to the defaults used above: 19 Minimum p T : p min  Fig. 20. In general, increasing p min T leads to a degradation of separation power, though this is due in large part to the change in sample composition shown in Fig. 16. In the context of these mixed samples, it is difficult to disentangle the impact of reduced quark/gluon enrichment at high p T with actual trends in discrimination power (cf. Fig. 8a from the e + e − study). That said, the trends are sufficiently different between generators that the relative differences cannot be ascribed to sample composition alone.
With respect to changing the jet radius R, the discrimination trends are noticeably different between the generators. In Pythia and Vincia, the discrimination power rises with increasing jet radius, whereas in Herwig, the discrimination power degrades with larger R; Sherpa has relatively little R dependence. The trends are rather similar before and after 19 As a technical note, in order to test all values of p min T in a single Monte Carlo run, we generate pT -weighted events in each generator.   Figure 20. Classifier separation ∆ for the LHA in our LHC study, sweeping p min T (top row) and jet radius R (bottom row). Results are shown with all jet constituents (left column) and after mMDT grooming (right column). jet grooming, again pointing to differences between the generators in collinear physics and not just soft physics. We conclude that varying p min T and R provides important information about quark/gluon radiation patterns that cannot captured by focusing on a single kinematic regime. We therefore encourage LHC measurements of jet shapes at multiple energy scales with multiple jet radii.

Summary and recommendations
By measuring the substructure of jets, one can gain valuable information about the relative quark/gluon composition of a jet sample. The challenge we have identified in this study is that the precise radiation patterns of quark and gluon jets are poorly understood, in the sense that parton-shower generators give rather different predictions for absolute quark/gluon discrimination power as well as relative trends as a function of the jet kinematics. From our analytic NLL studies including nonglobal logarithms and shape functions, we see that both perturbative and nonperturbative physics play an important role in determining jet shape distributions. That said, analytic calculations are not yet at the level of accuracy where they could directly guide the tuning of event generators. Therefore, LHC measurements are the best near-term strategy to constrain quark/gluon radiation patterns and enable quark/gluon discrimination to become a robust experimental tool.
Our five benchmark angularities probe both the perturbative and nonperturbative structure of jets, so we think they would be a good starting point for a more comprehensive quark/gluon jet shape analysis at the LHC. In this spirit, we are encouraged by the track multiplicity study of Ref. [25], though for parton-shower tuning is it is important to have measurements not only of jet shape averages but also of the full jet shape probability distributions. In terms of specific measurements that should be highest priority for ATLAS and CMS, our study has not revealed a silver bullet. Rather, all of the observables studied in this paper show similar levels of disagreement between generators, so a systematic LHC study of even one observable is likely to offer crucial new information.
What does seem to be essential is to make LHC measurements at multiple jet p T scales with multiple jet radii R in multiple different quark/gluon-enriched samples. Unfolded distributions would be the most useful for constraining parton-shower uncertainties, but even detector-level measurements compared to detector-simulated parton showers could help spot troubling trends. For the IRC-safe angularities in particular, studying the β dependence would help separate information about collinear and soft radiation patterns, especially given the fact that the β trends seen in the parton-shower generators here disagree with those seen in Ref. [20]. In addition, measurements of both groomed and ungroomed jet shapes could help disentangle collinear versus soft effects.
If possible, it would be interesting to study the LHA (β = 1/2) on archival LEP data, since this angularity probes the core of jets in a new way, distinct from broadening-like (β = 1) or thrust-like (β = 2) observables. Among the IRC-safe angularities studied here, the LHA has the best predicted discrimination power, making it (and other 0 < β < 1 angularities) a wellmotivated target for future lepton collider measurements. Similarly, it would be worthwhile to improve our analytic understanding of the LHA. From Fig. 5c, we see that the LHA has discrimination power both at small values of λ 1 0.5 (where nonperturbative corrections play an important role) as well as at larger values of λ 1 0.5 (where fixed-order corrections are important). Therefore, one must go beyond an NLL understanding to accurately describe the quark/gluon performance of the LHA.
The key lesson to parton-shower authors is that, contrary to some standard lore, existing LEP measurements used for tuning do not constrain all of the relevant aspects of the final state parton shower. While we have extensive information about quark-jet radiation patterns from LEP event shapes, gluon-jet radiation patterns are largely unconstrained. This has important implications for parton-shower tuning strategies, since LHC data can and should be used to adjust final-state shower parameters. For example, the ATLAS A14 tune of Pythia has a 10% lower value of α s in the final-state shower compared to the Monash tune, which yields better agreement with charged-particle multiplicity distributions [25]. However, A14 has not been tested on LEP event shapes, suggesting that a global tuning strategy is needed. In addition, it is worth mentioning that similar quark/gluon studies have been carried out in deep inelastic electron-proton scattering [125], which offer an intermediate step between pp and e + e − collisions, and this ep data could also be valuable for parton-shower tuning.
Interestingly, there are LEP measurements that do constrain gluon radiation patterns, as recently summarized by K. Hamacher in Ref. [126]. Unfortunately, these are not currently implemented in Rivet [51] and, to our knowledge, are not used in any present-day partonshower tuning strategy. The cleanest LEP studies focused on Z → bbg events, i.e. 3-jet events with two heavy-flavor tagged jets [127,128]. By applying appropriate event-selection cuts, these studies identified "symmetric events" where the gluon was relatively isolated [129][130][131][132]; this strategy was extended to more general 3-jet topologies using Lorentz-invariant p ⊥ scales [133]. In the rare case that the two tagged jets appeared in the same hemisphere of an event, the opposite hemisphere could be used to define an inclusive sample of gluon-like jets [134][135][136][137]. With a relatively pure gluon-jet sample, one could then study various aspects of gluon-jet fragmentation, including hadron multiplicity, single-hadron energy fractions, and y-splitting scales. It should be noted, however, that in at least some of the above analyses, the corrections to hadron level made use of Monte Carlo truth information to correct not only for photon initial-state radiation and detector effects but also for impurities in the gluon-jet selection. We therefore encourage efforts to determine the extent to which Rivet implementations of these measurements are practicable, and to begin that process if the corrections are deemed to be sufficiently model independent. This would enable a broader suite of LEP measurements to be included in the next round of parton-shower tunes and in global comparisons such as MCplots [138].
In a similar spirit, a future high-luminosity lepton collider would allow for measurements of the above processes with a high precision (see e.g. [126]). At sufficiently high collision energies, one can also measure other interesting processes, such as associated Higgs production with the Higgs boson decaying to bottom quarks or gluons. Such measurements would provide an invaluable source of data in the context of quark/gluon discrimination and, more generically, for parton-shower tuning.
Based on this study, we have identified three aspects of the final-state parton shower that deserve closer scrutiny.
• Gluon splitting to quarks. Some of the largest differences between generators came from turning on and off the g → qq splitting process. While Pythia, Sherpa, Vincia, Ariadne and Dire suggest that (unphysically) turning off g → qq would improve quark/gluon separation, Herwig (and the analytic calculation from Sec. 4) suggests the opposite conclusion. Beyond quark/gluon discrimination, it would be helpful to identify other contexts where g → qq might play an important role (see e.g. [139]).
• Color reconnection in the final state. Color reconnection is often thought of as an issue mainly at hadron colliders, but we have seen that it can have an impact in e + e − collisions as well. This is particularly the case with the default color reconnection model in Herwig, since it allows the reconnection of color/anticolor lines even if they originally came from an octet configuration. We also saw large changes from "Pythia: CR1" and "Ariadne: no swing", suggesting that one should revisit color reconnection physics when tuning generators to LEP data.
• Reconsidering α s defaults: In the context of parton-shower tuning, the value of α s used internally within a code need not match the world average value, since higher-order effects not captured by the shower can often be mimicked by adjusting α s . That said, one has to be careful whether a value of α s tuned for one process is really appropriate for another. For example, Pythia uses a relatively large value of α s in its final-state shower, which allows it to match LEP event shape data. The same value of α s , though, probably also leads to too much radiation within gluon jets.
Finally, we want to emphasize that despite the uncertainties currently present in partonshower generators [140][141][142][143], parton showers in particular (and QCD resummation techniques more generally) will be essential for understanding quark/gluon discrimination. Fixed-order QCD calculations cannot reliably probe the very soft and very collinear structure of jets, which is precisely where valuable information about quark/gluon radiation patterns reside. Given the ubiquity and value of parton-shower generators, improving the understanding of quark/gluon discrimination will assist every jet study at the LHC.