A Theory of Quark vs. Gluon Discrimination

Understanding jets initiated by quarks and gluons is of fundamental importance in collider physics. Efficient and robust techniques for quark versus gluon jet discrimination have consequences for new physics searches, precision $\alpha_s$ studies, parton distribution function extractions, and many other applications. Numerous machine learning analyses have attacked the problem, demonstrating that good performance can be obtained but generally not providing an understanding for what properties of the jets are responsible for that separation power. In this paper, we provide an extensive and detailed analysis of quark versus gluon discrimination from first-principles theoretical calculations. Working in the strongly-ordered soft and collinear limits, we calculate probability distributions for fixed $N$-body kinematics within jets with up through three resolved emissions (${\cal O}(\alpha_s^3)$). This enables explicit calculation of quantities central to machine learning such as the likelihood ratio, the area under the receiver operating characteristic curve, and reducibility factors within a well-defined approximation scheme. Further, we relate the existence of a consistent power counting procedure for discrimination to ideas for operational flavor definitions, and we use this relationship to construct a power counting for quark versus gluon discrimination as an expansion in $e^{C_F-C_A}\ll1$, the exponential of the fundamental and adjoint Casimirs. Our calculations provide insight into the discrimination performance of particle multiplicity and show how observables sensitive to all emissions in a jet are optimal. We compare our predictions to the performance of individual observables and neural networks with parton shower event generators, validating that our predictions describe the features identified by machine learning.


Introduction
High energy quarks and gluons fragment and hadronize into jets of particles through quantum chromodynamics (QCD). Identifying light jets as arising from quarks or gluons is a fundamental challenge for collider physics at the Large Hadron Collider (LHC). Many efforts have proposed new observables [1][2][3][4][5][6][7][8][9][10][11][12] or jet flavor definitions [13][14][15][16][17], completed theoretical calculations [18][19][20][21], and used machine learning methods [22][23][24][25][26] to push the boundaries of the discrimination power between quark and gluon jets. While these studies have led to steady improvements over time, they have been done with no clear organizing principle or agreed-upon "best" discrimination strategy. Further, while machine learning methods have demonstrated the greatest discrimination power, no clear physical reason for that performance has been presented. It is therefore desirable to construct a general theory of quark versus gluon discrimination which both explains and provides robust understanding of the discrimination power.
Previous studies have made progress in this direction. For example, Ref. [8] was the first broad study of the quark versus gluon discrimination power of a large number of jet observables in simulation, including identifying those pairs of observables that improved discrimination power the most. Ref. [18] introduced mutual information as a metric for useful discrimination information in distributions, applying it to pairs of generalized angularities [27][28][29] measured on jets. Resummed predictions of mutual information were performed and compared to simulation which concretely enabled identification of features that are both under theoretic control and well-described by simulation. Nevertheless, this study was limited to observables that are first non-zero for jets with two particles in them. In Ref. [10], an infrared and collinear (IRC) safe definition of multiplicity was introduced, based on a generalization of the soft drop grooming algorithm [30]. This observable, called soft drop multiplicity n SD , counts the number of relatively hard, angular-ordered emissions off of the hard jet core. At leading-logarithmic accuracy, it can be proven that n SD is the optimal quark versus gluon discriminant, on the phase space of particles directly emitted off of the hard initiating particle of the jet. However, this is not a proof that n SD is the optimal observable for quark versus gluon discrimination in general, because there are regions of phase space in which emissions live that may improve discrimination power, but to which n SD does not have access.
In this paper, we present a first systematic theoretical analysis of quark versus gluon discrimination. Working in the strongly-ordered soft and collinear limits, we explicitly calculate the resummed probability distribution of multiple infrared and collinear safe observables on a jet. These multiple observables enable a characterization of the emission phase space and evaluation of the optimal observable for discrimination. We calculate the energy distributions for quark and gluon jets with up to three resolved emissions, though nothing prohibits continuing to arbitrary numbers of emissions. Our approximations enable simple, recursive evaluation of the probability distribution as a product of conditional probability distributions. Though simple, these calculations are sufficient to validate predictions and make several concrete conjectures regarding quark versus gluon discrimination to all-orders. Our first step in developing a theory of quark versus gluon discrimination is to establish a robust power counting scheme that can be used to construct individual observables, strictly from general statements about the singular limits of QCD. Power counting rules for observables useful for discriminating multi-prong substructure in jets has been extensively developed [31][32][33]. An observable parametrically separates jet categories if power counting identifies arbitrarily pure samples at the phase space boundaries, enabling an unambiguous definition in a singular limit. By a pure sample we mean that a formal region of phase space, however small, exclusively consists of one type or category of jet. For binary classification, the existence of such pure phase space regions provides a robust definition of the jet categories, which is referred to as "mutual irreducibility" [16] of the samples being discriminated. The complementary ideas of power counting and mutual irreducibility are powerful tools we exploit to identify pure phase space regions and quantify potential discrimination power.
The precise notion of mutual irreducibility is relatively new in particle physics, but the requirement that pure phase space regions are necessary to unambiguously define jet categories is well-understood. Throughout this paper, we refer to "signal" and "background" jets in an idealized sense, assuming that we have perfect knowledge of the jet categories. Then, on a restricted space of measurements on those jets, we study the possible discrimination power accessible by those measurements. Thus, even if two jet samples are not mutually irreducible on some restricted observable phase space, we are still able to use our perfect knowledge to study their separation. This notion is widely used in discrimination studies in jet physics, though often not explicitly stated. For jet samples that are not able to be purified on phase space, a so-called reducibility factor κ is defined as the accessible purity of signal or background phase space regions. Further, reducibility factors are just the limiting values of the likelihood ratio and, as we will show, they quantify parametric discrimination power.
As a first familiar example, we demonstrate mutual irreducibility for a problem in which power counting is well-understood: in the context of QCD jet versus hadronically-decaying, boosted Z boson discrimination. Power counting for quark and gluon jets is intrinsically more difficult because, as we demonstrate on any phase space with finitely-many resolved emissions, quark and gluon jets are not strictly mutually irreducible. Thus a power counting scheme does not currently exist to identify robust phase space boundaries between both quark-pure and gluon-pure regions. Nevertheless, because the rates of particle emission from quarks and gluons are controlled by the color Casimirs of the fundamental and adjoint representations with C F = 4/3 < C A = 3, gluon jets exhibit greater Sudakov suppression near the singular phase space boundaries, and so one can define a quark-pure region of phase space. This motivates using the power counting parameter e C F −C A 0.189 to identify such a phase space region, which we formally take to be parametrically smaller than 1. A gluon-rich phase space region is then one for which Sudakov factors are irrelevant and approximately unity.
With explicit, analytic expressions for multi-differential cross sections measured on quark and gluon jets, we are able to calculate any of the quantities familiar from statistics and machine learning, but within the context of a well-defined approximation scheme, with no black boxes. By the Neyman-Pearson lemma [34], the optimal binary discrimination observable formed from the measurement of some collection of observables is the likelihood ratio. This is simply the ratio of the corresponding probability distributions for quark and gluon jets, and will provide a benchmark when comparing to other observables. The likelihood ratio is in general some complicated function of the phase space variables that does not enable a simple determination of the receiver operating characteristic (ROC) or signal versus background efficiency curve. Nevertheless, the discrimination power of the likelihood ratio, or any observable, can be quantified by the area under the ROC curve (AUC). We use a ROC convention where AUC = 0 is perfect performance and a random classifier has AUC = 1 2 . The AUC can be calculated directly from an ordered integral of the product of quark and gluon probability distributions. This also enables a variational approach to construct discrimination observables, whose parameters are chosen to minimize the AUC.
Our results enable a number of statements that we prove at this accuracy including: • Due to Sudakov suppression and since C F < C A , the reducibility factor for quark jets is κ q = 0 for the measurement of any number of resolved emissions in the jets. Pure quark jet phase space regions can essentially always be defined.
• For jets on which measured observables resolve n emissions, the reducibility factor for gluon jets κ g is (1.1) A fully pure gluon jet phase space region can therefore only be exactly defined if all emissions are resolved. The gluon-rich region of phase space is where Sudakov factors are irrelevant, and so is well-described at fixed-order. This particular scaling comes from diagrams in which all particles in the jet are emitted off of the initiating hard particle, ensuring maximum sensitivity to the color Casimirs C F and C A .
• There is an upper limit on the quark vs. gluon discrimination performance with n resolved emissions of AUC ≥ κ q + κ g − 2κ q κ g 2 − 2κ q κ g = 1 2 at this accuracy, with even stronger bounds for specific observables. This bound follows from monotonicity of the ROC and its first derivative, and so the reducibility factors define a quadrilateral whose area is necessarily no larger than the AUC. Analogous bounds on other measures of classification performance can also be derived.
We also are able to make a number of well-motivated conjectures that follow from our explicit calculations including: • The reducibility factor of gluon jets does not improve by resolving the full 3n − 4 dimensional phase space for a jet with n constituents. One only needs to measure n − 1 observables to resolve the existence of each emission off of the initiating gluon.
• Multiplicity is a powerful discrimination observable because it is sensitive to every emission in the jet. Because C F /C A 0.444, the gluon reducibility factor of multiplicity quickly converges to 0 as the number of particles in the jet increases.
• The discrimination power of a single observable τ n that is sensitive to n emissions in a jet, such that its value is 0 if the jet has fewer than n emissions, is bounded by multiplicity. The performance of τ n increases with n for small n, and degrades when n is comparable to the total number of particles in the jets. An optimal value of n occurs when n is comparable to the minimal number of constituents of gluon jets.
• Unlike the case for discrimination of jets with different multi-prong substructure, the likelihood for quark vs. gluon discrimination is an IRC-safe observable. By the established power counting, the most singular region of phase space is necessarily pure quark jet, and so contours of constant likelihood should be parallel to this boundary. Therefore, the singular region of phase space is mapped to a unique value of the likelihood. This means that the distribution of the likelihood ratio can be calculated in fixed-order perturbation theory.
We perform an analysis of quark versus gluon discrimination in a Monte Carlo parton shower to validate that these results describe the physics in simulation. This paper is organized as follows. In Sec. 2, we establish the observables that we measure on jets and clearly lay out our approximations. While this is not a precision QCD study, our approximations become increasingly accurate as the jet energy increases. Sec. 3 reviews and relates concepts from power counting and mutual irreducibility, outlining our general conceptual and mathematical approach. In Sec. 4, we construct the rules for power counting on the observable phase space for quark versus gluon jet discrimination. Several results then immediately follow from these rules, which we validate in later sections. Secs. 5, 6, and 7 contain our explicit calculations for jets on which one, two, or three emissions are resolved, respectively. For concreteness, we focus our calculations on N -subjettiness [35][36][37] observables, though to our accuracy identical results follow for other observables, such as (generalized) energy correlation functions [33,[38][39][40][41]. We are able to construct an IRC safe definition of multiplicity that depends on a resolution parameter Λ 0 > 0. Sec. 8 is devoted to calculations of the distribution of this multiplicity observable and developing an understanding of the "true" multiplicity limit for Λ 0 → 0. Simulated events are analyzed in Sec. 9, in which we both test our predictions and verify that simulation describes physics as expected. For high dimensional phase space, we utilize machine learning techniques to approximate the likelihood and related discrimination observables in simulation. We conclude in Sec. 10 and look forward to further advancements in probing and defining quark and gluon jets. An appendix applies reducibility ideas to the problem of up vs. down quark jet discrimination.

Approximations and Observables
We work to leading-logarithmic accuracy in the strongly-ordered soft and collinear limits of QCD with fixed coupling. This means that we will successfully resum all double logarithms, terms in the fixed-order cross section that scale as α n s log 2n O, of the observables O that we measure on our quark and gluon jets. While this approximation clearly has its limitations, it does enable explicit, analytic formulas for all of the cross sections that we present in this paper. Further, Sudakov factors in the double logarithmic limit can be easily calculated from the areas of emission veto regions in the Lund plane [42]. We briefly present results for calculations beyond this accuracy from elsewhere in the literature in Sec. 5.
At double logarithmic order, the hard, initiating parton defines the jet flavor and so there is no ambiguity in the definition of quark and gluon jets. The subtleties in defining a jet flavor beyond this accuracy have been addressed by the community in a review article [15] and it remains an active research direction, with recent efforts to define quark and gluon jets based directly upon mutual irreducibility ideas [16,17]. We also do not include non-perturbative physics due to hadronization, for example, which would be needed for precision comparison to data. For IRC safe observables, the effects of non-perturbative physics is suppressed by Λ QCD /Q where Q is some characteristic high energy scale (∼ 1 TeV), so our calculations will have an increasingly large domain of applicability at higher energies. Nevertheless, at any finite Q, there is always some region of phase space dominated by non-perturbative physics.
Given the double-logarithmic approximation, in this paper we choose to analyze sets of N -subjettiness observables measured on our jets. N -subjettiness observables vanish for configurations of n < N particles, and hence they probe the degree to which a jet can be described by N -subjets. The definition of N -subjettiness τ (β) N that we use when measured on jets at a hadron collider is Here, p T J is the transverse momentum of the jet with respect to the colliding beam axis, R 0 is the jet radius, the sum runs over all particles i in the jet J, p T i is the transverse momentum of particle i, and R iK is the distance in the rapidity-azimuth plane from particle i to subjet axis K in the jet. Specifically, R iK is in terms of the respective rapidity y and azimuthal angle φ of the particle and axis about the colliding beam axis. The N -subjettiness observables are IRC safe with the angular exponent β > 0 and to our approximation, any recoil-free axis definition suffices for our calculations of the discrimination power. In fact, our calculations hold even with recoil for choices of β where the same emission dominates both the axis position and the value of the observable. However, we will have to make an explicit choice of axes in our simulation, which we will discuss in Sec. 9.
N -subjettiness observables are nice for calculation both because they are IRC safe, and so are calculable at fixed-order in perturbation theory, and additive, and so can be resummed to double logarithmic accuracy simply. Measuring a sufficient number of these observables can be used to completely specify the 3M − 4 dimensional phase space of a jet with M particles [43]. N -subjettiness is not unique in these points, but the linear computational complexity in the number of particles (after determining axes) means that calculating τ (β) N for large N (N 5) is not computationally prohibitive within simulation.
Further, to the accuracy of our calculations, the angular exponent β does not affect the discrimination power of the N -subjettiness observables that we measure on the jets. Effectively, to double logarithmic accuracy, the angular exponent can be absorbed into a redefinition of the coupling α s → α s /β, which is the same for quark and gluon jets. Therefore, we typically will simply fix β = 1 in our calculations so that N -subjettiness measures the total momentum that is transverse to the N subjet axes in the jet. For compactness, we denote τ (β=1) N ≡ τ N throughout this article. However, in Sec. 5, we will discuss the effects of measuring two 1-subjettiness observables on jets and higher-order effects of the angular exponent, in which we maintain explicit β dependence in the observable definition.
An observable which counts the number of resolvable, angular-ordered emissions off of a hard core was introduced in Ref. [10], referred to as soft drop multiplicity n SD . There, it was argued that n SD is the optimal quark vs. gluon discriminator at leading-logarithmic accuracy for observables on the phase space of those particles emitted directly off of the hard core of the jet. For such emissions, the rate of emission is controlled by the appropriate color Casimir and the kinematic distribution of the emissions is identical between quarks and gluons. Thus, all discrimination information is contained in simply counting the emissions, with the kinematics adding no discrimination power. In this paper, we will consider the more general case of jets with relevant emissions off of emitted particles. In this more general case, the quark and gluon kinematic distributions on phase space are no longer equal, because there are different weights on the phase space regions in which such secondary emissions could live, depending on the quark or gluon color Casimirs. We explicitly demonstrate that there is discrimination information in kinematic distributions, beyond just counting emissions.

Power Counting and Mutual Irreducibility
Using power counting to identify optimal observables for classification [31][32][33] is a conceptual framework that has led to new jet substructure observables which have successfully been applied to analyses at the LHC [44][45][46][47]. The key idea is to identify regions of phase space that parametrically separate signal and background. An observable is then optimal in this framework if it separates the signal-dominated and background-dominated regions of phase space. A robust power counting on a phase space of jet observables requires that the boundaries of that phase space define pure regions of the underlying categories. That is, for power counting of discrimination observables as studied in earlier work, this implicitly requires that the two discriminated samples are "mutually irreducible". Mutual (ir)reducibility was first introduced in a collider physics context to statistically disentangle or define different types of jets from mixed samples [16,17]. Signal and background categories are said to be mutually irreducible if there exist pure phase space regions, however small, for each of the categories. Further, the degree to which two categories are mutually irreducible can be sharply quantified in terms of their reducibility factors: where O is an observable or set of observables that define some phase space. Here, we will use the language and mathematical machinery of mutual (ir)reducibility for a new purpose: as a technique to quantify the parametric separability of two calculated distributions. The central importance of pure phase space regions is shared with power counting strategies. In particular, these ideas will allow us to quantify the power counting ideas in a new way and apply them to quark versus gluon jet classification. While previous studies have conjectured that quark vs. gluon discrimination did not admit a power counting [31], later efforts have identified requirements on observables to go beyond the leading-order C A /C F separation [33]. Our definition of power counting for quarks and gluons here will be much more general than previous considerations and enable analysis of arbitrary multi-differential probability distributions.

Theoretically Bounding Classification Performance
While we will have our quark vs. gluon case in mind for the following discussion, we keep the signal vs. background terminology general in order to highlight the broad applicability of this reasoning. The signal and background reducibility factors are related to the derivatives of the ROC curve near its endpoints. Note that the ROC curve is the background cumulative distribution evaluated at the inverse of the signal cumulative distribution: for signal efficiency x. The derivative of the ROC curve is then which is precisely the signal-background likelihood ratio for the observable value O(x) giving rise to signal efficiency x. The emergence of the likelihood ratio as centrally relevant highlights Signal Fraction Background Fraction Figure 1: An illustration of the bound on the ROC curve and its AUC from extrapolating the reducibility factor slopes κ S and 1/κ B from the endpoints. The ROC is monotonic and concave up and so the gray quadrilateral is always completely contained underneath the full ROC curve, yielding the bound. the close relationship between mutual (ir)reducibility, power counting ideas, and optimal classification. This relationship between reducibility factors and the ROC curve can be exploted further: we now prove a strict lower bound on the ROC curve and its AUC from the reducibility factors. The ROC curve can be taken to be strictly monotonic with a positive first derivative because the value of the ROC curve between any two points can (at worst) be a random weighting of the values at those points. The reducibility factors are then the slope (or its inverse) of the ROC curve at the appropriate endpoints. Therefore, we can bound the area under the ROC curve by a quadrilateral, of which the angle of two of the vertices are set by the values of κ S and κ B . An illustration of this quadrilateral for a general ROC curve is shown in Fig. 1. Its area is straightforward to compute, yielding the bound This bound only vanishes when κ S = κ B = 0, namely when the categories are mutually irreducible. Thus when pure phase space regions do not exist, an intrinsic ceiling on classification performance at that accuracy can instead be obtained. Further, as we shall show in later sections, the reducibility factors tend to isolate the dominant phase space regions and are thus typically significantly simpler to calculate than the full distributions of the the phase space observables.
The quadrilateral in Fig. 1 provides a bound to the overall signal vs. background ROC curve. Hence any measure of the classification performance can be bounded through the reducibility factors in this way, not solely the AUC. To highlight this fact, we also derive a bound on another common measure of classification performance: the (inverse) background mistag rate 1/ε B at a specified signal efficiencyε S . Computing this bound, we find (3.5) demonstrating again the relationship between parametric discrimination power and phase space purity, quantified through the reducibility factors.

Z Boson vs. QCD Jets
In this section, we calculate the reducibility factors for a discrimination problem in which a robust power-counting scheme has been defined and used [31]. Specifically, we study the discrimination of two-prong quark jets from hadronically-decaying boosted Z bosons. This will provide us with a concrete case study to explore the relationship between power counting optimality and mutual irreducibility in a known context before moving on to discuss quark vs. gluon discrimination. The calculations that follow were also presented in Ref. [48]. Unlike quark or gluon jets, Z bosons are massive, which fixes a relationship between the energies of the Z decay products and their opening angle. Because the Z boson has a fixed mass, we consider measuring N -subjettiness observables with angular exponent β = 2, which (approximately) corresponds to the ratio of mass to jet energy squared. In particular, there is no soft singularity for the decay products of the Z boson, so in the large boost limit, 1-subjettiness measured on the Z boson is simply where z is the energy fraction of one of the quark decay products of the Z boson and θ is the angle between decay products. For unpolarized Z bosons because there is no soft singularity, to leading power, the distribution of the energy fraction z is uniform on z ∈ [0, 1]. To calculate the cross section of τ 2 given this value of τ 1 , we consider the emission of a soft and collinear gluon off of either decay product of the Z boson and find: where we have neglected subleading terms in τ The corresponding conditional cross section for quark jets will be calculated in Secs. 5 and 6, and we will state them here to complete our argument. We find To calculate the quark reducibility factor, we would in principle need the complete, normalized probability distributions for both quark and Z boson jets. However, these fixed order cross sections are sufficient, without the inclusion of exponential Sudakov factors, because the reducibility factor vanishes. In the limit that τ 1 , the quark reducibility factor can be found from taking the ratio of these two cross sections: The identified purifying phase space region of τ 1 suggests using an observable such as τ 1 as a parametrically optimal classifier. This has long been studied and identified from power counting arguments as the combination of N -subjettiness observables most sensitive to two-prong substructure, so it is pleasing to observe that reducibility arguments readily produce the same result.
To determine the Z boson reducibility factor directly, we would need to include the appropriate Sudakov form factors for both quark and Z boson jets. The prediction of the resummed conditional probability for quark jets can be extracted from our later results in Secs. 5 and 6 and Z bosons require a new calculation. We will not perform that calculation explicitly here, though it is relatively simple because the distribution of energy fractions of decay products from the Z boson is simply uniform. The Z boson reducibility factor κ Z is also 0, because the quark jet Sudakov factor provides more exponential suppression in the limit that τ (2) 2 → 0 than for Z bosons. This is due to the fact that the two prongs of the quark jet are a quark and a gluon, while the two prongs of the Z boson are both quarks. Further, this reasoning also applies to gluon jets versus Z bosons, through replacing the color factors in the quark jet distributions C F → C A . Because C A > C F , gluon jets and Z boson jets are also mutually irreducible.
It is worth noting that higher order effects, such as g → qq, may spoil this mutual irreducibility and hence the parametric separation of the categories. Calculating these effects requires working beyond leading logarithmic accuracy, at least including non-singular pieces of the splitting functions as well as the running of the strong coupling constant. While we will not pursue this further here, we highlight that the reducibility factors allow for the investigation of optimal parametric separation at higher orders. Developing collider observables which are optimal at next-to-leading and higher logarithmic accuracy is an interesting avenue for further exploration.

Quark and Gluon Power Counting Rules
We now present power counting rules that can be applied to simply determine powerful observables for quark versus gluon discrimination. As we justify in the following sections, resolving any finite number of emissions in a jet strictly prohibits the isolation of a gluon-pure phase space region. Nevertheless, due to Sudakov suppression and the fact that the fundamental Casimir C F is smaller than the adjoint Casimir C A , only quark jets survive deep in the infrared regions of phase space. Thus, a quark-pure region of phase space can be defined, which motivates a power counting parameter and a definition of the gluon-rich region of phase space simply as that region for which the Sudakov factors are unity. Further, the power counting for quark and gluon jets is a bit different than that established for prong discrimination, for example. In the quark versus gluon case, we construct a power counting scheme for the distribution of an observable (or multiple observables), and not for the observables themselves. This enables us to identify the necessary properties of the distribution such that quarks and gluons are optimally separated.
With this motivation and within the stated caveats, we present the power counting rules for quark versus gluon discrimination: 1. Given a measured set of observables on the jets, such as N -subjettiness {τ (β) N }, identify the corresponding phase space boundaries defined by these observables.

Formally take the power counting
With this power counting, the boundaries of phase space where any ratio of a pair of measured observables {τ (β) N } becomes large (or small) are dominantly populated by quarks. This is because Sudakov form factors exponentially suppress the gluon jet cross section beyond that of quarks. The boundaries on which all observable ratios are order 1 are dominantly populated by gluons.

Construct a function of the observables {τ (β)
N } whose constant values define hypersurfaces for which, for example, when the function is 1 only the gluon region is selected, and when the function is 0, only the quark region is selected. The resulting function is guaranteed to be a powerful quark/gluon discriminant.
Our explicit calculations in the following sections will justify these rules in a concrete context. Additionally, there are numerous immediate consequences. For a given set of observables, the observable that is directly sensitive to the most emissions in the jet satisfies the power counting requirements. In the context of N -subjettiness observables, τ N is a better quark/gluon discriminant than τ (β) N −1 in the limit that parametrically approaches the phase space boundaries. Further, the multiplicity observable is obviously sensitive to all particles in a jet, hence it will also be a very good quark/gluon discriminant.
Perhaps the most surprising consequence of these power counting rules is that good quark versus gluon discrimination observables are IRC safe. By "IRC safe" we mean that the region of phase space in which cross sections calculated at fixed-order in perturbation theory diverge are mapped to a single value of the observable. This is a bit more of an abstract definition of IRC safety than is typically stated (see for example Ref. [49]), but is equivalent to the heuristic that the observable is insensitive to exactly collinear or zero energy emissions.
The argument for the IRC safety of good quark/gluon discriminants using the power counting rules is as follows. The regions of phase space on which any N -subjettiness observable ratio becomes large is the singular limit of perturbative QCD, in which the corresponding fixed-order cross section would diverge. For powerful discrimination, we need constant hypersurfaces of the constructed observable to be approximately parallel to these boundaries; otherwise quark-pure and gluon-rich regions of phase space would be mixed by the observable. Thus, all singular phase space regions must be mapped onto the same value of the discrimination observable. As such, all real and virtual divergences can be correspondingly cancelled order-by-order. Because the N -subjettiness observables can form a complete basis of M -dimensional phase space for any M , the optimal quark versus gluon discrimination observable is some IRC safe combination of (many) N -subjettiness observables. We emphasize that this is purely a perturbative argument, as IRC safety is only relevant within perturbation theory. Nevertheless, this suggests a guiding principle for constructing quark/gluon discriminants and attempting to understand the output of high-dimensional machine learning studies.
We also note that this observation is not vacuous, as it is not true that IRC safe observables are optimal for all jet discrimination problems. For example, in the case of discrimination of jets with different numbers of prongs, such as QCD jets versus boosted top quarks, it has been argued that optimal observables are not IRC safe. Power counting in the two-versus one-prong or three-versus one-prong jet cases motivates ratio observables such as τ [31,32], which are not IRC safe [50]. Of course, these ratios can become IRC safe if combined with a constraint on other observables, such as the jet mass. Nonetheless, it is interesting that the optimal discriminants for multi-prong tagging are indeed not IRC safe without such a restriction. This is in contrast to what we have established for quark vs. gluon discrimination, where the likelihood ratio is always IRC safe.

Resolving One Emission
We now present explicit calculations of collections of N -subjettiness observables on jets, resolving one, two, or three emissions within the jet. In this section, we showcase results for jets on which one emission is resolved and discuss their consequences, which will frame the calculations in the next two sections. All results in this section have been calculated elsewhere in the literature [51][52][53][54], so we will not present the details of the calculation. We compile them to construct a complete picture of quark versus gluon discrimination on such jets. The results in the following sections will be novel, in which complete calculations will be presented.
To double logarithmic accuracy, the normalized distribution of one-subjettiness τ (β) 1 for quark and gluons jets is The corresponding cumulative distributions are which are related by so-called Casimir scaling. The quark/gluon ROC curve is thus: and its integral is the AUC, namely: These results will provide a benchmark for discrimination performance that we will compare to in the following sections. We now proceed to calculate the quark and gluon reducibility factors for the phase space of one resolved emission. For Casimir-scaling observables, this was calculated in Ref. [16], but we present the result here for completeness. For the one-subjettiness distributions, the likelihood ratio is Note the appearance of the power counting factor exp[C F − C A ] 1 in this distribution. Approaching the boundary where τ (β) 1 → 0, this small number is raised to a large positive power, demonstrating that the quark reducibility factor is 0. The gluon reducibility factor is the inverse of the value of the likelihood for τ That is, by just measuring τ (β) 1 , any phase space region of gluon jets is always contaminated by quark jets, by a relative proportion of C F /C A or greater.

Resolving the One-Emission Phase Space
Measuring τ (β) 1 resolves one emission off of the hard jet core, and so effectively defines a jet with two particles. Two-body phase space is two-dimensional, and this phase space can be defined by the relative energy fraction and angle of the emission. Correspondingly, one can measure two one-subjettiness observables, τ with α > β, to completely resolve two-body phase space. To double logarithmic accuracy, this joint probability distribution was first calculated in Ref. [51] and extended in Refs. [52][53][54] which found The Sudakov factor is where C i is the appropriate color factor. The physical phase space lies within the boundaries of τ The likelihood ratio for the two one-subjettiness observables is then The quark reducibility factor is still 0, due to the exponential suppression of the Sudakov factors. Further, the gluon reducibility factor is still C F /C A ; completely resolving the oneemission phase space does not improve gluon jet purity. From power counting arguments, this then implies that completely resolving the phase space does not parametrically improve discrimination power. It is most important to measure observables to demonstrate that a particular number of emissions exist in the jet. The likelihood ratio is the optimal observable for discrimination, and it is straightforward to demonstrate that it is in this case indeed IRC safe, as claimed from our power counting arguments. Due to the phase space boundaries, there is only one point on phase space that corresponds to the singular limit: when τ The only way that the likelihood can vanish is if the ratio of Sudakov factors vanish; the prefactor formed from a ratio of logarithms is always positive on the physical phase space. However, the Sudakov factor can only vanish if its exponent diverges, corresponding to at least one of the one-subjettiness observables going to 0. By the phase space constraints, if one goes to 0 the other must as well, and so the only point on phase space that makes the likelihood vanish is the singular point τ Therefore, all divergences on phase space are isolated to a single point in the likelihood, and thus it is IRC safe.

Higher Order Effects
For this case of one resolved emission, we also briefly discuss higher-order corrections. In Ref. [40], a calculation of recoil-insensitive one-emission observables was presented at nextto-leading logarithmic accuracy. For the one-subjettiness observables considered here, this would correspond to defining the jet axis with a recoil-free scheme, such as the broadening [55] or winner-take-all [56,57] axis. For the ROC curve, they found the following relationship between the gluon and quark cumulative distributions (with arguments suppressed): The lowest-order relationship is simply the overall C A /C F Casimir scaling, but effects like running coupling, hard collinear radiation, and multiple emissions all affect the discrimination power at higher orders. In general, discrimination power improves as the angular exponent β decreases, due to these higher order effects. This is directly observed in simulations, suggesting that one should use as small an angular exponent as possible, while maintaining theoretical control. 2 These higher order effects could be explored for more resolved emissions, but we leave that to future work. In the following sections, we will focus on the calculations at double logarithmic accuracy.

Resolving Two Emissions
We now turn to calculations for jets on which two emissions are resolved. While some of the calculations that we present are included in parts of various other calculations in the literature [48,59,60], to our knowledge, these complete expressions have never appeared for quark versus gluon discrimination. Therefore, we present a detailed discussion of the calculations that follow. Further, as discussed in Sec. 2, we simplify our analysis and strictly consider measuring N -subjettiness observables with an angular exponent β = 1. As higherorder corrections in the one emission case demonstrate, there is likely discrimination power to be gained by changing the angular exponent. However, we do not consider that here as even this simple analysis will enable significant understanding.

Fixed-Order Analysis
We begin with a calculation of the cross section for quarks jets on which both τ 1 and τ 2 have been measured. The phase space restrictions demand that τ 2 ≤ τ 1 ≤ 1 and at leading order, there are two possibilities for the orientation of emissions in the jet. Either the gluons that set τ 1 and τ 2 could be sequentially emitted from the initiating quark, or the gluon that sets τ 2 is emitted off of the gluon that sets τ 1 . In the first case, the color factor is C 2 F and the contribution to the cross section to double logarithmic accuracy is In the second case, the color factor is C F C A and we must account for the fact that the gluon that sets τ 2 can neither have more energy than the gluon that sets τ 1 nor be at larger angle.
In this color channel, the cross section is then Combining these results, the leading-order double differential cross section in the double logarithmic limit for quark jets is This agrees with the results of Ref. [48], in which they calculate the distribution of τ 1 when there is a cut on the ratio τ 2 /τ 1 . The result for gluon jets can be found by simply replacing While only evaluated at fixed-order, these results are not probability distributions, and so we cannot use them to determine likelihood ratios. However, the gluon reducibility factor is the ratio of the cross sections in the region where the Sudakov factors are unity; that is, the gluon reducibility factor can be calculated strictly from fixed order results. The ratio of the quark to gluon cross sections is This ratio is minimized in the ordered limits in which first τ 2 → τ 1 and then τ 1 → 1. The second limit is required to remain in the fixed-order regime and neglect the Sudakov factor. In these limits, the gluon reducibility factor κ g (τ 1 , τ 2 ) is then This is significantly smaller than the reducibility factor of C F /C A 0.444 with only one resolved emission, demonstrating that purer gluon phase space can be isolated through additional measurements.
With the fixed-order cross section in hand, we can additionally integrate over τ 1 to determine the cross section for jets on which τ 2 is measured alone. From the power counting arguments, τ 2 should be a good discriminant itself, because it vanishes in the singular phase space regions, where the Sudakov factors exponentially suppress the cross section, and when τ 2 → 1, then necessarily τ 1 → 1. Integrating over τ 1 , we find the quark jet cross section singly-differential in τ 2 to be As before, the cross section for gluon jets can be found by replacing C F → C A : The gluon reducibility factor for jets on which just τ 2 is measured is then the ratio of these two cross sections: While this reducibility factor is definitely larger than the case in which both τ 1 and τ 2 are measured, it is still significantly smaller than the Casimir-scaling result of C F /C A 0.444. Therefore, as predicted by power counting, just measuring τ 2 enables an increased purity of gluon jets and therefore improved discrimination power over just measuring τ 1 .

Including Resummation
For a thorough analysis, however, we need to calculate the joint probability distribution of τ 1 and τ 2 on jets. To calculate this, we will employ the expression for the joint probability distributions expressed in terms of conditional probabilities. For the joint probability distribution p(τ 1 , τ 2 ), we can express it as Here, z 1 is the energy fraction of the gluon that sets the value of τ 1 . It is necessary to include it in an intermediate step to correctly enforce angular ordering, as we will discuss shortly. The probability distribution of τ 1 , p(τ 1 ), was presented for quark and gluon jets in Eq. (5.1). The conditional distribution of the energy fraction p(z 1 |τ 1 ) is found by noting that to double logarithmic accuracy, log 1/z 1 is just distributed uniformly from 0 to log 1/τ 1 . That is, the conditional distribution is Figure 2: Illustrations of the forbidden regions (grayed) for gluon emission that sets the value of τ 2 , given a value of τ 1 . The location of the emission that sets the value of τ 1 in the Lund plane is illustrated by the star. On the left is the forbidden region if the gluon is emitted off of the initiating quark; the only requirement is that the gluon must enforce τ 2 < τ 1 . On the right is the forbidden region if the gluon is emitted off of the gluon that sets the value of τ 1 : it must be both at smaller angle and have smaller energy than the first emitted gluon.
To calculate the conditional probability distribution for τ 2 , p(τ 2 |τ 1 , z 1 ), we first calculate its cumulative distribution, Σ(τ 2 |τ 1 , z 1 ). To calculate this distribution requires identifying the regions in the Lund plane which are forbidden, given the measured value of τ 2 . There are two possibilities for how the emission that sets τ 2 was formed, and that produces two different no emission regions. These regions are illustrated in Fig. 2 in gray. First, if the gluon that sets τ 2 is emitted off of the quark, the only restriction on its phase space to this accuracy is that τ 2 < τ 1 . This area, multiplied by the appropriate color and coupling factors, is The no emission region in the case in which the gluon that sets τ 2 is emitted off of the gluon that sets τ 1 is required to both be at smaller angle and smaller energy than the first emission. This demonstrates why the energy fraction z 1 is measured, as this enables an identification of the angular-ordered phase space region. The area of this region, including color and coupling factors, is With these results, the cumulative conditional probability distribution is just the expo-nential of these areas, as follows from considering gluon emission as a Poisson process: 14) The conditional probability distribution is then just the derivative of this expression: For gluon jets, the whole analysis is identical, we just replace C F → C A and find We can then multiply the distributions together and integrate over z 1 ∈ [τ 1 , 1] to find the double differential probability distribution to resolve two emissions off of a quark. We find The corresponding distribution for gluons is found by making the replacement C F → C A : It is straightforward to see that these expressions reduce at lowest order in α s to Eqs. (6.3) and (6.4).

IRC Safety of the Likelihood
These expressions for the quark and gluon probability distributions can be used to construct the likelihood ratio and demonstrate that it is IRC safe, as claimed. The likelihood ratio The non-exponential prefactor never vanishes on the physical phase space where τ 2 < τ 1 .
Because C A > C F , the exponential factor vanishes as τ 2 → 0, which is also the entire region of phase space on which fixed-order cross sections diverge. Therefore, because the entire singular region of phase space is mapped to a single point, the likelihood ratio L(τ 1 , τ 2 ) is indeed IRC safe.

AUC Evaluation
To quantify the absolute discrimination power of the likelihood L(τ 1 , τ 2 ), we could attempt to construct its complete ROC curve. However, the likelihood is a complicated function of the observables τ 1 and τ 2 , which doesn't enable a convenient inversion. Therefore, we take a different route: instead of calculating the full functional form of the ROC curve, we just calculate its integral, the AUC. Improved discrimination power corresponds to decreasing the value of the AUC, so we are able to compare directly between the AUC calculated with different numbers of resolved emissions. What makes the AUC so convenient as a discrimination metric, even without an explicit form of the ROC curve, is that it can be expressed as an ordered integral over the probability distributions. For signal and background distributions p s (x) and p b (x) of a random variable x, the AUC that corresponds to measurement of the variable x is (6.20) Translated to the evaluation of the AUC of the likelihood for quark and gluon jets on which τ 1 and τ 2 are measured, we have To perform the integral to calculate the AUC, we use the implementation of Vegas within Cuba 4.2 [61]. Using C F = 4/3 and C A = 3, we find that the AUC of the likelihood is On the right, we compare to the AUC for resolving one emission, just measuring τ 1 , Eq. (5.4). Because the coupling α s dependence enters in the exact same way for quarks and gluons, the AUC is independent of the particular value of the coupling, which we verified. An additional benefit of the AUC as a measure of discrimination power is that it enables a simple, concrete variational algorithm to determine other observables. Consider an observable O(α 1 , α 2 , . . . , α n ) that is some function of the N -subjettiness observables, that depends on some set of parameters {α i }. We can calculate the AUC for this observable and then fix the parameters to minimize the AUC. Of course, the value of the AUC for such an observable is bounded from below by the likelihood. However, this procedure provides an approximation to the likelihood that may have a significantly simpler functional form.
We can construct such an observable with this technique. For illustration, we just consider the observable formed from a product of powers of τ 1 and τ 2 : In general, α 1 and α 2 are real numbers, but the observation that the likelihood is IRC safe helps to dramatically constrain the observables. First, because the likelihood vanishes as  We enforce this on O by requiring the power α 2 > 0. Any monotonic function of an observable has the same discrimination power, so the IRC safety of this observable enables us, with impunity, to set α 2 = 1. Further, the likelihood vanishes in the ordered limit τ 2 → τ 1 and τ 1 → 0, and this requires α 1 > −1. That is, the observable that we consider is just with α > −1. While this ratio seems potentially ambiguous when α < 0 for a jet with a single particle, it is nevertheless still IRC safe. The potential 0/0 ambiguity can be eliminated and a well-defined result obtained by first taking τ 2 → τ 1 and then τ 1 → 0. The exponent α can then be determined by the value that minimizes the AUC.
To do the minimization, we simply scan through α ∈ [−1, 1], and plot the AUC as a function of α. The result of this scan is plotted in Fig. 3. Also shown on this plot are the AUC values of the likelihood for jets on which τ 1 is measured and τ 1 and τ 2 are measured. The AUC for the variational observable is minimized when α = −0.2, corresponding to an observable that is O = τ −0.2 1 τ 2 . To three significant figures, the value of the AUC at this point is 0.256, which is significantly lower than that for just τ 1 , and well within 1% of the AUC value of the two-resolved-emission likelihood. That the minimum AUC exists near α = 0 can be understood in the following way. As argued earlier, the likelihood is an IRC safe observable, and when α ≤ −1, the observable O is no longer IRC safe. On the other hand, if α is very large, then the discrimination power of the observable O is essentially entirely controlled by τ 1 . Because τ 2 is directly sensitive to more emissions in the jet than τ 1 , it should have better discrimination power. This suggests that the power α should be relatively close to 0 to maximize discrimination.

Resolving Three Emissions
We now present calculations for resolving three emissions off of a hard jet core, by e.g. measuring τ 1 , τ 2 , and τ 3 . To our knowledge, these calculations are novel, even in the double logarithmic limit, and have application to top quark tagging. In addition to the explicit fixed-order and resummed calculations, we also discuss properties that hold for an arbitrary number of emissions. We prove that the gluon reducibility factor when n emissions is resolved is (C F /C A ) n and provide a robust lower bound on the AUC exclusively in terms of reducibility factors.

Fixed-Order Analysis
Starting with the fixed-order calculation of the triple-differential cross section of τ 1 , τ 2 , and τ 3 in the double logarithmic limit, there are three separate color channels that contribute. As in earlier sections, we start with the calculation for a quark jet, and then simply make the replacement C F → C A for gluon jets. The C 3 F color channel means that all three emissions that set these observables are sequentially emitted off of the quark and we find The C 2 F C A channel has two emissions off of the quark and the third off of one of the secondary gluons. There are three ways this can occur yielding Finally, the C F C 2 A channel consists of the gluon that sets τ 1 emitted off of the quark, and then the gluons that set τ 2 and τ 3 are subsequently emitted off of the the secondary gluon. There are two possible ordering of emissions, which results in The total cross section is then the sum of these three color channels. For brevity, we will not write the combined result. Further, the result for gluon jets to this approximation is found by making the replacement C F → C A , though we also will not write that out explicitly. These results are sufficient to calculate the gluon reducibility factor, corresponding to the smallest value of the likelihood formed from the ratio of the quark to gluon cross sections. Motivated by the location of the likelihood minima in the case of the cross section for τ 1 and τ 2 , we consider the ordered limit τ 3 → τ 2 → τ 1 . In this limit, the cross sections in the C 2 F C A and C F C 2 A vanish; only the C 3 F channel is non-zero. We therefore find 1 σ 0 The corresponding limit for gluon jets is similar: The reducibility factor for gluons is then the ratio of these cross sections, with τ 1 → 1: Marginalizing the cross section over τ 1 and τ 2 enables us to determine the distribution of τ 3 . For quarks, we find Correspondingly, for gluons, we find It then follows that the gluon reducibility factor with τ 3 can be found from the ratio of these distributions: Note that this reducibility factor for just measuring τ 3 is smaller than even the reducibility factor for measuring τ 1 and τ 2 in Eq. (6.6). This suggests that just measuring τ n for sufficiently large n a pure sample of quarks and gluons can be defined.

Calculation of Gluon Reducibility for Any Number of Emissions
These results are evidence for the scaling of the reducibility factor of gluons to be (C F /C A ) n , if n emissions in the jet are resolved by measuring the set of N -subjettiness observables τ 1 , τ 2 , . . . , τ n . For this to be true, it must be that the contribution to the quark cross section from the mixed color channels C n−i F C i A for 0 < i < n vanishes at the point that the likelihood assumes its minimum value. We will prove this from a direct calculation of the cross section in an arbitrary color channel, in the strongly-ordered soft and collinear limits.
The n-differential cross section for N -subjettiness observables measured on quark jets in the C n−i F C i A color channel can be expressed as Here, the product runs over all n emissions that set each of the τ j values. The outer sum runs over all possible orderings of the emission tree. The upper bound on the angular integral θ j,max represents the appropriate maximum angle for θ j . If the jth emission is from the hard core of the jet, θ j,max is just 1. If j is a secondary (or later) emission off of other emissions in the jet, then this is the appropriate angle to enforce angular ordering. Note that the particular ordering fixes the maximum energy of any given emission; this is expressed with the product of energy fractions within the δ-functions. In the strongly-ordered energy limit, only if a gluon is directly emitted off of the initiating quark does its energy range up to the total jet energy. We now first assume that 0 < i < n, so that there is at least one gluon that is a secondary emission off of another gluon. Now, set all N -subjettiness values τ j equal to τ 1 , corresponding to the ordered limit τ n → τ n−1 → · · · → τ 1 . Then, as long as 0 < i < n, there will be at least one pair of δ-functions in the differential cross section for τ j 1 and τ j 2 , with j 1 > j 2 , whose arguments are of the form However, this then sets θ j 1 = z j 2 θ j 2 . (7.12) Choosing the appropriate pair j 1 and j 2 such that θ j 2 ,max = θ j 1 means that θ j 1 > θ j 2 , but z j 2 < 1, so these requirements are inconsistent. Note that this choice of j 1 and j 2 can always be done: if j 2 is a secondary emission off of j 1 , then both the energy and angle of j 2 are constrained by j 1 . Therefore, the C n−i F C i A color channel of the quark cross section vanishes for 0 < i < n, in the ordered limit τ n → τ n−1 → · · · → τ 1 .
By contrast, the cross section in the pure C n F color channel does not vanish. Every gluon that sets the value of the N -subjettiness observables in this color channel is emitted directly off of the initiating quark. Therefore, the cross section in this channel is Setting all τ j = τ 1 , then this evaluates to (7.14) The gluon cross section in this limit is found from C F → C A : The gluon reducibility factor for measuring enough N -subjettiness observables to resolve n emissions is then just the ratio of these cross sections: Note that this is indeed the minimum value of the likelihood ratio; because C A > C F , a contribution to the quark cross section from any other C n−i F C i A color channel would increase this ratio. This completes the proof of the gluon reducibility factor for n resolved emissions.
The arguments in this proof explicitly relied on the form of the cross section in the double logarithmic limit. However, the region of phase space which is dominated by gluon jets, where τ n → τ n−1 → · · · → τ 1 → 1, is not accurately described by the double logarithmic approximation. Higher-order resummation and fixed-order corrections are necessary to accurately describe this region, and those contributions do not necessarily have such a nice organization. In the region of phase space dominated by fixed-order corrections, the matrix elements are smooth and exhibit no non-analytic structure. Also, because N c = 3 in QCD, the leadingcolor approximation is accurate, up to corrections of about 10%. These features of QCD and quark versus gluon discrimination suggest that the result for the reducibility factor for jets with n resolved emissions derived in this section is a good approximation to what would be derived when all relevant effects are taken into account.

Including Resummation
To calculate the likelihood and related quantities, we further need to calculate the resummed probability distribution for τ 1 , τ 2 , and τ 3 measured on jets. In a similar way to what was done in the case for just measuring τ 1 and τ 2 , we can express the joint probability distribution as an integral over a product of conditional probabilities: p(τ 1 , τ 2 , τ 3 ) = 1 0 dz 1 1 0 dz 2 p(τ 1 )p(z 1 |τ 1 )p(τ 2 |z 1 , τ 1 )p(z 2 |τ 2 , z 1 , τ 1 )p(τ 3 |z 2 , τ 2 , z 1 , τ 1 ) . (7.17) log 1 z Figure 4: Illustrations of the two of the forbidden regions (grayed) for gluon emission that sets the value of τ 3 , given a values of τ 1 and τ 2 . The location of the emission that sets the value of τ 2 in the Lund plane is illustrated by the star. On the left is the forbidden region if the gluon is emitted off of the initiating quark; the only restriction on the gluon is that it must set τ 3 < τ 2 . On the right is the forbidden region if the gluon is emitted off of the gluon that sets the value of τ 2 ; it must be both at smaller angle and have smaller energy than the first emitted gluon.
In the strongly-ordered limit, the first three of these probability distributions have already been calculated in the previous sections. We only need to calculate p(z 2 |τ 2 , z 1 , τ 1 ) and p(τ 3 |z 2 , τ 2 , z 1 , τ 1 ). The quark jet probability distribution for the energy fraction z 2 of the second gluon emission p(z 2 |τ 2 , z 1 , τ 1 ) can be extracted from the multi-differential fixed-order cross section, allowing the second gluon to be emitted either from the quark line or off of the primary gluon emission. One then finds To calculate the quark jet resummed conditional distribution for τ 3 , we first consider its cumulative conditional distribution, Σ q (τ 3 |z 2 , τ 2 , z 1 , τ 1 ). This distribution is just the Sudakov form factor in the double logarithmic approximation, and so is just exponentiated areas on the Lund plane. These areas are illustrated in Figs. 4 and 5. First, on the left in Fig. 4, we can consider the forbidden emission area if the gluon that sets τ 3 is emitted off of the quark. With appropriate color and coupling factors, this area is: On the right of Fig. 4 is the situation if the gluon is emitted off of the secondary gluon, the gluon that sets the value of τ 2 . The forbidden emission area in this case is: Both of these areas are just the analogs of the corresponding situation in the two emission case of Fig. 2.
If the emission that sets τ 3 is off of the primary gluon, then the forbidden emission area is a bit more subtle. This is illustrated in Fig. 5, and the area now depends on the energy fraction and angle of the primary gluon emission, as well as the value of τ 2 . With color and coupling factors, this forbidden emission area is The Sudakov form factor is just the exponential of these areas. For calculating the conditional probability, we differentiate the Sudakov form factor to find We leave the area factors in the exponential implicit for brevity and, as always, the result for gluon jets is found from replacing C F → C A . Unlike in previous sections, we will not explicitly write the triple differential distribution out, as it is now unwieldy. At any rate, it can be calculated from the provided conditional probabilities and by integrating over the values of the primary and secondary emitted gluon energy fractions, z 1 and z 2 , as in Eq. (7.17). To lowest order, the resummed expression agrees with the fixed-order calculations from earlier in this section. Additionally, we just note that the likelihood ratio is IRC safe, by a similar reasoning as we used in the previous section.

AUC Evaluation
With the probability distributions and the likelihood calculated, we can then calculate the AUC for quark versus gluon discrimination when three emissions in jets are observed. Extending the calculation for the AUC from Sec. 6.4, it can be expressed in this case as Figure 5: Illustration of the third of the forbidden regions (grayed) for gluon emission that sets the value of τ 3 , given a values of τ 1 and τ 2 . The location of the emission that sets the value of τ 1 in the Lund plane is illustrated by the square and the emission. The forbidden region is constrained by the energy and angle of the primary gluon emission and by enforcing τ 2 < τ 3 .
As before, to perform the integral to calculate the AUC, we use the implementation of Vegas within Cuba 4.2. Using C F = 4/3 and C A = 3, we find that the AUC of the triple differential likelihood is AUC 0.231 < 0.256 Going right we compare to the value of the AUC for resolving two emissions (0.256), and resolving just one emission (0.308). So, the absolute discrimination power is definitely improved, but the size of the relative improvement in going from resolving two to three emissions has decreased from that of resolving one to two emissions. We can also extend the variational approach to construct a powerful discrimination observable whose functional form is much simpler than the full likelihood. For illustration, we take a product form for an observable O, where The likelihood vanishes in the τ 3 → 0 limit, manifesting its IRC safety, and so we enforce δ > 0. Thus, without loss of generality, we can just set δ = 1 and consider the observable In the ordered limits τ 3 → τ 2 and τ 3 → τ 2 → τ 1 , IRC safety further enforces that 1 + γ > 0 and 1 + α + γ > 0. To find the α and γ values that yield the best discrimination power, we  Fig. 6 for which the minimal AUC of 0.232 is achieved at α = −0.3, γ = 0.1. Perhaps a more complicated observable could be constructed that performed slightly closer to that of the likelihood, but we won't pursue that further here.

Estimate of n → ∞ AUC
While we won't present further calculations of probability distributions to resolve four or more emissions in a jet in this paper, we can still make some robust statements about the discrimination power for any number of resolved emissions.
Our general analysis of the reducibility factors and their relationship to the ROC curve provided a bound on the AUC in Eq. (3.4). We can then apply this bound to our results for the quark and gluon reducibility factors. We have shown that κ q = 0 for any number of resolved emissions, while when n emissions are resolved. Plugging these values into the bounding formula we find That is, perfect discrimination power between quark and gluon jets is only possible if an infinite number of emissions are resolved. As any physical jet contains only a finite number of particles in it, this suggests that there is an absolute lower bound on the AUC for the discrimination of physical quark and gluon jets. This bound on the AUC is indeed satisfied by our results for one, two, and three resolved emissions. Comparing to this bound, we had found Here, the subscripts on the AUC represents the number of resolved emissions. Because the rate of convergence to 0 observed in the complete calculations is so much slower than the bound would suggest, this might be evidence that any achievable AUC for a physical jet, even resolving all of its emissions, is relatively large. So, our calculations suggest that there seems to be an inherent limitation to quark and gluon jet discrimination, beyond all of the subtleties regarding their fundamental, theoretical definition.

IRC Safe Multiplicity
The way through which we defined resolved emissions in a jet, by measuring N -subjettiness, enables a simple, IRC safe, definition of resolved particle multiplicity in the jet. Given an n + 1 dimensional joint probability distribution, p(τ 1 , τ 2 , . . . , τ n+1 ), the probability that the jet has exactly n resolved constituents is Here, Λ 0 > 0 is some resolution cut that is responsible for the IRC safety of this multiplicity definition. While we always assume the ordering of N -subjettiness observables τ n > τ n+1 , we only first explicitly write it in the third line to connect to the expression in the final line. In the final equation, Σ n (Λ 0 ) is shorthand for the cumulative distribution This multiplicity distribution is normalized when summed over all n: Because the N -subjettiness variables are ordered τ 1 ≥ τ 2 ≥ · · · , for sufficiently large n and a fixed cutoff Λ 0 , the value of τ n will have probability 1 to be below Λ 0 . Note that the probability distribution of this multiplicity has strictly less information than the full joint probability distribution, because information in lost in doing the integral up to the scale of the resolution variable. Therefore, the discrimination power of such a multiplicity is strictly less than that of the likelihood formed from the ratio of joint probability distributions p g (τ 1 , τ 2 , . . . , τ n+1 )/p q (τ 1 , τ 2 , . . . , τ n+1 ). The AUC as a measure of the discrimination power of this IRC safe multiplicity can be calculated and one finds This formula can be derived by summing over the area of the trapezoids that make up the ROC curve. Additionally, we have assumed that the multiplicity is monotonic in the likelihood of multiplicity, which is expected. More realistically, one only resolves up through n emissions in the jet, inclusive over more emissions. In our calculations, for example, we have only resolved up through three emissions in the jet, and so we can only say if the jet has 0, 1, 2, or three-or-more emissions. In this case, to calculate the AUC, the sum must be truncated: where Σ q,n (Λ 0 ) = 1. Applying this formula to our multi-differential probability distribution p(τ 1 , τ 2 , τ 3 ) calculated in the previous section we found a minimum AUC value of about 0.274 for Λ 0 0.005. Note that this is indeed larger than the AUC formed from the likelihood p g (τ 1 , τ 2 , τ 3 )/p q (τ 1 , τ 2 , τ 3 ).
The expression of the triple-joint probability distribution is complicated and does not provide an intuition for what physics controls the discrimination power of multiplicity. In some cases, most notably through iterative soft drop [10], the multiplicity is approximately distributed as a Poisson random variable. For such observables, the mean multiplicity of quark and gluon jets are related by their color factors: for some fiducial multiplicity λ. The probability for n resolved emissions distributed according to the Poisson distribution is then for a mean λ i . If all emissions in the jet are resolved, the quark and gluon reducibility factors of Poisson-multiplicity are From our expression on the lower bound on the AUC from reducibility factors, we find that Note that this lower bound only vanishes if the fiducial mean multiplicity λ → ∞. Iterated soft drop multiplicity was argued to be the optimal quark versus gluon discriminant at leading logarithmic accuracy [10]. This would seem to be at odds with our analysis here with collections of N -subjettiness observables. However, there are a few differences. First, at leading-logarithmic accuracy, iterated soft drop is only sensitive to emissions off of the hard core of the jet, while N -subjettiness (or related) observables can be sensitive to secondary emissions. Thus, the leading-logarithmic phase space is different between these observables. A sufficiently large collection of N -subjettiness observables completely resolves M -body phase space, but iterated soft drop at leading-logarithm could, in principle, remove an arbitrary number of emissions from the jet before identifying an emission that passes. Thus, to directly compare, one would at least need to consider jets on which an arbitrary number of N -subjettiness observables are measured. Further, N -subjettiness is a continuous variable while (any definition of) multiplicity is discrete, so comparing their discrimination power in practice is more challenging.

Relationship of Multiplicity to Individual N -subjettiness Observables
This formulation of multiplicity suggests a new way of thinking about it that can provide insight into its discrimination performance in comparison to other observables. In particular, in this section we compare the quark vs. gluon discrimination performance of multiplicity to that of an individual N -subjettiness observable, τ n . Definitive statements about their relationship require more information about the multiplicity distribution, but we conjecture that τ n has an AUC bounded from below by multiplicity. Further, we conjecture that this inequality is saturated when n is about the number of minimal constituents in a gluon jet. The observation that N -subjettiness τ n for large n is a good quark vs. gluon discriminant has been known for a long time [8], and we hope that the arguments presented here can be sharpened in the future. In this section, we work beyond leading-logarithmic accuracy, and attempt to make general statements that hold even non-perturbatively regarding the relationship of multiplicity and N -subjettiness as quark vs. gluon discriminants.
In practice, multiplicity is not defined with a cutoff; it is just a count of all those experimentally-resolved constituents of a jet. We will not attempt at defining what "experimentallyresolvable" means nor attempt to include a finite cutoff representing the experimental limitations. In this spirit, we will just write Λ 0 = 0 in the following with the caveat that "0" here may actually be a finite value. At any rate, its value is not set within the applicability of perturbation theory so invalidates the conclusions made earlier. True multiplicity is thus the Λ 0 → 0 limit of the IRC-safe multiplicity whose distribution we had defined in Eq. (8.1): Any realistic collection of jets will only have a finite number of constituents, and so the sum terminates once the maximum number of quark jet constituents n q,max has been reached. That is, once i is at least n q,max all jets have 0 value for τ i or that for i ≥ n q,max . Because we have defined multiplicity through properties of the N -subjettiness variables, this allows a convenient comparison to the discrimination power of an individual Nsubjettiness τ n . The AUC for τ n can be approximated by: Here, the {x i } are a collection of points of τ n ∈ [0, 1] at which the ROC is evaluated. To directly compare to multiplicity, it is convenient to set the number of bins in the ROC curve N = n q,max and choose the locations of the bins to match that of multiplicity. This means that we choose the points x i such that Σ q,n (x i ) = Σ q,i (0) , (8.13) or that x i = Σ −1 q,n (Σ q,i (0)) . (8.14) Note also that due to Σ q,i (0) ≥ Σ q,n (0) for i ≥ n we have that x i = 0 if i ≤ n. With this choice of points, the AUC of τ n is then approximately It's then straightforward to evaluate the difference between the AUC for τ n and multiplicity: [Σ q,i+1 (0) − Σ q,i (0)] Σ g,n (Σ −1 q,n (Σ q,i (0))) − Σ g,i (0) + Σ g,n (Σ −1 q,n (Σ q,i+1 (0))) − Σ g,i+1 (0) .
The first term in this AUC difference is just the area of a right triangle with sides of length Σ q,n (0) and Σ g,n (0). Because the ROC and its first derivative are both monotonically increasing, the difference of the first two terms is necessarily non-negative: Unfortunately, it is much more challenging to determine the sign of the sum on the second line of Eq. (8.16). The sign of this term is set by the difference of gluon cumulative distributions: for i > n. For i = n, this difference is just 0. The interpretation of the term Σ g,n (Σ −1 q,n (Σ q,i (0))) is the following. First, Σ q,i (0) is the total integral of the quark jet events for which τ i is zero. Because we assume that i > n, note that Σ q,n (0) < Σ q,i (0). Then, there exists some > 0 such that Σ q,n ( ) < Σ q,i (0). This then sets the region over which we integrate the distribution p g,n (τ n ), which includes a δ-function at τ n = 0 for those jets with n or fewer constituents. This is illustrated in Fig. 7. We then need to compare this term to Σ g,i (0). We now make the following reasonable conjecture, but have not been able to prove it. We assume that all gluon jets for which τ i = 0 satisfy the inequality With this assumption, it then follows that Σ g,n (Σ −1 q,n (Σ q,i (0))) − Σ g,i (0) ≥ 0 . (8.20) While this is seems reasonable, we emphasize that we do not have a proof. With this assumption, we then establish the approximate inequality AUC τn AUC mult . (8.21) Note that, through our explicit calculation, we demonstrated that Further, if n is very large and approaching the maximal number of quark jet constituents n q,max , τ n is just 0 for most quark and gluon jets. So, at very large n, AUC τn approaches 1/2. Therefore, there must be some n at which AUC τn is minimized, and is close to AUC mult . Gluon jets in our sample will have some minimal number of constituents, call it n g,min . For all n < n g,min , τ n = 0 and then the first two terms of the difference in Eq. (8.16) vanish: ⌧ n p g,n (⌧ n ) Figure 7: Illustration of the integration region over N -subjettiness τ n of the gluon jet probability distribution that defines the quantity Σ g,n (Σ −1 q,n (Σ q,i (0))). Note the δ-function at τ n = 0 for all those jets with n or fewer constituents.
This follows because Σ g,i (0) = 0 for i < n g,min . Therefore, the largest n for which this term in the AUC difference is (approximately) 0 is when n n g,min . This suggests that the difference between the τ n and multiplicity AUCs is minimized when n g,min n n q,max . As we will see in our Monte Carlo studies, n g,min is about 15, or so, suggesting that τ 15 is about as good a discriminant as multiplicity. However, in practice, the discrimination power of τ n quickly saturates, even for n as small as 5 or so.

Comparison to Monte Carlo Simulation
In this section, we explore our calculations and conclusions in the context of simulated samples of quark and gluon jets. While here we will use a manifestly unphysical flavor definition for quark and gluon jets, operational flavor definitions [16,17] may be used in practice to study our conclusions directly in data.
Dijet events are generated with Pythia 8.226 [62,63] at √ s = 14 TeV with the default tunings and shower parameters, including hadronization and multiple parton interactions (i.e. underlying event). Final state non-neutrino particles are clustered into R = 0.4 antik T jets [64] with FastJet 3.3.0 [65], keeping up to two jets with transverse momentum p T ∈ [1000, 1100] GeV and rapidity |y| < 2.5. We compute N -subjettiness observables with β ∈ {0.5, 1.0, 2.0} using FastJet Contrib 1.029 [66] with winner-take-all axes [55]. Jets are labeled as "quark" or "gluon" based on the flavor of the closest parton in the hard process, required to be within 2R of the jet four-momentum.

Quark vs. Gluon Classification Performance
The quark vs. gluon discrimination ROC curves of the N -subjettiness observables with N up to 5 are shown in Fig. 8a. While the angular weighting parameter β has no effect on our calculations at this accuracy, we show results for β ∈ {0.5, 1.0, 2.0} to give a sense of the robustness of our predictions. The AUCs of these observables are shown in Fig. 8b, along with the N -emission AUC bound of 1 2 (C F /C A ) N and the tighter N -subjettiness AUC bounds for N ≤ 3. The bounds are indeed borne out in practice, with only mild dependence on β, and the N -subjettiness AUC bound explains the majority of the performance ceiling for the computed N values. We find that smaller β values tend to mildly improve the discrimination power of the individual observables, consistent with the overall conclusions of Refs. [40,41]. Further, Fig. 8 shows the ROC curve and AUC for the constituent multiplicity, which is an IRC unsafe observables that is known to be a good quark/gluon discriminant [8]. We find that the N -subjettiness observables closely approach the performance of multiplicity for large values of N .
This can be studied in more detail following the discussion of multiplicity in Sec. 8.1. Fig. 9a shows the distributions of constituent particle multiplicity for quark and gluons in our simulated jet samples. On average, quark jets have fewer constituents than gluons, due to the smaller color factor, and the smallest nontrivially-populated bin (greater than about one part in 10 5 ) for gluon jets is about 15 or so. From our conjecture at the end of Sec. 8.1, we then expect that τ N from about N 15 or so to exhibit similar discrimination power to that of multiplicity. This is demonstrated in Fig. 9b in which we plot the AUC for τ (β) N , for β ∈ {0.5, 1.0, 2.0} and N out to 100. The AUC of individual N -subjettiness observables converges rapidly to the AUC of multiplicity, and remains comparable until N ∼ 40 at which the AUC diverges, approaching 0.5 as N increases. This N of the divergence point is also approximately the mean multiplicity of quark jets, suggesting that once most of the quark jets have τ N = 0, the discrimination power of τ N is no longer optimal. Also, because multiplicity has (weak) jet p T dependence, these relationships will have some p T dependence. Nevertheless, we do expect, for any p T , a wide range of N for which τ N and multiplicity have comparable discrimination performance. As discussed earlier, these fascinating relationships between individual N -prong observables and multiplicity merit further study.
Beyond the AUC bounds, we also have predictions for the asymptotic ROC curve behaviors predicted by our power counting arguments. Fig. 10 shows the predicted asymptotic ROC curve behaviors in the high quark-efficiency region together with the ROC curve for β = 2 N -subjettiness observables, where we have the best perturbative control. We see good agreement with the analytical expectation, validating the applicability of the power counting reasoning to analyzing quark vs. gluon discrimination. We also predict that the ROC curve will have vanishing slope in the low quark-efficiency region, which is also borne out in these results. These results indicate that N -subjettiness observables with large N values may be a  Figure 10: The ROC curves for N -subjettiness observables with β = 2, together with the predictions for their asymptotic behavior. The predictions are shown as shaded regions that are predicted to match the slope of the ROC curve in the high quark-efficiency region. We also predict the slope of the ROC curve to approach zero in the low quark-efficiency region.
There is good agreement between the predictions and the observed classification performance.
good candidates for data-driven quark/gluon definitions [16,17], due to their (near) mutual irreducibility while retaining analytic understanding and perturbative control.
To check the robustness of our analysis and conclusions to non-perturbative effects, we have also repeated the studies in this section at parton level, without hadronization. Overall, we find a very similar story to the results presented in this section. Differences include a decrease of quark/gluon discrimination power available at parton level and correspondingly smaller N values for the performance saturation of τ N .

Probing Machine Learning Strategies
Our theoretical results allow us to explore and understand the behavior of machine learning strategies for jet or event classification in new ways, at least in a limited context.
We begin by considering classifiers formed via the product of observables. This parameterization allows for the classification performance to be optimized while still producing a theoretically-understandable observable. Such a strategy has been used successfully with products of N -subjettiness observables to optimize the performance of tasks such as H → bb vs. g → bb using a brute force optimization of the product observable [67] as well as more sophisticated machine learning techniques [68]. Here, we will consider this approach applied to quark versus gluon classification, with the product observable: where the goal is to learn the parameters a 1 , · · · , a N to achieve optimal performance. In general, we set a N = 1 by monotonically rescaling the observable without changing the classification performance. Note that Refs. [67,68] used observables with three different β values together in the product, whereas here we will consider observables with the same β value for simplicity. The quark vs. gluon AUC performance of the product observable O = τ α 1 τ 2 is shown in Fig. 11 over a sweep of α values with β ∈ {0.5, 1.0, 2.0}. The qualitative features of this product agree rather well with the theoretical predictions of Fig. 3, particularly for the case of β = 2. While the overall scale of the classification performance differs, the relative behavior of the different product observables is well-described by our calculation. As with the prediction, the region near α = 0 is preferred to optimize the discrimination power. Going further, the AUC performance of the product observable O = τ α 1 τ γ 2 τ 3 is shown in Fig. 12 over a sweep of α and γ parameter values with β = 2. Again, we see qualitative agreement with the predictions in Fig. 6. Both theoretically and in simulation, we see that a single N -subjettiness observable τ N with the largest N captures a great deal of the overall product classification performance. These results suggest single N -subjettiness observables with large-N as strong candidates for individual quark/gluon classification observables. More broadly, these results are a significant step towards providing an analytic understanding of machine learning with product observables, such as those explored in Refs. [67,68], from a first-principles multi-differential calculation.
A general strategy in machine learning for collider physics has been to combine the information from a collection of observables with dense neural networks (DNNs), boosted decision trees, or linear methods. While these methods are intrinsically more opaque due to their black box nature, our theoretical understanding can nonetheless shed some light on the performance achieved by the model. In particular, we will consider a relatively simple dense neural network consisting of two fully-connected layers of 100 nodes each. All neural networks are implemented in Keras [69] with the TensorFlow [70] backend on a sample of 200k jets with a 50k validation set and 50k test set. A ReLU activation [71] is used on each layer with He-uniform [72] weight initializations, using a crossentropy loss function and the Adam optimization algorithm [73]. Models were trained with a batch size of 500 for 25 epochs.
To probe the information accessed by the model in combining observables, we begin by combining two N -subjettiness observables of the same N and different β values with a DNN. The resulting ROC curves are shown in Fig. 13. Based on the analysis of Sec. 5.1, we do not anticipate two observables of the same N to parametrically improve discrimination performance. Indeed, the marginal improvements in performance are largely in the middle of the ROC curve with essentially unchanged parametric performance near the endpoints. Thus on general grounds, in this simple case, we are able to understand the limits on the information probed by the network using the different observables without a requiring a detailed understanding of their multi-differential correlations.
The decisions made by the neural network can be understood in even more detail. The output of a classifier trained with a crossentropy or mean squared error loss is optimally S/(S + B), which is indeed monotonically related to the likelihood ratio. In a feature space x, the output of the trained classifier is optimally: where in the case of a two softmaxed outputs, each component is optimally S/(S + B) and B/(S + B). Assuming that the neural network is sufficiently trained to approach this limit, we can in principle predict its output and decision boundaries using our understanding of the signal and background distributions. The output of a quark/gluon discrimination neural network that combines two 1-subjettiness observables with different β values is shown in Fig. 14 . The neural network output is indeed well described by S/(S +B), which can be verified based on its similarity to the binned histogram estimate of S/(S + B). Further, the network interpolates its output much more smoothly than the binned histogram estimate, which suffers from the curse of dimensionality. Quark vs. Gluon AUC Figure 12: The AUC quark vs. gluon discrimination performance of N -subjettiness product observable O = τ α 1 τ γ 2 τ 3 , sweeping over different parameter values. We see good qualitative agreement with the predictions shown in Fig. 6.
The theory prediction captures the general scale of the neural network output and predicts is saturation around C F /(C F + C A ) 0.308. Note that we use a fixed strong coupling constant for the prediction, where running coupling effects would provide an additional enhancement near the origin of phase space.
This analysis can also be carried out for neural networks combining N -subjettiness observables that probe different numbers of emissions. The output of a neural network that combines 1-subjettiness and 2-subjettiness is shown in Fig. 15, compared with the corresponding theory prediction for S/(S + B) and a Monte Carlo histogram estimate. The values are shown only in the physical phase space of τ 2 ≤ τ 1 . Again, the neural network is welldescribed by S/(S + B) and interpolates better than the binned histogram estimate. The theory prediction provides a good description of the neural network decision boundaries and its saturation around  Figure 13: A comparison of the classification performance added by considering two Nsubjettiness observables of the same N and different β values. Combining β = 1 and β = 2 Nsubjettiness observables with a DNN can increases the classification performance, particularly for N = 1. However, the parametric discrimination power, namely the ROC curve near the endpoints, is largely unchanged and approaches that for the β = 1 N -subjettiness.
are shown in Fig. 16 through N = 4 for β = 1. For comparison, we also show the results using 15-body phase space, namely all N -subjettiness values with β ∈ {0.5, 1.0, 2.0} and N up to 15. This has been established to achieve competitive quark vs. gluon classification performance with other machine learning methods, and so provides us with a proxy for absolute convergence of the ROC curve. We see that combining a non-trivial number of N -subjettiness observables with a neural network indeed achieves comparable performance to general machine learning techniques. Further, we see that the asymptotic classification performance at high quark efficiencies is indeed well-described by the bound based on our calculation of κ g = (C F /C A ) N for this feature space.
To probe our understanding of the parametric ROC curve performance, Fig. 17 shows the high quark efficiency region of N -subjettiness observables combined with neural networks for β ∈ {0.5, 1.0, 2.0} and N through 4. For β = 2.0 where we have the highest perturbative control, we see that indeed the parametric performance of the neural network is relatively well governed by these limits. For smaller values of β, higher order effects become more important and classification performance is increased, but the relative hierarchy remains consistent. Hence our understanding based solely on the quark-and gluon-enriched regions of phase space using our power counting rules has begun to provide a good qualitative and 15-body Phase Space Figure 16: The ROC curves for neural networks combining N -subjettiness observables with β = 1, together with the predictions for their asymptotic behavior in the high quark efficiency region based on κ g = (C F /C A ) N . The classification performance increases as N -subjettiness observables are added, saturating at the performance of using 15-body phase space. The asymptotic classification performance is qualitatively well described by the analytical estimates via reducibility factors for N -emission sensitive observables.
semi-quantitative understanding of neural network performance in high dimensions.

Conclusions
The identification of the initiating particle of a jet and the discrimination of jets of different origins are central problems in the analysis of events at the LHC. Due both to the importance of the problem and the abundance of data from the LHC, machine learning with DNNs, for example, has seen extensive use. However, in most studies, the inputs to the DNN are lowlevel information such as individual particle four-momenta and so the dimensionality of the input can be tens or hundreds of numbers. This enormous dimension is difficult to quantify and requires reliance on the DNN to tease out the important features. Further, studies thus far have used simulation to train the models, which is not reality, and this risks learning the idiosyncrasies of the simulation, and not real physics. Recent ideas for training directly on the data [74][75][76][77] are closely related to the notions of power counting and parametric discrimination power developed here [16,17]. More generally, in order to trust the output of the model and identify the relevant physics that drives the discrimination power, first-principles theoretical calculations that parallel the machine as best as possible are necessary. Here, we performed these calculations to double-logarithmic accuracy within the context of quark vs. gluon discrimination. We explicitly considered the measurement of the IRC-safe and additive N -subjettiness observables to resolve emissions, which enable straightforward resummation and a sufficient number of them measured on a jet is in one-to-one correspondence with M -body phase space. For a binary discrimination problem, a machine outputs an estimate of the likelihood ratio as a function of the training data and the classification performance can be quantified via the AUC. With our predicted resummed probability distributions, the likelihood is just the ratio of signal and background distributions and the AUC is calculated through an ordered integral over the distributions. Further, limits of the likelihood quantify the achievable sample purity through reducibility factors. This has a close relationship to power counting and enables the identification of powerful discrimination observables without the necessity of a detailed calculation. This established power counting method and our explicit calculations demonstrate that sensitivity to a large number of emissions in the jet produces a good quark/gluon discriminant and that, surprisingly, the likelihood is itself an IRC safe observable. These predictions are exhibited in Monte Carlo parton shower simulations, providing an understanding of what a machine trained on simulation is learning.
This is a first step in a theoretical effort to deconstruct machine learning for particle physics. This new field is becoming increasingly sophisticated and performance metrics are more well-established, providing concrete goals for theoretical studies. Binary classification, like the case studied here, is an old problem within the field of jet substructure. However, signal and background are not necessarily so well-defined, and so more general problems include multi-label classification in which a given sample is divided into more than two categories. In searching for new physics signals, the problem of anomaly detection or anti-tagging is relevant, in which deviations from a fiducial distribution (that predicted by the Standard Model), are of interest. These problems are just now being studied from the machine learning angle [78][79][80][81][82][83][84][85], and theoretical efforts are necessary to identify the individual observables, techniques, and signatures that are most sensitive to the goals.
Establishing uncertainties and demonstrating robustness from machine learning is challenging due to the high-dimensionality of the inputs. However, even in a simplified, but theoretically well-defined, approximation, if individual observables can be identified that perform comparable to the output of a DNN they are preferred. The definition of such an observable would not rely on the details of Monte Carlo parton shower modeling and the physics of its performance would be well-understood. Such efforts work toward the goal of opening up the black box and shining a new light on the physics of jets.
For observables τ 1 , · · · , τ n probing up to n photon emissions, the up vs. down reducibility factors for the multi-differential phase space are: Hence up and down quarks are only mutually irreducible in their photon radiation pattern in the limit of probing many emissions. For instance, a selection of jets with an energetic photon will necessarily be contaminated by down quarks by a relative amount Q 2 d /Q 2 u . In practice, one can probe the electromagnetic aspect of quark jet physics using isolated photon subjets, as studied in detail in Ref. [97]. There are several experimental complications that we do not consider here, such as backgrounds from π 0 → γγ, that would limit the sensitivity to perturbative photon emissions and hence further degrade classification performance. Even so, using our results we are able to obtain a theoretical understanding of and determine strict limits on the up vs. down quark discrimination performance based on the photon radiation pattern. Theoretical investigation of these ideas is important to extend operational jet (and event) flavor definitions [16,17] beyond solely "quark" and "gluon" categories.