Casimir Meets Poisson: Improved Quark/Gluon Discrimination with Counting Observables

Charged track multiplicity is among the most powerful observables for discriminating quark- from gluon-initiated jets. Despite its utility, it is not infrared and collinear (IRC) safe, so perturbative calculations are limited to studying the energy evolution of multiplicity moments. While IRC-safe observables, like jet mass, are perturbatively calculable, their distributions often exhibit Casimir scaling, such that their quark/gluon discrimination power is limited by the ratio of quark to gluon color factors. In this paper, we introduce new IRC-safe counting observables whose discrimination performance exceeds that of jet mass and approaches that of track multiplicity. The key observation is that track multiplicity is approximately Poisson distributed, with more suppressed tails than the Sudakov peak structure from jet mass. By using an iterated version of the soft drop jet grooming algorithm, we can define a"soft drop multiplicity"which is Poisson distributed at leading-logarithmic accuracy. In addition, we calculate the next-to-leading-logarithmic corrections to this Poisson structure. If we allow the soft drop groomer to proceed to the end of the jet branching history, we can define a collinear-unsafe (but still infrared-safe) counting observable. Exploiting the universality of the collinear limit, we define generalized fragmentation functions to study the perturbative energy evolution of collinear-unsafe multiplicity.


Introduction
The fantastic jet reconstruction performance of ATLAS and CMS [1,2]-along with increasingly sophisticated tools to predict jet properties from first principles [3][4][5][6][7][8][9][10]-has led to significant advances in the field of jet substructure [11][12][13][14]. A key goal in jet substructure is to robustly discriminate quark-initiated jets from gluon-initiated jets [15][16][17][18][19][20][21][22][23], with many applications to new physics searches at the Large Hadron Collider (LHC) (see e.g. [24][25][26]). In the eikonal limit, quarks and gluons differ only by their respective color charges, C F = 4/3 versus C A = 3, such that gluon jets emit more soft gluon radiation than quark jets. At this order, the difference between quark and gluon radiation patterns is controlled entirely by the Casimir ratio C A /C F = 9/4, which drives (and limits) the expected separation power between quark and gluon jets. One of the most powerful quark/gluon discriminants is hadron multiplicity, or its chargedparticle-only variant, track multiplicity n tr [15,16,18,[27][28][29]. This is an effective discriminant because the average track multiplicity within quark and gluon jets scales approximately as (see e.g. [30,31]) n tr g n tr q Since multiplicity is not infrared and collinear (IRC) safe, though, it is difficult to predict its discrimination performance from first principles. 1 On the other hand, IRC-safe observables like jet mass and jet width are analytically tractable [33][34][35], but they exhibit worse quark/gluon performance than multiplicity. The reason is that these discriminants are dominated by a single emission at leading-logarithmic (LL) accuracy, giving rise to Casimir scaling of the quark/gluon discrimination power, (gluon mistag rate) (quark efficiency) C A /C F , (1.2) and therefore relatively weak separation between quark and gluon jets. This Casimir scaling behavior holds for any observable with a Sudakov form factor at LL accuracy, including a wide range of IRC-safe additive observables [17]. While one can try to interpolate between the IRC-unsafe and IRC-safe regimes using generalized angularities [18], track multiplicity remains one of the best performing-yet analytically puzzling-quark/gluon discriminants. In this paper, we introduce a new class of "counting observables" that are IRC safe, yet yield comparable quark/gluon performance to track multiplicity. Unlike additive observables, which are only sensitive to a single emission at LL order, these counting observables are directly sensitive to multiple emissions at LL, allowing them to exceed the performance estimate in Eq. (1.2). Crucially, the quark/gluon performance of counting observables still depends on the color factors C A and C F , but instead of being described by Sudakov form factors, these observables are described by Poisson distributions; this allows their discrimination power to improve as more emissions are included. These counting observables not only clarify the underlying reason why track multiplicity performs so well, but they also demonstrate the new kinds of analytic structures possible from IRC-safe but non-additive observables. 2 The counting observables we study are based on an iterated variant of soft drop declustering [40]. As a grooming procedure, soft drop starts at the trunk of an angular-ordered clustering tree [41,42] and sequentially removes soft branches with small momentum fraction z ij until a hard branching is found. At a step in the clustering tree where branches i and j split, the splitting is retained in the groomed jet if the momentum fraction satisfies where θ ij is an appropriately defined relative angle between branches i and j, and R 0 is the jet radius. For appropriate choices of the soft drop parameters z cut and β, observables defined on the groomed jet are automatically infrared (but not necessarily collinear) safe. While the original soft drop procedure terminates once it finds a hard 1 → 2 splitting, the iterated variant we employ in this paper continues, following the hardest branch (the "trunk") through multiple levels until an angular cutoff scale θ cut is reached.
The simplest counting observable we can define using iterated soft drop (ISD) is just the total number of emissions from the trunk of the clustering tree that ISD records. In particular, this includes all emissions n ∈ [1, n max ] that satisfy the soft drop condition and lie outside the θ cut cone. We call this observable "soft drop multiplicity", n SD (z cut , β, θ cut ) = n 1, (1.4) which depends on the choice of ISD parameters. It is complementary to the "soft drop level" observable L SD (β) introduced in Ref. [43], which also iteratively applies the soft drop condition, but changes the z cut scale. As long as z cut > 0, soft drop multiplicity is infrared safe.
With θ cut > 0 or β < 0, n SD is collinear safe as well, so we can use analytic resummation tools to predict its discrimination power. We do this to resum large logarithms of z cut and θ cut , which are of soft and collinear origin, respectively, and which lead to a double-logarithmic observable. The analysis at LL order is straightforward, yielding a Poisson distribution whose average value is set by the phase space "area" of counted emissions. This leads to quark/gluon discrimination power which approaches that of track multiplicity, particularly in the case of β = −1. Moving from LL to next-to-leading-logarithmic (NLL) order, one finds a slight decrease in discrimination power, due in part to the jet-flavor mixing that appears at this accuracy. We implement the NLL calculation through a set of evolution equations that have a similar form to parton evolution.
With θ cut = 0 and β ≥ 0, the soft drop multiplicity n SD is no longer collinear safe, so we cannot predict its absolute discrimination power. That said, for the special case of β = 0 (which was initially introduced as the modified mass drop tagger [3,44]), we can use renormalization group (RG) techniques to predict the evolution of its discrimination power. When β = 0, soft drop multiplicity has purely collinear divergences, which can be absorbed into a generalized fragmentation function (GFF) that depends on the RG scale µ [45]. After extracting this GFF at low scales (either from LHC data 3 or parton shower simulations), one can use a perturbative DGLAP-like evolution equation to predict the discrimination power achievable at higher scales. Intriguingly, in the limit of pure Yang-Mills, one can show that at lowest order, the soft drop multiplicity asymptotes to a true Poisson distribution at large values of µ, such that it behaves like an idealized counting observable (albeit in a theory with only gluons).
The remainder of this paper is organized as follows. In Sec. 2, we define the ISD procedure, introduce soft drop multiplicity, and take a first look at its distribution using parton shower generators. In Sec. 3, we perform an LL analysis, focusing on the contrast between soft drop multiplicity's Poisson behavior and the more familiar Sudakov-peak behavior of additive observables. We extend our analytic calculations to NLL order in Sec. 4 and compare our analytic distributions to those obtained from various parton showers. We consider the collinear-unsafe case of θ cut = 0 and β = 0 in Sec. 5, deriving the corresponding RG evolution equations and presenting numerical results based on parton shower inputs. We present our conclusions in Sec. 6.
In an appendix, we demonstrate that our analytical tools can also be used to study more general ISD observables, in particular the weighted multiplicity n (z n ) κ which weights each counted emission according to its momentum fraction z n . Soft drop multiplicity is a special case (κ = 0) of this more general observable, and the one most useful for quark/gluon discrimination.

Iterated Soft Drop
Our counting observables are defined using an iterated variant of the soft drop declustering algorithm. We briefly review soft drop here for convenience and to establish conventions.
The soft drop grooming procedure can be applied to any jet found using a standard jet algorithm of characteristic radius R 0 . After reclustering the jet using the Cambridge/Aachen (C/A) algorithm [41,42], soft drop involves sequentially undoing the cluster history to remove wide-angle soft radiation and identify hard 2-prong substructure. For each C/A branching into subjets i and j, there are quantities z ij and θ ij , which are defined differently for different collider environments: where ∆R represents distance in the rapidity-azimuth plane. The soft drop grooming algorithm can be summarized as follows: 1. Traverse the C/A clustering tree, beginning at the trunk and sequentially examining each branching.
2. Upon arriving at a branching into subjets i and j, check whether the soft drop condition is satisfied: where z cut and β are fixed parameters of the algorithm. If so, the algorithm terminates; stop grooming and return the jet as is.
3. If the branching fails this condition, remove the softer of the two subjets (i or j) from the groomed jet and return to Step 2 on the next branching in the remaining clustering tree.
Our analysis is based on ISD where the soft drop algorithm is iterated. In this case, the procedure does not terminate when a hard branching is found, but is instead iteratively applied to the harder of the two subjets. This continues until an angular cutoff is reached, so in addition to z cut and β, ISD depends on an additional parameter θ cut . While ISD could be used as a grooming procedure in its own right, the primary purpose of ISD in this paper is to determine which set of (z ij , θ ij ) branchings contribute to the observables we define below. For this purpose, the ISD algorithm proceeds as follows: 1 . Set the counter n equal to 1. Traverse the C/A clustering tree, beginning at the trunk and sequentially examining each branching.
2 . Upon arriving at a branching into subjets i and j, check whether the branching angle satisfies θ ij > θ cut .
If not, the algorithm terminates.
3 . If θ ij > θ cut , then check whether the soft drop condition is satisfied: If not, return to Step 2 on the harder of subjets i and j.
4 . If the soft drop condition is satisfied, define Then increment n → n + 1 and return to Step 2 on the harder of subjets i and j. Figure 1: Illustration of the ISD procedure. A C/A tree is declustered from the trunk (thick line), defined by the hardest p T branches. If a node fails the soft drop condition, it is removed from consideration (dashed lines). If a node passes the soft drop condition after n iterations, this defines the value of (z n , θ n ). The declustering stops at an angular scale of θ cut , and subsequent nodes are not considered further (gray lines).
Because we recurse to the harder subjet at each junction, we think of each (z n , θ n ) splitting as an emission from the "hard core" of the jet and refer to the above procedure as traversing the "trunk" of the clustering tree. A schematic of this procedure is shown in Fig. 1.
To emphasize, we are not using ISD as an alternative grooming technique to soft drop. In fact, we have found no need to refer to the ISD-groomed jet explicitly in our analysis. Instead, we employ ISD simply as a method to obtain an IRC-safe set of (z n , θ n ) values to define our counting observables. Of course, the specific values of (z n , θ n ) depend on the precise choice of ISD procedure. In this paper, we focus on the soft drop multiplicity, which counts emissions from the trunk of the clustering tree, and have defined ISD accordingly. In Sec. 2.3, we consider variants of soft drop multiplicity, with corresponding variants to the ISD procedure.
To demonstrate the qualitative behavior of observables defined below in this section, we present results from parton shower simulations. We separately generate pp → Z + q and pp → Z + g events at center-of-mass energy 13 TeV using MadGraph 2.4.0 and let the Z decay to neutrinos for simplicity. We then shower the events through Vincia 2.0.01 [46,47], a plug-in to Pythia 8.215 [48], with default tuning parameters. 4 Jet are identified using the anti-k t algorithm [49] with radius R 0 = 0.6 in FastJet 3.1.3 [50]. We use a sample of events in which the hardest jet with |η| < 2.5 has p T between 450 and 550 GeV. We recluster and measure our observables on the hardest jet from each event using FastJet. Because ISD is sufficiently different from ordinary soft drop, we do not use the RecursiveTools fjcontrib [51], but rather directly traverse the C/A tree in our analysis. We plan to make our code available publicly in a future release of fjcontrib.

Soft Drop Multiplicity
The (z n , θ n ) values from ISD allow us to define a variety of interesting jet observables. Here, we focus on soft drop multiplicity n SD , which is simply the total count of the recorded (z n , θ n ) pairs. This observable, defined already in Eq. (1.4), depends implicitly on the ISD parameters z cut , β, and θ cut . Among all of the observables we tested, n SD appears to perform the best for quark/gluon discrimination. We discuss more general observables in Sec. 2.3 and App. A.
As defined above, ISD only follows the harder branch (i.e. the trunk) at each junction of the clustering tree. Therefore, n SD effectively counts emissions from the hard core of the jet, down to the angular resolution scale θ cut . When z cut = θ cut = 0, n SD is simply the depth of the trunk of the C/A tree.
When z cut > 0, the soft drop multiplicity is infrared safe, as all soft emissions at finite angles fail the soft drop condition in Eq. (2.5). When θ cut > 0, soft drop multiplicity is also collinear safe, since an exactly collinear splitting along the trunk does not satisfy Eq. (2.4). Alternatively, β < 0 also gives collinear-safe distributions, since an exactly collinear splitting along the trunk does not satisfy Eq. (2.5). The borderline case of θ cut = 0 and β = 0 is collinear unsafe, but it can be handled using RG methods, as shown in Sec. 5.
In Fig. 2, we show the soft drop multiplicity distributions for quark and gluon jets as extracted from Vincia. Results are given using the benchmark parameters This benchmark is chosen to maximize quark/gluon discrimination power while retaining perturbative calculability, as discussed in Sec. 3. The distributions are approximately Poisson and yield good quark/gluon discrimination power.

Multiplicity Variants
While the focus of this paper is on soft drop multiplicity n SD , many other observables could be defined using the (z n , θ n ) values recorded by ISD. For example, the techniques developed in this paper can be directly applied to the weighted soft drop multiplicity, Note that soft drop multiplicity is a special case (κ = 0) of this more general observable, with the same criteria for IRC safety. We study the weighted multiplicity in detail in App. A, but find its quark/gluon discrimination power to be inferior to the discrete κ = 0 case. In fact, LL reasoning leads one to expect the soft drop multiplicity n SD to have the best discrimination power of any observable defined on the (z n , θ n ) values; see the end of Sec. 3.3 for a short discussion. Nevertheless, several other promising variants of soft drop multiplicity might prove useful: • The weighted soft drop multiplicity in Eq. (2.8) only refers to the momentum fractions z n in the sum over emissions. One could also consider an angle-weighted variant n z κ n θ α n , (2.9) or indeed any function of z n and θ n . The potential advantage of including θ n information is that even for θ cut = 0, such observables would be collinear safe for α > 0.
• Instead of counting emissions only from the trunk of the C/A tree, we could extend the sum to include all branchings down to the angular resolution θ cut . This multiplicity variant would require a modification of the ISD algorithm: in step 4 , the recursion would be applied to both subjets i and j, not just the harder one. This is a step in similarity towards full hadron multiplicity, reducing to it exactly when z cut = θ cut = 0. This variant of soft drop multiplicity is more difficult to study analytically, however, due to the nonlinear structure of the recursion. Moreover, it is not clear that this variant would provide a performance advantage over n SD . While gluons emitted from the hard core of a quark (gluon) jet give rise to factors of C F (C A ), subsequent emissions from those gluons give rise to factors of C A regardless of the jet flavor; this might wash out quark/gluon discrimination power.
• The original soft drop algorithm uses a C/A tree to mimic the angular-ordered structure of the parton shower. One could also study variants based on reclustering with the generalized-k t algorithm with exponent p [49,50]. The C/A algorithm used above corresponds to p = 0, while the k t algorithm uses p = 1. For this variant, it would make sense to replace the angular cut θ cut with a cut d cut on the generalized distance measure d ij .
This last k t variant is of particular interest, given the discussion below in Sec. 3.3. Nonperturbative physics typically dominates when k t Λ QCD , so it makes sense to use a clustering algorithm where the clustering scale is "parallel" to the nonperturbative scale. This variant of n SD would then allow the nonperturbative phase space to be clearly separated from the perturbative region and avoided. This would open up as much perturbative phase space for measured emissions as possible. We note that it is possible to mimic some of the LL structure of the k t variant by using ISD with β = −1, though there would be differences going to NLL order.
We defer an analysis of these variants to future work, anticipating that many of the analytic tools from this paper can be translated to these generalized contexts. Experimentally, one might want to measure a track-based version of n SD , trading collinear safety for improved robustness to pileup, which could be studied with the help of track functions [45,52,53].

Leading-Logarithmic Analysis
At LL order, the only difference between quarks and gluons is encoded in the color factors C F and C A , so Casimir scaling is a generic feature of many quark/gluon discriminants. Here, we review the case of additive observables (and close variants), where Casimir scaling of the Sudakov form factor yields a universal discrimination power at LL that depends only on C A /C F . We then show that the soft drop multiplicity is Poisson distributed, with its mean and variance satisfying Casimir scaling.
In general, any observable that is sensitive to multiple emissions at LL is "Poisson-like" distributed, in the sense that its variance σ 2 and mean µ both scale with the number n of emissions counted, i.e. σ 2 = O(µ). In the limit of many emissions, all such observables converge to a normal distribution with decreasing relative width w rel ∼ σ/µ ∼ 1/ √ n. Then as more emissions are counted, the discrimination power is not a universal function of C A /C F , but instead improves as µ increases and the quark/gluon distributions separate.
In this section, we illustrate this behavior for soft drop multiplicity with distributions extracted from Vincia, using the setup described in Sec. 2.1. We extract ROC (receiver operating characteristic) curves of the quark efficiency versus the gluon mistag rate, and explain their qualitative behavior. In App. A, we consider weighted soft drop multiplicity, with behavior that interpolates between that of Poisson-and Sudakov-distributed observables.

Review of Additive Observables
A generic jet observable is defined on the momenta p i and quantum numbers q i of particles within a jet. An additive IRC-safe observable f is one that reduces to the form in the soft/collinear limit, so that the observable depends on a simple sum over the jet constituents, independent of q i . 5 The function f (p i ) can depend on global properties of the jet (e.g. its p T ), but not on its substructure. Collinear safety implies that f (p i ) is linear in the particle energies E i . Examples of additive observables include the jet mass [54][55][56], the radial moments [57], and the angularities [34,58,59], among many others. We now review the Casimir scaling of additive observables at LL order, as discussed in Ref. [17]. 6 For simplicity of the discussion below, we let α s be a fixed coupling so that the expressions are more compact, but it is straightforward to include a running coupling at LL order. At this order, we need only consider gluon emissions from the jet core that 5 One could consider additive but IRC-unsafe observables which do depend on qi. 6 Casimir scaling of additive observables at LL is identical to the statement of Casimir scaling of the cusp anomalous dimension in QCD, which has a long history in QCD [61,62]. Casimir scaling is known to hold through three loops [63] in the cusp anomalous dimension, but is not expected to hold exactly [64]. At NLL and beyond, Casimir scaling is broken by the appearance of the non-cusp anomalous dimension.
are both soft and collinear, described by the most singular terms in the splitting function. Parametrizing emissions by their angle θ and energy (or p T ) fraction z, real emissions are uniformly distributed in the (log 1/θ, log 1/z) plane. The density in this emission phase space is where C i is the appropriate color factor, equal to C F = 4/3 for quarks and C A = 3 for gluons. The structure of emission phase space is shown in Fig. 3a. Virtual emissions are encoded in the boundaries of the emission phase space, where log(1/θ), log(1/z) → ∞, such that the total emission probability at each α s order is zero to maintain the normalization of the probability distribution. Applying the strongly-ordered limit and the fact that f (p i ) is linear in E i , only a single dominant emission contributes to the observable at lowest order: Therefore, the probability that the observable f is less than some value f max is equal to the probability that there are no emissions in the region where f (p i ) > f max . This implies a cumulative distribution function where A(f max ) is the forbidden area of emission phase space, shown in Fig. 3a: Note that the cumulative distributions for quarks and gluons are related by where C A /C F = 9/4. That is, the Sudakov form factors for f are related by Casimir scaling. As a result, the ROC curve for quark/gluon discrimination, which simply plots Σ q (f ) versus Σ g (f ), takes the universal form of Eq. (1.2). From this logic, it is clear that the above analysis also extends to certain non-additive observables. For example, jet observables defined on groomed jets are not additive, since the grooming procedure removes emissions that would otherwise contribute to the sum in Eq. (3.1). But groomed observables of the quasi-additive form still exhibit Casimir scaling, since the measured value of f groomed forbid emissions in the region A(f groomed ) shown in Fig. 3b. More generally, Casimir scaling arises whenever the value of Sudakov-Distributed Obs. Vincia 2, p T = 500 GeV, R = 0.6 z cut = 0.05, β = 0 y = x 9/4 R g groomed mass mass n tr Figure 4: Comparison of the quark/gluon ROC curves for various Sudakov-distributed observables to the y = x 9/4 prediction from Casimir scaling. Shown are the groomed jet radius, groomed jet mass, and ordinary jet mass. As a useful benchmark, we also show the performance of track multiplicity n tr , which is known to be a very strong discriminant. the measurement actively forbids emissions from some region of phase space. This vetoed phase space region builds up a Sudakov form factor which in turn controls the discrimination power achievable at LL.
Beyond LL order, different Sudakov-distributed observables will exhibit different discrimination power due to higher-order or nonperturbative effects, but Eq. (3.6) is still a representative benchmark. In Fig. 4, we show ROC curves for jet mass m, the soft-dropped jet mass m SD , and the groomed jet radius R g , which all roughly follow the prediction from Casimir scaling. We also show track multiplicity n tr , which exhibits substantially better performance and provides a useful discrimination target.

Soft Drop Multiplicity
Soft drop multiplicity is not an additive observable, nor does the measured value of n SD actively forbid emissions in any region of phase space. As a result, n SD does not exhibit Sudakov behavior and it instead satisfies a fundamentally different scaling relation. Physically, this is because all emissions that pass the soft drop condition are weighted equally, so n SD depends on multiple emissions even at leading accuracy. These emissions occur in the region of phase space passing the soft drop and angular cuts, shown in Fig. 5.
Restricting to the IRC safe case with θ cut > 0, the measured region has finite area in the emission plane, and soft drop multiplicity simply counts the number of real emissions in this area. This expression actually holds for all β ∈ (−∞, ∞) as long as the angular cut θ cut imposes a non-trivial constraint on emissions. Since real emissions occur independently with uniform probability, they are described by a Poisson process, and the soft drop multiplicity is Poisson distributed at LL order: 7 For reference, the Poisson distribution with mean λ is Since the variance of a Poisson distribution is also equal to λ, the means and variances of n SD both satisfy Casimir scaling mirroring the behavior of track multiplicity in Eq. (1.1), but for an IRC-safe observable.
To be clear, in defining our resummation accuracy, we count large logarithms of z cut and θ cut in the mean/variance of the n SD distribution. That is, we define LL and NLL exactly as for more familiar additive observables, with LL including all terms of the form α n s log n+1 that appear in the exponent of the n SD distribution, and NLL including those terms of the form α n s log n . With this definition, Eq. (3.8) then shows that n SD is indeed a double-logarithmic observable. In this section, we study this observable's general properties with fixed coupling, i.e. in the double-logarithmic approximation, for purposes of illustration. In Sec. 4, LL and NLL results are computed using the appropriate running coupling.
The above analysis provides several concrete predictions. Our most salient result is that, since the soft drop multiplicity is Poisson distributed at LL, we expect the ratio of the variance to the mean to be close to 1, as shown in Fig. 6a. We also predict that the mean and variance satisfy the Casimir scaling relations in Eq. (3.11), as shown in Fig. 6b. Though not shown here, we also checked the prediction that for β = 0, the mean soft drop multiplicity scales as In general, we find good agreement for these predictions at large values of z cut , even out to z cut 0.4 where log z cut is not so large. For lower cut values, nonperturbative and higherorder effects cause these LL results to break down. In Sec. 3.3, we demonstrate how to choose parameters so that nonperturbative effects can be avoided, and in Sec. 4.2, we compute the NLL corrections to the perturbative predictions discussed here.

Optimal Discrimination Power
As a direct result of the properties exhibited in Sec. 3.2, the discrimination power of soft drop multiplicity improves as the means λ i = ρ i A emit increase. This is because the mean of The mean observable value for quarks is λ q , and we assume the mean for gluons is given by Casimir scaling λ g = (C A /C F )λ q . For reference, we show the y = x 9/4 curve for additive observables with Casimir scaling, as well as track multiplicity n tr extracted from Vincia. For mean quark values λ q 2, a Poisson-like observable satisfying Casimir scaling would be competitive with track multiplicity. The ROC curves are piecewise linear since the observable takes on discrete integer values.
each distribution is proportional to the Casimir C i , while the standard deviation is equal to the square root of the mean. The overlap of the distributions is characterized by the relative width Indeed, in the many-emission limit where the distributions are approximately Gaussian, have equal mean and variance, and satisfy Casimir scaling, the discrimination power is solely determined by the relative width. As the cuts z cut and θ cut are lowered, the means increase, causing the relative widths to narrow, reducing the overlap between the quark and gluon distributions, and improving the discrimination power. For reference, the discrimination power of Poisson distributions with different means is shown in Fig. 7, from which we see that track multiplicity has comparable discrimination power to a λ q 2 observable.
To maximize the quark/gluon discrimination power, one should maximize the mean of the soft drop multiplicity distributions, which corresponds to taking z cut and θ cut as small as possible, for a given exponent β. The validity of this analysis, however, is restricted to perturbation theory, so we must ensure that the values of the chosen parameters do not allow for distributions that are dominated by nonperturbative emissions. We can determine the parameters that enforce perturbative emissions by restricting the minimum relative k t appropriately.
To enforce that an emission is perturbative, we require that the relative k t of the emission is larger than a perturbative cutoff scale Λ NP , i.e.
where z and θ are the energy fraction and splitting angle of the emission, and p T is the transverse momentum of the jet. Below, we take Λ NP = 2 GeV unless otherwise noted. For an emission that just passes soft drop, and therefore contributes to the soft drop multiplicity, we have There are two regimes to consider. For β > −1 as in Fig. 8a, we can find the intersection of Eqs. (3.14) and (3.15). Setting θ → θ cut , we find a restriction on θ cut to be perturbative: To determine the optimal choice of z cut while enforcing perturbativity, we set θ cut to saturate this inequality and insert it into the double-log expression for the average soft drop multiplicity, Eq. (3.8). Maximizing this quantity, we find the optimal ISD parameters to be The factors of two arise because the energy fraction of the softer emission is (by definition) less than 1/2. Inserting these results into the expression for the average soft drop multiplicity, we find the largest perturbative value for the mean soft drop multiplicity to be For β < −1, one can see from the (log 1/θ, log 1/z) phase space in Fig. 8b that an angular cutoff is not needed to avoid the nonperturbative region, so we can set θ cut = 0. In this case, z cut saturates the bound Eq. (3.14) for θ → R 0 , yielding 20) and the average soft drop multiplicity is Combining these regions for all β ∈ (−∞, ∞), the maximum attainable mean soft drop multiplicity with perturbative parameters is In particular, the mean is maximized for β = −1, giving the optimal perturbative discrimination power in this double-log approximation. This result can be understood directly from Fig. 8, which shows that soft drop multiplicity with β = −1 can capture all of the perturbative emissions in phase space. We can directly test this double-log prediction in parton shower generators. In Fig. 9a, we show the quark/gluon ROC curve for soft drop multiplicity with the optimal perturbative soft drop parameters, sweeping through β. The best discrimination power found in Vincia is indeed observed near β = −1. For a more quantitative test, Eq. (3.22) predicts that the ratio of the optimal soft drop multiplicity for a given value of β to the optimal soft drop multiplicity at β = 0 is n SD optimal n SD β=0 optimal = min 2 |β| , 2 |2 + β| .
Analytic Prediction of Mean Vincia 2, p T = 500 GeV, R = 0.6 Λ NP = 2 GeV quark gluon LL In Fig. 9b, we compare this ratio to distributions extracted from Vincia and find good agreement away from β = −1. Note that when β = −1, the counted and nonperturbative regions share a boundary, while in all other cases the two regions only meet at a single point. This explains why nonperturbative sensitivity should be amplified when β nears −1. This extra sensitivity could of course be mitigated by using a more conservative value of Λ NP , but there is a tradeoff between reducing nonperturbative effects and increasing discrimination power.
In Fig. 10a, we show the effect that decreasing Λ NP (and thus decreasing z cut and θ cut ) has on the discrimination power, holding β = −1 fixed. Note that n SD rivals n tr for Λ NP = 1 GeV, but that there is no gain in performance when Λ NP is taken smaller. In Fig. 10b we show the shift in gluon n SD distributions from switched off hadronization and underlying event in Vincia. We take this as an indicator of nonperturbative sensitivity in the distributions. One can see that perturbative control is lost for Λ NP < 2 GeV. For p T = 500 GeV, Λ NP = 2 GeV gives the benchmark parameters in Eq. (2.7).
Our perturbative analysis here was restricted to LL order and fixed coupling, and the inclusion of higher-order effects will affect the discrimination power of soft drop multiplicity. In particular, at NLL order, quark and gluon jet flavors can mix, so we expect that higher-order effects in general decrease the discrimination power from the LL prediction. We perform NLL calculations and compare our results to parton showers in Sec. 4. Beyond these higher-order effects, we have restricted the analysis to perturbative parameters. Allowing nonperturbative emissions to contribute to the soft drop multiplicity should improve the discrimination power, however, at the expense of loss of predictivity. We discuss in Sec. 5 how to restore some of this predictive power in the nonperturbative regime with GFFs. One might wonder if the discrimination power could be further improved by weighting the emissions, e.g. by their energy, as in the weighted soft drop multiplicity of Eq. (2.8). At LL order, however, the soft drop multiplicity is provably the most powerful discriminant that can be defined on the (z n , θ n ) values. 8 To see this, note that the normalized distribution of emissions in the (log 1/θ, log 1/z) plane is identical for quark and gluon jets at LL order, even including running coupling effects. Therefore, once the value of n SD is known for a given jet, no additional discriminatory information can be gleaned from the (z n , θ n ) values. Nevertheless, weighted soft drop multiplicity provides an example of a more general observable that can be effectively studied with our analytic tools; we demonstrate this in App. A. 9

Calculations for IRC-Safe Soft Drop Multiplicity
We now demonstrate that the LL predictions of the previous section can be reproduced by a set of perturbative evolution equations. These equations describes how soft drop multiplicity evolves with decreasing θ cut , similar to traditional parton evolution [65]. This approach also admits a generalization to NLL, which we use to make precise predictions for comparison to parton showers.
When talking about the resummation of large logarithms at LL and NLL accuracy, we are specifically referring to factors of log z cut and log θ cut , not to any logarithms associated with the n SD observable (which is an integer). As we already saw in Eq. (3.8), these logarithms control the size of the emission phase space, which in turn control the expected mean value of n SD , so their resummation is essential for predicting the distribution of n SD .

Leading-Logarithmic Evolution Equations
We begin by analyzing the soft drop multiplicity to LL accuracy. This case is simple enough to keep the structure of the θ cut evolution transparent; the generalization to NLL just requires keeping track of more details. To achieve LL accuracy, we need only consider soft-collinear gluons emitted from the hard core of a jet; flavor-changing effects are not present at this order. Furthermore, the trunk of the clustering tree retains all but an O(z cut ) fraction of the original jet's energy, so for z cut 1, energy losses are negligible at this order as well. Let p i n (θ cut ) denote the probability that, given a jet of flavor i and ISD parameter θ cut , its soft drop multiplicity n SD (θ cut ) is measured to be n. Here, we leave the dependence on z cut and β implicit, since they do not participate directly in the evolution equations. Since n SD is a discrete counting observable, p i n (θ cut ) is finite and should satisfy the normalization condition ∞ n=0 p i n (θ cut ) = 1 for each flavor i. We can compute the distribution for p i n (θ cut ) by solving a set of evolution equations. Consider decreasing the resolution angle from θ cut to θ cut − δθ cut . The value of n SD will increase by one if there is an emission in the interval [θ cut − δθ cut , θ cut ] that passes soft drop; otherwise n SD will remain unchanged. That is, Here, P i→i (z) is the splitting function for the hard parton i to emit a collinear gluon of energy fraction z (and remain as flavor i), and Θ SD (z, θ) imposes the soft drop condition, At LL, α s (z θ cut p T ) runs with the 1-loop β function.
Using Eq. (4.1), we can derive the linear first-order differential equation in θ cut , dp i Because no emissions are recorded outside the jet radius R 0 , there is a boundary condition p i n (R 0 ) = δ n,0 . With this boundary condition, the solution to Eq. (4.3) is where A similar simplification occurs for each value of n, and we recognize the Poisson distribution we found in Eq. (3.9): At LL, the soft drop multiplicity n SD is thus Poisson distributed with mean λ i = I i→i (θ cut , R 0 ). With fixed coupling, the mean value agrees exactly with λ i = ρ i A emit found before (see Eqs. (3.2) and (3.8)):

Next-to-Leading-Logarithmic Corrections
The next-to-leading logarithms take the form α n s log n z cut and α n s log n θ cut in the logarithm of p i n (θ cut ). To resum these, we must consider emitted partons that are not necessarily soft and that can be either quarks or gluons. This requires us to take energy losses and flavor changes into account at this accuracy. It is convenient to compute p i n (θ cut ) by expressing it as Here, dZ p i→j(Z) n (θ cut ) is the differential probability that, given a jet of flavor i, ISD counts n emissions from its hard core that result in a flavor change from i to j, and a remaining energy fraction in the interval [Z, Z + dZ]. 10 These more differential distributions evolve with θ cut as The middle line of Eq. (4.11) is the probability that n emissions are counted at resolution θ cut , and that only virtual or soft-dropped emissions (neither of which have an impact on energy fractions, up to z cut corrections) occur in the interval [θ cut − δθ cut , θ cut ]. The second line is the probability that n − 1 emissions are counted at resolution θ cut and result in a flavor conversion i → k, and that an additional counted emission causing further conversion k → j occurs in [θ cut − δθ cut , θ cut ]. We now justify that these evolution equations do indeed resum large logarithms to NLL, with one caveat. As is necessary for NLL resummation, these evolution equations contain NLO information about the jet's substructure. To achieve NLL accuracy, we need to properly include the following double-emissions structures: collinear plus collinear (C+C), soft plus collinear (S+C), soft plus soft (S+S), and hard plus soft-collinear (H+SC). Since ISD is an angular-ordered algorithm, collinear emissions factorize in the cross section, so our evolution equations correctly include C+C and S+C double emissions. The S+S case is included as well by letting α s run with the 2-loop β function in the CMW scheme [66]. The one caveat is that we do not describe H+SC double emissions correctly at NLO, since we use splitting functions instead of full matrix elements. 11 Thus, our approximation should become more accurate as the jet radius R 0 becomes smaller, forcing hard emissions in the jet to become collinear. We also ignore the effects of logarithms of z cut that arise from nonglobal radiation [67], and so do not describe emissions in the jet from secondary radiation from outside of the jet. 10 The probability for the hard core to be left with energy fraction between Z/(1 − z) and (Z + dZ)/(1 − z) is then p i→j[Z/(1−z)] n (θcut) dZ/(1 − z). This is used in Eq. (4.11). 11 Besides this caveat, though, note that our use of 1 → 2 (as opposed to 1 → 3) splitting functions is sufficient at NLL, since nSD is a double-logarithmic observable.
Despite the extra complications at NLL order, Eq. (4.11) is still a linear first-order differential equation, just as in Sec. 4

.1. The solution is
where (θ cut ), and so on until p j n is negligible. In practice, the probability saturates for n of order 10. The n SD distributions and ROC curves at LL and NLL accuracy are displayed in Fig. 11. The uncertainties in the NLL calculation come from varying the α s scale up and down by a factor of 2. (Scale variation in the LL calculation does not give a reliable estimate of the uncertainty, since flavor-changing processes are absent at LL; we therefore omit bands around the LL predictions.) The fact that the uncertainties are abnormally small in one bin is an artifact of this one-dimensional variation procedure, which leaves the scale-varied distributions properly normalized. Also, the uncertainties in the ROC curve are substantially smaller than the uncertainties in the NLL distributions, since the way we implement the scale variation affects quarks and gluons in a correlated way. We show both β = −1 and β = −0.5 with z cut and θ cut chosen to be "optimal" according to Eqs. (3.17), (3.18), and (3.20) with Λ NP = 2 GeV. One can see that NLL corrections result in a slight decrease in discrimination power compared to LL, due in part to the flavor changes that occur at this order.

Comparison to Parton Showers
It is instructive to compare our NLL calculation of the soft drop multiplicity n SD with results obtained from parton shower generators. In addition to the Vincia setup described in Sec. 2.1, we obtained alternative event samples by showering the hard events through Pythia 8.219 [68,69], Herwig 7.0.1 [70,71], and Sherpa 2.2.0 [72], interfaced to their default hadronization and underlying event models.
First, to validate the reliability of our NLL calculation, we want to explore the impact of nonperturbative effects on the parton showers. In Sec. 3.3 we noted that hadronization effects should generically be minimal provided parameters are chosen at or above the values given in Eqs. n SD distributions. Though not shown, the other three parton shower generators also exhibit comparable nonperturbative shifts. Next, we show that all parton shower generators predict that soft drop multiplicity is a relatively good quark/gluon discriminant. In Fig. 13, we compare n SD with β = −1 and β = −0.5 to jet mass and n tr for each generator separately. For β = −1, soft drop multiplicity provides a significant improvement over generic additive observables but does not quite  Fig. 10a where nonperturbative parameter values push the performance of n SD to match n tr .) The ordering of the ROC curves is roughly the same between the four generators, though the absolute discrimination power does differ. Finally, we can directly compare our NLL predictions to the parton shower generators. In Fig. 14, we show the n SD distributions and ROC curves for both β = −1 and β = −0.5. When interpreting these curves, one has to remember that the NLL prediction does not include nonperturbative effects. The quark distributions are roughly similar between the various generators, but there is a larger spread in the gluon distributions, a feature also seen in the study of Refs. [20,23]. It is interesting to note that both Vincia and Sherpa, as well as our NLL calculation, predict rather strong discrimination power, in better agreement with Pythia than with Herwig. This highlights the importance of carrying out these analytic calculations to even higher accuracy, in order to better understand the desired behavior for these parton shower generators.

Calculations for Collinear-Unsafe Soft Drop Multiplicity
Thus far, we have focused on choices of ISD parameters where the quark/gluon discrimination power could be predicted using perturbation theory. In the section, we consider the special case of θ cut = 0 and β = 0, where the soft drop multiplicity is collinear unsafe but still soft safe, allowing us to calculate its RG evolution.

Review of Generalized Fragmentation Functions
To study observables with purely collinear final-state divergences, one can use the formalism of GFFs. Ordinary fragmentation functions are well-known objects in QCD which describe the fragmentation of a quark or gluon into a single hadron. GFFs are nonperturbative objects that describe the fragmentation of a quark or gluon into correlated sets of hadrons. The GFF technique has already been applied successfully to weighted jet charge [10,73], track functions [52,53], and generalized angularities [18], and a forthcoming paper explores the broader space of observables described by GFFs [45]. Each collinear-unsafe observable x has an associated set of GFFs, F i (x, µ), where i labels each quark flavor, anti-quark flavor, and gluon. They are normalized to have unit integral, and at leading order, they have the interpretation of the probability of parton i to yield the observable value x. In higher-order partonic calculations, the GFFs absorb collinear divergences and pick up dependence on the RG scale µ. While the GFFs themselves cannot be calculated using perturbation theory, their RG evolution is calculable. Ordinary fragmentation functions exhibit linear DGLAP evolution [65,74,75], whereas GFFs in general have non-linear evolution equations which can even involve mixing between different sets of GFFs. As shown in Ref. [45], though, for observables defined on a pairwise clustering tree, the evolution equations for the GFFs greatly simplify. These observables are called fractal jet observables, since their RG evolution is reminiscent of the fractal structure of the parton shower. For θ cut = 0 and β = 0, soft drop multiplicity (and its weighted variant) is an example of a fractal jet observable, allowing us to use the GFF formalism.
It is important to emphasize that the GFF formalism only works for purely collinear divergences. For θ cut = 0 but β > 0, there are mixed soft-collinear divergences in the simultaneous z → 0 and θ → 0 limits. These correlated diverges would require additional regulators, similar in spirit to rapidity regularization [76] (see also [77]). The use of fragmentation functions to study the β = 0 limit was previously considered in Ref. [78] to study the soft-dropped z g distribution (which is the same as z 1 for ISD).
Following Ref. [45], consider a fractal observable x defined recursively on an IRC-safe binary clustering tree as follows. Each final-state hadron is assigned a starting weight w a , which serves as the initial seed for the observable, and the observable x is built recursively according to x =x(z, where z ∈ [0, 1] is the momentum fraction of the 2 → 1 merging, and x 1 and x 2 are the values of the observable (or the starting weight w a ) on the daughter nodes. Note thatx is independent of the opening angle θ of the merging, and the only angular dependence comes through the choice of clustering tree. The leading-order RG evolution for the GFFs associated with x is 3) where P i→jk (z) is the splitting function. At this order, the evolution equation (but not the observable itself) is independent of the choice of clustering tree. Note that the evolution equation is also independent of the starting weights w a , which are effectively encoded in the low-scale initial conditions for F i . Even though the clustering tree is IRC safe, x is generally collinear unsafe, since Eq. (5.2) allows an exactly collinear splitting to change the observable.
The canonical RG scale for a generic GFF is and if we can extract the functional form of F i (x, µ) at a low scale, we can use Eq. (5.3) to predict their form at a higher scale. The RG equations have the same recursive structure as a parton shower, and we can use the numerical techniques of Ref. [45] to evolve the GFFs in µ. As we will see, our observable of interest actually has a linear evolution equation, which greatly simplifies the numerical treatment.

Linear Evolution for Soft Drop Multiplicity
For θ cut = 0 and β = 0, soft drop multiplicity is an example of a fractal observable. More generally, any ISD observable of the form is a fractal observable. Using C/A for the binary clustering tree with starting weights w a = 0, the recursion relation for this general observable iŝ The four cases check which subjet is harder and whether the softer subjet passes soft drop. If the softer subjet fails soft drop (i.e. min(z, 1 − z) < z cut ), then the observable value is unchanged. If the softer subjet passes soft drop, then the f (z) (or f (1 − z)) value of the splitting enters linearly into the observable. The recursion relation in Eq. (5.6) takes a particularly simple form, since each of the four cases involves either x 1 or x 2 , but not both. This allows us to rewrite the RG evolution from Eq. (5.3) in the form (5.7) where we have simplified using the identity P i→jk (z) = P i→kj (1 − z). This evolution equation is linear, and hence is numerically no more difficult to solve than the ordinary DGLAP equations. This form holds both for the ordinary soft drop multiplicity as well as for the weighted variants in App. A, just with a different choice of f (z).

Evolution for Pure Yang-Mills
Before showing numerical results, it is instructive to consider the case of n f = 0, where there is only a gluon GFF and the evolution can be studied analytically. Of course, this limit cannot teach us anything about quark/gluon discrimination directly, but we will see that the gluon GFF asymptotes to an exact Poisson distribution at sufficiently large µ, such that it behaves like an idealized counting observable.
For pure Yang-Mills, we can drop flavor labels, and write the gluon GFF as F ≡ F g and the relevant splitting function as P (z) ≡ P g→gg (z). Specializing to soft drop multiplicity (i.e. f (z) = 1), the evolution equation in Eq. (5.7) becomes where we have defined The interpretation of Eq. (5.9) is that gluon emissions that pass soft drop are added at a rate of P ave α s (µ)/2π in log µ evolution. Specifically, in evolving from µ i to µ f , the expected number of additional emissions is where the convolution is in x. 12 As µ f increases, more emissions are added, so the initial GFF distributions at µ i becomes less and less important. Substituting in the one-loop running of the strong coupling constant in pure Yang-Mills, the number of expected emissions is (5.14) Since this quantity continues to grow at high µ f , the IR boundary condition F(x, µ i ) is irrelevant in the µ f → ∞ limit, yielding the asymptotic form Thus, we find a Poisson distribution whose mean scales as log log µ, such that the soft drop multiplicity acts like an idealized counting observable.

Comparison to Parton Showers
We now compare the results of the GFF approach to parton shower predictions. First, in Fig. 15, we show the predicted discrimination power for the collinear-unsafe n SD from the same four parton showers studied in Sec. 4.3. We see that for low z cut values, the discrimination power of the collinear-unsafe soft drop multiplicity approaches that of our benchmark IRCsafe soft drop multiplicity, previously shown in Fig. 13. (It does not, however, reach the power of the nonperturbative soft drop multiplicities shown in Fig. 10a.) Making z cut any smaller does not significantly improve discrimination power, so we use z cut = 0.02 as our baseline parameter choice.
To make a prediction using the GFF approach, we need to extract the nonperturbative distributions at a low scale and then evolve them to a higher scale. In a full analysis, the low scale distributions would be extracted from data, but here we can use the parton shower  Figure 15: Same as Fig. 13, but for the collinear-unsafe soft drop multiplicity with θ cut = 0 and β = 0.
generators. For this, we switch to e + e − collisions, generating pure quark and gluon samples through the processes e + e − → γ/Z * → qq and e + e − → H * → gg in Vincia 2.0.01. Setting R 0 = 0.6 as our baseline, we generate jets with energies in a 10% window of E jet = 400 GeV, corresponding to µ = E jet R 0 = 240 GeV. We then extract n SD from the generated events, which at leading order, is a direct measure of the corresponding GFFs. 13 Using Eq. two-loop running of α s . 14 This evolution includes all 10 active quark and antiquark flavors, as n f = 5 in this energy range. 15 There are various sources of theoretical uncertainties in 14 Since we only consider the leading-order evolution of the GFFs, strictly speaking, only leading-order evolution of αs is needed at this order. Switching to one-loop running has a negligible effect on the results of this section. 15 For simplicity, we ignore effects due to the g → tt splitting, which would require a matching calculation the evolved result, and we highlight two of them in this study. The first contribution is due to the fact that the energy scale Eq. (5.4) only depends on the product E jet R 0 , though the initial distributions could be extracted with any R 0 . To estimate this uncertainty, which serves as a consistency check of the choice of µ scale, we also extract GFFs with R 0 = 0.3 and R 0 = 0.9, keeping µ fixed. The second contribution is from uncertainty in the absolute value of the energy scale itself. To address this, we perform evolution with both half and double the energy scale of Eq. (5.4). We plot the envelope of these 9 results in a shaded uncertainty band. Of course, this is only a subset of the possible GFF uncertainties, but a full study is beyond the scope of this work. The results for z cut = 0.02 and z cut = 0.1 are shown in Fig. 16, comparing the RG-evolved results to Vincia distributions extracted at the high scale. To show a single curve for quark jets, we plot the quark-singlet distribution as defined in Ref. [45] (where it is instead denoted by S). We find reasonable agreement between the RG evolution and Vincia for both z cut values, with a larger range of evolution for the case of z cut = 0.02. The uncertainties in the RG evolution do not fully cover the high-scale Vincia distribution, though it is worth emphasizing that we are only using the LO evolution equations.
to the top quark electroweak decay.
In Fig. 17, we show the RG evolution of the quark/gluon ROC curves. Despite the fact that the n SD distributions themselves exhibit significant RG evolution, the corresponding ROC curves do not change significantly with the energy scale µ. This is a key prediction of the GFF approach, and one that we can better understand by studying the moments of the GFF distributions.

Moment Space Evolution
To understand the slow evolution of the quark/gluon discrimination power, consider the evolution in moment space. Following Ref. [45], the n th moment of a GFF is defined as In moment space, we denote the gluon GFF by G(n, µ), and the quark-singlet GFF (as defined in Eq. (5.16)) by Q(n, µ). To derive the moment space evolution equations, we integrate both sides of Eq. (5.8) against x n , shift the final integral by x → x + 1, and then simplify the n th moments with the splitting function identities After these manipulations, the moment evolution equation for the n th gluon or quark-singlet GFF can be written solely in terms of the difference G(n) − Q(n), along with lower moments G(k), Q(k) for k < n. For n = 1, the evolution equation for the means is where we are suppressing the µ arguments and using the abbreviated notation The appearance of the difference of the moments on the right-hand side has a dramatic effect on the high-energy limit of the evolution. Specifically, the difference in the means evolves as where c 1 and c 2 are positive constants defined by integrals of the splitting functions. Thus, at high energies, the difference in the means asymptotes to a constant, This asymptotic behavior is strikingly different from the LL analysis of IRC-safe multiplicity in Sec. 3. In the IRC-safe case, the LL prediction is that the gluon and quark means should have a constant ratio determined by C A /C F . Here, in the collinear-unsafe case, the gluon and quark means asymptote to having a constant difference. Physically, this occurs because the RG evolution takes flavor mixing effects into account, so that at sufficiently high energies, the n SD distributions for quark and gluon jets become essentially the same. While we have only presented the calculation for the quark-singlet mean, it is straightforward to show that the means for each individual quark flavor behave in the same way, with differences between different quark flavors evolving to zero.
Moving to higher moments, a useful simplification occurs for the variances, In this case, the evolution of the variances only depends on the difference of the variances and the difference of the means, (5.24) At sufficiently high energies, G(1) − Q(1) approaches a constant, so the evolution equation for the variances is of the same form as the evolution equation for the means. We find that, like the means, the difference of variances asymptotes to a constant, Substituting our asymptotic results back into Eq. (5.19) and Eq. (5.24), we see that both the mean and variance simply grow linearly in α s (µ)d(log µ) at high energies, so that they become proportional in the UV limit. Therefore, even with flavor-mixing effects, the soft drop multiplicity maintains a Poisson-like distribution, with σ 2 = O(µ). We can roughly estimate the discrimination power of the soft drop multiplicity using a relative width, similar to that of Eq. (3.13). Since Casimir scaling no longer holds, the distance between the quark-singlet and gluon distributions is no longer characterized by the means, but rather the difference in means. Moreover, in the UV limit, the standard deviations of the quark singlet and gluon distributions approach each other. Thus, the quantity characterizes the extent to which the distributions overlap, and hence measures the discrimination power of the soft drop multiplicity. We see that, as a result of flavor-mixing effects, the relative width is now expected to increase somewhat as more emissions are counted, roughly as the square root of the mean. Quark/Gluon Differences R = 0.6, z cut = 0.02 Figure 18: (a) RG evolution of means and variances of the quark-singlet and gluon GFFs for the soft drop multiplicity with z cut = 0.02. (b) RG evolution of the mean/variance differences, which asymptotically approach constants. Also shown is the relative width w rel defined in Eq. (5.26), which increases slowly. For comparison, quantities extracted from Vincia at E jet = 4 TeV are shown as dots.
To verify these results, we numerically evolve the GFFs according to Eq. (5.7), starting from an initial condition extracted from Vincia 2.0.01 at E jet = 400 GeV and R = 0.6. As in Fig. 16, we show a theoretical uncertainty band constructed from the envelope of 9 results. In Fig. 18a, we show the evolution of the mean and variance of the soft drop multiplicity for quark singlets and gluons. As expected from the above analysis, the mean and variance curves become parallel at sufficiently large values of µ. This is confirmed in Fig. 18b, which shows that the differences do indeed asymptote to constant values.
Crucially, the relative width in Fig. 18b remains approximately constant over a large energy range, as the increase in the standard deviation is canceled by the increase in the mean difference as it approaches its asymptotic value. This explains the slow evolution of discrimination power seen in Fig. 17. In this way, even though these collinear-unsafe distributions cannot be predicted directly from first principles, the GFF approach gives us a valuable analytic handle on their RG evolution.

Conclusions
Quark/gluon discrimination has a long history, with many proposed discriminants [15,18,21,22,25,73,[79][80][81][82][83][84][85] though relatively few analytic calculations [17][18][19]. Because C A /C F is an order 1 number, distinguishing quark-from gluon-initiated jets is an intrinsically hard problem. Moreover, to gain a quantitative understanding of quark/gluon separation power, one has to account for physics effects beyond the LL approximation, including the impact of nonperturbative physics. These physics effects are modeled to differing degrees in parton shower generators, but ultimately one wants quark/gluon studies to be based on systematicallyimprovable analytic calculations.
In this paper, we introduced an IRC-safe counting observable which approaches the quark/gluon discrimination performance of IRC-unsafe track multiplicity. Through a LL analysis, we demystified the power of multiplicity, showing that Poisson distributions typically yield better quark/gluon separation than Sudakov distributions, even though they are both controlled by the same C A and C F Casimir factors. Specifically, we introduced soft drop multiplicity, which depends on multiple soft gluon emissions even at LL accuracy, allowing it to outperform observables like jet mass whose value is dominated by a single gluon emission. Remarkably, there is a choice of ISD parameters where soft drop multiplicity is controlled by perturbative physics, such that its behavior can be reliably studied from first principles.
To gain a more quantitative understanding of n SD , we introduced NLL evolution equations, which allowed us to make interesting comparisons to parton shower generators. We also studied a collinear-unsafe (but infrared-safe) version of n SD , whose RG evolution could be studied using the formalism of GFFs. In both cases, analytic understanding was aided by the recursive structure of the observable. This motivates further studies into jet measurements performed on (groomed) clustering trees, which can depart significantly from the more commonly studied additive observables.
Ultimately, any single observable will never match the performance of multivariate jet tagging methods. This has been emphasized recently in the context of deep neural networks which exploit subtle correlations to maximize separation power [21,[86][87][88][89][90][91][92][93][94][95]. Still, we are encouraged by observables like soft drop multiplicity which offer a balance between discrimination power and analytic tractability. Going beyond LL order where n SD can saturate the discrimination power (see Sec. 3.3), it would be interesting to study correlations between n SD and other IRC-safe observables like jet mass to see if there is additional information in their combination. Because the physics basis for n SD is so transparent, we suspect it will be a useful benchmark for both parton shower tuning and experimental jet analyses. Because the analytic structure of n SD is so unique, we hope it inspires new precision calculations in QCD.  Figure 19: Quark/gluon discrimination power of weighted soft drop multiplicity as a function of κ, at the benchmark parameters from Eq. (2.7). We also show the limit κ → ∞, which is equivalent to max(z n ).

A Weighted Soft Drop Multiplicity
At the end of Sec. 3, we used LL reasoning to argue that soft drop multiplicity n SD extracts all of the quark/gluon discriminatory information from the (z n , θ n ) variables recorded by ISD. In this appendix, we study a variant of n SD , the weighted soft drop multiplicity, defined in Eq. (2.8) and repeated for convenience: While quark/gluon performance is not improved by weighting, the purpose of this appendix is to demonstrate that the techniques of this paper are applicable to a variety of observables.

A.1 Discrimination Power
For small values of κ, the weighted soft drop multiplicity is still sensitive to all emissions in the region A emit . On the other hand, as κ → ∞, only the largest z n value contributes significantly to the observable. As a result, the weighted multiplicity interpolates between counting and additive behavior, in the limits κ → 0 and κ → ∞, respectively. The κ dependence of the discrimination power, extracted from Vincia, is shown in Fig. 19. One can see that the quark/gluon performance decreases monotonically as κ increases. The LL distribution of the weighted soft drop multiplicity is analytically complicated. Indeed, any analytic expression for it must contain a sum of distributions, one for each value of the number n of counted emissions. For example, when β ≤ 0, each emission contributes at least z κ cut , so at most n emissions can contribute to n (κ) SD if its value is below n z κ cut . A full analysis along these lines is carried out in App. A.2 below.
To qualitatively understand the trend in Fig. 19, consider the limit in which ISD records many emissions. Strictly speaking, this analysis is not quantitatively applicable in the perturbative regime, where n 10 emissions are counted. Nor is this reasoning applicable in the collinear-unsafe regime studied in App. A.3, where solely perturbative reasoning is insufficient. Nonetheless, the many-emission limit serves to build intuition.
In the double-logarithmic approximation, where emissions are soft and collinear and α s is a fixed coupling, the weighted multiplicity distribution can be found from summing independent identically distributed numbers. By the central limit theorem, this converges to a normal distribution in the limit of many recorded emissions. In this limit, it suffices to compute the mean and variance of n (κ) SD to estimate its discrimination power. These are determined at lowest order from the average values of z κ and z 2κ in the allowed emission region as n (κ) With a fixed coupling, the mean value of z κ for β > 0 is For β < 0, the mean value is A emit z κ β<0 = Θ θ cut − (2z cut ) Because of the ρ i prefactor in Eq. (A.2), we see that the mean and variance once again satisfy Casimir scaling as in Eq. (3.11). Moreover, both the variance and mean scale with the counted area A emit , establishing that the weighted soft drop multiplicity is Poisson-like distributed as defined in Sec. 3. The discrimination power is determined by the relative width We can get a sense for the behavior of w rel by considering two extreme limits. For κ → 0 and any choice of β, the mean value z κ (and hence w rel ) approaches a constant, independent of κ. For κ → ∞, the mean value scales with κ like A emit z κ κ→∞ ∼ 1 2 κ κ , (A.7) with z cut < 1/2, such that the relative width scales as w κ→∞ rel ∼ √ κ . (A.8) Since the relative width increases with increasing κ, this reasoning predicts that the discrimination power decreases as κ increases. This implies the best discrimination power is attained for κ = 0 (i.e. ordinary soft drop multiplicity) and decreases for higher κ. Physically, the discrimination power of n (κ) SD comes from sensitivity to multiple emissions, and for higher κ, sensitivity to softer emissions is decreased. In the extreme limit of κ → ∞, the weighted soft drop multiplicity reduces to the energy fraction of the hardest emission, n κ→∞ SD = max(z n ). This qualitatively explains the trend seen in Fig. 19, i.e. that the discrimination power monotonically decreasing as κ increases. In the limit κ → ∞, the discrimination power reaches the universal result predicted by Casimir scaling (slightly off due to small nonperturbative corrections), as the observable max(z n ) is determined by a Sudakov form factor.

A.2 Analytic Calculation
Using evolution equations similar to those employed in Sec. 4, we can compute the distribution of IRC-safe weighted soft drop multiplicities. We will demonstrate this here at LL for simplicity; by taking into account flavor changes and energy losses, one could obtain NLL evolution equations as in Sec. 4.2. Since n (κ) SD is a continuous observable, however, significantly more computation time would be required to compute its NLL distribution, in comparison to the discrete unweighted case.
Let p i (n SD , θ cut ) dn SD denote the differential probability that, given a flavor i jet, its weighted soft drop multiplicity is measured to be n SD . Here, we leave the z cut , β, and κ dependence implicit. Though the weighted soft drop multiplicity does not directly count emissions, it is still useful to keep track of the number of contributing emissions, using p i (n SD , θ cut ) = ∞ n=0 p i n (n SD , θ cut ) , (A. 9) where n labels the number of counted emissions as before. If we change the resolution angle from θ cut to θ cut − δθ cut , then p i n (n SD , θ cut − δθ cut ) = p i n (n SD , θ cut ) 1 − δθ cut θ cut 1/2 0 dz α s (z θ p T ) π P i→i (z) Θ SD (z, θ) + δθ cut θ cut 1/2 0 dz α s (z θ p T ) π P i→i (z) Θ SD (z, θ) p i n−1 (n SD − z κ , θ cut ) . (A.10) This leads to a linear differential equation. Instead of the Poisson distribution found in Sec. 4.1, the solution in this case is differential in n SD = i z κ i : p i n (n SD , θ cut ) (A.11) In the perturbative regime, the behavior of n (κ) SD is most clearly seen on a logarithmic scale. Two example LL distributions are displayed in Fig. 20 and compared to results from Vincia. In these examples, soft drop parameters were chosen to demonstrate that the sharp features of the n (κ) SD distributions are indeed captured by the LL evolution equations. These sharp features result from the edges of the p i n (n SD , θ cut ) distributions for different values of n. For example, with β ≤ 0, the p i n (n SD , θ cut ) distribution only has support on the interval [n z κ cut , n 2 κ ].

A.3 Collinear-Unsafe Evolution
In the case of a collinear-unsafe weighted soft drop multiplicity with β = 0 and θ cut = 0, we can apply the methods of Sec. 5. Specifically, after extracting the GFF at some RG scale µ, we can use Eq. (5.7) with the particular choice f (z) = z κ to predict the upwards evolution. In Fig. 21, we compare the result of the RG evolution for z cut = 0.01 and κ = 1 to Vincia, finding overall good agreement. By eye, one can see that these κ = 1 distributions do not yield as good separation power as the κ = 0 distributions shown in Fig. 16, though the degree of RG evolution is similar for both the weighted and unweighted cases.