Towards an understanding of jet substructure

We present first analytic, resummed calculations of the rates at which widespread jet substructure tools tag QCD jets. As well as considering trimming, pruning and the mass-drop tagger, we introduce modified tools with improved analytical and phenomenological behaviours. Most taggers have double logarithmic resummed structures. The modified mass-drop tagger is special in that it involves only single logarithms, and is free from a complex class of terms known as non-global logarithms. The modification of pruning brings an improved ability to discriminate between the different colour structures that characterise signal and background. As we outline in an extensive phenomenological discussion, these results provide valuable insight into the performance of existing tools and help lay robust foundations for future substructure studies.


Introduction
The Large Hadron Collider (LHC) at CERN is increasingly exploring phenomena at energies far above the electroweak scale. One of the features of this exploration is that analysis techniques developed for earlier colliders, in which electroweak-scale particles could be considered "heavy", i.e. slow-moving, have to be fundamentally reconsidered at the LHC. In particular, in the context of jet-related studies, the large boost of electroweak bosons and top quarks causes their hadronic decays to become collimated inside a single jet. Consequently a vibrant research field has emerged in recent years, investigating how best to identify the characteristic substructure that appears inside the single "fat" jets from electroweak scale objects, as reviewed in Refs. [1][2][3]. In parallel, the "tagging" and "grooming" methods that have been developed have started to be tested and applied in numerous experimental analyses (e.g. Refs [4][5][6][7] for studies on QCD jets and Refs [8][9][10][11][12][13][14] for searches). The taggers' and groomers' action is twofold: they aim to suppress or reshape backgrounds, while retaining signal jets and enhancing their characteristic jet-mass peak at the W/Z/Higgs/top/etc. mass. Nearly all the theoretical discussion of these aspects has taken place in the context of Monte Carlo simulation studies (see for instance Ref. [2] and references therein), with tools such as Herwig [15,16], Pythia [17,18] and Sherpa [19]. While Monte Carlo simulation is a powerful tool, its intrinsic numerical nature can make it difficult to extract the key characteristics of individual substructure methods and understand the relations between them. As an example of the kind of statements that exist about them in the literature, we quote from the Boost 2010 proceedings: The [Monte Carlo] findings discussed above indicate that while [pruning, trimming and filtering] have qualitatively similar effects, there are important differences. 
For our choice of parameters, pruning acts most aggressively on the signal and background followed by trimming and filtering.
While true, this brings no insight about whether the differences are due to intrinsic properties of the substructure methods analysed or instead due to the particular parameters that were chosen; nor does it allow one to understand whether any differences are generic, or restricted to some specific kinematic range, e.g. in jet transverse momentum. Furthermore there can be significant differences between Monte Carlo simulation tools and among tunes (see e.g. [2,4,7,20]), which may be hard to diagnose experimentally, because of the many kinds of physics effects that contribute to the jet structure (final-state showering, initial-state showering, underlying event, hadronisation, etc.). Overall, this points to a need to carry out analytical calculations to understand the interplay between tagging/grooming techniques and the quantum chromodynamical (QCD) showering that occurs in both signal and background jets.
So far there have been three main investigations into the analytical features that emerge from substructure techniques. Refs. [21,22] investigated the mass resolution that can be obtained on signal jets and how to optimize the parameters of a method known as filtering [23]. Ref. [24] discussed constraints that might arise if one is to apply Soft Collinear Effective Theory (SCET) to jet substructure calculations. Ref. [25] observed that for narrow jets the distribution of the N -subjettiness shape variable [26] for 2-body signal decays can be resummed to high accuracy insofar as it is related to the thrust distribution in e + e − [27][28][29][30], though for phenomenological purposes this still needs to be supplemented with a calculation of the interplay with practical cuts on the jet mass. Other calculations that relate to the field of jet substructure include those of planar flow [31], energy-energy correlations [32] and jet multiplicities in the small-jet-radius limit [33]. Additionally Ref. [34] has examined the extent to which simple approximations about the kinematics involved in tagging and grooming can bring insight into different methods.
Here we embark on a comparative, analytical study of multiple commonly-used taggers and groomers. Ideally we would include all existing methods for both background (QCD jet) and signal-induced jets, however given the many techniques that have been proposed, this would be a gargantuan task. In practice we find that a background-only study, for just a handful of substructure techniques, already brings significant insight into the way the taggers function.
The three commonly used methods that we concentrate on are: the mass-drop tagger (MDT) [23], pruning [35,36] and trimming [37]. They all involve the identification of subjets within an original jet, and share the characteristic that they attempt to remove subjets carrying less than some (small) fraction of the original jet's momentum.
To provide a starting point for our discussion, consider Fig. 1, which shows Monte Carlo simulation for the mass distribution of tagged/groomed jets with the three substructure methods considered here (and also for the plain jet mass), plotted as a function of the variable $\rho = m^2/(p_t^2 R^2)$, where $m$ is the jet's mass, $p_t$ its transverse momentum and $R$ the radius for the jet definition; the upper axis gives the correspondence in terms of jet mass for jets with $p_t = 3$ TeV. The left-hand plot is for quark-induced jets, the right-hand plot for gluon-induced jets. A first observation is that all three methods are identical to the plain jet mass for $\rho \gtrsim 0.1$. At that point, pruning and MDT have a kink, and in the quark-jet case exhibit a flat distribution below the kink. Trimming has a kink at a lower mass value, and also then becomes flat. For gluon jets, the kinks appear in the same location, but below the kink there is no flat region. Pruning and trimming then each have an additional transition point, at somewhat smaller $\rho$ values, below which they develop peaks that are reminiscent (but at lower $\rho$) of that of the plain jet mass.

Figure 1. The distribution of $\rho = m^2/(p_t^2 R^2)$ for tagged jets, with three taggers/groomers: trimming, pruning and the mass-drop tagger (MDT). The results have been obtained from Monte Carlo simulation with Pythia 6.425 [17] in the DW tune [38] (virtuality-ordered shower), with a minimum $p_t$ cut in the generation of 3 TeV, for 14 TeV pp collisions, at parton level, including initial and final-state showering, but without the underlying event (multiple interactions). The left-hand plot shows $qq \to qq$ scattering, the right-hand plot $gg \to gg$ scattering. In all cases, the taggers have been applied to the two leading Cambridge/Aachen [39,40] jets ($R = 1.0$). The parameters chosen for mass-drop ($y_{\rm cut} = 0.09$, $\mu = 0.67$), pruning ($z_{\rm cut} = 0.1$, $R_{\rm fact} = 0.5$) and trimming ($z_{\rm cut} = 0.05$, $R_{\rm sub} = 0.3$) all correspond to widely-used choices.
Knowing about such features can be crucial, for example in data-driven background estimates, where there is often an implicit assumption of smoothness of background shapes. In this context one observes that for the upper range of $p_t$'s that the LHC will eventually cover, $p_t \gtrsim 3$ TeV, the lower transition points of pruning and trimming occur precisely in the region of electroweak-scale masses. To our knowledge the similarities and differences observed in Fig. 1 have not been systematically commented on before, let alone understood. Questions that one can ask include: why do the taggers/groomers have these characteristic shapes for the mass distributions?
Is there any significance to the fact that pruning and MDT appear very similar over some extended range of masses? How do the positions of the kinks and transition-points depend on the substructure methods' parameters? Good taggers and groomers should probably not generate such rich structures for the background shapes and, as we shall see, a deeper understanding can point to desirable modifications of these methods. Finally, what classes of perturbative terms are associated with the substructure techniques, specifically what kinds of logarithms of jet mass arise at each order in the strong coupling α s and what are the implications for the likely reliability of fixed-order, resummed and Monte Carlo predictions? These are the types of question that we shall address here. A companion paper [42] discusses the first two orders of log-enhanced terms in substantially more depth and includes comparisons to fixed-order results for jets in e + e − collisions.

Definitions and approximations
Let us start with a question of nomenclature: tagging vs. grooming, for which there is no general agreement. One definition of grooming that is in widespread use is that, given an input jet, a groomer is a procedure that always returns an output jet, although possibly with a different mass. A tagger could then instead be construed as a procedure that might sometimes not return an output jet (so pruning and trimming are groomers, while the mass-drop method is a tagger).
An alternative definition of grooming comes from the 2010 Boost report [1], and is more restricted: grooming is "elimination of uncorrelated UE/PU radiation from a target jet". With this definition, consider a signal jet, say from W or top decay: in the absence of showering, hadronisation, underlying-event or pileup, the groomed version of the jet should be identical to the original, ungroomed jet, because there is no radiation to groom away. A tagger would instead be a procedure that, through a combination of cuts (e.g. on an invariant mass, but also internal jet variables), rejects background jets more often than it rejects signal jets. In this definition even a simple cut on plain jet mass is to be considered a tagging step and all the procedures that we consider here involve both tagging and grooming elements when they are used in conjunction with a mass cut. For simplicity we will just refer to them as taggers.
The techniques that we will be investigating have, in general, quite complicated dynamics. To help make their analysis tractable, we shall focus on their behaviour for small values of the ratio $\rho = m^2/(p_t^2 R^2)$, considering the differential distribution $\frac{\rho}{\sigma}\frac{d\sigma}{d\rho}$, or its integral up to some value $\rho$, $\Sigma(\rho) = \frac{1}{\sigma}\int^{\rho} d\rho'\, \frac{d\sigma}{d\rho'}$, which we shall call the integrated distribution. We will work with jet algorithms in the limit of small jet radius $R$. This enables us to consider only the radiation from the parton that initiated the jet, and to ignore considerations such as large-angle radiation from other final-state partons and from the initial-state partons. In practice the small-$R$ approximation is known to be reasonable even up to quite large values of angle $\sim 1$ [43,44].
When considering multiple emissions, we will assume that they are ordered either in angle or in energy. This kind of approximation, together with an appropriate treatment of the running coupling, is generally sufficient to obtain what is known as single-logarithmic accuracy, i.e. terms $\alpha_s^n \ln^n (m/p_t)$ in the integrated distribution. Note that we will not always aim for single-logarithmic accuracy, and the specific accuracy we reach will be different for each tagger, in part because the complications that one encounters differ substantially for each one. In terms of choosing what accuracy to aim for, our guiding principle will be to capture the key features of each tagger. In many cases we will supplement our full results with versions in a fixed-coupling approximation, often easier to assimilate, while nevertheless encoding the essence of the results. When examining fixed-order expansions of the results, we will label our results with "LO" (leading order) and "NLO" (next-to-leading order). It is understood that these expressions are not the full fixed-order results but, rather, their logarithmically enhanced parts.
All of the taggers that we consider involve a parameter called y cut or z cut that effectively cuts on the energy fraction of soft radiation. Since the taggers tend to be used with values of these parameters in the range 0.05 − 0.15, it will be legitimate to assume that terms suppressed by powers of y cut or z cut can be neglected. However, given that y cut or z cut are not usually taken parametrically small, we shall not systematically resum logarithms of y cut or z cut , even if such a resummation could conceivably be carried out.
Our results will apply to jets produced both at hadron colliders and at $e^+e^-$ colliders. We will imagine the hadron-collider jets to be produced at rapidity $y = \frac12 \ln\frac{E+p_z}{E-p_z} = 0$, as a result of which $E = p_t$ and the boost-invariant angular separations $\Delta_{ij}^2 = (y_i - y_j)^2 + (\phi_i - \phi_j)^2$ are equal to angular separations $\theta_{ij}^2$ for small $\theta_{ij}$. Thus results will be identical whether we use hadron-collider ($p_t$ and $\Delta$ based) or $e^+e^-$ ($E$ and $\theta$ based) formulations of the jet algorithms. For simplicity of notation we will use energies and angles as our main variables.
In the introduction we already defined the variable $\rho = m^2/(p_t^2 R^2)$ (or equivalently $\rho = m^2/(E^2 R^2)$). In the small-angle approximation, $\rho$ is invariant under boosts along the jet direction, since they scale the jet $p_t$ up by some factor (say $\gamma$) and scale its opening angle by the inverse factor ($1/\gamma$) while leaving the mass unchanged. Because of this invariance, the analytical results are often simplest when expressed in terms of $\rho$, rather than separately in terms of $m$, $p_t$ and $R$.
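As a small numerical illustration of the definition and of the boost invariance just described, the following sketch evaluates $\rho$ before and after a boost along the jet direction. The function names and the example values (a jet of mass 80.4 GeV with $p_t = 3$ TeV and $R = 1$) are illustrative assumptions, not taken from the text.

```python
import math

def rho(m, pt, R):
    """Normalised squared jet mass, rho = m^2 / (pt^2 R^2)."""
    return (m / (pt * R)) ** 2

def mass_from_rho(rho_value, pt, R):
    """Invert the definition: jet mass corresponding to a given rho."""
    return math.sqrt(rho_value) * pt * R

def boost_along_jet(m, pt, R, gamma):
    """Small-angle boost along the jet axis (hypothetical helper):
    pt scales by gamma, angles scale by 1/gamma, the mass is unchanged."""
    return m, gamma * pt, R / gamma

# Illustrative numbers only: an electroweak-scale mass in a 3 TeV jet.
before = rho(80.4, 3000.0, 1.0)
after = rho(*boost_along_jet(80.4, 3000.0, 1.0, gamma=2.5))
print(before, after)  # the two values agree up to rounding
```

Because $p_t R$ is unchanged by the boost, the two printed values coincide, which is why $\rho$ is a natural variable for the results that follow.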
All jets will be assumed to have been found with the Cambridge/Aachen (C/A) algorithm [39,40], which is the algorithm of choice for both the mass-drop tagger and pruning. In its hadron-collider version, the algorithm successively recombines the pair of particles with the smallest ∆ ij , until no pairs are left with ∆ ij < R. All objects that remain at this stage are called jets. The e + e − version of the algorithm simply replaces ∆ ij with θ ij .
Finally, we will explicitly derive results only for quark-initiated jets. This is for reasons of brevity: gluon-initiated jets are no more complicated to consider, usually involving just trivial modifications of the results that we give. Results for gluon jets are collected in appendix A.
The companion paper [42], limited to the first two perturbative orders in $e^+e^-$ collisions, lifts the small-$R$ and small-$y_{\rm cut}$ (or $z_{\rm cut}$) approximations.

Figure 2. Lund diagrams [45] represent emission kinematics in terms of two variables: vertically, the logarithm of an emission's transverse momentum $k_t$ with respect to the jet axis, and horizontally, the logarithm of the inverse of the emission's angle $\theta$ with respect to the jet axis, i.e. its rapidity with respect to the jet axis. Here the diagram shows a line of constant jet mass, together with a shaded region corresponding to the part of the kinematic plane where emissions are vetoed, leading to a Sudakov form factor.

Recap of plain jet mass
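For concreteness, and subsequent reference, it is perhaps worthwhile writing the integrated jet-mass distribution (for quark-initiated jets) with the approximations mentioned above. Let us define

$$D(\rho) = \int_\rho^1 \frac{d\rho'}{\rho'} \int_{\rho'}^1 dz\, p_{gq}(z)\, \frac{\alpha_s(z\rho' R^2 p_t^2)\, C_F}{\pi} \qquad (3.1a)$$

$$\simeq \frac{\alpha_s C_F}{\pi}\left[\frac12 \ln^2\frac1\rho - \frac34 \ln\frac1\rho + \mathcal{O}(1)\right] \quad \text{(fixed-coupling approximation)}, \qquad (3.1b)$$

where $p_{gq}(z) = \frac{1+(1-z)^2}{2z}$ is the quark-gluon splitting function, stripped of its colour factor, and the fixed-coupling approximation in the second line helps visualise the double-logarithmic structure of $D(\rho)$.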
To NLL accuracy, i.e. control of terms $\alpha_s^n L^{n+1}$ and $\alpha_s^n L^n$ in $\ln \Sigma(\rho)$, where $L \equiv \ln\frac1\rho$, the integrated jet mass distribution is given by

$$\Sigma(\rho) = e^{-D(\rho)} \cdot \frac{e^{-\gamma_E D'(\rho)}}{\Gamma(1 + D'(\rho))} \cdot \mathcal{N}(\rho). \qquad (3.2)$$

The first factor, which is double logarithmic, accounts for the Sudakov suppression of emissions that would induce a (squared, normalised) jet mass greater than $\rho$. In terms of the "Lund" representation of the kinematic plane [45], Fig. 2, it accounts for the probability of there being no emissions in the shaded region, with the $\frac12 \ln^2 1/\rho$ term in Eq. (3.1b) for $D(\rho)$ coming from the bulk of the area (soft divergence of $p_{gq}$), while the $-\frac34 \ln 1/\rho$ term comes from the hard collinear region (finite $z$). The second factor in Eq. (3.2), defined in terms of $D'(\rho) \equiv \partial_L D$, encodes the single-logarithmic corrections associated with the fact that the effects of multiple emissions add together to give the jet's overall mass. These emissions tend to be close to the constant-jet-mass boundary in Fig. 2. The third factor, also single logarithmic, accounts for modifications of the radiation pattern in the jet (non-global logarithms [47]) and boundaries of the jet (clustering logarithms [48][49][50]) induced by soft radiation near the jet's edge, i.e. near the left-hand, vertical edge of the shaded region. Had we been working with the anti-$k_t$ jet algorithm [51], only the non-global logarithms would have been present, which could then be parametrised (in the large-$N_C$ limit) as a function $S(t)$ of a variable $t(\rho)$, itself an integral over the running coupling [47]. Note that non-global logarithms are moderately problematic, because their resummation [21,47,[52][53][54] has until very recently always been restricted to the large-$N_C$ limit (see footnote 4). In effect, non-global logarithms are the main reason why there does not exist a full resummation of the standard jet mass beyond NLL accuracy (for work towards higher accuracy, see Refs. [58,59]) and why even the NLL calculations have to neglect some of the terms suppressed by powers of $1/N_C^2$, as done in Ref. [44].
To visualize the expected behaviour of the jet mass distribution, we can resort to a fixed-coupling approximation, ignoring all but the first factor in Eq. (3.2), leading to the following differential jet mass distribution

$$\frac{\rho}{\sigma}\frac{d\sigma}{d\rho} = \frac{\alpha_s C_F}{\pi}\left(\ln\frac1\rho - \frac34\right) \exp\left[-\frac{\alpha_s C_F}{2\pi}\left(\ln^2\frac1\rho - \frac32 \ln\frac1\rho\right)\right]. \qquad (3.3)$$

This shows a characteristic initial growth linear in $\ln\frac1\rho$ as $\rho$ decreases, cut off by a Sudakov suppression (the exponent) as $\rho$ decreases further. Both of those features are visible in Fig. 1. It is also simple to use Eq. (3.3) to analytically estimate the position of the peak in $\rho\, d\sigma/d\rho$. It is given by $L_{\rm peak} = 1/\sqrt{\bar\alpha_s} + \mathcal{O}(1)$, where $\bar\alpha_s = \alpha_s C_F/\pi$ for quark jets and $\bar\alpha_s = \alpha_s C_A/\pi$ for gluon jets. Substituting $\alpha_s = 0.12$ gives a reasonable degree of agreement with the Monte Carlo peak positions.
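The peak estimate can be checked numerically. The sketch below is a minimal illustration for quark jets only, assuming a fixed coupling $\alpha_s = 0.12$ and keeping only the Sudakov factor built from the fixed-coupling form of $D(\rho)$ in Eq. (3.1b). Setting the derivative with respect to $L$ to zero gives $L_{\rm peak} = 1/\sqrt{\bar\alpha_s} + 3/4$ in this approximation, and a brute-force scan is compared against that value. All function names are illustrative.

```python
import math

CF = 4.0 / 3.0  # quark colour factor

def plain_mass_fixed_coupling(L, alpha_s=0.12):
    """(rho/sigma) dsigma/drho for quark jets at fixed coupling, L = ln(1/rho).

    Only the first (Sudakov) factor of the resummed result is kept.
    """
    abar = alpha_s * CF / math.pi
    return abar * (L - 0.75) * math.exp(-abar * (0.5 * L * L - 0.75 * L))

def peak_L(alpha_s=0.12):
    """Stationary point of the expression above: L = 1/sqrt(abar) + 3/4."""
    abar = alpha_s * CF / math.pi
    return 1.0 / math.sqrt(abar) + 0.75

def scan_peak(alpha_s=0.12, L_max=15.0, n=15000):
    """Brute-force location of the maximum on a uniform grid in L."""
    grid = [L_max * i / n for i in range(1, n + 1)]
    return max(grid, key=lambda L: plain_mass_fixed_coupling(L, alpha_s))

print(peak_L(), scan_peak())  # analytic and scanned peak positions in L
```

The constant $3/4$ is the $\mathcal{O}(1)$ term in the peak position for this particular approximation; other subleading effects would shift it.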

Trimming
Trimming [37], in the variant that is most widely used today, takes all the particles in a jet of radius $R$ and reclusters them into subjets with a jet definition with radius $R_{\rm sub} < R$. All resulting subjets that satisfy the condition $p_t^{\rm (subjet)} > z_{\rm cut}\, p_t^{\rm (jet)}$ are kept and merged to form the trimmed jet (see footnote 5). The other subjets are discarded. While our Monte Carlo results are obtained using the Cambridge/Aachen algorithm (for both the original jet finding and the reclustering), at the accuracy that we shall consider here, our analytical results will hold independently of the jet algorithm used, at least for any member of the generalised-$k_t$ family [39,40,51,60,61].

Footnote 4: A resummation at finite $N_C$ has been performed in Ref. [55], using an approach initially developed in Ref. [56]. Some of the complications that occur beyond leading $N_C$ have also been explored in [57], finding terms enhanced by additional logarithms that are associated with emissions collinear to the beam directions.

Footnote 5: In usual formulations of trimming, the parameter that we refer to as $z_{\rm cut}$ is called $f_{\rm cut}$. We use $z_{\rm cut}$ in order to emphasize the connection with the parameters used in other taggers.
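The selection step of trimming can be sketched in a few lines. The example below assumes that the reclustering into subjets of radius $R_{\rm sub}$ has already been done by some jet-finding code, and that subjets are supplied as four-momentum tuples `(E, px, py, pz)`. The helper names and the toy momenta are hypothetical and serve only as an illustration.

```python
import math

def add(momenta):
    """Sum four-momenta given as (E, px, py, pz) tuples."""
    if not momenta:
        return (0.0, 0.0, 0.0, 0.0)
    return tuple(sum(c) for c in zip(*momenta))

def pt(p):
    """Transverse momentum of a four-momentum."""
    return math.hypot(p[1], p[2])

def mass(p):
    """Invariant mass, with small negative m^2 from rounding clipped to zero."""
    m2 = p[0] ** 2 - p[1] ** 2 - p[2] ** 2 - p[3] ** 2
    return math.sqrt(max(m2, 0.0))

def trim(subjets, z_cut=0.05):
    """Keep subjets with pt > z_cut * pt(jet) and merge them into the trimmed jet."""
    jet_pt = pt(add(subjets))  # the original jet is the sum of all its subjets
    kept = [s for s in subjets if pt(s) > z_cut * jet_pt]
    return add(kept)

def massless(pt_value, phi):
    """Massless four-momentum at rapidity 0 and azimuth phi (toy input)."""
    return (pt_value, pt_value * math.cos(phi), pt_value * math.sin(phi), 0.0)

hard = massless(1000.0, 0.0)
soft = massless(30.0, 0.5)    # below the z_cut threshold: trimmed away
semi = massless(100.0, 0.5)   # above the threshold: kept
print(mass(trim([hard, soft])), mass(trim([hard, semi])))
```

In the first case the soft subjet is discarded and the trimmed jet is massless, while in the second the trimmed mass equals the full two-subjet mass.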

Leading-order calculation
Let us first consider the situation at leading order. If a gluon is emitted at an angle θ > R sub it will be included in the final trimmed jet only if it carries an energy fraction z > z cut . On the other hand, if it is emitted at an angle θ < R sub , it will be included in the same subjet as the leading parton and will automatically pass the trimming condition. In this case it will contribute to the jet mass independently of its energy fraction z.
The above understanding leads to an integral for the trimmed-mass distribution. It is straightforward to evaluate this for any value of $z_{\rm cut}$ [42], but the expressions that we obtain and the subsequent resummation will be much simpler if we assume that $z_{\rm cut}$ is small (as it usually is in practice), so that we can neglect terms suppressed by powers of $z_{\rm cut}$. Working furthermore in the approximation $m^2 \ll p_t^2 R^2$, i.e. $\rho \ll 1$, and making use of the fact that $p_{gq}(z)$ is finite for $z \to 1$, we can then discard the middle $\Theta$-function in the first term in square brackets of that integral and ignore the $(1 - z)$ factors in the $\delta$-function. One may then reorganise the contents of its second line and carry out the integration over $\theta$, expressing the result in terms of $\rho$ and $r \equiv R_{\rm sub}/R$, which gives Eq. (4.3). The remaining $z$ integral is straightforward to evaluate and leads to the following result:

$$\frac{\rho}{\sigma}\frac{d\sigma^{\rm (trim,LO)}}{d\rho} = \frac{\alpha_s C_F}{\pi}\left[\Theta(\rho - z_{\rm cut})\ln\frac1\rho + \Theta(z_{\rm cut} - \rho)\,\Theta(\rho - r^2 z_{\rm cut})\ln\frac{1}{z_{\rm cut}} + \Theta(r^2 z_{\rm cut} - \rho)\ln\frac{r^2}{\rho} - \frac34\right]. \qquad (4.4)$$

For $\rho > z_{\rm cut}$ this is simply the same as the leading-order jet mass distribution, with a linear growth of the distribution as $\ln 1/\rho$. In the integrated distribution $\Sigma(\rho)$, this corresponds to an $\alpha_s L^2$ growth, with the two powers of $L$ associated with simultaneous soft and collinear divergences. For $\rho < z_{\rm cut}$ but $\rho > r^2 z_{\rm cut}$ still, the $z_{\rm cut}$ condition tames the soft divergence: the integrated distribution then goes as $\alpha_s L \ln\frac{1}{z_{\rm cut}}$, dominated by just the collinear divergence. However, because the $z_{\rm cut}$ condition is applied only to subjets separated by at least $R_{\rm sub}$ from the main jet, this taming is short-lived: small jet masses with arbitrarily small $z$ values can come from angular regions $\theta < R_{\rm sub}$. As a result, for $\rho < r^2 z_{\rm cut}$, the structure of the result reverts to that for a standard jet mass, albeit with a reduced radius, $R_{\rm sub} \equiv rR$. The three situations for the trimmed jet mass can be visualised in Fig. 3 with the help of appropriate Lund kinematic diagrams.
The LO integrated cross section Σ(ρ) is proportional to the area of the shaded regions, and the differential cross section proportional to the length of the thick (red) line. For ρ > z cut the integrated cross section corresponds to a triangular region, hence a dependence on L 2 . For ρ < z cut but ρ > r 2 z cut , the extra contribution to the integrated cross section comes from a rectangular region, with one side growing with L and the other of fixed length ≃ ln 1/z cut . This gives an integrated cross section that grows as L ln 1/z cut , i.e. with only one power of L. Finally for ρ < r 2 z cut there is once more a triangular region, and so a dependence on L 2 .
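The three regimes described above can be made concrete with a short sketch of the leading-order trimmed distribution at fixed coupling. It assumes the piecewise logarithmic structure described in the text (plain-mass behaviour for $\rho > z_{\rm cut}$, a plateau of height $\ln 1/z_{\rm cut}$, then plain-mass behaviour with reduced radius) together with a common hard-collinear constant $-3/4$ in all three regions. The function name and default parameters (quark jets, $\alpha_s = 0.12$) are illustrative choices.

```python
import math

CF = 4.0 / 3.0  # quark colour factor

def trimmed_lo(rho, z_cut=0.05, r=0.3, alpha_s=0.12):
    """LO (rho/sigma) dsigma/drho for trimming, small-z_cut and small-rho limit.

    r = R_sub / R. The -3/4 term is assumed to be common to all regions.
    """
    if rho > z_cut:
        log = math.log(1.0 / rho)       # plain jet-mass regime
    elif rho > r * r * z_cut:
        log = math.log(1.0 / z_cut)     # plateau: soft divergence tamed
    else:
        log = math.log(r * r / rho)     # plain-mass regime with radius R_sub
    return alpha_s * CF / math.pi * (log - 0.75)

for value in (0.2, 0.02, 0.01, 0.001):
    print(value, trimmed_lo(value))
```

The distribution is continuous at both transition points, $\rho = z_{\rm cut}$ and $\rho = r^2 z_{\rm cut}$, but its slope in $\ln 1/\rho$ changes there, which is the origin of the kinks discussed in the introduction.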

Resummed calculation
Thanks to the above considerations it is relatively straightforward to obtain an understanding of the all-order trimmed jet-mass distribution. The key result that we use from the extensive literature on event-shape and jet-mass resummations (see e.g. Refs. [27,62]) is that one can effectively use an independent-emission approximation, ignoring subsequent splittings of those emissions, other than in the treatment of the running coupling. This can be understood as a consequence of angular ordering and is sufficient to derive all of Eq. (3.2) except for the non-global terms. This approach is not necessarily appropriate for all taggers; however, it will be suitable for most of the cases in this paper where we give a final resummed answer. The resummation is most easily written for the integrated cross section, involving a sum over an arbitrary number of independent emissions and corresponding virtual corrections. We parametrise each emission in terms of its momentum fraction $z_i = E_i/E_{\rm jet}$ and its individual contribution $\rho_i = z_i \theta_i^2/R^2$ to the squared, normalised jet mass, which leads to Eq. (4.6). There are three terms in the square brackets of that equation: the last one corresponds to virtual corrections, while the first two correspond to different regions of real phase-space: the first states that we can sum over any emission whose individual contribution is $\rho_i < \rho$; the second states that we can sum over emissions with $\rho_i > \rho$, if they are trimmed away, i.e. have $z_i < z_{\rm cut}$ and $\theta_i > R_{\rm sub}$ (the latter being straightforward to express as the condition $z_i < \rho_i/r^2$). The total contents within the square brackets equal $-1$ in the shaded kinematic regions of Fig. 3 and $0$ elsewhere. The sum over $n$ in Eq. (4.6) simply leads to an exponential and we can write the final result as Eq. (4.7), in which $D$ was defined in Eq. (3.1) and the function $S(a, b)$ contains only single logarithms, $\alpha_s^n \ln^n \frac{a}{b}$ (treating powers of $\ln\frac{1}{z_{\rm cut}}$ as finite coefficients). To help better visualise the structure of Eq. (4.7), one may prefer to examine its closed form for fixed coupling. Eq. (4.7) resums terms $\alpha_s^n L^{2n}$ and $\alpha_s^n L^{2n-1}$ in $\Sigma(\rho)$ (neglecting finite $z_{\rm cut}$ effects and terms enhanced by powers of $\ln z_{\rm cut}$). It also resums all terms $\alpha_s^n L^{n+1}$ in $\ln \Sigma(\rho)$. To obtain what is commonly referred to as NLL accuracy, i.e. all terms $\alpha_s^n L^n$ in $\ln \Sigma(\rho)$, would require a treatment of several additional effects: the two-loop $\beta$-function and cusp anomalous dimension, non-global logarithms involving resummation of terms $\ln(z_{\rm cut}^2 r^2/\rho)$, related clustering logarithms, and multiple-emission effects on the observable. The clustering logarithms will depend on the jet algorithm used for the trimming, but the rest of the structure will be independent of this (as long as the algorithm belongs to the generalised-$k_t$ family). These terms are all relatively straightforward to include, since they follow the structure of the plain jet-mass distribution. However, we leave their study to future work. Analogous results can also be derived for gluon-induced jets. Explicit expressions are collected in appendix A.
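The statement that the sum over independent emissions exponentiates can be illustrated with a fixed-coupling sketch. It assumes that the Sudakov exponent is simply the leading-order rate integrated over $L' = \ln 1/\rho'$ from $0$ to $L$, i.e. the emission density integrated over the vetoed region of the Lund plane, with the hard-collinear constant $-3/4$ kept in all regions. This is an illustration under those assumptions rather than a transcription of the paper's closed form, and all names are hypothetical.

```python
import math

CF = 4.0 / 3.0  # quark colour factor

def lo_rate(L, Lc, Lr):
    """LO (rho/sigma) dsigma/drho divided by alpha_s*CF/pi, with L = ln(1/rho).

    Lc = ln(1/z_cut) and Lr = ln(1/r^2) mark the two transition points.
    """
    if L < Lc:
        return L - 0.75
    if L < Lc + Lr:
        return Lc - 0.75
    return L - Lr - 0.75

def sudakov_exponent(L, Lc, Lr, abar):
    """abar times the integral of lo_rate from 0 to L, evaluated piecewise."""
    if L < Lc:
        area = 0.5 * L * L                       # triangle
    elif L < Lc + Lr:
        area = 0.5 * Lc * Lc + Lc * (L - Lc)     # triangle plus rectangle
    else:
        area = Lc * Lr + 0.5 * (L - Lr) ** 2     # second triangular growth
    return abar * (area - 0.75 * L)

def trimmed_resummed(rho, z_cut=0.05, r=0.3, alpha_s=0.12):
    """Fixed-coupling resummed (rho/sigma) dsigma/drho for trimmed quark jets."""
    L = math.log(1.0 / rho)
    Lc = math.log(1.0 / z_cut)
    Lr = math.log(1.0 / (r * r))
    abar = alpha_s * CF / math.pi
    return abar * lo_rate(L, Lc, Lr) * math.exp(-sudakov_exponent(L, Lc, Lr, abar))

for value in (1e-1, 1e-2, 1e-3, 1e-5):
    print(value, trimmed_resummed(value))
```

In this approximation the exponent grows quadratically in $L$ for $\rho > z_{\rm cut}$, linearly in the intermediate region, and quadratically again below $r^2 z_{\rm cut}$, mirroring the triangle, rectangle and triangle areas of the Lund diagrams.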

Comparison with Monte Carlo results
One test of Eq. (4.7) is to compare it to the Monte Carlo results. We do this in Fig. 4, where the left-hand plots show the trimmed-mass distribution as obtained with Monte Carlo simulation and the right-hand plots show the corresponding analytical results (see footnote 7). The upper row is for quark-initiated jets, while the lower one is for gluon-initiated jets. Two sets of trimming parameters are shown, to help visualize the dependence on them.
The three regions of $\rho$ are clearly distinguishable in each plot, with a close correspondence of the Monte Carlo and analytic shapes and transition points, as well as their dependence on the trimming parameters. Specifically, in the case of quark jets, for $\rho > z_{\rm cut}$, one sees a linear rise with $\ln 1/\rho$. For $\rho < z_{\rm cut}$, down to $\rho = r^2 z_{\rm cut}$, there is an approximate plateau, whose height increases for smaller $z_{\rm cut}$, as expected from the $\ln 1/z_{\rm cut}$ term for this region in the LO formula, Eq. (4.4). For $\rho < r^2 z_{\rm cut}$, the linear rise starts again, but is quickly suppressed by a Sudakov form factor, giving the usual jet-mass type peak. The case of gluon-initiated jets is similar, although the single-logarithmic region is not flat, because of the specific choices of $z_{\rm cut}$.

Insofar as $z_{\rm cut}$ and $R_{\rm sub}$ are not too small, the peak position is essentially given by the peak position for the mass of a jet of size $R_{\rm sub}$ rather than $R$, i.e. at a $\rho$ value that is a factor $r^2$ smaller than for the plain jet mass. This is consistent with what is observed comparing the Monte Carlo results for the plain and trimmed jet masses. A final comment is that while the peak position is independent of $z_{\rm cut}$, its height is not: the smaller the value of $z_{\rm cut}$, the greater the Sudakov suppression associated with vetoing emissions in the range $z_{\rm cut} r^2 < \rho < z_{\rm cut}$, and so the smaller the peak height, again in accord with the Monte Carlo results.

Footnote 7: Resummed expressions for the various taggers (as well as for the plain jet mass) contain integrals of the strong coupling $\alpha_s(k_t^2)$. In order to evaluate these integrals down to low scales, we must introduce a prescription to deal with the non-perturbative region. We decide to freeze the coupling below a non-perturbative scale $\mu_{\rm NP}$: $\alpha_s(k_t^2) = \alpha_s^{\text{1-loop}}(k_t^2)\,\Theta(k_t^2 > \mu_{\rm NP}^2) + \alpha_s^{\text{1-loop}}(\mu_{\rm NP}^2)\,\Theta(k_t^2 < \mu_{\rm NP}^2)$, where $\alpha_s^{\text{1-loop}}(k_t^2)$ is the usual one-loop expression for the strong coupling, i.e. its running is evaluated with $\beta_0$ only. We use $\alpha_s(m_Z) = 0.118$, $n_f = 5$ and $\mu_{\rm NP} = 1$ GeV throughout this paper.

Pruning
Pruning [35,36] takes an initial jet, and from its mass deduces a pruning radius $R_{\rm prune} = R_{\rm fact} \cdot \frac{2m}{p_t}$, where $R_{\rm fact}$ is a parameter of the tagger. It then reclusters the jet and for every clustering step, involving objects $a$ and $b$, it checks whether $\Delta_{ab} > R_{\rm prune}$ and $\min(p_{ta}, p_{tb}) < z_{\rm cut}\, p_{t,(a+b)}$, where $z_{\rm cut}$ is a second parameter of the tagger. If so, then the softer of $a$ and $b$ is discarded. Otherwise $a$ and $b$ are recombined as usual. Clustering then proceeds with the remaining objects, applying the pruning check at each stage.
In analysing pruning, we will take $R_{\rm fact} = \frac12$, i.e. its default suggested value [36]. In analogy with our approach for trimming, we will work in the limit of small $z_{\rm cut}$ (but $\ln z_{\rm cut}$ not too large). We will assume that the reclustering is performed with the Cambridge/Aachen algorithm, the most common choice, and that adopted by CMS [7]. ATLAS [6] have instead performed the reclustering with the $k_t$ algorithm [60,61]. Similar methods could be used to study that case, but we leave such an investigation to future work.
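The pruning check applied at each clustering step can be sketched as follows. The example assumes that a reclustering code supplies, for each step, the transverse momenta of the two objects, the transverse momentum of their sum and their separation $\Delta_{ab}$. The function names and the toy two-prong kinematics (small-angle mass $m = p_t \Delta_{ab} \sqrt{z(1-z)}$) are illustrative assumptions.

```python
import math

def pruning_radius(m_jet, pt_jet, r_fact=0.5):
    """R_prune = R_fact * 2 m / pt, computed from the original jet."""
    return r_fact * 2.0 * m_jet / pt_jet

def is_pruned(pt_a, pt_b, pt_ab, delta_ab, r_prune, z_cut=0.1):
    """True if the softer of a and b is discarded at this clustering step."""
    wide_angle = delta_ab > r_prune
    too_soft = min(pt_a, pt_b) < z_cut * pt_ab
    return wide_angle and too_soft

def two_prong_mass(pt_jet, delta, z):
    """Small-angle mass of a two-prong jet with momentum sharing z (toy input)."""
    return pt_jet * delta * math.sqrt(z * (1.0 - z))

# Symmetric-ish splitting (z = 0.3): the clustering is accepted.
r1 = pruning_radius(two_prong_mass(1000.0, 0.8, 0.3), 1000.0)
print(r1, is_pruned(700.0, 300.0, 1000.0, 0.8, r1))

# Soft splitting (z = 0.05 < z_cut): the softer prong is pruned away.
r2 = pruning_radius(two_prong_mass(1000.0, 0.8, 0.05), 1000.0)
print(r2, is_pruned(950.0, 50.0, 1000.0, 0.8, r2))
```

With $R_{\rm fact} = 1/2$ the pruning radius of a two-prong jet is always smaller than the prong separation, so only the momentum-fraction condition decides whether the splitting survives.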

Leading-order calculation
At leading order, i.e. a jet involving a single $1 \to 2$ splitting, $R_{\rm prune} = \frac{m}{p_t} = \Delta_{ab}\sqrt{z(1-z)}$, which guarantees that $\Delta_{ab}$ is always larger than $R_{\rm prune}$. To establish the pruned jet mass, one then needs to examine the second part of the pruning condition: if $\min(z, 1-z) > z_{\rm cut}$ then the clustering is accepted and the pruned jet has a finite mass. Otherwise the pruned jet mass is zero. This pattern is true independently of the angle between the two prongs. This leads to an integral expression for the mass distribution, which simplifies on making use of the fact that $z_{\rm cut}$ is small and that the integral is dominated by the region $z \ll 1$. The final $z$-integration is straightforward to perform and gives

$$\frac{\rho}{\sigma}\frac{d\sigma^{\rm (prune,LO)}}{d\rho} = \frac{\alpha_s C_F}{\pi}\left[\Theta(\rho - z_{\rm cut})\ln\frac1\rho + \Theta(z_{\rm cut} - \rho)\ln\frac{1}{z_{\rm cut}} - \frac34\right]. \qquad (5.3)$$

This has the structure of a rise linear in $\ln 1/\rho$ for $\rho$ down to $z_{\rm cut}$, and then it is constant below. For small $\rho$, the corresponding integrated cross section has the remarkable property that it contains no double-logarithmic terms, i.e. no $\alpha_s L^2$ contribution. This is, in a certain sense, what pruning was, in our understanding, intended to achieve: the double-log contribution comes from the region of arbitrarily soft gluon emission, and pruning removes such soft emissions.

Figure 5. Configuration that illustrates generation of double logs in pruning at $\mathcal{O}(\alpha_s^2)$. Soft gluon $p_3$ dominates the jet mass, thus determining the pruning radius. However, because of $p_3$'s softness, it is then pruned away, leaving only the central core of the jet, which has a usual double-logarithmic type mass distribution.
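A short sketch makes the contrast with the plain jet mass explicit at leading order. It assumes the structure described in the text (a rise linear in $\ln 1/\rho$ down to $\rho = z_{\rm cut}$ and a constant below), with the same hard-collinear constant $-3/4$ as for the plain jet mass, for quark jets at fixed coupling. Function names and defaults are illustrative.

```python
import math

CF = 4.0 / 3.0  # quark colour factor

def plain_lo(rho, alpha_s=0.12):
    """LO (rho/sigma) dsigma/drho for the plain jet mass, small-rho limit."""
    return alpha_s * CF / math.pi * (math.log(1.0 / rho) - 0.75)

def pruned_lo(rho, z_cut=0.1, alpha_s=0.12):
    """LO pruned-mass distribution: plain-mass rise above z_cut, flat below."""
    return alpha_s * CF / math.pi * (math.log(1.0 / max(rho, z_cut)) - 0.75)

for value in (0.5, 0.1, 1e-2, 1e-4):
    print(value, plain_lo(value), pruned_lo(value))
```

Because the pruned rate is constant below $z_{\rm cut}$, its integral over $\ln 1/\rho$ grows only linearly in $L$, whereas the plain-mass rate integrates to a term quadratic in $L$.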

3-particle configurations: Y-pruning and I-pruning
When we consider 3-particle configurations the behaviour of pruning develops a certain degree of complexity. Fig. 5 illustrates the type of configuration that is responsible: there is a soft parton (p 3 ) that dominates the total jet mass and so sets the pruning radius, but it does not pass the pruning z cut threshold, meaning that it does not contribute to the pruned mass; meanwhile there is another parton (p 2 ), within the pruning radius, that contributes to the pruned jet mass independently of how soft it is. We call this "I-pruning", because at the angular scale R prune , the final pruned jet consists of a single prong. It is to be contrasted with the type of configuration that contributed to the leading order result Eq. (5.3), for which at an angular scale R prune , the pruned jet always consisted of two prongs. That we call "Y-pruning". 8 Let us work through I-pruning quantitatively. For gluon 3 to be discarded by pruning it must have z 3 < z cut ≪ 1, i.e. it must be soft. Then the pruning radius is given by R 2 prune = z 3 θ 2 3 and for p 2 to be within the pruning core we have θ 2 < R prune . This implies θ 2 ≪ θ 3 , which allows us to treat p 2 and p 3 as being emitted independently (i.e. due to angular ordering) and also means that the C/A algorithm will first cluster 1 + 2 and then (1 + 2) + 3. The leading-logarithmic contribution that one then obtains at O α 2 s is ρ σ where we have directly taken the soft limits of the relevant splitting functions. The ln 3 ρ contribution that one observes here in the differential distribution corresponds to a double logarithmic (α 2 s ln 4 ρ) behaviour of the integrated cross-section, i.e. it has as many logs as the raw jet mass, with both soft and collinear origins. 
This term is the first of a whole tower of terms α n s ln 2n ρ, all associated with configurations where the emission(s) that set the total jet mass are discarded during pruning, leaving just the mass of the core of the jet (at angles smaller than R prune ).
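The cubic logarithm of the I-pruning contribution can be cross-checked numerically. The sketch below is our own reduction under the soft, strongly ordered approximations stated above, with $R = 1$ and all coupling and colour factors stripped: integrating out $z_3$ at fixed $\rho_3 = z_3\theta_3^2$ gives $\ln(z_\text{cut}/\rho_3)$, integrating out $z_2$ at fixed $\rho$ with $\theta_2^2 < \rho_3$ gives $\ln(\rho_3/\rho)$, and the remaining integral over $\rho_3$ should then equal $\frac{1}{6}\ln^3(z_\text{cut}/\rho)$. The function name and the midpoint rule are illustrative choices.

```python
import math


def i_pruning_coefficient(rho, z_cut, n=20000):
    """Midpoint-rule estimate of the rho_3 integral, coupling factors stripped.

    Integrates ln(z_cut/rho_3) * ln(rho_3/rho) over d(rho_3)/rho_3
    for rho < rho_3 < z_cut, using t = ln(z_cut/rho_3) as the variable.
    """
    T = math.log(z_cut / rho)
    h = T / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * h           # t = ln(z_cut / rho_3)
        total += t * (T - t) * h    # (T - t) = ln(rho_3 / rho)
    return total


# Compare with the expected cubic logarithm.
rho, z_cut = 1e-4, 0.1
expected = math.log(z_cut / rho) ** 3 / 6.0
print(i_pruning_coefficient(rho, z_cut), expected)
```

The two printed numbers agree to high precision, consistent with a $\ln^3$ term in the differential distribution and hence an $\alpha_s^2 \ln^4$ term in the integrated one.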
In general, substructure taggers aim to eliminate contributions from soft emission. What we see here is that this is not entirely the case for pruning. However, in an experimental analysis, it is easy to diagnose whether configurations such as that in Fig. 5 have arisen. Accordingly, we introduce explicit operative definitions for I-pruning and its converse, Y-pruning:
Y-pruning: if at any stage during the sequential recombination there was a clustering that satisfied both the $\Delta_{ab} > R_\text{prune}$ condition and the requirement that the softer prong's momentum fraction exceed $z_\text{cut}$, the jet is deemed to pass the Y-pruning (i.e. two-prong) requirement. The jet mass was dominated by (semi)-hard radiation and it is likely that the pruning radius was set appropriately for that radiation.$^{9}$
I-pruning: if during the sequential recombination there was never a clustering satisfying both the $\Delta_{ab} > R_\text{prune}$ condition and the requirement that the softer prong's momentum fraction exceed $z_\text{cut}$, the jet is deemed to belong to the I- (i.e. one-prong) pruned class. Typically, for this class of jets, the jet mass was dominated by soft emissions, leading to a pruning radius that had no relation to any hard substructure potentially present in the jet.
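The operative definitions above amount to a single pass over the clustering history. A minimal sketch, assuming the history is available as a list of `(delta_ab, z)` pairs, where `z` stands for the softer prong's momentum fraction at that recombination step (this data layout and the function name are hypothetical, chosen only for illustration):

```python
def classify_pruning(clusterings, r_prune, z_cut):
    """Return "Y" or "I" for a jet, given its recombination history.

    clusterings: iterable of (delta_ab, z) pairs, one per clustering step,
    with z the momentum fraction of the softer of the two merged objects.
    """
    for delta_ab, z in clusterings:
        # A wide-angle clustering that also passes the momentum cut
        # signals genuine two-prong structure at the scale R_prune.
        if delta_ab > r_prune and z > z_cut:
            return "Y"
    # No such clustering was ever found: one-prong (I) class.
    return "I"
```

A history in which the only wide-angle clustering is soft, for example `[(0.05, 0.3), (0.6, 0.02)]` with `r_prune = 0.2` and `z_cut = 0.1`, is classified as "I".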
According to our first definition of grooming and tagging in section 2, generic pruning is a grooming procedure: given an initial jet, there is always a corresponding pruned jet, though often with a different mass. In contrast, according to that same definition, Y-pruning is a tagger: i.e. given some initial jet, there will not always be a corresponding Y-pruned jet. In the Monte Carlo results that we will discuss below in section 5.4, for our default choice of pruning parameters, Y-pruning tags about 40% of QCD jets.
Let us examine the $\alpha_s^2$ contribution for Y-pruning. Physically, the key addition relative to the LO result (for which we exclusively have Y-pruning) is the requirement that there should have been no radiation $p_3$ that would set a pruning radius larger than $\theta_2$, i.e. no radiation with $\rho_3 \equiv z_3\theta_3^2 > \theta_2^2$. Insofar as we neglect logarithms of $z_\text{cut}$, we can replace this with the condition $\rho_3 > \rho_2 \equiv \rho$, resulting in the structure up to $\alpha_s^2$ of Eq. (5.5), where the round bracket comes (as at LO) from the integral over allowed $z_2$ values, and we have used a double-logarithmic approximation for the contents of the square brackets.
Figure 6. Lund kinematic diagrams for pruning, considering three different possible values of $\rho$. In each case, to obtain the given value of $\rho$, there must be an emission somewhere along the thick (red) line, and there must be no emissions in the shaded region. The solid part of the thick line corresponds to Y-pruning, while the dashed part gives I-pruning. Emissions in the unshaded regions have no impact on the pruned jet mass. The behaviour of the pruner can be affected by the presence of an emission that dominates $\rho_\text{fat}$ (and so sets the pruning radius), but is discarded because it is below the pruning energy cut. The dotted line that shows the pruning energy cut is parametrised in terms of the jet energy; this is a simplification, insofar as pruning uses the local subjet to provide its reference energy.
Translating to the integrated distribution, Eq. (5.5) implies the presence of a term of the form α 2 s ln 3 1/ρ, i.e. with one logarithm fewer than the I-pruning contribution. As we shall see below, this difference will be related to highly distinct resummation structures for the two types of contribution.

Resummed results
To understand how to resum the pruned jet mass, for both the Y and I components, it is useful to refer to Fig. 6. The left-most figure corresponds to the region $\rho > z_\text{cut}$ and is essentially identical to the plain jet mass (as for trimming in this region). In this region we only have Y-pruning.
The middle and right-hand plots illustrate two of the main configurations that are relevant when $\rho < z_\text{cut}$. Both show an emission (small black disk) that dominates the total jet mass ($\rho_\text{fat}$) and so sets the pruning radius, Eq. (5.6). It will always be at an angle larger than $R_\text{prune}$, and for the discussion here it will be interesting to consider the cases where it has a momentum fraction $z_\text{fat} < z_\text{cut}$, so that it is pruned away. We then need to consider a second emission, somewhere along the thick (red) solid and dashed lines, with momentum fraction $z$ and angle $\theta$, that sets the final pruned mass $\rho$. The two possible situations, Y-pruning and I-pruning, are distinguished by conditions on $z$ that have been derived by combining the relation $\theta^2 = \rho R^2/z$ with Eq. (5.6).
In the middle panel of Fig. 6, the Y-pruning region is represented by a thick (red) solid line, while the I-pruning region is represented by a thick (red) dashed line.
In the rightmost panel, with $\rho/\rho_\text{fat} < z_\text{cut}$, there can be no Y-pruning, because emissions with $\theta > R_\text{prune}$ necessarily have $z < \rho/\rho_\text{fat} < z_\text{cut}$. There is then only I-pruning, and because there is no direct constraint on the momentum fraction of emissions with $\theta < R_\text{prune}$, any $z > \rho/\rho_\text{fat}$ contributes to the I-pruning, even if $z < z_\text{cut}$. Given that $\rho_\text{fat} < z_\text{cut}$, I-pruning with $z < z_\text{cut}$ starts to appear only for $\rho < z_\text{cut}^2$.
To determine the distributions for Y- and I-pruning, we will work, as for trimming, in an independent emission picture. However, for brevity, we will not explicitly write the independent emissions here, but instead make use of the result that when one forbids emissions (i.e. the shaded regions of Fig. 6), one simply includes a factor corresponding to the exponential of (minus) the integral of the coupling times the splitting function over the forbidden region.
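The kinematic bookkeeping of the middle and right-hand panels can be summarised in a short sketch. It assumes $R = 1$, so that $\theta^2 = \rho/z$ and $R_\text{prune}^2 = \rho_\text{fat}$, and it assumes that the emission setting $\rho_\text{fat}$ has already been pruned away; the function name and return convention are illustrative only.

```python
def pruning_component(rho, rho_fat, z, z_cut):
    """Classify the emission (momentum fraction z) that would set pruned mass rho.

    Assumes R = 1 and a fat-jet emission with z_fat < z_cut that is pruned away.
    Returns "Y", "I", or None if the emission cannot set the pruned mass.
    """
    if not (rho < rho_fat and rho < z < 1.0):
        raise ValueError("need rho < rho_fat and rho < z < 1 (theta < R)")
    if z > rho / rho_fat:
        # theta^2 = rho/z < rho_fat: inside the pruning radius, always kept.
        return "I"
    # theta > R_prune: kept only if it passes the momentum cut.
    return "Y" if z > z_cut else None
```

When `rho / rho_fat < z_cut` the "Y" branch can never be reached, which is the statement that the rightmost panel contains only I-pruning.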

Y-pruning
For Y-pruning, one way of writing the result is as an integral over the momentum fraction $z$ of the emission that gives the final pruned mass. For a given $z$ to contribute it must obviously satisfy $z > z_\text{cut}$. In addition the fat jet mass must be smaller than $\rho/z$. From the considerations of the previous section, this then gives us Eq. (5.8), valid for $\rho < z_\text{cut}$. The factor involving $D(\min(z_\text{cut}, \rho/z))$ accounts for the suppression of all emissions that would produce a $\rho_\text{fat} > \rho/z$ (or $\rho_\text{fat} > z_\text{cut}$). The factor involving $S(\min(z_\text{cut}, \rho/z), \rho)$ accounts for the further required suppression of emissions with $z > z_\text{cut}$ contributing a mass between $\rho/z$ and $\rho$.
Another, equivalent way of writing the result makes the $\rho_\text{fat}$ integral more explicit, Eq. (5.9). The term on the first line corresponds to configurations in which the emission that dominates the pruned mass also dominates the overall fat-jet mass. The term on the second and third lines corresponds to situations where there is an explicit emission with momentum fraction $z' < z_\text{cut}$ that gets pruned away.$^{10}$ It sets a fat-jet mass substantially larger than the final pruned mass, $\rho_\text{fat} \gg \rho$, while the emission that dominates the pruned mass still has $\theta > R_\text{prune}$. The above two expressions should capture terms $\alpha_s^n L^{2n-1}$ and $\alpha_s^n L^{2n-2}$ in $\Sigma^\text{(Y-prune)}(\rho)$. It is less straightforward to discuss the accuracy for $\ln \Sigma^\text{(Y-prune)}(\rho)$: this is because, unlike the cases of plain jet mass and trimming, pruning does not lead to a simple exponentiated structure. Analogous results for gluon-initiated jets are given in appendix A.3.
To help understand the structure of Eqs. (5.8) and (5.9), it is useful to evaluate them in a fixed-coupling approximation, neglecting terms ∼ α s ln 2 z cut , which for ρ < z 2 cut yields where the second line provides a further simplification for situations where ρ is not too small and illustrates the consistency with Eq. (5.5).

I-pruning
The resummed result for I-pruning, for $\rho < z_\text{cut}$, is given by Eq. (5.11). In order to have I-pruning, there must be an emission that sets the fat-jet mass and pruning radius such that this first emission gets pruned away and a second emission falls within the pruning radius. The first line of Eq. (5.11) gives the distribution for the fat-jet mass, assuming that the corresponding emission has $z < z_\text{cut}$, i.e. gets pruned away. The second line includes a Sudakov suppression $e^{-S(\rho_\text{fat},\rho)}$ for forbidding emissions with $z > z_\text{cut}$ between the scales of $\rho_\text{fat}$ and $\rho$, and also includes an integral over the allowed $z$ values for emissions that fall within the pruning radius. This multiplies a square bracket containing two terms: the first corresponds to the middle diagram of Fig. 6, while the second corresponds to the right-hand diagram, and accounts for the required additional Sudakov suppression of emissions with $z < z_\text{cut}$ and $\theta < R_\text{prune}$. In this factor, we have directly replaced $dz\, p_{gq}(z)$ with $dz/z$, neglecting corrections suppressed by powers of $z_\text{cut}$.
Eq. (5.11) should account for terms $\alpha_s^n L^{2n}$ and $\alpha_s^n L^{2n-1}$ in $\Sigma^\text{(I-prune)}$, i.e. the first two towers of logarithms. Note that overall we have one power of $L$ more than for Y-pruning. As for the case of Y-pruning, it is less straightforward to discuss the accuracy for $\ln \Sigma^\text{(I-prune)}$. Analogous results for gluon-initiated jets are given in appendix A.3. A calculation beyond the small-$z_\text{cut}$ limit reveals that there are flavour-changing contributions that mix quark-initiated and gluon-initiated jets. They give rise to terms $\sim z_\text{cut}\, \alpha_s^n L^{2n-1}$ [42], and they are neglected here because they vanish as $z_\text{cut} \to 0$.
The structure of Eq. (5.11) is relatively complicated. Accordingly, to gain some insight into it we will make a double logarithmic approximation, considering just terms $\alpha_s^n L^{2n}$ in $\Sigma^\text{(I-prune)}(\rho)$. Within this approximation we can replace $p_{gq}(z)$ with $1/z$, assume $z_\text{cut}$ to be of order 1 and take $\alpha_s$ fixed. This then gives Eqs. (5.12) and (5.13). It is straightforward to verify that the result has no $\alpha_s$ term and is equivalent to Eq. (5.4a) at order $\alpha_s^2$. The structure involving the factor $L^2 e^{-\frac{1}{4}\bar\alpha_s L^2}$ can be seen to arise from the point where the integrand in Eq. (5.12) is maximal. Insofar as it is legitimate to consider just this structure, one might expect the I-pruned mass distribution to have a maximum situated near $L = 2/\sqrt{\bar\alpha_s}$. Using the full form of Eq. (5.13), the maximum is at $L \simeq 2.284/\sqrt{\bar\alpha_s}$, which is to be compared to the maximum of the plain jet-mass distribution, situated at $L = 1/\sqrt{\bar\alpha_s}$. We will return to these observations when we discuss comparisons with Monte Carlo below.
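The two peak positions quoted above are easy to verify numerically. The sketch below maximises the structure $L^2 e^{-\bar\alpha_s L^2/4}$ and, for comparison, a double-logarithmic plain-jet-mass shape $\bar\alpha_s L\, e^{-\bar\alpha_s L^2/2}$; the latter form is an assumption made for this example (it is the derivative of $e^{-\bar\alpha_s L^2/2}$ with respect to $L$), as are the function names and the value of $\bar\alpha_s$. The full form of Eq. (5.13), which gives the 2.284 coefficient, is not reproduced here.

```python
import math


def argmax_unimodal(f, lo, hi, iters=200):
    """Ternary search for the maximum of a unimodal function on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            lo = m1
        else:
            hi = m2
    return 0.5 * (lo + hi)


abar = 0.1  # illustrative fixed-coupling value of alpha_s-bar


def i_prune_structure(L):
    # Dominant structure identified in the text for I-pruning.
    return L**2 * math.exp(-0.25 * abar * L**2)


def plain_mass_dl(L):
    # Assumed double-logarithmic plain-jet-mass shape.
    return abar * L * math.exp(-0.5 * abar * L**2)


L_max = 10.0 / math.sqrt(abar)
peak_i = argmax_unimodal(i_prune_structure, 0.0, L_max)
peak_plain = argmax_unimodal(plain_mass_dl, 0.0, L_max)
print(peak_i * math.sqrt(abar), peak_plain * math.sqrt(abar))
```

In units of $1/\sqrt{\bar\alpha_s}$ the two printed values are close to 2 and 1 respectively, so the I-pruning structure peaks at roughly twice the value of $L$ of the plain jet mass.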

Sum of Y and I components
Finally let us add together Y- and I-pruning in the region $z_\text{cut}^2 < \rho < z_\text{cut}$, working in a fixed-coupling approximation for simplicity. In this region, the upper limit of the $\rho_\text{fat}$ integrals in Eqs. (5.9) and (5.11) becomes $z_\text{cut}$. In the square brackets of Eq. (5.11), it is the first of the $\Theta$-functions that is relevant (because we have $\rho_\text{fat} < z_\text{cut}$ and $\rho > z_\text{cut}^2$). The $z$ integrals in Eqs. (5.9) and (5.11) are associated with the same prefactors and $\rho_\text{fat}$ integration, and have complementary limits in $z$, namely $z_\text{cut} < z < \rho/\rho_\text{fat}$ and $\rho/\rho_\text{fat} < z < 1$ respectively, and so add together to give an integral over $z$ from $z_\text{cut}$ to 1. We can therefore write the sum as Eq. (5.14). Using a fixed-coupling approximation, we then obtain a simple result for the differential distribution, which corresponds to the integrated cross section of Eq. (5.17). This second form holds also with running coupling effects included.
Several comments can be made about Eq. (5.17). Relative to the middle panel of Fig. 6, the key point is that for $z_\text{cut}^2 < \rho < z_\text{cut}$, the presence or not of a distinct "fat-jet" emission (one with $z' < z_\text{cut}$) only modifies the separation between I- and Y-pruning, but not their sum. As a result, $-\ln\Sigma(\rho)$ is effectively just the integral of the leading order distribution, Eq. (5.3). This is the pattern that is seen also for trimming and the plain jet mass (at NLL accuracy in $\Sigma$), but with the difference that in the case of pruning the pattern breaks down for $\rho < z_\text{cut}^2$, whereas for trimming and plain jet mass it holds for all $\rho$ values.
Another point of interest is that Eq. (5.17) is identical to the result for trimming, Eq. (4.9), in the corresponding region r 2 z cut < ρ < z cut . Trimming and pruning are also identical, at our accuracy, for ρ > z cut . We will return to this point later when we discuss the comparisons between taggers in section 8.1.
Finally, as in the case of trimming, to go beyond the accuracy aimed for in this paper for pruning would require the treatment of several additional effects: non-global logarithms and related clustering logarithms, multiple-emission effects on the observable and the two-loop cusp anomalous dimension.
Non-global logarithms enter in a number of ways: in particular, from the boundary at $\theta \sim R$, they affect the fat-jet mass, and through it the distribution of the pruning radius. This has implications for both the Y and I components starting, in the small-$z_\text{cut}$ limit, from order $\alpha_s^3$. Moreover, at finite $z_\text{cut}$, I-pruning receives non-global contributions already at order $\alpha_s^2$ [42]. We leave a full resummation of pruning to single-logarithmic accuracy to future work.

Comparison with Monte Carlo results
Figure 7 shows predictions for the pruned mass distribution from Pythia in the left-hand panels and from our analytical calculation in the right-hand panels. Upper and lower rows correspond to quark jets and gluon jets respectively. As was the case with trimming, the agreement between the MC and analytical results is reasonable. The expected transition points at $\rho = z_\text{cut}$ and $\rho = z_\text{cut}^2$ are labelled with arrows in the upper MC plot. Above $\rho = z_\text{cut}$ we see a similar behaviour as for the plain jet mass. For $z_\text{cut}^2 < \rho < z_\text{cut}$, we see a flat region in the quark case, akin to the leading-order result; in the gluon case, however, that flatness is strongly modified by higher orders (the exact impact of these higher orders depends strongly on $z_\text{cut}$). The transition at $\rho = z_\text{cut}^2$ is much smoother than that at $z_\text{cut}$. Recall that the transition occurs because phase space opens for emissions with $z < z_\text{cut}$ to dominate the pruned jet mass. As one can verify analytically, that phase space initially opens up slowly (cf. also Fig. 6) and the most singular contribution for pruning (Y+I components) goes as $\alpha_s^2 \ln^3(z_\text{cut}^2/\rho)$. The transition is therefore gradual.
Going substantially below $\rho = z_\text{cut}^2$, for quark jets, one sees a clear peak in total pruning, which results from the I component. In the gluon case, while that peak is similarly visible in the I component, in the sum with Y-pruning it manifests itself as a shoulder, because the peak occurs in a region where the Y-pruning component is not entirely suppressed. As before, this precise picture holds for our specific choice of $z_\text{cut}$.
The position of the peak for the I component, in the case of quark-initiated jets, is in reasonable agreement with the one determined by the fixed-coupling approximation, Eq. (5.13), though the agreement is poorer for gluon jets: for a reliable quantitative treatment of the peak region it is important to include subleading terms.
Figure 7. The upper panels are for quark jets, the lower panels for gluon jets. The plots show full pruning as well as its breakdown into Y and I components. In the upper left panel, arrows indicate the expected transition points, at $\rho = z_\text{cut}^2$ (in black) and $\rho = z_\text{cut}$ (in grey). The details of the MC event generation are as for Fig. 1.

Mass Drop Tagger
The mass-drop tagger [23] was designed to be used with jets found by the Cambridge/Aachen algorithm [39,40]. It involves two parameters $y_\text{cut}$ and $\mu$ and, for an initial jet labelled $j$, proceeds as follows:
1. Break the jet $j$ into two subjets by undoing its last stage of clustering. Label the two subjets $j_1$, $j_2$ such that $m_{j_1} > m_{j_2}$.
2. If there was a significant mass drop, $m_{j_1} < \mu\, m_j$, and the splitting is not too asymmetric, $y = \min(p_{t j_1}^2, p_{t j_2}^2)\,\Delta R_{j_1 j_2}^2 / m_j^2 > y_\text{cut}$, then deem $j$ to be the tagged jet.
3. Otherwise redefine $j$ to be equal to $j_1$ and go back to step 1 (unless $j$ consists of just a single particle, in which case the original jet is deemed untagged).
Typical parameter choices are for example µ = 2/3 and y cut in the range 0.09 − 0.15. While the y cut parameter will appear explicitly in our results, µ will not, and indeed we shall see that its exact value is not critical as long as it is not parametrically small.
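The three steps above translate directly into a loop over a binary clustering tree. The following is a minimal sketch, not a reference implementation: the `Jet` class, the helpers `particle`, `merge` and `delta_r2`, and the default parameter values are all assumptions made for the example, and the tree is built by hand rather than by an actual Cambridge/Aachen clustering.

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Jet:
    px: float
    py: float
    pz: float
    E: float
    pieces: Optional[Tuple["Jet", "Jet"]] = None  # the two subjets merged last

    @property
    def m2(self) -> float:
        # Clamp tiny negative values caused by rounding.
        return max(self.E**2 - self.px**2 - self.py**2 - self.pz**2, 0.0)

    @property
    def pt2(self) -> float:
        return self.px**2 + self.py**2

    @property
    def rap(self) -> float:
        return 0.5 * math.log((self.E + self.pz) / (self.E - self.pz))

    @property
    def phi(self) -> float:
        return math.atan2(self.py, self.px)


def particle(pt, rap, phi):
    """Massless particle from transverse momentum, rapidity and azimuth."""
    return Jet(pt * math.cos(phi), pt * math.sin(phi),
               pt * math.sinh(rap), pt * math.cosh(rap))


def merge(a, b):
    """Four-momentum sum that remembers its two constituents."""
    return Jet(a.px + b.px, a.py + b.py, a.pz + b.pz, a.E + b.E, pieces=(a, b))


def delta_r2(a, b):
    dphi = abs(a.phi - b.phi)
    if dphi > math.pi:
        dphi = 2.0 * math.pi - dphi
    return (a.rap - b.rap) ** 2 + dphi**2


def mass_drop_tag(jet, mu=0.67, y_cut=0.09):
    """Return the tagged (sub)jet, or None if the jet is untagged."""
    j = jet
    while j.pieces is not None:
        # Step 1: undo the last clustering; j1 is the heavier subjet.
        j1, j2 = sorted(j.pieces, key=lambda s: s.m2, reverse=True)
        # Step 2: mass-drop and asymmetry conditions.
        y = min(j1.pt2, j2.pt2) * delta_r2(j1, j2) / j.m2
        if j1.m2 < mu**2 * j.m2 and y > y_cut:
            return j
        # Step 3: otherwise recurse into the heavier subjet.
        j = j1
    return None
```

As a usage example, a symmetric two-prong core accompanied by a soft wide-angle particle, `merge(merge(particle(60, 0.05, 0.0), particle(40, -0.05, 0.0)), particle(2, 0.8, 0.0))`, fails the asymmetry cut at the first declustering, and the tagger then returns the two-prong core.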

Leading order calculation
As usual, it is useful to start with a leading-order configuration, for which the jet consists of just two partons. When the jet is declustered, each of the prongs is massless, so that the mass-drop condition is automatically satisfied, rendering the $\mu$ parameter irrelevant. There are then two possibilities: if the asymmetry condition is satisfied the jet is tagged, with the tagged mass equal to the original jet mass. Otherwise the jet does not contribute to the tagged jet mass distribution. Considering a quark that splits into a quark with momentum fraction $1 - z$ and a gluon with momentum fraction $z$, we have $m_j^2 = z(1-z)E^2$. The asymmetry condition then becomes $\frac{z}{1-z} > y_\text{cut}$ and $\frac{1-z}{z} > y_\text{cut}$. We may now write the differential cross section for the jet to have a given tagged mass. Proceeding as with our other LO calculations, including a requirement $y_\text{cut} \ll 1$, leads us to the result of Eq. (6.2). Modulo the replacement $z_\text{cut} \to y_\text{cut}$, this is identical to the result for pruning, Eq. (5.3), and in particular has two regimes: it is linear in $\ln\frac{1}{\rho}$ when $\rho > y_\text{cut}$, and saturates at a constant value ($\ln\frac{1}{y_\text{cut}} - \frac{3}{4}$) for $\rho < y_\text{cut}$. In contrast to the case of pruning, it is intriguing that this structure appears rather similar to what is observed in the Monte Carlo results for quark jets in Fig. 1. This would suggest that there are cases where effects beyond LO might be modest.
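The leading-order asymmetry window and the two-regime shape can be made concrete with a short sketch. The window follows from solving the two asymmetry inequalities for $z$; the shape function encodes only what is described in the text (linear in $\ln\frac{1}{\rho}$ above $y_\text{cut}$, constant below), with the overall coupling factor omitted. Function names are illustrative.

```python
import math


def z_window(y_cut):
    """Range of z satisfying z/(1-z) > y_cut and (1-z)/z > y_cut."""
    return y_cut / (1.0 + y_cut), 1.0 / (1.0 + y_cut)


def lo_shape(rho, y_cut):
    """Shape of the LO tagged-mass distribution (small y_cut, prefactor stripped)."""
    # ln(1/rho) - 3/4 above y_cut, frozen at ln(1/y_cut) - 3/4 below it.
    return math.log(1.0 / max(rho, y_cut)) - 0.75


z_lo, z_hi = z_window(0.09)
print(z_lo, z_hi)
print(lo_shape(1e-3, 0.09), lo_shape(1e-5, 0.09))
```

The window is symmetric under $z \to 1 - z$, and the two values printed on the second line coincide, which is the plateau for $\rho < y_\text{cut}$.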

3-particle configurations
The next step in understanding the mass-drop tagger is to consider 3-particle configurations, where for the first time one encounters the recursive nature of the tagger and potentially also the dependence on µ.
Since we will be mainly interested in logarithmically enhanced contributions, we can exploit the fact that these come from configurations in which momenta are ordered in angle and/or energy. Some interesting such configurations are illustrated in Fig. 8.
Configuration (a) has the ordering $\theta_{13} \gg \theta_{12}$, with the ordering sufficiently strong that we can assume $m_\text{jet} = m_{123} \gg m_{12}$. Because the jet was clustered with the angular-ordered C/A algorithm, the MDT first splits the jet into $j_{12}$ and $j_3$. If $E_3/E_{12} > y_\text{cut}$ then the declustering passes the asymmetry cut; the strong angular ordering ensures that it also passes the mass-drop condition and so the jet as a whole is tagged. If $E_3/E_{12} < y_\text{cut}$, then the MDT recurses into the heavier of the two subjets, i.e. $j_{12}$, which can be analysed as in the previous, LO section. The key point here is that in the limit in which $E_3 \ll E_\text{jet}$, the presence of gluon 3 has no effect on whether the $j_{12}$ system gets tagged. This is true even though we chose a configuration where $m_\text{jet}$ is dominated by emission 3. This was part of the intended design of the MDT: if the jet contains hard substructure, the tagger should find it, even if there is other soft structure (including underlying event and pileup) that strongly affects the original jet mass. It is possible to show that if one combines the NLO contribution that comes from configurations like (a) with the corresponding virtual graphs, one obtains a contribution to $\Sigma^\text{(MDT)}(\rho)$ that goes as $\alpha_s^2 L^2$ for arbitrarily large $L$. This involves fewer logarithms than any of the plain jet mass, trimming or pruning. However it turns out not to be the leading contribution in terms of a counting of logarithms and therefore we postpone its detailed discussion.
Configuration (b) in Fig. 8 reveals an unintended behaviour of the tagger. Here we have $\theta_{23} \ll \theta_{12} \simeq \theta_{13}$, so the first unclustering leads to $j_1$ and $j_{23}$ subjets. It may happen that the parent gluon of the $j_{23}$ subjet was soft, so that $E_{23} < y_\text{cut} E_\text{jet}$. The jet therefore fails the symmetry requirement at this stage, and so recurses one step down. The formulation of the MDT is such that it recurses into the more massive of the two prongs, i.e. only follows the $j_{23}$ prong, even though this is soft. This was not what was intended in the original design, and is to be considered a flaw: in essence one follows the wrong branch.
It is interesting to determine the logarithmic structure that results from the wrongbranch issue. Exceptionally, we are going to work in an approximation in which we treat logarithms of y cut on the same footing as logarithms of ρ. We will, however, neglect terms that do not have the maximal number of logarithms of either argument. The wrong-branch distribution can then be written as where θ is the angle between j 1 and the j 23 system, while x = E 23 /E jet and z = E 2 /E 23 .
In writing the constraints on the angles, we have assumed strong ordering of the angles. We are also working in a soft approximation, $x \ll 1$ and $z \ll 1$. The answer is non-zero only for $\rho \lesssim y_\text{cut}^2$, because $x$ must be less than $y_\text{cut}$, while the maximum $\theta_{23}^2$ is of order $R^2$.$^{11}$ If $\rho \lesssim y_\text{cut}^3$ then the $y_\text{cut}$ condition in the second line of Eq. Considering just the asymptotically small-$\rho$ region, which starts for $\rho \lesssim y_\text{cut}^3$, the integrated distribution $\Sigma^\text{(MDT)}(\rho)$ has a logarithmic structure $\alpha_s^2 L^3 \ln\frac{1}{y_\text{cut}}$, i.e. enhanced by $\alpha_s L^2$ relative to the LO result and by a power of $L/\ln\frac{1}{y_\text{cut}}$ relative to configurations of type (a).
Based on the above calculation, one might expect the "wrong-branch" contributions to dominate over the LO type behaviour. In practice they don't. Part of the reason for this is visible in the fixed-order result: these terms set in only for relatively small values of jet mass, $\rho \lesssim y_\text{cut}^2$, with a small coefficient, and the logarithm itself is reduced in size because it involves either $y_\text{cut}^2/\rho$ or $y_\text{cut}^3/\rho$, depending on the region. Another part of the reason is that at higher orders the wrong-branch contribution involves a Sudakov-type suppression, coming from the probability that the harder prong of the jet was less massive than the softer one, even though it has an energy that is at least a factor of $1/y_\text{cut}$ larger than the softer prong. The small contribution from the wrong-branch configurations is illustrated in Fig. 9, obtained in Monte Carlo simulation, where events with a wrong-branch tag are defined as those for which at some stage during the declustering, the tagger followed a prong whose $m^2 + p_t^2$ was smaller than that of its partner prong. While the wrong-branch issue is numerically small, it is an undesirable characteristic of the MDT and should be eliminated. Rather than pursuing a full (and non-trivial) calculation of the resummed mass distribution for the MDT, we therefore propose in the next section that the MDT be modified.
$^{11}$ In the phase-space region where $\theta \sim \theta_{23} \sim R$, the approximation of strongly ordered angles is inappropriate. The determination of the exact onset of the wrong-branch issue would require a full treatment of that region. One would also need to go beyond the small-$z$ approximation: insofar as the squared jet mass involves a factor $z(1-z)$ rather than simply $z$, one would then expect an onset in the neighbourhood of $\rho \sim y_\text{cut}^2/4$ rather than $y_\text{cut}^2$. However, in terms of a logarithmic counting, these considerations should only affect subleading logarithms.
Figure 9. The MDT mass distribution, from Monte Carlo simulation (same as Fig. 1), with the contribution originating from wrong branches shown as a dashed line. Wrong branches are those for which, at some stage during the declustering, the tagger followed a prong whose $m^2 + p_t^2$ was smaller than that of its partner prong.

Modified Mass-Drop Tagger
The modification of the mass-drop tagger that we propose is to replace step 3 of the mass-drop tagger definition given above with:
3. Otherwise redefine $j$ to be that of $j_1$ and $j_2$ with the larger transverse mass ($m^2 + p_t^2$) and go back to step 1 (unless $j$ consists of just a single particle, in which case the original jet is deemed untagged).
At leading order, since there is no recursion, this modified MDT (mMDT) behaves identically to the original MDT. However, in the case of configurations like those of Fig. 8b, the tagger will follow the $j_1$ branch rather than the $j_{23}$ branch, thus eliminating the wrong-branch issues and the associated terms in Eq. (6.3). Fig. 9 includes the tagged-mass spectrum from the modified mass-drop tagger in Monte Carlo simulation. One sees that, phenomenologically, the modification is a minor one, as can be checked also on events where the jet stems from a resonance decay (i.e. signal rather than background).
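The difference between the original and modified recursion rules is confined to the choice of which prong to follow. A minimal sketch, in which each prong is represented by a hypothetical `(m2, pt2)` tuple (squared mass and squared transverse momentum) and the numerical values are invented for illustration:

```python
def next_branch(prong_a, prong_b, modified=True):
    """Return the prong the tagger recurses into.

    Each prong is an (m2, pt2) tuple. The original MDT follows the prong
    with the larger mass; the mMDT follows the larger m^2 + pt^2.
    """
    if modified:
        key = lambda prong: prong[0] + prong[1]   # transverse mass squared
    else:
        key = lambda prong: prong[0]              # invariant mass squared
    return max((prong_a, prong_b), key=key)


# Wrong-branch-like configuration: a hard massless prong j1 and a soft
# prong j23 that has acquired a mass through a collinear splitting.
j1 = (0.0, 900.0**2)
j23 = (25.0, 50.0**2)
print(next_branch(j1, j23, modified=False) is j23)  # original MDT follows the soft prong
print(next_branch(j1, j23, modified=True) is j1)    # mMDT follows the hard prong
```

Both lines print `True`: with the original rule the soft but massive prong is followed, while the transverse-mass rule selects the hard prong.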

All-order tagged-mass distribution
Not only does the mMDT eliminate the wrong-branch issue, but it also turns out to greatly facilitate the resummation of the tagged mass distribution.
As usual, we will work in the limit in which y cut is small, but α s ln y cut is also small. To avoid complicating our formulae with excessive Θ-functions, we will only quote explicit results in the plateau region of the LO calculation, i.e. ρ < y cut . For ρ > y cut , one simply obtains the plain jet-mass distribution.
It is useful to carry out the calculation in an angular ordered formulation, reflecting the inherent angular ordering that is present in the unclustering sequence followed by the tagger, a consequence of the fact that it is based on the C/A algorithm. We consider any number n of emissions, strongly ordered in angle, θ i ≪ θ i−1 , in configurations such that the n th emission has a momentum fraction greater than y cut , while all the others, at larger angles, have momentum fractions smaller than y cut . The latter are simply unclustered and discarded by the mMDT and it is only when it reaches gluon n, the first with a momentum fraction greater than y cut , that it tags the structure. This leads to the following all-order result for the mass distribution: In this formula, z i is the fraction of energy carried by gluon i relative to that of the original jet. Because y cut ≪ 1, all emissions i < n carry away only a negligible fraction of the jet's energy, so that one can consider the jet as always having the same energy even after multiple declusterings. As well as including real emissions, we have accounted for virtual corrections, the −1 contribution in the square brackets; from unitarity considerations, these can be treated as having the same phase-space integration as the real corrections, but obviously without the constraint z i < y cut imposed by the mass drop tagger. The terms in square brackets in Eq. (7.1) can be rewritten −Θ(z i − y cut ). This makes it clear that all the z i in the integrals are restricted to be larger than y cut . Insofar as we neglect logarithms of y cut , we can then replace the ordering of θ i with an ordering in the variable ρ i ≡ z i θ 2 i /R 2 , allowing us to rewrite Eq. (7.1) in terms of integrals over (strongly) ordered ρ i values, i.e. ρ i < ρ i−1 . 
The result for the integral of the $\rho$ distribution is then straightforward to express as an exponential, Eq. (7.2), where we have now explicitly written in the scale for the coupling and taken care of the modified $z$ integration limit for $\rho' > y_\text{cut}$. As usual, it can be convenient to examine Eq. (7.2) in the fixed coupling approximation. It is given by Eq. (7.3), which is simply the exponential of the integral of the LO result, Eq. (6.2).
Eq. (7.2) corresponds to evaluating the probability for excluding the shaded region shown in Fig. 10. From this, and the explicit fixed-coupling form, Eq. (7.3), it is straightforward to see that the most logarithmically divergent term in Σ (mMDT) at any order in α s is α n s L n , i.e. there are no terms beyond single logarithms. Considering that all other taggers had terms α n s L p with p up to 2n or 2n − 1, this is a striking result. Note that the strong ordering approximation for ρ i values that is implicit in obtaining Eq. (7.2) is the main reason why we are able to neglect the effect of the mass-drop condition in the tagger: for µ not too small, each time that one unclusters a subjet j into a j 1 and j 2 , if z > y cut , then one knows that m j 1 ≪ m j and so the mass-drop condition m j 1 < µm j is automatically satisfied. Of course, for finite µ values, there is a relative order α s probability that m j 1 > µm j , so causing the mass-drop condition to fail. Insofar as we control terms α n s ln n ρ in Σ (mMDT) , this corresponds to corrections α n+1 s ln n ρ, which are beyond our accuracy.
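The statement that the fixed-coupling result is the exponential of the integral of the LO distribution, with only single logarithms of $\rho$, can be illustrated numerically. The sketch below assumes the LO shape described earlier (linear in $\ln\frac{1}{\rho'}$ above $y_\text{cut}$, constant below, with the $-\frac{3}{4}$ term) multiplied by a fixed coupling factor `abar`; the closed form is our own evaluation of that integral for $\rho < y_\text{cut}$, and the function names and the value of `abar` are illustrative assumptions.

```python
import math


def lo_bracket(rho, y_cut):
    # LO shape with the coupling factor stripped.
    return math.log(1.0 / max(rho, y_cut)) - 0.75


def sigma_mmdt_fixed(rho, y_cut, abar, n=20000):
    """exp(-abar * integral from rho to 1 of d(rho')/rho' * lo_bracket(rho'))."""
    L = math.log(1.0 / rho)
    h = L / n
    # Midpoint rule in t = ln(1/rho').
    integral = sum(lo_bracket(math.exp(-(k + 0.5) * h), y_cut) for k in range(n)) * h
    return math.exp(-abar * integral)


def sigma_closed_form(rho, y_cut, abar):
    """Hand-integrated version of the same exponent, valid for rho < y_cut."""
    L = math.log(1.0 / rho)
    Ly = math.log(1.0 / y_cut)
    return math.exp(-abar * ((L - Ly) * Ly - 0.75 * L + 0.5 * Ly**2))


abar, y_cut = 0.05, 0.1
for rho in (1e-2, 1e-3, 1e-4):
    print(rho, sigma_mmdt_fixed(rho, y_cut, abar), sigma_closed_form(rho, y_cut, abar))
```

In the plateau region the exponent grows by the same amount for each decade in $\rho$, i.e. it is linear in $L$, which is the single-logarithmic behaviour discussed above.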
It is interesting that Eq. (7.2), evaluated with a coupling that freezes in the infrared, tells us that every jet should be successfully mass-drop tagged, albeit possibly with a very small tagged mass. In practice, confinement modifies this picture and in Monte Carlo studies at hadron-level about 90% of jets pass the mMDT procedure.
So far we have concentrated on a limit where $y_\text{cut} \ll 1$, while at the same time neglecting logarithms of $y_\text{cut}$. It is interesting to explore what happens when we go beyond this limit. For sufficiently small $y_\text{cut}$, one might also aim to control terms $(\alpha_s \ln^2 y_\text{cut})^m (\alpha_s \ln\rho)^n$ for any $m$, $n$. In this case a potential subtlety is that one should account for the difference between angular and mass ordering, because given some emission $a$ with $z > y_\text{cut}$, there is a probability $\sim \alpha_s \ln^2 y_\text{cut}$ of having a second emission $b$ with $z > y_\text{cut}$, at a smaller angle than $a$ but contributing more than $a$ to the jet mass. Such a configuration is illustrated in Fig. 10. Here, emission $a$ will be unclustered before emission $b$. Its contribution to the squared mass, $m_{a1}^2$, will in general be much smaller than that from $b$, $m_{b1}^2$. Consequently $m_{ab1} - m_{b1} \ll m_{ab1}$, i.e. there is no substantial mass drop when unclustering $a$. Emission $a$ is therefore discarded and it is only when $b$ is unclustered that the jet is tagged. This type of configuration might appear to complicate the treatment of the tagger, but actually it simply implies that it is irrelevant whether emission $a$ is present or not. For this reason, we believe that Eq. (7.2), written in terms of mass ordering, is correct for all terms $(\alpha_s \ln^2 y_\text{cut})^m (\alpha_s \ln\rho)^n$. Accordingly, we have chosen to explicitly include terms that are subleading in a counting of powers of $\ln\rho$, but $\ln^2 y_\text{cut}$-enhanced, in our expressions Eqs. (7.2), (7.3).
$^{12}$ We believe the result is identical also for $\mu = 1$: there will be an infinitesimal mass drop when emission $a$ is unclustered, which is now sufficient to trigger the mass-drop condition; however, the masses $m_{ab1}$ and $m_{b1}$ differ little in most of the relevant phase space, so that once again it is irrelevant whether emission $a$ is present or not.
It is also possible to examine the mass-drop tagger for moderate y cut values. One of the key new features that arises at single-logarithmic accuracy in this limit is that one now discards emissions with moderate z, and these have a finite probability for modifying the flavour of the remaining hard prong. Therefore Eq. (7.2) needs to be extended to account for a matrix structure in flavour space. This, and other aspects of the moderate-y cut case, are discussed in detail in appendix B.

Absence of non-global logarithms
As we have already observed, there are no terms in the integrated tagged mass distribution of the form α n s ln m ρ with m > n. In other words, there is at most one logarithm of ρ for each power of α s . It is to our knowledge the first time that a jet-mass type observable is found with this property. The reason that there are only single logarithms is that the mMDT completely removes contributions from soft emissions, i.e. one is left only with collinear divergences, but not soft-collinear ones, or pure soft ones.
The absence of pure soft divergences has a particularly interesting consequence, namely the absence of non-global logarithms. As we explained in section 3, non-global logarithms are potentially problematic. They typically arise from situations where a soft emission outside a (sub)jet emits a yet softer emission into the (sub)jet. Soft emissions inside the jet are systematically discarded by mMDT (or, in the situations where they're kept, don't affect the final tagged jet mass) and so the non-global logarithms are eliminated. The same mechanism ensures the absence of related "clustering" logarithms [48,49]. This makes the mMDT particularly interesting, as the only infrared and collinear safe single-jet observable that can be straightforwardly calculated to single logarithmic accuracy with the full N C dependence. It also suggests that the mMDT should be given priority in calculations aiming for accuracy beyond single logarithms.

Comparison with Monte Carlo results
Our analytical results are shown in Fig. 11 (right-hand plots) compared to parton-level Monte Carlo predictions with Pythia 6 (left, virtuality ordered shower). The upper panels show the results for quark jets, the lower panels for gluon jets. Three choices of y cut are shown. The agreement between Monte Carlo and the analytical results is striking. In particular, we note that there are two particular values of asymmetry parameter, namely y cut = 0.13 for quark-initiated jets, and y cut = 0.35 in the case of gluon-initiated jets, for which the mMDT mass distribution is essentially flat. We will come back to this observation in section 8.2, where we discuss background shapes in more detail.
Note that for the y cut = 0.35 choice, the analytical results have been supplemented with a subset of the finite y cut effects, specifically, those that are flavour-diagonal. Further details are given in appendix B. Residual small differences between the Monte Carlo and analytical results for y cut = 0.13 are in part due to the fact that we have left out finite y cut effects there.

Dependence on µ parameter
As we have already discussed in section 7.1, the dependence on the mass-drop parameter $\mu$ enters beyond the single-logarithmic accuracy we achieve for mMDT. Fig. 12 (left panel) shows the results of a simple Monte Carlo study to numerically investigate the impact of the mass-drop parameter on the tagged mass distribution. One sees that for $0.4 \lesssim \mu \le 1$ there is essentially no dependence on $\mu$. For smaller values of $\mu$ the background tagging rate drops. This is caused by contributions that are subleading in terms of the number of logarithms of $\rho$, but enhanced by powers of $\ln^2 \mu$, and associated with the Sudakov suppression for requiring that each of the two prongs of the tagged jet have a very small mass.
In light of these theoretical and Monte Carlo observations it seems that one could use mMDT entirely without any mass-drop condition. We believe that this simplification of the tagger deserves further investigation in view of possibly becoming the main recommended variant of mMDT.$^{13}$
Figure 11. Comparison of Monte Carlo (left panels) and analytic results (right panels) for the modified mass-drop tagger (mMDT). The upper panels are for quark jets, the lower panels for gluon jets. Three values of $y_{\rm cut}$ are illustrated, while $\mu$ is always taken to be 0.67 (its precise value has no impact on the results, as long as it is not substantially smaller than this). The details of the MC event generation are as for Fig. 1.

Interplay with filtering
The mass-drop tagger is often used together with a filtering procedure, which reduces sensitivity to underlying event and pileup. In its original incarnation a filtering radius $R_{\rm filt}$ was chosen equal to $\min(\Delta_{12}/2, 0.3)$ [23], where $\Delta_{12}$ is the angular separation between the two prongs of the jet after tagging (for brevity, we call this the tagged jet). The tagged jet was then reclustered with radius $R_{\rm filt}$, and only its $n_{\rm filt}$ hardest prongs were kept.
From the point of view of a general analytical discussion of the effect of filtering, it is immaterial whether one uses $R_{\rm filt} = \min(\Delta_{12}/2, 0.3)$ or simply some moderate fixed fraction of $\Delta_{12}$.$^{14}$ What matters more is the choice of $n_{\rm filt}$: for a tagged jet with $n$ particles, filtering will always leave the jet unmodified if $n \le n_{\rm filt}$. It is only if the jet has more than $n_{\rm filt}$ subprongs on an angular scale $R_{\rm filt}$ that filtering will change its mass. This occurs with relative probability $\alpha_s^{n_{\rm filt}-1}$ (e.g. for $n_{\rm filt} = 3$ there must be at least two additional gluons in order for filtering to discard anything).
Naively one would therefore think that filtering introduces a modification at order N$^{n_{\rm filt}-1}$LL. However, one should keep in mind that filtering doesn't cause the jet to be discarded, but instead simply changes its mass. Suppose, for instance, that it reduces the mass by some factor $f$ with a probability $\alpha_s^{n_{\rm filt}-1}$. Given a pre-filtering integrated mass distribution of $\Sigma(\rho) = \sum_n c_n \alpha_s^n L^n$, the post-filtering distribution is given by Eq. (7.4). The right-hand term of Eq. (7.4b) goes as $\alpha_s^{n_{\rm filt}+n-1} L^{n-1}$, i.e. it is N$^{n_{\rm filt}}$LL. Accordingly, with the common choice $n_{\rm filt} = 3$, it is unlikely that there will be a need to perturbatively calculate filtering's impact on the background in the near future! We can verify this conclusion numerically with the help of a Monte Carlo study. This is shown in Fig. 12 (right), where mMDT mass distributions are compared with and without filtering, using $n_{\rm filt} = 3$. The difference between them is hardly perceptible.
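The logarithmic counting for filtering can be made explicit with a short sketch. It assumes, for illustration only, that $L = \ln(1/\rho)$, that $\Sigma(\rho)$ counts jets with normalised squared mass below $\rho$, and that filtering rescales $\rho \to f\rho$ (with $f < 1$) with probability $p = \kappa\,\alpha_s^{n_{\rm filt}-1}$. The symbols $p$ and $\kappa$ are introduced here, and the expressions below are not necessarily those of Eq. (7.4).

```latex
\begin{align}
\Sigma^{\rm (filt)}(\rho)
  &= (1-p)\,\Sigma(\rho) + p\,\Sigma(\rho/f) \\
  &= \Sigma(\rho) + \kappa\,\alpha_s^{n_{\rm filt}-1}
     \sum_n c_n\,\alpha_s^n
     \left[\left(L - \ln\tfrac{1}{f}\right)^n - L^n\right] \\
  &= \Sigma(\rho) - \kappa\,\ln\tfrac{1}{f}
     \sum_n n\,c_n\,\alpha_s^{n_{\rm filt}+n-1} L^{n-1} + \ldots
\end{align}
```

Under these assumptions the leading correction carries one logarithm fewer than the term it modifies, together with $n_{\rm filt}-1$ extra powers of the coupling, which is the origin of the N$^{n_{\rm filt}}$LL counting.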

Calculability at fixed order
An interesting consequence of the presence of only single logarithms relates to the extent to which fixed-order calculations are reliable. For observables with terms $\alpha_s^n L^{2n}$, fixed-order perturbation theory breaks down when $L \sim 1/\sqrt{\alpha_s}$ and becomes unreliable somewhat earlier. Instead, for observables whose most divergent terms are $\alpha_s^n L^n$, the breakdown occurs when $L \sim 1/\alpha_s$, i.e. fixed-order perturbation theory has a parametrically larger domain of applicability. We have not investigated the behaviour of the fixed-order predictions in detail; however, such a study would be worthwhile and is straightforward to perform to NLO in the jet mass distribution with tools such as MCFM [63] and NLOJet++ [64].
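To give a feel for how different the two breakdown scales are, the following minimal sketch evaluates them numerically. The value of the coupling is an illustrative assumption, as is the identification $L = \ln(1/\rho)$; the function name is hypothetical.

```python
import math

def breakdown_log(alpha_s, double_log):
    """Rough size of L at which the leading tower becomes of order one.

    double_log=True  : terms alpha_s^n L^(2n), breakdown at L ~ 1/sqrt(alpha_s)
    double_log=False : terms alpha_s^n L^n,    breakdown at L ~ 1/alpha_s
    """
    if double_log:
        return 1.0 / math.sqrt(alpha_s)
    return 1.0 / alpha_s

alpha_s = 0.1  # illustrative value only, not taken from the text

for label, is_double in (("double-log", True), ("single-log", False)):
    L = breakdown_log(alpha_s, is_double)
    # assumption: L = ln(1/rho), so the corresponding rho is exp(-L)
    print(f"{label}: L ~ {L:.2f}, rho ~ {math.exp(-L):.1e}")
```

These are parametric estimates only: order-one coefficients multiplying the logarithms are ignored.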

Comparisons between taggers
We have commented in previous sections on similarities between the taggers for regions of intermediate tagged mass. In particular if one chooses $y_{\rm cut} = z_{\rm cut}/(1 - z_{\rm cut})$, then one expects trimming and pruning to be nearly identical to mMDT in the regions $\rho > z_{\rm cut} (R_{\rm sub}/R)^2$ and $\rho > z_{\rm cut}^2$ respectively. Choosing $y_{\rm cut} = 0.11$ and $z_{\rm cut} = 0.1$, this feature is evident in Fig. 13. There are remaining small differences between the tools, and in particular in the gluon case, for $\rho < z_{\rm cut}$ one sees that trimming and pruning are closer to each other than either is to mMDT. With the help of further Monte Carlo studies, we have traced the difference to the fact that both trimming and pruning directly cut on transverse momentum fractions (albeit normalised slightly differently), while mMDT cuts on a ratio of a $k_t$-distance to a mass, which only indirectly translates to a cut on momentum fractions. If, for instance, in step 2 of the definition of (m)MDT one replaces the cut $y = \min(p_{tj_1}^2, p_{tj_2}^2)\,\Delta R_{j_1 j_2}^2 / m_j^2 > y_{\rm cut}$ with $\min(p_{tj_1}, p_{tj_2})/(p_{tj_1} + p_{tj_2}) > z_{\rm cut}$, then the small differences between mMDT and pruning in the region $\rho > z_{\rm cut}^2$ disappear almost entirely, as can be seen. It is straightforward to show that this change does not affect the resummation at the order we have considered.
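The parameter matching used in this comparison is simple enough to check by hand. A minimal sketch follows, with hypothetical function names; the ratio $R_{\rm sub}/R = 0.2$ in the usage lines is an illustrative assumption.

```python
def ycut_from_zcut(z_cut):
    """Equivalent mMDT y_cut for a given momentum-fraction cut z_cut."""
    return z_cut / (1.0 - z_cut)

def zcut_from_ycut(y_cut):
    """Inverse of ycut_from_zcut."""
    return y_cut / (1.0 + y_cut)

def matching_region_lower_edges(z_cut, r_sub_over_r):
    """Lower edges in rho of the regions where trimming and pruning are
    expected to be nearly identical to mMDT."""
    return {
        "trimming": z_cut * r_sub_over_r ** 2,
        "pruning": z_cut ** 2,
    }

z_cut = 0.1
print(f"y_cut = {ycut_from_zcut(z_cut):.3f}")
# R_sub/R = 0.2 is an assumed, illustrative value
print(matching_region_lower_edges(z_cut, r_sub_over_r=0.2))
```

For $z_{\rm cut} = 0.1$ the mapping gives $y_{\rm cut} = 1/9 \simeq 0.11$, the pairing of values quoted in the text.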
These observations are important, because previous discussions that have commented on differences between groomers (e.g. [1]) were considering them with non-equivalent parameters. As we see here, a suitable choice of parameters is essential for the comparisons to be as informative as possible.
Among the groomers examined in Ref. [1], there was also filtering (without the mass-drop procedure and with a fixed $R_{\rm filt}$). While we have not investigated plain filtering in a similar level of detail to trimming, pruning and mMDT, preliminary investigations suggest that it leads to a background jet mass distribution that is very similar to that for the plain jet mass, in particular as concerns the leading-log structure $\alpha_s^n L^{2n}$.

Background shapes
From the point of view of searches with a small signal-to-background ratio, the reliability of the prediction for the background and especially its shape is crucial. The background may be predicted with the aid of perturbation theory, for which our resummation, merged with fixed-order calculations, would be the state-of-the-art. Alternatively, backgrounds may be predicted with data-driven methods. One example of such a method is to measure the background mass distribution to the left and right of an expected W/Z or H mass peak and use that to predict the background mass distribution in the peak location. One may also take the shape of the background for moderate $p_t$ jets, and attempt to use it to predict the shape for higher $p_t$ jets. From this point of view the structures present in the mass distribution are of importance. For example Sudakov peaks, as they appear in the normal jet mass, in trimming and in pruning, can considerably complicate data-driven methods: they prevent one from reliably interpolating the background between two sidebands, because the peak may lie over one of the sidebands, or even worse, in between them; they also make it more complicated to use a mass distribution at one $p_t$ to predict the distribution at another $p_t$, because Sudakov peak positions depend on the jet $p_t$.$^{15}$ The (modified) mass-drop tagger is particularly interesting in this respect for two reasons. Firstly it is free of Sudakov peaks. Secondly it has an interesting feature that can be seen by expanding Eq. (7.2) to second order in the coupling, restricting our attention to the region $\rho < y_{\rm cut}$, which gives Eq. (8.1), where $\beta_0 = (11 C_A - 2 n_f)/12$. Relative to the LO formula, Eq. (6.2), running coupling effects (the $\beta_0$ term) cause the distribution to increase for low $\rho$, while the exponentiation in Eq. (7.2) brings a (single-logarithmic) Sudakov type suppression.
For a specific value of $y_{\rm cut}$, $\exp\left(-\frac{3}{4} - \frac{\beta_0}{C_F}\right)$ in the case of quark jets, those two effects cancel, leaving a mass spectrum that is to a good approximation independent of $\rho$, a property that is potentially valuable in data-driven background estimates. For $n_f = 5$ the relevant value is $y_{\rm cut} = e^{-35/16} \simeq 0.11$. Note that this is determined in the small-$y_{\rm cut}$ approximation, which is subject to corrections of relative $O(y_{\rm cut})$. Those corrections lead to a slight increase of the critical $y_{\rm cut}$ value that is needed for flatness, which is consistent with the practical observation of flatness for quark jets in Fig. 11 at $y_{\rm cut} \simeq 0.13$. Fig. 11 is also consistent with the expectation from Eq. (8.1) that for small $y_{\rm cut}$ the mass distribution will tend to fall off towards small $\rho$, with the slope being dominated by the Sudakov term; conversely, for large $y_{\rm cut}$ the distribution is more likely to increase towards small $\rho$, with the slope being dominated by the running-coupling term. For gluon jets the $C_F$ coefficients are replaced by $C_A$ (and $\frac{3}{4}$ by $\beta_0/C_A = \frac{23}{36}$). This causes the Sudakov-induced term to be relatively more important, hence the tendency to decrease more steeply towards small $\rho$ and the need for a larger $y_{\rm cut}$ value in order to obtain a flat distribution.
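The critical $y_{\rm cut}$ values follow from a one-line evaluation. The sketch below reproduces the quark-jet number quoted in the text and applies the stated gluon substitution ($C_F \to C_A$, $\frac{3}{4} \to \beta_0/C_A$). The gluon output is therefore an inference from that substitution in the small-$y_{\rm cut}$ approximation, not a number quoted in the text, and the function names are hypothetical.

```python
import math

C_F = 4.0 / 3.0  # quark colour factor
C_A = 3.0        # gluon colour factor

def beta0(n_f):
    """beta_0 in the normalisation used in the text: (11 C_A - 2 n_f) / 12."""
    return (11.0 * C_A - 2.0 * n_f) / 12.0

def flat_ycut_quark(n_f=5):
    """Small-y_cut estimate of the y_cut giving a flat quark-jet spectrum."""
    return math.exp(-0.75 - beta0(n_f) / C_F)

def flat_ycut_gluon(n_f=5):
    """Same estimate for gluon jets, with C_F -> C_A and 3/4 -> beta_0/C_A."""
    return math.exp(-2.0 * beta0(n_f) / C_A)

print(f"quark jets: y_cut ~ {flat_ycut_quark():.3f}")
print(f"gluon jets: y_cut ~ {flat_ycut_gluon():.3f}")
```

Both estimates neglect corrections of relative $O(y_{\rm cut})$, which are less likely to be small at the larger gluon value.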

Non-perturbative effects
While the main aim of this work has been to understand perturbative effects in the taggers, it is important to also be aware of the extent to which they may be affected by nonperturbative contributions.

Limit of perturbative calculation
One simple study is to determine, for each tagger, the non-perturbative transition point, below which our calculations start to probe the non-perturbative region. One can define the transition point as the highest mass for which the coupling, in any of the integrals, must be evaluated below some non-perturbative transition scale µ NP . One can imagine µ NP to be of order 1 GeV.
For the normal jet mass, the transition point can be evaluated by considering an emission $i$ with $E_i \theta_i = \mu_{\rm NP}$. The squared jet mass is $m^2 = E_i E_{\rm jet} \theta_i^2$ and so the transition point is found taking the largest possible value for $\theta$, which gives $m^2 \simeq \mu_{\rm NP} E_{\rm jet} R$. In longitudinally-invariant variables, this reads $m^2 \simeq \mu_{\rm NP}\, p_{t,{\rm jet}}\, R$, (plain jet mass). (8.2) Note that this scale grows with the jet $p_t$, so that even apparently large masses, $m \gg \Lambda_{\rm QCD}$, may in fact be driven by non-perturbative physics. For a 3 TeV jet with $R = 1$, taking $\mu_{\rm NP} = 1$ GeV, the non-perturbative region corresponds to $m \lesssim 55$ GeV, disturbingly close to the electroweak scale! To obtain the transition point for trimming, one simply replaces $R$ with $R_{\rm sub}$, giving $m^2 \simeq \mu_{\rm NP}\, p_{t,{\rm jet}}\, R_{\rm sub}$, (trimming), (8.3) assuming that this lies in the region $\rho < r^2 z_{\rm cut}$, which usually will be the case for sufficiently high $p_t$ jets. For our canonical 3 TeV, $R = 1$ jet, taking $R_{\rm sub} = 0.2$ tells us that the non-perturbative region is $m \lesssim 25$ GeV. For both Y- and I-pruning, the non-perturbative transition region is formally in the same location as for the plain jet mass. This is because of the integrals over $\rho_{\rm fat}$, Eqs. (5.9), (5.11), whose lower limits can be as low as $\rho$. Note, however, that the onset of the non-perturbative effects may be substantially different, because the fraction of the answer that is associated with the non-perturbative region, as well as the interplay between real and virtual components, are different compared to the plain jet mass.
Finally, for the modified mass-drop tagger, we first observe that the smallest scale in the coupling will occur when the momentum fraction of the tagged splitting is $z \simeq y_{\rm cut}$. The squared mass of the jet is then $m^2 \simeq y_{\rm cut} E_{\rm jet}^2 \theta^2$. Substituting the condition for the emission to be non-perturbative, $y_{\rm cut}^2 E_{\rm jet}^2 \theta^2 = \mu_{\rm NP}^2$, leads to a transition point of $m^2 \simeq \mu_{\rm NP}^2 / y_{\rm cut}$, (mMDT). (8.4) Note that in contrast with the cases seen above, this transition point is independent of the jet $p_t$, and genuinely close to the non-perturbative region. Taking $y_{\rm cut} = 0.1$, it corresponds to a scale of about 3 GeV.$^{16}$
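The numbers quoted for the three transition points can be checked directly. In this minimal sketch the trimming formula is the plain-mass one with $R$ replaced by $R_{\rm sub}$, as the text prescribes, and the mMDT formula is obtained by eliminating $E_{\rm jet}\theta$ between the two relations given in the text; the function names are hypothetical.

```python
import math

def m_np_plain(mu_np, pt_jet, R):
    """Transition mass for the plain jet mass: m^2 ~ mu_NP * pt * R."""
    return math.sqrt(mu_np * pt_jet * R)

def m_np_trimming(mu_np, pt_jet, R_sub):
    """Transition mass for trimming: R replaced by R_sub."""
    return math.sqrt(mu_np * pt_jet * R_sub)

def m_np_mmdt(mu_np, y_cut):
    """Transition mass for mMDT: m^2 ~ mu_NP^2 / y_cut, independent of pt."""
    return mu_np / math.sqrt(y_cut)

mu_np = 1.0    # GeV, the non-perturbative scale assumed in the text
pt_jet = 3000  # GeV, the canonical 3 TeV jet

print(f"plain mass (R=1):     {m_np_plain(mu_np, pt_jet, 1.0):.1f} GeV")
print(f"trimming (R_sub=0.2): {m_np_trimming(mu_np, pt_jet, 0.2):.1f} GeV")
print(f"mMDT (y_cut=0.1):     {m_np_mmdt(mu_np, 0.1):.1f} GeV")
```

Only the first two scale as $\sqrt{p_t}$; the mMDT value is the same at any jet $p_t$.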

Monte Carlo study of hadronisation
It is instructive to supplement the above discussion with Monte Carlo studies of the effect of hadronisation. Figure 14 shows the mass distributions at parton-level, hadron-level without underlying event (UE) and hadron-level with UE, for plain jet mass, trimming, full and Y-pruning and mMDT using either a $y_{\rm cut}$ or a $z_{\rm cut}$. Figure 15 shows the corresponding ratios of hadron and parton-level distributions. Let us first concentrate on the effect of hadronisation. For any given mass, the plain jet mass is the most strongly affected by hadronisation, with 25% corrections even for jet masses of 100 GeV, in the neighbourhood of the peak region. This scale is about twice that estimated as the limit of the perturbative calculation in section 8.3.1,$^{17}$ which itself was large because it scales as $\sqrt{p_t}$, as given in Eq. (8.2). We anticipated that trimming should only be affected by non-perturbative physics at a somewhat smaller mass than for ungroomed jets. This is indeed what we see (most clearly in the top left panel of Fig. 15). Still, trimming's peak region is strongly affected, even more so than for the plain jet mass, which is a consequence of the non-trivial interplay between the change in perturbative peak position and the change in non-perturbative effects as one goes from plain to trimmed jet mass. While pruning nominally has non-perturbative effects setting in at the same mass as the plain jet mass, we argued that their onset might in practice be somewhat different, as is indeed observed: it appears not too dissimilar to trimming. Y-pruning looks somewhat different because it doesn't have a Sudakov peak; however, from Fig. 15 it is clear that the order of magnitude of hadronisation effects is similar in full pruning and Y-pruning.
$^{16}$ The unmodified mass-drop tagger is more subtle, because non-perturbative effects can influence the likelihood of following the right v. wrong branches. As a result, non-perturbative effects can set in, at least formally, at the same scale as for the plain jet mass, i.e. $m^2 \simeq \mu_{\rm NP}\, p_t R$. In practice, given that the wrong branch issue is phenomenologically minor, this is unlikely to lead to substantially enhanced non-perturbative effects relative to the mMDT; however, it is a relevant consideration from a calculational point of view.
$^{17}$ The belief that jet mass peaks are beyond perturbative control is widespread, though this statement usually holds for the peak of $d\sigma/dm$ or $d\sigma/dm^2$. Here we are instead considering $m^2\, d\sigma/dm^2$, whose peak is at much larger mass values. It is therefore somewhat surprising that there are still substantial effects.
As expected, it is the mMDT that has the smallest hadronisation corrections, with non-trivial structure appearing at about 10 GeV, about three times the scale estimated in section 8.3.1 for the limit of the perturbative calculation. The impact of hadronisation for mMDT depends somewhat on whether it is used with a y cut or z cut , and for the latter in particular hadronisation remains very modest all the way down to 10 GeV.

Analytic hadronisation estimate for mMDT
It is worthwhile examining whether the form of the onset of hadronisation for mMDT, above 10 GeV, can be explained at least qualitatively. Multiple effects can play a role: for example, hadronisation was argued in [43] to shift a given jet's squared mass by an amount δm 2 ≃ C Λ NP p t R, where C is either C F or C A and Λ NP ∼ 0.4 GeV. Hadronisation is also believed to change a jet's (or a prong's) momentum, shifting it by an amount δp t ≃ −CΛ NP /R [43,65]. (The numbers are given here for the anti-k t algorithm with R ≪ 1 and in the case of the jet mass assume a scheme in which hadron masses are neglected; the p t shift result for the k t algorithm is given in Ref. [66]; the other cases, including for the C/A algorithm, have yet to be calculated).
For a tagger one needs to work out the interplay between hadronisation and the tagging procedure. For example, let us consider the shift in jet mass, in the case of a quark jet.$^{18}$ The action of the tagger is such that the average effective radius of a tagged jet is a function of the tagged jet mass itself, as given for quark-initiated jets in Eq. (8.5). For $y_{\rm cut} \simeq 0.1$, $f(y_{\rm cut}) \simeq 2.5$. The resulting shift in the squared jet mass is given in Eq. (8.6). For cases where $d\sigma/dm$ scales as $1/m$, this leads to the correction to $d\sigma/dm$ given in Eq. (8.7). Next, let us consider the effect of the $p_t$ shift. This is most relevant in cases where one of the prongs, at parton level, has a momentum such that it just passes the $y_{\rm cut}$ asymmetry requirement. After hadronisation its $p_t$ is reduced, and so it may no longer pass that requirement. That leads to a drop in efficiency, which can be evaluated as follows. The effect will be relevant for asymmetric splittings, where the softer prong's momentum fraction is $z \sim y_{\rm cut}$. The effective jet radius will be of order $\frac{m}{p_t}\, y_{\rm cut}^{-1/2}$, and so the absolute change in the prong's $p_t$ will be $-C_A \Lambda_{\rm NP}\, y_{\rm cut}^{1/2}\, \frac{p_t}{m}$. This leads to a change in the momentum fraction (relative to the original jet) for the softer prong of $-C_A \frac{\Lambda_{\rm NP}}{m}\, y_{\rm cut}^{1/2}$. Note the $C_A$ colour factor here, since the soft prong will almost always be a gluon. Given that the perturbative tagging efficiency is equal to the integral over the splitting function down to momentum fractions $\simeq y_{\rm cut}$, the non-perturbative correction can be evaluated by estimating how the integral changes when requiring a momentum fraction greater than $y_{\rm cut} + C_A \frac{\Lambda_{\rm NP}}{m}\, y_{\rm cut}^{1/2}$. This gives us Eq. (8.8). One element that we have neglected here is that if hadronisation causes a (sub)jet with mass $m_1$ to fail the $y_{\rm cut}$ (or $z_{\rm cut}$) requirement, then mMDT continues to recurse into the harder prong. This will populate the lower mass region and the jet might then be tagged as having mass $m_2 \ll m_1$. The contribution from this effect to masses of order $m_2$ will be proportional to $\alpha_s \Lambda_{\rm NP}/m_1$, whereas the direct correction to masses of order $m_2$ will be proportional to $\Lambda_{\rm NP}/m_2$, which is parametrically larger. The dependence of the hadronisation correction on $m$ is identical in Eqs. (8.7) and (8.8), with only the coefficient changing. Interestingly the corrections depend just on the jet mass, and not on the jet $p_t$; this is characteristically different from the situation for plain jet mass.
$^{18}$ We are grateful to Jesse Thaler for useful discussions on this point.
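The $p_t$-shift argument can be illustrated with a rough numerical sketch. It assumes that the quark-jet tagging integral behaves as $\ln(1/z_{\rm min}) - \frac{3}{4}$ in the small-$y_{\rm cut}$ limit, and it takes $\Lambda_{\rm NP} = 0.4$ GeV from the order of magnitude quoted earlier. The function names are hypothetical and the normalisation should not be taken as that of Eq. (8.8); only the $\Lambda_{\rm NP}/m$ scaling is meant to be illustrated.

```python
import math

C_A = 3.0
LAMBDA_NP = 0.4  # GeV, order of magnitude quoted in the text

def threshold_shift(m, y_cut):
    """Hadronisation-induced increase of the effective momentum-fraction
    threshold: C_A * Lambda_NP * sqrt(y_cut) / m."""
    return C_A * LAMBDA_NP * math.sqrt(y_cut) / m

def tagging_integral(z_min):
    """Assumed small-z_min form of the quark tagging integral."""
    return math.log(1.0 / z_min) - 0.75

def relative_pt_shift_correction(m, y_cut=0.1):
    """Relative change in tagging efficiency when the threshold moves
    from y_cut to y_cut + threshold_shift(m, y_cut)."""
    base = tagging_integral(y_cut)
    shifted = tagging_integral(y_cut + threshold_shift(m, y_cut))
    return shifted / base - 1.0

for m in (25.0, 50.0, 100.0):  # GeV
    print(f"m = {m:5.1f} GeV: {relative_pt_shift_correction(m):+.3f}")
```

In this sketch the correction is negative and roughly halves when the mass doubles, in line with the expected $\Lambda_{\rm NP}/m$ behaviour.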
Numerically it is the negative contribution from the p t shift that dominates over the mass shift. Considerable caution is needed, however, as concerns the actual numerical prediction from these formulae: we have ignored hadron-mass effects, which are known to be substantial [67,68]; we have ignored the (complicated) issue that the two-pronged structure of the jet will undoubtedly modify the pattern of hadronisation corrections relative to the calculations of [43], both for the overall jet mass and the prong transverse momentum; we have also ignored the differences between mMDT with a y cut and a z cut , even though we have seen that they have different non-perturbative effects, possibly because y's definition involves the jet mass, which is itself subject to further corrections. Accordingly, it is probably only the overall Λ NP /m scaling in Eqs. (8.7) and (8.8) that can be considered robust.
Despite these caveats, it is still interesting to compare the result of Eqs. (8.7) and (8.8) to the Monte Carlo results. This is done in the top-right plot of Fig. 15. The plot shows the Monte Carlo results for both the y cut -and z cut -based mMDT. For the results labelled "0-mass," all particles' 4-momenta have been transformed (before clustering) so as to have zero mass, while maintaining their p t , rapidity and azimuth. The figure also shows our analytical result, as well as a variant where the hadronisation corrections have been rescaled by an (arbitrarily chosen) factor of 2.4. All the Monte Carlo results seem to be roughly consistent with our predicted Λ NP /m scaling down to O (10 GeV). However the normalisation of the hadronisation correction appears to be very sensitive to the details of the tagger and the input particles. The version of mMDT formulated in terms of a z cut and with massless input particles appears to agree reasonably well with our prediction. This may just be a coincidence, though it is also true that this is the variant for which our estimates above were most likely to be reasonable.
A final comment concerns the absolute size of the hadronisation corrections for the $z_{\rm cut}$-based mMDT variants: in the region of phenomenological interest, it seems that the hadronisation correction is just a couple of percent. This suggests that these mMDT variants may be optimally suited to high-precision studies, both in new physics searches and possibly even in applications such as measurements of the strong coupling.

Underlying event
A discussion of non-perturbative effects would not be complete without considering the underlying event (UE), whose impact for each tagger can be seen in Fig. 14, with a summary in the bottom plot of Fig. 15. The jet mass is the most strongly affected, while all the groomed/tagged results show a significantly reduced UE sensitivity, which was part of the intention in their design. For trimming and pruning this sensitivity remains genuinely small throughout the phenomenologically relevant region, and in particular significantly smaller than the hadronisation corrections. For mMDT the dependence on UE is almost imperceptible, at or below the 1% level for all jet masses.
For Y-pruning the UE sensitivity is not negligible: this is because the UE can significantly increase the original jet's mass and the resulting pruning radius. Consequently, a jet that was classified as Y-pruned without UE, may be reclassified as I-pruned. The overall pruning rate increases slightly (because for I-pruned jets the z-cut is turned off), while the Y-pruned rate is noticeably decreased. This sensitivity to UE is perhaps the one main disadvantage of Y-pruning, and is, we believe, inherent to any approach that effectively relies on the original jet mass to help discriminate between colour singlet signals and colour triplet/octet backgrounds.
One should be aware that the above pattern of UE dependence does depend on the jet transverse momentum. For example, mMDT was originally designed in conjunction with filtering in order to reduce the effect of UE. This appears not to be necessary here, but had we considered jets with transverse momentum of a couple of hundred GeV, as was the context for the original MDT+filtering study, then the much larger effective radius for the tagged jet would have led to noticeable UE effects in the absence of filtering.

Choice of Monte Carlo Generator
Throughout this work, we have regularly compared our analytical results for the tagged mass distributions with the output of Monte Carlo parton shower simulations from Pythia 6.425 [17], with the DW tune [38] of its virtuality-ordered shower. We have generally found good agreement between our analytics and the Pythia parton-shower simulations. It is also of interest to check whether the agreement is equally good when using different parton showers. To do so, we concentrate on the mMDT mass distribution, in the case of quark-initiated jets, for y cut = 0.13, at the parton level.
The top-left plot of figure 16 shows the comparison between the different showers in Pythia 6 and Pythia 8: the virtuality-ordered one in Pythia 6, our default, the $p_t$-ordered one in Pythia 6 [70] (in the Perugia 2011 [71] tune) and the $p_t$-ordered shower from Pythia 8 [18] (in the 4C tune [72]). The top-right plot shows the mMDT mass distribution obtained with the angular-ordered showers from Herwig 6.520 [73] and Herwig++ 2.6.3 [16,74,75] in their default tunes. The Monte Carlo curves are obtained with a generation cut of $p_t > 2.2$ TeV applied to the $qq \to qq$ hard process, and the tagging analysis is then carried out on all jets with $p_t > 3$ TeV.$^{19}$ All plots include the full leading order (LO) result obtained with the program NLOJet++ [64]. The fixed-order calculation is important in that it enables us to check the distributions for large masses, where resummation may not be appropriate. We ensure a high purity of quark-initiated jets in the fixed-order calculations by setting the incoming gluon parton distribution functions to zero. The plots in the top row of figure 16 show that nearly all the Monte Carlo generators are in reasonable agreement with each other, with our resummation and with the LO calculation. The one exception is the $p_t$-ordered shower in Pythia 6.425, which predicts a noticeably different shape for the distribution, both at small and large masses. We have checked that this characteristic holds also in another widespread tune of the $p_t$-ordered shower, Z2 [76]. This significant difference relative to our calculations and the other generators appears to be limited to situations where the jet transverse momenta are close to the kinematic limit. We have checked that similar differences appear also for the other substructure tools considered in this paper. Following discussions with the authors of Pythia, they provided us with code for a modified version of the $p_t$-ordered shower, which resolves an issue in which the hardness of the final-state shower could be affected by the presence (or not) of soft initial-state emissions. Results with this modified shower are shown in Fig. 16 (top-left) as a dotted curve, labelled v6.428pre, and one observes a clear improvement in the agreement with other tools. This example illustrates the value of analytical understanding in situations such as this where Monte Carlo results from various generators differ noticeably.
$^{19}$ While it is clear that having one generator cut and a higher subsequent jet selection cut is the correct thing to do, it is also computationally more expensive. In all the other plots of this paper, we have simply used a generator cut of 3 TeV, and always examined the two leading jets. We have verified that these two procedures give essentially identical results, both for Pythia 6.4's virtuality-ordered shower and for Herwig 6.520. In contrast, for the $p_t$-ordered showers in Pythia 6.4 and Pythia 8, the two procedures give visibly different results, and it is mandatory to use the procedure with staggered generation and selection cuts.
We note that the LO curve exhibits non-trivial structure (a small bump) in the vicinity of ρ = 0.1. This structure is absent in most of the Monte Carlo results, as well as in the results obtained from our analytical calculation (it is however present for Herwig++, and somewhat stronger than in the LO result). We believe that it is driven by the precise structure of hard large-angle radiation: this can be thought of as having a significant hard initial-state radiation contribution, neglected in our calculations and only approximately present in the parton showers. To confirm this hypothesis we also show the LO calculation for a jet of radius R = 0.5 (left-hand plot), which should reduce the initial-state radiation contribution. Indeed, the structure at ρ ≃ 0.1 is much less pronounced. We expect that if we had carried out simulations with tools such as MC@NLO [77] or POWHEG [78] (or alternatively CKKW [79] or MLM [80] matching), these would have correctly accounted for this type of large-mass structure, without significantly modifying the results at lower ρ. It would be interesting to verify this expectation, however such a study is beyond the scope of this work.
Finally, the bottom-left plot shows our resummed prediction and the NLO result. As discussed in section 8.2, the choice $y_{\rm cut} = 0.13$ minimises higher-order corrections and hence the all-order result is dominated by the LO contribution, even at relatively small masses. This property is confirmed in the NLO calculation, whose central value is just within the scale uncertainty band of the LO calculation.$^{20}$

Effect of the taggers on signal-background discrimination
We have so far considered only the question of how the various taggers/groomers behave for backgrounds, i.e. quark or gluon-induced jets. A key question for evaluating the per- 20 Scale uncertainties have been obtained through simultaneous variation of renormalisation and factorisation scales by a factor of two around a central value taken equal to the pt of the leading jet. The scales are kept identical in the (3-jet@NLO) differential mMDT cross section and in the (2-jet@NLO) normalisation cross section. Note the following caveat when varying factorisation scales: the variation of the quark densities is a function also of the gluon densities, however the matrix elements involving incoming gluons are all discarded, in order to obtain mainly quark jets; therefore factorisation scale dependence is not expected to cancel exactly at NLO, in contrast with the situation for a normal NLO calculation. W tagging efficiencies hadron level with UE mMDT (y cut = 0.11) pruning (z cut =0.1) Y-pruning (z cut =0.1) trimming (R sub =0.3, z cut =0.05) Figure 17. Efficiencies for tagging hadronicallydecaying W 's, for a range of taggers/groomers, shown as a function of the W transverse momentum generation cut in the Monte Carlo samples (Pythia 6, DW tune). Further details are given in the text.
formance of taggers is also that of how they fare on signal jets, for example W , Z or Higgs-bosons. The basic, known tree-level result, is that for the decay of a scalar particle, the tagging efficiency of a tagger like mMDT is essentially where the results makes use of the fact that P H→qq (z) = 1. As usual, in the small z cut limit, y cut and z cut are interchangeable. The same result holds for pruning (original and Y-pruning), modulo corrections associated with initial-state radiation (ISR). For trimming, the result depends on m/p t , and is 1− 2z cut for ρ > z cut r 2 and tends to 1 for asymptotically smaller m/p t (again, modulo corrections from ISR). Of course, the tagging always needs to be performed in a given mass window, and these estimates assume that the mass window is sufficiently wide relative to any loss of mass resolution caused by ISR, UE and pileup (the width was studied in detail for MDT with filtering by Rubin in Ref. [21]). Fig. 17 shows tagging efficiencies obtained with Pythia 6 (DW tune) at hadron level (with UE). They have been obtained in W Z events, with the Z decaying leptonically and the W hadronically. The tagger is applied to the hardest jet in the event, which is deemed tagged if its final mass is in the window 64-96 GeV. The fraction of jets that were tagged is shown as a function of a minimum p t cut applied on the qq → W Z hard event in the simulation. As expected, the tagging efficiencies are fairly independent of the p t,min choice, and reasonably consistent with the 1 − 2z cut expectation. The differences that one sees relative to that expectation have two main origins. Firstly Eq. (8.9) holds at tree-level. It receives O (α s ) corrections from gluon radiation off the W → qq ′ system. Monte Carlo simulation suggests these effects are responsible, roughly, for a 10% reduction in the tagging efficiencies. Secondly, Eq. (8.9) was for unpolarized decays. 
By studying leptonic decays of the W in the $pp \to WZ$ process, one finds that the degree of polarization is $p_t$ dependent, and the expected tree-level tagging efficiency ranges from about 76% at low $p_t$ to 84% at high $p_t$. These two effects explain the bulk of the modest differences between Fig. 17 and the result of Eq. (8.9). However, the main conclusion that one draws from Fig. 17 is that the ultimate performance of the different taggers will be driven by their effect on the background rather than by the fine details of their interplay with signal events. This provides an a posteriori justification of our choice to concentrate our study on background jets.

Figure 18. The significance obtained for tagging signal (W's) versus background, defined as $\epsilon_S/\sqrt{\epsilon_B}$, for a range of taggers/groomers, shown as a function of the transverse momentum generation cut in the Monte Carlo samples (Pythia 6, DW tune). Further details are given in the text. [Plot: left panel, quark-jet backgrounds; right panel, gluon-jet backgrounds; curves: mMDT ($y_{\text{cut}} = 0.11$), pruning ($z_{\text{cut}} = 0.1$), Y-pruning ($z_{\text{cut}} = 0.1$), trimming.]

Figure 18 shows the overall performance of the different taggers quantified as $S = \epsilon_S/\sqrt{\epsilon_B}$, which is proportional to the signal significance that can be obtained with a given tagger. Here $\epsilon_B$ is the fraction of quark (left plot) or gluon (right plot) jets that are tagged and pass the mass cut.

Let us start by discussing mMDT. Its signal significance $S$ grows with $p_t$. This is driven by three modest effects combining together: the signal efficiency increases at high $p_t$; the background tagging rate is, in a first approximation, proportional to $\alpha_s(p_t)$, which decreases at high $p_t$; and for our choice $y_{\text{cut}} = 0.11$, the tagging rate decreases slightly for decreasing $m/p_t$ (cf. Fig. 11). The signal significance is lower for gluon backgrounds than for quark backgrounds, which is simply a consequence of the $C_A$ versus $C_F$ colour factor in the leading-order background tagging rate. This is partially compensated for at high $p_t$ by the steeper $m/p_t$ dependence in the gluon case.
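The two quantities used in this discussion, the tree-level signal efficiency and the significance $S = \epsilon_S/\sqrt{\epsilon_B}$, can be illustrated with a minimal Python sketch. It assumes the small-$z_{\text{cut}}$ form of the efficiency for a flat splitting function, with integration limits $z_{\text{cut}} < z < 1 - z_{\text{cut}}$; the function names and the background rates in the loop are hypothetical and are not taken from the simulations above.

```python
import math


def tree_level_efficiency(z_cut: float) -> float:
    """Tree-level tagging efficiency for a two-body decay with a flat
    splitting function P(z) = 1: the fraction of 0 < z < 1 for which both
    prongs carry a momentum fraction above z_cut.

    Assumption: integration limits z_cut < z < 1 - z_cut (small-z_cut form).
    """
    if not 0.0 <= z_cut <= 0.5:
        raise ValueError("z_cut must lie in [0, 0.5]")
    return 1.0 - 2.0 * z_cut


def significance(eps_s: float, eps_b: float) -> float:
    """Significance measure S = eps_S / sqrt(eps_B)."""
    if eps_b <= 0.0:
        raise ValueError("background tagging rate must be positive")
    return eps_s / math.sqrt(eps_b)


if __name__ == "__main__":
    eps_s = tree_level_efficiency(0.1)
    # Hypothetical background tagging rates, for illustration only.
    for eps_b in (0.05, 0.02, 0.01):
        print(f"eps_B = {eps_b:.2f}  ->  S = {significance(eps_s, eps_b):.2f}")
```

Because the tree-level signal efficiency is the same for all the taggers with comparable cuts, differences in $S$ between taggers in this sketch come entirely from $\epsilon_B$, which mirrors the conclusion drawn from Fig. 17.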
Next, consider trimming. At low $p_t$ it has a slightly lower significance than mMDT, mainly because the particular $z_{\text{cut}}$ we have used is slightly non-optimal for tagging purposes. However, its main relevant feature is the drop in significance relative to the mMDT curve for $p_t \gtrsim 800$ GeV. This corresponds to a $\rho$ value of 0.01, which is to be compared to the point $\rho = r^2 z_{\text{cut}} = 0.0045$ in Eqs. (4.4), (4.9) at which the background starts to grow and develop a low-mass Sudakov peak. The departure from mMDT is less pronounced in the gluon case than in the quark case because the stronger Sudakov suppression from the $C_A$ colour factor reduces the height of the low-mass background Sudakov peak.
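The numbers quoted for trimming can be checked with a short Python sketch. It uses $\rho = m^2/(p_t^2 R^2)$ and the transition point $\rho = r^2 z_{\text{cut}}$ with $r = R_{\text{sub}}/R$. The values $m = 80$ GeV and $R = 1$ are assumptions made here because they reproduce the quoted $\rho = 0.01$ at $p_t = 800$ GeV; the function names are illustrative only.

```python
import math


def rho(mass: float, pt: float, jet_radius: float = 1.0) -> float:
    """Dimensionless mass variable rho = m^2 / (pt^2 R^2)."""
    return (mass / (pt * jet_radius)) ** 2


def trimming_transition(z_cut: float, r_sub: float, jet_radius: float = 1.0) -> float:
    """Transition point rho = r^2 z_cut, with r = R_sub / R."""
    return z_cut * (r_sub / jet_radius) ** 2


def pt_at_rho(mass: float, rho_value: float, jet_radius: float = 1.0) -> float:
    """Jet pt at which a jet of the given mass has the given rho."""
    return mass / (jet_radius * math.sqrt(rho_value))


if __name__ == "__main__":
    m_w = 80.0  # GeV, assumed round value for the W mass
    print("rho at pt = 800 GeV:", rho(m_w, 800.0))
    rho_t = trimming_transition(z_cut=0.05, r_sub=0.3)
    print("trimming transition rho:", rho_t)
    print("pt at the transition (GeV):", pt_at_rho(m_w, rho_t))
```

Under these assumptions the transition point $\rho = 0.0045$ is reached at a jet $p_t$ of roughly 1.2 TeV, so the drop in significance visible from about 800 GeV sets in somewhat before the transition itself.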
Table 1. Table summarising the main features for the plain jet mass, the three original taggers of our study and the two variants introduced here. In all cases, $L = \ln\frac{1}{\rho} = \ln\frac{R^2 p_t^2}{m^2}$, $r = R_{\text{sub}}/R$ and the log counting applies to the region below the smallest transition point. The transition points themselves are given as $\rho$ values. Sudakov peak positions are quoted for $d\sigma/dL$; they are expressed in terms of $\bar\alpha_s \equiv \alpha_s C_F/\pi$ for quark jets and $\bar\alpha_s \equiv \alpha_s C_A/\pi$ for gluon jets and neglect corrections of $\mathcal{O}(1)$. "NGLs" stands for non-global logarithms. The last column indicates the mass-squared below which the non-perturbative (NP) region starts, with $\mu_{\text{NP}}$ parametrising the scale where perturbation theory is deemed to break down.

tagger | highest logs | transition(s) | Sudakov peak | NGLs | NP: $m^2 \lesssim$
plain mass | $\alpha_s^n L^{2n}$ | - | $L \simeq 1/\sqrt{\bar\alpha_s}$ | yes | $\mu_{\text{NP}}\, p_t R$
trimming | $\alpha_s^n L^{2n}$ | $z_{\text{cut}}$, $r^2 z_{\text{cut}}$ | $L \simeq 1/\sqrt{\bar\alpha_s} - 2\ln r$ | yes | $\mu_{\text{NP}}\, p_t R_{\text{sub}}$
pruning | $\alpha_s^n L^{2n}$ | $z_{\text{cut}}$, $z_{\text{cut}}^2$ | $L \simeq 2.3/\sqrt{\bar\alpha_s}$ | yes | $\mu_{\text{NP}}\, p_t R$
MDT | $\alpha_s^n L^{2n-1}$ | $y_{\text{cut}}$, $\frac{1}{4} y_{\text{cut}}^2$, $y_{\text{cut}}^3$ | - | yes | $\mu_{\text{NP}}\, p_t R$
Y-pruning | $\alpha_s^n L^{2n-1}$ | $z_{\text{cut}}$ | (Sudakov tail) | yes | $\mu_{\text{NP}}\, p_t R$
mMDT | $\alpha_s^n L^{n}$ | $y_{\text{cut}}$ | - | no | $\mu_{\text{NP}}^2 / y_{\text{cut}}$

Finally, we examine pruning. Like trimming, pruning has a low-mass Sudakov peak, but it develops only for lower masses than for trimming, and accordingly the drop in performance of pruning relative to mMDT is mitigated.

Most interesting, perhaps, is Y-pruning. Its background enjoys a double-logarithmic Sudakov suppression for small $m/p_t$, due to the factor $e^{-D(\rho)}$ in Eq. (5.10a). The analogous effect for the signal is, we believe, single-logarithmic, hence the modest reduction in signal yields in Fig. 17. Overall the background suppression dominates, leading to improved tagging significance at high $p_t$. This is most striking in the gluon case, because of the $C_A$ colour factor in the $e^{-D(\rho)}$ Sudakov suppression.
Despite this apparent advantage, one should be aware of a defect of Y-pruning, namely that at high $p_t$ the Y/I classification can be significantly affected by underlying event and pileup, because of the way in which they modify the original jet mass and the resulting pruning radius. It remains of interest to develop a tagger that exploits the same double-logarithmic background suppression while not suffering from this drawback.$^{21}$

[...] summarised in table 1. We found, analytically, that the taggers are similar in certain phase-space regions and different in others, identified the transition points between these regions and carried out resummations of the dominant logarithms of $p_t/m$ to all orders.
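To give a feel for the Sudakov peak positions listed in table 1, the following Python sketch evaluates them for quark and gluon jets. It is only an illustration of the table's leading estimates, which neglect $\mathcal{O}(1)$ corrections. The fixed coupling $\alpha_s = 0.1$, the value $r = 0.3$ and the function names are assumptions made here, not choices taken from the calculations in the text.

```python
import math

C_F = 4.0 / 3.0  # quark colour factor
C_A = 3.0        # gluon colour factor


def alpha_bar(alpha_s: float, colour_factor: float) -> float:
    """alpha_bar_s = alpha_s * C / pi, with C = C_F (quarks) or C_A (gluons)."""
    return alpha_s * colour_factor / math.pi


def peak_L(tagger: str, alpha_s: float, colour_factor: float, r: float = 1.0) -> float:
    """Approximate Sudakov peak position in L = ln(1/rho), following table 1."""
    base = 1.0 / math.sqrt(alpha_bar(alpha_s, colour_factor))
    if tagger == "plain mass":
        return base
    if tagger == "trimming":
        return base - 2.0 * math.log(r)  # r = R_sub / R < 1 shifts the peak to larger L
    if tagger == "pruning":
        return 2.3 * base
    raise ValueError(f"no Sudakov peak estimate for tagger '{tagger}'")


if __name__ == "__main__":
    alpha_s = 0.1  # assumed fixed coupling, for illustration only
    for label, colour in (("quark", C_F), ("gluon", C_A)):
        for tagger in ("plain mass", "trimming", "pruning"):
            L = peak_L(tagger, alpha_s, colour, r=0.3)
            print(f"{label:5s} {tagger:10s} L = {L:6.2f}  rho = {math.exp(-L):.2e}")
```

The ordering that comes out (plain mass, then trimming, then pruning at progressively larger $L$, and gluon peaks at smaller $L$ than quark peaks) follows directly from the expressions in the table.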
One tagger has emerged as special, mMDT, in that it eliminates all sensitivity to the soft divergences of QCD. As a result its dominant logarithms are $\alpha_s^n L^n$, entirely of collinear origin. To our knowledge, this is the first time that such a feature has been observed, and indeed all the other taggers involve terms with more logarithms than powers of $\alpha_s$. One consequence of having just single, collinear logarithms is that the complex non-global (and super-leading [57]) logarithms are absent. Another is that fixed-order calculations have an enhanced range of validity, up to $L \ll 1/\alpha_s$ rather than $L \ll 1/\sqrt{\alpha_s}$. The modified mass-drop tagger is also the least affected by non-perturbative corrections. Finally, the $y_{\text{cut}}$ parameter of the tagger can be chosen so as to ensure a mass distribution that is nearly flat, which can facilitate the reliable identification of small signals. Intriguingly, the mass-drop parameter appears to be largely redundant, which suggests that one might further simplify the tagger by eliminating it, while retaining all of the tagger's attractive features.

Also of interest is the Y variant of pruning. This is the only one of these simple taggers to derive a significant advantage from the difference in net colour between electroweak signals and QCD backgrounds. That advantage comes at the cost of enhanced UE and pileup sensitivity, and it remains to be seen if this drawback can be alleviated.

This article forms part of a wider project to gain an understanding of the behaviour of taggers on both signals and backgrounds. Such an understanding is important to help ensure that these tools are used as robustly as possible and to gain insight into the similarities and differences between tools. We saw explicitly, in section 8.4, how our results helped identify issues in Monte Carlo generators, and in section 8.5, how they gave us a powerful tool to understand signal-background discrimination performance as a function of jet $p_t$.
We look forward to continued future work on this subject. This may include the extension of our analysis to signal processes, higher accuracy calculations for the taggers, measurements and phenomenological comparisons especially for mMDT, and the study of a wider range of observables. We believe that such work will provide solid foundations for the field of jet substructure and help guide its future development.

A Formulae for gluon jets
In the main text we explicitly derived resummed expressions for quark-initiated jets. Analogous expressions for gluon jets can be easily obtained by replacing the colour factor $C_F$ with $C_A$ and considering gluon splittings rather than quark ones, which amounts to the substitution $p_{gq} \to p_{xg} \equiv \frac{1}{2} p_{gg} + [\ldots]$ Note that, exploiting the symmetry $z \leftrightarrow (1 - z)$ of the $g \to gg$ splitting, $p_{gg}$ has been conveniently written in such a way that it only exhibits a singularity for $z \to 0$. We can now define the equivalents of Eq. (3.1) and Eq. (4.8) for gluon-induced jets: [...] It is then easy to write down the resummed expressions for the mass distribution of gluon-induced jets, for each of the cases considered in this paper, i.e. plain jet mass, trimming, pruning and mMDT. As in the main part of the paper, we report results in the small-$z_{\text{cut}}$ ($y_{\text{cut}}$) limit.

A.1 Plain jet mass
The resummed expression for the integrated distribution of the plain jet mass, in the case of gluon jets, is given by [...] where $N_g(\rho)$ contains non-global logarithms and clustering logarithms. The above expression is to be compared to the case of quark-initiated jets, Eq. (3.2).
In the case of trimming, the subjet radius is a fixed, user-chosen parameter. Therefore, for sufficiently small values of $\rho$, two-prong configurations are either entirely contained inside a single subjet, or else one of the prongs falls below the $z_{\text{cut}}$ requirement. In other words, for Y-trimming there will be a minimal value of $\rho$ that can be probed, which, in the small-$z_{\text{cut}}$ approximation, is $z_{\text{cut}} r^2$, where we recall $r = R_{\text{sub}}/R$. In effect the situation is similar to that for normal jet finding with a fixed jet radius.$^{22}$ This means that, unlike the other taggers we have considered, Y-trimming is not ideally suited to probing a broad range of boosts. It is for this reason that we have not included it as part of our main discussion of taggers.
In this context, it is interesting to note that a cut on the subjet separation was used in early ATLAS work on MDT [4], $\Delta_{j_1 j_2} > R_{\min}$ with $R_{\min} = 0.3$. This cut has the same effect as the two-subjet requirement in trimming, i.e. it leads to a minimal accessible value of $\rho$ of $y_{\text{cut}} r^2$, where now $r = R_{\min}/R$. The cut was imposed so as to reduce sensitivity to detector and reconstruction granularity. It is to be hoped that ongoing and future work by the ATLAS collaboration will eliminate the need for such a cut in substructure studies.
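A minimal accessible $\rho$ translates into a maximal jet $p_t$ at which an object of given mass can still be resolved into two subjets. The Python sketch below makes this explicit, combining $\rho_{\min} = y_{\text{cut}} r^2$ with $\rho = m^2/(p_t^2 R^2)$, which gives $p_{t,\max} = m/(R_{\min}\sqrt{y_{\text{cut}}})$ independently of $R$. This is a small-$y_{\text{cut}}$ estimate; the function names and the example values ($m = 80$ GeV, $y_{\text{cut}} = 0.11$, $R = 1$) are assumptions chosen for illustration, not the settings of the ATLAS analysis.

```python
import math


def min_rho_with_separation_cut(y_cut: float, r_min: float, jet_radius: float) -> float:
    """Minimal accessible rho = y_cut * r^2, with r = R_min / R."""
    return y_cut * (r_min / jet_radius) ** 2


def max_pt_for_mass(mass: float, y_cut: float, r_min: float, jet_radius: float) -> float:
    """Largest jet pt at which a jet of the given mass still has rho >= rho_min.

    Uses rho = m^2 / (pt^2 R^2); the jet radius cancels in the result.
    """
    rho_min = min_rho_with_separation_cut(y_cut, r_min, jet_radius)
    return mass / (jet_radius * math.sqrt(rho_min))


if __name__ == "__main__":
    # Assumed illustrative inputs: W-like mass in GeV, y_cut from the mMDT curves.
    pt_max = max_pt_for_mass(mass=80.0, y_cut=0.11, r_min=0.3, jet_radius=1.0)
    print(f"maximal pt for a two-subjet tag: {pt_max:.0f} GeV")
```

The cancellation of $R$ reflects the fact that the separation cut, not the jet radius, sets the smallest opening angle that can be resolved.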
For completeness we provide here the exact LO result for Y-trimming. We work in the small-$R$ limit, but relax the small-$z_{\text{cut}}$ and small-$\rho$ approximations, because of the presence of multiple transition points that are quite close to each other in $\ln\rho$. Defining $\Pi_q(x) = [\ldots]$ Eq. (C.2) is valid if $r^2 \le 4z_{\text{cut}}(1 - z_{\text{cut}})$. For $r^2 = 4z_{\text{cut}}(1 - z_{\text{cut}})$, the plateau region between $r^2/4$ and $z_{\text{cut}}(1 - z_{\text{cut}})$ is replaced with a single peak transition point at $\rho = r^2/4$, and a minimal $\rho$ of $r^4/4$. For larger values of $r$, the result is left as an exercise for the reader.
In figure 19 (left) we show the ρ distribution for Y-trimming and normal trimming, where the transition points are clearly visible. Finally, in the right-hand plot, we show the signal significances versus minimum jet p t in the presence of quark jet backgrounds, confirming that Y-trimming is not an adequate boosted-object tagger at high transverse momenta.