Non-global and clustering effects for groomed multi-prong jet shapes

We present a resummation of the non-global and clustering effects in groomed (with the modified mass drop tagger) multi-pronged observables, valid to next-to-leading logarithmic accuracy in the $D_2$ distribution (all single logarithmic terms), focusing on the non-global and clustering effects which cannot be removed by normalizing the cross-section. These effects are universal in the sense that they depend only on the flavor structure of the 1 → 2 splitting forming the multi-pronged subjets and the opening angle of the splitting, being insensitive to the underlying hard process or underlying event. The differential spectra with and without the non-global and clustering effects are presented, and the change in the spectra is found to be small.


JHEP02(2019)114
Brief discussion of clustering effects and NGLs
In refs. [1] and [2], a factorization for two- versus one-pronged jets using the jet shape $D_2$ and the modified mass drop tagger (mMDT) [3,4], or equivalently soft drop with angular exponent $\beta = 0$ [5], was presented. Within ref. [1], it was argued that non-global and clustering effects do exist (in a limited sense) when the jet genuinely has two prongs. The two prongs create an effective jet area for secondary splittings to radiate into, where the groomer cannot remove them. The purpose of this note is to provide numerical evidence that these effects are small, by detailing the large-$N_c$ resummation to leading logarithmic accuracy of both the clustering and non-global effects. This justifies the claimed next-to-leading logarithmic (NLL) accuracy of the resummation in ref. [1], that is, all terms of the form $\alpha_s^n \ln^n D_2$. An outline of the algorithm used here to estimate the effects of the non-global radiation was already presented in ref. [1], where the numerical results were used to conclude that the NGL effects were small. Here we simply provide a detailed analysis of the numerics and a presentation of the algorithm.

The clustering and non-global effects are incorporated formally in our factorization formulae within the collinear-soft function $C_s$, whose field-theoretic definition is given in ref. [1]. In that definition, $\bar{n}$ is the recoiling direction of the jet, whereas $a = (1, \hat{a})$ and $b = (1, \hat{b})$ are the directions of the legs of the dipole structure selected by the requirement $e_3 \ll (e_2)^3$, that is, the directions of the subjets, with some opening angle $\theta_{ab}$. $T$ is a color generator, fixed by the flavor structure of the splitting, into which the color indices of the Wilson lines contract. The jet axis is $\hat{n} = (\hat{a} + \hat{b})/|\hat{a} + \hat{b}|$. Thus the soft function is sensitive to two soft scales, set by $e_3$ and $z_{\rm cut}$, as well as to the modified geometrical structure of the jet due to multiple emissions. Emissions that are clustered into legs $a$ and $b$ at one emission level may not be so clustered at higher orders, due to other emissions that are closer in angle yet outside the clustering region of the two prongs. The boundary set by $R$ of the groomed jet is irrelevant for the issue of non-global logarithms (NGLs) of the groomed jet distribution for a specific flavor of jet, since we may always take the emissions that do not cluster into the dipole to fail soft drop, wherever they are. However, in hadron-hadron collisions, there are non-global contributions to the quark and gluon fractions which can be sensitive to the boundary of the jet. Thus the only relevant geometrical constraint is that we are concerned about the history of emissions at angles greater than $\theta_{ab}$ to $\hat{n}$, where the precise region that is clustered into the legs is given by the Cambridge/Aachen (C/A) algorithm [23][24][25].
We emphasize already at this level that the NGLs and clustering logs within the groomed jet are much less worrisome than traditional NGLs. This is because the NGLs and clustering logs are local to the jet, that is, process independent, and largely determined by the flavor structure of the splitting giving rise to the subjets, and possibly the opening angle of the subjets $\theta_{ab}$. The underlying reason is that the subjet splitting is boosted deep into the fat jet, so all other eikonal lines of the process are collapsed onto a single recoiling direction. The grooming algorithm guarantees that only these quasi-collinear eikonal lines can contribute to the NGL distribution and to the possible clustering effects. That is, if a set of eikonal lines emits into the region subtended by the subjet splitting, and neither leg of the radiating dipole is the $a$ or $b$ leg of the subjets, the probability for the emission to be clustered will be proportional to $\theta_{ab}^2 \ll 1$, a power suppressed contribution. Thus only the boosted set of eikonal lines forming the subjet splitting can contribute. Hence the distribution is universal, and can in principle be computed once and for all.
We give a procedure to compute these effects below, which we used to estimate whether our predictions for the distributions from exponentiating the global contributions were sufficient for NLL accuracy. The main theoretical interest is the necessity to keep track of the whole history of emissions in the cascade of soft partons, due to the Cambridge/Aachen clustering metric. This entangles the emissions off of distinct dipoles, which can often be considered independently in their evolution history if the phase-space constraint is geometric to all orders in perturbation theory.

Review of groomed jet shapes
For completeness, we give a review of the mMDT procedure that underpins much of the analytical progress in understanding jet substructure, as well as the $D_2$ jet shape observable for distinguishing one- versus two-pronged jet substructure. The mMDT procedure grooms a candidate jet, which could have been constructed out of the event with any other suitable jet algorithm, by reclustering the constituents with the C/A algorithm. At each step in the algorithm, the two particles (the daughters) whose momenta have the smallest opening angle are combined into a pseudo-particle (the parent) with the same total energy, and pointing (typically) either in the direction of the sum of the momenta (called the E-scheme), or in the direction of the more energetic of the two particles being combined (the winner-takes-all scheme). The two particles being combined are deleted from the list of particles in the jet and are replaced by the pseudo-particle. This reclustering continues until all particles are combined into a single pseudo-particle. The clustering tree is then the history of the recombinations.
To groom the jet, the clustering tree is examined in the opposite order to that in which it was constructed. One takes the current pseudo-particle (starting with the pseudo-particle forming the total jet, the last clustering of the C/A algorithm) and examines the two daughters that compose it. Let $z_i$ and $z_j$ be the energy fractions of the two daughters. The current pseudo-particle is declared to be the groomed jet if

$$\min(z_i, z_j) > z_{\rm cut}, \tag{2.1}$$

where the parameter $z_{\rm cut}$ is an input to the grooming procedure. If the two daughters fail the condition of eq. (2.1), then the less energetic of the two is discarded from the jet, and the more energetic daughter becomes the current pseudo-particle to be declustered and tested. A pictorial representation of the mMDT can be found in figure 1. The solid black line follows the most energetic branch, and the two red branches are at the widest angle, and so are the first to be declustered. They are assumed to fail the mMDT criterion, and further down the clustering tree of the most energetic line, we find the branching which passes mMDT at a much smaller angle. Once we have found the branching which passes the mMDT condition of eq. (2.1), all the particles which form the two branches are the groomed jet, and any measurement on those particles is a groomed measurement.

For the purposes of multi-prong discrimination, that is, finding whether a jet has two subjets or a single hard core, a convenient shape variable is the so-called $D_2$ observable of ref. [30], which is formed from the ratio of two energy-energy correlation measurements of ref. [31]. Let $J$ be the list of particles within the jet, and $G(J, z_{\rm cut})$ the list of particles that survive the mMDT grooming procedure. Then the appropriate shape variable is given by:

$$e_2^{(\beta)} = \frac{1}{E_J^2} \sum_{i<j \in G(J, z_{\rm cut})} E_i E_j \left(\frac{2\, p_i \cdot p_j}{E_i E_j}\right)^{\beta/2}, \tag{2.2}$$

$$e_3^{(\beta)} = \frac{1}{E_J^3} \sum_{i<j<k \in G(J, z_{\rm cut})} E_i E_j E_k \left(\frac{2\, p_i \cdot p_j}{E_i E_j}\, \frac{2\, p_i \cdot p_k}{E_i E_k}\, \frac{2\, p_j \cdot p_k}{E_j E_k}\right)^{\beta/2}, \tag{2.3}$$

$$D_2^{(\beta)} = \frac{e_3^{(\beta)}}{\big(e_2^{(\beta)}\big)^3}. \tag{2.4}$$

$E_J$ is the total jet energy, $p_i$ is the four-momentum of the $i$-th particle, and $E_i$ is its energy. When $\beta = 2$, we will simply write $D_2^{(2)} = D_2$. When $D_2 \ll 1$ we have a jet with genuinely two distinct subjets.
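The grooming and measurement steps reviewed above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used for the results in this paper. It assumes E-scheme recombination, energy fractions taken relative to the parent pseudo-particle, and the standard energy correlation functions of ref. [31] with the angular weight $(2 p_i \cdot p_j / E_i E_j)^{\beta/2}$. All function names (`massless`, `ca_recluster`, `mmdt_groom`, `d2`) are hypothetical.

```python
import math
from itertools import combinations

def massless(energy, theta, phi):
    """Massless four-momentum (E, px, py, pz) from polar angles."""
    return (energy,
            energy * math.sin(theta) * math.cos(phi),
            energy * math.sin(theta) * math.sin(phi),
            energy * math.cos(theta))

def angle(p, q):
    """Opening angle between the spatial parts of two four-momenta."""
    dot = sum(a * b for a, b in zip(p[1:], q[1:]))
    norm = math.sqrt(sum(a * a for a in p[1:]) * sum(b * b for b in q[1:]))
    return math.acos(max(-1.0, min(1.0, dot / norm)))

def ca_recluster(particles):
    """C/A reclustering (E-scheme assumed). A node is (momentum, left, right)."""
    nodes = [(p, None, None) for p in particles]
    while len(nodes) > 1:
        # Merge the pair with the smallest opening angle.
        i, j = min(combinations(range(len(nodes)), 2),
                   key=lambda ij: angle(nodes[ij[0]][0], nodes[ij[1]][0]))
        merged = tuple(a + b for a, b in zip(nodes[i][0], nodes[j][0]))
        parent = (merged, nodes[i], nodes[j])
        nodes = [n for k, n in enumerate(nodes) if k not in (i, j)] + [parent]
    return nodes[0]

def leaves(node):
    """Original particles contained in a node of the clustering tree."""
    momentum, left, right = node
    return [momentum] if left is None else leaves(left) + leaves(right)

def mmdt_groom(particles, z_cut):
    """Decluster from the root, dropping the softer daughter until mMDT passes."""
    node = ca_recluster(particles)
    while node[1] is not None:
        left, right = node[1], node[2]
        e_left, e_right = left[0][0], right[0][0]
        if min(e_left, e_right) / (e_left + e_right) > z_cut:
            break  # this branching passes the mMDT condition
        node = left if e_left > e_right else right
    return leaves(node)

def d2(particles, e_jet, beta=2.0):
    """D_2 = e_3 / e_2^3 on the given particles (needs at least two of them)."""
    def w(p, q):
        pq = p[0] * q[0] - sum(a * b for a, b in zip(p[1:], q[1:]))
        return max(0.0, 2.0 * pq / (p[0] * q[0])) ** (beta / 2.0)
    e2 = sum(p[0] * q[0] * w(p, q)
             for p, q in combinations(particles, 2)) / e_jet ** 2
    e3 = sum(p[0] * q[0] * r[0] * w(p, q) * w(p, r) * w(q, r)
             for p, q, r in combinations(particles, 3)) / e_jet ** 3
    return e3 / e2 ** 3

# Toy jet: two hard prongs, one soft particle near a prong, one soft wide-angle particle.
jet = [massless(60.0, 0.10, 0.0), massless(40.0, 0.10, math.pi),
       massless(1.0, 0.12, 0.0), massless(2.0, 0.80, math.pi / 2)]
groomed = mmdt_groom(jet, z_cut=0.1)
d2_value = d2(groomed, e_jet=sum(p[0] for p in jet))
```

In this toy jet the wide-angle soft particle is groomed away at the first declustering step, while the soft particle sitting next to a hard prong survives and sets a small but non-zero $D_2$.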
For $e^+ e^- \to$ hadrons, the cross-section for a groomed jet in the limit $D_2 \ll 1$, when the two subjets are collinear, assumes the factorized form of eq. (2.5). Within this factorization, $Q$ is the center of mass energy, $H$ is the hard matching coefficient describing the decay of an off-shell photon or Z-boson into a quark/anti-quark pair at the scale $Q$, or a Higgs boson into gluons, $J_{\bar{n}}$ is the recoiling jet function, $S$ is the global soft function describing the ungroomed jet boundaries and the structure of the soft radiation which fails mMDT, $H_{1\to 2}$ describes the formation of the two collinear subjets, labeled 1 and 2, with energy fractions $z$ and $1 - z$ (relative to $Q/2$, the energy of the fat parent jet), $J_1$ and $J_2$ are jet functions describing the collinear contributions to the $e_3^{(\beta)}$ measurement, and $C_s$ is the collinear soft-function of eq. (5.1), describing the soft radiation which is sensitive to the dipole structure of the subjets. The function $H_{1\to 2}$ is given by the finite terms of the 1 → 2 squared collinear splitting amplitudes. Field theoretic definitions of all functions, their calculation to one-loop accuracy, their renormalization, and (global) resummation can be found in ref. [1], as well as factorizations for the regions where a subjet becomes soft-collinear, or when $D_2 \sim 1$. In this paper we focus on the non-global resummation of $C_s$.
At $\beta = 2$, we can compute the $D_2$ distribution with a cut on the mass by marginalizing over the cross-section of eq. (2.5).

This is a description of a Monte Carlo algorithm for computing the soft drop non-global logarithms and clustering effects for the $e_3$ distribution. The main idea was summarized in ref. [1], appendix E, and follows the scheme first outlined in ref. [32], but we now spell out the algorithm in detail. To single-logarithmic accuracy in the large-$N_c$ limit, this algorithm computes the NGLs and clustering logs for the collinear soft function $C_s$ given above. We need:
• All four-vectors are null, and determined by their spatial direction, being of the form $a = (1, \hat{a})$. Thus for all dot products between two null vectors, we have $a \cdot b = 1 - \cos\theta_{ab}$, with $\theta_{ab}$ being the angle between them in the laboratory frame.
• a, b are the hard prongs of the soft dropped jet.
• List of emissions E.
• List of dipoles $D$, where an element is given by $\{x, y, \Delta\eta_{xy}\}$. $x, y$ are the null directions of the legs, and $\Delta\eta_{xy}$ is their opening angle in rapidity.
• Histogram $H_t$, indexed by resummation time $t = \frac{\alpha_s C_A}{\pi} \ln\frac{z_{\rm cut}}{e_3}$.
• w, the weight of the current event.
• Rapidities are calculated in the lab frame.
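As a quick numerical check of the conventions in the list above, the following sketch builds null vectors of the form $(1, \hat{a})$, verifies $a \cdot b = 1 - \cos\theta_{ab}$, and evaluates the resummation time. It assumes a fixed coupling and $C_A = 3$; the function names are hypothetical and not taken from ref. [1].

```python
import math

def null_vector(theta, phi):
    """Null four-vector (1, unit direction) from polar angles in the lab frame."""
    return (1.0,
            math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta))

def dot(a, b):
    """Minkowski product with signature (+, -, -, -)."""
    return a[0] * b[0] - a[1] * b[1] - a[2] * b[2] - a[3] * b[3]

def resummation_time(alpha_s, z_cut, e3, c_a=3.0):
    """t = (alpha_s C_A / pi) ln(z_cut / e3), assuming a fixed coupling."""
    return alpha_s * c_a / math.pi * math.log(z_cut / e3)

# Two prongs at polar angle 0.1 on opposite sides of the axis: opening angle 0.2.
a = null_vector(0.1, 0.0)
b = null_vector(0.1, math.pi)
a_dot_b = dot(a, b)  # equals 1 - cos(0.2) up to rounding
t_example = resummation_time(alpha_s=0.1, z_cut=0.1, e3=0.1 / math.e)
```

With a running coupling, $t$ would instead be an integral over the coupling between the two soft scales; the fixed-coupling form is used here only because it matches the expression quoted in the list.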
In what follows, we denote the angle between eikonal lines $x$ and $y$ by $\theta_{xy}$.
First zero out the histogram: $H_t = 0$, $\forall t$.
• Initialize. The initial dipole structure depends on the flavor of the splitting.
The differential probability for generating an emission in direction $j$ from the set of dipoles $D$ involves the soft drop virtual subtraction phase-space $C_{ab}(j, \{a, b\})$, which is defined in eq. (3.10). The algorithm is now:
1. Calculate $\Delta\eta_{\rm tot}$ by summing over the $\Delta\eta$'s in $D$. Generate a random $\Delta t$ from the corresponding probability distribution, and increase $t$ by $\Delta t$.
4. Calculate the clustering function $C_{ab}(j, E)$ used in the steps below, where $\theta_{xy}$ is the angle between $x, y$.
5. If $C_{ab}(j, E) = 0$ and $C_{ab}(j, \{a, b\}) = 0$, then the emission is not clustered into $a$ or $b$ before $a, b$ are clustered together, nor can it be a virtual subtraction. Delete $\{x, y, \Delta\eta_{xy}\}$ from $D$, add $\{x, j, \Delta\eta_{xj}\}$ and $\{j, y, \Delta\eta_{jy}\}$ to $D$, and add $j$ to $E$. Go to step 1.
6. If $C_{ab}(j, E) = 0$ and $C_{ab}(j, \{a, b\}) > 0$, then the emission is not clustered into $a$ or $b$ before $a, b$ are clustered together, but it can be a virtual subtraction. Then:
• If $x \neq a_i$ and $y \neq b_i$ for all original dipoles $\{a_i, b_i\}$: delete $\{x, y, \Delta\eta_{xy}\}$ from $D$, add $\{x, j, \Delta\eta_{xj}\}$ and $\{j, y, \Delta\eta_{jy}\}$ to $D$, and add $j$ to $E$. Go to step 1.
• If either $x$ or $y$ is a leg of an original dipole: $X$ = ComputeVeto(j, x, y).
• If $X > 0$, then with probability $X$, we delete $\{x, y, \Delta\eta_{xy}\}$ from $D$, add $\{x, j, \Delta\eta_{xj}\}$ and $\{j, y, \Delta\eta_{jy}\}$ to $D$, and add $j$ to $E$. Go to step 1. Otherwise, we throw away this emission $j$, and go to step 1.
• If $X \leq 0$, then add $wX$ to $H_t$, reset $w$ to $w(1 - X)$, and go to step 1.
7. If $C_{ab}(j, E) > 0$, then we cluster $j$ into $a$ or $b$. Then:
• If $x \neq a_i$ and $y \neq b_i$ for all original dipoles $\{a_i, b_i\}$: add $w$ to $H_t$ and start a new event, re-initializing.

• If either x or y are legs of the original dipole, X = ComputeVeto(j, x, y).
• If X > 0, then with probability X, we add w to H t and start a new event, re-initialize. Otherwise, we throw away this emission j, and goto 1.
• If X ≤ 0, then add wX to H t , reset w to w(1 − X), and goto step 1.
The emissions are generated in step (3) in the selected dipole rest frame, where they can be given by generating uniform distributions of vectors in rapidity and azimuth with respect to the back-to-back eikonal lines. We then boost back to the lab-frame and check the angular cutoff conditions are satisfied.
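The bookkeeping in the steps above (advancing $t$, replacing a dipole by its two daughters, and the reweighting for $X \leq 0$) can be sketched as follows. This is only an illustration of the data flow. Two ingredients are assumptions borrowed from the usual dipole-shower approach of ref. [32] rather than read off from this text: the waiting time is drawn from $P(\Delta t) = \Delta\eta_{\rm tot}\, e^{-\Delta\eta_{\rm tot} \Delta t}$, and the emitting dipole is chosen with probability $\Delta\eta_{xy}/\Delta\eta_{\rm tot}$. The `delta_eta` callable and all function names are hypothetical placeholders.

```python
import random

def sample_dt(delta_eta_tot, rng):
    """ASSUMPTION: exponential waiting time with rate delta_eta_tot."""
    return rng.expovariate(delta_eta_tot)

def pick_dipole(dipoles, rng):
    """ASSUMPTION: choose a dipole index with probability delta_eta / delta_eta_tot."""
    weights = [d[2] for d in dipoles]
    return rng.choices(range(len(dipoles)), weights=weights, k=1)[0]

def split_dipole(dipoles, emissions, index, j, delta_eta):
    """Replace {x, y} by {x, j} and {j, y}, and record the emission j."""
    x, y, _ = dipoles.pop(index)
    dipoles.append((x, j, delta_eta(x, j)))
    dipoles.append((j, y, delta_eta(j, y)))
    emissions.append(j)

def reweight(X, w, hist, t_bin, rng):
    """Veto step: returns (accept, new_weight).

    X > 0 : accept with probability X, weight unchanged.
    X <= 0: add w*X to the histogram bin, rescale w -> w*(1 - X), no accept.
    """
    if X > 0:
        return rng.random() < X, w
    hist[t_bin] += w * X
    return False, w * (1.0 - X)

# Toy usage with string labels for directions and a dummy delta_eta.
rng = random.Random(1)
dipoles = [("a", "b", 2.0), ("a", "nbar", 3.0)]
emissions = []
dt = sample_dt(sum(d[2] for d in dipoles), rng)
chosen = pick_dipole(dipoles, rng)
split_dipole(dipoles, emissions, 0, "j", lambda x, y: 1.0)
hist = [0.0]
accepted, w = reweight(-0.5, 2.0, hist, 0, rng)
```

A full implementation would also generate the direction $j$ in the dipole rest frame and boost it back to the lab frame, as described in the paragraph above; that part is omitted here.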
The veto. This is how we compute the reweighting veto, ComputeVeto(j, x, y):
• If $\{x, y\}$ is one of the original dipoles, then:
• If $y = b_i$, a leg of one of the original dipoles $D_i$, but $x \neq a_i$, then:
• Return $X$.
The calculation of the reweighting value $X$ in the ComputeVeto function splits the virtual subtraction between the two legs of an initial dipole according to which leg it is closer to. So, for instance, if the virtual subtraction is due to the initial dipole which is the $a, b$-dipole forming the subjets, the subtraction is partitioned between the legs $a$ and $b$, and the virtual subtraction has angular phase space given by $\theta_{SD} = C_{ab}(j, \{a, b\})$, see ref. [1]. We note that the phase-space given by the function $C_{ab}(j, \{a, b\})$ is the same angular phase-space used to define the (Sudakov) global logarithms. If the virtual subtraction is from an $a$-$\bar{n}$ or $b$-$\bar{n}$ dipole, that is, a dipole formed from an initial leg and the recoiling direction, then the $\theta$-function in eq. (3.12) or eq. (3.13) is always satisfied. That is, an emission that is closer to the recoil direction than to either leg $a$ or leg $b$ cannot satisfy $\theta_{SD} > 0$ when $\theta_{ab} \ll 1$. The real emissions have an angular phase space that is dictated by the complete emission history up to this point. Thus the algorithm naturally incorporates the clustering effects that arise from mismatched phase space constraints between the exponentiated one-loop result and the result given by multiple emissions.

Discussion of Cambridge/Aachen clustering history
In the above algorithm, we are working in the strongly energy-ordered limit. Formally, every emission has an energy much greater than all subsequent emissions. This is justified in part due to the fact that the collinear-soft function itself is a product of eikonal lines, and thus already contains the strongly energy-ordered QCD diagrams as a proper subset of its full diagrammatic expansion.
Since the emissions are strongly energy-ordered, if emission $p_j$ is produced late in the cascade, we simply need to compare the angle that this emission has to all previous emissions, assuming the previous emissions satisfy:
• They fail soft drop on their own.
• They have not clustered into $a$ or $b$ before $a, b$ are themselves clustered.
Emission $p_j$ will be clustered into whatever prior emission it is closest to in angle. Moreover, by the strong energy ordering assumption, it will not change the direction of that emission. Thus it can only contribute to the observable $e_3$ if it manages to be clustered into $a$ or $b$ before it is clustered into any emission generated so far in the cascade.
Note that any later emission, after a given real emission has been established in the cascade, cannot change the directions of the emissions it may be clustered into: the resulting pseudo-particle in C/A will point in the direction of the more energetic emission. This is exactly true if one uses the winner-takes-all clustering scheme of refs. [33,34], and approximately true if using a standard E-scheme, where one simply sums the momenta. Thus softer emissions cannot change whether $p_j$ above is clustered into $\{a, b\}$ or not. The action of the clustering history is illustrated in figure 2.
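The angular comparison described in this section can be sketched as a single test on directions. The precise condition used in the paper is not reproduced here. The function below encodes one plausible reading, stated as an assumption: under C/A, the emission $j$ joins a prong only if its angle to the nearer prong is smaller than $\theta_{ab}$ and smaller than its angle to every earlier surviving emission. The names `unit`, `angle` and `clusters_into_prongs` are hypothetical.

```python
import math

def unit(theta, phi):
    """Unit three-vector from polar angles."""
    return (math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta))

def angle(u, v):
    """Angle between two unit three-vectors."""
    c = sum(x * y for x, y in zip(u, v))
    return math.acos(max(-1.0, min(1.0, c)))

def clusters_into_prongs(j, a, b, emissions):
    """ASSUMED reading of the C/A condition in the strongly ordered limit.

    True if j is closer to a or b than theta_ab (so it joins a prong before
    the prongs merge) and closer to that prong than to any earlier emission.
    """
    theta_prong = min(angle(j, a), angle(j, b))
    if theta_prong >= angle(a, b):
        return False
    return all(theta_prong < angle(j, k) for k in emissions)

# Prongs separated by 0.2; j sits 0.02 away from prong a.
a, b = unit(0.10, 0.0), unit(0.10, math.pi)
j = unit(0.12, 0.0)
k = unit(0.125, 0.0)  # an earlier emission only 0.005 away from j
alone = clusters_into_prongs(j, a, b, [])
shadowed = clusters_into_prongs(j, a, b, [k])
wide = clusters_into_prongs(unit(0.80, math.pi / 2), a, b, [])
```

The second case shows the clustering effect: an earlier emission outside the prongs captures $j$ first, so $j$ no longer contributes to $e_3$ even though it lies inside the naive one-emission clustering region.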

Numerical results
First we present the NGL/clustering contribution to the collinear-soft function, $g^{\rm NGL}_{\rm SD}$, having factored out the global evolution (this is the direct output of the MC algorithm above). The scales $\mu_{e_3}$ and $\mu_{z_{\rm cut}}$ minimize the logarithms given by the calculation of the one-loop anomalous dimensions, as discussed in ref. [1]. To calculate the cross-section with non-global and clustering effects, we Laplace transform the cross-section of eq. (2.5) and solve the renormalization group equations in Laplace space for each function with generic scales, all evolved to a common scale $\mu$. We then invert the Laplace transform analytically, and take the cumulative distribution up to some maximum $D_2$. We then fix the scales in the cumulative resummed distribution. This procedure resums all global logarithms to NLL accuracy, and to add the clustering and the non-global effects, we multiply this cumulative resummed distribution by $g^{\rm NGL}_{\rm SD}$. Taking the derivative gives the differential resummed cross-section, as plotted in figure 5. We use an angular cutoff of $\delta = 0.002$ for the shower in what follows.
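The last two steps can be written out explicitly. Here $\Sigma^{\rm global}(D_2)$ denotes the cumulative, globally resummed distribution; this symbol is introduced only for this illustration. The second line additionally assumes a fixed coupling, fixed $e_2$ and $\theta_{ab}$, and $e_3 = D_2\, e_2^3$ inserted into the fixed-coupling expression $t = \frac{\alpha_s C_A}{\pi} \ln\frac{z_{\rm cut}}{e_3}$.

```latex
\begin{align}
\frac{d\sigma}{dD_2}
  &= \frac{d}{dD_2}\Big[\, g^{\rm NGL}_{\rm SD}(t,\theta_{ab})\, \Sigma^{\rm global}(D_2) \Big]
   = g^{\rm NGL}_{\rm SD}\, \frac{d\Sigma^{\rm global}}{dD_2}
   + \Sigma^{\rm global}\, \frac{\partial g^{\rm NGL}_{\rm SD}}{\partial t}\, \frac{dt}{dD_2} ,
\\
\frac{dt}{dD_2}
  &= \frac{d}{dD_2}\left[ \frac{\alpha_s C_A}{\pi}
     \big( \ln z_{\rm cut} - \ln D_2 - 3 \ln e_2 \big) \right]
   = -\frac{\alpha_s C_A}{\pi D_2} .
\end{align}
```

Under these assumptions, the non-global factor modifies the differential spectrum both multiplicatively and through its slope in $t$, with the slope term weighted by $1/D_2$.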
In figure 3, we plot the non-global and clustering modification factor $g^{\rm NGL}_{\rm SD}(t, \theta_{ab})$, for the splittings $g \to gg$, $q \to qg$, and $Z \to q\bar{q}$. We give the distribution for a variety of opening angles $\theta_{ab}$ for the collinear splitting, and find that as $\theta_{ab} \to 0$, the distribution tends to a universal value, very weakly dependent upon the exact value of $\theta_{ab}$.

[Figure: "Resummation of Non-global & Clustering Effects". Curves: hemisphere (fundamental jets), and mMDT $Z \to q\bar{q}$ at $\theta_{ab}$ = 0.36, 0.18, 0.09, and 0.045.]

The fact that the quark initiated splittings tend to the same asymptotic value is expected given the arguments of ref. [18]. The asymptotics are determined by the number of legs in the active jet region, which in this case is the one recoil direction. Moreover, we find in figure 4 that, to a good approximation, the modification factors of the different flavor splittings are related to one another.
This is a very unexpected result, since the different initial dipole configurations ought to lead to very different branching histories, to which the C/A clustering is sensitive. If the real emission phase space constraint did not depend upon the emission history off of all dipoles, as in the hemisphere case (where the geometrical constraint for real emissions is the same for all soft emissions to all orders), such a result would have been expected, based on the large-$N_c$ factorization of color-disconnected dipoles.
In figure 5, we plot the difference between the $D_2$ spectra with non-global and clustering effects, and the spectra with simple exponentiation of the global anomalous dimensions. We give the results for gluon initiated jets, with three different mass cuts $m_j$ = 45, 90, and 135 GeV, with a jet energy of 500 GeV, in order to probe different opening angles. Since the non-global and clustering distribution is strongest for the gluon, we do not include distributions for the quark or Z initiated jets. We can clearly see that the non-global and clustering effects are well within the uncertainty estimates due to scale variation of the starting and ending scales of resummation (this includes variations in where to start the non-global resummation), and that for the most part, simply exponentiating the global anomalous dimensions gives an accurate description of the NLL spectrum. Ultimately, this small effect of the non-global and clustering logarithms is due to the ratio of scales of the distinct soft regions in the function $C_s$ (see appendix A of ref. [1]).
$z$ is the energy fraction of one of the subjets of the splitting. We note that this is a pessimistic estimate for the ratio of scales the non-global resummation is sensitive to. For $z_{\rm cut}$ = 0.1 or 0.05, this is never a very large ratio of scales (that is, much greater than 1) until well after the Sudakov suppression of the cross-section sets in. We illustrate the effect of changing $z_{\rm cut}$ in figure 6. However, we caution that as $z_{\rm cut} \to 0$, non-global effects associated with power corrections due to the expansion in $z_{\rm cut}$ can become important; these are not considered in this study. Note that these non-global effects would also affect the soft-drop/mMDT groomed jet mass distributions of refs. [3,4,35,36].

[...] for $\delta = 0.001$, 250,000 events for $\delta = 0.002$, and 5000,000 events for $\delta = 0.004$. The RMS spread was estimated by running the MC for 10 statistically independent runs, each containing the same number of events. We then calculate the RMS over the 10 runs. Finally, we smooth the error by fitting an exponential function $a \times e^{bt}$ to the ratio of the RMS to the mean as a function of $t$. We then take the upper and lower bounds given by $g^{\rm NGL}_{\pm}(t) = g^{\rm NGL}_{\rm ave.}(t)\,(1 \pm a \times e^{bt})$ as the estimate of the statistical uncertainty. We find negligible statistical uncertainty and cutoff dependence for NGL values out to $t \sim 2.5$.
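The uncertainty estimate described above can be sketched as follows. The text does not say how the exponential fit is performed, so this illustration assumes an ordinary least-squares fit of $\ln({\rm RMS}/{\rm mean})$ against $t$, and takes the RMS as the population spread over the runs. Function names and the toy numbers are hypothetical, not data from this study.

```python
import math

def mean_and_rms(runs):
    """Per-bin mean and RMS spread over independent runs (runs[r][bin])."""
    n = len(runs)
    means = [sum(col) / n for col in zip(*runs)]
    rms = [math.sqrt(sum((v - m) ** 2 for v in col) / n)
           for col, m in zip(zip(*runs), means)]
    return means, rms

def fit_exponential(ts, ratios):
    """ASSUMPTION: fit ratio = a*exp(b*t) by least squares on ln(ratio).

    Requires all ratios to be strictly positive.
    """
    ys = [math.log(r) for r in ratios]
    n = len(ts)
    t_bar, y_bar = sum(ts) / n, sum(ys) / n
    b = (sum((t - t_bar) * (y - y_bar) for t, y in zip(ts, ys))
         / sum((t - t_bar) ** 2 for t in ts))
    a = math.exp(y_bar - b * t_bar)
    return a, b

def uncertainty_band(ts, means, a, b):
    """Lower and upper bounds g_ave(t) * (1 -/+ a*exp(b*t)) for each bin."""
    return [(m * (1.0 - a * math.exp(b * t)), m * (1.0 + a * math.exp(b * t)))
            for t, m in zip(ts, means)]

# Toy check: ratios generated from a known exponential are recovered by the fit.
ts = [0.5, 1.0, 1.5, 2.0, 2.5]
ratios = [0.01 * math.exp(1.5 * t) for t in ts]
a_fit, b_fit = fit_exponential(ts, ratios)
toy_means, toy_rms = mean_and_rms([[1.0, 2.0], [3.0, 2.0]])
band = uncertainty_band([0.0], [2.0], 0.1, 0.0)
```

Fitting in log space weights all $t$ bins equally in relative terms; a direct non-linear fit would weight the large-$t$ bins more heavily and could give a slightly different band.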

Conclusions
We have investigated the contribution of non-global and clustering effects that directly change the shape of the groomed shape variable $D_2$ for multi-pronged jets using the mMDT grooming procedure. In all, there are three sources of non-global contributions to the groomed $D_2$ observable. Two of them are directly shared with the groomed mass distribution of refs. [3,4,35,36]. First, there are the non-global contributions directly contributing to the groomed constituents of the jet, which for an ungroomed jet would modify its mass spectrum. At leading power, these contributions are removed. This is a major feature of the mMDT algorithm, which effectively erases any hard geometric boundary of the jet with respect to the rest of the event, so that the jet appears to have zero active area. Secondly, there are non-global contributions which can change the relative quark and gluon jet fractions. These contributions correspond to so-called "global" soft modes which are not associated to any particular jet, and are therefore sensitive to the precise jet boundaries drawn over the event, as well as the additional cuts one places on the event. These contributions cannot directly affect the shape of a quark jet or a gluon jet mass or $D_2$ spectrum (the global soft functions being identical in the two cases). Thus for $e^+ e^-$ collisions, such contributions can be normalized away, since jets are dominated by the quark initiated process, but they may play a role in hadron-hadron jet spectra. In the hadron-hadron collision case, they can also be resummed in each quark and gluon fraction for fat jets using the techniques of ref. [26]. Finally, there are the contributions to the $D_2$ spectrum which do not change the mass spectrum of the groomed jet. These contributions are what were considered and resummed in this paper. Again, these contributions are not sensitive to the precise jet boundary for large-$R$ jets, and are determined by specifying the flavor structure of the boosted jet.
The secondary branching takes place outside of the region defined by the opening angle of the dipole required to exist in the $D_2 \to 0$ limit, and whether the branching is inside or outside the groomed jet is irrelevant, as all such radiation fails the grooming requirement.