Improving the Understanding of Jet Grooming in Perturbation Theory

Jet grooming has emerged as a necessary and powerful tool in a precision jet physics program. In this paper, we present three results on jet grooming in perturbation theory, focusing on heavy jet mass in $e^+e^-\to$ hadrons collisions, groomed with the modified mass drop tagger. First, we calculate the analytic cross section at leading-order. Second, using the leading-order result and numerical results through next-to-next-to-leading order, we show that cusps in the distribution on the interior of phase space at leading-order are softened at higher orders. Finally, using analytic and numerical results, we show that terms that violate the assumptions of the factorization theorem for groomed jet mass are numerically much smaller than expected from power counting. These results provide important information regarding the convergence of perturbation theory for groomed jet observables and reliable estimates for residual uncertainties in a precision calculation.


Introduction
A precision program for jet substructure calculations and measurements has developed through advances in jet grooming algorithms. Because they mitigate non-global logarithms [1] that would inhibit systematic improvability of theoretical predictions, the modified mass drop tagger (mMDT) groomer [2,3] and its generalization, soft drop [4], have emerged as the necessary tools for the precision task. Following the original papers that introduced the groomers, a large literature of calculations and applications has resulted and demonstrated that groomed versions of standard jet observables like the mass exhibit significantly improved sensitivity to the value of the strong coupling $\alpha_s$, over a much wider dynamic range than their ungroomed counterparts. This explosion of theoretical advances has been accompanied by measurements of groomed jet masses by both the ATLAS and CMS collaborations at the Large Hadron Collider (LHC) [27][28][29].
For simplicity, many of these theoretical analyses have focused on jet production in $e^+e^-$ collisions, and more narrowly on center-of-mass energies at the $Z$ pole. Recently, a reanalysis of archived data from the ALEPH experiment [30] at the Large Electron-Positron Collider (LEP) has demonstrated, as a proof of principle, that studying jet grooming in $e^+e^-$ collisions can be more than a purely academic exercise. In this paper, we restrict to jets in $e^+e^-$ collisions for these reasons. A precision prediction of any event or jet shape at a lepton collider requires three broad components: fixed-order calculations in the perturbation theory of QCD, resummation of large logarithms near the exclusive phase space boundaries to all orders in the coupling, and the dominant corrections from non-perturbative physics in the bulk of the phase space. Advances have been made in all three of these directions for mMDT grooming in particular. Next-to-next-to-leading order (NNLO) predictions for the groomed jet mass have been computed [31] in the CoLoRFulNNLO subtraction method [32][33][34]. Using the factorization theorem of Refs. [5,6], supplemented with two- and three-loop results [35][36][37][38], next-to-next-to-next-to-leading logarithmic order (NNNLL) resummed predictions have been presented [39]. In Ref. [40], the first matrix element definition of non-perturbative corrections was provided for these groomers, with the leading contributions encapsulated into three universal coefficients. Through appropriate combination of these results, predictions for mMDT jets in $e^+e^-$ collisions can be provided that rival the precision established for classic event shapes such as thrust [41,42] and C-parameter [43,44].
However, even when restricting the analysis to mMDT groomed jets in $e^+e^-$ collisions, there are as yet unresolved issues with the precision predictions that have been presented. In addition to the scale enforced by the measurement of the jet mass, the groomer introduces another scale that defines which emissions are kept or removed from a jet. The measurement scale and the grooming scale play off one another and produce interesting structure in the resulting distribution, depending on the relative size of these two scales. Where the value of the jet mass is equal to the grooming scale, the leading-order distribution develops a cusp, and this may lead to significantly inaccurate higher fixed-order predictions in the vicinity [45]. The factorization theorem of Refs. [5,6] is only valid when the grooming scale is parametrically larger than the jet mass, but this isn't necessarily the regime that is most relevant for experiment. The numerical size of corrections to the factorization theorem description hasn't been firmly established, which calls into question its relevance as the dominant description of the groomed jet near the exclusive phase space boundary.
In this paper, we address these issues directly and establish that their effect is substantially smaller numerically than would be naïvely expected. In Sec. 2 we present the analytic prediction of the leading-order distribution of the groomed heavy hemisphere mass, which provides a foundation for the analyses in the following sections. In Sec. 3, we study the cusp in the leading-order distribution of the groomed heavy hemisphere mass and show explicitly, using numerical next-to- and next-to-next-to-leading order codes, that the cusp is softened, contrary to what one might expect. In Sec. 4, using numerical fixed-order codes, we isolate the contribution to the groomed heavy hemisphere mass distribution that is not described by the factorization theorem and show that its numerical size is about a factor of 4 smaller than would be expected, for experimentally-relevant values of the grooming parameter. We conclude and discuss future directions in Sec. 5.

Leading-Order Distribution
As mentioned in the introduction, we restrict our attention to jets produced in $e^+e^-$ collisions, which requires a slightly modified definition of the mMDT groomer compared to that presented in its original form [2]. As $e^+e^-$ collisions occur in the center-of-mass frame, we groom each event hemisphere individually. Once the events have been groomed, we then measure the masses of the event hemispheres. Grooming decorrelates the hemispheres, and so a more natural scale to compare the mass to is the ungroomed hemisphere energy, rather than the center-of-mass energy. We then only measure the heavier of the two hemisphere masses. Details of the precise algorithm can be found in, e.g., Ref. [37].
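With this definition of our measurement procedure, it is straightforward to analytically calculate the leading-order distribution for the heavy hemisphere groomed jet mass $\rho$. We first note that at leading order in the center-of-mass frame, one event hemisphere has two particles in it, while the other has only a single particle. Thus, the heaviest hemisphere must be the one with two particles. We use the three-body phase space variables $\{x_i\}$, where
$$x_i = \frac{2\,p_i\cdot Q}{Q^2}\,, \tag{2.1}$$
$Q$ is the total four-momentum of the event, and $i = 1, 2, 3$ ranges over the final state particles, so that $x_1+x_2+x_3=2$. The energy of the hemisphere with two particles is
$$E_h = \frac{\sqrt{Q^2}}{2}\left(2-\max[x_1,x_2,x_3]\right). \tag{2.2}$$
The least energetic particle of the event is also the least energetic particle of the two-particle hemisphere, with energy
$$E_{\min} = \frac{\sqrt{Q^2}}{2}\,\min[x_1,x_2,x_3]\,. \tag{2.3}$$
The mMDT grooming requirement on the heavy hemisphere enforces that the groomed mass is only non-zero if
$$\frac{\min[x_1,x_2,x_3]}{2-\max[x_1,x_2,x_3]} > z_{\rm cut}\,. \tag{2.4}$$
If the grooming requirement is satisfied, then the groomed jet mass is just the total hemisphere mass,
$$m_h^2 = Q^2\left(1-\max[x_1,x_2,x_3]\right), \tag{2.5}$$
in terms of the three-body phase space variables. The observable of interest $\rho$ is then the ratio of this mass to the hemisphere energy:
$$\rho = \frac{m_h^2}{E_h^2} = \frac{4\left(1-\max[x_1,x_2,x_3]\right)}{\left(2-\max[x_1,x_2,x_3]\right)^2}\,. \tag{2.6}$$
The leading-order distribution of $\rho$ can then be calculated by integrating over the matrix element for $e^+e^-\to q\bar q g$ production. For $\rho>0$,
$$\frac{1}{\sigma_0}\frac{d\sigma}{d\rho} = \frac{\alpha_s C_F}{2\pi}\int_0^1 dx_1\int_0^1 dx_2\,\Theta(x_1+x_2-1)\,\frac{x_1^2+x_2^2}{(1-x_1)(1-x_2)}\,\Theta\!\left(\frac{\min[x_i]}{2-\max[x_i]}-z_{\rm cut}\right)\delta\!\left(\rho-\frac{4\left(1-\max[x_i]\right)}{\left(2-\max[x_i]\right)^2}\right), \tag{2.7}$$
with $x_3 = 2-x_1-x_2$, where $\sigma_0$ is the leading-order electroweak cross section for $e^+e^-\to q\bar q$ and $C_F = 4/3$ is the fundamental Casimir of SU(3) color. The phase space constraints are simple enough that the integral can be evaluated exactly. We find the result of Eq. (2.8).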

This distribution is plotted in Fig. 1 for a few values of the grooming parameter $z_{\rm cut}$. The cusp in the distribution located at $\rho = 2z_{\rm cut}-z_{\rm cut}^2$ is clear: for values of $\rho$ above the cusp, grooming has no effect, while for $\rho$ below the cusp, grooming significantly modifies the distribution from its ungroomed counterpart.
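The kinematics behind the cusp can be checked with a few lines of code. The sketch below is not taken from the paper: it assumes the standard three-body variables with $x_1+x_2+x_3=2$, a two-particle hemisphere with energy proportional to $2-\max[x_i]$, and $\rho$ defined as hemisphere mass squared over hemisphere energy squared. The function names `groomed_rho` and `cusp_location` are hypothetical.

```python
def groomed_rho(x1, x2, z_cut):
    """Groomed heavy-hemisphere rho for a three-parton event (assumed kinematics).

    Returns 0.0 when the softer particle of the two-particle hemisphere
    fails the mMDT condition and is groomed away.
    """
    x3 = 2.0 - x1 - x2                      # energy conservation: x1 + x2 + x3 = 2
    xs = (x1, x2, x3)
    x_max, x_min = max(xs), min(xs)
    hemi = 2.0 - x_max                      # hemisphere energy in units of Q/2
    if x_min / hemi <= z_cut:               # mMDT requirement fails
        return 0.0
    return 4.0 * (1.0 - x_max) / hemi ** 2  # mass^2 / energy^2 of the hemisphere


def cusp_location(z_cut):
    """Cusp position quoted in the text: rho = 2 z_cut - z_cut^2."""
    return 2.0 * z_cut - z_cut ** 2


# Configuration with the two hardest partons degenerate and the softest
# parton carrying exactly a fraction 0.1 of the hemisphere energy.
x = 1.8 / 1.9
print(groomed_rho(x, x, 0.05), cusp_location(0.1))
```

With these assumptions, the smallest energy fraction available at fixed $\rho$ equals $z_{\rm cut}$ exactly at $\rho = 2z_{\rm cut}-z_{\rm cut}^2$, which is why grooming stops acting above that point.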
With an analytic result, it is interesting to isolate components of the distribution in different limits. First, in the limit that $\rho \ll z_{\rm cut}$, but $z_{\rm cut}$ is arbitrary, the cross section reduces to
$$\frac{\rho}{\sigma_0}\frac{d\sigma}{d\rho} \to \frac{\alpha_s C_F}{2\pi}\left(-3-4\log z_{\rm cut}+6z_{\rm cut}+4\log(1-z_{\rm cut})\right). \tag{2.9}$$
Thus, in this limit, this logarithmic cross section approaches a constant value, set by the value of $z_{\rm cut}$. Additionally, the first two terms in the parentheses on the right, $-3-4\log z_{\rm cut}$, survive in the $z_{\rm cut}\ll 1$ limit. This sequential strongly-ordered limit $\rho \ll z_{\rm cut}\ll 1$ is that described by the factorization theorem of Refs. [5,6]. The terms relevant for $z_{\rm cut}\sim 1$, $6z_{\rm cut}+4\log(1-z_{\rm cut})$, have not yet been calculated to arbitrary accuracy within a factorization theorem. These terms arise from collinear splittings at leading power in $\rho\ll 1$, because soft, wide-angle emissions that pass the groomer enforce that $z_{\rm cut}\ll 1$. They were first calculated explicitly in Ref. [8], which incorporated finite $z_{\rm cut}$ effects into resummation of the groomed mass for narrow jets at next-to-leading logarithm, following a proposal from the original paper on the mMDT groomer [2].
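The four terms quoted above can be cross-checked against a collinear picture. The sketch below assumes (this is not stated in the text) that the plateau, in units of $\alpha_s C_F/(2\pi)$, equals twice the integral of the quark splitting function $(1+z^2)/(1-z)$ over the momentum fractions that pass the groomer, $z\in[z_{\rm cut},1-z_{\rm cut}]$. Function names are hypothetical.

```python
import math


def small_rho_plateau(z_cut):
    """Closed form built from the terms quoted in the text,
    in units of alpha_s * C_F / (2 pi)."""
    return (-3.0 - 4.0 * math.log(z_cut)
            + 6.0 * z_cut + 4.0 * math.log(1.0 - z_cut))


def plateau_from_splitting(z_cut, n=20000):
    """Midpoint-rule estimate of 2 * int_{z_cut}^{1-z_cut} dz (1+z^2)/(1-z).

    The factor 2 is an assumption: one collinear splitting per quark hemisphere.
    """
    a, b = z_cut, 1.0 - z_cut
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        z = a + (i + 0.5) * h
        total += (1.0 + z * z) / (1.0 - z)
    return 2.0 * h * total


print(small_rho_plateau(0.1), plateau_from_splitting(0.1))
```

Under that assumption the two numbers agree to the accuracy of the quadrature, which supports reading the finite-$z_{\rm cut}$ terms as purely collinear in origin.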
We can also isolate the distribution around the cusp with weak grooming, where $\rho \sim z_{\rm cut}\ll 1$. In this region, the cross section takes the form of Eq. (2.10). This expression is continuous through $\rho = 2z_{\rm cut}$, but not smooth, which can be verified by differentiating above and below $\rho = 2z_{\rm cut}$. Just above $\rho = 2z_{\rm cut}$ the derivative is given by Eq. (2.11), while just below $\rho = 2z_{\rm cut}$ it is given by Eq. (2.12). Thus at leading power in $z_{\rm cut}$ only the position of the cusp depends on $z_{\rm cut}$, but not its shape, as also seen in Fig. 1. We will identify more features of this cusp in the following section.
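A numerical illustration of a cusp whose shape is independent of $z_{\rm cut}$ follows. The functional form used here is a reconstruction from the soft-collinear limit of the three-parton matrix element, not a formula copied from the paper: in units of $\alpha_s C_F/(2\pi)$, it takes $\rho\, d\sigma/d\rho \approx -3-4\log\max[\rho/4,\; z_{\rm cut}-\rho/4]$, which is continuous at $\rho=2z_{\rm cut}$ and reduces to $-3-4\log z_{\rm cut}$ for $\rho\ll z_{\rm cut}$. All names are hypothetical.

```python
import math


def lp_bracket(rho, z_cut):
    """Assumed leading-power form of rho * dsigma/drho near the cusp,
    in units of alpha_s * C_F / (2 pi). A reconstruction, not the paper's equation."""
    return -3.0 - 4.0 * math.log(max(rho / 4.0, z_cut - rho / 4.0))


def log_slope(rho, z_cut, eps=1e-6):
    """Central-difference derivative of lp_bracket with respect to log(rho)."""
    lo = rho * math.exp(-eps)
    hi = rho * math.exp(eps)
    return (lp_bracket(hi, z_cut) - lp_bracket(lo, z_cut)) / (2.0 * eps)


for z_cut in (0.01, 0.1):
    below = log_slope(2.0 * z_cut * 0.999, z_cut)
    above = log_slope(2.0 * z_cut * 1.001, z_cut)
    print(z_cut, below, above)
```

In this form the logarithmic slope flips from close to $+4$ just below the cusp to $-4$ just above it for any $z_{\rm cut}$, because the expression depends on $\rho$ and $z_{\rm cut}$ only through their ratio, up to an additive constant.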

Cusps at Fixed Order
With the analytic result for the leading-order cross section established in the previous section, we can calculate the discontinuity of the derivative of the leading-order cross section at the point where $\rho = 2z_{\rm cut}-z_{\rm cut}^2$, for arbitrary $z_{\rm cut}$. The difference in the derivative above and below that point is given in Eq. (3.1), where the $+$ and $-$ superscripts denote above and below the point $\rho = 2z_{\rm cut}-z_{\rm cut}^2$, respectively. As $z_{\rm cut}\to 0$, this reduces to the difference calculated in the previous section.
A cusp located on the interior of phase space in a differential distribution can potentially produce unreliable predictions at higher fixed orders [45]. These are typically caused by end points in low-order distributions that are not at the edge of the full phase space. The cusp introduces a new "boundary" of phase space at that point at which the derivative of the cross section is discontinuous. At higher orders, points immediately below the cusp can correspond to a degenerate phase space configuration in which virtual corrections are added to the leading-order prediction. Points immediately above the cusp can be generated by soft or collinear real emissions off of the leading-order configuration. Thus, immediately above and below the cusp, there can be a mis-cancelation of real and virtual divergences in the derivative of the cross section. The differential cross section itself can still be continuous, but further higher-order corrections can transform the cusp to become more and more step-like. This feature is observed, for example, at the endpoint of the leading-order distribution of thrust, where τ = 1/3. The next-to-leading order correction extends beyond τ = 1/3, but begins to form a step-like shape around τ = 1/3 [46].
The general analysis of Ref. [45] would seem to suggest that the cusp observed in the groomed heavy hemisphere mass distribution would transform into a discontinuous step with the inclusion of higher fixed-order contributions. Unlike the examples studied in that paper, though, the cusp in the groomed mass distribution lives on the interior of the phase space even at leading order, so its higher-order corrections will have a different structure than, say, the $\tau = 1/3$ end point in thrust. If it were the case that this groomed cusp developed into a step, then the fixed-order expansion would not smoothly converge around $\rho = 2z_{\rm cut}-z_{\rm cut}^2$, and this could be problematic for claiming theoretical precision throughout the distribution. While no evidence for such a step has been observed in studies of mMDT grooming at next-to-leading order and beyond [6,17,31], this could simply be due to the fact that these studies used relatively large grid spacing in $\rho$ for the numerical fixed-order results. The immediate region around the cusp hasn't been studied with sufficient resolution to establish whether step-like behavior appears at higher orders.
To study the higher-order behavior of the cusp in the groomed heavy hemisphere distribution, we use results from fixed-order codes. At next-to-leading order, we generated $10^{13}$ events in EVENT2 [47], with grooming parameter $z_{\rm cut} = 0.04, 0.06, 0.08, 0.1$. From these events, we generated histograms with 400 uniform bins in $\log\rho$ in the range $\log\rho\in[-4,0]$. This range is sufficient to cover the location of the cusp for each value of $z_{\rm cut}$ considered, and the bins are small enough to clearly resolve the cusps. At next-to-next-to-leading order, we use the results generated with the CoLoRFulNNLO method, originally for the study of Ref. [37]. Details about event generation can be found in that reference. The result of this numerical analysis is shown in Fig. 2, in which we plot the leading, next-to-leading and next-to-next-to-leading order distributions, fixing $\alpha_s = 0.118$ and the number of active quarks $n_f = 5$ in QCD. In going from leading to next-to-leading order, we see that the cusp is actually softened, and nothing like a discontinuous step appears to be emerging at next-to-leading order or beyond.
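The binning described above can be sketched as follows, to see which bin each cusp falls into. This is illustrative only: it assumes the natural logarithm (the text does not state the base) and uses a hypothetical helper `log_rho_bin`; it is not the histogramming code used with EVENT2.

```python
import math


def log_rho_bin(rho, n_bins=400, lo=-4.0, hi=0.0):
    """Index of the uniform bin in log(rho) containing rho, or None if out of range.

    Assumes natural logarithms; the base is not specified in the text.
    """
    x = math.log(rho)
    if not (lo <= x < hi):
        return None
    return int((x - lo) / (hi - lo) * n_bins)


for z_cut in (0.04, 0.06, 0.08, 0.1):
    rho_cusp = 2.0 * z_cut - z_cut ** 2
    print(z_cut, rho_cusp, log_rho_bin(rho_cusp))
```

With 400 bins on $[-4,0]$ each bin has width 0.01 in $\log\rho$, so all four cusps sit well inside the histogram range with many bins on either side.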

[Figure 2. Panel title: Fixed-Order Heavy Groomed Mass.]
To understand this a bit more, we can determine the fixed-order expansion of the discontinuity of the derivative at the cusp order-by-order. EVENT2 calculates the cross section in each color channel, so we separate out the $\mathcal{O}(\alpha_s^2)$ contributions in each color channel and numerically calculate the cusp. To do this, we fit lines to the five points immediately above and below the location of the cusp, respectively, and then calculate the difference between the slopes of these lines. With $z_{\rm cut} = 0.1$, this procedure determines the $\alpha_s$ expansion given in Eq. (3.2). In QCD, the adjoint Casimir is $C_A = 3$ and $T_R = 1/2$, and we don't quote uncertainties on the $\mathcal{O}(\alpha_s^2)$ values as they are meant to be representative, not precise. The next-to-leading order correction to the discontinuity of the derivative is opposite in sign to the leading-order discontinuity, resulting in a smoother distribution at higher orders. This suggests that the description of the cusp and its resolution through higher fixed orders converges, with no need for resummation of soft and collinear emissions around the cusp region.
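The slope-difference procedure described above can be sketched in a few lines. This is a minimal illustration, not the analysis code behind the quoted results: it fits least-squares lines to the five bin centers on each side of the cusp and subtracts the slopes. The toy input curve is an assumed leading-power form in units of $\alpha_s C_F/(2\pi)$, with the cusp placed at $\rho=2z_{\rm cut}$; all names are hypothetical.

```python
import math


def fit_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def cusp_slope_jump(log_rho, values, log_rho_cusp, n_pts=5):
    """Slope above the cusp minus slope below it, using n_pts points per side.

    Assumes log_rho is sorted in increasing order.
    """
    below = [(x, y) for x, y in zip(log_rho, values) if x < log_rho_cusp][-n_pts:]
    above = [(x, y) for x, y in zip(log_rho, values) if x > log_rho_cusp][:n_pts]
    slope_below = fit_slope([x for x, _ in below], [y for _, y in below])
    slope_above = fit_slope([x for x, _ in above], [y for _, y in above])
    return slope_above - slope_below


def toy_curve(log_rho, z_cut):
    """Assumed leading-power shape near the cusp (illustrative, not from the paper)."""
    rho = math.exp(log_rho)
    return -3.0 - 4.0 * math.log(max(rho / 4.0, z_cut - rho / 4.0))


z_cut = 0.1
centers = [-4.0 + (i + 0.5) * 0.01 for i in range(400)]  # 400 bins on [-4, 0]
values = [toy_curve(x, z_cut) for x in centers]
jump = cusp_slope_jump(centers, values, math.log(2.0 * z_cut))
print(jump)
```

On this toy curve the fitted jump comes out slightly smaller in magnitude than the exact value of $-8$, because the five points below the cusp sample a curve whose slope is already falling away from its value at the cusp. The same finite-resolution effect would be present in any binned fixed-order result.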

Factorization-Violating Contributions
All-orders resummation of the groomed jet mass has been accomplished at the highest accuracy through factorization of the different components of the cross section, at leading power in the limit in which $\rho\ll z_{\rm cut}\ll 1$ [6]. We won't review the factorization theorem here, and instead just point the interested reader to the original literature. In this strongly-ordered limit in which $\rho\ll z_{\rm cut}\ll 1$, all emissions that remain in the jet after grooming are necessarily collinear, within an angular distance of the jet axis of $\theta^2\lesssim \rho/z_{\rm cut}\ll 1$ by the assumptions of the factorization theorem. Because of this effective collinear restriction, no non-global logarithms in the mass $\rho$ are present in this limit, and with mMDT grooming, all simultaneously soft and collinear divergences in the mass are also eliminated. This significantly simplifies the structure of the emissions that can contribute to the groomed mass, hence enabling high precision resummation. This leading-power factorization theorem can be used to predict all contributions to the cross section of the groomed heavy hemisphere mass that are enhanced by logarithms of $\rho$ and/or $z_{\rm cut}$. That is, the factorization theorem predicts the cross section to be a function of $\log\rho$ and $\log z_{\rm cut}$ [Eq. (4.2)], and all contributions from positive powers of $\rho$ or $z_{\rm cut}$ are formally suppressed in this limit. As we measure the cross section differential in $\rho$, we can always restrict to a region in which $\rho\ll 1$, and therefore power corrections in $\rho$ would be numerically suppressed. However, because $z_{\rm cut}$ is a fixed parameter of the groomer, the assumption of $z_{\rm cut}\ll 1$ is not necessarily satisfied for any application of the groomer. In particular, a typical value of $z_{\rm cut}$ is about 0.1, which is small, but the largest that $z_{\rm cut}$ can possibly be is 0.5, and it's not obvious that 0.1 is parametrically smaller than 0.5. At the very least, we should assess the potential impact of finite $z_{\rm cut}$ corrections to the resummation accomplished in the factorization theorem.
With this goal in mind, we can express the differential cross section for the groomed heavy hemisphere mass in the regime in which $\rho\ll z_{\rm cut}$, but with no restriction on the value of $z_{\rm cut}$, as the series in powers of $z_{\rm cut}$ of Eq. (4.3), where the $\cdots$ represents terms at higher powers of $z_{\rm cut}$. The factorization theorem only describes the first term in this series in $z_{\rm cut}$, and no systematic procedure has been presented as yet to calculate the cross section coefficients of $z_{\rm cut}^i$ in this series to arbitrary order in the coupling $\alpha_s$. Further, as powers of $z_{\rm cut}$ have been made explicit in this expansion, we can estimate the relative size of the power corrections in $z_{\rm cut}$ to the cross section valid in the $\rho\ll z_{\rm cut}\ll 1$ limit. We assume that $\rho\ll 1$ in every term on the right side, so every term should be some function of $\log\rho$. As such, we do not expect any parametric difference between the $\rho\ll z_{\rm cut}\ll 1$ term and the other cross sections, stripped of their $z_{\rm cut}$ dependence [Eq. (4.4)]. Therefore, all scaling of terms in this expansion is carried by the explicit powers of $z_{\rm cut}$, and so we would expect that the factorization theorem in the regime $\rho\ll z_{\rm cut}\ll 1$ describes the cross section when $\rho\ll z_{\rm cut}$ up to corrections of order $z_{\rm cut}$ [Eq. (4.5)]. Concretely, if $z_{\rm cut} = 0.1$, we expect the factorization theorem to correctly describe the cross section in this region up to 10% corrections. With the factorization theorem and the complete fixed-order cross section through next-to-next-to-leading order, we can test this assumption. First, we expand the all-orders cross section of the factorization theorem in powers of $\alpha_s$, as in Eq. (4.6). The superscript $(n)$ denotes the term at order $\alpha_s^{n+1}$ in the limit in which $\rho\ll z_{\rm cut}\ll 1$. The first three terms have been calculated [6,37] and are given in Eq. (4.7). There we substituted explicitly the color factors of QCD ($C_F = 4/3$, $C_A = 3$, $T_R = 1/2$), and set the number of active quarks to $n_f = 5$. As written, this is a function of $z_{\rm cut}$ and so the numerical size of the terms is still obscured.
Setting $z_{\rm cut} = 0.1$, the leading-power cross section is given in Eq. (4.8). To assess the size of the finite $z_{\rm cut}$ corrections order-by-order, we will calculate the fractional difference $\Delta^{(n)}$ between the complete cross section in the $\rho\ll z_{\rm cut}$ limit and the leading-power prediction, that is, their difference divided by the leading-power prediction [Eq. (4.9)]. From our earlier arguments, we expect $\Delta^{(n)}\sim z_{\rm cut}$. Starting with $n = 0$, we can compare the complete leading-order cross section expanded for $\rho\ll z_{\rm cut}$ of Eq. (2.9) to the leading-power result:
$$\Delta^{(0)} = \frac{6z_{\rm cut}+4\log(1-z_{\rm cut})}{-3-4\log z_{\rm cut}}\,. \tag{4.10}$$
The leading term in the numerator of this expression is indeed proportional to $z_{\rm cut}$, but for $z_{\rm cut}\lesssim 0.1$, the denominator is substantially large. Plugging in $z_{\rm cut} = 0.1$ we find
$$\Delta^{(0)}\big|_{z_{\rm cut}=0.1} \simeq 0.029\,, \tag{4.11}$$
which is about a factor of 4 smaller than $z_{\rm cut}$. The denominator of Eq. (4.10) is logarithmic in $z_{\rm cut}$ and so for small excursions varies slowly. So, as a rule of thumb, for experimentally-relevant values of $z_{\rm cut}\lesssim 0.1$, we can use the approximation of Eq. (4.12). That is, the finite $z_{\rm cut}$ contributions in the leading-order cross section of the groomed heavy hemisphere mass are just few-percent corrections to the leading-power prediction of the factorization theorem in the limit $\rho\ll z_{\rm cut}\ll 1$. We extend this fractional difference comparison through $\mathcal{O}(\alpha_s^3)$ in Fig. 3. At $\mathcal{O}(\alpha_s^2)$, we compare the result of the factorization theorem to the output of EVENT2, and observe that the scaling of the fractional difference is very similar to that at leading order, as $\rho\to 0$. That is, we can also make the corresponding approximation of Eq. (4.13) at this order. At $\mathcal{O}(\alpha_s^3)$, we compare the result of the factorization theorem to the output of the CoLoRFulNNLO method, as tabulated in Ref. [37]. The bins in $\log\rho$ are large at this order and do not extend as far into the infrared as at lower orders, but a similar outcome is observed. As $\rho\to 0$, the finite $z_{\rm cut}$ corrections at $\mathcal{O}(\alpha_s^3)$ are significantly smaller than the expected $z_{\rm cut}$ size.
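The leading-order estimate can be reproduced directly from the terms quoted in Sec. 2. The sketch below assumes that the fractional difference at leading order is the ratio of the finite-$z_{\rm cut}$ terms, $6z_{\rm cut}+4\log(1-z_{\rm cut})$, to the leading-power terms, $-3-4\log z_{\rm cut}$; the function name `delta0` is hypothetical.

```python
import math


def delta0(z_cut):
    """Assumed leading-order fractional size of finite-z_cut terms:
    (6 z + 4 log(1 - z)) / (-3 - 4 log z)."""
    finite_zcut = 6.0 * z_cut + 4.0 * math.log(1.0 - z_cut)
    leading_power = -3.0 - 4.0 * math.log(z_cut)
    return finite_zcut / leading_power


for z_cut in (0.04, 0.06, 0.08, 0.1):
    d = delta0(z_cut)
    # Second column: the ratio itself; third: how much smaller it is than z_cut.
    print(z_cut, d, z_cut / d)
```

Expanding the numerator gives $2z_{\rm cut}+\mathcal{O}(z_{\rm cut}^2)$, so the suppression relative to the naive estimate comes almost entirely from the logarithmically large denominator.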
In a precision prediction, one must match the leading-power resummation to fixed order for a prediction that is accurate over all of phase space. The simplest matching procedure is additive matching, in which resummed and fixed-order results are added, and their overlap is subtracted:
$$\frac{d\sigma^{\rm match}}{d\rho} = \frac{d\sigma^{\rm resum}}{d\rho}+\frac{d\sigma^{\rm FO}}{d\rho}-\frac{d\sigma^{\rm resum,\,FO}}{d\rho}\,. \tag{4.14}$$
The final term represents the resummed result expanded to the order in $\alpha_s$ at which the fixed-order prediction is accurate. If a fixed-order prediction for the groomed heavy hemisphere mass is matched to a resummed prediction in the limit that $\rho\ll z_{\rm cut}\ll 1$, these results demonstrate that the fixed-order prediction will have a residual contribution to the matched cross section in the limit that $\rho\to 0$ at the order of a few percent of the resummed prediction, due to finite $z_{\rm cut}$ effects. This is at the order of, or even smaller than, estimates of theoretical uncertainties by scale variation [6,39]. With sufficiently high fixed-order matching, the uncertainty due to not resumming the finite $z_{\rm cut}$ corrections that survive in the $\rho\to 0$ limit could then be accounted for within an appropriate uncertainty budget.

[Figure 3. Panel titles: $\mathcal{O}(\alpha_s)$ Heavy Groomed Mass; $\mathcal{O}(\alpha_s^2)$ Heavy Groomed Mass.]

Conclusions
Jet grooming, especially with mMDT or soft drop, has opened up a new precision regime in jet substructure. The groomer introduces a new scale $z_{\rm cut}$ on the jet, beyond the scale of the measurement, and that new scale provides both opportunities and challenges for precision calculations. Because of the grooming scale in mMDT/soft drop, non-global logarithms of the jet mass $\rho$ are eliminated at small masses. This enables an all-orders factorization theorem in the $\rho\ll z_{\rm cut}\ll 1$ limit, but also produces non-analytic behavior at leading order around $\rho\sim z_{\rm cut}$ and misses finite $z_{\rm cut}$ corrections in the $\rho\to 0$ limit. In this paper, we explicitly demonstrated using fixed-order codes that both of these potential issues are benign. Unlike endpoint cusps in the thrust distribution, for example, higher-order corrections soften the cusp in the groomed mass distribution, suggesting that the region around $\rho\sim z_{\rm cut}$ becomes smooth and stays continuous as higher orders are included. In the $\rho\to 0$ limit, finite $z_{\rm cut}$ corrections through $\mathcal{O}(\alpha_s^3)$ are numerically much smaller than expected, at the percent level even for typical values of $z_{\rm cut}\sim 0.1$. This level is small enough that any residual uncertainty from not resumming finite $z_{\rm cut}$ corrections can be absorbed in theoretical uncertainties.
While these results demonstrate numerical control over the groomed mass distribution, it may be desirable to have a more complete analytical understanding of the features studied here. For instance, while no non-global logarithms are present in the groomed mass distribution as $\rho\to 0$, there is a conservation of complexity. The non-global logarithms are pushed to the $\rho\sim z_{\rm cut}$ region, and may have a relationship to the physics responsible for softening the cusp. It should be possible to construct an effective theory for small excursions away from the cusp region, and correspondingly account for soft and collinear emissions about the cusp to all orders. Such a study would unambiguously demonstrate whether or not higher-order corrections do indeed smooth the cusp. Though the finite $z_{\rm cut}$ corrections are numerically small, they could essentially be completely eliminated by an $\mathcal{O}(z_{\rm cut})$ factorization theorem, for $\rho\to 0$. For example, we expect that enumerating and factorizing all contributions that yield the first $z_{\rm cut}$ corrections should be possible, as to that order there can be at most one hard emission groomed away. Accounting for these additional effects will provide an even more precise picture of groomed jets to compare to experiment.