Recursive Soft Drop

We introduce a new jet substructure technique called Recursive Soft Drop, which generalizes the Soft Drop algorithm to have multiple grooming layers. Like the original Soft Drop method, this new recursive variant traverses a jet clustering tree to remove soft wide-angle contamination. By enforcing the Soft Drop condition N times, Recursive Soft Drop improves the jet mass resolution for boosted hadronic objects like W bosons, top quarks, and Higgs bosons. We further show that this improvement in mass resolution persists when including the effects of pileup, up to large pileup multiplicities. In the limit that N goes to infinity, the resulting groomed jets formally have zero catchment area. As an alternative approach, we present a bottom-up version of Recursive Soft Drop which, in its local form, is similar to Recursive Soft Drop and which, in its global form, can be used to perform event-wide grooming.


Introduction
As the Large Hadron Collider (LHC) collides protons at the highest energies accessible in a laboratory setting, electroweak-scale resonances are routinely produced with transverse momenta far exceeding their rest mass. These highly boosted objects will generate collimated hadronic decays, which are often reconstructed as a single fat jet. Due to the differences in their radiation patterns, fat jets originating from boosted objects can be distinguished from ordinary quark and gluon jets by studying their substructure. Since the start of the experimental program of the LHC, jet substructure has matured into a highly active field of research [1][2][3][4][5][6][7]. First introduced in the pioneering studies of Refs. [8][9][10][11], jet substructure was revived by seminal work showing its potential application in the search for a light Higgs boson decaying to bottom quarks [12].
By now, a variety of tools use jet substructure to tag boosted objects and mitigate contamination from poorly modeled contributions such as underlying event and pileup , which have already found numerous experimental applications . One particular technique that has emerged both as a powerful substructure probe and as an analytically tractable approach is the modified Mass Drop Tagger (mMDT) [28], and its later extension, Soft Drop (SD) [42]. The SD procedure takes an initial jet with radius R 0 , reclusters its constituents with the Cambridge/Aachen (C/A) algorithm [95,96], and removes soft wideangle emissions that do not satisfy the SD condition, defined as min(p t,1 , p t,2 ) p t,1 + p t,2 > z cut ∆R 12 R 0 β , (1.1) where the notation will be reviewed below. This method has been used in a variety of analyses at the LHC, including jet mass and transverse momentum measurements in dijet events [97,98], vector resonance and dark matter searches [99][100][101][102], and boosted H → bb searches [103]. It has also been used as a powerful probe of the QCD splitting function, both in proton-proton collision [104,105] and in heavy ion collisions [106][107][108], where the shared momentum fraction z g provides a handle on medium effects [109][110][111]. Because grooming with mMDT/SD removes complications due to unassociated wide-angle emissions, it has also allowed analytic calculations of the groomed jet mass to reach unprecedented accuracies [50,51,54,55].
In this paper, we introduce a recursive extension of the SD algorithm-aptly named Recursive Soft Drop (RSD)-where SD is reapplied along the C/A clustering history until a specified number N of SD conditions have been satisfied. We focus on jet grooming with RSD, taking an angular exponent β ≥ 0. The case N = 0 involves no jet grooming, the case N = 1 corresponds to the original SD procedure, and the cases N ≥ 2 are well-suited to multi-prong boosted objects. Like the original SD, RSD is stable under hadronization and underlying event effects, but RSD is able to provide improved jet mass resolution for signals such as boosted 2-prong W bosons, 3-prong top quarks, and 4-prong Higgs bosons. Intriguingly, in the N → ∞ limit, groomed jets from RSD formally have zero catchment area [112], a feature that suggests that RSD would be well suited for pileup mitigation.
We focus our attention here on the phenomenological applications of RSD, leaving a detailed study of its analytical properties to future work. The behavior of RSD is summarized in Fig. 1 in the case of distinguishing boosted top quark signals from quark/gluon jet backgrounds. As the number of RSD layers increases, the top mass peak is better resolved, with the best performance (for this choice of SD condition) achieved in the infinite N limit, hereafter labeled RSD ∞ . For quark/gluon jets, RSD has a much smaller impact on the groomed mass, but there are still substantial gains in top tagging performance just -2 - from the increased signal mass resolution. In general, any application at the LHC suitable for SD is also suitable for RSD, with the possibility of loosening the SD condition while increasing the number of layers N to balance performance and robustness. As a concrete example, we show how to use RSD ∞ in combination with either the SoftKiller [113] or the area-median method [112,114] to mitigate pileup in high-luminosity conditions. The rest of this paper is organized as follows. We introduce the RSD algorithm in Sec. 2, and describe its basic features in Sec. 3, such as its robustness to non-perturbative effects and the N → ∞ limit of zero-area jets. In Sec. 4, we show how RSD improves jet mass resolution for boosted resonances with hadronic decays, and we present a brief case study of boosted top tagging. In Sec. 5, we discuss the application of RSD to pileup mitigation. We present an alternative version of the RSD algorithm in Sec. 6, where a bottom-up strategy can be applied locally (at the jet level) or globally (at the event level). We conclude in Sec. 7, leaving additional studies to the appendices.
β defining how much collinear radiation is removed by the grooming procedure. It is also convenient to introduce a reference angular scale R 0 (absorbable into the definition of z cut ), which is typically set to the initial jet clustering radius R. We denote by p t,i the transverse momentum of the i-th subjet, and by ∆R ij the rapidity-azimuth distance between the i-th and j-th subjets.
The SD algorithm proceeds as follows: 1. Undo the last C/A clustering step of the jet j and label the two parent subjets as j 1 and j 2 . 4. Iterate this process until the SD condition in Eq. (2.1) has been met N times, or until the C/A tree has been fully recursed through (using the same definitions of grooming/tagging mode as in ordinary SD). The groomed jet is then made of all the remaining branches.
After N iterations where the SD condition is satisfied, one obtains a groomed jet constructed of (N + 1) subjets. 3 For N = 0, this procedure returns the initial ungroomed jet, while for N = 1 it is equivalent to the original SD algorithm. We use RSD N (β, z cut ), or simply RSD N , to denote RSD grooming with N iterations and parameters β and z cut , such that RSD 0 = 1 and RSD 1 = SD. For fully recursive SD grooming with N = ∞, we use RSD ∞ . 4 In this paper, we only study RSD with β ≥ 0 in grooming mode. In principle, one could also use RSD N (β < 0, z cut ) to define an (N + 1)-prong tagger. For example, one could use RSD 2 with β < 0 as a top tagger, much in the way SD = RSD 1 can be used for boosted W tagging (see Section 7 of Ref. [42]). Ultimately, though, we find that RSD ∞ (with β ≥ 0 in grooming mode) shows good overall tagging performance, making it our recommended default algorithm.
It is worth noting that RSD bears some resemblance to the Iterated Soft Drop (ISD) procedure recently introduced in Ref. [117] for quark/gluon discrimination. A key difference, however, is that RSD follows both branches of the clustering tree, while ISD limits itself to traversing only the harder branch. Both RSD and ISD, along with mMDT and SD, are implemented in RecursiveTools (≥2.0.0) included as part of fastjet-contrib [118]. 3 Alternatively, one could consider a fixed-depth recursion instead, where the SD condition is applied N times on each branch of the clustering tree, resulting in (up to) 2 N prongs. This coincides with the variable depth algorithm in the N → ∞ limit, but will differ at finite N due to the removal of small angle emissions on the subleading branch. 4 Note that RSD∞ is only well defined in grooming mode, and, therefore, only IRC safe for β > 0.

Dynamic R 0 for aggressive grooming
In the default RSD algorithm, the R 0 in Eq. (2.1) is fixed to the initial jet radius (henceforth denoted fixed R 0 mode). One can instead take a more aggressive approach, in which one updates R 0 dynamically during the grooming procedure. In this dynamic R 0 mode, at each step of the process where two particles i, j meet the SD condition, the R 0 value is updated to R 0 = ∆R ij , with R 0 being kept independent on each branch of the C/A clustering tree. By decreasing R 0 in each grooming step for β > 0, one imposes a more stringent requirement on the momentum fraction. This yields a more aggressive grooming strategy for all N > 1.
For most practical purposes, the dynamic R 0 variant behaves quite similarly to RSD with a smaller β value. As we will see in Sec. 5.2, though, it does have some specific advantages for pileup mitigation with area-median subtraction.

Basic properties of Recursive Soft Drop
We perform a variety of parton shower studies in this paper to highlight the features of RSD. In this section, as well as in the more detailed studies in Secs. 4 and 5, we always generate √ s = 13 TeV proton-proton collisions in Pythia 8.223 [119][120][121] with the default 4C tune. To reduce computation time, we turn off π 0 and B-hadrons decays (still letting other hadrons decay). 5 Jet clustering is performed with FastJet 3.2.1 [123], using the anti-k t algorithm [124] with the default E-scheme recombination and a jet radius R = 0.8. We then select jets that have transverse momentum p T > 500 GeV and rapidity |y| < 5.
To demonstrate the key similarities and differences between SD and RSD N , we discuss the groomed radii and splitting scales in Sec. 3.1, the robustness to non-perturbative effects in Sec. 3.2, and the N → ∞ limit of zero-area jets in Sec. 3.3. In App. A, we present fixedorder studies of the RSD jet mass distribution to order α s and α 2 s .

Groomed radii and momentum fractions
In addition to grooming a jet, RSD defines a range of new jet observables. At each SD step i, one can define the groomed radius R g,i and momentum fraction z g,i , equal to the ∆R 12 and z 12 values for the corresponding branch that passes Eq. (2.1). For β > 0, the R g,i observables are IRC safe, while the z g,i are in general Sudakov safe [116]. Thus, after N layers of RSD grooming, we obtain N pairs of {R g,i , z g,i } values containing information about the grooming history of the jet. The values obtained for i = 1 are identical to the ones obtained from ordinary SD. 6 We now consider boosted top quark events from the process pp → tt, forcing the tops to decay in the hadronic channel. In Fig. 3a, we show the R g,i distributions from RSD i with i ∈ {1, 2, 3, 4}, taking β = 1 and z cut = 0.05 as baseline parameters. As expected, the 0.
Top Events  groomed radius of the jet decreases as more layers of RSD are applied. There is a small kink structure at lower values of R g for i = {1, 2} due to the presence of W jets in the top quark decay. In Fig. 3b we show the z g,i distributions, and find that the momentum fraction of the groomed jet is also peaked at values closer to zero as i increases. The sharp cut at 0.5 is because z g is defined as the relative momentum fraction of the softer subjet, which can be at most half. For i = 1, the z g,i distribution has a nontrivial structure from cases where the clustering history differs from the expected t → bW topology. The case of N = 2 yields a three-pronged grooming strategy, which may be useful for the study of boosted top decays, and we explore this possibility in Sec. 4.5. Alternatively, one can use the {R g,i , z g,i } values directly to discriminate signal events from backgrounds. For example, we can use ratios like R g,3 /R g,2 as a probe for boosted top jets, somewhat analogous to the use of the N -subjettiness ratio τ 32 [20,21]. Similarly, one can use the z g,1 , and z g,2 observables to distinguish QCD-like 1 → 2 parton splittings (see e.g. [104]) from hard t → bW and W → qq decays. In practice, though, top taggers built from R g,i and z g,i do not seem to perform quite as well as N -subjettiness (with RSD ∞ grooming), but might remain useful inputs for multivariate analyses, depending on how much R g,i and z g,i are correlated with N -subjettiness.

Non-perturbative effects
Analytical control in jet substructure is mainly limited to perturbative QCD effects. Because internal jet properties probe very exclusive kinematic regions, however, it is not uncommon for non-perturbative effects to yield substantial corrections to perturbative predictions. As such, an important ingredient for robustness of a grooming or tagging algorithm is having a limited sensitivity to non-perturbative contributions, such as hadronization or underlying event. This robustness has already been demonstrated for the mMDT and SD algorithms [28,42], and we present here a similar analysis for RSD.   We use the process pp → Z + j to generate samples of background QCD jets, and use the benchmark RSD parameters β = 1 and z cut = 0.05. In Fig. 4a, we plot the ratio of the jet mass distributions before and after the hadronization step in Pythia. Without any grooming, there are around 10% hadronization corrections throughout the whole distribution, with large corrections below a jet mass of m j ∼ 50 GeV. With RSD N , though, the distribution is significantly more stable down to jet masses of around 20 GeV, independent of the number of RSD layers. Remarkably, in the bulk of the distribution between 50-400 GeV, RSD ∞ exhibits around 5% hadronization corrections. At large mass, the RSD results also show a sizable improvement as one increases the number of RSD layers.
In Fig. 4b, we show the impact of underlying event, plotting the ratio of the jet mass distributions before and after the inclusion of multiple parton interactions (MPI). Here again, we see a similar behavior for all N > 0 curves in the small mass limit, with relatively small corrections due to non-perturbative underlying event effects. We observe furthermore that as N → ∞, the stability of the jet mass distributions improves substantially at high masses, such that the overall corrections are less than 10% throughout the distribution. It is therefore clear that RSD with β > 0 substantially improves the robustness of groomed jets to non-perturbative effects, notably by providing more stable results than SD for large jet masses. We also checked that the jet p t was stable to both hadronization and underlying event effects, with similar performance for all N ≥ 1.

The N → ∞ limit of zero-area jets
An interesting property of RSD groomed jets is that their catchment area [112] goes to zero in the N → ∞ limit. This is due to the fact that soft ghost particles with infinitesimal energy always fail the SD condition in Eq. (2.1) for β ≥ 0, such that the final jets always have vanishing active and passive areas. 7 For this reason, one might expect that RSD jets would be particularly robust to pileup contamination, a feature we will explore further in Sec. 5. That said, many of the commonly used pileup mitigation techniques, such as the area-median method [112,114], rely in some way on the jet area. Applying these methods to zero-area jets, obtained when grooming with RSD ∞ , requires some care (see Sec. 5.2).
In Fig. 5a, we use the same pp → Z + j samples from Sec. 3.2 and plot the active jet area as a function of the number of RSD layers. Here, we fix z cut = 0.05 and scan the exponent β = {0, 1, 2}, and we explore both the fixed R 0 and dynamic R 0 variant from Sec. 2.3. For all choices of β, the active jet area decreases exponentially with N , as anticipated. For smaller β, the algorithm is more aggressive at removing soft radiation, such that the jet area decreases the most rapidly for β = 0. In Fig. 5a, one can see that the dynamic R 0 algorithm yields jets with smaller active area for any given β value. In this way, dynamic R 0 behaves more closely to the β = 0 limit of the fixed R 0 algorithm, leading to decreased pileup sensitivity.
Although RSD ∞ leads to zero-area jets in a formal sense, soft particles from, say, pileup have finite energy in an experimental setting. In Fig. 5b, we plot the effective jet area for RSD ∞ with finite-energy ghost particles, again considering β = {0, 1, 2}. Here, we report the ghost transverse momentum flow density per unit area, which is roughly 0.5 GeV per 1.0 × 1.0 bin in rapidity/azimuth for one minimum bias collision. For transverse momentum flow densities starting around 50 GeV-below the pileup p t densities anticipated at the high-luminosity LHC [127]-we observe that while the jet area after RSD ∞ grooming is reduced, it remains substantially above zero even in the N = ∞ limit. This behavior explains in part why RSD is not sufficient in itself to remove pileup, and instead performs best when combined with another pileup mitigation technique, as discussed in Sec. 5.

Improved mass resolution
The simplest way to identify a boosted hadronically-decaying resonance is through the invariant mass of its decay products; in a contamination-free setting, this would correspond to the plain jet mass. In practice, though, the jet mass is particularly sensitive to unassociated soft wide-angle emissions which smears out the distribution. To restore the mass resolution, it is therefore necessary to mitigate soft contamination through appropriate grooming.
While mMDT and SD are not always the most effective methods for enhancing the signal efficiency for boosted objects, they have the advantage of being particularly robust [48]. As seen in Sec. 3.2, hadronization and underlying event effects are suppressed, and they have an analytically-tractable behavior due to the absence of double logarithms and leading non-global contributions [28,128,129]. To maximize the tagging performance, though, Ref. [12] found that MDT grooming should be supplemented by an extra filtering step to improve the jet mass resolution. The hope is that an algorithm like RSD ∞ could, by extending the grooming procedure down to smaller angular scales, achieve excellent mass resolution without requiring any further post-processing.
In this section, we study the mass resolution with RSD in three cases of interest for the LHC: 2-prong boosted W bosons, 3-prong boosted top quarks, and 4-prong boosted Higgs jets (H → V V → 4f ). In each case, we use the same Pythia 8.223 generator settings from Sec. 3, and consider all jets in the event that pass the selection cuts p t > 500 GeV and |y| < 5. As we will see, RSD provides a way to improve the achievable resolution, with gains in the 10-20% range, while retaining the tractability and robustness of SD.
Our overall recommendation from these studies will be to use RSD ∞ with the default settings of β = 1 and z cut = 0.05, which gives good performance across the three test cases. (although other values of β and z cut are worth investigating as well). This conclusion will persist even after the inclusion of pileup in Sec. 5.1, making RSD ∞ useful in extreme environments such as the one faced by the high-luminosity LHC.

Definition of the mass peak and resolution
To define the mass resolution, we identify the smallest mass interval that contains a fixed fraction f of the total event samples (see e.g. [3]). For concreteness, we set f = 0.4 for these studies. The central value is then defined as the median of the mass interval, and the width is defined as the width of the mass interval.
An alternative approach would be to fit the mass distribution with two curves, a narrow Cauchy distribution to capture the signal and a wider background distribution to account for cases of poor reconstruction. The advantage of the interval method is that it allows us to avoid biases associated with the choice of a fitting functional form. That said, we did test the fitting approach and got similar qualitative features to the ones shown here. We also tested that the choice of fraction f did not affect the qualitative conclusions.
One caveat of the interval method is that it combines two logically distinct effects: mass resolution and signal efficiency. For example, if a technique yields perfect mass resolution, but only on a small subsample of events, then the mass interval will be larger than the 60 70 80 resolution in order to include the fixed fraction f . In practice, though, many boosted-object taggers select jets based on a fixed mass window, so our definition of mass resolution is appropriate for that setting.

Two-prong W decays
In Fig. 6a, we show the jet mass distribution from a W jet sample obtained from pp → W W , with the W decaying hadronically. At least some level of grooming is required to obtain a peak within 10% of the W mass. With β = 1 and z cut = 0.05, the jet mass distribution is already close to the expected W mass value after applying SD alone. Adding additional SD layers with N ≥ 2, the peak shifts somewhat below the expected W mass value, but, more importantly, the width of the mass distribution decreases. Another interesting observation is that the mass distribution is more symmetric with additional grooming layers, such that with N = ∞, the jet mass can be accurately fit by a Cauchy distribution.
To get a sense of the dependence on the choice of RSD parameters, in Fig. 6b we plot the central value and width of the mass distribution as a function of N . As discussed in Sec. 4.1, we use a mass interval containing a fraction f = 0.4 of events. The benchmark parameters of β = 1 and z cut = 0.05 undershoot the W mass central value by around 1 GeV, while switching to β = 2 gives a better reconstruction of the central value. On the other hand, the benchmark parameters yield the best W mass resolution, which implies somewhat better tagging performance. More generally, all of the RSD settings yield a sizable improvement over the ungroomed case and a smaller but clearly visible improvement, of order 10-20%, over the SD case in terms of resolution.

Mass Resolution of Top Jets
Top Jets, Mass Interval, f=0.4 Figure 7. Same as Fig. 6 but for the top jet sample.

Three-prong top decays
Unlike SD, which terminates after probing only the leading two subjets, RSD with N > 1 is well suited to handle the broad radiation patterns and three-prong topologies of top quarks. We consider a sample of pp → tt events, where the top mass in Pythia is set to 173.2 GeV. In Fig. 7a, we show the mass distribution of boosted tops for different layers of RSD. As expected, we find that without grooming, the peak of the distribution occurs at values well above the top mass. With successive layers of RSD grooming, the mass peak shifts to lower values, and converges very close to the top mass for N = ∞. This is due to the fact that we discard more of the extra unassociated radiation, and therefore the mass tends to decrease. In Fig. 7b, we consider the central mass value and width containing a fraction f = 0.4 of the mass distribution. We find that β = 1 and z cut = 0.05 gives nearly optimal performance (by the mass interval measure), with β = 2 and z cut = 0.1 giving comparable performance in terms of accuracy and resolution. The gain in resolution compared to SD is found to be slightly larger than 10%.
Compared to the W case in Sec. 4.2, where the resolution tends to saturate by N = 2, the top case benefits from taking N ≥ 3. In fact, there is a marginal benefit from taking N → ∞, which is why we recommend RSD ∞ for mass resolution studies involving the top quark. It would be interesting to see whether RSD ∞ would also be appropriate for defining a short-distance top quark mass [130].

Four-prong Higgs decays in associated production
Four-prong jets can provide important information in certain Higgs decays, where the H → V V → 4f channel plays an important role in determining properties of the Higgs [131] as well as its couplings to bosons [132]. They can also provide a useful probe for new physics, such as in hadronic decays of heavy resonances decaying to HW or HZ [81], or in hadronic decays of a boosted Higgs pair [133].   In Fig. 8a, we show the reconstructed Higgs mass in boosted H → W + W − → qqqq decays. This comes from a sample of pp → HZ events, with the Z boson decaying to neutrinos. Although grooming with ordinary SD performs relatively well, the Higgs mass reconstruction is better with N ≥ 3, as one would expect for a four-particle decay. A comparison of the central mass values and width for different grooming parameters is shown in Fig. 8b, showing gains in resolution around 10-15%. Once again, the benchmark choice of β = 1 and z cut = 0.05 gives an excellent reconstruction, especially in the N → ∞ limit, although β = 2 and z cut = 0.05 gives a slightly better resolution.

Boosted top tagging
We conclude this section with a concrete example of the impact of improved mass resolution from RSD. Using the same samples as Sec. 4.3, we perform a study of boosted top tagging performance. We consider two standard observables used for discriminating top jets from QCD background: the N -subjettiness ratio τ 32 = τ 3 /τ 2 [20,21], and the generalized energy correlation function ratio N 3 ) 2 [22,25]. By adjusting the degree of grooming-on both the jet mass and on the jet discriminant-we can assess the potential performance gains from RSD.
In Fig. 9, we plot the top signal efficiency versus dijet background mistag rate (ROC curves). The rightmost endpoints of the ROC curve indicate the impact of an overall cut of m groomed ∈ [160, 185]. As N increases, the dijet mistag rate improves somewhat, and the top signal efficiency increases due to the improved mass resolution. The rest of the ROC curve is associated with sweeping a cut on τ 32 and N 3 . Note that as the substructure cut becomes more stringent, there is less of a gain from increasing the number of grooming layers. The reason is that grooming removes part of the radiation phase-space where there is still some discriminating power, or equivalently, a cut on τ 32 or N 3 already removes some of the phase-space regions targeted by grooming. Because this phase-space region is also the most sensitive to non-perturbative effects, there is a tradeoff between tagging performance and non-perturbative robustness [48]. Despite this tradeoff, RSD ∞ maintains the best tagging performance for top efficiencies greater than 10%.
As discussed in Ref. [25], τ 32 with grooming and N 3 with grooming have the same soft-collinear power counting, so their performance is expected to be similar (see further discussion in Ref. [24]). This conclusion persists even with multiple grooming layers. The relative difference between τ 32 and N 3 then depends on the precise details of the event selection. N 3 is not defined with respect to any axes, so it is expected to perform better than τ 32 in kinematic regimes where the choice of axes can be ambiguous [25]. Here, though, we are taking a rather tight mass window of m jet ∈ [160, 185] GeV, so the axes effects are subleading and τ 32 turns out to give better performance. Though not shown, we also tested the performance of the M 3 observable [25]. While M 3 always performs worse than τ 32 or N 3 , the relative improvement in going from SD to RSD ∞ is greater due to the removal of radiation in all three prongs.
We found similar tagging results for other boosted objects, such as hadronic decays of boosted W and Higgs bosons. In general, RSD ∞ gives similar or improved performance compared to tagging with SD, though most of that comes just from the improved mass resolution. For the case of W tagging, it would be interesting to study a version of the dichroic N -subjettiness ratio [49], where RSD ∞ is used to compute the jet mass and τ 1 (or 1 e 2 ), and a lighter grooming, like (R)SD with larger β and smaller z cut , is used to calculate τ 2 (or 2 e 3 ). Results for boosted top tagging with large pileup multiplicity are shown in App. B.2.

Robust pileup mitigation
As the LHC progresses towards ever higher luminosities, mitigating the effect of secondary proton-proton collisions becomes an increasingly important challenge. Large pileup levels can substantially impact typical jet observables, increasing for example the jet's mo-mentum, while shifting and distorting jet shapes. A number of techniques have been developed to remove soft radiation associated with secondary vertices. These include areamedian [112,114] or shape [134] subtraction; grooming with trimming [15], pruning [17], or mMDT/SD [28,42]; charged hadron subtraction [135]; particle-level removal methods such as SoftKiller [113] and PUPPI [136]; and machine learning methods [137]. In practice, in the context of jet substructure, it is generally most useful to combine a removal or subtraction method with some type of jet grooming to achieve optimal results (see e.g. [3]).
Here, we investigate the properties of jets after grooming with RSD under varying levels of pileup, with additional pileup studies appearing in App. B. We will see that, when used in combination with SoftKiller [113], the RSD algorithm yields robust pileup mitigation results even under high pileup conditions. 8 We will also test RSD with the area-median subtraction approach [114].

Mass resolution with SoftKiller
SoftKiller is an event-wide particle-level removal method, which uses a dynamic cut on transverse momentum p cut t to remove soft particles [113]. The threshold is determined dynamically on an event-by-event basis as follows: 1. The event is split into patches of size a SK × a SK in rapidity-azimuth space.
2. For each patch, one determines p max t,i , the largest particle transverse momentum in patch i.

The transverse momentum threshold p cut
t is determined by where the median is taken across all patches.
4. All particles with transverse momenta below p cut t are removed from the event.
An equivalent description of SoftKiller is finding the minimal p cut t such that exactly half of the patches have zero momenta. This p cut t ensures that the median across patches of transverse-momentum flow per unit area ρ is zero. 9 For each analysis, one has to set the appropriate patch size by choosing the SoftKiller parameter a SK . The goal is to set the parameter a SK to minimize both the shift in jet p t and mass. After RSD grooming of R = 0.8 jets, this usually achieved with a SK = 0.5, although this was found to be somewhat process dependent. 10 For the case of pp → tt, the dependence of the jet mass on the choice of grid parameter is shown in Fig. 10. We find that the position of the top peak depends significantly on the value of a SK with small (large) 8 Applying RSD without any pileup mitigation shows poor performance, so we have not included those results in our discussion. 9 The quantity ρ is computed via ρ = median Ai is the patch area and pt,i is the transverse momentum of the patch. 10 The optimal choice of aSK would also change if we were to include π 0 and B-hadron decays in our simulation or if we were performing charge-hadron subtraction or including a calorimeter simulation [113].

Mass Resolution of Top Jets
Pythia 8.223, s =13 TeV, p T >500 GeV, pp→tt n PU 0 20 60 140 Top Jets, Mass Interval f=0.4, SK, a SK =0.5 values of a SK showing a clear undersubtraction (oversubtraction). In our simulations, a SK 0.5 shows only a small average bias, so we take this value as our benchmark. We now study the effects of pileup on the mass resolution in groomed jets, following the same analysis strategy as Sec. 4. Since the behavior is quite similar to that observed previously, we focus here only on top events, with the W and Higgs cases discussed in App. B.1.
We use the same pp → tt process as in Sec. 4.3, with the Pythia top mass at 173.2 GeV. In Fig. 11a, we show the top jet mass distribution with the addition of 140 pileup vertices. With SoftKiller but without any jet grooming, there is a 12 GeV shift in the reconstructed top mass peak. (The shift would be larger than 200 GeV if SoftKiller were not applied.) The shift decreases to about 5 GeV after applying SD, with additional improvements in going to RSD 2 and RSD ∞ . Though the shape of the top mass distribution is not nearly as  (a)

Mass Resolution of Top Jets
Top Jets, Mass Interval f=0.4, Area Median a=0.55 symmetric as without pileup, RSD ∞ does restore some of the symmetry of the top mass distribution compared to SoftKiller alone. 11 In Fig. 11b, we show the central mass value and width after the application of SoftKiller and RSD N for several pileup levels. As n PU increases from 0 to 140, the central top mass value and the width increases monotonically, even with the application of SoftKiller. By applying more layers of RSD, the central top mass decreases toward the physical value, with somewhat improved stability across the different pileup levels. The best performance is obtained for RSD ∞ , in agreement with the analysis of Sec. 4.3, and the mass difference is less than 5 GeV even at the highest pileup level. This shows that, as well as improving the resolution of observables such as the jet mass, grooming with RSD somewhat improves the stability of the distributions as a function of the number of pileup vertices n PU . That said, there are still substantial distortions to the width of the top mass distribution even after RSD ∞ , reflecting the underlying challenge of pileup at the LHC.

Mass resolution with the area-median method
Let us now consider pileup mitigation using RSD in conjunction with the area-median method [112,114]. This removal technique is widely used in experimental analyses, and we therefore provide a short study of its use with RSD groomed jets.
Generally speaking, combining the area-median procedure with grooming requires more than just subtracting the jet after grooming. Indeed, since the grooming procedure imposes kinematic constraints on subjets, one wants to apply the subtraction procedure directly on the subjets, so that these kinematic constraints use quantities which have been 11 As shown in Ref. [113], one can get better performance for SoftKiller alone by increasing the grid-size parameter aSK. Even if this works for correcting for the mass shift, however, it results in much broader peaks than what we obtain here by combining SoftKiller with (R)SD. corrected for pileup (see e.g. [3]). In the case of RSD (and mMDT/SD), this means that, at each step of the declustering procedure, we should apply an area-median subtraction to both subjets before imposing the SD condition in Eq. (2.1). 12 Note that even in the case of RSD ∞ where the final groomed jet has zero area, intermediate subjets have a non-zero area, so subtracting the intermediate subjets is crucial.
The resulting mass peak is shown in Fig. 12a, for both a dynamic and fixed R 0 implementation of RSD for 60 pileup events, using the area-median parameter a = 0.55. We find that only the most aggressive algorithm RSD ∞ , preferably using dynamic R 0 from Sec. 2.3, succeeds at reconstructing the top peak. This is confirmed by Fig. 12b, where we show the central mass values and widths for various pileup levels. Despite the fact that RSD ∞ with dynamic R 0 does recover the top mass peak, one should still note that the mass resolution (and median peak position) significantly worsens with increasing pileup multiplicity, encouraging the investigation of more recent particle-level pileup mitigation techniques for future runs of the LHC.

Bottom-Up Soft Drop for event-wide grooming
We have seen that the default RSD algorithm in Sec. 2.2 yields sensible grooming behaviors across a wide range of applications, with excellent overall performance for N = ∞. It is possible, however, to obtain similar results with a different approach. In this section, we introduce an alternative to RSD ∞ called Bottom-Up Soft Drop (BUSD), which is also available in RecursiveTools (≥2.0.0) through fastjet-contrib [118].
The default RSD is a top-down algorithm, where the SD condition in Eq. (2.1) is imposed by declustering a jet starting from its clustering tree. By contrast, BUSD imposes Eq. (2.1) as part of a (re)clustering procedure, effectively starting from the leaves of the clustering tree. BUSD can either be applied to a single jet (local BUSD) or to the event as a whole (global BUSD).
With either BUSD approach, the SD criteria is applied at each pairwise combination during the reclustering stage, somewhat like pruning [17]. In the fastjet-contrib implementation, this is achieved through a modified recombination scheme, such that at each step of the reclustering with a large-R C/A algorithm, the pseudojet obtained from combining particles i and j with smallest distance d ij = ∆R ij /R 0 is given by where the maximum is defined by p t . In the definition of d ij , we choose R 0 to match the SD criteria in Eq. (2.1). For the studies below, we always match the parameter R 0 to the jet radius R.
With local BUSD, the reclustering is applied to an individual jet found by another jet algorithm. While one cannot obtain finite N results with BUSD, the bottom-up algorithm 12 A similar philosophy is also recommended, but sadly not always implemented, when using other grooming techniques like pruning and trimming. provides results that are very similar to the N = ∞ top-down approach of RSD ∞ . This is expected, since local BUSD uses the same initial jet constituents as RSD ∞ .
With global BUSD, the full event is clustered into a single large C/A tree. This provides an event-wide grooming strategy that does not require any specific jet definition. After grooming with global BUSD, the groomed event contains only a subset of the initial particles, so any jet definition can be used to cluster the remaining particles. The resulting jets are guaranteed to have zero active area, without any additional treatment required for each individual jet.
The behavior of BUSD is shown in Fig. 13a for the same pp → tt sample from Sec. 4.3. Without pileup, RSD ∞ , local BUSD, and global BUSD give nearly identical results on the top jet mass distribution. This suggests that globally grooming an event before the jet clustering stage could be a practical alternative to standard jet-based grooming.
We test the robustness of BUSD to pileup in Fig. 13b. As in Sec. 5.1, we overlay 140 pileup vertices and apply the SoftKiller algorithm with grid parameter a SK = 0.5. There is still nearly identical behavior for RSD ∞ and local BUSD, with somewhat degraded performance seen using global BUSD. So while global BUSD grooming still performs better than the original SD in reconstructing the top mass, it is outperformed by RSD ∞ (and local BUSD) in extreme environments.

Conclusions
Recursive Soft Drop (RSD) is a generalization of the original mMDT/SD algorithm, which has already seen successful applications at the LHC. By recursively applying the SD declustering procedure N times, RSD N can remove jet contamination more efficiently and with looser SD parameters than original SD. With benchmark parameters β = 1 and z cut = 0.05, we found that RSD grooming performs particularly well in the fully-recursive N = ∞ limit. This limit also shows an improved robustness against pileup effects, a feature which is likely connected to the fact that RSD ∞ yields groomed jets with formally zero area.
Jet grooming with RSD provides several refinements over previous techniques, notably by increasing the robustness to non-perturbative effects and by improving the mass resolution for boosted heavy resonances such as W bosons, top quarks, and Higgs bosons. In the context of jet tagging, the additional declustering layers have a modest impact on QCD backgrounds, so much of the gains in tagging performance come from the improved jet mass resolution. While RSD N −1 seems to be the natural choice for grooming N -prong objects, we have noticed that adding more grooming layers, e.g. using RSD N , comes with further resolution gains. In the end, we recommend the use of RSD ∞ , since it offers the same or better performance in our case studies with no discernible downsides. The one possible exception is for boosted W bosons, where RSD ∞ gave a better W boson resolution compared to SD but at the expense of shifting the W peak location by O(1) GeV. That said, this shift could be minimized by using lower values of z cut or higher values of β.
For pileup mitigation, RSD works best when paired with a particle-level removal algorithm such as SoftKiller. In the presence of pileup, we found substantial improvements in the groomed mass resolution after grooming with RSD when compared to the original SD procedure. We recommend the use of RSD ∞ in high luminosity environments, since this always performed better than RSD with finite N . It is also possible to use RSD with areabased pileup subtraction methods, as long as the corrections are applied on the finite-area subjets that enter into the SD condition.
An interesting alternative to RSD ∞ is Bottom-Up Soft Drop (BUSD). In its local variant, i.e. applied to a single jet, it shows similar performance to RSD ∞ , with comparable resilience to pileup effects. BUSD also admits a global implementation, where it is applied at the event-wide level to groom an event without committing to a particular jet algorithm. This latter variant shows, however, a slightly larger sensitivity to pileup than RSD ∞ .
Since RSD is closely related to the mMDT/SD algorithms, it shares the advantage of retaining analytic tractability. We look forward to future studies aimed at high-precision and systematically-improvable analytical results of tagging rates. These could be achieved with existing frameworks to leading-logarithmic accuracy, and could potentially be extended to higher logarithmic accuracy through a suitable extension of the CAESAR [138] and ARES [139,140] methods or through a modified factorization theorem [50] in soft-collinear effective theory [141][142][143][144]. An interesting potential application of RSD is in defining the short-distance top quark mass using light grooming [130].
Finally, RSD could also find useful applications in the context of heavy ion collisions, where jet grooming can provide a powerful probe into the effects of the medium on the momentum sharing z g [106,[109][110][111]. By adjusting the number of grooming layers N , one can achieve a more aggressive grooming while retaining more of the underlying hard scattering process. Furthermore, the groomed energy fractions z g,i obtained at every SD layer i may provide an additional handle to study the propagation of partons through the quark-gluon plasma at multiple resolution scales.

A Behavior at fixed order
In this appendix, we study the behavior of RSD at fixed order in α s . To this end, we use Event2 [145,146] to produce a sample of boosted heavy bosons decaying (at tree level) to a quark-antiquark pair. We start from an e + e − collision at a given center-of-mass energy M ≡ √ s, corresponding to the mass of the heavy boson. Then, we boost the partonic system transversely such that the heavy boson is traveling along the x axis of the collision with transverse momentum p t . In what follows, we fix M = 100 GeV and p t = 500 GeV. We cluster the event with the anti-k t algorithm with R = 0.8, and we apply RSD N with β = 1, z cut = 0.2, varying the number of layers N . Note the use of a larger z cut value compared to our recommended default to make the impact of RSD more visible.
The motivation for studying such boosted systems is that it provides a source of jets with 2, 3, and 4 partons at respective orders α 0 s , α 1 s , and α 2 s , allowing us to study the effects of several layers of RSD.
Let us start by discussing the jet mass in Fig. 14a. Instead of plotting the mass distribution after applying RSD N , we plot the difference i.e. the difference in the mass spectrum between two consecutive layers of RSD. From top to bottom, Fig. 14a shows ∆σ N for tree-level events (e + e − → qq) at O(α 0 s ), the order α s correction, and the order α 2 s correction. At O(α 0 s ), there is either one or two partons in the jet. For one parton, the jet mass is always zero, irrespectively of the number of layers of RSD. Jets with two partons, however, have a (plain) jet mass M . Therefore, if we apply SD = RSD 1 to such a jet, we can either have a jet with two partons and a mass M , or a jet with one parton and a zero mass, depending on whether or not the SD condition is satisified. Since there is no additional substructure to probe in the jet, applying additional SD layers with RSD N >1 will not have any effect. This expectation is indeed observed in the top panel of Fig. 14a: applying the first layer of SD yields a decrease of the cross section for m = M = 100 GeV and an increase at zero mass, i.e. ∆σ 0 = 0, and additional SD layers have no effect, i.e. ∆σ (≥)1 = 0. At order α s , we now have jets with three partons, which is enough to see a non-trivial effect from RSD 2 . Indeed, if we have a jet with three partons for which the first declustering passes the SD condition, the second SD layer will sometimes be applied on a subjet itself made of two partons; if this system fails the SD condition, the RSD 2 mass will be smaller than the RSD 1 mass, with all subsequent layers being ineffective. This is seen in the middle panel of Fig. 14a, with ∆σ 1 showing a shift towards smaller masses and ∆σ (≥)2 = 0. Unsurprisingly, the same pattern is observed at order α 2 s , just one layer further down, with ∆σ 2 showing a decrease in mass while ∆σ ≥3 = 0. Note that although the two-loop virtual corrections, contributing at O(α 2 s ), are not available in Event2, they correspond to events where the jets can have at most two partons and so do not contribute to ∆σ N ≥1 . Ultimately, we see that each successive SD layer further grooms the jet. For the N th layer of SD to be effective, one needs at least N + 1 partons in the jet, hence ∆σ N being non-zero starting at order α N s . In addition to the jet mass distribution, we can also study the mass spectrum of the heavier subjet in a jet, shown in Fig. 14b. This is defined by applying RSD to the jet and taking, in the resulting groomed jet, the heavier of the two subjets corresponding to the first declustering that has passed the SD condition. Taking the example of Fig. 2, this is 60 70 80  Figure 15. Same as Fig. 11 but for the W boson sample.

Mass Resolution of W Jets
the heavier of the RSD subjets tagged at vertex "1", i.e. the heavier of the subjet made of the two upper kept particles and the subjet made of the two lower kept particles.
At O(α 0 s ), there can be at most one particle in a subjet so the heavier subjet mass is always 0. At higher orders, though, one can get more partons and hence a non-zero subjet mass. We study the difference ∆σ heavy,N (m) in Fig. 14b, defined analogously to Eq. (A.1) as the difference in the heavier subjet mass distribution between N and N + 1 layers of RSD. As with the jet mass, we need at least N + 2 partons in the jet to get a non-zero ∆σ heavy,N , a situation which starts at order α N s . This is confirmed by our Event2 simulations where, for example, ∆σ heavy,1 is already different from zero at O(α s ) while ∆σ heavy,2 starts being non-zero at O(α 2 s ). A calculation of the logarithmically-enhanced terms appearing in these distributions could serve as a base for an analytic study of RSD.

B Additional pileup studies
In this appendix, we present additional results for RSD in high pileup conditions.

B.1 W and Higgs mass resolution with SoftKiller
Analogously to the case of 3-pronged top decays in Sec. 5.1, here we show results for 2prong boosted W bosons and 4-prong boosted H bosons. In all cases, we use the default RSD parameters β = 1 and z cut = 0.05, and apply SoftKiller with a SK = 0.5.
The W mass distribution is shown in Fig. 15a, with the addition of 140 pileup vertices. This is using the same event samples as in Sec. 4.2 and can be compared to Fig. 7a. Already, the performance is very good for SD alone, though the mass resolution is improved somewhat going to RSD 2 or RSD ∞ . Note that the distribution becomes a bit more symmetric, though not nearly as much as in the case without pileup. In Fig. 15b, we show the central mass value and width as a function of number of SD layers for different pileup   levels. One can see that as N increases, the peak location improves while the distribution becomes narrower, though the performance basically saturates at N = 2.
In Fig. 16a we show the mass distribution for H → 4q decays, again with 140 pileup vertices. This is using the same event samples as in Sec. 4.4 and can be compared to Fig. 8a. Again, one can observe a small improvement in both the location of the central value and the width of the distribution as the number of SD layers is increased. This is shown more explicitly in Fig. 16b, where we can see the narrowing of the distribution and convergence of the peak for different pileup levels, with performance saturating around N = 3 or N = 4.

B.2 Boosted top tagging with pileup
Here, we repeat the top tagging study in Sec. 4.5 in the presence of pileup, again using SoftKiller with a grid parameter a SK = 0.5. In Fig. 17, we show the same ROC curve as in Fig. 9, but with the addition of 140 pileup vertices. As in the previous study without pileup, additional SD grooming layers improve the signal efficiency, with the observables after RSD leading to much better discrimination between top and QCD jets. Here, though, the improvement is much more substantial, since the gains in mass resolution in high-pileup condition is quite dramatic; see Sec. 5.1. This allows for reasonable top tagging performance even with a narrow m ∈ [160, 185] GeV top window, despite the large pileup multiplicity.