1 Introduction

At high-luminosity hadron colliders such as CERN’s Large Hadron Collider (LHC), an issue that has an impact on many analyses is pileup, the superposition of multiple proton–proton collisions at each bunch crossing. Pileup affects a range of observables, such as jet momenta and shapes, missing transverse energy and lepton and photon isolation. In the specific case of jets, it can add tens of GeV to a jet’s transverse momentum and significantly worsens the resolution for reconstructing the jet momentum. In the coming years the LHC will move towards higher luminosity running, ultimately increasing pileup by up to a factor of ten for the high-luminosity LHC [1]. The experiments’ ability to mitigate pileup’s adverse effects will therefore become increasingly crucial to fully exploit the LHC data, especially at low and moderate momentum scales, for example in studies of the Higgs sector.

Some approaches to reducing the impact of pileup are deeply rooted in experimental reconstruction procedures. For example, charged hadron subtraction (CHS) in the context of particle flow [2] exploits detectors’ ability to identify whether a given charged track is from a pileup vertex or not. Other aspects of pileup mitigation are largely independent of the experimental details: for example both ATLAS and CMS [3, 4] rely on the area–median approach [5, 6], which makes a global estimate for the transverse-momentum-flow density, \(\rho \), and then applies a correction to each jet in proportion to its area.

In this article, we introduce and study a new generic pileup removal method. Instead of correcting individual jets, it corrects for pileup at the level of particles. Such a method should make a guess, for each particle in an event, as to whether it comes from pileup or from the hard collision of interest. Particles deemed to be from pileup are simply discarded, while the much smaller set of residual “hard-collision” particles are passed to the jet clustering. Event-wide particle-level subtraction, if effective, would greatly simplify pileup mitigation in advanced jet studies such as those that rely on jet substructure [7]. Even more importantly, as we shall see, it has the potential to bring significant improvements in jet resolution and computational speed. This latter characteristic makes our approach particularly appealing also for trigger-level applications.

The basis of our pileup suppression method, which we dub “SoftKiller” (SK), is that the simplest characteristic of a particle that affects whether it is likely to be from pileup or not is its transverse momentum. In other words, we will discard particles that fall below a certain transverse-momentum threshold. The key feature of the method will be its event-by-event determination of that threshold, chosen as the lowest \(p_t\) value that causes \(\rho \), in the median–area method, to be evaluated as zero. In a sense, this can be seen as the extreme limit of ATLAS’s approach of increasing the topoclustering noise threshold as pileup increases [8].

This approach might at first sight seem excessively naïve in its simplicity. We have also examined a range of other methods. For example, one approach involved an all-orders matrix-element analysis of events, similar in spirit to shower deconstruction [9, 10]; others involved event-wide extensions of a recent intrajet particle-level subtraction method [11] and subjet-level [12, 13] approaches (see also [14]); we have also been inspired by calorimeter [1517] and particle-level [18] methods developed for heavy-ion collisions. Such methods and their extensions have significant potential. However, we repeatedly encountered additional complexity, for example in the form of multiple free parameters that needed fixing, without a corresponding gain in performance. Perhaps with further work those drawbacks can be alleviated, or performance can be improved. For now, we believe that it is useful to document one method that we have found to be both simple and effective.

2 The SoftKiller method

The SoftKiller method involves eliminating particles below some \(p_t\) cutoff, \({{p}_{t}^{\text {cut}}}\), chosen to be the minimal value that ensures that \(\rho \) is zero. Here, \(\rho \) is the event-wide estimate of transverse-momentum-flow density in the area–median approach [5, 6]: the event is broken into patches and \(\rho \) is taken as the median, across all patches, of the transverse-momentum-flow density per unit area in rapidity-azimuth:

$$\begin{aligned} \rho = \underset{i \in \text {patches}}{\text {median}} \left\{ \frac{p_{ti}}{A_i}\right\} , \end{aligned}$$

where \(p_{ti}\) and \(A_{i}\) are, respectively, the transverse momentum and area of patch \(i\). In the original formulation of the area–median method, the patches were those obtained by running inclusive \(k_t\) clustering [19, 20], but subsequently it was realised that it was much faster and equally effective to use (almost) square patches of size \(a\times a\) in the rapidity-azimuth plane. That will be our choice here. The use of the median ensures that hard jets do not overly bias the \(\rho \) estimate (as quantified in Ref. [21]).Footnote 1

Choosing the minimal transverse-momentum threshold, \(p_t^{\text {cut}}\), that results in \(\rho = 0\) is equivalent to gradually raising the \(p_t\) threshold until exactly half of the patches contain no particles, which ensures that the median is zero. This is illustrated in Fig. 1. Computationally, \(p_t^{\text {cut}}\) is straightforward to evaluate: one determines, for each patch \(i\), the \(p_t\) of the hardest particle in that patch, \(p_{ti}^{\max }\) and then \(p_t^\mathrm{cut}\) is given by the median of \(p_{ti}^{\max }\) values:

$$\begin{aligned} p_t^{\text {cut}} = \underset{i \in \text {patches}}{\text {median}} \left\{ p_{ti}^{\max } \right\} . \end{aligned}$$

With this choice, half the patches will contain only particles that have \(p_t < {{p}_{t}^{\text {cut}}}\). These patches will be empty after application of the \(p_t\) threshold, leading to a zero result for \(\rho \) as defined in Eq. (1).Footnote 2 The computational time to evaluate \({{p}_{t}^{\text {cut}}}\) as in Eq. (2) scales linearly in the number of particles and the method should be amenable to parallel implementation.

Imposing a cut on particles’ transverse momenta eliminates most of the pileup particles, and so might reduce the fluctuations in residual pileup contamination from one point to the next within the event. However, as with other event-wide noise-reducing pileup and underlying-event mitigation approaches, notably the CMS heavy-ion method [1517] (cf. the analysis in Appendix A.4 of Ref. [22]), the price that one pays for noise reduction is the introduction of biases. Specifically, some particles from pileup will be above \(p_t^{\text {cut}}\) and so remain to contaminate the jets, inducing a net positive bias in the jet momenta. Furthermore some particles in genuine hard jets will be lost, because they are below the \(p_t^{\text {cut}}\), inducing a negative bias in the jet momenta. The jet energy scale will only be correctly reproduced if these two kinds of bias are of similar size,Footnote 3 so that they largely cancel. There will be an improvement in the jet resolution if the fluctuations in these biases are modest.

Figure 2 shows, on the left, the average \(p_t^{\text {cut}}\) value, together with its standard deviation (dashed lines), as a function of the number of pileup interactions, \(n_{\text {PU}}\). The event sample consists of a superposition of \(n_{\text {PU}}\) zero bias on one hard dijet event, in 14 TeV proton–proton collisions, all simulated with Pythia 8 (tune 4C) [23]. The 4C tune gives reasonable agreement with a wide range of minimum-bias data, as can be seen by consulting MCPlots [24].Footnote 4 The underlying event in the hard event has been switched off, and all particles have been made massless, maintaining their \(p_t\), rapidity and azimuth.Footnote 5 These are our default choices throughout this paper. The grid used to determine \(p_t^{\text {cut}}\) has a spacing of \(a \simeq 0.4\) and extends up to \(|y| < 5\). One sees that \(p_t^{\text {cut}}\) remains moderate, below \(2~\text {GeV}\), even for pileup at the level foreseen for the high-luminosity upgrade of the LHC (HL-LHC), which is expected to reach an average (Poisson-distributed) number of pileup interactions of \(\mu \simeq 140\). The right-hand plot shows the two sources of bias: the lower (solid) curves, illustrate the bias on the hard jets induced by the loss of genuine hard-event particles below \(p_t^{\text {cut}}\). Jet clustering is performed with the anti-\(k_t\) jet algorithm [28] with \(R=0.4\), as implemented in a development version of FastJet 3.1 [29, 30].Footnote 6 The three line colours correspond to different jet \(p_t\) ranges. The loss has some dependence on the jet \(p_t\) itself, notably for higher values of \(p_t^{\text {cut}}\).Footnote 7 In particular it grows in absolute terms for larger jet \(p_t\)’s, though it decreases relative to the jet \(p_t\). The positive bias from residual pileup particles (in circular patches of radius \(0.4\) at rapidity \(y=0\)) is shown as dashed curves, for three different pileup levels. To estimate the net bias, one should choose a value for \(n_{\text {PU}}\), read the average \(p_t^{\text {cut}}\) from the left-hand plot, and for that \(p_t^{\text {cut}}\) compare the solid curve with the dashed curve that corresponds to the given \(n_{\text {PU}}\). Performing this exercise reveals that there is indeed a reasonable degree of cancellation between the positive and negative biases. Based on this observation, we can move forward with a more detailed study of the performance of the method.Footnote 8

3 SoftKiller performance

For a detailed study of the SoftKiller method, the first step is to choose the grid spacing \(a\) so as to break the event into patches. The spacing \(a\) is the one main free parameter of the method. The patch-size parameterFootnote 9 is present also for area–median pileup subtraction. There the exact choice of this parameter is not too critical. The reason is that the median is quite stable when pileup levels are high: all grid cells are filled, and nearly all are dominated by pileup. However, the SoftKiller method chooses the \(p_t^{\text {cut}}\) so as to obtain a nearly empty event. In this limit, the median operation becomes somewhat more sensitive to the grid spacing [21].

Figure 3 considers a range of hard-event samples (different line styles) and pileup levels (different colours). For each, as a function of the grid spacing \(a\), the left-hand plot shows the average, \(\langle \Delta p_t\rangle \), of the net shift in the jet transverse momentum,

$$\begin{aligned} \Delta p_t = p_t^\mathrm{corrected} - p_t^\mathrm{hard}, \end{aligned}$$

while the right-hand plot shows the dispersion, \(\sigma _{\Delta p_t}\), of that shift from one jet to the next, here normalised to \(\sqrt{\mu }\) (right).

Fig. 3
figure 3

Scan of the SoftKiller performances as a function the grid-spacing parameter \(a\) for different hard-event samples and three different pileup levels (Poisson-distributed with average pileup event multiplicities of \(\mu =20,60,140\)). Left Average \(p_t\) shift; right \(p_t\) shift dispersion, normalised to \(\sqrt{\mu }\) for better readability. Results are given for a variety of hard processes and pileup conditions to illustrate robustness. Curves labelled \(p_t > X\) correspond to dijet events, in which one studies only those jets that in the hard event have a transverse momentum greater than \(X\). For the \(t{\bar{t}}\) sample, restricted to fully hadronic decays of the top quarks, the study is limited to jets that have \(p_t > 50 ~\text {GeV}\) in the hard event

One sees that the average jet \(p_t\) shift has significant dependence on the grid spacing \(a\). However, there exists a grid spacing, in this case \(a\simeq 0.4\), for which the shift is not too far from zero and not too dependent either on the hard sample choice or on the level of pileup. In most cases the absolute value of the shift is within about \(2~\text {GeV}\), the only exception being for the \(p_t > 1000~\text {GeV}\) dijet sample, for which the bias can reach up to \(4~\text {GeV}\) for \(\mu =140\). This shift is, however, still less than the typical best experimental systematic error on the jet energy scale, today of the order of \(1~\%\) or slightly better [32, 33].

It is not trivial that there should be a single grid spacing that is effective across all samples and pileup levels: the fact that there is can be considered phenomenologically fortuitous. The value of the grid spacing \(a\) that minimises the typical shifts is also close to the value that minimises the dispersion in the shifts.Footnote 10 That optimal value of \(a\) is not identical across event samples, and can also depend on the level of pileup. However, the dispersion at \(a = 0.4\) is always close to the actual minimal attainable dispersion for a given sample. Accordingly, for most of the rest of this article, we will work with a grid spacing of \(a=0.4\).Footnote 11

Next, let us compare the performance of the SoftKiller to that of area–median subtraction. Figure 4 shows the distribution of shift in \(p_t\), for (hard) jets with \(p_t > 50~\text {GeV}\) in a dijet sample. The average number of pileup events is \(\mu = 60\), with a Poisson distribution. One sees that in the SoftKiller approach, the peak is about \(30~\%\) higher than what is obtained with the area–median approach and the distribution correspondingly narrower. The peak, in this specific case, is well centred on \(\Delta p_t = 0\).

Fig. 4
figure 4

Performance of SoftKiller for 50 GeV jets and \(\mu =60\) Poisson-distributed pileup events. We plot the distribution of the shift \(\Delta p_t\) between the jet \(p_t\) after pileup removal and the jet \(p_t\) in the hard event alone. Results are compared between the area-median approach and the SoftKiller. For comparison, the (orange) dash-dotted line corresponds to the situation where no pileup corrections are applied

Figure 5 shows the shift (left) and dispersion (right) as a function of \(n_{\text {PU}}\) for two different samples: the \(p_t> 50~\text {GeV}\) dijet sample (in blue), as used in Fig. 4, and a hadronic \(t{\bar{t}}\) sample, with a \(50~\text {GeV}\) \(p_t\) cut on jets (in green). Again, the figure compares the area–median (dashed) and SoftKiller results (solid). One immediately sees that the area–median approach gives a bias that is more stable as a function of \(n_{\text {PU}}\). Nevertheless, the bias in the SoftKiller approach remains between about \(-0.5\) and \(1.5~\text {GeV}\), which is still reasonable when one considers that, experimentally, some degree of recalibration is anyway needed after area–median subtraction. As concerns the sample dependence of the shift, comparing \(t{\bar{t}}\) vs. dijet, the area–median and SoftKiller methods appear to have similar systematic differences. In the case of SoftKiller, there are two main causes for the sample dependence: firstly the higher multiplicity of jets has a small effect on the choice of \({{p}_{t}^{\text {cut}}}\) and secondly the dijet sample is mostly composed of gluon-induced jets, whereas the \(t{\bar{t}}\) sample is mostly composed of quark-induced jets (which have fewer soft particles and so lose less momentum when imposing a particle \(p_t\) threshold). Turning to the right-hand plot, with the dispersions, one sees that the SoftKiller brings about a significant improvement compared to area–median subtraction for \(n_{\text {PU}} \gtrsim 20\). The relative improvement is greatest at high pileup levels, where there is a reduction in dispersion of \(30\)\(35~\%\), beating the \(\sqrt{n_\text {PU}}\) scaling that is characteristic of the area–median method. While the actual values of the dispersion depend a little on the sample, the benefit of the SoftKiller approach is clearly visible for both.

Fig. 5
figure 5

Performance of SoftKiller shown as a function of the pileup multiplicity and compared to the area-median subtraction method. Left The average \(p_t\) shift after subtraction, compared to the original jets in the hard event. Right The corresponding dispersion

Figure 6 shows the shift (left) and dispersion (right) for jet \(p_t\)’s and jet masses, now as a function of the hard jet minimum \(p_t\). Again, dashed curves correspond to area–median subtraction, while solid ones correspond to the SoftKiller results. All curves correspond to an average of \(60\) pileup interactions. For the jet \(p_t\) (blue curves) one sees that the area–median shift ranges from \(0.5\) to \(0 ~\text {GeV}\) as \(p_t\) increases from \(20~\text {GeV}\) to \(1~\text {TeV}\), while for SK the dependence is stronger, from about \(2\) to \(-1~\text {GeV}\), but still reasonable. For the jet mass (green curves), again the area–median methodFootnote 12 is more stable than SK, but overall the biases are under control, at the \(1\) to \(2~\text {GeV}\) level. Considering the dispersions (right), one sees that SK gives a systematic improvement, across the whole range of jet \(p_t\)’s. In relative terms, the improvement is somewhat larger for the jet mass (\({\sim }30~\%\)) than for the jet \(p_t\) (\({\sim }20~\%\)).

Figure 7 shows the actual mass spectra of \(R=0.4\) jets, for two samples: a QCD dijet sample and a boosted \(t{\bar{t}}\) sample. For both samples, we only considered jets with \(p_t > 500~\text {GeV}\) in the hard event. One sees that SK gives slightly improved mass peaks relative to the area–median method and also avoids area–median’s spurious peak at \(m=0\), which is due to events in which the squared jet mass came out negative after four-vector area subtraction and so was reset to zero. The plot also shows results from the recently proposed Constituent Subtractor method [11], using v. 1.0.0 of the corresponding code from FastJet Contrib [34]. It too performs better than area–median subtraction for the jet mass, though the improvement is not quite as large as for SK.Footnote 13

One might ask why we concentrated on \(R=0.4\) jets here, given that jet-mass studies often use large-\(R\) jets. The reason is that large-\(R\) jets are nearly always used in conjunction with some form of grooming, for example trimming, pruning or filtering [3537]. Grooming reduces the large-radius jet to a collection of small-radius jets and so the large-radius groomed-jet mass is effectively a combination of the \(p_t\)’s and masses of one or more small-radius jets.

For the sake of completeness, let us briefly also study the SoftKiller performance for large-\(R\) jets. Figure 8 shows jet-mass results for the same \(t{\bar{t}}\) sample as in Fig. 7 (right), now clustered with the anti-\(k_t\) algorithm with \(R=1\). The left-hand plot is without grooming: one sees that SK with our default spacing of \(a=0.4\) gives a jet mass that has better resolution than area–median subtraction (or the ConstituentSubtractor), but a noticeable shift, albeit one that is small compared to the effect of uncorrected pileup. That shift is associated with some residual contamination from pileup particles: in an \(R=0.4\) jet, there are typically a handful of particles left from pileup, which compensate for low-\(p_t\) particles lost from near the core of the jet. If one substantially increases the jet radius without applying grooming, then that balance is upset, with substantially more pileup entering the jet, while there is only slight further loss of genuine jet \(p_t\). To some extent this can be addressed by using the SoftKiller with a larger grid spacing (cf. the \(a=0.8\) result), which effectively increases the particle \({{p}_{t}^{\text {cut}}}\). This comes at the expense of performance on small-\(R\) jets (cf. Fig. 3). An interesting, open problem is to find a simple way to remove pileup from an event such that, for a single configuration of the pileup removal procedure, one simultaneously obtains good performance on small-\(R\) and large-\(R\) jets.Footnote 14

As we said above, however, large-\(R\) jet masses are nearly always used in conjunction with some form of grooming. Figure 8 (right) shows that when used together with trimming [35], SoftKiller with our default \(a=0.4\) choice performs well both in terms of resolution and shift.

Returning to \(R=0.4\) jets, the final figure of this section, Fig. 9, shows average shifts (left) and dispersions (right) as a function of \(n_{\text {PU}}\) for several different jet “shapes”: jet masses, \(k_t\) clustering scales [19, 20], the jet width (or broadening or girth [3840]), an energy–energy correlation moment [41] and the \(\tau _{21}^{(\beta =1)}\) and \(\tau _{32}^{(\beta =2)}\) N-subjettiness ratios [42, 43], using the exclusive \(k_t\) axes with one-pass of minimisation. Except in the case of the jet mass (which uses “safe” area subtraction, as mentioned above), the area–median results have been obtained using the shape subtraction technique [27], as implemented in v. 1.2.0 of the GenericSubtractor in FastJet Contrib.

Fig. 9
figure 9

Performance of the SoftKiller on jet shapes, compared to area–median subtraction and the recently proposed Constituent Subtractor method [11]. All results are shown for dijet events with a 500 GeV \(p_t\) cut on anti-\(k_t\), \(R=0.4\) jets. For comparison of the subtraction performance we also quote, for each observable \(X\), \(\sigma _{X,\mathrm{hard}}\), the dispersion of the distribution of the observable in the hard event. For \(\tau _{21}\) there is the additional requirement (in the hard event) that the jet mass is above 30 GeV and for \(\tau _{32}\) we further impose \(\tau _{21}\ge 0.1\) (again in the hard event), so as to ensure infrared safety

As regards the shifts, the SK approach is sometimes the best, other times second best. Which method fares worst depends on the precise observable. In all cases, when considering the dispersions, it is the SK that performs best, though the extent of the improvement relative to other methods depends strongly on the particular observable. Overall this figure gives us confidence that one can use the SoftKiller approach for a range of properties of small-radius jets.

4 Adaptation to CHS events and calorimetric events

It is important to verify that a new pileup mitigation method works not just at particle level, but also at detector level. There are numerous subtleties in carrying out detector-level simulation, from the difficulty of correctly treating the detector response to low-\(p_t\) particles, to the reproduction of actual detector reconstruction methods and calibrations, and even the determination of which observables to use as performance indicators. Here we will consider two cases: idealised charged-hadron subtraction, which simply examines the effect of discarding charged pileup particles; and simple calorimeter towers.

For events with particle flow [2] and charged-hadron subtraction (CHS), we imagine a situation in which all charged particles can be unambiguously associated either with the leading vertex or with a pileup vertex. We then apply the SK exclusively to the neutral particles, which we assume to have been measured exactly. This is almost certainly a crude approximation, however, it helps to illustrate some general features.

One important change that arises from applying SK just to the neutral particles is that there is a reduced contribution of low-\(p_t\) hard-event particles. This means that for a given actual amount of pileup contamination (in terms of visible transverse momentum per unit area), one can afford to cut more aggressively, i.e. raise the \(p_t^{\text {cut}}\) as compared to the full particle-level case, because for a given \(p_t^{\text {cut}}\) there will be a reduced loss of hard-event particles. This can be achieved through a moderate increase in the grid spacing, to \(a=0.5\). Figure 10 shows the results, with the shift (left) and dispersion (right) for the jet \(p_t\) in dijet and \(t{\bar{t}}\) samples. The SK method continues to bring an improvement, though that improvement is slightly more limited than in the particle-level case. We attribute this reduced improvement to the fact that SK’s greatest impact is at very high pileup, and for a given \(n_\mathrm{PU}\), SK with CHS is effectively operating at lower pileup levels than without CHS. A further study with our toy CHS simulation concerns lepton isolation and is given in Appendix E.

Fig. 10
figure 10

Same as Fig. 5, for events with charged-hadron subtraction (CHS). Note that the grid size used for the SoftKiller curves has been set to \(a=0.5\)

Next let us turn to events where the particles enter calorimeter towers. Here we encounter the issue, discussed also in Appendix A, that SK is not collinear safe. While we argue there that this is not a fundamental drawback from the point of view of particle-level studies, there are issues at calorimeter level: on one hand a single particle may be divided between two calorimeter towers (we shall not attempt to simulate this, as it is very sensitive to detector details); on the other, within a given tower (say \(0.1\times 0.1\)) it is quite likely that for high pileup the tower may receive contributions from multiple particles. In particular, if a tower receives contributions from a hard particle with a substantial \(p_t\) and additionally from pileup particles, the tower will always be above threshold, and the pileup contribution will never be removed. There are also related effects due to the fact that two pileup particles may enter the same tower. To account for the fact that towers have a finite area, we therefore adapt the SK as follows. In a first step we subtract each tower:

$$\begin{aligned} p_t^{\mathrm{tower},\mathrm{sub}} = \max \left( 0,p_t^\mathrm{tower} - \rho A^\mathrm{tower}\right) , \end{aligned}$$

where \(\rho \) is as determined on the event prior to any correction.Footnote 15 This in itself eliminates a significant fraction of pileup, but there remains a residual contribution from the roughly \(50~\%\) of towers whose \(p_t\) was larger than \(\rho A^\mathrm{tower}\). We then apply the SoftKiller to the subtracted towers,

$$\begin{aligned} p_t^{\text {cut},\mathrm{sub}} = \underset{i \in \mathrm{patches}}{\mathrm{median}} \left\{ p_{ti}^{\mathrm{tower,sub,}\max } \right\} , \end{aligned}$$

where \(p_{ti}^{{\mathrm{tower},\mathrm{sub},}\max }\) is the \(p_t\), after subtraction, of the hardest tower in patch \(i\), in analogy with Eq. (2). In the limit of infinite granularity, a limit similar to particle level, \(A^{\mathrm{tower}}=0\). The step in Eq. (4) then has no effect and one recovers the standard SoftKiller procedure applied to particle level.

Results are shown in Fig. 11. The energy \(E\) in each \(0.1\times 0.1\) tower is taken to have Gaussian fluctuations with relative standard deviation \(1/\sqrt{E/\!~\text {GeV}}\). A \(p_t\) threshold of \(0.5~\text {GeV}\) is applied to each tower after fluctuations. The SK grid spacing is set to \(a = 0.6\). Interestingly, with a calorimeter, the area–median method starts to have significant biases, of a couple of GeV, which can be attributed to the calorimeter’s non-linear response to soft energy. The SK biases are similar in magnitude to those in Fig. 5 at particle level (note, however, the need for a different choice of grid spacing \(a\)). The presence of a calorimeter worsens the resolution both for area–median subtraction and SK, however, SK continues to perform better, even if the improvement relative to area–median subtraction is slightly smaller than for the particle-level results.

Fig. 11
figure 11

Same as Fig. 5 for events with a simple calorimeter simulation. The SoftKiller was used here with a grid spacing of \(a=0.6\) and includes the tower subtraction of Eq. (4)

We have also investigated a direct application of the particle-level SoftKiller approach to calorimeter towers, i.e. without the subtraction in Eq. (4). We find that the biases were larger but still under some degree of control with an appropriate tuning of \(a\), while the performance on dispersion tends to be intermediate between that of area–median subtraction and the version of SoftKiller with tower subtraction.

The above results are not intended to provide an exhaustive study of detector effects. For example, particle flow and CHS are affected by detector fluctuations, which we have ignored; purely calorimetric jet measurements are affected by the fact that calorimeter towers are of different sizes in different regions of the detector and furthermore may be combined non-trivially through topoclustering. Nevertheless, our results help illustrate that it is at least plausible that the SoftKiller approach could be adapted to a full detector environment, while retaining much of its performance advantage relative to the area–median method.

5 Computing time

The computation time for the SoftKiller procedure has two components: the assignment of particles to patches, which is \(\mathcal{O}\left( N\right) \), i.e. linear in the total number of particles \(N\) and the determination of the median, which is \(\mathcal{O}\left( P \ln P\right) \) where \(P\) is the number of patches. The subsequent clustering is performed with a reduced number of particles, \(M\), which, at high pileup is almost independent of the number of pileup particles in the original event. In this limit, the procedure is therefore expected to be dominated by the time to assign particles to patches, which is linear in \(N\). This assignment is almost certainly amenable to being parallelised.

In studying the timing, we restrict our attention to particle-level events for simplicity. We believe that calorimeter-type extensions as described in Sect. 4 can be coded in such a way as to obtain similar (or perhaps even better) performance.

Timings are shown in Fig. 12 versus initial multiplicity (left) and versus the number of pileup vertices (right).Footnote 16 Each plot shows the time needed to cluster the full event and the time to cluster the full event together with ghosts (as needed for area-based subtraction). It also shows the time to run the SoftKiller procedure, the time to cluster the resulting event, and the total time for SK plus clustering.

Overall, one sees nearly two orders of magnitude improvement in speed from the SK procedure, with run times per event ranging from \(0.2\) to 5 ms as compared to \(20\) to 300 ms for clustering with area information. At low multiplicities, the time to run SK is small compared to that needed for the subsequent clustering. As the event multiplicity increases, SK has the effect of limiting the event multiplicity to about \(300\) particles, nearly independently of the level of pileup and so the clustering time saturates. However, the time to run SK grows and comes to dominate over the clustering time. Asymptotically, the total event processing time then grows linearly with the level of pileup. A significant part of that time (about \(180~\text {ns}\) per particle, 75 % of the run-time at high multiplicity) is taken by the determination of the particles’ rapidity and azimuth in order to assign them to a grid cell. If the particles’ rapidity and azimuth are known before applying the SoftKiller to an event (as it would be the case e.g. for calorimeter towers), the computing time to apply the SoftKiller would be yet faster, as indicated by the dotted orange line on Fig. 12.

Because of its large speed improvement, the SoftKiller method has significant potential for pileup removal at the trigger level. Since SoftKiller returns an event with fewer particles, it will have a speed performance edge also in situations where little or no time is spent in jet-area calculations (because either Voronoi areas or fast approximate implementations are used). This can be seen in Fig. 12 by comparing the green and the solid blue curves.

6 Conclusions

The SoftKiller method appears to bring significant improvements in pileup mitigation performance, in particular as concerns the jet energy resolution, whose degradation due to pileup is reduced by \(20{-}30~\%\) relative to the area–median-based methods. As an example, the performance that is obtained with area–median subtraction for 70 pileup events can be extended to 140 pileup events when using SoftKiller. This sometimes comes at the price of an increase in the biases on the jet \(p_t\), however, these biases still remain under control.

Since the method acts directly on an event’s particles, it automatically provides a correction for jet masses and jet shapes, and in all cases that we have studied brings a non-negligible improvement in resolution relative to the shape subtraction method, and also (albeit to a lesser extent) relative to the recently proposed Constituent Subtractor approach.

The method is also extremely fast, bringing nearly two orders of magnitude speed improvement over the area–median method for jet \(p_t\)’s. This can be advantageous both in time-critical applications, for example at trigger level, and in the context of fast detector simulations.

There remain a number of open questions. It would be of interest to understand, more quantitatively, why such a simple method works so well and what dictates the optimal choice of the underlying grid spacing. This might also bring insight into how to further improve the method. In particular, the method is known to have deficiencies when applied to large-\(R\) ungroomed jets, which would benefit from additional study. Finally, we have illustrated that in simple detector simulations it is possible to reproduce much of the performance improvement seen at particle level, albeit at the price of a slight adaption of the method to take into account the finite angular resolution of calorimeters. These simple studies should merely be taken as indicative, and we look forward to proper validation (and possible further adaptation) taking into account full detector effects.

Note added As this work was being completed, we became aware of the development of another particle-level pileup removal method, PUPPI [44, 45]. Initial particle-level comparisons at the 2014 Pileup Workshop [46] suggest that both bring comparable improvements.