Pileup and Underlying Event Mitigation with Iterative Constituent Subtraction

The hard-scatter processes in hadronic collisions are often largely contaminated with soft background coming from pileup in proton-proton collisions, or underlying event in heavy-ion collisions. This paper presents a new background subtraction method for jets and event observables (such as missing transverse energy) which is based on the previously published Constituent Subtraction algorithm. The new subtraction method, called Iterative Constituent Subtraction, applies event-wide implementation of Constituent Subtraction iteratively in order to fully equilibrate the background subtraction across the entire event. Besides documenting the new method, we provide guidelines for setting the free parameters of the subtraction algorithm. Using particle-level simulation, we provide a comparison of Iterative Constituent Subtraction with several existing methods from which we conclude that the new method has a significant potential to improve the background mitigation in both proton-proton and heavy-ion collisions.


Introduction
Precision tests of Standard Model of particle physics as well as searches for new physics at the Large Hadron Collider (LHC) require maximizing the collected data which is primarily achieved by increasing the beam intensities. The dilemma is that high intensities also increase the number of simultaneous proton-proton (pp) interactions in a single colliding bunch crossing of the two beams. Such multiple pp collisions, also known as pileup, are then read out as one single event. Pileup contributes a primarily low transverse momentum (p T ) and largely diffuse background of particles to the hard scattering process of interest that subsequently distorts many hadronic final state observables, in particular the kinematics and substructures of hadronic jets.
In the last part of LHC Run 2, the ATLAS and CMS experiments achieved an average and peak pileup of ∼35 and ∼70 pp simultaneous collisions, respectively [1,2]. Each additional pileup pp collision at 13 TeV adds an average p T of approximately ∼900 MeV per unit area in the rapidity-azimuth (y−φ) plane. In the upcoming LHC Run 3, an average pileup of 70 collisions per bunch crossing is expected, corresponding to the maximum pileup level of Run 2. The high-luminosity LHC is expected to deliver an average pileup of 200 pp collisions per bunch crossing [3]. An even more intense environment is present in heavy-ion collisions at the LHC and at the Relativistic Heavy Ion Collider (RHIC) where a large underlying event (UE) may lead to a background p T of more than 300 GeV per unit area in y − φ space [4]. Efficient techniques need to be developed and tested in order to mitigate the impact of these large backgrounds on jet kinematics, jet substructure, and missing E T measurements. Throughout this paper the word background will be used to refer either to pileup in pp collisions or to the UE in heavy-ion collisions.
One category of background mitigation techniques includes subtraction methods or algorithms that do not alter the jet definition itself. Examples of algorithms in this category include the extensively used Area Subtraction [5,6] approach to correct jet kinematics, and its extension to corrections for jet shape observables, the Shape-expansion method [7]. Also in this category are numerous algorithms aimed at correcting the inputs to jet reconstruction prior to the application of a jet algorithm: Constituent Subtraction (CS) [8], SoftKiller [9], PUPPI [10], and jet cleansing [11] are among the most widely studied. A second category of background mitigation techniques consists of the so-called jet grooming methods, which are often used in the context of highly Lorentz-boosted massive objects, to both improve the precision of the reconstruction itself and mitigate the effects of pileup and UE. Grooming algorithms fundamentally modify the jet definition to be less sensitive to these backgrounds. The most frequently used methods are filtering [12], trimming [13], pruning [14], and soft drop [15], each of which have configurable parameters that determine their properties as groomers. 1 These methods often work with subjets that are determined by the Cambridge-Aachen algorithm [16,17] or the k t algorithm [18,19] applied to a largeradius anti-k t [20] jet. Specific algorithms or criteria are then applied to these subjets to eliminate soft and/or wide-angle contributions to a jet, which are the most likely to be contaminated by backgrounds. Besides subtraction and grooming methods, techniques based on machine learning have also been implemented and tested recently [21,22]. The potential of improving background mitigation by incorporating charged-track information to subtract neutral pileup is discussed in ref. [23]. Recently, a novel approach to estimate the background is proposed in ref. [24], which aims at reducing the impact of fluctuations in the background on the jet observables and which may be potentially combined with CS algorithm or other subtraction method. A detailed overview of background mitigation techniques can be found in ref. [25]. This paper presents a new background subtraction method for jets, jet substructure observables, and global event observables such as missing E T , which is based on the CS method. In contrast to other background subtraction methods, such as the Area Subtraction or the shape-expansion method, CS corrects for background at the level of jet constituents, which may be particles, tracks from an inner detector, or calorimeter clusters [26][27][28]). Jet kinematics and substructure are simultaneously corrected with this approach. The CS method was successfully used in several measurements at the LHC. The ALICE and CMS experiments use CS in measurements of jet shapes and mass in heavy-ion collisions [29][30][31], furthermore CMS used CS in the measurement of splitting functions [32]. The CS method has been applied in several performance studies by CMS [33,34] and ATLAS [35][36][37] and it is being discussed and tested in the context of future experiments, in particular the Compact Linear Collider (CLIC) and Future Circular Collider (FCC) [38,39]. The CS method was also used in a recent measurement of jet substructure by STAR at RHIC [40]. Phenomenological studies also consider the CS method, in particular in the context of tagging boosted bosons and top quarks [41][42][43], searches for new physics [44][45][46][47], and structure of parton shower [48]. The new method for background subtraction presented in this paper may help to improve the precision of measurements of Standard Model processes and may make experimental studies less susceptible to increasing backgrounds at colliders. This new method is applicable for the subtraction of both pileup in pp collisions and UE in heavyion collisions. In this paper, we test the methods explicitly only in the environment of pp collisions with pileup.
This paper is organized as follows. First, in section 2, an event-wide background subtraction performed using CS is discussed, which was briefly mentioned in ref. [8] as a possible extension of the original CS method. In section 3, the new method for background subtraction is introduced. Then, in section 4, performance of various methods for background subtraction including the new method is presented in the context of jet reconstruction and substructure observables.

Event-wide pileup mitigation with CS
The CS algorithm described in ref. [8] corrects individual jets which were already clustered using a certain jet algorithm. We refer to this approach as Jet-by-jet CS. However, the jet clustering can be biased by the presence of background resulting in different particle content of jets (this is referred to as "back reaction" in ref. [6]). As briefly mentioned in ref. [8], the CS algorithm can be extended to correct the event constituents before jet clustering. The jets resulting from this Event-wide CS correction followed by jet clustering can be more precise than the individually corrected jets. In the following, the Event-wide CS is described assuming massless inputs. A discussion of the correction of massive inputs can be found in appendix B.
The basic ingredient of CS is the background p T density, ρ, which was introduced in the Area Subtraction. Several methods to estimate this quantity are described in [49]. In general, ρ can be estimated as a function of other variables, most commonly as a function of rapidity. The estimated ρ is then used to scale the p T of the ghosts in the Event-wide CS. The ghosts are infinitly soft particles (in practice p T ≈ 10 −100 GeV) incorporated into the event such that they uniformly cover the y − φ plane with high density. Each of these ghosts is massless and covers a fixed area, A g , in the y − φ plane. Historically, the ghosts can be used to define the jet area [5,6] for the Area Subtraction method or perform background subtraction in the Shape-expansion and Jet-by-jet CS methods. In all these methods, their property of infinite softness is essential to not modify the jet clustering sequence. However, for the Event-wide CS, this property is irrevelant and each ghost p T is directly set to p g T ≡ A g · ρ. Then such ghosts already represent the expected background contribution in the given event, and can be used to correct particles via the Event-wide CS as follows.
Only particles and ghosts with pseudo-rapidity, η, fulfilling |η| < η max are used in the correction procedure. The parameter η max defines the detector acceptance of particles. 2 For each pair of particle i and ghost k, a matching scheme is implemented using the distance measure, D i,k , defined as where α is a free parameter and ∆R i,k is defined as (2. 2) The list of all distance measures, {D i,k }, is sorted from the lowest to the highest values. The background removal proceeds iteratively, starting from the particle-ghost pair with the lowest D i,k . At each step, the momentum p T of each particle i and ghost k are modified as follows.
The iterative process is terminated when ∆R i,k > ∆R max where ∆R max is a free parameter. 3 The output of the correction procedure is a set of 4-momenta representing the backgroundcorrected event. Any operation can be done on these output particles -most commonly jet clustering, evaluation of global event shapes or missing transverse energy. Besides providing the ability to correct the full event, which is documented for the case of missing transverse energy in section 4.5, the performance of the background subtraction is also improved with respect to the original CS as documented in section 4.

Iterative Constituent Subtraction
Iterative Constituent Subtraction (ICS) is a new background mitigation method that extends the concept of the original CS method. ICS applies the event-wide implementation of CS iteratively with finite ∆R max value. After each iteration, any remaining unsubtracted background estimate is redistributed uniformly across the entire event and another CS procedure is performed. The exact algorithmic procedure using N iter iterations is the following: 2 In the software implementation of the Event-wide CS algorithm in FastJet Contrib [50], the user can define arbitrary phase-space using the fastjet::Selector class from FastJet [49,51]. In that way, phase space asymmetric in η can be used or the correction can be done only for particles below certain pT threshold. 3 The meaning of the ∆R max parameter was different in the original description of the CS procedure in ref. [8]. In the software implementation in FastJet Contrib, the meaning described in this paper is used since version 1.022.
1. Estimate ρ using a given estimation method and create ghosts with p g T = ρA g . The ρ can be a function of other variables.
2. Perform Event-wide CS correction using a finite ∆R max value. Define input ghosts as ghosts present in the event before this CS correction. Define output ghosts as ghosts remaining in the event after this CS correction.
3. Compute the scalar p T sum of the input ghosts, p input T , and of the output ghosts, p output T . Update the input ghosts by scaling their p T by a factor p output T /p input T . 4. Perform next iteration by going back to step 2 using the updated input ghosts. The parameters of CS, such as ∆R max , may be changed for each iteration.
The step 4 is performed (N iter − 1)-times. The algorithm with N iter = 2 is illustrated on an example event in figure 1. An option exists for the above algorithm that avoids the usage of ghosts in the second or higher iteration which were not fully subtracted in the previous iteration. With this option, the p input T is evaluated using only ghosts which were fully subtracted in the previous iteration, while the unsubtracted ghosts are discarded for the actual iteration. This approach avoids placing the expected background deposition to positions where there is small chance for combining ghosts with real particles in the second or higher iteration. The impact of this option on the performance depends heavily on the chosen ∆R max parameters for each iteration. We refer to this option as ghost removal.
In our performance studies, it was found that two iterations are often sufficient to obtain good performance for anti-k t jets. Furthermore, it was found that the optimal values of ∆R max depend on the radius used in the jet finding. A good compromise for a large range of jet radii are found to be ∆R max = 0.2 and ∆R max = 0.15 for the first and the second iteration, respectively. Using α > 0 and ghost removal can bring improvement for certain jet definitions and observables. Detailed discussion of the choice of CS parameters is provided in appendix A. Lastly, it was found that ICS outperforms both Jet-by-jet CS and Event-wide CS (described in section 2) as documented in section 4.
The ICS method is implemented in FastJet Contrib [50] since version 1.038.

Performance
In this section, we discuss the performance of new methods presented in sections 2 and 3.   Figure 1: Illustration of the ICS method with 2 iterations assuming only one dimension in azimuth. The event before the first iteration (1a) contains hard-scatter particles, background particles and ghosts with p T corresponding to ρ in that event. To emphasize the fact that the ghosts are subtracted, their p T is negative in these figures. After the first iteration with finite ∆R max (1b), ghosts with unsubtracted p T can remain in the event (emphasized by a red circle). The scalar p T sum of the unsubtracted ghosts is redistributed uniformly in the y − φ plane (1c) or according to the actual ρ dependence. After the second iteration (1d), the unsubtracted background p T is strongly reduced.

Jet shape definitions
While kinematic properties of jets can be uniquely characterized by four-momentum, P µ = (p T , η, φ, m), the internal structure of jet cannot be characterized by a single observable. Instead, various observables have been proposed in order to capture many internal properties of jets. These different observables are generically referred to as jet shapes. This term extends the meaning of the observable ρ(r), originally called jet shape, which was measured at various experiments [52][53][54][55][56][57][58] and which quantifies the radial energy flow, from the jet axis r = (∆η) 2 + (∆φ) 2 . This quantity, along with other observables such as fragmentation function, can be used to quantify the internal structure of QCD jets in the context of the parton shower evolution. More recently, jet shape observables have been extensively employed as tools in measurements of Lorentz-boosted massive objects [59][60][61][62]. In such cases, jets have fundamentally different internal structures as compared to jets formed from light quarks and gluons. One canonical example is the two-prong structure formed from a collimated pair of heavy quarks from the decay of a boosted Higgs boson [12]. Various jet shape and structure observables can be used to distinguish jets formed by massive boosted object decays from jets formed solely by parton showering and hadronization of light quarks and gluons. In this study, we use three representative jet shape observables -jet width, two-to-one and three-to-two ratios of N-subjettiness (τ 21 and τ 32 ) -and the jet mass. Jet mass, m = P µ P µ , is a basic observable used to identify boosted hadronically decaying objects such as the W ± boson (see e.g. ref. [63]). Jet width (also known as linear radial moment or girth) is defined as the first moment of the radial flow of transverse momentum or energy, where p T,i is magnitude of the transverse momentum of jet constituent i and ∆R i is the distance between jet constituent i and the jet axis. Jet width has been shown to provide e.g. a good discriminating power between quark-initiated and gluon-initiated jets [64]. The τ 21 and τ 32 are two-to-one and three-to-two ratio of N -subjettiness, respectively. The N -subjettiness is defined as where R is the distance parameter of the jet algorithm, p Tk is the transverse momentum of constituent k and ∆R ik is the distance between a subjet i and a constituent k. The N subjets are defined by re-clustering the constituents of the jet with exclusive version of the k t algorithm and requiring that exactly N subjets are found. Subjettiness was introduced to provide an enhanced discriminating power for identifying boosted massive objects [65].

Quantifying performance of background subtraction
The following figures of merit are used to quantify the performance of jet reconstruction [66,67]: jet energy scale (JES), jet energy resolution (JER), jet reconstruction efficiency, and rate of fake jets. JES is also sometimes called linearity. These quantities are calculated using Monte Carlo generators coupled with full detector simulations for a particular experiment by comparing particle-level jets (so-called true jets) with jets reconstructed using the outputs of the detector simulation. JES and JER characterize the mean and root-mean-square, respectively, of the difference between the transverse momentum of particle level jet, p true T , and the transverse momentum of jet reconstructed in the detector, p reco T . More explicitly, . The JES is determined by a combination of the response of the detector and the performance of algorithms used for the jet calibration and background subtraction. The JER of calorimeter jets can be factorized as follows where a is the stochastic term, b is the noise term and c is the constant term [68,69]. The stochastic and constant terms are governed by the response of the detector to the particle shower. The noise term is largely determined by fluctuations of backgrounds, which may include electronic noise, pileup, or the underlying event. At the LHC, the contributions due to pileup tend to dominate. In the case of an average background subtraction such as the Area Subtraction, the noise term is given by the root-mean-square of p T evaluated in the area of a jet excluding the jet signal, b = RMS(p area T ). The stochastic and constant terms are not significantly affected by the subtraction algorithm [68], but they can be reduced by using the information about the jet internal structure in the calibration procedure [70] or by combining the information from calorimeter with the information from tracking [71]. The noise term cannot be reduced by the calibration procedure, but it can be reduced by subtraction procedure or using some additional information about the background compared to the basic information about its average density. This can be, for example, the information about pointing of jet constituents to the primary vertex (e.g. charged hadron subtraction used by CMS [72] or jet-vertex association used by ATLAS [73]) or by noise suppression at the sub-constituent level of calorimeter cells [74,75].
Jet reconstruction efficiency quantifies the probability that a jet is found given the presence of a true jet, subject to specific kinematic constraints for a particular efficiency evaluation. As such, the jet reconstruction efficiency is indirectly affected by both the JES and the JER. The rate of fake jets is important in the case of large backgrounds, such as in heavy-ion collisions, where correlated background fluctuations can lead to a large rate of fake jets. Consequently, this rate can be reduced by reducing the fluctuations in the background for which the JER is the relevant performance metric. We therefore focus on the JES and JER in order to characterize the jet reconstruction performance.
The JES and JER can be generalized to any of the observable quantities that characterize the jet kinematics or shapes discussed in section 4.1. We define the bias and resolution of a given observable, x, as follows: The difference between x rec and x true is used instead of their ratio in order to avoid excessive values of bias for small values of x true . The denominator in the bias and resolution allows for an easier comparison among different quantities. Similarly to the case of the JES and JER, these quantities characterize the primary aspects of the performance of the background subtraction. These quantities are therefore evaluated simultaneously for all the jet shape observables discussed in section 4.1.

Test samples and configuration of subtraction
The performance of subtraction methods is evaluated in simulated pp collisions at √ 13 TeV using events with boosted top quarks from the decay Z → tt of a hypothetical boson Z with a mass of 1.5 TeV. These simulated hard-process events are referred to as true events. To obtain the reconstructed events with pileup included, the true events are overlaid with inclusive pp collision events that represent pileup. The number of overlaid inclusive events, N P U , has a uniform distribution from 0 to 140. All event generation is performed with PYTHIA 8.180 [76,77] using the tune 4C and CTEQ 5L parton density functions [78]. The hard process is generated without underlying event.
A pseudo-detector simulation is then used. All particles are grouped into towers of size 0.1 × 0.1 in the pseudorapidity-azimuth (η − φ) space. The tower energy is obtained as the sum of energies of particles pointing to that tower. All neutrinos and muons are discarded. Only towers with |η| < 4.0 are selected. The mass of each tower is set to 0. The η and φ of each tower is randomly smeared using a Gaussian kernel with width of 0.1 (but maximally up to 0.2).
The reconstructed events are corrected with both ICS and multiple other pileup mitigation techniques. To evaluate the performance of the various methods, the jets from the corrected events are compared to jets from the true events using the quantities defined in section 4.2. To factorize the effect of pileup, the same detector simulation is used for both, true and reconstructed, events. Two jet definitions are used: anti-k t algorithm with the distance parameter 0.4 and 1.0. Only true jets with |η| < 3 and p T > 20 GeV are used.
All jet finding and background estimation is performed using FastJet 3.3.1 [49,51]. The event energy density ρ is estimated as a function of y using the GridMedianBackgroundEstimator tool from FastJet with a grid spacing of 0.5 using the particles up to |y| = 4. Since the inputs are massless, there is no need to derive the background estimate for the mass term of pileup, ρ m . The same ρ estimation is used for the Area Subtraction and the CS-based methods. The Area Subtraction method was carried out as 4-vector subtraction using the fastjet::Subtractor class from FastJet 3.3.1. Unphysical situations with negative corrected mass are avoided by enabling the safe_mass option. The Jet-by-jet CS uses parameters ∆R max = ∞, α = 0, and A g = 0.0025 for both jet definitions. The ∆R max parameter used for the Event-wide CS are ∆R max = 0.25 and ∆R max = 0.7 for anti-k t R = 0.4 and R = 1.0 jet definitions, respectively. For anti-k t R = 0.4 jets, the ICS method is used with two iterations, parameters ∆R max 1 = 0.2, ∆R max 2 = 0.1, and without ghost removal. For anti-k t R = 1.0 jets, the parameters ∆R max 1 = 0.2, ∆R max 2 = 0.35 are chosen, and also implemented without ghost removal. The remaining two CS parameters are set to α = 1 and A g = 0.0025 for all configurations of the Event-wide CS and ICS. These configurations are found to be optimal for the majority of observables. More detailed discussion of the choice of parameters for Event-wide CS and ICS is provided in appendix A.

Performance for jet kinematics and substructure
The performance of the ICS method applied to jets is evaluated for both jet kinematic and jet shape observables, which are defined in section 4.1. The bias and resolution (defined in section 4.2) for the observables are studied as a function of number of pileup interactions (N P U ) and jet p T , as well as for two choices of anti-k t distance parameter, R = 0.4 and R = 1.0. The performance of ICS is compared to the performance of the Area Subtraction, Jet-by-jet CS (both introduced in section 1), and the Event-wide CS (discussed in section 2) using the configurations described in section 4.3.
Perhaps the most illustrative demonstration of the efficiency of any pileup correction is the extent to which a given algorithm is able to reduce the dependence of an observable on the amount of pileup in the event, as parameterized by N P U . Figures 2 and 3 show the impact of pileup on the p T and mass, respectively, for large-radius (R = 1.0) jets containing the decay products of boosted top-quarks from Z → tt (see section 4.3). Only a narrow true jet p true T range is chosen for these figures, 250 GeV ≤ p true T < 300 GeV. The four correction algorithms considered for comparison demonstrate the improvements achievable in terms of the bias (figures 2a and 3a) and resolution (figures 2b and 3b) in each case, as a function of N P U .
Each of the algorithms considered is able to remove the bias introduced by pileup in both the p T and the mass of these jets to approximately the same degree. However, the precision of these corrections, as quantified by the resolution of the corrected measure, can vary significantly. The Event-wide CS and the ICS corrections are both observed to improve the resolution of the jet p T and mass by up to 30% at large N P U , as compared to the Jetby-jet CS correction. Across the full range of N P U considered, ICS exhibits an improvement beyond that of the CS correction alone by approximately 5-10% in the resolution of both the jet p T and the jet mass. The ability of each algorithm to mitigate the impacts of pileup depends on more than just the amount of pileup considered. As mentioned earlier, the size of the jet and the kine- matic range (e.g. high or low true p true T ) can affect the performance significantly. Moreover, the impact of pileup on certain observables can be larger, thus reducing the effectiveness of certain approaches. Figures 4 and 5 summarize the results of a comprehensive study of the performance of each of the four algorithms under consideration. Results are reported in a narrow range of high pileup, N P U = 100-120, for jets from the Z → tt process, for the following: The outcome is a set of 52 comparisons each for the bias and resolution after pileup correction.
The results of these comparisons provide a rich set of information from which a few conclusions may be reliably drawn. The ability of algorithms to remove the bias is practically identical in the case of jet p T and jet η. The ability to correct the bias is significantly improved for the Event-wide CS and ICS algorithms compared to Jet-by-jet CS algorithm in the case of jet mass, jet width, and τ 21 . For τ 32 , the performance of Jet-by-jet CS, Eventwide CS and ICS algorithms is very similar. The bias generally decreases with increasing jet p true T and tend to converge to zero for all the algorithms, both jet radii, and almost all the observables. Such a convergence in the behavior of algorithms is however not seen in the case of resolution, where significant differences among algorithms persist in the full kinematic range and in some cases even tend to get larger at higher p true T . A clear improvement in the resolution is seen for Event-wide CS and ICS algorithms compared to both Jet-by-jet CS algorithm and Area Subtraction. While the resolution from ICS and Event-wide CS is practically the same for R = 0.4 jets, a significant difference is seen for R = 1.0 jets where ICS algorithm outperforms the Event-wide CS. These observations hold for all observables studied and for all the jet p true T bins. This conclusion together with the conclusion on the bias implies that ICS algorithm provides the largest and most consistent improvements in the performance out the four algorithms tested.

Performance for missing transverse energy
In addition to evaluating the performance of ICS and related background subtraction algorithms in terms of their impacts on jets, we also studied the extent to which improvements might be gained in the measurement of missing transverse energy, E miss T . The E miss T in the event is defined as a 2-vector calculated as the negative vector sum of the p T of all physics objects in the event. For this study, E miss T is calculated using the vector sum of all stable  Figure 6 shows as a function of N P U a measure of the resolution of the E miss T,x , calculated as the RMS of the difference between the reconstructed and true E miss T,x in the event. The uncorrected resolution is reported as well as the results of applying four different subtraction algorithms to the entire event: SoftKiller (grid-size parameter of 0.6), Event-wide CS and ICS with the same configurations as for anti-k t R = 0.4 jets described in section 4.3, and combination of Event-wide CS followed by SoftKiller (tested by the ATLAS Collaboration [37]). In all cases, the resolution of the E miss T,x worsens with increasing pileup, as expected. However, the three methods Event-wide CS, ICS, and Event-wide CS followed by SoftKiller reduce this degradation by more than 20% at large N P U , while SoftKiller alone does not perform so well. This along with results presented in the previous section demonstrate the stability and good performance of the ICS method.

Performance using the framework from the 2014 Pileup Workshop
We compare the performance of the new method with other methods using the common open-source software framework [79] defined at the "Pileup Workshop" held in May 2014 at CERN [80]. The code used to obtain the results in this section is located in the folder The mass of all particles is set to zero, preserving p T , rapidity, and azimuth. Only particles with |y| < 4 are used. The corrections are done on events without any detector simulation but assuming idealised tracker, where the information about the origin of charged particles (pileup or hard-scatter event) is known. In that way, the charged pileup particles can be directly removed and/or the information about charged particles can be further used to correct the neutral particles. We compare the CS-based methods with Area Subtraction, SoftKiller, and PUPPI, for which we use the same configurations as for the comparison in ref. [25] except slightly different ρ estimation for the Area Subtraction as described below. Summary of the used configurations and usage of the information about charged particles is the following: • Area Subtraction: All charged pileup particles are discarded. The ρ is estimated using only neutral particles with a grid-size parameter of 0.6 and rapidity rescaling. The protection against negative masses after subtraction is enabled. To determine the jet area, the ghosts are placed up to the edge of the particle rapidity acceptance, |y| < 4, with a ghost area of 0.01.
• SoftKiller: All charged pileup particles are discarded at the beginning. SoftKiller (grid-size parameter of 0.5) is applied on the neutral particles in the event.
• PUPPI: The implementation is taken from the comparisons/review in ref. [79] version 1.0.0, which is the original implementation provided by the PUPPI authors in the context of the 2014 Pileup Workshop.
• Jet-by-jet CS: All charged pileup particles are discarded. Same ρ estimation as for the Area Subtraction is used. Only neutral particles are corrected using CS with parameters: ∆R max = ∞, α = 0, and A g = 0.01.
• Event-wide CS: All charged pileup particles are discarded. Same ρ estimation as for the Area Subtraction is used. Only the neutral particles are corrected using the Event-wide CS with parameters: ∆R max = 0.25, α = 1, and A g = 0.005.
• CS+SoftKiller: All charged pileup particles are discarded. First Event-wide CS is applied as described in the previous point. Then the corrected neutral particles are further corrected using SoftKiller (grid-size parameter of 0.6).
• ICS: All charged pileup particles are discarded. Same ρ estimation as for the Area Subtraction. Only the neutral particles are corrected using the ICS method with two iterations and without ghost removal. Parameters: α = 1 and A g = 0.005 for both iterations, ∆R max = 0.2 and ∆R max = 0.15 for the first and the second iteration, respectively.
FastJet v3.3.2 is used for jet clustering, background estimation and Area Subtraction. FastJet Contrib v1.038 is used for CS-based methods and SoftKiller. To compare the performance of the methods, the average bias and resolution for the jet p T and jet mass are evaluated. From each event, the two true hardest jets are selected with p T > 20 GeV and |y| < 2.5. Each corrected jet is matched to the closest true jet requiring ∆y 2 + ∆φ 2 < 0.3 (the matching efficiency is above 99.5%). To follow recommendations from the workshop, the average jet p T bias, ∆p T , and p T resolution, σ ∆p T , is evaluated as the average and RMS from the p T difference between true and matched corrected jet p T , respectively. Similary, the mass bias, ∆m , and resolution, σ ∆m , are evaluated.
The performance of the Area Subtraction, Jet-by-jet CS, Event-wide CS and ICS methods is shown in figures 7 and 8. Qualitatively, the differences between the individual methods are the same as presented in section 4.4 where, however, a different signal sample and detector simulation are used. Both, Event-wide CS and ICS, improve the resolution significantly with respect to the Area Subtraction and Jet-by-jet CS. The ICS method has slightly better resolution with smaller biases compared to the Event-wide CS.
The performance of the PUPPI, SoftKiller, CS+SoftKiller, and ICS methods is shown in figures 9 and 10. All the compared methods lead to varying levels of bias for some observables or kinematic selections. For the majority of the N P U and kinematic selections, the best performance is achieved by the ICS method. For all the N P U and kinematic selections, PUPPI provides slightly worse resolution of p T compared to ICS which is worsening with increasing p T . On the contrary, for high N P U , the performance of PUPPI is systematically better compared to ICS in terms of the reconstruction of mass. In the low N P U environment the performance of ICS and CS+SoftKiller is almost identical, but for a higher N P U environement, CS+SoftKiller gives larger negative bias. On the contrary, SoftKiller alone gives systematically positive bias. This oversubtraction or undersubtraction of SoftKiller might be avoided by further optimizing the SoftKiller parameters to avoid cutting out a part of the signal or to allow for cutting out more of the background. The SoftKiller performance may also be improved by applying the protected zeroing which removes all neutral particles below certain p T threshold if a given particle is not located close to a charged particle from a hard-scattering process as discussed in ref. [25]. We should emphasize that in this study, the only method which uses the information about charged particles in the subtraction is PUPPI. In general, using the information from charged particles is expected to improve the performance of the subtraction. Systematic study of the methods which use the information from charged particles within the context of constituent-subtraction-based algorithms goes beyond the scope of this paper and is planned for a separate study.

Conclusions
We presented a new background mitigation method for jet kinematics, jet substructure observables, and event observables (such as missing transverse energy), called Iterative Constituent Subtraction. This method is applicable to both pileup effects in proton-proton collisions and underlying event contributions in heavy-ion collisions. Iterative Constituent Subtraction extends the Constituent Subtraction method from ref. [8] to an iterative eventwide subtraction algorithm with improved features and performance. The new algorithm has been tested using hadronic jets from Z → tt, as well as light quark and gluon dijet processes, and it was compared to various other algorithms including Jet-by-jet CS, Eventwide CS, Area Subtraction, SoftKiller, and PUPPI. Iterative Constituent Subtraction has been shown to significantly improve the performance of pileup subtraction in proton-proton collisions in terms of bias and resolution of the jet kinematics and substructure observables compared to the Area Subtraction, Jet-by-jet CS, and Event-wide CS. The improvement was also observed with respect to other methods for a large number of pileup and kinematic configurations. The new method has potential to improve the background mitigation at both proton-proton and heavy-ion colliders such as the LHC and RHIC.

A Parameters of the event-wide CS and ICS
As discussed in section 2, the CS procedure has three free parameters to control the subtraction: ∆R max , α, and A g . These parameters have weak impact in the jet-by-jet CS, and the recommended values in that configuration are ∆R max = ∞, α = 0, and A g ≤ 0.01. On the contrary, the event-wide CS is much more sensitive to the ∆R max parameter and also the α parameter can have non-negligible effect. This section serves as a general guidance how to set the three parameters for the event-wide CS. Conclusions presented here can be  applied on ICS as well. However, these conclusions may not be perfect for a specific detector environment. Therefore the experiments are encouraged to optimize the parameters in their own environment. The recommended starting point of tests for both event-wide CS and ICS is: ∆R max = 0.25, α = 1, A g = 0.0025 for anti-k t R = 0.4 jets and ∆R max = 0.7, α = 1, A g = 0.0025 for anti-k t R = 1.0 jets. We provide justification for these recommendations and a basic analysis of the sensitivity to the choice of parameters in the following sub-sections. The performance studies presented here use the setup described in section 4.3.
A.1 Maximal distance between ghost-particle pairs, ∆R max The parameter ∆R max controls the maximal allowed distance between ghost-particle pairs. Only ghost-particle pairs which have ∆R < ∆R max are combined in the algorithm. By setting a finite ∆R max , one can avoid combining ghost-particle pairs which are too far from  each other. However, in this case, certain amount of estimated pileup can remain in the event, while by setting ∆R max = ∞ it is ensured that all the estimated pileup represented by ghosts is subtracted from particles, although not necessarily at a naturally small distance between ghosts and particles.
We investigated the amount of remaining pileup for various ∆R max values by evaluating the scalar p T sum from all particles in the event. The average scalar p T sum for true events (events without pileup) for the used physics process is p true T ≈ 1 TeV. The average scalar p T sum for the same events with pileup after correction, p T , varies depending on the ∆R max parameter as shown in figure 11. For ∆R max = ∞, the p T ≈ p true T for the whole N P U range, which means that on average, the expected pileup deposition is entirely subtracted. This is expected, but this evaluation represents an important crosscheck to see that the used background estimation is not biased. For finite ∆R max , certain  The justification for using small values of ∆R max comes from the actual performance on jets, see figures 12 and 13. When using ∆R max = ∞, the jets are largely overcorrected. The reason for this are the fluctuations of pileup in the y − φ space. In case of no fluctuations, the ghosts subtract the p T added by pileup rather accurately. However, the presence of fluctuations can cause that ghosts are more often matched to a hard-scatter particle, which may be distant, which leads to the overcorrection. resolution. In any case, it is recommended to apply an additional correction to remove the remaining pileup which can be addressed by the Iterative Constituent Subtraction method described in section 3. Other possibility is to use a different method. For example, SoftKiller applied after the event-wide CS with ∆R max = 0.25 was found to be one of the best methods in studies from the ATLAS Collaboration [37].

A.2 α parameter
By using α > 0, one can prioritize ghost-particle pairs with lower particle p T during the subtraction procedure. This may have positive effect on the performance since it is expected that the pileup particles have lower p T than the hard-scatter particles. The optimal value of the α parameter depends on many factors: granularity of the detector, possible p T cuts applied to all particles, jet definition, ∆R max parameter and also on the observable in

question.
The dependence of the event-wide CS performance on α is shown in figures 14 and 15. In general, the smaller the ∆R max , the smaller is the effect of the α parameter. This is expected since with smaller ∆R max , each ghost has smaller freedom to move to match with a particle during the CS procedure. The choice of α parameter has almost no effect for the  Figure 13: Dependence of resolution on the ∆R max parameter for the event-wide CS. The other CS parameters are α = 0 and A g = 0.0025.
anti-k t R = 1.0 jets when using small values of ∆R max (e.g. ∆R max = 0.25). In this case, each ghost is active over an area which is much smaller than the area of the jet, that is locally, and the choice of α has no real impact.
We found that the choice α = 1 for anti-k t R = 0.4 and ∆R max = 0.25 keeps the biases low while the resolution is maximized. For anti-k t R = 1.0 jets, ∆R max = 0.7 and α = 1 is preferable.

A.3 Ghost area A g
The smaller the value of A g , the more densely are the ghosts distributed in the y − φ space which leads to better performance of the CS procedure. On the other hand, the smaller the value of A g , the higher is the number of ghosts which implies a longer computational time. Therefore in practice, a compromise needs to be done. The optimal value of A g may depend on the jet definition, granularity of the detector, and the ∆R max parameter in the CS procedure. We found that A g = 0.0025 gives the best performance and using smaller A g than 0.0025, does not lead to any significant improvement, it just brings a longer computational time. By using A g = 0.01, we observe on average 4-times faster correction with subtle worsening of performance (relatively up to ∼2% worse resolution for some observables).

B Treatment of massive particles
In the case of massless particles, there are three degrees of freedom for each particle 4momentum. The CS procedure corrects p T of each particle, while the rapidity and azimuth

Bias [%]
Jet-by-jet CS =0) α =0.25,   are kept unchanged. In that way, it is ensured that the anti-k t jet clustering algorithm clusters the same particles 4 . Since y = η for massless particles, also the momentum direction is kept unchanged which ensures good reconstruction of some jet observables such as the jet mass.

Resolution / Resolution
In the case of massive particles, there are four degrees of freedom for each particle 4-momentum. The CS procedure corrects the p T of each particle to compensate for the presence of background particles on average. Then the particle azimuth should be kept unchanged since there is no reason why the presence of background would change the azimuth. However, it is non-trivial to decide how to treat the remaining two degrees of freedom for the particle 4-momentum. One straightforward option is to keep the mass of each particle unchanged. For the remaining degree of freedom, two obvious options are available: keep rapidity or pseudo-rapidity unchanged. Due to the fact that the anti-k t jet clustering algorithm uses difference of rapidities in the definition of the distance measure, there are the following caveats for the two options: 1. Keeping mass and rapidity unchanged -in this case, it is ensured that the difference of rapidities in the anti-k t algorithm of each particle pair is not modified after the correction, so the particle content of the final jets can be similar to the particle content of true jets. However, in the limit when the corrected particle p T → 0, the zcomponent of momentum p z → m sinh y and pseudo-rapidity η → ±∞, which means that after such correction, the central anti-k t jets may contain particles with nonnegligible momentum pointing to totally different direction than the high p T particles in that jet. This causes a very large bias for some jet observables such as jet mass.
2. Keeping mass and pseudo-rapidity unchanged -in this case, the difference of rapidities in the anti-k t algorithm of each particle pair can be modified which may lead to jets with different particle content compared to the true jets. Especially, in the limit when the corrected particle p T → 0, the rapidity y → 0 and the energy E → m. Therefore after the correction, jets near η = 0 typically contain a lot of constituents with non-zero mass but very small energy.
To summarize these two options we can say that although the particle mass is usually negligible compared to the particle energy, we see that it cannot stay unchanged due to the fact that the anti-k t jet clustering algorithm uses rapidity difference in its distance measure. Hence, it is important to modify the mass of the particles. There are several other options how to treat the two remaining degrees of freedom: 3. Set the particle masses to zero. Keep the rapidity unchanged. 4. Set the particle masses to zero. Keep the pseudo-rapidity unchanged.
5. Do correction of variable m δ = p 2 T + m 2 − p T in the same way as it is described in the original CS procedure [8] which was based on the approach in ref. [7]. Keep the rapidity unchanged.
6. Do the same correction of m δ as in the previous option, just keep the pseudo-rapidity instead of rapidity unchanged.
7. Keep the rapidity and pseudo-rapidity unchanged. This option is equivalent with scaling the original 4-momentum by a factor corresponding to the ratio between the corrected and original p T of the particle.
We note that by setting the particle mass to zero (options 3 and 4), a bias is introduced for N P U = 0 by definition (since all the hard-scatter particles are modified to be massless).
There is no such bias for options 5 − 7. The options 5 and 6, which use the m δ correction, can still suffer from the same effects as described above the for options 1 and 2, respectively, thought with smaller impact on the performance. The options 3, 4, and 7 are not affected by the effects described above for options 1 and 2.
We investigated the performance of the individual options using the ICS method in the same conditions as described in section 4.3 but without the detector simulation. 5 We found that the jet performance for options 1 and 2 is much worse than for options 3 − 7, especially for the jet mass. The performance for jet η, jet p T , jet width and subjettiness ratios is very similar among the options 3 − 7. From the tested jet observables, only the jet mass has large dependence on the chosen strategy for correcting the particle 4-momenta, therefore we discuss the performance for the jet mass in the following.
First, the importance of modifying the particle mass is demonstrated in figure 16a showing the jet mass distribution for anti-k t R = 1.0 jets with true p T = 200 − 250 GeV (containing mainly decay products from the W boson). The true distribution is peaked around the W boson mass. With pileup, the original distribution is broadened and shifted to higher values. After ICS correction with keeping the original mass and rapidity of particles (option 1), the jet mass distribution is partially corrected, but still it is significantly shifted away from the true distribution. When using ICS correction with keeping the original mass and pseudo-rapidity (option 2), the corrected distribution agrees with the true distribution a bit better, but still it has a significant tail from jets biased towards high jet mass. In contrast, when using one of the options without keeping the original mass, e.g. option 7 (keeping original rapidity and pseudo-rapidity), the corrected mass distribution agrees much better with the true distribution.
Next, we investigated the bias and resolution of the jet mass systematically for options 3 − 7. We found that option 6 has bias above 20% and much worse resolution than the other four options. The option 5 has bias up to 10% and resolution comparable with the options 3, 4, and 7. These remaining options have bias below 2%. The jet mass resolution for all these options using ICS as a function of N P U is shown in figure 16b compared to jet-by-jet CS (which uses option 3). As demonstrated in this figure, we found that setting the particle mass to zero (options 3 and 4) performs the best for the anti-k t R = 1.0 jets with true p T > 200 GeV, while option 7 has only slightly worse resolution and option 5 has worse resolution by 5 − 10%. We found that the bias introduced by setting the particle mass to zero (options 3 and 4) gets negligible for N P U > 5. We also observed that this jet mass bias is larger for low p T anti-k t R = 0.4 jets where the option 7 performs slightly better in general compared to options 3 and 4.  Figure 16: Jet mass distribution (left) for the true jets, jets with pileup, and ICS corrected jets for three different approaches for the treatment of massive inputs: two approaches in which the particle mass is unchanged and one approach in which the rapidity and pseudorapidity is kept unchanged. Jet mass resolution (right) for six correction methods. The last bin is the overflow bin.
We conclude from our studies, that keeping the original rapidity and pseudo-rapidity (option 7) leads to the optimal performance for the two studied jet definitions over the given phase space. However, the experiments using massive inputs are encouraged to check our findings for the exact setup of the analysis they intend to perform.
There is one other aspect of the background subtraction with massive particles. As mentioned in section 2, a user-defined parameter η max is used, which ensures that only particles with |η| < η max are corrected by constructing the ghosts up to the same |η| limit. However, the available software for ρ estimation in FastJet uses particle rapidities and not pseudo-rapidities. Therefore, in the case of massive particles, it is important that the user uses only particles with |η| < η max to estimate the ρ and also to derive the rapidity dependence used for background rescaling.