Emulating the impact of additional proton-proton interactions in the ATLAS simulation by presampling sets of inelastic Monte Carlo events

The accurate simulation of additional interactions at the ATLAS experiment for the analysis of proton-proton collisions delivered by the Large Hadron Collider presents a significant challenge to the computing resources. During the LHC Run 2 (2015–2018), there were up to 70 inelastic interactions per bunch crossing, which need to be accounted for in Monte Carlo (MC) production. In this document, a new method to account for these additional interactions in the simulation chain is described. Instead of sampling the inelastic interactions and adding their energy deposits to a hard-scatter interaction one-by-one, the inelastic interactions are presampled, independent of the hard scatter, and stored as combined events. Consequently, for each hard-scatter interaction, only one such presampled event needs to be added as part of the simulation chain. For the Run 2 simulation chain, with an average of 35 interactions per bunch crossing, this new method provides a substantial reduction in MC production CPU needs of around 20%, while reproducing the properties of the reconstructed quantities relevant for physics analyses with good accuracy.


Introduction
The excellent performance of the Large Hadron Collider (LHC) creates a challenging environment for the ATLAS and CMS experiments. In addition to the hard-scatter proton-proton (pp) interaction which is of interest for a given physics analysis, a large number of inelastic proton-proton collisions occur simultaneously. These are collectively known as pile-up. The mean number of these inelastic pp interactions per bunch crossing, μ, also known as the pile-up parameter, characterises the instantaneous luminosity at any given time.
For physics analyses, pile-up is conceptually similar to a noise contribution that needs to be accounted for as it is unrelated to the hard-scatter event that is of interest for the analysis. Since nearly all analyses rely on Monte Carlo (MC) simulation to predict the detector response to the physics process, it is crucial that the pile-up is modelled correctly as part of that simulation. The goal of the ATLAS MC simulation chain is to accurately reproduce the pile-up, such that it can be accounted for in physics analyses.
Fig. 1 The μ distribution observed for the ATLAS Run 2 data, for each year (2015-2018) separately and for the sum of all years [4]
Within ATLAS, the pile-up is emulated by overlaying soft inelastic pp interactions, in the following called minimum-bias interactions, generated with an MC generator, normally Pythia [1], according to the pile-up profile for a given data-taking period. Figure 1 shows the μ distribution for each year during Run 2 (2015-2018) and the sum of all years. The mean value is 34.2, but the distribution is broad and generally covers values between 10 and 70. The small peak at μ ∼ 2 arises from special running periods with rather low luminosity. At the High Luminosity LHC (HL-LHC), μ is expected to increase to about 200 [2]. The inelastic interactions include non-diffractive and diffractive interactions based on the Donnachie-Landshoff [3] model for the cross sections of the individual processes.
The simulation chain for MC events contains several steps, starting from the generation of the interactions with an MC generator (e.g., Pythia, Sherpa [5]). The interactions of the generated particles with the ATLAS detector are simulated using a Geant4-based [6] simulation framework [7]. This is performed separately for the hard-scatter interactions of interest and a large number of minimum-bias interactions. Next, the readout of the detector is emulated via a process known as digitisation, which takes into account both the hard-scatter and any overlapping minimum-bias interactions. In this article, two methods of performing the digitisation are compared. The goal of the new method, described below, is to reduce the computing resources required by creating a large set of pile-up events only once for an MC production campaign and then reusing these events for different hard-scatter events. A similar method has been explored by the CMS collaboration [8].
In the first method, referred to as standard pile-up hereafter, the hard-scatter interaction and the desired number of minimum-bias interactions are read in simultaneously during the digitisation step and the energy deposits made by particles are added for each detector element. Then, the detector readout is emulated to convert these into digital signals, which are finally used in the event reconstruction. This method creates the pile-up on demand for each hard-scatter event, and has been used up to now for all ATLAS publications based on pp collisions. In the second (and new) method, referred to as presampled pile-up hereafter, this same procedure is followed but for the set of minimum-bias interactions alone, without the hard-scatter interaction. The resulting presampled events are written out and stored. Then, during the digitisation of a given hard-scatter interaction, a single presampled event is picked and its signal added to that of the hard-scatter interaction for each readout channel. This combined event is then input to the event reconstruction. In contrast to the first method, the same presampled pile-up event can be used for several hard-scatter interactions. For both methods, the μ value to be used is sampled randomly from the data μ distribution, such that the ensemble of many events follows the μ distribution of the data.
If the detector signals were read out without any information loss, the two methods would give identical results. However, in reality, some information loss occurs due to readout thresholds applied or custom compression algorithms designed to reduce the data volume. This can lead to differences in the reconstructed quantities used in physics analyses. While in most cases for ATLAS, these differences were found to be negligible, in some cases, corrections were derived to reduce the impact on physics analyses, as is discussed in Sects. 5-8. Within the ATLAS Collaboration, a significant validation effort took place to ensure that this presampled pile-up simulation chain reproduces the results from the standard pile-up simulation chain accurately, so that there is no impact on physics analyses whether one or the other is used. To this end, thousands of distributions were compared between the presampled and standard pile-up simulation chains. In this article, a representative subset of relevant distributions is shown. Only comparisons between the two methods are shown in this article; detailed comparisons of data with simulation can be found in various performance papers; see, e.g., Refs. [9-14].
The motivation for using the presampled pile-up simulation chain in the future is that it uses significantly less CPU time than the standard pile-up simulation chain. As is discussed in Ref. [15], savings in CPU, memory, and disk space requirements are pivotal for the future running of the ATLAS experiment. Additionally, the presampled pile-up simulation chain can also be seen as a step towards using minimum-bias data, instead of presampled simulated events, for emulating the pile-up, which could potentially improve the accuracy of the modelling of the pile-up interactions. However, the pile-up emulation with data is not yet validated and not the subject of this article.
The article is organised as follows. A description of the ATLAS detector is given in Sect. 2, highlighting the aspects that are most relevant for the pile-up emulation. Section 3 describes both the standard and presampled pile-up simulation chain, and Sect. 4 compares their CPU and memory performances. In Sects. 5-8, the challenges in the inner detector, calorimeters, muon system, and trigger are described and comparisons of the impact of the old and new methods are shown. For these comparisons, a variety of different MC samples are used, based on what is most appropriate for the validation of a given object within the particular detector subsystem.
For all studies presented in this article, unless otherwise stated, the distribution of the average number of events per bunch crossing follows the distribution observed in the ATLAS data in 2017, with an average μ value of 37.8 (see Fig. 1). The ATLAS detector configuration corresponds to that of Run 2. As the detector configuration evolves in the future, the new presampled pile-up method will need to be validated for those new detector elements.

ATLAS detector
The ATLAS detector [16] at the LHC covers nearly the entire solid angle around the collision point. It consists of an inner tracking detector surrounded by a thin superconducting solenoid, electromagnetic and hadronic calorimeters, and a muon spectrometer incorporating three large superconducting toroidal magnets. A two-level trigger system is used to select interesting events [17]. The first-level (L1) trigger is implemented in hardware and uses a subset of detector information to reduce the event rate from 40 MHz to 100 kHz. This is followed by a software-based high-level trigger (HLT) which reduces the event rate to an average of 1 kHz.
At the LHC, typically 2400 bunches from each of the two proton beams cross each other at the ATLAS interaction point per beam revolution, with one bunch crossing (BC) taking place every 25 ns. In each BC, several pp interactions may occur. Whenever an L1 trigger signal is received for a given BC, the entire detector is read out and processed in the HLT to decide whether the event is stored for further analysis.
The inner detector (ID) is immersed in a 2 T axial magnetic field and provides charged-particle tracking in the pseudorapidity range |η| < 2.5. The high-granularity silicon pixel detector (Pixel), including an insertable B-layer (IBL) [18,19] added in 2014 as a new innermost layer, covers the vertex region and typically provides four measurements per track; the first hit normally being in the innermost layer. It is followed by the silicon microstrip tracker (SCT) which usually provides four two-dimensional measurement points per track. These silicon detectors are complemented by a straw tracker (transition radiation tracker, TRT), which enables radially extended track reconstruction with an average of ∼ 30 hits per track up to |η| = 2.0. Additionally, the transition radiation capability provides separation power between electrons and charged pions.
The calorimeter system covers the pseudorapidity range |η| < 4.9. Within the region |η| < 3.2, electromagnetic (EM) calorimetry is provided by barrel (EMB) and endcap (EMEC) high-granularity lead/liquid-argon (LAr) electromagnetic calorimeters, with an additional thin LAr presampler covering |η| < 1.8 to correct for energy loss in material upstream of the calorimeters. Hadronic calorimetry is provided by the steel/scintillator-tile (Tile) calorimeter, segmented into three barrel structures within |η| < 1.7, and two copper/LAr hadronic endcap calorimeters (HEC). The solid angle coverage is completed with forward copper/LAr and tungsten/LAr calorimeter (FCAL) modules optimised for electromagnetic and hadronic measurements, respectively.
The muon spectrometer (MS) comprises separate trigger and high-precision tracking chambers measuring the deflection of muons in a toroidal magnetic field generated by the superconducting air-core magnets. The field integral of the toroids ranges between 2.0 and 6.0 T m across most of the detector. A set of precision chambers covers the region |η| < 2.7 with three stations of monitored drift tubes (MDTs), complemented by cathode strip chambers (CSCs) in the forward region, where the background is highest. The muon trigger system covers the range |η| < 2.4 with resistive plate chambers (RPCs) in the barrel, and thin-gap chambers (TGCs) in the endcap regions.
The integration times of the different subdetectors vary significantly, mostly due to the charge drift times depending on the material and geometry of the respective detector system. In most cases, the integration time exceeds 25 ns, i.e., the time between two BCs. In such cases, the signal from events that occurred in previous BCs contaminates the signal in the triggered BC. This is often referred to as out-of-time pile-up and needs to be considered for the simulation, in addition to the in-time pile-up which accounts for signals generated by interactions occurring inside the BC corresponding to the hard-scatter event. Figure 2 shows the readout windows considered for the simulation of each of the detector systems. The MDTs have the longest integration time, 750 ns, with 32 BCs prior to the trigger and 6 BCs after the trigger being considered. For the LAr calorimeter, it is only slightly shorter. For the inner detector (Pixel, SCT, and TRT), the integration time is much shorter, and only the 1-2 BCs before and after the trigger need to be considered.
[...] the LHC ring, and the y-axis points upwards. Cylindrical coordinates (r, φ) are used in the transverse plane, φ being the azimuthal angle around the z-axis. The pseudorapidity is defined in terms of the polar angle θ as η = −ln tan(θ/2). Angular distance is measured in units of [...]

Overview of simulation chain
As is described above, the ATLAS simulation chain [7], used to produce MC samples to be used in physics and performance studies, is divided into three steps: generation of the event and immediate decays, particle tracking and physics interactions in the detector, based on Geant4 (G4), and digitisation of the energy deposited in the sensitive regions of the detector into voltages and currents to emulate the readout of the ATLAS detector. This simulation chain is integrated into the ATLAS software framework, Athena [20]. Finally, a series of reconstruction algorithms is applied in the same way as for the data, where final physics objects such as jets, muons, and electrons are reconstructed [16]. Each step can be run as an individual task, but to save disk space, the digitisation step is usually performed in the same task as the reconstruction step, such that the intermediate output format from the digitisation step only needs to be stored locally on the computing node and can be discarded after the reconstruction step is finished.
The G4 simulation step is run by itself and, since it is independent of the detector readout configuration, the trigger, and the pile-up, it is often run significantly earlier than the digitisation and reconstruction, which depend on these aspects. The G4 simulation is the most CPU intensive, and thus, it is desirable to run this as rarely as possible.
The ATLAS digitisation software converts the energy deposits (HITS) produced by the G4 simulation in the sensitive elements into detector response objects, known as digits. A digit is produced when the voltage or current of a particular readout channel rises above a preconfigured threshold within a particular time window. Some of the subdetectors read out just the triggered BC, while others read out several bunch crossings, creating digits for each. For each digit, some subdetectors (e.g., SCT) record only the fact that a given threshold has been exceeded, while others (e.g., Pixel or LAr) also retain information related to the amplitude. The digits of each subdetector are written out as Raw Data Objects (RDOs), which contain information about the readout channel identifier and the raw data that are sent from the detector front-end electronics.
For any given hard-scatter interaction, the additional pile-up interactions must be included in a realistic model of the detector response. For this purpose, minimum-bias events are generated using the Pythia event generator with the NNPDF2.3LO [21] parton distribution function and the A3 [22] set of tuned parameters, and then simulated and stored in separate files. To avoid potential issues due to low statistics of relatively hard minimum-bias events, containing objects with high transverse momentum (pT), the sampled events are in fact split into two distinct equal-sized samples: a high-pT sample composed of events containing jets or photons with pT > 35 GeV and a low-pT sample composed of the remaining events. Events are then selected from each sample based on their relative cross-section to avoid duplicating distinctive hard events, which may give rise to visible features during analysis.
In the current standard pile-up simulation chain, the simulation files of both the hard-scatter event and the desired number of minimum-bias events are read in concurrently at the digitisation step and the HITS are combined. For each hard-scatter event, a value of μ is assigned by randomly sampling the μ distribution corresponding to the relevant data-taking period. Most subdetector responses are affected by interactions from neighbouring bunch crossings: as is shown in Fig. 2, up to 32 BCs before and 6 BCs after the triggering BC may contribute signal to the trigger BC. For the average μ value of 37.8 during 2017 data taking, this implies that simulating the impact of pile-up on any given hard-scatter event requires approximately (32 + 1 + 6) × 38 = 1482 minimum-bias events on average to be selected at random (from the simulated event files) and processed as part of the digitisation step. Each of these bunch crossings is taken to have the same value of μ as the trigger bunch crossing. The number of minimum-bias events (N) to include for each bunch crossing is drawn at random from a Poisson distribution with a mean of the μ value for that bunch crossing. After the energy deposits in the trigger BC due to all contributing BCs have been combined, the detector response is emulated. This workflow is illustrated in Fig. 3.
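The sampling described above can be sketched in a few lines of Python. This is only an illustrative sketch and not the Athena implementation: the function names, the three-bin toy μ profile, and the use of Knuth's Poisson method are all assumptions made here; only the window of 32 + 1 + 6 bunch crossings and the Poisson draw per bunch crossing come from the text.

```python
import math
import random

N_BC_BEFORE = 32  # BCs before the trigger BC (longest window quoted in the text)
N_BC_AFTER = 6    # BCs after the trigger BC


def sample_poisson(rng, mean):
    """Draw one Poisson-distributed integer (Knuth's method, adequate for mean << 700)."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def sample_mu(rng, mu_values, weights):
    """Pick mu for one hard-scatter event from a binned mu profile."""
    return rng.choices(mu_values, weights=weights, k=1)[0]


def minbias_counts_per_bc(rng, mu):
    """Minimum-bias multiplicity for every BC in the window; all BCs share the same mu."""
    n_bc = N_BC_BEFORE + 1 + N_BC_AFTER
    return [sample_poisson(rng, mu) for _ in range(n_bc)]


def expected_minbias_events(mu):
    """Average number of minimum-bias events needed per hard-scatter event."""
    return (N_BC_BEFORE + 1 + N_BC_AFTER) * mu


rng = random.Random(2017)
# Hypothetical three-bin profile, NOT the measured ATLAS mu distribution.
toy_mu_values, toy_weights = [20, 38, 60], [0.25, 0.5, 0.25]
mu = sample_mu(rng, toy_mu_values, toy_weights)
counts = minbias_counts_per_bc(rng, mu)
print(mu, len(counts), sum(counts))
print(expected_minbias_events(38))  # 39 BCs x 38 = 1482, as quoted in the text
```

In the presampled chain the same sampling is done once per presampled event rather than once per hard-scatter event, which is where the CPU saving originates.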
The new presampled pile-up simulation chain is illustrated in Fig. 4. Rather than digitising the minimum-bias interactions each time a hard-scatter event is produced, a large sample of pile-up events is produced by pre-combining the simulated pile-up interactions, according to the μ distribution of the data campaign, during a separate digitisation step, termed presampling. Here, the sampling is done exactly as for the standard pile-up, the only difference being that there is no hard-scatter event. These presampled pile-up events are written out in RDO format as pile-up RDO datasets and typically contain several million events. Each simulated hard-scatter interaction is then digitised and combined with an event sampled from these pile-up datasets (step 3 in Fig. 4, called overlay). Here, instead of HITS for each channel, the signals of the RDO or digit (depending on the subdetector) in the hard-scatter event and the presampled event are overlaid. To avoid double-counting, instrumental noise associated with the detector electronics is included solely in the presampled RDOs and not included in the subsequent hard-scatter digitisation. Since the digitisation, presampling, and reconstruction steps are typically combined into a single task in the production workflow, the output is written locally to an RDO file that is then input to the reconstruction software; this local RDO file is subsequently discarded. The pile-up RDO datasets necessary for a given digitisation task are about five times smaller than the many minimum-bias HITS required in the standard pile-up simulation chain.
The main benefit of the presampled pile-up simulation chain is that the CPU and I/O requirements of the digitisation are significantly lower and have a much smaller dependence on μ, as is discussed in Sect. 4. However, if a threshold or compression has been applied to the signal when writing the RDO/digit, this results in some loss of information and thereby could reduce the accuracy of the simulation when using the presampled pile-up method, as is discussed in Sects. 5-8. For all the comparisons shown in these sections, the hard-scatter events are identical for the two methods, but the pile-up events are different. This makes the estimation of the uncertainties difficult as the hard-scatter is fully correlated, while the pile-up is not. As most of the quantities are selected to be sensitive to pile-up, the uncertainties are calculated assuming the two samples are uncorrelated, but in some distributions, this leads to an overestimate of the uncertainties, e.g., in the reconstruction efficiencies of tracks and leptons and in the trigger efficiencies.

Computing performance comparison
In this section, the performances of the two simulation chains are compared in terms of CPU time, memory usage, and I/O. The validation in terms of physics performance is presented in subsequent sections.
The main computing performance benefit of the presampled pile-up simulation chain stems from the fact that a pile-up dataset is only created once per MC production campaign, and then, the individual events within that dataset are used for multiple hard-scatter MC samples, as opposed to being created on demand independently for each MC sample. An MC production campaign happens typically once per data-taking period and comprises billions (B) of hard-scatter events and thousands of individual samples. A sample is defined as a set of MC events generated using the same input parameters, e.g., a sample of tt̄ events produced by a certain MC generator with a given set of input parameters. The same presampled pile-up event can thus be overlaid on many different hard-scatter events from different MC samples. In doing so, care needs to be taken to ensure that no undesirable effects on physics analyses occur due to reusing the same pile-up events, as is discussed below.

Fig. 4
The presampled pile-up workflow schema. The oval steps represent an action, while the boxes represent data files of a given format. The final box is the reconstructed data in analysis format

Comput Softw Big Sci (2022) 6:3
In ATLAS, typically 70% of the CPU resources are devoted to MC production via the simulation chain; the remainder is used for data processing and user analyses. At present, with the Run 2 pile-up profile, the simulation chain CPU usage is broken down into about 15% for event generation, 45% for G4 simulation, 20% for digitisation, and 20% for other tasks (reconstruction, trigger, and event writing). The presampled pile-up scheme decreases the digitisation time to a negligible level by reusing the pile-up events more efficiently and thus reduces the overall CPU resources required for MC production by about 20%, as is discussed below.
The average CPU time per event in the standard and presampled pile-up simulation chains as a function of μ is shown in Fig. 5. As can be seen, both depend linearly on μ, but the slope is about 50 times larger for the standard pile-up than for the presampled pile-up simulation chain. For the standard pile-up simulation chain, the CPU time required at μ = 70 is 7.5 times larger than for μ = 10, while for the presampled pile-up method, the corresponding increase in CPU time is only a factor of 1.2. Extrapolating this to μ = 200, the CPU time is 20 times greater than for μ = 10 for the standard method and < 2 times higher for the presampled pile-up method. However, this comparison does not account for the CPU time required for the production of the presampled pile-up dataset, which is needed to assess the overall CPU benefit in a realistic campaign, as is discussed below.
Figure 6 shows the memory used as a function of time by the different production steps of the two simulation chains. The time estimate is based on running 2000 hard-scatter events for the 2017 μ distribution on the same CPU in all cases, so that the three scenarios can be directly compared. The absolute number, of course, depends on the CPU used and the μ distribution. The presampling takes about 7 s per event. The standard digitisation takes about 8 s per event, while the hard-scatter digitisation and overlay of the presampled pile-up take about 0.5 s. The remaining steps, which are the same for the two simulation chains, take about 8 s and include the trigger emulation, reconstruction, and the writing of the analysis format to disk.
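When comparing the required CPU time between the two chains, the following equations provide a good approximation. For the standard pile-up simulation chain, the time $T_{\mathrm{standard}}$ required is simply given by the number of events in the campaign times the total time $t_{\mathrm{digi}} + t_{\mathrm{other}}$, where $t_{\mathrm{other}}$ is the sum of the times needed for reconstruction, trigger, and writing the event to disk. Thus
$$T_{\mathrm{standard}} = N_{\text{MC-campaign}} \times \left(t_{\mathrm{digi}} + t_{\mathrm{other}}\right),$$
where $N_{\text{MC-campaign}}$ is the number of hard-scatter events produced in a given MC-campaign.
For the presampled pile-up simulation chain, the time $T_{\mathrm{presample}}$ required is given by the number of events in the campaign times the time needed for the overlay step and other aspects plus the time required for the presampling. This last contribution is given by the total number of presampled pile-up events required ($N_{\mathrm{pp}}$) multiplied by the event digitisation time, so that the required time is
$$T_{\mathrm{presample}} = N_{\text{MC-campaign}} \times \left(t_{\mathrm{overlay}} + t_{\mathrm{other}}\right) + N_{\mathrm{pp}} \times t_{\mathrm{digi}}.$$
The time reduction factor of the presampled pile-up simulation chain compared to the standard is then given by
$$\frac{T_{\mathrm{presample}}}{T_{\mathrm{standard}}} = \frac{N_{\text{MC-campaign}} \left(t_{\mathrm{overlay}} + t_{\mathrm{other}}\right) + N_{\mathrm{pp}}\, t_{\mathrm{digi}}}{N_{\text{MC-campaign}} \left(t_{\mathrm{digi}} + t_{\mathrm{other}}\right)} \approx \frac{t_{\mathrm{other}} + t_{\mathrm{digi}}\, N_{\mathrm{pp}} / N_{\text{MC-campaign}}}{t_{\mathrm{other}} + t_{\mathrm{digi}}},$$
where the approximation $t_{\mathrm{overlay}} \ll t_{\mathrm{other}}$ is made, based on the observations from Fig. 6.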
It is immediately clear that the presampled pile-up simulation chain uses less CPU time than the standard pile-up simulation chain, since $N_{\mathrm{pp}} < N_{\text{MC-campaign}}$. Choosing the exact value for $N_{\mathrm{pp}}$, however, is not trivial. In general, the reuse of a given presampled pile-up event within a particular MC sample, representing an individual hard-scatter physics process, should be avoided if possible; otherwise, each overlaid hard-scatter plus pile-up event would not be statistically independent. Such oversampling would be particularly worrisome if the presampled pile-up event in question contained a distinctive feature, such as a high-transverse-momentum jet, which could cause difficulties in using the MC sample for the statistical interpretation of the data distributions. In practice, such a repetition would not be statistically significant in the bulk of a distribution, but could be problematic in the tails, where there are few events. Given this, it is reasonable that the value for $N_{\mathrm{pp}}$ be chosen to be about the size of the largest individual MC sample, so that no event is repeated within it.
For the ATLAS Run 2 MC-campaign, $N_{\text{MC-campaign}}$ ∼ 10 B and the single largest individual MC sample had a size of 0.2 B events. Allowing for some increase in these sizes to be commensurate with the size of the evolving data samples, $N_{\mathrm{pp}}$ ∼ 0.5 B should thus be sufficient. Taking the resulting $N_{\text{MC-campaign}}/N_{\mathrm{pp}}$ ∼ 20, along with $t_{\mathrm{other}} \approx t_{\mathrm{digi}}$ (as seen in Fig. 6), the ratio of the times required for the two methods is $T_{\mathrm{presample}}/T_{\mathrm{standard}}$ ∼ 0.53. Hence, the presampled pile-up simulation chain provides a CPU saving of 47% for the combined digitisation, reconstruction, trigger, and writing steps, compared to the standard pile-up simulation chain. If the time required for reconstruction and trigger is further improved (as is planned for Run 3), or the digitisation time were to further increase due to pile-up, the ratio would decrease; e.g., if $t_{\mathrm{other}} \approx t_{\mathrm{digi}}/2$, a CPU saving of 63% would be realised. The ∼ 50% reduction translates to an overall MC production CPU saving of around 20%, since 60% of the CPU time required for the simulation chain is at present used for event generation and G4 simulation, which is not affected by this improvement. These are illustrative examples that confirm the intuitive expectation that performing the digitisation just once per campaign is much more effective than doing it for each simulated hard-scatter event, as the number of presampled events needed is by construction smaller than the number of hard-scatter events.
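The arithmetic behind these figures can be reproduced with a short sketch. The function and variable names are chosen here for illustration; the formula is the CPU-time model described in the text (campaign size times per-event time, plus the one-off presampling cost), and the inputs are the approximate values quoted above (10 B hard-scatter events, 0.5 B presampled events, about 8 s each for digitisation and for the remaining steps).

```python
def time_ratio(n_campaign, n_pp, t_digi, t_other, t_overlay=0.0):
    """Ratio T_presample / T_standard for the CPU-time model described in the text."""
    t_standard = n_campaign * (t_digi + t_other)
    t_presample = n_campaign * (t_overlay + t_other) + n_pp * t_digi
    return t_presample / t_standard


N_CAMPAIGN = 10e9  # hard-scatter events in the campaign (~10 B)
N_PP = 0.5e9       # presampled pile-up events (~0.5 B)

# Case 1: t_other ~ t_digi, overlay time neglected.
ratio_equal = time_ratio(N_CAMPAIGN, N_PP, t_digi=8.0, t_other=8.0)
# Case 2: t_other ~ t_digi / 2.
ratio_half = time_ratio(N_CAMPAIGN, N_PP, t_digi=8.0, t_other=4.0)

# Digitisation plus the other steps are ~40% of the simulation-chain CPU
# (the remaining ~60% is event generation and G4 simulation).
overall_saving = 0.40 * (1.0 - ratio_equal)

print(f"ratio (t_other = t_digi):     {ratio_equal:.3f}")
print(f"ratio (t_other = t_digi / 2): {ratio_half:.3f}")
print(f"overall MC production saving: {overall_saving:.2f}")
```

Passing `t_overlay=0.5` instead of neglecting the overlay time raises the first ratio only slightly, which illustrates why the approximation made in the text is reasonable.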
From the memory usage point of view, the presampled pile-up load is similar to the standard pile-up and well below the (soft) production limit of ∼ 2 GB per core (see Fig. 6) for the μ values observed during Run 2 and expected for Run 3. However, compared to the standard pile-up, the presampled pile-up simulation chain puts less stress on the I/O system both because, as is mentioned above, the presampled pile-up dataset files are about a factor of five smaller and because they can be read sequentially. The sequential reading is possible, because the random access necessary to combine the minimum-bias input files in the standard pile-up is now performed only once at the presampling stage. Hence, the presampled pile-up RDO production, with its heavier requirements, can be performed on a limited subset of ATLAS MC production sites designed to cope well with such workloads; the subsequent presampled pile-up simulation chain will then run on all resources available to ATLAS, utilising the ≈ 20% of sites that have previously been excluded for reconstruction due to insufficient I/O or disk resources. The reduced input size also enables the usage of opportunistic resources such as high-performance computing (HPC) sites, which typically have less storage available to ATLAS. The smaller I/O requirements from the presampled pile-up simulation chain jobs simplify the production workflow, and make it possible to transfer the pile-up datasets on demand to the computing node at a given production site, where they are needed. If network speed is further increased in the future, it might even be possible to access them directly via the network during the job from a remote storage site.
The Analysis Object Data (AOD) event size written to disk is the same for both methods, i.e., there is neither advantage nor disadvantage in using the presampled pile-up simulation chain in this regard. However, the many simulated minimum-bias events do not have to be distributed as widely any more throughout the year as they only need to be accessed once for creating the presampled events. These presampled events need to be made available widely though. It is expected that these two effects roughly cancel out, but operational experience is needed to understand how to distribute the presampled sample in the most effective way.

Inner detector
The ID consists of three subdetectors which all use different technologies as discussed in Sect. 2. Each of them has separate digitisation software and hence a different treatment for the presampled pile-up procedure is required for each. In this section, the readout of the three ID subdetectors is described, along with the presampled pile-up procedure for each. Validation results are also presented.

Detector readout

Silicon pixel detector
The charge produced by a particle traversing a silicon pixel is integrated if it passes a set threshold. In Run 2, this threshold is typically around 2500 electrons for the IBL and 3500 electrons for the remainder of the Pixel detector. The resulting charge deposited by a minimum-ionising particle (MIP) that traverses a single pixel is typically 16,000 and 20,000 electrons, respectively. The amount of charge deposited by a particle traversing the detector varies depending on the path length of the particle through the active silicon and can be spread across multiple pixels. The length of time during which the charge signal exceeds the threshold, termed time-over-threshold (ToT), is recorded. The ToT is roughly proportional to the charge. While most of the charge drifts to the pixel readout within the 25 ns bunch crossing time of the LHC, there is a small fraction which may take longer and only arrive in the subsequent bunch crossing (BC+1). Thus, in any given bunch crossing, the pile-up events both from the previous and the current bunch crossings contribute hits.

Silicon microstrip detector (SCT)
For the SCT, the readout is in principle similar to the Pixel detector in that a threshold is applied for each strip. However, in contrast to the pixel readout, it is purely digital, i.e., neither the charge nor the ToT is stored for a given strip, just a bit, X = 0 or 1, to signal a hit (1) or the absence of a hit (0). Hence, the hit from the current BC as well as that of the two adjacent bunch crossings (i.e., BC-1 and BC+1) are read out. Several data compression modes have been used since the first LHC collisions; they are defined by the hit pattern of the three time bins:
• Any-hit mode (1XX, X1X or XX1); channels with a signal above threshold in either the current, previous, or next bunch crossing are read out.
• Level mode (X1X); only channels with a signal above threshold in the current bunch crossing are read out.
• Edge mode (01X); only channels with a signal above threshold in the current bunch crossing and explicitly no hit in the preceding bunch crossing are read out.
The data can be compressed further by storing, for adjacent strips with hits above threshold, only the address of the first strip and the number of these adjacent strips. When this compression is invoked, the information about which of the three bunch crossings observed a hit for a given strip is lost. When the LHC is running with 25 ns bunch spacing, SCT RDOs are required to satisfy the 01X hit pattern to be considered during event reconstruction to suppress pile-up from the previous crossings.
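The three compression modes listed above can be sketched as a simple predicate on the three time bins. This is only an illustration: the function name `passes_mode` and the tuple ordering (previous, current, next bunch crossing) are assumptions made here, not part of the ATLAS software.

```python
def passes_mode(pattern, mode):
    """Check a three-bin SCT hit pattern against a compression mode.

    pattern: (previous BC, current BC, next BC), each entry 0 or 1
             (ordering is an assumption made for this sketch).
    mode:    "any" (1XX, X1X or XX1), "level" (X1X) or "edge" (01X).
    """
    prev_bc, curr_bc, next_bc = pattern
    if mode == "any":
        return bool(prev_bc or curr_bc or next_bc)
    if mode == "level":
        return curr_bc == 1
    if mode == "edge":
        return prev_bc == 0 and curr_bc == 1
    raise ValueError(f"unknown mode: {mode}")
```

For example, the pattern (1, 1, 0) is accepted in any-hit and level mode but rejected in edge mode, which is how the 01X requirement suppresses hits from the previous crossing.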

Transition radiation tracker (TRT)
When a particle crosses one of the tubes in the TRT, the electrons drift to the anode wire, producing an electrical signal. If the charge of that signal exceeds a low discriminator threshold, a corresponding hit is recorded, in eight time slices of 3.125 ns each. The drift time is calculated based on the time of the first hit, which is subsequently converted to distance to give a drift-circle radius. In addition, to provide information for electron identification, a record is kept of whether a high discriminator threshold is exceeded in any of the eight time slices. This information is stored for the previous, current, and subsequent bunch crossings (i.e., BC-1, BC, BC+1).
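As a small illustration of the timing granularity described above, eight slices of 3.125 ns span exactly one 25 ns bunch crossing. The sketch below finds the time of the first low-threshold bit within one bunch crossing; the list-of-bits representation and the function name are assumptions made for illustration, and the conversion from drift time to drift-circle radius is not shown.

```python
TIME_SLICE_NS = 3.125  # width of one TRT time slice (from the text)
SLICES_PER_BC = 8      # time slices per bunch crossing (from the text)


def leading_edge_time_ns(low_threshold_bits):
    """Return the start time (ns) of the first set bit, or None if no bit is set."""
    for index, bit in enumerate(low_threshold_bits):
        if bit:
            return index * TIME_SLICE_NS
    return None
```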

Overlay procedure
The quantities which are overlaid for the inner detector are the RDOs. Due to the high number of channels in the inner detector, zero suppression 5 is employed to reduce the amount of data read out and stored from the detector. Since for the ID, the RDOs do not contain the full information of the HITS created by simulation, the overlay of RDO information is less accurate than the overlay of the underlying HITS information. However, the impact on physics observables is generally found to be negligible as is described in the following; where a difference is observed, a parameterised correction is derived as is described below.

Pixel detector
The pixel detector has in excess of 90 M readout channels and a very high granularity. The single-pixel occupancy is below 2.5 × 10⁻⁵ per unit μ in all layers [24], so even at μ ∼ 100, it is below 0.25%. Therefore, the chance that a single pixel which contains a signal due to a charged particle from the hard-scatter event also contains one from the overlapping in-time pile-up events is < 0.25%. A pixel RDO contains the channel identifier and a 32-bit packed word containing the ToT, a bunch-crossing identifier, and information related to the L1 trigger not relevant in simulation. In the presampled pile-up, if an RDO of a given channel contains a hit above threshold from either the hard-scatter event or the pile-up event, but not both, the corresponding RDO is kept and written out. In the < 0.25% of cases where it contains a hit above threshold in both the hard-scatter event and the pile-up event, only the hard-scatter RDO is kept to retain the ToT (and thus, for example, the energy deposited per path length dE/dx) from the signal process. This causes a small loss of information as in principle the ToT would be modified by the presence of the additional charge deposited in that pixel from the pile-up events. However, as it only affects a small fraction of cases, it has a negligible impact on the overall physics performance. In addition, there could be a loss of information if, for a given pixel, both the hard-scatter event and the pile-up event produce charge deposits which are below the readout threshold but whose sum is above the threshold. In this case, the presampled pile-up method will register no hit, while the standard method will register a hit above threshold. This effect could reduce the cluster size and the ToT. But again, only a very small fraction of pixels are affected, so both the cluster size and the ToT agree well between the two methods.
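The pixel merging rule and the occupancy estimate can be sketched as follows. The dictionaries mapping a channel identifier to a ToT value, and both function names, are assumptions made for this illustration; the real RDO is a packed 32-bit word.

```python
OCCUPANCY_PER_MU = 2.5e-5  # upper bound on single-pixel occupancy per unit mu (from the text)


def max_overlap_probability(mu):
    """Upper bound on the chance that a hard-scatter pixel also has a pile-up hit."""
    return OCCUPANCY_PER_MU * mu


def overlay_pixel_rdos(hard_scatter, pileup):
    """Merge {channel_id: tot} maps; on a collision the hard-scatter ToT is kept."""
    merged = dict(pileup)
    merged.update(hard_scatter)  # hard-scatter entries overwrite pile-up entries
    return merged
```

With μ = 100 the bound evaluates to 0.0025, i.e. the 0.25% quoted in the text.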

SCT detector
The SCT is a strip detector with 6.3 M readout channels and an occupancy in high pile-up conditions of O(1%); consequently, the pile-up modelling is more critical than for the pixel detector. To facilitate accurate modelling, it is important that presampled RDOs be stored in any-hit mode, without further compression, to ensure that the impact of out-of-time pile-up is modelled correctly. To combine hard-scatter and pile-up RDOs, all of the strips that are hit on a module are unpacked from the respective RDOs and repacked into RDOs using the desired compression mode. Loss of information only occurs if hits in both the hard-scatter event and the pile-up event are below threshold, but the sum of the two charges is above threshold. In this case, in the standard digitisation a hit would be present, while with the presampled pile-up procedure, it is not, causing the presampled pile-up procedure potentially to result in fewer SCT hits per track. The impact is, however, negligible as is shown below.
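A minimal sketch of the unpack, merge and repack sequence is given below. Several details are assumptions made for illustration only: each strip carries a 3-bit integer (previous, current, next bunch crossing from most to least significant bit), overlapping strips are merged with a bitwise OR of their time bins, and the repacked output is a list of (first strip, number of adjacent strips) pairs.

```python
PREV_BC, CURR_BC, NEXT_BC = 0b100, 0b010, 0b001  # assumed bit layout


def merge_sct_strips(hard_scatter, pileup):
    """OR the time-bin bits of {strip: bits} maps from both events."""
    merged = dict(hard_scatter)
    for strip, bits in pileup.items():
        merged[strip] = merged.get(strip, 0) | bits
    return merged


def select_strips(merged, mode):
    """Return sorted strips passing 'any' (any bin set) or 'edge' (01X)."""
    if mode == "any":
        return sorted(s for s, b in merged.items() if b != 0)
    if mode == "edge":
        return sorted(s for s, b in merged.items()
                      if (b & (PREV_BC | CURR_BC)) == CURR_BC)
    raise ValueError(f"unknown mode: {mode}")


def repack_clusters(strips):
    """Group sorted adjacent strips into (first_strip, n_strips) pairs."""
    clusters = []
    for strip in strips:
        if clusters and strip == clusters[-1][0] + clusters[-1][1]:
            clusters[-1] = (clusters[-1][0], clusters[-1][1] + 1)
        else:
            clusters.append((strip, 1))
    return clusters
```

In this sketch a pile-up hit in the previous bunch crossing can veto a hard-scatter strip in edge mode, which is only possible because the presampled strips keep all three time bins.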

TRT detector
The TRT is a straw tube detector with 320 k readout channels, and in high pile-up conditions, the occupancy of the TRT exceeds 10%. Therefore, pile-up has a major impact on the TRT signals. If the channel identifiers in the hard-scatter and pile-up events are the same, the data word stored is set to a bitwise logical OR of the corresponding raw words. This results in some loss of information as the sum of the charge signals will be larger, and thus more easily pass a given threshold, than would be just the sum of the digitised signals. This particularly impacts the fraction of hits that pass the high discriminator threshold.
A correction for this effect is applied to improve the level of agreement between the presampled pile-up and the standard digitisation. For this correction, a high-threshold (HT) bit is activated according to a randomised procedure, tuned to describe the standard digitisation. The rate of randomly activating a high-threshold bit is parameterised as a linear function of the occupancy of the TRT in the simulated pile-up events (a proxy for the average energy deposited in the pile-up events) and depends on whether the charged particle from the hard-scatter event that is traversing the straw is an electron or not. A different correction is applied for electrons as they produce significant amounts of transition radiation in the momentum range relevant for physics analysis (5-140 GeV), while all other particles do not. The correction corresponds to approximately a 10% (5%) increase in the number of HT hits for electrons (non-electrons) at the average Run 2 μ value.
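The TRT merging and the randomised high-threshold correction can be sketched as below. All numerical values are placeholders: the bit position `HT_BIT` and the intercept and slope values in `HT_PARAMS` are hypothetical and are not the tuned ATLAS parameters; only the bitwise OR and the linear dependence on occupancy are taken from the text.

```python
import random

HT_BIT = 1 << 26  # hypothetical position of the high-threshold bit

# Hypothetical (intercept, slope) of the activation probability versus
# TRT occupancy, keyed by whether the hard-scatter particle is an electron.
HT_PARAMS = {True: (0.02, 0.10), False: (0.01, 0.05)}


def ht_activation_probability(occupancy, is_electron):
    """Linear function of occupancy, clamped to the range [0, 1]."""
    intercept, slope = HT_PARAMS[is_electron]
    return min(1.0, max(0.0, intercept + slope * occupancy))


def overlay_trt_word(hs_word, pu_word, occupancy, is_electron, rng):
    """Bitwise OR of the raw words, then a randomised HT-bit activation."""
    word = hs_word | pu_word
    if not word & HT_BIT:
        if rng.random() < ht_activation_probability(occupancy, is_electron):
            word |= HT_BIT
    return word
```

A call such as `overlay_trt_word(hs, pu, 0.3, True, random.Random(42))` uses a seeded generator so that the randomised step is reproducible.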

Validation results
To validate the presampled pile-up digitisation for each of the subdetectors, the properties of tracks in simulated tt̄ events, where at least one W boson from the top quarks decays leptonically, are compared between the presampled pile-up method and the standard digitisation. The tt̄ events are chosen because they represent a busy detector environment and contain tracks from a wide range of physics objects.
The primary track reconstruction is performed using an iterative track-finding procedure seeded from combinations of silicon detector measurements. The track candidates must have a transverse momentum p_T > 500 MeV and |η| < 2.5 and meet the following criteria: a minimum of seven pixel and SCT clusters, a maximum of either one pixel or two SCT clusters shared among more than one track, and no more than two holes 6 in the SCT and pixel detectors combined. The tracks formed from the silicon detector measurements are then extended into the TRT detector. Full details, including a description of the TRT track extensions, can be found in Refs. [25,26].

Figure 7 shows the number of pixel clusters associated with a muon track as a function of μ, and the unbiased residual in the local x coordinate, which corresponds to the direction with the highest measurement precision. The unbiased residual is the distance of the cluster from the track trajectory (not including the cluster itself) at the point where that trajectory crosses the pixel sensor. Figure 8 shows the corresponding quantities for the SCT. In all cases, the presampled pile-up and standard digitisation are shown, and good agreement is observed between the two methods.

Figure 9 shows a comparison of the number of high-threshold TRT drift circles as a function of μ for muons 7 and electrons. As is explained above, due to the high occupancy of the detector, the number of high-threshold drift circles is particularly sensitive to the presampled pile-up procedure. After the parameterised corrections discussed in Sect. 5.2 are applied, the average numbers of high-threshold drift circles for electrons and muons are each comparable for the two methods.
The resolution of all track parameters was examined for both methods, and they were found to agree well. Figure 10 shows the difference between the reconstructed and true values for the impact parameter of the track relative to the primary vertex (d_0), measured in the transverse plane, and the track curvature (q/p_T^track) for muons in tt̄ events. Finally, the track reconstruction efficiency is shown in Fig. 11 as a function of the p_T and η of all tracks identified in tt̄ events. The level of agreement between the two methods is better than 0.5%.

For the Tile calorimeter [29], each cell is read out by two photomultiplier channels. The maximum height of the analogue pulse in a channel is proportional to the amount of energy deposited by the incident particle in the corresponding cell. The shaped signals are sampled and digitised by 10-bit ADCs at a frequency of 40 MHz. The sampled data are temporarily stored in a pipeline memory until an L1 trigger signal is received. Seven time samples, centred around the pulse peak, are obtained. A gain selector is used to determine which gain information is sent to the back-end electronics for event processing. By default the high-gain signal is used, unless any of the seven time samples saturates the ADC, at which point the low-gain signal is transmitted.

Overlay procedure
The procedure for the LAr calorimeter is described in detail below; a very similar procedure is used for the Tile calorimeter.
In the presampled RDO sample, the pulse shape (ADC data vs time sample) is stored over the time period for which the calorimeter is read out for each calorimeter cell without any zero suppression. Its computation is based on the standard pile-up simulation, described in more detail in Ref. [30]. It considers the energy deposited in each cell for each bunch crossing over the time window affecting the triggered BC, taking into account the time of each event relative to the trigger time. The resulting pulse shape, expressed in energy versus time, is then converted to ADC counts, applying the energy-to-ADC calibration factor per cell and adding the ADC pedestal. The gain used in the readout electronics for this conversion is selected by emulating the logic applied in the front-end readout electronics. The electronics noise is then added to the presampled RDO, with the proper correlation of the noise between the different samples, with a value that depends on the gain used to digitise the pulse.
In the presampled pile-up step, the pulse shape of the presampled event is converted back into energy and then the energy from the hard-scatter event is added. This is done for each time sample, resulting in a combined pulse shape of the hard-scatter and presampled pile-up events. From this summed pulse shape, the energies in each time sample are then converted back to ADC counts to produce a pulse shape mimicking the output of the front-end electronics. The read-out electronics gain used in this conversion is selected according to the energies of the summed pulse shape. If this gain differs from the ones used in the hard-scatter or presampled samples, the electronics noise is corrected accordingly.
This pulse shape is then processed following exactly the same algorithm as used in the standard pile-up digitisation, applying the optimal filtering coefficients [31] to estimate the energy per cell [30]. For cells with high enough energy, the time and pulse quality factors are also computed.
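The conversion and summation of the pulse shapes that precede the optimal filtering can be sketched for a single cell as below. This is a simplified illustration under stated assumptions: a single fixed gain, a single calibration factor `mev_per_adc` and pedestal per cell (both hypothetical values in the usage note), and no noise correction or gain switching.

```python
def adc_to_energy(adc_samples, pedestal, mev_per_adc):
    """Subtract the pedestal and apply the ADC-to-energy calibration factor."""
    return [(adc - pedestal) * mev_per_adc for adc in adc_samples]


def energy_to_adc(energy_samples, pedestal, mev_per_adc):
    """Convert energies back to integer ADC counts, re-adding the pedestal."""
    return [round(energy / mev_per_adc + pedestal) for energy in energy_samples]


def overlay_cell(presampled_adc, hard_scatter_energy, pedestal, mev_per_adc):
    """Sum pile-up and hard-scatter pulse shapes sample by sample."""
    pileup_energy = adc_to_energy(presampled_adc, pedestal, mev_per_adc)
    summed = [p + h for p, h in zip(pileup_energy, hard_scatter_energy)]
    return energy_to_adc(summed, pedestal, mev_per_adc)
```

For instance, `overlay_cell([1000, 1010, 1020, 1005], [0.0, 50.0, 100.0, 25.0], pedestal=1000, mev_per_adc=5.0)` gives `[1000, 1020, 1040, 1010]`. The `round` call in `energy_to_adc` is the only place where information is lost in this sketch.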
Since all cells are stored in the presampled RDO sample without any suppression, and the energy response is perfectly linear in the digitisation, the presampled pile-up does not rely on any approximations except for the integer rounding that is applied when storing ADC counts in the presampled sample. In practice, the impact of ADC integer rounding was found to be almost negligible. This rounding effect only applies to the LAr case; Tile ADC data are actually stored as floats in the presampled RDO sample.

Figure 12a shows a comparison of the total energy deposited in the EMB calorimeter by dijet events for the presampled pile-up and standard digitisation methods. This distribution is sensitive to electronics and pile-up noise and shows that the simulation of the noise in the two methods is similar. Figure 12b shows the distribution of a calorimeter isolation quantity E_T^cone20/E_T for simulated single-electron events. This variable is calculated from topological clusters [32] of energy deposits by summing the transverse energies of such clusters within a cone of size ΔR = 0.2 around (but not including) the candidate electron cluster. It is sensitive to pile-up energy deposits close to the signal electrons and is again similar for the two methods. Figure 12c shows the invariant mass distribution of electron-positron pairs from simulated Z → e⁺e⁻ events. This comparison shows that the energy scale and resolution of electrons from signal events agree for the two methods.

Figure 13 shows the jet response in tt̄ MC events. The jet p_T is calibrated using a multi-stage procedure [33] that accounts for several effects, including pile-up. The pile-up correction is performed at an early stage of the calibration procedure and removes excess energy due to both in-time and out-of-time pile-up. It is therefore sensitive to the details of the pile-up emulation.
The shape of the distribution (which is sensitive to noise modelling) and the average response versus η over the full calorimeter acceptance are in good agreement for the two methods. Also shown in Fig. 13 is the distribution of missing transverse momentum E_T^miss for events in the same tt̄ sample. The soft term component, as reconstructed in the calorimeter, which is particularly sensitive to pile-up [34], is shown as well. Again, good agreement is observed for the two methods.

Muon spectrometer
The MS consists of four subdetectors: two providing high-precision tracking measurements and two primarily providing trigger information. The technologies used in these are different and, as with the ID, they require specific digitisation treatments for the presampled pile-up. The main difference in the case of the MS compared to the ID is that the occupancy is much lower. This means that, while there is the potential for loss of information in the presampled pile-up method if two sub-threshold hits occur in the same detector channel, the probability of this occurring is much lower and the resulting effect is found to be negligible.

Detector readout and overlay procedure

Monitored drift tubes (MDT)
The MDTs consist of layers of drift tubes which are designed to have a position resolution below 80 µm per tube. If a particle traverses a drift tube, ionisation is created and electrons drift to the anode wire. If the charge at that wire exceeds a set threshold, the charge and the time are recorded, and both are converted to digital information. For the presampled pile-up, the digital signals from the hard-scatter and pile-up events are combined as follows. If a signal in a given tube is only present in either the hard-scatter event or the pile-up event, that signal is copied to the output RDO. If a signal is present in both, then the two signal amplitudes are added, and the timing is taken to be the earlier of the two events.
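The MDT combination rule can be sketched per tube as follows. The `MdtSignal` type, its field names and the use of `None` for an absent signal are assumptions made for this illustration.

```python
from typing import NamedTuple, Optional


class MdtSignal(NamedTuple):
    amplitude: int  # digitised charge (arbitrary ADC units in this sketch)
    time: int       # digitised time (arbitrary TDC units in this sketch)


def merge_mdt(hard_scatter: Optional[MdtSignal],
              pileup: Optional[MdtSignal]) -> Optional[MdtSignal]:
    """Copy a lone signal; otherwise add amplitudes and keep the earlier time."""
    if hard_scatter is None:
        return pileup
    if pileup is None:
        return hard_scatter
    return MdtSignal(hard_scatter.amplitude + pileup.amplitude,
                     min(hard_scatter.time, pileup.time))
```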

Cathode strip chambers (CSC)
The CSCs are multiwire proportional chambers with cathode strip readout which, by charge interpolation, provide a spatial resolution of 60 µm in the radial, or bending, plane and 5 mm in the transverse, or φ, plane. By combining the hits of a track crossing all four chambers, a time resolution of 4 ns is achieved, sufficient to identify the bunch crossing. For each wire, the charge information per strip is recorded, and then digitised and stored in four time slices, each of 50 ns. For the presampled pile-up, the charge deposited in each strip in the four time slices is read out for the hard-scatter event and the pile-up event; the two signals are then added separately per time slice and strip, taking care to ensure that the pedestal is subtracted appropriately. The combined RDO resulting from these summed signals is then written out.
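The per-strip, per-time-slice summation with pedestal handling can be sketched as below. It assumes, for illustration only, that each stored sample already includes a common pedestal, so the pedestal must be subtracted once from the sum; the dictionary layout and the pedestal value used in the usage note are hypothetical.

```python
N_TIME_SLICES = 4  # four time slices per strip (from the text)


def merge_csc(hard_scatter, pileup, pedestal):
    """Sum {strip: [4 ADC samples]} maps so the pedestal is counted only once."""
    merged = {}
    for strip in sorted(hard_scatter.keys() | pileup.keys()):
        hs = hard_scatter.get(strip, [pedestal] * N_TIME_SLICES)
        pu = pileup.get(strip, [pedestal] * N_TIME_SLICES)
        merged[strip] = [a + b - pedestal for a, b in zip(hs, pu)]
    return merged
```

With a pedestal of 2050, a strip holding `[2050, 2100, 2080, 2050]` in the hard-scatter event and `[2050, 2060, 2070, 2050]` in the pile-up event is merged to `[2050, 2110, 2100, 2050]`.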

Resistive plate chambers (RPC)
The RPC system covers the region |η| < 1.05 and is composed of gaseous parallel-plate detectors. The position resolution is about 1 cm in both the transverse and longitudinal

Thin-gap chambers (TGC)
The TGCs cover the region 1.05 < |η| < 2.4. They have a typical position resolution of 3-7 mm in the bending direction and 2-6 mm in the transverse direction, and a time resolution of 4 ns. The radial coordinate is measured by reading which TGC wire-group is hit; the azimuthal coordinate is measured by reading which radial strip is hit. For each wire, the time at which a signal is above threshold is recorded and digitised and then written in the digit format. As in the RPCs, the hard-scatter and pile-up events are combined by taking the earliest arrival time of any hard-scatter or pile-up signal for a given wire.
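The earliest-arrival-time rule for the TGCs can be sketched in a few lines. The mapping from a wire identifier to an arrival time and the function name are assumptions made for this illustration.

```python
def merge_tgc_times(hard_scatter, pileup):
    """Merge {wire_id: arrival_time} maps, keeping the earliest time per wire."""
    merged = dict(hard_scatter)
    for wire, time in pileup.items():
        merged[wire] = min(time, merged.get(wire, time))
    return merged
```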

Validation results
The presampled pile-up procedure is validated using muons from simulated Z → μ⁺μ⁻ events and comparing their characteristics with those after the standard pile-up digitisation procedure. Figure 14 shows the reconstruction efficiency of muons as a function of p_T and η for the two methods. They agree to better than 0.1% for nearly the entire p_T and η range. Figure 14c shows the invariant mass of the two muons for the same event sample. Here too, good agreement is observed between the two methods.

Trigger
The L1 trigger receives inputs from the L1 calorimeter (L1Calo) and L1 muon triggers. The L1Calo decision is formed using reduced granularity inputs from the LAr and Tile calorimeters. The L1 muon trigger receives signals from the RPCs in the barrel and from the TGCs in the endcaps as is described in Sect. 7. After the L1 trigger decision, the HLT has access to the data from the full detector to perform a refined analysis. The trigger decisions and all reconstructed objects are stored in a dedicated record of the accepted event.
The L1 hardware trigger is simulated using dedicated algorithms that strive to perform a bit-wise correct emulation of the trigger decision including any trigger objects that the hardware produces. The HLT runs on the output of the L1 trigger using the same simulation software as used for data. The following sections discuss the L1 calorimeter trigger and the overall HLT performance. No dedicated changes were required to the muon trigger simulation beyond what is discussed for the general simulation in Sect. 7. While the HLT software itself remains unchanged between the two methods, it depends on the inputs from the various subdetectors that do differ and hence serves as an additional validation. In the simulation, the analogue signals received from the calorimeters are represented by objects containing a vector of floating-point values, corresponding to the amplitudes of the pulses sampled at 25 ns intervals. These are then quantised, with the addition of noise from the digitisation system, and passed through a precise simulation of the signal processing performed by the trigger electronics. The calorimeter objects are formed from calorimeter hits, using a model of the pulse shaping and the noise from the readout and summation chain.
For presampled pile-up, the analogue calorimeter objects are merged before the trigger digitisation and processing are performed. This then allows the unmodified trigger simulation to be performed on the merged data, and it avoids any possible bias due to merging data that have been quantised on a relatively coarse scale. The merging is performed by an additional algorithm, which is run during the pile-up merging prior to the trigger simulation to create a set of merged calorimeter towers. The merging itself uses the calorimeter object identifiers to match corresponding towers in the hard-scatter and pile-up event collections, and the amplitudes of the signals of the same towers in both events are summed. A new collection of objects containing the summed amplitudes is then created and written to the output stream. Figure 15 shows the L1Calo E_T distributions in isolation regions around electrons in Z → e⁺e⁻ events, which are sensitive to the pile-up E_T deposits close to the electrons. Good agreement is seen between the standard and presampled pile-up simulation chains.
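The tower merging described above can be sketched as a match on identifiers followed by a sample-wise sum of the floating-point amplitudes. The dictionary layout and the function name are assumptions made for this illustration; the merged collection is built as a new object, mirroring the new output collection mentioned in the text.

```python
def merge_towers(hard_scatter, pileup):
    """Merge {tower_id: [amplitudes sampled at 25 ns]} maps by summing matches."""
    merged = {}
    for tower_id in sorted(hard_scatter.keys() | pileup.keys()):
        hs = hard_scatter.get(tower_id)
        pu = pileup.get(tower_id)
        if hs is None:
            merged[tower_id] = list(pu)
        elif pu is None:
            merged[tower_id] = list(hs)
        else:
            merged[tower_id] = [a + b for a, b in zip(hs, pu)]
    return merged
```

Because the sum is taken on the unquantised amplitudes, the quantisation in the trigger digitisation is applied only once, to the merged signal.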

HLT simulation and performance
After being accepted by the L1 trigger, the events are processed by the HLT using finer granularity calorimeter information, precision measurements from the muon system, and tracking information from the inner detector. As needed, the HLT reconstruction can be executed either for the full event or within smaller, isolated regions of interest (RoIs) identified by the L1 trigger. To reduce the processing time, most HLT triggers use a two-stage approach with a fast (trigger-specific) first-pass reconstruction to reject the majority of events and a slower, higher precision (offline-like) reconstruction for the remaining events.
The reconstruction of electron (muon) candidates requires the matching of a calorimeter cluster (muon spectrometer track) to a track in the inner detector and is therefore sensitive to changes in the inner detector, calorimeter, and muon spectrometer reconstruction. Figure 16 shows the trigger efficiency of the primary 28 GeV electron trigger measured with simulated Z → e⁺e⁻ events for the standard and presampled pile-up simulation chains. Similarly, Fig. 17 shows the trigger efficiency of the primary 26 GeV muon trigger measured with simulated Z → μ⁺μ⁻ events. No significant differences are observed in the trigger efficiency between the presampled and standard pile-up simulation chains.
Jet and E_T^miss triggers are mainly based on the calorimeter reconstruction and are especially sensitive to changes in the simulation of low-p_T jets. Figure 18 shows the p_T distribution of the leading jet and the trigger efficiency as a function of the sixth leading jet p_T for a multi-jet trigger requiring six jets with a p_T larger than 45 GeV. Good agreement between the standard and presampled pile-up simulation chains is observed in both cases.
All other triggers relevant to the ATLAS physics programme were also studied and no notable differences between the two methods were observed.

Conclusions
A new method for reproducing the impact of pile-up interactions on the ATLAS detector performance is presented, based on overlaying presampled pile-up events on the hard-scatter event of interest during the digitisation. The method is validated separately for each ATLAS detector system and the trigger. In all cases, it is possible to achieve good agreement with the standard pile-up simulation chain which has been used up to now. For a large variety of quantities, detailed comparisons are made between the two methods, and all the differences are found to be small, so that the impact on physics analyses is considered negligible.
The presampled pile-up method is shown to use significantly less computing resources than the standard method used so far within ATLAS. For the Run 2 pile-up distribution and software, the CPU resources required for the entire MC simulation chain are reduced by around 20%.
Acknowledgements We thank CERN for the very successful operation of the LHC, as well as the support staff from our institutions without whom ATLAS could not be operated efficiently.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.