Introduction

The physics programme of the ATLAS experiment [1] at the Large Hadron Collider [2] (LHC) relies on the accurate simulation of billions of Monte Carlo (MC) events to complement the data delivered by the LHC. Simulated events are prepared by generating the appropriate physics process, simulating the passage of particles through the detectors, digitizing the detector response, and then reconstructing those events using the same reconstruction algorithms that are applied to the recorded data. The simulation of the passage of particles through the detectors for analyses of data taken during LHC Run 2 requires approximately 40% of the computing resources of the ATLAS experiment [3, 4]. Run 2 was the second data-taking run of the LHC and lasted from 2015 to 2018.

The complex accordion geometry of the ATLAS electromagnetic calorimeter (see Sect. 2.2) makes the simulation of shower development particularly CPU intensive when using the Geant4 [5] toolkit (G4). In fact, 80% of the total simulation time for a typical sample of top and anti-top quark pair (\(t\bar{t}\)) production is devoted to modelling this shower development [6]. Therefore, fast approaches to calorimeter simulation are crucial for reducing the CPU needs of the ATLAS experiment and for enabling the production of the required numbers of simulated events for precision physics analyses.

The ATLAS Collaboration has already developed and deployed a fast simulation tool, called AtlFastII [6] (AF2), that performs the simulation of the entire ATLAS detector by combining different tools to simulate different sub-detectors and particles. In particular, AtlFastII relies on a fast simulation of the calorimeters called FastCaloSim [7]. AtlFastII has been used to produce approximately 32 billion of the 52 billion events simulated for physics analyses of the Run 2 data. AtlFastII is known to have certain limitations, particularly in the modelling of jets of particles reconstructed with large-radius clustering algorithms and the detailed description of their substructure. In this paper, the ATLAS Collaboration introduces a new fast simulation tool, AtlFast3 or AF3, which has the same CPU performance as AtlFastII, but better accuracy in reproducing Geant4. The ATLAS Collaboration is using AtlFast3 for a large resimulation campaign of Run 2 MC events and plans to use AtlFast3 extensively for Run 3 and beyond.

AtlFastII and AtlFast3 perform the simulation of the entire ATLAS detector by combining a number of components. Key components of AtlFast3 are two parametric calorimeter simulations: the new version of FastCaloSim, referred to as FastCaloSim V2, and FastCaloGAN. Parametric simulations of the calorimeter response simulate the energy of a particle shower as a single step based on an underlying parametrization instead of simulating how every particle propagates and interacts inside the calorimeter volume. As a result, the CPU time of these tools is essentially independent of the particle energy and scales linearly with the number of particles entering the calorimeter volume.

AtlFast3 is designed to simulate particle showers to a level of precision such that no sizeable differences from the Geant4 simulation can be resolved by the reconstruction algorithms, including those for electron, photon, and \(\tau \)-lepton reconstruction and identification, and jet reconstruction and clustering.

The fast simulation of showers in the calorimeter can be factorized into several components: the total shower energy, the energy sharing between calorimeter layers, the average lateral shower development within a layer, the uncorrelated energy fluctuations in individual showers compared to average showers, and, for hadronic showers, the correlated fluctuations between the longitudinal and lateral energy distributions. The energy that is deposited in the calorimeter depends on the kinetic energy (\(E_{\text {kin}}\)) of the particle and is the energy used for parameterization unless otherwise specified. The \(E_{\text {kin}}\) is defined as the particle energy minus its mass. For antiprotons and antineutrons, the rest mass is added instead of being subtracted as their annihilation will result in additional energy deposited in the calorimeter.
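The energy convention described above can be written as a minimal sketch. The function name, the mass table, and the use of PDG Monte Carlo particle codes as keys are illustrative assumptions and are not taken from the AtlFast3 software.

```python
import math

# Particle masses in MeV (rounded PDG values), keyed by PDG Monte Carlo code.
# The table and all names below are illustrative, not AtlFast3 identifiers.
MASS_MEV = {
    22: 0.0,                        # photon
    211: 139.570, -211: 139.570,    # charged pions
    2212: 938.272, -2212: 938.272,  # proton, antiproton
    2112: 939.565, -2112: 939.565,  # neutron, antineutron
}

# Particles whose annihilation deposits additional energy in the calorimeter.
ANNIHILATING = {-2212, -2112}


def parameterization_energy(pdg_id: int, momentum_mev: float) -> float:
    """Energy (MeV) used to select the parameterization.

    E_kin = E - m for ordinary particles; for antiprotons and antineutrons
    the rest mass is added instead of being subtracted.
    """
    mass = MASS_MEV[pdg_id]
    total_energy = math.hypot(momentum_mev, mass)  # E = sqrt(p^2 + m^2)
    if pdg_id in ANNIHILATING:
        return total_energy + mass
    return total_energy - mass
```

For example, a proton at rest gives zero, while an antiproton at rest gives twice the proton mass.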

The simulation of the total shower energy and its longitudinal distribution between layers, including correlations, provides an approximate simulation of jets, electrons, photons, and \(\tau \)-leptons, albeit with overestimated reconstruction and identification efficiencies. The simulation of the average lateral energy spread plays an important role in the reconstruction and identification of objects for physics analysis. For speed, a simplified geometry of the calorimeter cells (see Sect. 2) is used in AtlFast3, where each cell belongs to a longitudinal sampling layer of the calorimeter and is either a cuboid in \(\eta \), \(\phi \), and r for the layers in the detector barrel, or a cuboid in \(\eta \), \(\phi \), and z for the layers in the detector endcaps up to \(|\eta |<3.2\), or a cuboid in x, y, and z for the forward calorimeter layers. This means that the accordion structure of the real ATLAS liquid-argon electromagnetic calorimeter must be emulated. The improved average shower energy distribution and a correction for the accordion structure in AtlFast3 reproduce the reconstruction and identification efficiencies of the Geant4 simulation, especially for electrons and photons.

Fluctuations also play an important role in the calorimeter simulation. The simulation of independent and uncorrelated energy fluctuations in calorimeter cells in individual showers relative to average showers is required for an accurate description of electrons and photons of all energies. Correct modelling of the fluctuations is also crucial for hadrons, where at low energy, the fluctuations in the lateral energy distribution are dominated by sampling fluctuations, noise, and additional proton–proton interactions. For medium-energy hadrons, correlated fluctuations between the longitudinal and lateral energy distributions in hadronic showers play an important role and are accurately simulated by FastCaloGAN (see Sect. 5).

AtlFast3 combines the strengths of the FastCaloSim V2 and FastCaloGAN calorimeter simulation approaches. The updated calorimeter parameterization in FastCaloSim V2 is used to simulate electromagnetic showers and hadronic showers with low and high energies, while Generative Adversarial Networks (GANs) [8] trained for FastCaloGAN [9] are introduced for medium-energy hadrons because of their ability to model correlated fluctuations. The two tools are combined to optimize the performance of the reconstruction and a smooth interpolation between them is performed. Both calorimeter simulation approaches are derived using single particles simulated with Geant4 to model the shower development in the ATLAS electromagnetic and hadronic calorimeters. As the calorimeters do not necessarily contain the full showers, the rate at which secondary particles punch through into the muon spectrometer is parameterized and those particles are simulated with Geant4. The parameterization of muon punch-through is a new feature in AtlFast3. Geant4 is also used to simulate particles in the inner detector and hadrons with energies below a few hundred \(\text {MeV}\) in the calorimeters.

Section 2 introduces the ATLAS detector and provides details about the detectors used in AtlFast3. Datasets and samples are presented in Sect. 3. Sections 4 and 5 discuss the FastCaloSim V2 and FastCaloGAN parameterizations, respectively. The simulation of muon punch-through is discussed in Sect. 6. Section 7 discusses how AtlFast3 is constructed by combining FastCaloSim V2 and FastCaloGAN. The performance of AtlFast3 in the reconstruction and in physics analysis is discussed in Sect. 8. Section 9 concludes the paper.

The ATLAS Detector

The ATLAS detector [1, 10] at the LHC covers nearly the entire solid angle around the collision point. It consists of an inner tracking detector surrounded by a thin superconducting solenoid, electromagnetic and hadron calorimeters, and a muon spectrometer incorporating three large superconducting air-core toroidal magnets. An extensive software suite [11] is used in the reconstruction and analysis of real and simulated data, in detector operations, and in the trigger and data acquisition systems of the experiment.

Inner Detector

The inner detector is immersed in a 2 T axial magnetic field and provides charged-particle tracking in the range \(|\eta | < 2.5\). The high-granularity silicon pixel detector is closest to the collision point and typically provides four measurements per track, the first hit normally being in the innermost layer. It is followed by the silicon microstrip tracker, which usually provides eight measurements per track. These silicon detectors are complemented by the transition radiation tracker (TRT), which enables radially extended track reconstruction up to \(|\eta | = 2.0\). The TRT also provides electron identification information based on the fraction of hits (typically 30 in total) above a higher energy-deposit threshold corresponding to transition radiation.

Calorimeters

The calorimeter system covers the pseudorapidity range \(|\eta | < 4.9\) and exploits several technologies to measure the energy deposited by different types of particles. In Table 1, the different calorimeter modules, different layers in the radial direction, the acronyms used to refer to each layer, the coverage in \(\eta \) by each layer, and the transition regions with gaps between different layers are summarized. In the transition regions of the calorimeter, the detector response deteriorates relative to the rest of the acceptance.

Table 1 The different calorimeter modules, their number of layers in the radial direction, and the acronyms used to refer to each layer and their coverage in \(\eta \). The first layers of the electromagnetic calorimeter are presampler layers and have finer granularity than the subsequent layers

Electromagnetic showers are measured by high-granularity lead/liquid-argon (LAr) calorimeters. The electromagnetic barrel (EMB) and electromagnetic endcap calorimeters (EMEC) provide coverage within the region \(|\eta |< 3.2\). There are three EMEC sampling layers in the precision-measurement region (\(1.5< |\eta | < 2.5\)) and two layers in the higher-\(|\eta |\) region (\(2.5< |\eta | < 3.2\)). An additional thin LAr Presampler covering \(|\eta | < 1.8\) corrects for energy loss in material upstream of the calorimeters. The electromagnetic calorimeters have an accordion-shape geometry, which provides several active layers in a compact design without any gaps. An illustration of this structure and of the segmentation in each layer is shown in Fig. 1. Each layer consists of a number of cells, and groups of cells are referred to as towers, which are used by the trigger. The thickness of the calorimeter is given in units of radiation length, \(X_0\).

Hadronic showers are measured in a steel/scintillator-tile calorimeter segmented into three barrel structures within \(|\eta | < 1.7\) (one TileBar and two TileExt), and two copper/LAr hadron endcap calorimeters (HEC) for \(1.5< |\eta | < 3.2\). The granularity of HEC3 decreases for \(|\eta | > 2.5\). The transition region between the barrel and the endcap has additional detectors, the Tile Gap layers.

Coverage at higher \(|\eta |\) (\(3.1< |\eta | < 4.9\)) is provided by the forward calorimeter (FCal). Two different technologies, copper/LAr and tungsten/LAr, are used in the FCal, which provides both electromagnetic and hadronic energy measurements.

The energy deposited in the calorimeters is read out from cells which define the granularity of the detector. The granularity varies significantly depending on the layer and can vary with \(\eta \) and \(\phi \).

Fig. 1 Granularity of the electromagnetic barrel LAr calorimeters [12]. The accordion geometry is indicated with blue and orange lines on the side of the tower

Muon Spectrometer

The muon spectrometer (MS) comprises separate trigger and high-precision tracking chambers, and measures the deflection of muons in a magnetic field generated by the superconducting air-core toroidal magnets. The field integral of the toroids ranges between 2.0 and 6.0 Tm across most of the detector. A set of precision chambers covers the region \(|\eta | < 2.7\) with three layers of monitored drift tubes (MDTs), complemented by cathode-strip chambers (CSCs) in the forward region, where the background is highest. In the barrel region (\(|\eta | \le 1.05\)), the MDT chambers are located in and around eight coils of superconducting toroid magnets. In the endcap (\(1.05 \le |\eta | \le 2.7\)) sections of the MS, the MDTs are located both in front of and behind the endcap toroid magnets. The innermost detector in the endcap region is instrumented with CSCs instead of MDTs to withstand higher rate and background conditions.

Secondary particles created in calorimeter showers can leak into the MS, where they can have a significant impact on muon reconstruction, and their modelling depends on the calorimeter simulation. This effect is called muon punch-through and the technique used to simulate it is described in Sect. 6.

Dataset Simulation and Preprocessing

This section discusses the simulation of the datasets used to derive the calorimeter parameterizations and the datasets used to validate the performance of AtlFast3.

Simulation of Reference Samples with Geant4

The reference samples used to generate the calorimeter parameterizations discussed in this paper were produced using Geant4 version 10.1.3 [5], which was released in 2016, with the FTFP_BERT_ATL physics list [13]. FTFP_BERT_ATL uses the Bertini intra-nuclear cascade model below 9 \(\text {GeV}\) and transitions to the Fritiof model [14,15,16] with a pre-compound model for 12 \(\text {GeV}\) and higher. The default Geant4 electromagnetic physics list is used. The standard configuration for ATLAS simulation in physics analyses, referred to as the MC16 campaign, uses a fast simulation technique known as Frozen Showers [6] to simulate electromagnetic showers in the FCal. This latter configuration has been used for full simulation samples for all papers on Run 2 data published by ATLAS to date. The reference samples do not use the Frozen Shower technique, but it is used for the validation samples discussed in Sect. 3.3.

The reference datasets consist of single-particle events produced on a cylinder with \(r=1148\) mm and \(|z|<3550\) mm located just outside the TRT, because only particles reaching the calorimeters have to be parameterized. This means that the impact of the cryostat and solenoid material is included in the reference datasets. The particles are produced with directions of propagation consistent with production at the interaction point and simulated without the spread of the LHC beam bunches, for positive and negative values of \(\eta \) and with a uniform distribution in \(\phi \). The impact of the spread of the beam bunches is negligible, because particles are parameterized according to where they enter the calorimeter. Photons (\(\gamma \)) and electrons \((e^{\pm })\) are used to parameterize electromagnetic showers, and positively and negatively charged pions (\(\pi ^{\pm }\)) are used to parameterize hadronic showers. The positive and negative charge states of electrons and pions are combined, since the difference in shower development due to different charge is negligible. Charged pions are used to model the simulation of hadrons, because the dependence of the hadronic showers on particle type is very small. The Geant4 simulation is run in a special configuration with simulation steps which are smaller than those in the default configuration in the calorimeter, so that details of the spatial position of the deposited energy or hits are saved. These detailed hits are used to parameterize lateral shower shapes in FastCaloSim V2, discussed in Sect. 4.2, and for the training of FastCaloGAN. The sum of the energies of these detailed hits is lower than that of the standard Geant4 hits due to a small bug in Geant4. This introduces a small bias in the energy distributions for FastCaloGAN and also for FastCaloSim V2 standard hits in the derivation of the longitudinal energy parameterization of FastCaloSim V2. 
The bias is non-negligible for photons and electrons, and this is corrected for in FastCaloSim V2. After the correction, no significant impact on the physics performance of AtlFast3 is observed. Deposited energies in simulation are digitized without considering electronic noise and cross-talk between calorimeter cells in the readout electronics.

The calorimeter parameterization is obtained for 100 uniform \(\eta \) slices to provide coverage up to \(|\eta |=5\). This range slightly exceeds the calorimeter \(\eta \) range to include particles depositing only a fraction of their energy in the calorimeters. In each slice, up to 19 single-particle samples were produced, starting from a minimum momentum of 16 \(\text {MeV}\) and up to 4.2 \(\text {TeV}\) with the momentum doubling for each sample. The FastCaloGAN and FastCaloSim V2 parameterizations for pions are derived from the samples with energies of 256 \(\text {MeV}\) and higher as discussed in Sect. 7.2.2. For photons, the samples with energies of 64 \(\text {MeV}\) and higher are used for FastCaloSim V2 and those with energies of 128 \(\text {MeV}\) and higher are used for FastCaloGAN. All samples are used to derive the energy interpolation splines discussed in Sect. 7.3. Ten thousand events were produced for each of the samples with energies up to 256 \(\text {GeV}\). With increasing energies above 256 \(\text {GeV}\), the number of events was progressively reduced, reaching 1000 events for the highest-energy sample, due to the significant increase in the time required to simulate the events in Geant4.
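The sample grid described above can be enumerated with a short sketch. The variable and function names are illustrative assumptions; only the numbers (100 slices up to \(|\eta |=5\), 19 momentum points starting at 16 MeV and doubling) come from the text.

```python
# Momentum grid: 19 points starting at 16 MeV, doubling for each sample.
N_SAMPLES = 19
momenta_mev = [16 * 2**k for k in range(N_SAMPLES)]

# 100 uniform |eta| slices covering 0 <= |eta| < 5, i.e. a width of 0.05.
N_SLICES = 100
ETA_MAX = 5.0
eta_edges = [ETA_MAX * i / N_SLICES for i in range(N_SLICES + 1)]


def eta_slice_index(eta: float) -> int:
    """Index of the |eta| slice containing eta (hypothetical helper)."""
    abs_eta = abs(eta)
    if abs_eta >= ETA_MAX:
        raise ValueError("eta is outside the parameterized range")
    return int(abs_eta * N_SLICES / ETA_MAX)
```

In this grid the last point is 4194304 MeV, which is the 4.2 TeV quoted above, and the point at 65536 MeV corresponds to the 65 GeV photons used in the examples of Sect. 4. The slice \(0.2<|\eta |<0.25\) has index 4.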

Single-(anti)proton, single-(anti)neutron, and single-(anti)kaon samples were produced in the same \(\eta \) and momentum slices to derive the corrections described in Sect. 7.4.3. These corrections are sufficient to provide good performance for these stable hadrons, thus avoiding a dedicated parameterization and thereby reducing the memory footprint of AtlFast3.

Voxelization

The spatial energy deposits in each layer in the Geant4 datasets are grouped into volumes called ‘voxels’ for parameterization of FastCaloSim V2 and FastCaloGAN. Only layers with a significant amount of energy, referred to as ‘relevant layers’, are considered in the parameterization. Relevant layers are defined using criteria on the fraction of energy deposited in the layers with respect to the total energy deposited in the calorimeters. The criteria used by the two simulators to determine the relevant layers are slightly different:

  • In FastCaloSim V2, only layers with energy fractions larger than 0.1% are used; this procedure is performed for each sample independently.

  • In FastCaloGAN, all samples in the same \(\eta \) slice are processed with the same number of voxels. The relevant layers are determined using only the 1 \(\text {TeV}\) energy point and have an energy fraction larger than 0.1%. In addition, a layer with less energy is considered relevant if it is in front of a relevant layer. For example, PreSamplerB is always included in FastCaloGAN in the barrel region of the detector even if the energy deposited there is below the threshold.
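The two selection criteria can be compared in a minimal sketch. It assumes that the layers are given as a list ordered by depth and that "in front of" means immediately in front of; both are simplifying assumptions made for illustration, and the function names are hypothetical.

```python
THRESHOLD = 0.001  # 0.1% of the total energy deposited in the calorimeters


def relevant_layers_fcs(layer_energy: list[float]) -> list[int]:
    """FastCaloSim V2-style criterion, applied to each sample independently."""
    total = sum(layer_energy)
    if total <= 0:
        return []
    return [i for i, e in enumerate(layer_energy) if e / total > THRESHOLD]


def relevant_layers_gan(layer_energy_1tev: list[float]) -> list[int]:
    """FastCaloGAN-style criterion, evaluated on the 1 TeV sample only.

    A layer below the threshold is kept if the next layer in depth is
    above the threshold (assumed meaning of "in front of a relevant layer").
    """
    total = sum(layer_energy_1tev)
    if total <= 0:
        return []
    above = [e / total > THRESHOLD for e in layer_energy_1tev]
    keep = []
    for i, is_above in enumerate(above):
        in_front_of_relevant = i + 1 < len(above) and above[i + 1]
        if is_above or in_front_of_relevant:
            keep.append(i)
    return keep
```

With layer energies `[0.0005, 0.3, 0.6, 0.0995, 0.0002, 0.0003]`, the first function keeps layers 1 to 3, while the second also keeps layer 0 because it sits in front of a relevant layer.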

The coordinates of hits in relevant layers, \(\eta ^{\text {hit}}\) and \(\phi ^{\text {hit}}\), are calculated relative to the extrapolated position of the particle in that layer, \(\eta ^{\text {extr}}\) and \(\phi ^{\text {extr}}\). The extrapolation is calculated from the momentum of the particle at the point where it enters the calorimeter and propagated through the calorimeter, taking into account the magnetic field for charged particles. Equation (1) shows the relative angular coordinates, \(\Delta \phi \) and \(\Delta \eta \), of the showers. The coordinates of the calorimeter cell associated with the hit, \(z_{\text {cell}}\) and \(r_{\text {cell}}\), are used to transform the relative hit coordinates into millimeters, \(\Delta \phi ^{\mathrm{mm}}\) and \(\Delta \eta ^{\mathrm{mm}}\), which provides a more convenient description of the shower

$$\begin{aligned} \Delta \eta &= \eta ^{\text {hit}} - \eta ^{\text {extr}}, \\ \Delta \phi &= \phi ^{\text {hit}} - \phi ^{\text {extr}}, \\ \Delta \eta ^{\mathrm{mm}} &= \Delta \eta \times \eta _{\text {Jacobi}} \times \sqrt{r^2_{\text {cell}} + z^2_{\text {cell}}}, \\ \Delta \phi ^{\mathrm{mm}} &= \Delta \phi \times r_{\text {cell}}, \end{aligned}$$
(1)

where \(\eta _{\text {Jacobi}} = | 2 \times \exp (-\eta ^{\text {extr}}) / \left( 1+\exp (-2\eta ^{\text {extr}})\right) |\).

$$\begin{aligned} \Delta R^{\mathrm{mm}} &= \sqrt{\left( \Delta \eta ^{\mathrm{mm}}\right) ^2 + \left( \Delta \phi ^{\mathrm{mm}}\right) ^2}, \\ \alpha &= \text {arctan2}\left( \Delta \phi ^{\mathrm{mm}} , \Delta \eta ^{\mathrm{mm}}\right) . \end{aligned}$$
(2)
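Equations (1) and (2) can be evaluated for a single hit as in the following sketch. The function name and argument order are illustrative, and the wrapping of \(\Delta \phi \) into \([-\pi , \pi )\) is an added assumption that Eq. (1) does not show explicitly.

```python
import math


def relative_hit_coordinates(eta_hit, phi_hit, eta_extr, phi_extr,
                             r_cell, z_cell):
    """Return (d_eta_mm, d_phi_mm, d_r_mm, alpha) following Eqs. (1) and (2).

    r_cell and z_cell are in mm; angles are in radians.
    """
    d_eta = eta_hit - eta_extr
    # Assumption: wrap the azimuthal difference into [-pi, pi).
    d_phi = (phi_hit - phi_extr + math.pi) % (2 * math.pi) - math.pi
    # eta_Jacobi = |2 exp(-eta) / (1 + exp(-2 eta))|, equal to 1 / cosh(eta).
    eta_jacobi = abs(2 * math.exp(-eta_extr) / (1 + math.exp(-2 * eta_extr)))
    d_eta_mm = d_eta * eta_jacobi * math.hypot(r_cell, z_cell)
    d_phi_mm = d_phi * r_cell
    d_r_mm = math.hypot(d_eta_mm, d_phi_mm)
    alpha = math.atan2(d_phi_mm, d_eta_mm)
    return d_eta_mm, d_phi_mm, d_r_mm, alpha
```

For a hit at \(\Delta \eta = 0.01\) and \(\Delta \phi = 0.02\) around an extrapolated position at \(\eta = 0\), with \(r_{\text {cell}} = 1500\) mm and \(z_{\text {cell}} = 0\), the sketch gives offsets of 15 mm and 30 mm.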

The hit positions are then transformed to polar coordinates defined in Eq. (2) and grouped into voxels of different size:

  • In FastCaloSim V2, the shower symmetry along \(\phi \) with respect to the centre of the shower in each layer is exploited. A binning of 1 mm in the radial direction is used in the high-granularity EMB1 and EME1 layers, while 5 mm is used in the other layers. Along the angular direction, \(\alpha \), eight uniform bins are used in all relevant layers. These settings are used for all particles. The size of each voxel is much smaller than the calorimeter cell dimensions.

  • In FastCaloGAN, the size of the voxel is optimized for each particle type and detector \(\eta \) slice. In the radial direction, a variable bin width is used with increasingly wider bins. An example is shown in Table 2 for pions in the barrel. Only layers with a large fraction of the total energy are binned along the angular direction. The angular positions of the showers in the other layers are neglected and simulated uniformly. Ten bins of equal size are used for layers binned in the angular direction. Due to the variable-width bins in the radial direction, the size of the voxels can be significantly larger than the cell dimensions. This voxel definition is optimized for an accurate training of the GANs, since using as many bins as FastCaloSim V2 would significantly increase the training time and instability of the GANs, ultimately reducing the performance of FastCaloGAN.
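The uniform FastCaloSim V2 binning quoted above can be illustrated with a minimal sketch. The function name is hypothetical, the \(\alpha \) bins are assumed to start at zero and cover the full \(2\pi \) range, and the folding that exploits the shower symmetry along \(\phi \) is not implemented here.

```python
import math

FINE_LAYERS = {"EMB1", "EME1"}  # high-granularity layers with 1 mm radial bins
N_ALPHA_BINS = 8                # uniform bins along the angular direction


def fcs_voxel_index(layer: str, d_r_mm: float, alpha: float) -> tuple[int, int]:
    """Return (radial_bin, alpha_bin) for a hit in polar coordinates.

    Any layer not listed in FINE_LAYERS uses 5 mm radial bins.
    """
    radial_width = 1.0 if layer in FINE_LAYERS else 5.0
    radial_bin = int(d_r_mm // radial_width)
    # atan2 returns alpha in [-pi, pi]; shift it onto [0, 2*pi) before binning.
    alpha_01 = (alpha % (2 * math.pi)) / (2 * math.pi)
    alpha_bin = int(alpha_01 * N_ALPHA_BINS) % N_ALPHA_BINS
    return radial_bin, alpha_bin
```

The variable-width radial binning of FastCaloGAN would replace the fixed `radial_width` by a lookup in a list of bin edges for each layer and particle type.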

Table 2 The binning used for the voxelization of pions in the different calorimeter layers for FastCaloGAN in the \(0< |\eta | < 0.8\) range. Each ellipsis indicates when the same binning continues until the subsequent listed number

Table 3 Summary of the Monte Carlo generator settings for the simulation of samples for validation of AtlFast3. See text for details

Validation Datasets

A range of different Monte Carlo samples commonly used for physics performance studies and physics analysis are used to validate the performance of AtlFast3. Table 3 summarizes the key samples, which are also discussed in this section. The matrix element (ME) order describes the precision at which the process is produced by the generator in perturbative quantum chromodynamics (QCD); this can be leading order (LO), next-to-leading order (NLO), or next-to-next-to-leading order (NNLO).

The production of \(t\bar{t}\) events was performed with Powheg Box r2330 [17] interfaced with Pythia 6.427 [18] for the parton shower and hadronization modelling with the CT10 [19] set of parton distribution functions (PDFs) and the Perugia2012 set of tuned parameters (P2012 tune) [20]. At least one of the top quarks is required to produce a lepton when decaying. This sample can be used to validate small-radius jets, leptons, and b-jets.

Events containing Z bosons decaying into a pair of electrons, muons or \(\tau \)-leptons were generated with Powheg Box r2856 [21] at \(\text {NLO}\) in QCD using the CTEQ6L1 [22] PDF set. The events were interfaced with Pythia 8.186 [23] for the parton shower and hadronization modelling using the AZNLO tune [24]. The samples were generated with \(p_{\text {T}} (Z) = 0\). These samples are used to validate electrons, muons, and \(\tau \)-leptons.

Events containing a new hypothetical spin-1 boson, \(W^\prime \), decaying into a WZ pair, with both bosons subsequently decaying into hadrons, were generated using Pythia 8.235 with the NNPDF23LO [25] PDF set. The \(W^\prime \) bosons were generated with a mass of 13 \(\text {TeV}\), and the differential cross-section is reweighted to have a flat distribution of jet \(p_{\text {T}}\) from \(200~\text {GeV}\) to \(3~\text {TeV}\). Similarly, a sample of \(Z^\prime \) bosons with a mass of 4 \(\text {TeV}\) was generated using Pythia 8.235, and the \(Z^\prime \) boson was subsequently decayed into a top and anti-top quark pair. The top quarks were forced to decay into hadrons in the samples, which allows the substructure of jets to be validated. As for the \(W^\prime \) sample, the differential cross-section is reweighted to have a flat distribution of jet \(p_{\text {T}}\) from \(200~\text {GeV}\) to \(3~\text {TeV}\) to better populate kinematic regions with higher jet \(p_{\text {T}}\). These samples are used to validate the substructure of jets with very high transverse momentum.

Samples of multijet events were simulated with the Pythia 8.186 general-purpose event generator interfaced to EvtGen 1.2.0 [26] for decay of heavy-flavour mesons. The NNPDF23 PDF set [27] and the A14 tune [28] were used. The rapidly falling spectrum of leading-jet momenta requires this simulation to be filtered in leading-jet \(p_{\text {T}}\). These samples are used to validate jets in a range of \(p_{\text {T}}\) regimes.

Higgs boson production via gluon–gluon fusion (ggF) was simulated at \(\text {NNLO}\) accuracy in the strong coupling constant \(\alpha _{\text {s}}\) using Powheg NNLOPS [29,30,31,32,33], which achieves NNLO accuracy for arbitrary inclusive \(gg\rightarrow H\) observables by reweighting the Higgs boson rapidity spectrum of Hj-MiNLO [34,35,36] to that of HNNLO [37]. The PDF4LHC15 PDF set [38] and the AZNLO tune of Pythia 8 were used. This simulation was interfaced with Pythia 8.230 for parton shower and non-perturbative hadronization effects. The Higgs boson was decayed into a pair of photons.

The impact of pileup in the same and neighbouring proton-bunch crossings was modelled by combining detector signals from simulated inelastic \(pp\) events with the hard-scattering (HS) event [39]. These pileup events were generated with Pythia 8.186 [23] using the NNPDF23 set of PDFs and the A3 tune [40]. The pileup events were simulated using Geant4 even for samples produced with AtlFastII or AtlFast3, and the same pileup events were reused for all samples. An average number of pileup interactions per \(pp\) bunch crossing of 38 with a standard deviation of 12 was used, similar to the pileup distribution recorded by the ATLAS experiment during Run 2.

The validation samples are reconstructed using the standard algorithms for the ATLAS experiment [41,42,43,44,45,46]. The energy scale and resolution of reconstructed leptons and jets, as well as their reconstruction and identification efficiencies in the simulation are corrected to match those measured in data using the standard procedures of the ATLAS experiment [47]. Unless stated otherwise, the same reconstruction code and corrections are applied to samples simulated with Geant4, AtlFastII and AtlFast3.

Fig. 2 Example of the steps in the first PCA transformation for 65 \(\text {GeV}\) photons with \(0.2< |\eta | < 0.25\): a shows the distribution of energy fractions in EMB1, b the Gaussian distribution, and c is the leading principal component of the first PCA with bin borders (dotted pink lines) showing five PCA bins. The steps of the second PCA are identical to those of the first PCA but performed in each PCA bin separately to generate uncorrelated Gaussian distributions using all principal components of the second PCA. The error bars indicate the size of the statistical uncertainty

Calorimeter Simulation with FastCaloSim V2

FastCaloSim V2 parameterizes the longitudinal and lateral development of showers in the calorimeter. During AtlFast3 simulation, energy is directly deposited in calorimeter cells using the parameterized responses. The longitudinal parameterization along with a correction to the energy resolution is discussed in Sect. 4.1. Parameterization of the average lateral shower distribution is discussed in Sect. 4.2. Finally, the simulation of hits using longitudinal and average lateral shower parameterization is described in Sect. 4.3.

Longitudinal Shower Development

As particles shower in the calorimeter, they deposit energy in the various layers. The amount of energy deposited in each layer depends on how deep in the calorimeter the shower was initiated. The amount of energy deposited is highly correlated between layers, making it difficult to independently parameterize the response for each layer.

Principal Component Analysis (PCA) [48] is used to classify showers from the samples introduced in Sect. 3 for each slice of energy, \(\eta \) bin, and particle type. The PCA transformation is performed twice. The initial PCA, referred to as the ‘first PCA’, is used to classify showers into bins referred to as ‘PCA bins’. A second PCA transformation, referred to as the ‘second PCA’, is performed in each bin of the first PCA to generate uncorrelated and approximately Gaussian distributions. These Gaussian distributions from each PCA bin are used in the FastCaloSim V2 simulation. The steps of this PCA chain are discussed in detail below.

The distribution of the fraction of energy deposited (see Fig. 2a) in each calorimeter layer and the total energy deposited (summed over all layers) are used to classify the showers. Only the relevant layers as defined in Sect. 3.2 are considered. The energy fraction in each layer is integrated and transformed into a Gaussian distribution using a cumulative distribution function transformation (see Fig. 2b). These Gaussian distributions from each layer and each event are used to construct a PCA matrix to perform the first PCA.
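The transformation of an arbitrary distribution into a Gaussian one can be illustrated with a minimal sketch. It assumes an empirical cumulative distribution built from ranks and the inverse of the standard normal cumulative distribution; the function name and the treatment of the rank offset are illustrative choices, not the FastCaloSim V2 implementation.

```python
from statistics import NormalDist


def gaussianize(values: list[float]) -> list[float]:
    """Map samples to approximately standard-normal values.

    Each value is replaced by Phi^{-1}(u), where u is its empirical
    cumulative probability and Phi is the standard normal CDF.
    Tied values are ranked in input order (a simplification).
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    normal = NormalDist()
    out = [0.0] * n
    for rank, i in enumerate(order):
        u = (rank + 0.5) / n  # keeps u strictly inside (0, 1)
        out[i] = normal.inv_cdf(u)
    return out
```

Applied to the energy fractions of one layer over all events, the output preserves the ordering of the events while having an approximately Gaussian shape.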

The first PCA converts the set of correlated energies into a set of linearly uncorrelated quantities by an orthogonal transformation of the coordinate system. The transformation is calculated using the covariance and the eigenvectors of the PCA matrix. The principal components with highest and second-highest variance are referred to as the leading and sub-leading principal components of the first PCA. Figure 2c shows the leading principal component of the first PCA. To classify the shower, the leading and, in some cases, the sub-leading principal component of the first PCA is binned in equally populated PCA bins (covering equal ranges of cumulative PCA bin probability). A bin with zero deposited energy is included, as this improves the modelling of low-energy particles. Typically, five PCA bins in the leading principal component of the first PCA are used (see Fig. 2c). However, within the transition regions of the calorimeter layers, given in Table 1, the sub-leading principal component is also used to determine PCA bins in two dimensions. The exact number of bins in each region is determined from a \(\chi ^2\) test giving the best modelling of all energy fractions. The first PCA removes non-linear correlations between layers and roughly classifies the showers according to their depth.
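For two input variables the PCA rotation has a closed form, which allows the rotation and the equal-population binning to be sketched without a linear-algebra library. This is a toy illustration with hypothetical function names: the actual procedure uses all relevant layers plus the total energy, includes a zero-energy bin, and in transition regions bins in two dimensions, none of which is reproduced here.

```python
import math


def pca_2d(x: list[float], y: list[float]):
    """PCA for two variables: return (theta, leading, subleading).

    theta is the angle of the principal axis; the leading component is
    the projection with the largest variance.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [v - mx for v in x]
    dy = [v - my for v in y]
    sxx = sum(a * a for a in dx) / n
    syy = sum(b * b for b in dy) / n
    sxy = sum(a * b for a, b in zip(dx, dy)) / n
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)  # diagonalizes the covariance
    c, s = math.cos(theta), math.sin(theta)
    leading = [c * a + s * b for a, b in zip(dx, dy)]
    subleading = [-s * a + c * b for a, b in zip(dx, dy)]
    return theta, leading, subleading


def equal_population_bins(component: list[float], n_bins: int = 5) -> list[int]:
    """Assign each event to one of n_bins equally populated bins."""
    n = len(component)
    order = sorted(range(n), key=lambda i: component[i])
    bins = [0] * n
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // n
    return bins
```

After the rotation the covariance between the two components vanishes, and binning the leading component assigns the same number of events to each PCA bin.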

The effectiveness of the PCA transformation is demonstrated in Figs. 3 and 4, which show the correlations before and after the first PCA. The correlations between different layers are calculated from the Gaussian inputs of the PCA matrix. After the first PCA, the correlations are calculated using a subset of the principal components and are strongly reduced.

The total energy and the energy fractions in each first-PCA bin are transformed into Gaussian distributions following the same method, and then, the second PCA is performed. The steps of the second PCA are identical to those of the first PCA except that only the events in a given first-PCA bin are used. The second-PCA rotation removes any further correlations in each first-PCA bin to produce uncorrelated Gaussian outputs using all principal components. The mean and RMS of these Gaussian distributions, the PCA matrices, the PCA bin probabilities, and the inverse cumulative distributions are stored and used in the simulation.

Fig. 3

Correlations between the transformed energies deposited in several layers, before PCA rotation, showing (a) Presampler barrel vs EM barrel 1, (b) Presampler vs EM barrel 2, and (c) EM barrel 1 vs EM barrel 2. The energies were transformed into Gaussian distributions. The correlation factors obtained from these 2D histograms are displayed

Fig. 4

Correlations between the first-PCA components after the PCA rotation. The individual components are approximately Gaussian distributed

During simulation, the steps of the PCA chain are executed in reverse. For each simulated particle, a PCA bin is selected using random numbers distributed according to the PCA bin probabilities. The uncorrelated Gaussian distributions, stored in the parameterization, in the selected PCA bin are used to generate uncorrelated random numbers. These random numbers are rotated using the inverse PCA matrix of the second PCA to generate correlated random numbers. The correlated random numbers are then mapped back to the total energy and the energy fractions deposited in each layer using the error function and the inverse cumulative distributions of the first PCA.
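The reversed chain can be sketched for a toy case with two quantities (total energy and one layer fraction) and two PCA bins. All stored parameters below are invented, the two-dimensional rotation stands in for the inverse PCA matrix, and a sorted reference list stands in for a stored inverse cumulative distribution; none of these names correspond to the actual FastCaloSim V2 code.

```python
import math
import random

def gaussian_to_uniform(g):
    """Standard-normal CDF expressed through the error function."""
    return 0.5 * (1.0 + math.erf(g / math.sqrt(2.0)))

def inverse_cdf(u, sorted_reference):
    """Empirical quantile lookup standing in for a stored inverse cumulative distribution."""
    idx = min(int(u * len(sorted_reference)), len(sorted_reference) - 1)
    return sorted_reference[idx]

def simulate(rng, param):
    bins = param["bins"]
    # 1) Select a PCA bin according to the stored bin probabilities.
    k = rng.choices(range(len(bins)), weights=param["bin_probs"])[0]
    b = bins[k]
    # 2) Draw uncorrelated Gaussians with the stored mean and RMS.
    z = [rng.gauss(mean, rms) for mean, rms in zip(b["mean"], b["rms"])]
    # 3) Apply the inverse rotation to restore the correlations.
    cos_t, sin_t = math.cos(b["theta"]), math.sin(b["theta"])
    g = (cos_t * z[0] - sin_t * z[1], sin_t * z[0] + cos_t * z[1])
    # 4) Error function -> uniform, then inverse CDF -> physical quantity.
    values = [inverse_cdf(gaussian_to_uniform(gi), ref)
              for gi, ref in zip(g, b["references"])]
    return k, values

toy_param = {  # invented numbers, for illustration only
    "bin_probs": [0.5, 0.5],
    "bins": [
        {"mean": [0.0, 0.0], "rms": [1.3, 0.6], "theta": 0.6,
         "references": [[60.0, 62.0, 63.0, 64.0, 65.0], [0.10, 0.15, 0.20, 0.25, 0.30]]},
        {"mean": [0.0, 0.0], "rms": [1.2, 0.7], "theta": 0.4,
         "references": [[58.0, 61.0, 63.0, 64.5, 66.0], [0.40, 0.50, 0.55, 0.60, 0.70]]},
    ],
}
rng = random.Random(7)
events = [simulate(rng, toy_param) for _ in range(1000)]
```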

The validation of the longitudinal energy parameterization is shown in Fig. 5 for 65 \(\text {GeV}\) photons with \(0.2<|\eta |<0.25\). Incoming photons with an energy of 65 \(\text {GeV}\) are simulated using the FastCaloSim V2 parameterization and compared with Geant4 simulation. In this case, no digitization or reconstruction algorithms are applied, but the energy deposited in the active regions of the calorimeter has been scaled by the sampling fraction. This simulation without digitization and reconstruction is referred to as ‘stand-alone simulation’.

In general, the energy fractions deposited in each calorimeter layer using the FastCaloSim V2 parameterization are observed to be in good agreement with Geant4. However, for the total energy distribution, a residual difference in the mean and a larger RMS are observed. These small differences can impact the modelling of complex quantities, e.g., the Higgs boson invariant mass distribution reconstructed from two photons. Additional corrections are therefore applied to improve the modelling of both the resolution and the mean, as discussed in Sects. 7.4.1 and 7.4.3.

Fig. 5

Validation of the energy, E, parameterization is shown for 65 \(\text {GeV}\) photons with \(0.2< |\eta | < 0.25\), comparing the input Geant4 sample (black triangles) with FastCaloSim V2 (red dots). Good agreement is observed for all layers and the total energy. The error bars indicate the size of the statistical uncertainty

Average Lateral Shower Shape

The lateral shower shape describes the lateral energy distribution in each calorimeter layer. The parameterization is derived in each relevant layer and for each PCA bin. The shower development is parameterized in voxels using the coordinates defined in Eq. (2). To exclude hits far away from the centre of the shower, only the hits that cumulatively contain 99.5% of the total energy of each PCA bin along \(\Delta R^{{}\mathrm{mm}}\) are considered. The shower shape distribution in each PCA bin (of each layer) is then normalized to the energy in that PCA bin to create the probability density function for the average shower shape. Figure 6 shows the average lateral shower profile corresponding to the electromagnetic and hadronic showers in the second layer of the EM barrel and the Tile barrel, respectively. The memory footprint of these histograms is reduced by storing only the \(|\Delta \phi ^{{}\mathrm{mm}}|\) coordinates for \(0 \le \alpha \le \pi \), because the shower is symmetric in \(\Delta \phi ^{{}\mathrm{mm}}\).

Fig. 6

The lateral shower shape parameterization for (a) photons and (b) pions with energies of 265 \(\text {GeV}\) in the range \(0.55< |\eta | < 0.60\) and parameterized in the second layer of the EM barrel and Tile barrel, respectively. To visualize the core of the shower, these plots have a cut-off at \(\Delta R^\text {mm} \sim 100~{}\mathrm{mm}\)

Simulation of Hits

A key limitation of AtlFastII is that the lateral shower shape simulation is based on the average shower shape. This model works well for electrons and photons, but cannot reproduce the complex structure of hadronic showers. ATLAS extracts the shower structure of electrons, photons, hadrons, and jets by clustering calorimeter cells using the TopoCluster algorithm [49]. The clustering starts from a seed cell with an energy 4\(\sigma \) above the calorimeter noise threshold, then adds cells with an energy at least 2\(\sigma \) above the noise, and finally adds adjacent cells of any energy. However, using the average shower shape means that the energy distribution and position of the hadronic clusters differ compared to Geant4. Instead of directly using the average shower shape, FastCaloSim V2 uses the average shower shape as a probability density function (pdf) to generate hits which are subsequently mapped onto the calorimeter cell structure. For particles entering the calorimeter with a non-zero angle with respect to the calorimeter cell boundaries, the position of each hit is modified to account for the longitudinal position within each layer to improve the simulation of the shower shapes. For each PCA bin, the average value of the longitudinal position distribution from the reference sample is used to correct the lateral position at which the hits are produced. The models used to assign energy to each hit are discussed in the following sections. The energies of hits are normalized, so that their sum exactly matches the simulated energy in a layer as discussed in Sect. 4.1.

Electrons and Photons

The number of generated hits for electrons and photons is calculated from the energy deposited in each calorimeter layer and the intrinsic resolution of the calorimeter technology in that layer. For a given energy E simulated within a calorimeter layer, the resolution is defined as

$$\begin{aligned} \sigma _E / E = a / \sqrt{E/\text {GeV}} \oplus c, \end{aligned} \tag{3}$$

where a is the stochastic term and c is the constant term. The values used for a and c for the different detector technologies are listed in Table 4 and are used to calculate the resolution, \(\sigma _E\).

Table 4 Stochastic and constant terms for the intrinsic calorimeter energy resolution for the different detectors and used to simulate the hits for electrons and photons [12]

The expected number of hits, \(\lambda \), which would produce this resolution from a Poisson statistical process is calculated as

$$\begin{aligned} \lambda = 1/ (\sigma _E /E)^2. \end{aligned} \tag{4}$$

A random number following a Poisson distribution \(N = \text {Poisson}(\lambda )\) is used to simulate N hits of equal energy \(E_\text {hit}=E/N\). The positions of these hits are randomly distributed according to the average shower shape introduced in Sect. 4.2.

For electrons and photons, which deposit most of their energy in the LAr EM calorimeters, the expected number of hits (see Eq. (4)) is dominated by the stochastic term, a, in the energy resolution. Assuming a stochastic term of 10.1% and equal energy for each hit, electron and photon showers have hits with \(E_\text {hit} \approx 10~\text {MeV}\).
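A short numerical sketch of Eqs. (3) and (4) reproduces the quoted hit energy. The stochastic term of 10.1% is taken from the text; the constant term is set to zero purely for illustration, the Poisson sampler and the guard against zero hits are choices made for this example, and the function names are hypothetical.

```python
import math
import random

def relative_resolution(energy_gev, a, c):
    """Eq. (3): stochastic and constant terms added in quadrature."""
    return math.hypot(a / math.sqrt(energy_gev), c)

def expected_hits(energy_gev, a, c):
    """Eq. (4): lambda = 1 / (sigma_E / E)^2."""
    return 1.0 / relative_resolution(energy_gev, a, c) ** 2

def sample_poisson(lam, rng):
    """Count unit-rate exponential arrivals in [0, lam]; exact, but O(lam) in time."""
    n, t = 0, rng.expovariate(1.0)
    while t <= lam:
        n += 1
        t += rng.expovariate(1.0)
    return n

rng = random.Random(42)
energy_gev = 65.0
lam = expected_hits(energy_gev, a=0.101, c=0.0)  # c = 0 is an illustrative assumption
n_hits = max(1, sample_poisson(lam, rng))        # guard against N = 0 (assumption)
e_hit_mev = 1000.0 * energy_gev / n_hits         # equal energy per hit, E / N
mean_hit_mev = 1000.0 * energy_gev / lam         # equals a^2 in GeV when c = 0
```

With the constant term neglected, the average hit energy is \(a^2 \times 1~\text {GeV} \approx 10.2~\text {MeV}\) independently of E, which is consistent with the value of about 10 \(\text {MeV}\) quoted above.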

Hadrons

The number of generated hits for hadrons is calculated following the same procedure as described in Sect. 4.3.1. However, for hadrons, the stochastic and constant terms in each layer are \(\eta \)-dependent and much larger due to intrinsic fluctuations in hadronic showers. To derive these terms, a special simulation of charged pions is used, where in addition to the measurable energy deposited in the active material of the calorimeter, the total energies lost in both the active and inactive parts of the calorimeter are recorded. The ratio of these two energies in each layer is the sampling fraction per shower, denoted by \(f_{\text {sample}}(E)\), which varies with the total energy deposited. The relative resolution, \(\sigma _E/E\), of \(f_{\text {sample}}(E)\) is fitted with Eq. (3) to extract the stochastic and constant terms for each \(\eta \)-slice. Only showers that deposit more than 1 \(\text {GeV}\) of energy in a calorimeter layer are considered in the fit.

The stochastic terms obtained from simulation for pions are in the range of approximately 30–40% for the EM calorimeters, 50–60% for the Tile calorimeter, 60–80% for the HEC calorimeter, and 80–100% for the FCal. The constant terms c are in the range of 1–10%.

Using \(\eta \)-dependent stochastic and constant terms significantly improves the modelling of hadronic showers for most layers. A notable exception is observed in the highly granular calorimeter layers EMB1 and EME1. In these cases, the stochastic and constant terms shown in Table 4 are used.

In Fig. 7a, the energy fractions inside voxels along \(\Delta R^{{}\mathrm{mm}}\) in the EMB2 layer of the calorimeter are shown for a 65 \(\text {GeV}\) charged pion in the range \(0.20<|\eta |<0.25\) for the first bin of the leading PCA using Geant4. The number of voxels with a particular energy fraction is represented on the z-axis. In the Geant4 distribution, away from the centre of the shower, only a small number of voxels have an energy of \(O(E_\text {hit})\), while most voxels have a substantially lower energy. Due to the large stochastic terms for hadrons (\(> 30\%\)), \(E_\text {hit}\) is approximately 100–300 \(\text {MeV}\) for hadronic showers, which is similar to the energy needed to seed a cluster.

In Fig. 7b, the same distribution is shown simulated using FastCaloSim V2 where each hit is assigned equal energy. The mean and the RMS calculated by including the voxels with an energy of \(O(E_\text {hit})\), for each distribution, are compared in Fig. 7d. Although the mean of the energy fraction is correctly reproduced by FastCaloSim V2, the number of voxels with an energy of \(O(E_\text {hit})\) is substantially larger away from the centre of the shower. In many cases, just one of these hits together with some noise is sufficient to seed the formation of a calorimeter cluster, which then leads to substantial differences in the cluster energy and position distribution compared to Geant4.

Fig. 7

Ratio \(E_\text {voxel}/E_\text {hit}\) as a function of \(\Delta R^{{}\mathrm{mm}}\) for deposited energy from a 65 \(\text {GeV}\) charged pion in EMB2 in the range \(0.20<|\eta |<0.25\) in the first bin of the leading PCA (PCA=1). Entries with \(E_\text {voxel}=0\) are shown in the underflow bin below \(10^{-9}\). Lateral shower shape model (a) in Geant4, (b) in a model using equal deposited energies, and (c) in a model using weighted hit deposited energies. (d) Comparison of the mean (central value) and the RMS (error bars) for the equal hit, weighted hit, and Geant4 models. The yellow band indicates the \(1\sigma \) uncertainty for Geant4

To correct for this mismodelling, a second model is developed where instead of assigning equal energy, a hit weight is introduced. The weight is calculated, such that the number of hits simulated in a certain \(\Delta R^{\text {mm}}\) bin is changed to better reproduce the RMS of the distribution from Geant4, denoted by \(\text {RMS}_\text {G4}\). The steps involved to calculate this weight are discussed below.

As a first step, the voxels with sufficiently low energy are assigned \(E_{\text {voxel}} = 0\). Only voxels that do not change the mean by more than 0.01% are considered in this step. Then, two Poisson distributions are calculated, one reproducing the fraction of voxels with \(E_{\text {voxel}} = 0\) and a second one reproducing the RMS of the total Geant4 distribution. The smaller of the two RMS values is used and denoted by \(\text {RMS}_\text {Poisson}\).

The \(\text {RMS}_\text {Poisson}\) value is used to determine the number of simulated hits by calculating

$$\begin{aligned} N_\text {Poisson}= & {} 1/(\text {RMS}_\text {Poisson}/\lambda )^2. \end{aligned}$$

The weight is then calculated using \(N_\text {Poisson}\) as follows:

$$\begin{aligned} w=\langle E_\text {voxel}/E_\text {hit} \rangle /N_\text {Poisson}, \end{aligned}$$

and the energy is recalculated as

$$\begin{aligned} E^\prime _\text {hit}=E_\text {hit} \cdot w. \end{aligned}$$

These weights are calculated for each average shower shape discussed in Sect. 4.2. To ensure that the average shower shape is unchanged, a correction of 1/w is applied to the probability of all voxels at a distance of \(\Delta R^{\text {mm}}\) from the shower centre.

In addition, if the RMS of the Poisson distribution is smaller than that of Geant4, i.e., \(\text {RMS}_\text {Poisson} < \text {RMS}_\text {G4}\), additional fluctuations are added by applying a smearing function to the \(N_\text {Poisson}\) value. The smearing function has the form \(\text {e}^s\), where s is a random number generated from a Gaussian distribution, such that

$$\begin{aligned} \text {RMS}_{\text {e}^s} = \text {RMS}_\text {smearing}/\sqrt{N_\text {Poisson}}\,, \end{aligned}$$

and \(\text {RMS}_\text {smearing}\) is calculated as:

$$\begin{aligned} \text {RMS}^2_\text {smearing}=\text {RMS}^2_\text {G4} - \text {RMS}^2_\text {Poisson}\,. \end{aligned}$$

Combining these corrections, the hit energy is derived as

$$\begin{aligned} E^{\prime \prime }_\text {hit}=E_\text {hit} \cdot w \cdot \text {e}^s\,. \end{aligned}$$

Figure 7c shows the \(E_\text {voxel}/E_\text {hit}\) distribution simulated using FastCaloSim V2 with the weighted hit model. The number of voxels with an energy of \(O(E_\text {hit})\) is seen to be substantially better modelled when compared with Geant4. Additionally, the mean and the RMS of the equal hit model and the weighted hit model are overlaid with those of Geant4 in Fig. 7d, demonstrating the improved modelling of the RMS for weighted hits. The dependence of the weight w and the \(\text {RMS}_{\text {e}^s}\) parameter is stored as a function of \(\Delta R^{{}\mathrm{mm}}\) together with the average shower shape scaled by 1/w for all charged pions with energy above 1 \(\text {GeV}\). For hadrons, the weighted hit model is used instead of the equal hit energy model.
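The weighted hit model can be transcribed into a few lines as a sketch. All input numbers are invented, `lam` simply denotes the quantity written as \(\lambda \) in the expression for \(N_\text {Poisson}\), and the function names are hypothetical. One further assumption is made that the text does not state: the Gaussian for s is chosen so that \(\text {e}^s\) has unit mean, which keeps the average hit energy unchanged while giving \(\text {e}^s\) the required RMS.

```python
import math
import random

def poisson_hit_count(rms_poisson, lam):
    """N_Poisson = 1 / (RMS_Poisson / lambda)^2, as written in the text."""
    return 1.0 / (rms_poisson / lam) ** 2

def hit_weight(mean_ratio, n_poisson):
    """w = <E_voxel / E_hit> / N_Poisson."""
    return mean_ratio / n_poisson

def smearing_parameters(rms_g4, rms_poisson, n_poisson):
    """Gaussian (mu, sigma) for s and the target RMS of e^s."""
    rms_smearing = math.sqrt(max(rms_g4 ** 2 - rms_poisson ** 2, 0.0))
    target = rms_smearing / math.sqrt(n_poisson)   # RMS_{e^s}
    sigma2 = math.log1p(target ** 2)               # log-normal with std = target
    return -0.5 * sigma2, math.sqrt(sigma2), target  # mu chosen so that <e^s> = 1

def weighted_hit_energy(e_hit, w, mu, sigma, rng):
    """E''_hit = E_hit * w * e^s with s drawn from a Gaussian."""
    return e_hit * w * math.exp(rng.gauss(mu, sigma))

# Invented inputs for one Delta R bin.
n_poisson = poisson_hit_count(rms_poisson=0.5, lam=2.0)
w = hit_weight(mean_ratio=4.0, n_poisson=n_poisson)
mu, sigma, target = smearing_parameters(rms_g4=1.3, rms_poisson=0.5, n_poisson=n_poisson)
rng = random.Random(3)
energies = [weighted_hit_energy(200.0, w, mu, sigma, rng) for _ in range(5)]
```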

Calorimeter Simulation with FastCaloGAN

FastCaloGAN is a fast calorimeter simulation tool that parameterizes the interactions of particles in the ATLAS calorimeter system using 300 GANs, one for each particle type and \(\eta \) slice in which the reference samples are produced. FastCaloGAN takes a different approach than FastCaloSim V2, which as seen in Sect. 4 factorizes the shower parameterization into several components, i.e., longitudinal and lateral energy distributions for different energy points that require interpolation between them. A GAN, instead, provides a comprehensive solution to the simulation of any particle of any energy. This results in a simpler model that has a lower memory requirement at the price of a significantly higher time needed for producing the parameterization used in the simulation. A detailed description of FastCaloGAN is provided in Ref. [50], and other studies of deep generative models for fast calorimeter simulation can be found in Refs. [51,52,53,54,55,56]. GANs were chosen because they have proven successful in generating realistic showers in calorimeters. A GAN [8] is a combination of two deep networks, a generator producing artificial showers and a discriminator trying to distinguish the generated images from real ones. The two networks compete against each other in a game resulting in a type of training that is unlike those for other machine-learning problems. For example, the loss functions cannot easily be used to assess the quality of the training or to select the best training point. The architecture of the GANs is described in Sect. 5.1. Section 5.2 discusses the training strategy, the selection of the best epoch, and its performance. Finally, the strategy to map the energy from the voxels to the calorimeter cells is explained in Sect. 5.3.

Architecture

FastCaloGAN uses the Wasserstein GAN [57] with a gradient penalty (WGAN-GP) term [58] in the loss function of the discriminator. This configuration provides good performance and training stability.
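For orientation, the discriminator loss of a WGAN-GP in its standard form, as introduced in Ref. [58], can be written as follows. Whether FastCaloGAN uses exactly this expression, in particular how the conditioning enters, is not specified here and is an assumption of this illustration. The symbols x and \(\tilde{x}\) denote reference and generated showers, D is the discriminator, and \(\lambda \) is the gradient-penalty coefficient:

$$\begin{aligned} L_D = \mathbb{E}_{\tilde{x}}\left[ D(\tilde{x}) \right] - \mathbb{E}_{x}\left[ D(x) \right] + \lambda \, \mathbb{E}_{\hat{x}}\left[ \left( \Vert \nabla _{\hat{x}} D(\hat{x}) \Vert _2 - 1 \right) ^2 \right] , \qquad \hat{x} = \epsilon x + (1-\epsilon )\tilde{x}, \quad \epsilon \sim U[0,1]. \end{aligned}$$

The first two terms estimate the Wasserstein distance between generated and reference showers, while the last term penalizes deviations of the discriminator gradient norm from unity at points interpolated between them.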

The WGAN-GP is implemented in TensorFlow 2.0 [59], such that the training can be performed on either CPUs or GPUs. The architecture of the WGAN-GP is presented in Fig. 8. The generator uses a latent space of 50 values and has three hidden layers of increasing size. The output layer of the generator and the input layer of the discriminator have a number of nodes equal to the number of voxels (NVoxel) corresponding to the specific particle type and \(\eta \) slice. The discriminator maintains the same number of nodes until the last layer, which has a single output node. The GANs are conditioned on a single parameter, the true momentum of the particle.

Fig. 8

Schematic representation of the architecture of the GANs used by FastCaloGAN. The input to the generator is at the top left and the output from the discriminator is at the bottom left. The Rectified Linear Unit (ReLU) activation function is used in all layers of the discriminator with the exception of the last

Each node uses the Rectified Linear Unit (ReLU) activation function. Both the generator and discriminator use the Adam [60] optimizer with a learning rate of \(10^{-4}\). The exponential decay rate for the first moment (\(\beta _1\)) is set to 0.5, while that for the second moment (\(\beta _2\)) is set to the default value (0.999), as are all other parameters that are not explicitly given here. The training is performed using a batch size of 128 events, and the discriminator is trained five times for each training of the generator. Finally, the gradient penalty, \(\lambda \), is set to 10. These parameters are summarized in Table 5. This set of hyperparameters as well as the overall architecture were chosen as a compromise between the modelling performance and the time required to train the 300 GANs.

Table 5 Overview of the parameters of the WGAN-GP

Training

Each GAN is trained first on a single energy point, and then, the other energy points are added progressively to the training mixture starting from the energy points closest in energy to the initial sample. The training procedure can be summarized as follows:

  1. Train the first 50,000 epochs using the 32 \(\text {GeV}\) sample.

  2. Every 20,000 epochs add a new sample, alternating between higher and lower energies.

  3. Once all energy points have been added, continue training with all samples for the remaining epochs.
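The resulting schedule can be sketched as follows. The grid of 15 energy points in powers of two from 256 MeV to about 4.2 TeV, and the choice of adding a higher energy first, are assumptions made for this illustration; the function name is hypothetical.

```python
def training_schedule(points_mev, start_mev, first_block=50_000, step=20_000,
                      higher_first=True):
    """Return (epoch, energy) pairs giving the epoch at which each sample joins the mixture."""
    ordered = sorted(points_mev)
    i = ordered.index(start_mev)
    lower = ordered[:i][::-1]      # closest in energy to the starting point first
    higher = ordered[i + 1:]
    schedule = [(0, start_mev)]
    epoch = first_block
    take_higher = higher_first
    while lower or higher:
        if (take_higher and higher) or not lower:
            nxt = higher.pop(0)
        else:
            nxt = lower.pop(0)
        schedule.append((epoch, nxt))
        epoch += step
        take_higher = not take_higher  # alternate between higher and lower energies
    return schedule

points = [2 ** k for k in range(8, 23)]                   # assumed grid, in MeV
schedule = training_schedule(points, start_mev=2 ** 15)   # 32768 MeV, i.e. about 32 GeV
```

Under these assumptions the last sample is added at epoch 310,000, after which all 15 samples are used together for the remaining epochs.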

The energy in each voxel is normalized to the true momentum of the primary particle entering the calorimeter, which means that the GAN only needs to learn the relative shape of the showers. The true momenta, which are used as labels for the conditioning, are also normalized to the highest value (4.2 \(\text {TeV}\)), which results in a range of values (0,1] which is optimal for the training of the GANs.

The training is performed for 1 million epochs with a TensorFlow checkpoint saved every 1000 epochs to monitor the improvements in the training. The training time for each GAN is approximately 8 h on the NVIDIA V100 [61] GPUs available on the CERN HTCondor system [62]. The limited number of GPUs available to train the 300 GANs sets the limit of 1 million epochs, while the frequency at which the checkpoints are stored is limited by both speed and disk space.

Best Epoch Selection

Due to the interplay between the generator and discriminator, the final epoch is not necessarily the best one. The figure of merit used to select the best epoch is a \(\chi ^2\) between the reference samples and the GANs. The variable chosen is the sum of the energy in all voxels, which corresponds to the total energy deposited in the calorimeter by the particle.

For each energy point, the range used for the distribution is defined to be a ±3 RMS interval around the peak for electromagnetic showers in the Geant4 reference samples. As the energy distributions of the pions have longer tails, the range for hadrons is defined to be between \(-4\) RMS and 3.5 RMS. A total of 30 bins are used for all energy points. The \(\chi ^2\) is then evaluated between the binned distributions produced from all events in the reference samples and 10,000 events generated from the GAN and weighted by the statistical uncertainty. The overflow and underflow bins are not used in the \(\chi ^2\) evaluation. The total \(\chi ^2\) for a checkpoint is the sum of the \(\chi ^2\) for each of the 15 energy points. The checkpoint with the lowest \(\chi ^2\) sum is chosen for each GAN. This selection criterion, as opposed to selecting the last trained epoch, avoids the problem of selecting an epoch with an unfavourable fluctuation in the training.
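A minimal sketch of the checkpoint selection is given below. The exact \(\chi ^2\) definition is not spelled out above, so a standard two-histogram form with Poisson uncertainties on unit-normalized histograms is assumed; the function names and the toy numbers are hypothetical.

```python
def histogram(values, lo, hi, n_bins=30):
    """Fixed-range histogram; underflow and overflow entries are dropped."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        if lo <= v < hi:
            counts[min(int((v - lo) / width), n_bins - 1)] += 1
    return counts

def chi2(ref_counts, gan_counts):
    """Chi-square between two unit-normalised histograms with Poisson errors (assumed form)."""
    n_ref, n_gan = sum(ref_counts), sum(gan_counts)
    if n_ref == 0 or n_gan == 0:
        return float("inf")
    total = 0.0
    for r, g in zip(ref_counts, gan_counts):
        if r == 0 and g == 0:
            continue                               # empty bins carry no information
        variance = r / n_ref ** 2 + g / n_gan ** 2
        total += (r / n_ref - g / n_gan) ** 2 / variance
    return total

def best_checkpoint(chi2_by_checkpoint):
    """Map {epoch: [chi2 per energy point]} to the epoch with the lowest chi2 sum."""
    return min(chi2_by_checkpoint, key=lambda epoch: sum(chi2_by_checkpoint[epoch]))

# Toy usage with invented chi2 values for two energy points.
selected = best_checkpoint({1000: [5.0, 5.0], 2000: [1.0, 2.0], 3000: [4.0, 0.5]})
```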

The evolution of this \(\chi ^2\) as a function of the epoch is shown in Fig. 9 for pions with \(0.25< |\eta | < 0.3\). The average \(\chi ^2\) decreases with increasing epoch and the fluctuations around the average are typical of GAN training. The point with the lowest \(\chi ^2\) sum, which in the example presented in Fig. 9 occurs at epoch number 946,000, is the checkpoint used for the simulation of pions in that \(\eta \) range. This procedure is repeated for all 300 GANs.

Performance

Fig. 9

The \(\chi ^2\) sum divided by the number of degrees of freedom (NDF) calculated between the reference samples and the GAN as a function of the number of epochs. The lowest point (in red) represents the selected epoch

The performance of the best epoch for photons with \(0.2< |\eta | < 0.25\) is shown in Fig. 10. For each of the 15 energy points, the distribution of the total energy, defined as the sum of the energy in all voxels, is shown for the Geant4 input samples and the events generated with the GAN. In most cases, the means of the two distributions are comparable and so are their widths.

Fig. 10

Sum of the energy in all voxels for photons with \(0.2< |\eta | < 0.25\). The calorimeter response for Geant4 (solid black line) is compared with FastCaloGAN (dashed red line)

Similarly, the performance of the GAN for pions with \(0.2< |\eta | < 0.25\) is shown in Fig. 11. The first two energy points show a different shape than the other energy points and are not well described. The description of the highest energy point is poor due to the difficulties in reproducing the irregular shape and low number of events in the reference sample; given the extreme rarity of such high-energy hadrons in physics samples, the poor modelling is not of significant concern. Furthermore, in its final configuration described in Sect. 7.1, AtlFast3 does not use GANs in this energy range.

Fig. 11

Sum of the energy in all voxels for pions with \(0.2< |\eta | < 0.25\). The calorimeter response for Geant4 (solid black line) is compared with FastCaloGAN (dashed red line)

The mean and RMS, indicated by the size of the uncertainty bars, of the total energy as a function of the true particle momentum (\(p_\mathrm {truth}\)) are shown in Fig. 12a–c for photons, electrons, and pions, respectively. For photons and electrons, the GANs reproduce the mean energies of the reference samples except at the low momentum points. The RMS from the GANs is larger than that of the reference sample for all energies. For pions, the GANs generate distributions with a lower mean and a larger RMS for a wider energy range.

Fig. 12

Sum and RMS of the energy in all voxels normalized to the true momentum for (a) photons, (b) electrons, and (c) pions with 0.2 < \(|\eta |\) < 0.25 as a function of the true momentum. The calorimeter response for Geant4 (solid black line) is compared with FastCaloGAN (dashed red line), which is also abbreviated to FGAN. The uncertainty bars in the top panel indicate the RMS of the total energy distribution. The ratio of the means of the two energy distributions is shown in the middle panel, and the ratio of the RMS values is shown in the bottom panel. The error bars in the ratio indicate its statistical uncertainty. For most points, this uncertainty is smaller than the size of the markers

The total energy, defined as the sum of the energy in all voxels, for particles with momentum 65 \(\text {GeV}\) as a function of \(\eta \) for photons, electrons, and pions is shown in Fig. 13a–c. For photons and electrons, the GAN and reference sample means agree to better than 1% in almost all the regions, while the distributions generated by the GANs are wider than the reference samples in the barrel region. Small discrepancies are observed in the transition regions between detectors, where the energy response is non-Gaussian. For pions, the means agree to within 4%, with larger discrepancies observed in the barrel region, where the energy is slightly lower. The FastCaloGAN RMS is larger in both the barrel and endcap regions.

Fig. 13

Sum and RMS of the energy in all voxels as a function of \(|\eta |\) for (a) photons, (b) electrons, and (c) pions of momentum 65 \(\text {GeV}\). The calorimeter response for Geant4 (solid black line) is compared with FastCaloGAN (dashed red line), which is also abbreviated to FGAN, while their ratio is shown in the ratio plots. The uncertainty bars in the top panel indicate the RMS of the total energy distribution. The ratio of the means of the two energy distributions is shown in the middle panel, and the ratio of the RMS values is shown in the bottom panel. The error bars in the ratio indicate its statistical uncertainty. For most points, this uncertainty is smaller than the size of the markers

Simulation of Hits

The GAN models trained to generate showers in FastCaloGAN are implemented in the ATLAS Athena software framework using the Lightweight Trained Neural Network (LWTNN) [63].

The kinetic energy \(E_{\text {kin}}\) of the particle is used as the conditional parameter of the GAN. The output of the GAN is the energy assigned to each voxel. Each one of these energies must be assigned to a variable number of cells, because the voxels in FastCaloGAN can be larger than the ATLAS cells. To assign the correct amount of energy to each cell, the voxel surface defined in Eq. (2) is sampled uniformly, generating a grid of hits. Layers that are not binned along the angular direction have their energy uniformly distributed across the whole annulus surface. The granularity used to sample the voxel is 1 mm in the high-granularity EMB1 and EME1 layers, while 5 mm is used in the other layers. A maximum of 10 hits are created in either direction to limit the number of hits that are generated; this is required to have a small simulation time. The energy generated by the GAN in the voxel is divided uniformly between the hits. The hits are then assigned to the calorimeter cells using the simplified geometry. The longitudinal mid-position in each layer is used for the calculation of the hit position.
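The uniform sampling of a voxel can be sketched as below. The voxel is assumed to be an annular segment described by a radial range and an angular range, and the number of hits along the angular direction is assumed to follow from the arc length at the mid radius; both the geometry handling and the function name are illustrative guesses rather than the Athena implementation.

```python
import math

def voxel_hits(e_voxel, r_lo, r_hi, a_lo, a_hi, granularity_mm, max_per_dir=10):
    """Return (x, y, energy) hits on a regular grid inside one polar voxel."""
    n_r = min(max_per_dir, max(1, math.ceil((r_hi - r_lo) / granularity_mm)))
    arc = 0.5 * (r_lo + r_hi) * (a_hi - a_lo)   # arc length at mid radius (assumption)
    n_a = min(max_per_dir, max(1, math.ceil(arc / granularity_mm)))
    e_hit = e_voxel / (n_r * n_a)               # energy shared uniformly between hits
    hits = []
    for i in range(n_r):
        r = r_lo + (i + 0.5) * (r_hi - r_lo) / n_r
        for j in range(n_a):
            alpha = a_lo + (j + 0.5) * (a_hi - a_lo) / n_a
            hits.append((r * math.cos(alpha), r * math.sin(alpha), e_hit))
    return hits

# A small voxel sampled at 1 mm, and a large annulus sampled at 5 mm.
fine = voxel_hits(100.0, 0.0, 5.0, 0.0, math.pi / 4, granularity_mm=1.0)
coarse = voxel_hits(50.0, 0.0, 200.0, 0.0, 2 * math.pi, granularity_mm=5.0)
```

In the second call the cap of 10 hits per direction applies, so at most 100 hits are created for a voxel regardless of its size.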

Simulation of Muon Punch-Through

Secondary particles created in hadronic showers inside the calorimeter can escape through the back of the calorimeter and generate hits in the muon spectrometer. This effect is referred to as muon punch through. These particles are reconstructed in the muon spectrometer and need to be well modelled to accurately describe the backgrounds of reconstructed muons. A dedicated treatment of these particles is required, because the information about the path of the particles is lost due to the parameterization of the calorimeter response in AtlFast3. Figure 14 shows the probability of a single pion entering the calorimeter to create at least one secondary particle which escapes the calorimeter volume with an energy of at least 50 \(\text {MeV}\) determined using the Geant4 simulation. The probability increases with increasing momentum p and varies as a function of \(\eta \). Particles with energies below 50 \(\text {MeV}\) are not simulated in the muon spectrometer, because they would have negligible impact.

Fig. 14

The probability of a single pion event to produce at least one punch-through particle with an energy of at least 50 \(\text {MeV}\) as a function of the \(\eta \) and p of the incoming pion determined from Geant4

The AtlFast3 punch-through parameterization is derived separately for the five types of secondary particles that can emerge from the back of the calorimeter: photons, electrons, pions, muons, and protons. These account for 92% of the total punch through. The parameterizations of their multiplicity and kinematics are determined from single-pion samples simulated using Geant4. As the properties of the secondary particles depend significantly on the \(\eta \) direction and energy of the incoming pion, the reference samples within the acceptance of the muon spectrometer \(|\eta | \le 2.7\) and with momenta between 65 \(\text {GeV}\) and 4.2 \(\text {TeV}\) are used to determine the parameterization. The small number of secondary particles in lower energy samples did not allow a parameterization of primary particles with an energy lower than 65 \(\text {GeV}\).

The properties of the secondaries described by the parameterization include their energy, and their position and momentum relative to that of the incoming pion. The position and momentum of the secondaries are determined via deflection angles, \(\Delta \theta \) and \(\Delta \phi \), relative to the direction of propagation of the incoming pion. As an example, Figs. 15 and 16 show the histograms extracted from the Geant4 simulation and used to parameterize the secondaries produced by primary pions with an energy of 524 \(\text {GeV}\) and \(|\eta | \le 0.4\). The peak at 1 \(\text {GeV}\) is the most probable value of the energy of the secondary pions emerging from the calorimeter.

Fig. 15

The punch-through probability as a function of the punch-through pion (a) multiplicity and (b) energy. The error bars indicate the statistical uncertainty and the overflow is not included in the final bins

Fig. 16

The punch-through probability as a function of (a) deflection angle in \(\theta \) and energy, (b) deflection angle in \(\phi \) and energy, (c) relative momentum deflection in \(\theta \) and energy, and (d) relative momentum deflection in \(\phi \) and energy. Primary pions with an energy of 524 \(\text {GeV}\) in the region \(|\eta | \le 0.4\) from the Geant4 reference samples were used

During the simulation of AtlFast3, the muon punch-through parameterization is invoked whenever particles that have some probability of punching through enter the calorimeter. For each incoming particle, the number of secondaries and their energy, position, and momentum are selected randomly from the punch-through parameterization histograms (see Figs. 15 and 16), using them as probability density functions. The parameterization is interpolated linearly for \(\eta \) and logarithmically for \(p_{\text {T}}\) to values between the discrete points used to determine the parameterization. Two sets of correlations are accounted for in the modelling of the secondaries: the correlations between the relative position and energy and correlations between the relative momentum and energy. After the multiplicity and properties of the punch-through secondaries have been determined, their propagation through the muon spectrometer is simulated using Geant4.
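Two ingredients of this procedure can be sketched in a few lines: drawing a value from a histogram used as a probability density function, and computing interpolation weights that are linear in \(\eta \) and logarithmic in momentum. The histogram contents and grid points below are invented, the function names are hypothetical, and how the weights are applied to neighbouring parameterization points is left open.

```python
import bisect
import math
import random

def sample_from_histogram(edges, contents, rng):
    """Pick a bin with probability proportional to its content, then a uniform point in it."""
    cumulative, total = [], 0.0
    for c in contents:
        total += c
        cumulative.append(total)
    u = rng.random() * total
    i = min(bisect.bisect_right(cumulative, u), len(contents) - 1)
    return rng.uniform(edges[i], edges[i + 1])

def linear_weight(x, x_lo, x_hi):
    """Weight of the upper grid point for linear interpolation (used here for eta)."""
    return (x - x_lo) / (x_hi - x_lo)

def log_weight(p, p_lo, p_hi):
    """Weight of the upper grid point for interpolation linear in log(p)."""
    return (math.log(p) - math.log(p_lo)) / (math.log(p_hi) - math.log(p_lo))

rng = random.Random(11)
edges = [0.0, 1.0, 2.0, 3.0]     # invented energy bins in GeV
contents = [0.0, 3.0, 1.0]       # invented bin contents; the first bin is empty
energies = [sample_from_histogram(edges, contents, rng) for _ in range(500)]
w_p = log_weight(math.sqrt(65.0 * 130.0), 65.0, 130.0)   # geometric midpoint
w_eta = linear_weight(0.3, 0.2, 0.4)
```

At the geometric mean of two momentum points the logarithmic weight is one half, whereas a linear weight would favour the lower point.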

The Combination of FastCaloSim V2 and FastCaloGAN: AtlFast3

Configuration of AtlFast3

The new fast simulation tool, AtlFast3, is defined by combining the fast simulation tools described above in a way that balances modelling performance needs with CPU requirements. AtlFast3 uses the Integrated Simulation Framework (ISF), which allows different simulation tools to be combined in a flexible way [64]. AtlFast3 uses the following configuration as illustrated in Fig. 17:

  • Geant4 is used to simulate all particles in the inner detector and muons in all detectors. Hadrons with kinetic energies below 400 \(\text {MeV}\) (200 \(\text {MeV}\) for pions) in the calorimeter are also simulated in Geant4.

  • FastCaloSim V2 is used to simulate electrons and photons of all energies and hadrons with kinetic energies \(E_{\text {kin}} < \text {8--16}\) \(\text {GeV}\) or \(E_{\text {kin}} > \text {256--512}\) \(\text {GeV}\) in the calorimeter. A transition range of energies is given, because the response is interpolated linearly between the two models as discussed later.

  • FastCaloGAN is used to simulate hadrons with \(\text {8--16}\) \(\text {GeV}\) \(< E_{\text {kin}} < \text {256--512}\) \(\text {GeV}\) in the calorimeter.

  • The muon punch-through tool parameterizes the properties of particles exiting the calorimeter, which are then simulated with Geant4 in the muon spectrometer.
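
The calorimeter part of this configuration can be summarized as a simple lookup, sketched below. The thresholds are those quoted in this section (200 \(\text {MeV}\) for pions, 400 \(\text {MeV}\) for other hadrons, and the 8–16 \(\text {GeV}\) and 256–512 \(\text {GeV}\) transition ranges). The function name `choose_calo_tool`, the particle labels, the treatment of the exact boundary values, and the label returned inside the transition ranges are assumptions for illustration and do not reflect the ISF interface.

```python
def choose_calo_tool(particle, e_kin_gev):
    """Return the calorimeter simulation tool for one particle.

    Hypothetical helper; boundary handling (< versus <=) is an assumption.
    """
    if particle == "muon":
        return "Geant4"
    if particle in ("electron", "photon"):
        return "FastCaloSim V2"
    # Every other label is treated as a hadron in this sketch.
    g4_threshold_gev = 0.2 if particle == "pion" else 0.4
    if e_kin_gev < g4_threshold_gev:
        return "Geant4"
    if e_kin_gev <= 8.0 or e_kin_gev >= 512.0:
        return "FastCaloSim V2"
    if 16.0 <= e_kin_gev <= 256.0:
        return "FastCaloGAN"
    # Inside 8-16 GeV or 256-512 GeV the two models are combined.
    return "transition"


for name, energy in [("pion", 0.1), ("proton", 0.3), ("pion", 50.0), ("kaon", 300.0)]:
    print(name, energy, choose_calo_tool(name, energy))
```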

Fig. 17

The configuration of the different tools used for AtlFast3, which depends on the particle type, the detector, and the particle energy

Configuration of the Fast Calorimeter Simulation

The configuration of AtlFast3 is determined by comparing the performance of FastCaloSim V2 and FastCaloGAN.

Electrons and Photons

The simulation of electrons and photons relies on the accurate simulation of electromagnetic showers in the electromagnetic calorimeter. The total reconstructed energy for 65 \(\text {GeV}\) photons is shown in Fig. 18 for Geant4, FastCaloSim V2, and FastCaloGAN. FastCaloGAN does not model the photon energy correctly and a similar poor performance is observed for electrons; therefore, FastCaloSim V2 is selected to simulate all electromagnetic showers. The poor modelling of electromagnetic showers in FastCaloGAN can be explained as follows: the GANs are trained without the energy resolution correction for the accordion structure of the calorimeter (see Sect. 7.4.4), and the energy scale of the detailed Geant4 hits, used in the training of FastCaloGAN, is slightly lower than that of the full Geant4 hits (see Sect. 3.1). Both these effects can be corrected for in future versions of FastCaloGAN, but could not be included here due to time constraints.

Fig. 18

Reconstructed photon energy for photons generated at the calorimeter surface with an energy of 65 \(\text {GeV}\) and 0.2 \(< |\eta |<\) 0.25 by Geant4 (solid black line), FastCaloSim V2 (dashed blue line), and FastCaloGAN (dashed red line). The statistical uncertainties are shown, but are similar in size to the points or smaller

Low-Energy Hadrons

At low energies, the distribution of the average hadron energy response becomes complex and has a significant dependence on both \(E_{\text {kin}}\) and \(|\eta |\), as shown in Fig. 19b. This is because the measured energy depends strongly on the extent to which these shorter showers develop within the active liquid argon of the electromagnetic calorimeter or within the inactive lead absorbers. As an example, Fig. 19a shows the ratio of the average energy response to \(E_{\text {kin}}\) as a function of \(E_{\text {kin}}\) for charged pions in the range \(0.20< |\eta | < 0.25\). For pions with a kinetic energy of 100 \(\text {MeV}\), the largest amount of deposited energy is typically within the liquid argon of the Presampler, which leads to a spike in the energy response. On the other hand, pions with a kinetic energy of 10 \(\text {MeV}\) deposit far less energy in the active liquid-argon regions and more in the inactive regions. In addition, the energy calibration of the Presampler is derived using high-energy particles, which deposit much less energy in the Presampler; as a result, the measured fraction of shower energy in the Presampler increases further for \(E_{\text {kin}} \approx 100\) \(\text {MeV}\).

The dependence of the energy response to low-energy charged pions on \(\eta \) is due to the different amount of material that the charged pion passes through, which shifts the values of the kinetic energy at which the spike in the response occurs. Deriving a parameterization for such low-energy hadrons would require a significantly more complex method for deriving parameterizations to achieve high accuracy. Therefore, in AtlFast3 pions below 200 \(\text {MeV}\) and all other hadrons below 400 \(\text {MeV}\) (as shown in Table 6) are instead simulated by Geant4. Above these energy thresholds, their total energy response is modelled using AtlFast3. This choice does not significantly affect the speed of AtlFast3, because the simulation of low-energy hadrons requires only a comparatively small amount of CPU time.

Fig. 19

Ratio of the average energy response to the generated energy for \(\pi ^{\pm }\) for a \(0.20< |\eta | < 0.25\) and b as a function of \(|\eta |\) and \(E_{\text {kin}}\). The error bars indicate the statistical uncertainty of the mean. For most points, this uncertainty is smaller than the size of the markers

Medium-Energy Hadrons

For hadronic showers, the number of clusters in a jet plays an important role in modelling the jet substructure and is therefore used as a metric to compare the performance of FastCaloSim V2 and FastCaloGAN. Differences in the modelling of the number of clusters between FastCaloSim V2 and FastCaloGAN are expected, because FastCaloGAN can model the correlations within a single event, while FastCaloSim V2 cannot. Figure 20 compares the modelling of the number of clusters in a jet for three different combinations of FastCaloSim V2 and FastCaloGAN. The hybrid models differ in the energy range over which the transition between FastCaloSim V2 and FastCaloGAN occurs; for example, in the Hybrid 4–8 \(\text {GeV}\) model, FastCaloSim V2 is used up to 4 \(\text {GeV}\) and FastCaloGAN is used above 8 \(\text {GeV}\). Between 4 and 8 \(\text {GeV}\), the response is interpolated linearly between the two models as described in Sect. 7.3. The Hybrid 4–8 \(\text {GeV}\) model underestimates the number of constituents, while the Hybrid 16–32 \(\text {GeV}\) model overestimates the number of constituents. Therefore, the Hybrid 8–16 \(\text {GeV}\) model is chosen as the configuration for AtlFast3. Other key jet variables, including the number of jets, the \(p_{\text {T}}\) and \(\eta \) distributions, and variables used for substructure, are also checked for these different configurations, which provides additional support for choosing the Hybrid 8–16 \(\text {GeV}\) model. Section 8.1.2 discusses the performance of AtlFast3 in modelling jet variables.

Table 6 Hadron energies below which AtlFast3 relies on Geant4 for their simulation
Fig. 20

Distribution of the number of constituents in the jets in a \(1.8< p_{\text {T}} < 2.5\) TeV dijet sample in Geant4 (black triangles) and the combination of FastCaloSim V2 and FastCaloGAN with transitions in the range 4–8 \(\text {GeV}\) (blue stars), 8–16 \(\text {GeV}\) (red diamonds), and 16–32 \(\text {GeV}\) (green crosses). Here, ‘hybrid’ refers to the combination of FastCaloSim V2 and FastCaloGAN. The statistical uncertainties are shown, but may be smaller than the markers

High-Energy Hadrons

At higher energies, the modelling of the properties of individual clusters becomes important. Figure 21 compares the number of cells in the calorimeter clusters in Geant4 with FastCaloSim V2 and FastCaloGAN for pion energies ranging from 65 to 524 \(\text {GeV}\). Although FastCaloSim V2 slightly overestimates the number of cells for all energies, FastCaloGAN significantly underestimates the number of cells and this becomes more pronounced at higher energy. Studies of additional jet variables, many of which are shown in Sect. 8.1.2, confirmed that FastCaloSim V2 has better modelling for higher energy hadrons. Therefore, FastCaloSim V2 is used to simulate hadrons with \(E_{\text {kin}} > \text {256--512}\) \(\text {GeV}\). As shown in Sect. 8.1.2, despite these discrepancies, the modelling of higher-level objects such as jets is sufficient for physics analysis.

Fig. 21

Number of cells in the leading cluster for pions in the barrel at different energies in Geant4 (black triangles), FastCaloSim V2 (red diamonds), and FastCaloGAN (blue stars). The statistical uncertainties are shown, but may be smaller than the markers

Fig. 22

Energy response, defined as the ratio of the reconstructed energy in the calorimeter cells to the kinetic energy of the particle, for a photons in \(1.05< |\eta | < 1.10 \) and b pions in \(0.20< |\eta | < 0.25\). The red dotted points represent the response derived at discrete energies, using Geant4 simulated single particles. The black line is a spline fit used to interpolate between discrete energy points. The statistical uncertainties are shown, but are similar in size to the points or smaller

Muon Punch-Through

The muon punch-through parameterization described in Sect. 6 is used to simulate particles punching through the calorimeter. After the multiplicity and properties of the secondaries are determined using the punch-through parameterization, their path through the muon spectrometer is simulated using Geant4.

Energy Interpolation

The FastCaloSim V2 and FastCaloGAN parameterizations are derived using samples with logarithmically spaced discrete energies, which need to be extrapolated to particles of all energies. In FastCaloSim V2, a piece-wise third-order polynomial spline function is fitted to the total energy response to interpolate to intermediate energies. Furthermore, linear extrapolation is used to reach energies beyond those of the simulated input samples. The spline interpolations are generated for each particle and each \(\eta \) slice and are used to rescale the total energy response from the parameterization points. An example of the energy response and fitted splines for photons and pions in the barrel region is shown in Fig. 22. The energy response for high-energy photons is slightly reduced due to leakage into the Tile calorimeter. In FastCaloGAN, the conditioning on the particle momentum creates a model that can produce particles of any energy.

In addition to the interpolation of the total energy response, the other longitudinal and lateral shower shape properties also need to be interpolated. In FastCaloGAN, the shape properties are interpolated automatically by the GANs. In FastCaloSim V2, the shape interpolation is instead done by randomly selecting the parameterization from one of the neighbouring energy grid points, with a selection probability that is linear in \(\log (E_{\text {kin}})\) and fitted such that unit probability is reached at the grid energy points.
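
The random selection used for the FastCaloSim V2 shape interpolation can be sketched as follows. This is a simplified illustration under stated assumptions: the function name `pick_grid_energy` is hypothetical, only the two grid points that bracket the requested energy are considered, and energies outside the grid are clamped to the nearest grid point.

```python
import bisect
import math
import random


def pick_grid_energy(e_kin, grid, rng):
    """Pick the grid energy whose shape parameterization is used.

    Hypothetical helper. `grid` is a sorted list of positive energies.
    The probability of picking the upper neighbour rises linearly in
    log(E_kin) from 0 at the lower grid point to 1 at the upper one.
    """
    if e_kin <= grid[0]:
        return grid[0]
    if e_kin >= grid[-1]:
        return grid[-1]
    upper = bisect.bisect_right(grid, e_kin)
    e_low, e_high = grid[upper - 1], grid[upper]
    p_high = (math.log(e_kin) - math.log(e_low)) / (math.log(e_high) - math.log(e_low))
    return e_high if rng.random() < p_high else e_low


rng = random.Random(7)
grid_gev = [1.0, 2.0, 4.0, 8.0]
chosen = pick_grid_energy(3.0, grid_gev, rng)
```

With this choice a particle exactly at a grid energy always uses that grid point, while a particle at the geometric mean of two grid energies uses either neighbour with equal probability.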

In the two transition regions between FastCaloSim V2 and FastCaloGAN (for hadrons in the ranges 8–16 \(\text {GeV}\) and 256–512 \(\text {GeV}\)), a spline is used to interpolate between the two models. A smooth energy–response transition between the two models is obtained, since the simulated energies are always scaled to the energy from Geant4. For electrons and photons, the spline for the energy response is fitted down to 16 \(\text {MeV}\), below which a linear extrapolation is used. For hadrons, the energy response is fitted down to a kinetic energy of 200 \(\text {MeV}\), below which Geant4 is used for the simulation.

Corrections

Four different corrections are applied to the calorimeter parameterization in AtlFast3. However, the energy resolution correction discussed in Sect. 7.4.1 and the energy \(\phi \)-modulation correction discussed in Sect. 7.4.2 are only applied to FastCaloSim V2.

Energy Resolution Correction

The simulation of the resolution of the total energy in FastCaloSim V2 is improved by reweighting the distribution of simulated energies produced by FastCaloSim V2 to the distribution from Geant4. The ratio of the Geant4 simulated energy to the FastCaloSim V2 simulated energy for each PCA bin is used to create a probability density function (pdf). For each simulated total energy, the pdf returns an associated probability. During simulation, for each simulated energy a uniform random number in [0,1] is drawn and, if the number is smaller than the probability obtained from the pdf, the simulated energy is accepted. If the energy is rejected, then the energy simulation step discussed in Sect. 4.1 is repeated. This probabilistic reweighting (rw) obtains good agreement with the Geant4 distribution. Figure 23 shows the resolution for photons, as an example, before and after the correction, with the RMS of each distribution indicated. The RMS is calculated using at least 99% of all events and in a \(\pm 3\sigma \) range around the mean.
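
The accept-reject logic of the probabilistic reweighting can be sketched with a toy example. The names `build_acceptance` and `simulate_reweighted`, the binned representation of the two energy distributions, and the normalization of the ratio to its maximum (so that it can be used as a probability) are assumptions for illustration; the paper does not specify these details.

```python
import random


def build_acceptance(g4_hist, fcs_hist):
    """Per-bin acceptance probability from the ratio of two histograms.

    Assumption: the ratio is scaled by its maximum so that all values
    lie in [0, 1].
    """
    ratios = [g4 / fcs if fcs > 0 else 0.0 for g4, fcs in zip(g4_hist, fcs_hist)]
    largest = max(ratios)
    if largest <= 0.0:
        raise ValueError("no overlap between the two histograms")
    return [ratio / largest for ratio in ratios]


def simulate_reweighted(simulate_energy, bin_of, acceptance, rng, max_tries=10000):
    """Repeat the energy simulation until a simulated energy is accepted."""
    for _ in range(max_tries):
        energy = simulate_energy()
        # Accept if a uniform random number is below the bin probability.
        if rng.random() < acceptance[bin_of(energy)]:
            return energy
    raise RuntimeError("no energy accepted")


rng = random.Random(3)
# Toy case: the fast simulation fills two bins equally, the reference 1:3.
acceptance = build_acceptance([1.0, 3.0], [2.0, 2.0])
energy = simulate_reweighted(
    lambda: rng.choice([0.25, 0.75]),
    lambda e: 0 if e < 0.5 else 1,
    acceptance,
    rng,
)
```

In the toy case the accepted energies populate the two bins in the ratio 1:3, matching the reference histogram.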

Fig. 23

The simulated total energy before (blue stars) and after (red diamonds) probabilistic reweighting for a photon of energy 262 \(\text {GeV}\) in the range \(0.4< |\eta | < 0.45\) compared with Geant4 (black triangles). The RMS of each distribution is indicated in the legend. The statistical uncertainties are shown, but may be smaller than the markers

Fig. 24

a The total energy response exhibits a dependence on the impact position in \(\phi \) of the particle in the calorimeter cell (\(|\phi _\mathrm {mod}|\)), shown for 65 \(\text {GeV}\) photons with \(0.2< |\eta | < 0.25\) (Geant4). The ratio has been shifted, such that the mean ratio of the energy from Geant4 to the true energy is unity. b The impact of the correction on Geant4 simulation (gray triangles are without correction; black are with corrections) and the result of the stand-alone simulation for 131 \(\text {GeV}\) photons with \(1.65< |\eta | < 1.7\) to which the correction has been applied as well as the reweighting described in Sect. 7.4.1. The statistical uncertainties are shown in the error bars

Energy \(\phi \)-Modulation Correction

Due to the accordion structure of the EM calorimeter, the total deposited energy is modulated in the \(\phi \)-direction, as shown in Fig. 24a, where \(|\phi _\mathrm {mod}| = |\mathrm {mod}(\phi _{\mathrm {calo}}, \pi /512)|\). The calibration applied during the ATLAS electron and photon reconstruction makes a correction for the \(\phi \)-modulation in the energy response observed in Geant4; this calibration impacts the resolution of the reconstructed energy. The modulation is not reproduced in FastCaloSim V2, because it does not have a functional dependence on \(\phi \). The resolution of showers in the electromagnetic calorimeter produced by FastCaloSim V2 is corrected by deriving the energy parameterization of Sect. 4.1 after removing the modulation of the energy as a function of \(\phi \) in the reference samples. This procedure is applied to particles with energies of at least 16 \(\text {GeV}\); below this threshold, the effect is negligible and can be ignored. Figure 24b shows the energy response from Geant4 for photons with and without the removal of the \(\phi \) modulation compared with the prediction from FastCaloSim V2. Good agreement in the modelling of the resolution between FastCaloSim V2 and Geant4 is obtained for the Geant4 samples with the \(\phi \)-modulation removed. As a consequence of this strategy, during the reconstruction of electrons and photons simulated with AtlFast3, a set of calibrations without a correction for the energy modulation in the \(\phi \)-direction must be applied, differing from the calibrations used for full simulation samples. This procedure particularly improves the modelling of the resolution of the calibrated energies for photons and electrons in AtlFast3.
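
The definition of \(|\phi _\mathrm {mod}|\) can be written out as a short sketch. The name `phi_mod` is hypothetical, and the use of a C-style remainder (`math.fmod`, which keeps the sign of its first argument) is an assumption, since the text does not specify the convention of the mod operation.

```python
import math

# Period of the modulation as quoted in the text: pi / 512.
PHI_PERIOD = math.pi / 512.0


def phi_mod(phi_calo):
    """Return |mod(phi_calo, pi/512)| using a C-style remainder (assumption)."""
    return abs(math.fmod(phi_calo, PHI_PERIOD))


offset = phi_mod(1.234)
```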

Fig. 25

Energy response correction factors as a function of the true kinetic energy for protons, neutrons, and kaons (left) in the barrel and their antiparticles (right). The kinetic energy for antiparticles includes their mass. The coloured bands indicate the size of the statistical uncertainty in the correction

Hadron Total Energy Correction

The hadron total energy correction accounts for the difference between the charged-pion response, which is used to derive the calorimeter parameterizations, and the response to other hadron species. It is particularly important at low energies, where the kinetic energy of a hadron is close to its mass.

The hadron total energy correction is derived using simulated samples of (anti)protons, (anti)neutrons, and (anti)kaons as described in Sect. 3. Using Geant4, the parameterized energy is corrected by the ratio of the mean simulated hadron energy response, \(\langle E_{\text {G4}}^{\text {h}} \rangle \), to the mean simulated pion energy response, \(\langle E_{\text {G4}}^{\pi } \rangle \). A further rescaling must be applied, because the reference samples were generated using the momentum of the particle, while the \(E_{\text {kin}}\) is used for the parameterization. This is achieved by calculating the pion-to-hadron ratio of kinetic energies for each true momentum in the reference samples. During AtlFast3 simulation, hadrons are then simulated using the charged-pion parameterization that provides the total energy \(E_{\text {Total}}\) given the kinetic energy of the pion, \(E_{\text {kin}}^{\pi ,\text {true}}\), but with an additional correction based on the kinetic energy of the hadron, \(E_{\text {kin}}^{\text {h,true}}\). The corrected energy response is then given by

$$\begin{aligned} E_{\text {Total}}^{\text {corr h}} = \frac{\langle E_{\text {G4}}^{\text {h}} \rangle }{ \langle E_{\text {G4}}^{\pi } \rangle } \times \frac{E_{\text {kin}}^{\pi ,\text {true}}}{E_{\text {kin}}^{\text {h,true}}} \times E_{\text {Total}}. \end{aligned}$$

The value \(E_{\text {Total}}^{\text {corr h}}\) is the corrected energy. The hadron total energy correction is linearly interpolated between the logarithmically spaced energy grid points. Figure 25 shows an example of the factor applied for the hadron total energy correction as a function of the true \(E_{\text {kin}}\) for protons, neutrons, and kaons. The hadron total energy correction is largest at small kinetic energies and decreases with increasing energy. It does not depend strongly on \(\eta \) and is similar for protons and neutrons.
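
The two factors entering the hadron total energy correction can be illustrated numerically. The sketch below is not the ATLAS code: the function names are hypothetical, the mean responses passed in the example are invented numbers, and the masses are standard rounded values in \(\text {GeV}\). It evaluates the kinetic energy as \(E_{\text {kin}} = \sqrt{p^2 + m^2} - m\) for a common momentum and then applies the product of the response ratio and the kinetic-energy ratio.

```python
import math

M_PION = 0.13957    # charged-pion mass in GeV (rounded)
M_PROTON = 0.93827  # proton mass in GeV (rounded)


def kinetic_energy(momentum, mass):
    """Kinetic energy sqrt(p^2 + m^2) - m in natural units."""
    return math.hypot(momentum, mass) - mass


def corrected_hadron_energy(e_total, mean_e_g4_hadron, mean_e_g4_pion,
                            e_kin_pion_true, e_kin_hadron_true):
    """Apply the response ratio and the kinetic-energy ratio to E_Total."""
    response_ratio = mean_e_g4_hadron / mean_e_g4_pion
    kinetic_ratio = e_kin_pion_true / e_kin_hadron_true
    return response_ratio * kinetic_ratio * e_total


# Example at a momentum of 1 GeV; the mean responses are illustrative only.
p = 1.0
e_corr = corrected_hadron_energy(
    e_total=0.60,
    mean_e_g4_hadron=0.30,
    mean_e_g4_pion=0.60,
    e_kin_pion_true=kinetic_energy(p, M_PION),
    e_kin_hadron_true=kinetic_energy(p, M_PROTON),
)
```

At this momentum the kinetic energy of the pion is about twice that of the proton, which illustrates why the rescaling matters most when the kinetic energy of the hadron is comparable to its mass.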

Residual Energy Response Correction

The residual energy response correction is applied to correct the total energy response for electrons, photons, and pions from the parameterizations to match the average response of Geant4 after the full ATLAS simulation and reconstruction chain. This correction can, therefore, correct for differences introduced during digitization and reconstruction. The residual energy response correction is the ratio of the average reconstructed energy when using Geant4, \(\langle E_{\text {G4}} \rangle \), to the average reconstructed energy from AtlFast3, \(\langle E_{\text {AF3}} \rangle \). The residual energy response correction is calculated and applied as follows:

$$\begin{aligned} E_{\text {Total}}^{\text {corr res}} (\mathrm {p}) = \langle E_{\text {G4}} (\mathrm {p}) \rangle / \langle E_{\text {AF3}} (\mathrm {p}) \rangle \times E_{\text {Total}} (\mathrm {p}), \end{aligned}$$

where \(\mathrm {p} = [e, \gamma , \pi ]\).

It is derived for each parameterization grid point and linearly interpolated between the simulated energy points. Figure 26 shows an example of the derived residual energy response correction as a function of the true \(E_{\text {kin}}\) for photons, electrons, and pions. The residual energy response correction is at the per-mil level for electrons and photons and only slightly larger for pions and hence only applied when statistically significant.
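
A minimal sketch of how a correction factor derived at grid points could be interpolated and applied is given below. The function names are hypothetical, and two details are assumptions not stated in the text: the interpolation is taken to be linear in \(E_{\text {kin}}\) itself, and energies outside the grid use the factor of the nearest grid point.

```python
import bisect


def interpolate_correction(e_kin, grid_energies, grid_factors):
    """Linear interpolation of the factor between neighbouring grid points."""
    if e_kin <= grid_energies[0]:
        return grid_factors[0]
    if e_kin >= grid_energies[-1]:
        return grid_factors[-1]
    upper = bisect.bisect_right(grid_energies, e_kin)
    e0, e1 = grid_energies[upper - 1], grid_energies[upper]
    f0, f1 = grid_factors[upper - 1], grid_factors[upper]
    t = (e_kin - e0) / (e1 - e0)
    return f0 + t * (f1 - f0)


def apply_residual_correction(e_total, e_kin, grid_energies, mean_e_g4, mean_e_af3):
    """Scale E_Total by <E_G4>/<E_AF3>, interpolated to the given E_kin."""
    factors = [g4 / af3 for g4, af3 in zip(mean_e_g4, mean_e_af3)]
    return interpolate_correction(e_kin, grid_energies, factors) * e_total


# Illustrative numbers only: a 2% correction at the highest grid point.
corrected = apply_residual_correction(
    100.0, 3.0, [1.0, 2.0, 4.0], [1.0, 1.0, 1.02], [1.0, 1.0, 1.0]
)
```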

Fig. 26

Residual energy response correction factors as a function of the true kinetic energy for photons, electrons, and pions in the endcap. The coloured bands indicate the size of the statistical uncertainty in the correction

Fig. 27

The ratio of the energies assigned to each cuboid of the second electromagnetic barrel layer in AtlFast3 and Geant4 for a photon of 65 \(\text {GeV}\) in the range \(0.20< |\eta | < 0.25\) using a simplified cuboid geometry and after applying the correction for the simplified geometry

Simplified Geometry Shower Shape Correction

The hits generated by FastCaloSim V2 or FastCaloGAN are assigned to calorimeter cells using a simplified cuboid geometry. This introduces a bias in the energy distribution, which can result in a significant number of hits being assigned to neighbouring calorimeter cells. To account for this effect while maintaining the reduced simulation CPU time afforded by the simplified geometry, a small displacement in \(\phi \) is assigned to each hit before geometrically matching it to a cell with the simplified geometry. This procedure is substantially easier and faster than geometrically matching a hit to the cells in the complex ATLAS liquid-argon accordion structure.

A pdf is derived from the difference between the cell assignment probabilities calculated in Geant4 and FastCaloSim V2. The correction is made using the pdf to randomly assign a displacement in \(\phi \) to a hit. Figure 27 shows the bias in deposited energy in each cuboid before and after this correction has been applied. Good agreement in the cell energy between AtlFast3 and Geant4 is observed once this correction has been applied.

Performance of AtlFast3

The performance of AtlFast3 is studied by comparing the modeling of reconstructed quantities (Sect. 8.1) and important kinematic variables from physics analyses (Sect. 8.2) in AtlFast3, Geant4, and AtlFastII. The reconstructed objects that depend on the performance of the calorimeter are electrons and photons, jets, and \(\tau \)-leptons. For b-tagging and for other particles, such as muons, the performance of AtlFast3 depends primarily on the performance of the tracking detectors. This is studied as part of the validation of AtlFast3, and we focus on the specific case of muon punch-through in this paper, because no significant differences from Geant4 are observed. Section 8.3 discusses the CPU performance of AtlFast3.

Performance of AtlFast3 on Objects for Physics Analysis

Reconstructed Photons and Electrons

Electron and photon candidates are reconstructed from topological clusters of deposited energy in the electromagnetic calorimeter, and in the case of electrons, tracks in the inner detector are matched to the clusters [42]. For physics analysis, identification criteria are defined by requirements on shower shape and track quality. These identification criteria are labelled as ‘loose’, ‘medium’, and ‘tight’. The identification of electrons uses information from the inner detector, so a single electron sample with uniform \(p_{\text {T}}\), \(\eta \), and \(\phi \)-coverage is generated at the center of the ATLAS detector for validation in this section. These samples are then simulated with pileup overlaid. The efficiencies for both electrons and photons are validated on an inclusive \(\eta \) and \(p_{\text {T}}\) range that extends beyond what is typically considered for performance studies. Figure 28a compares the electron identification efficiencies for the ‘tight’ criterion as a function of the reconstructed \(p_{\text {T}}\) for AtlFastII, AtlFast3, and Geant4. While AtlFastII agrees with Geant4 to better than 5% in the electron \(p_{\text {T}}\) range from 30 \(\text {GeV}\) to 300 \(\text {GeV}\), AtlFast3 agrees with Geant4 to within 2% in most of the phase space. The ‘tight’ photon identification efficiency is shown in Fig. 28b as a function of the photon \(p_{\text {T}}\) for AtlFastII, AtlFast3, and Geant4. Except at very low \(p_{\text {T}}\), AtlFast3 agrees with Geant4 to within a few percent, with better modelling than AtlFastII.

Fig. 28

‘Tight’ identification efficiencies for single electrons with true energy greater than 20 \(\text {GeV}\) (a) and photons from \(H \rightarrow \gamma \gamma \) decays (b) inclusive in \(|\eta | < 2.5\) as a function of their reconstructed \(p_{\text {T}}\) for Geant4 (black triangles), AtlFastII (blue stars), and AtlFast3 (red diamonds). The statistical uncertainties are shown, but may be smaller than the size of the markers

Reconstructed Jets

Jets are reconstructed using a variety of reconstruction algorithms and inputs from the calorimeters and the inner detector. In addition, different pileup mitigation and jet grooming algorithms are applied.

Particle-flow (EMPFlow) jets, which are constructed using EM-scale topological clusters [45], are reconstructed with the anti-\(k_{t}\) algorithm [44, 65] with a radial distance parameter \(R=0.4\), using charged constituents associated with the primary vertex [41] and neutral particle-flow constituents as inputs [45]. Large-radius jets (\(R=1.0\)) are reconstructed by applying the anti-\(k_t\) algorithm to locally calibrated topological clusters (LCTopo) [49] and the newer alternative of Unified Flow Objects (UFO) [66].

The performance of AtlFast3 with EMPFlow jets is assessed using the \(p_{\text {T}}\) of the leading jet and the pseudorapidity distribution of the sub-leading jet in a \(t\bar{t}\) sample, which are shown in Fig. 29. For both distributions, AtlFastII and AtlFast3 are consistent with Geant4 at the percent level. In the forward \(\eta \) regions of Fig. 29b, AtlFast3 shows better agreement than AtlFastII with Geant4 thanks to the updated parameterization in the forward region of the detector.

Fig. 29

The transverse momentum distribution of the leading jets (a) and the pseudorapidity distribution of the sub-leading jets (b) in a \(t\bar{t}\) sample with Geant4 (black triangles), AtlFastII (blue stars), and AtlFast3 (red diamonds). The jets are EMPFlow jets with \(R=0.4\). The statistical uncertainties are shown, but may be smaller than the size of the markers

For higher \(p_{\text {T}}\) jets, the simulation of the detailed structure within the jet plays an important role in the efficiency and classification. To provide better coverage for higher jet \(p_{\text {T}}\), the \(Z^\prime \) and \(W^\prime \) boson samples were reweighted to have a flat leading-jet \(p_{\text {T}}\) spectrum as described in Sect. 3.3. Figure 30 shows the number of charged constituents for leading jets with \(p_{\text {T}} > 200\) \(\text {GeV}\) from a sample containing \(Z^\prime \rightarrow t{\bar{t}}\) events. Figure 30a shows EMPFlow jets reconstructed with the anti-\(k_{t}\) algorithm with a radius parameter \(R=0.4\), while Fig. 30b shows UFO jets reconstructed with the anti-\(k_{t}\) algorithm with \(R=1.0\). The number of constituents in the EMPFlow jets is significantly underestimated by AtlFastII, while AtlFast3 reproduces the distribution from Geant4 within statistical uncertainties for jets with more than 14 constituents. For jets with fewer constituents, AtlFast3 slightly underestimates the number of constituents. For the UFO jets, agreement with Geant4 improves significantly, going from a 20% difference in AtlFastII to less than 10% in AtlFast3.

Fig. 30

Distribution of the number of constituents in the leading jets for EMPFlow jets with \(R=0.4\) (a) and UFO jets with \(R=1.0\) (b) in the \(Z^\prime \) sample in Geant4 (black triangles), AtlFastII (blue stars), and AtlFast3 (red diamonds). The statistical uncertainties are shown, but may be smaller than the size of the markers

Variables commonly used in jet-tagging algorithms include the energy-correlation-function ratio, \(D_2\), for two-body decays and the n-subjettiness ratio, \(\tau _{32}\), for three-body decays [67, 68]. Figure 31 shows the \(D_2\) variable reconstructed using the UFO algorithm with Geant4, AtlFastII, and AtlFast3 on a \(W^\prime \) sample. AtlFast3 significantly improves the modelling of \(D_2\), particularly at lower values.

Fig. 31

The \(D_2\) variable for the leading jets in a \(W^\prime \) sample reconstructed using the UFO algorithm with radius parameter \(R=1.0\) with Geant4 (black triangles), AtlFastII (blue stars), and AtlFast3 (red diamonds). The statistical uncertainties are shown, but may be smaller than the size of the markers

Figure 32 shows \(\tau _{32}\) for different large-radius jet algorithms. For the UFO jets in Fig. 32a, AtlFastII reproduces the distribution of Geant4 to within 20% and AtlFast3 improves this further to within 10%. For the LCTopo jets shown in Fig. 32b, the modelling from AtlFastII is poor, but is significantly improved with AtlFast3, which obtains agreement to within 20%. The improvement for LCTopo is expected to be larger than for UFO, because UFO includes tracking information.

Fig. 32

The \(\tau _{32}\) variable for the leading jets in a \(Z^\prime \) sample reconstructed using the UFO algorithm with radius parameter \(R=1.0\) (a) and the LCTopo algorithm (b) with Geant4 (black triangles), AtlFastII (blue stars), and AtlFast3 (red diamonds). The statistical uncertainties are shown, but may be smaller than the size of the markers, and the dark blue arrows indicate that a point is beyond the y-axis range

Reconstructed Hadronic \(\tau \)-Lepton Decays

Hadronically decaying \(\tau \)-leptons are reconstructed in the ATLAS detector using their decays to one or three charged hadrons along with neutral particles [69,70,71]. The decays are labelled by the number (Y) of charged particles and the number (X) of neutral particles, YpXn. The \(\tau \) reconstruction algorithm is seeded by the presence of a reconstructed jet. Figure 33 compares the number of events in different \(\tau \) decay topologies identified in a \(Z^\star /\gamma ^\star \rightarrow \tau \tau \) Drell–Yan (DY) sample, filtered for an off-shell mass of 2.0–2.25 \(\text {TeV}\), for Geant4, AtlFastII, and AtlFast3 using \(\tau \)-candidates with \(p_{\text {T}} {}>10~\text {GeV}{}\) and \(|\eta |<2.5\). For all cases, except 1pXn, both AtlFastII and AtlFast3 agree with Geant4 to better than 10% for reconstructed \(\tau \) matched to a true \(\tau \) and better than 5% for reconstructed \(\tau \) unmatched to a true \(\tau \) (i.e., for fake \(\tau \)-leptons). The 1pXn case has more neutral calorimeter clusters, and the improved lateral correlations of calorimeter clusters result in better agreement of AtlFast3 with Geant4. The performance of AtlFastII and AtlFast3 is similar, with slightly better performance in AtlFastII for true \(\tau \)-leptons and slightly better performance in AtlFast3 for fake \(\tau \)-leptons.

Accurate modelling of the structure of the constituents within \(\tau \)-jets can be challenging for fast simulation, but is crucial in obtaining an accurate simulation of \(\tau \) candidates. Figure 34 compares the numbers of simulated clusters within true (left) and fake (right) \(\tau \) candidates. In both cases, AtlFastII significantly underestimates the number of clusters, while AtlFast3 is consistent with Geant4 within statistical uncertainties.

Reconstructed Muons

Muons are reconstructed from tracks in the muon spectrometer matched to tracks in the inner detector. The \(p_{\text {T}}\) distributions of all reconstructed muons from Geant4, AtlFastII, and AtlFast3 \(Z \rightarrow \mu \mu \) samples are compared in Fig. 35a. Both AtlFastII and AtlFast3 reproduce the \(p_{\text {T}}\) spectrum from Geant4. Figure 35b compares the number of muon candidates passing the different muon reconstruction working points. Both AtlFastII and AtlFast3 agree with Geant4 within uncertainties as expected, because prompt muons are almost exclusively simulated with Geant4 for all three samples.

Fig. 33

Hadronic \(\tau \)-lepton decay modes for reconstructed \(\tau \)-leptons matched to true \(\tau \)-leptons (a) and reconstructed \(\tau \)-leptons not matched to true \(\tau \)-leptons (b) in a \(Z^\star /\gamma ^\star \rightarrow \tau \tau \) Drell–Yan sample filtered for an off-shell mass of 2.0–2.25 \(\text {TeV}\). The decays with one or three charged-particle tracks are denoted by 1p and 3p, respectively. X\((=1,2,3)\) denotes the number of neutral particles. The statistical uncertainties are shown, but may be smaller than the size of the markers

Fig. 34

Number of clusters in hadronic \(\tau \)-decay candidates reconstructed with one charged track (1p) and either matched (a) or not matched (b) to a true \(\tau \)-lepton in a \(Z^\star /\gamma ^\star \rightarrow \tau \tau \) Drell–Yan sample filtered for an off-shell mass of 2.0–2.25 \(\text {TeV}\). The statistical uncertainties are shown, but may be smaller than the size of the markers

Fig. 35

(a) Reconstructed muon transverse momentum distribution and (b) identification efficiency for different muon working points for a \(Z \rightarrow \mu \mu \) sample generated with \(p_{\text {T}} (Z) = 0\) for Geant4, AtlFastII, and AtlFast3. The statistical uncertainties are shown, but may be smaller than the size of the markers

The performance of the muon punch-through simulation is validated by comparing misidentified muon candidates from hadronic activity produced in fully simulated Geant4 events with those produced by AtlFast3. Figure 36 compares the reconstructed \(p_{\text {T}}\) of fake muons created by 500 \(\text {GeV}\) single pions (inclusive in \(\eta \)) between Geant4 and AtlFast3. As muon punch through is not simulated in AtlFastII, only Geant4 and AtlFast3 are shown. Agreement to better than 20% is observed in most parts of the distributions.

The number of muon segments reconstructed in the muon spectrometer within jets is shown in Fig. 37. A \(Z^\prime \rightarrow t\bar{t} \) sample is used, because it includes prompt muons from the (anti-)top quark decays and particles produced by jets punching through the calorimeter. AtlFastII underestimates the number of muon segments, while AtlFast3 shows better agreement with Geant4. In particular, AtlFastII reproduces the number of muon segments only up to three, while AtlFast3 reproduces the number of muon segments up to seven.

Fig. 36

Comparison of muon punch-through simulation in AtlFast3 and Geant4 as a function of the \(p_{\text {T}}\) of misidentified muons from 500 \(\text {GeV}\) single-pion events. The statistical uncertainties are shown, but may be smaller than the size of the markers

Fig. 37

Comparison of muon segments in jets reconstructed with a radius parameter of 0.4 using the EMPFlow algorithm in a \(Z^\prime \rightarrow t\bar{t} \) sample with a \(Z^\prime \) mass of 4 \(\text {TeV}\) in Geant4 (black triangles), AtlFastII (blue stars), and AtlFast3 (red diamonds). The statistical uncertainties are shown, but may be smaller than the size of the markers

Reconstructed \(E_{\text {T}}^{\text {miss}}\)

The missing transverse momentum (\(E_{\text {T}}^{\text {miss}}\)) [46] is the negative vector sum of the reconstructed momenta of EMPFlow jets, electrons, photons, \(\tau \)-leptons, and muons, plus any other tracks associated with the hard-scatter primary vertex, and is used to look for transverse momentum imbalance in pp collisions. The performance of the \(E_{\text {T}}^{\text {miss}}\) reconstruction is therefore sensitive to the modelling of all reconstructed objects. Figure 38 shows the difference between the true \(E_{\text {T}}^{\text {miss}}\) and the reconstructed \(E_{\text {T}}^{\text {miss}}\) in the x and y directions in \(t\bar{t}\) events. Both AtlFastII and AtlFast3 reproduce the \(E_{\text {T}}^{\text {miss}}\) distribution from Geant4 within the statistical uncertainties. Moreover, no significant differences between AtlFastII and AtlFast3 are observed, and this is attributed to their good agreement in the jet \(p_{\text {T}}\) shown in Fig. 29a.
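The definition above can be written as a short worked example. This is a minimal sketch that assumes each reconstructed object is reduced to a (\(p_{\text {T}}\), \(\phi \)) pair; the function names and the example values are hypothetical and do not correspond to the ATLAS reconstruction software.

```python
import math


def missing_transverse_momentum(objects):
    """Return (mpx, mpy, met) as the negative vector sum of transverse momenta.

    `objects` is an iterable of (pt, phi) pairs for the reconstructed
    objects and tracks entering the sum (illustrative input format).
    """
    objs = list(objects)  # allow generators to be iterated twice
    sum_px = sum(pt * math.cos(phi) for pt, phi in objs)
    sum_py = sum(pt * math.sin(phi) for pt, phi in objs)
    mpx, mpy = -sum_px, -sum_py
    return mpx, mpy, math.hypot(mpx, mpy)


def met_residuals(true_xy, reco_xy):
    """Per-component difference, true minus reconstructed, in x and y."""
    return true_xy[0] - reco_xy[0], true_xy[1] - reco_xy[1]


# Made-up example: two back-to-back objects with unequal pT (in GeV).
mpx, mpy, met = missing_transverse_momentum([(50.0, 0.0), (30.0, math.pi)])
print(mpx, mpy, met)
```

Because every reconstructed object enters the sum, a mismodelling of any single object type propagates directly into the x and y components compared between the simulations.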

Performance of AtlFast3 in Physics Analysis

The performance of AtlFast3 for physics analysis is studied using reconstructed particle masses from selected physics analyses.

The reconstructed mass of Higgs bosons decaying into two photons is used to further evaluate the performance for photons. Events are selected by requiring two photons with \(p_{\text {T}} > 0.35 m_{\gamma \gamma }\) and \(p_{\text {T}} > 0.25 m_{\gamma \gamma }\), and with \(|\eta | < 1.37\) or \(1.52< |\eta | < 2.47\). A comparison of Geant4, AtlFastII, and AtlFast3 is shown in Fig. 39. Both AtlFastII and AtlFast3 reproduce the mean of the distribution with high accuracy. AtlFast3 has better modelling of the width and agrees with Geant4 to within 5%, while AtlFastII overestimates the width of the distribution by 10%.
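The diphoton selection can be sketched as follows. This is a minimal illustration that treats photons as massless (\(p_{\text {T}}\), \(\eta \), \(\phi \)) triplets and assumes that the two relative \(p_{\text {T}}\) thresholds apply to the leading and subleading photon, respectively; the function names are hypothetical and further identification or isolation requirements are not modelled.

```python
import math


def in_acceptance(eta: float) -> bool:
    """|eta| < 1.37 or 1.52 < |eta| < 2.47, as quoted in the selection."""
    abs_eta = abs(eta)
    return abs_eta < 1.37 or 1.52 < abs_eta < 2.47


def diphoton_mass(pt1, eta1, phi1, pt2, eta2, phi2) -> float:
    """Invariant mass of two massless particles from pT, eta, and phi."""
    m_squared = 2.0 * pt1 * pt2 * (math.cosh(eta1 - eta2) - math.cos(phi1 - phi2))
    return math.sqrt(max(m_squared, 0.0))  # guard against rounding below zero


def passes_selection(photon_a, photon_b) -> bool:
    """Apply the relative-pT and pseudorapidity requirements to a photon pair.

    Each photon is a (pt, eta, phi) tuple. Assigning 0.35 to the leading
    photon and 0.25 to the subleading photon is an assumption of this sketch.
    """
    lead, sub = sorted((photon_a, photon_b), key=lambda p: p[0], reverse=True)
    mass = diphoton_mass(*lead, *sub)
    return (
        lead[0] > 0.35 * mass
        and sub[0] > 0.25 * mass
        and in_acceptance(lead[1])
        and in_acceptance(sub[1])
    )


# Made-up example: two central, back-to-back 62.5 GeV photons.
print(diphoton_mass(62.5, 0.0, 0.0, 62.5, 0.0, math.pi))
```

Expressing the thresholds as fractions of \(m_{\gamma \gamma }\) rather than as fixed values makes the requirement scale with the reconstructed mass of the pair.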

Fig. 38

The difference between the true \(E_{\text {T}}^{\text {miss}}\) and the reconstructed \(E_{\text {T}}^{\text {miss}}\) in the x (a) and y (b) directions for a \(t\bar{t}\) sample for Geant4 (black triangles), AtlFastII (blue stars), and AtlFast3 (red diamonds). The statistical uncertainties are shown, but may be smaller than the size of the markers

Fig. 39

The reconstructed diphoton invariant mass distribution from a selection targeting events with Higgs boson decays into two photons. Events are selected by requiring two photons with \(p_{\text {T}} > 0.35 m_{\gamma \gamma }\) and \(p_{\text {T}} > 0.25 m_{\gamma \gamma }\), and with \(|\eta | < 1.37\) or \(1.52< |\eta | < 2.47\). The statistical uncertainties are shown, but may be smaller than the size of the markers

Events containing Drell–Yan processes are used extensively in physics performance studies as well as to probe the Standard Model. The invariant mass of the Z boson obtained when using Geant4, AtlFastII, and AtlFast3 is shown in Fig. 40a and b, with the Z boson reconstructed from either a pair of muons or a pair of electrons. The visible invariant mass [69] of off-shell 2.0–2.25 \(\text {TeV}\) \(Z^\star /\gamma ^\star \) bosons reconstructed using the hadronic decay modes of two \(\tau \)-leptons is shown in Fig. 40c, where the width is slightly overestimated by AtlFast3. Otherwise, no significant differences are observed between the three distributions, which validates the simulation of prompt muons in AtlFast3.

The performance of the simulation for jet substructure is evaluated with \(Z^\prime \rightarrow t\bar{t} \) events as the benchmark for ‘beyond the Standard Model’ signatures with boosted high-\(p_{\text {T}}\) objects in the calorimeter. The leading jet’s mass with its constituents calibrated to the EM scale is shown in Fig. 41; the distribution has a large peak near the mass of the top quark, and a small peak at the W boson mass. AtlFastII significantly underestimates the mass and the width of both peaks compared to Geant4, while AtlFast3 is in better agreement with the Geant4 distribution than AtlFastII.

Fig. 40

Invariant mass distribution from a selection targeting events with a Z boson decaying into (a) two muons or (b) two electrons with \(p_{\text {T}} > 25\) \(\text {GeV}\) and \(|\eta | < 1.37\) or \(1.52< |\eta | < 2.47\), and (c) the visible part of the invariant mass of two hadronically decaying \(\tau \)-leptons in Drell–Yan \(Z^\star /\gamma ^\star \rightarrow \tau \tau \) events filtered for an off-shell mass of 2.0–2.25 \(\text {TeV}\). The statistical uncertainties are shown, but may be smaller than the size of the markers

Fig. 41

Distribution of the (left) number of constituents in the leading \(R=0.4\) EMPFlow jets in the \(W^\prime \) sample and (right) the mass of trimmed \(R=1.0\) UFO jets in the \(Z^\prime \) sample in Geant4 (black triangles), AtlFastII (blue stars), and AtlFast3 (red diamonds). The statistical uncertainties are shown, but may be smaller than the size of the markers

Computing Performance with AtlFast3

The time required to simulate a particle in Geant4 increases with energy due to increasing shower depth and complexity, whereas in AtlFastII and AtlFast3, the time is independent of the particle energy, because it requires a single lookup in the parameterization file. To illustrate this, the average CPU time, calculated with a 4-core Intel i7-3770 CPU at 3.40 GHz, required to simulate a single photon produced on the calorimeter surface at \(0.20< |\eta | <0.25\) is shown in Fig. 42 as a function of energy. For an 8 \(\text {GeV}\) photon produced on the calorimeter surface, AtlFast3 is approximately 20 times faster than Geant4, while for a 256 \(\text {GeV}\) photon, AtlFast3 is approximately 600 times faster.

Fig. 42

Comparison of the CPU performance of AtlFast3 with Geant4 and AtlFastII. The average CPU time to simulate an event is estimated using 10,000 single photons at \(0.20< |\eta | < 0.25\) for three different energies: 8 \(\text {GeV}\), 65 \(\text {GeV}\), and 256 \(\text {GeV}\). These photons are generated on the calorimeter surface and provide a comparison for calorimeter-only simulation time

For the full detector simulation, the computing performance of AtlFast3 is compared with that of Geant4 by simulating the same 1000 \(t\bar{t}\) events; this is a complex process ideal for a variety of benchmarking needs and is used extensively by the ATLAS experiment for this purpose. Each simulation algorithm is executed on an 8-core Intel Xeon E5 CPU at 3.20 GHz. On average, Geant4 requires 167 seconds to simulate a single event, while AtlFast3 requires only 32 seconds, a speed-up by a factor of about five. If the calorimeter simulation alone is considered, AtlFast3 is O(500) times faster than Geant4. This means that the AtlFast3 simulation time is dominated by the simulation of the inner detector, which is still performed by Geant4. Therefore, further gains in the simulation speed of physics samples would require the use of fast simulation techniques in the tracking detector. Due to the size of the parameterization file, AtlFast3 requires 7 GB of proportional set size (PSS) memory, while the full simulation requires 2.7 GB in total when using eight separate cores. The parameterization requires 5 GB of PSS memory, and this is shared by the cores and is within the PSS memory budget available. The PSS memory required by AtlFast3 can be reduced in the future through the use of compression algorithms.
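The quoted speed-up follows directly from the two per-event timings. The short sketch below only reproduces that arithmetic; the constant and function names are hypothetical, and the timings are the averages quoted above for the \(t\bar{t}\) benchmark.

```python
# Average per-event simulation times quoted in the text (seconds).
G4_SECONDS_PER_EVENT = 167.0
AF3_SECONDS_PER_EVENT = 32.0


def speedup(reference_time: float, fast_time: float) -> float:
    """Ratio of reference to fast simulation time."""
    return reference_time / fast_time


def cpu_fraction(reference_time: float, fast_time: float) -> float:
    """CPU time of the fast simulation as a fraction of the reference."""
    return fast_time / reference_time


full_speedup = speedup(G4_SECONDS_PER_EVENT, AF3_SECONDS_PER_EVENT)
af3_fraction = cpu_fraction(G4_SECONDS_PER_EVENT, AF3_SECONDS_PER_EVENT)
print(f"speed-up: {full_speedup:.1f}, CPU fraction: {af3_fraction:.0%}")
```

The ratio is about 5.2, so AtlFast3 uses roughly 19% of the Geant4 CPU time per event, consistent with the factor of five quoted above.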

Conclusion

An updated version of the fast simulation for the ATLAS experiment, AtlFast3, is introduced in this paper. AtlFast3 significantly improves the modelling of reconstructed objects for physics analyses beyond that obtained by AtlFastII. In most cases, AtlFast3 and Geant4 agree to within a few percent. Key improvements include the modelling of the response in the forward calorimeters and of shower substructure within jets. Moreover, AtlFast3 requires only 20% as much CPU as Geant4 to simulate an event. The version of AtlFast3 described in this paper is currently being used by ATLAS to simulate 7 billion events for physics analyses of the Run 2 data. Further updates and improvements to the modelling are anticipated for Run 3 and beyond.