1 Introduction

Particles undergo complex stochastic interactions upon contact with materials. The modeling of these interactions is further complicated by the large number of secondary particles involved. The Monte Carlo simulation depends on repeated random sampling to produce a snapshot of the particle interactions with a detector. Simulation is crucial in most High Energy Physics (HEP) experiments and is extremely resource-intensive. More than \(50\%\) of the current computing resources of the HEP community are utilized in simulation alone [1]. In the future that need will increase further due to the higher luminosity and granularity of future experiments, and it will not be possible to create a corresponding increase in computing resources. The main motivation for the fast simulation is to incorporate other faster alternatives to decrease the cost of future experiments. Current fast simulation approaches are mostly based on parametrization [2,3,4] or lookup table [5] approaches, providing between 10 and 100 times speedup while achieving different levels of accuracy.

Generative Adversarial Network (GAN) [6] is a training paradigm for deep generative neural networks. Other approaches for generative networks include variational autoencoders [7] and autoregressive models [8]. All of these approaches have their own strengths and weaknesses. The autoencoder-based methods often produce blurry images, while the pixel-based methods are not only slow to evaluate but also suffer from limited capacity. The GAN approach has demonstrated highly realistic and sharp images compared to other approaches [6]. There have been many recent variants of the GAN methodology, such as WGAN [9], StackGAN [10], and Progressive GAN [11], further enhancing the quality and resolution of generated images. The generative problem often involves intractable probability densities, and thus methods involving likelihood estimates are not practical for most purposes. GAN can learn a distribution implicitly since it does not rely on the explicit computation of probability densities. Therefore the GAN approach is suitable for a wide range of tasks, such as the generation of musical notes [12], natural language [13], medical data [14, 15], natural scenes [6] and faces [11], as well as image denoising [16]. Since simulation is essentially a generative problem, a successful image generation model can be exploited in this domain. A potential advantage of deep generative models is that the generated distribution does not have to be explicitly defined, and thus even the real data from a detector can be simulated directly.

We leverage the GAN methodology to generate HEP calorimeter output. Calorimeters are special HEP detectors that record particles through the measurement of the energies deposited by them. These detectors can be regarded as huge cameras taking pictures of particle interactions. The Monte Carlo simulation for the detector output is extremely precise but highly expensive both in regards to the simulation time and resources. For most HEP experiments the calorimeter is a simulation bottleneck, consuming, for example, more than \(80\%\) of the simulation time for the ATLAS experiment [2]. We generate the calorimeter cells as monochromatic pixelated images with the cell energy depositions as our pixel intensities.

The 3DGAN [17,18,19] was the first effort where the detector output was generated employing three dimensional convolutions, a more powerful approach for retaining correlations in all three spatial dimensions. Following our work, a number of similar approaches have been presented but all the previous implementations generate the detector output either as a two-dimensional image or a concatenated set of two-dimensional images. We demonstrate our approach for a high granularity detector with a higher spatial resolution and thus consequently much larger image dimensions than previous such efforts. We pre-process the cell depositions by taking a power less than one, thus decreasing the dynamic range of corresponding pixel intensities and improving the convergence. We employ a multi-step training process to generate images, from a complex multivariate distribution, for a large range of input conditions. We also perform extensive validations from diverse viewpoints including vision and deep learning, as well as, physics-based evaluation. The network scores highly on all the platforms both for the pertinence to the training data and for maintaining sufficient diversity. The details of the network development and the validation from different perspectives have been presented previously [18]. The current work is geared more towards the physics community and thus performs mostly physics-based validation. Previously we simulated electrons coming with energies from a wide spectrum by employing our multi-step training. Now we successfully extend the same approach to simulate additional particle types such as photon and neutral pion where most of the energy is lost in electromagnetic interactions. We perform an additional investigation to prove that the GAN could accurately reproduce the signature features of a particular particle type. The performance of the GAN model is further evaluated regarding GAN-specific failure modes. 
We also undertake some preliminary exploration of the charged pion simulation and generation of rare events. We finally present a successful practical example of using the GAN-generated data in a typical reconstruction tool, demonstrating that the GAN-generated images could provide similar performance as Monte Carlo images.

The current paper is organized as follows. Section 2 presents the overview of past efforts for fast simulation of HEP calorimeters exploiting deep neural networks. The next section (Sect. 3) describes the Monte Carlo training dataset. The basic structure of the calorimeter and the important features of our data are discussed. Section 4 describes how the GAN approach is adapted to the problem of HEP detector simulation. The approach has been exploited for the generation of particles with predominantly electromagnetic showers. The results for comparison to Monte Carlo simulation are presented in Sect. 5. Some preliminary work is also carried out for the charged pion simulation as presented in Sect. 6. Another study exploring the simulation of rare modes is presented in Sect. 7. Section 8 presents a practical application for the use of GAN-generated data. Finally, Sect. 9 summarises the main contributions and presents some future suggestions.

2 Previous work

Fast simulation is already incorporated in existing experiments through approaches like parametrization [2,3,4, 20] and lookup tables [5]. Usually a part of the simulation is replaced by fast simulation wherever some tradeoff between speed and accuracy is acceptable. Following the same concept, deep learning has also been explored to generate simulation data. Fast simulation using neural networks can be regarded as a special type of parametrization, with the weights of the neural network as parameters, optimized through a training process.

Table 1 Calorimeter simulation with deep learning

The GAN technique is an unsupervised training methodology. The power of GAN lies in the fact that the target distribution does not have to be tractable and instead the training relies on a Minimax game between a discriminator (D) network and a generator (G) network. The discriminator is trained to differentiate between the target and the generated distributions, while the generator is trained to confuse the discriminator. Both the networks compete with each other till the generator manages to completely confuse the discriminator, given enough capacity for both models. At this point, the GAN is said to have converged.
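For reference, the minimax game described above is commonly written with the value function of the original GAN formulation [6]. The symbols \(x\) (a sample from the target distribution \(p_{data}\)) and \(z\) (a latent vector drawn from a prior \(p_{z}\)) are introduced here only for illustration and are not used elsewhere in this text:

$$\begin{aligned} \min _{G}\max _{D} V(D,G) = \mathbb {E}_{x\sim p_{data}}[\log D(x)] + \mathbb {E}_{z\sim p_{z}}[\log (1 - D(G(z)))]. \end{aligned}$$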

Calorimeter data have been simulated through deep generative networks in a number of recent approaches as presented in Table 1. LAGAN [21] was one of the first fast simulation approaches based on deep learning. A simplified calorimeter was simulated as 2D jet images for high energy W bosons (signal) and generic quark/gluon jets (background). CALOGAN [22] employed the LAGAN architecture to generate sets of three two-dimensional images that were then concatenated to obtain the output for a three-layered simplified calorimeter conditioned on the primary particle energy (\(E_P\)) ranging from 1–100 GeV. Since then, there have been other demonstrations employing deep learning for HEP calorimeter simulation. Deep learning has been used for fast simulation of the ATLAS calorimeter [23]. Showers with energies 1–260 GeV and pseudorapidity \(|\eta |\) in the range of 0.2–0.25 were generated as flattened arrays of pixels, by a dense network employing both VAE and GAN methodologies. The images were also conditioned on the primary particle energy and constrained on the total energy deposition. The GAN-generated showers were reported to have better performance as compared to the VAE generated showers. WGAN has also been used to simulate the LHC detector output collapsed to a two-dimensional array of cells [24]. A simplified version of HGCAL was simulated as seven 2D images concatenated together, conditioned on the primary particle energy and impact position [25]. The DijetGAN [26] employed GAN for the simulation of QCD dijet events: a background process for important physics studies at LHC. Since the final version of 3DGAN [18] was presented some recent approaches have also attempted employing diverse methodologies. The SARM [27] model is based on the Autoregressive (ARM) architecture simulating simplified calorimeter outputs for single quarks against collimated pairs of quarks and muons produced in isolation against muons produced in a shower. 
The performance of the simulation was claimed to be of a much higher quality than previous such efforts but the auto-regressive models are slow to compute and thus the attained speedup is only limited. Another recent approach [28] experimented with several GAN-like architectures (including the BIB-AE, combining GAN and AE features). They simulated high granularity calorimeter for the ILD [30] detector as \(30\times 30\times 30\) three-dimensional images. The simulation was limited to the orthogonally incident photons coming with 10–100 GeV primary energy. The CaloFlow [29] applied the normalizing flow concept to the simplified calorimeter geometry (similar to the CALOGAN [22]), resulting in a much higher performance level. The work is quite promising and provides the additional benefit of tractable likelihoods.

The 3DGAN initial prototype [17] exploited 3D convolutional networks to simulate the response of a high granularity calorimeter as a \(25\times 25\times 25\) image. The GAN setup was used to train the network for a simplified scenario involving only orthogonally incident electrons. The approach was then extended to condition \(51\times 51\times 25\) images on both the particle energy and incident angle [18, 19]. The more complex distribution could be generated through multi-step training and through modifications to the architecture and the loss function (details in [31]). We now simulate the detector output for all the particle types available in the dataset and further validate the results. The 3DGAN greatly surpasses existing efforts in the granularity and dimensions of the generated images, which are conditioned on both the incident particle angle and an energy from a wider range, and validated in great detail from diverse viewpoints. Finally, as a culmination of the effort, the GAN-generated data is tested in a practical use case.

3 Calorimeter dataset

We present a solution for the needs of future experiments with higher demands for computing resources due to increased luminosity and granularity. We, therefore, select the proposed Linear Collider Detector (LCD), designed in the context of the future Compact Linear Collider (CLIC) [32] accelerator for our study. The dataset employs the GEANT4 toolkit (G4) [33] for the generation of the simulation data for several particle types (i.e., electrons e, photons \(\gamma \), neutral pions \(\pi ^0\) and charged pions \(\pi \)) and is publicly available on Zenodo at https://zenodo.org/communities/mpp-hep.

3.1 Detector geometry

The basic design consideration for improving the jet energy resolution is to resolve the energy depositions of the individual particles in a jet, through a high cell granularity and precise time information. The granularity of a calorimeter is related to geometrical segmentation where smaller cell dimensions enable recording the particle shower in finer detail and thus also improving the energy resolution. Figure 1 shows the proposed detector design, highlighting the main detector concepts. The electron and positron will collide in the central region. The trackers are shown in blue. The surrounding grey region will comprise the calorimeters. The calorimeter will be highly segmented with an electromagnetic (ECAL) and a hadronic (HCAL) calorimeter.

The data used for the current work is that of the ECAL central barrel region with a dodecagonal shape [37]. This region is a cylindrical polygon with an inner radius of \(1.5~\hbox {m}\) with 25 concentric layers. The proposed granularity for the ECAL cells is \(5.1\times 5.1~\hbox {mm}^2\). The cells are arranged in the form of 25 cylindrical layers with silicon sensor planes (active), alternating with tungsten absorber planes (passive). The simulation is carried out considering the entire detector geometry, including the material in front of the calorimeter, and the effect of the solenoid magnetic field.

Fig. 1

Schematic diagram for the CLIC detector (taken from an older version [38])

3.2 Data features

Fig. 2

The calorimeter barycenter computation was done in global coordinates (left) and the cells were saved based on the local coordinates (right)

The energy deposits in the calorimeter cells result from the interaction of an incoming primary particle with the calorimeter material. These deposits form a characteristic shape that can be termed an “event” or “shower”. A slice around the barycenter of each shower is saved as a 3D array of energy depositions. The slicing is carried out by taking a projection of all the deposited energy on the ECAL inner surface. The barycenter of this 2D image and the point of origin of the incoming particle are then used to compute the polar angles (\(\theta \) and \(\phi \)) corresponding to each shower. Due to the different \(\phi \) granularity for each depth layer in the ECAL, a multiplicative transformation is also applied to scale every layer to look like the innermost ECAL layer. Finally, the data is saved in the HDF5 format. Each entry in the dataset comprises the 3D array with cell energy deposits, the incoming particle energy \(E_P\), and the incidence angles \(\theta \) and \(\phi \). The energies of the incoming particles are uniformly distributed from 2 to 500 GeV, and the incident angle (\(\theta \)) is uniformly distributed from \(60^{\circ }\) (1.047 radians) to \(120^{\circ }\) (2.094 radians).

Figure 2 shows the particle gun position with respect to the calorimeter surface. In the global coordinate system, the Z axis lies along the axis of the calorimeter cylinder, while in the local coordinates of the shower it is perpendicular to the calorimeter surface. Similarly, the other axes are also transformed to the local coordinates of each sample. The Z axis of our 3D images lies along the detector depth, the X axis along the \(\phi \) direction, and the Y axis along the global Z axis.

The current work only focuses on replicating the selected dataset while more insight from the point of view of integration within a simulation framework can be easily incorporated for future implementations. The required inputs, the cut-off threshold for cell energies, and the validation criterion are some directions that can be explored for specific use-cases. The current work quantifies the shower direction as a single continuous variable since the aim of our work is to demonstrate the efficacy of our approach for correctly simulating the angle of incidence while reducing model complexity. We recompute the angle of incidence for a shower as a weighted mean of the 3D angles computed using the barycenter of the event and the barycenters of the XY planes for each position along Z (weighted by the position along Z). We denote this angle as \(\theta \) although there will be a small \(\phi \) contribution especially for very low energy charged particles. The 3D angle is computed from the images using the following Algorithm 1:

Algorithm 1
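The angle measurement described above can be sketched in plain Python. This is a minimal illustration written from the prose description only, not a reproduction of Algorithm 1: the function name `measured_angle`, the `image[x][y][z]` nested-list layout, the use of cell centres at index + 0.5, and the choice to skip layers whose barycenter coincides with the event barycenter in the XZ projection are all assumptions made for this example.

```python
import math

def measured_angle(image):
    """Estimate the incident angle (radians) of a shower stored as image[x][y][z]."""
    nz = len(image[0][0])
    total = sx = sy = sz = 0.0
    layer_e = [0.0] * nz  # energy in each Z layer
    layer_x = [0.0] * nz  # energy-weighted x position per layer
    layer_y = [0.0] * nz  # energy-weighted y position per layer
    for ix, plane in enumerate(image):
        for iy, row in enumerate(plane):
            for iz, e in enumerate(row):
                if e <= 0.0:
                    continue
                x, y, z = ix + 0.5, iy + 0.5, iz + 0.5  # assumed cell centres
                total += e
                sx += e * x
                sy += e * y
                sz += e * z
                layer_e[iz] += e
                layer_x[iz] += e * x
                layer_y[iz] += e * y
    if total == 0.0:
        raise ValueError("empty shower")
    # Barycenter of the whole event: the reference point.
    x_ref, y_ref, z_ref = sx / total, sy / total, sz / total
    num = den = 0.0
    for iz in range(nz):
        if layer_e[iz] == 0.0:
            continue
        z = iz + 0.5
        x_mid = layer_x[iz] / layer_e[iz]  # barycenter of this XY plane
        y_mid = layer_y[iz] / layer_e[iz]
        dist = math.hypot(x_mid - x_ref, z - z_ref)
        if dist == 0.0:
            continue  # no direction can be defined for this layer
        slope = (y_mid - y_ref) / dist
        if z < z_ref:
            slope = -slope  # layers in front of the barycenter point backwards
        angle = math.pi / 2.0 - math.atan(slope)
        num += z * angle  # weight by the position along Z
        den += z
    if den == 0.0:
        raise ValueError("angle undefined for a single-layer shower")
    return num / den

def line_shower(n, tilted):
    """Toy shower: unit deposits along a straight line (illustration only)."""
    image = [[[0.0] * n for _ in range(n)] for _ in range(n)]
    for k in range(n):
        image[n // 2][k if tilted else n // 2][k] = 1.0
    return image

straight = measured_angle(line_shower(5, tilted=False))  # perpendicular incidence
tilted = measured_angle(line_shower(5, tilted=True))     # Y advances one cell per layer
print(math.degrees(straight), math.degrees(tilted))
```

With this convention a shower developing straight along the depth axis gives \(90^{\circ }\), consistent with the \(60^{\circ }\) to \(120^{\circ }\) range of the dataset centred on orthogonal incidence.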

4 3DGAN

The HEP simulation depends on a set of variables that impact the underlying physics processes described by the simulation. Therefore, 3DGAN uses the \(E_{P}\) and \(\theta \) of a particle striking the calorimeter surface as inputs, to generate the appropriate detector response. In order to provide feedback on the correspondence between generated showers and input conditions, we exploit the concept of auxiliary tasks [39] together with domain-related constraints. The current work mainly describes the final optimized version of the 3DGAN model, while additional details about the development process can be obtained from [31]. The 3DGAN is implemented using the Keras 2.2.4 [40] deep learning Python library with TensorFlow 1.14.0 [41] as a backend. The code is available at https://github.com/svalleco/3Dgan/tree/Anglegan.

4.1 Pre-processing

Fig. 3

The cell energy distribution for GEANT4 MC events (red) and GAN generated events after a pre-processing by taking the power of pixel intensities: power = 0.85 (blue); power = 0.75 (green); power = 0.5 (cyan); power = 0.25 (magenta)

One of the main challenges for generating scientific data through techniques developed for computer vision lies in the inherent difference between the dynamic ranges of the pixel intensities. The pixel intensities in a typical RGB image range from 0 to 255, while the energy deposited in detector cells covers more than 10 orders of magnitude. While some of the previous efforts map the pixel intensities to a similar range of intensities [27] or perform normalization [23, 29], we perform only limited pre-processing. We investigated different procedures aimed at reducing the dynamic range. Initial tests taking the logarithms of the pixel intensities resulted in the generation of highly distorted images. Taking a less drastic approach, we raise the pixel intensities to a power with an exponent smaller than one. A smaller exponent results in faster convergence but greater distortion in generated images, while a larger exponent slows down convergence but retains image quality. Figure 3 shows how the value of the exponent (p) affects the distribution of the generated pixel intensities for the individual cells. The value of p is set to 0.85, where faster convergence is achieved while retaining an acceptable level of accuracy at both ends of the spectrum. The generated images are then post-processed by simply applying the inverse of the power function.
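The effect of the power transform on the dynamic range can be illustrated with a short sketch. The function names and the sample cell energies below are hypothetical and chosen only to span ten orders of magnitude; the exponent \(p = 0.85\) is the value quoted in the text. Since \(\log _{10}(e^{p}) = p\,\log _{10}(e)\), the number of orders of magnitude covered is simply scaled by p.

```python
import math

def preprocess(cells, p=0.85):
    """Compress the dynamic range by raising each non-negative cell energy to p < 1."""
    return [e ** p for e in cells]

def postprocess(cells, p=0.85):
    """Invert the pre-processing on generated images."""
    return [e ** (1.0 / p) for e in cells]

def orders_of_magnitude(cells):
    """Span between the smallest and largest non-zero deposit, in decades."""
    nonzero = [e for e in cells if e > 0.0]
    return math.log10(max(nonzero) / min(nonzero))

cells = [1e-6, 2e-3, 0.5, 40.0, 1e4]  # hypothetical cell energies
compressed = preprocess(cells)
restored = postprocess(compressed)

print(orders_of_magnitude(cells))       # about 10 decades before the transform
print(orders_of_magnitude(compressed))  # about 8.5 decades after it
```

Empty cells stay at zero under both transforms, so the sparsity of the image is not affected by this step.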

4.2 Architecture

Fig. 4

The 3DGAN architecture, see the text for details

The 3DGAN architecture is presented in Fig. 4. The generator network implements stochasticity through a latent vector of 254 random numbers drawn from a Gaussian distribution. The generator input includes \(E_{P}\) and \(\theta \) concatenated to the latent vector. The generator network then maps the input to a layer of linear neurons followed by seven 3D convolutional layers. The discriminator takes an image as input and has only four 3D convolutional layers. Batch normalization [42] is performed after all convolutional layers except the first layer in the discriminator and the last two layers in the generator. The LeakyReLU [43] activation function is used for the discriminator hidden layers, while the ReLU [44] activation function is used for the generator layers to induce sparsity. The discriminator uses a dropout [45] of \(20\%\) for regularization and a single average pooling layer after the last convolutional layer, since additional pooling layers result in a substantial loss of performance.

The discriminator network has two trainable outputs: a sigmoid neuron produces \(O_{GAN}\) and a linear neuron \(O_{P}\) predicts \(E_{P}\). The two additional outputs are simple analytical measurements: \(O_{sum}\) is the total deposited energy and \(O_{\theta }\) is the measured incident angle (geometrical angle of the shower energy depositions). These non-trainable outputs represent physics-based constraints.

4.3 Loss function

The 3DGAN loss function is the weighted sum of individual losses pertaining to the discriminator outputs and constraints. The domain-related constraints are essential to achieve a high level of agreement over the very large dynamic range of the image pixel intensity distribution. Equation 1 presents the discriminator loss related to the output \(O_{GAN}\) as \(L_G\), the loss related to the output \(O_{sum}\) as \(L_{E}\), the output \(O_{\theta }\) as \(L_{\theta }\) and the predicted \(E_{P}\) as \(L_P\) balanced by corresponding weights W.

$$\begin{aligned} L_{3DGAN} = W_{G}L_{G} + W_{P}L_{P} + W_{\theta }L_{\theta } + W_{E}L_{E}. \end{aligned}$$
(1)

The losses \(L_P\) and \(L_{\theta }\) both provide feedback on how well the generated images correspond to the input conditions. The loss \(L_{E}\) ensures energy conservation. \(L_{G}\) is evaluated as binary cross-entropy. \(L_{P}\) and \(L_{E}\) are implemented as mean percentage errors, and \(L_{\theta }\) as mean absolute error. The generator loss is implemented as the inverse of \(L_{G}\) together with the auxiliary losses and constraints. The weights (presented in Appendix A) are considered as hyperparameters and chosen to balance the loss ranges and their relative importance (in this case the loss \(L_G\) is given higher priority as compared to the auxiliary losses).
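A minimal sketch of how the weighted sum of Eq. 1 could be assembled is given below. It assumes that the "mean percentage error" is the mean absolute percentage error, and the weights in `WEIGHTS` are hypothetical placeholders, not the values of Appendix A; the function names and the numbers in the example batch are likewise illustrative.

```python
import math

def binary_cross_entropy(targets, preds, eps=1e-7):
    """L_G: binary cross-entropy averaged over the batch."""
    total = 0.0
    for t, p in zip(targets, preds):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return total / len(targets)

def mean_absolute_percentage_error(targets, preds):
    """Assumed form of the percentage error used for L_P and L_E."""
    return 100.0 * sum(abs(t - p) / abs(t) for t, p in zip(targets, preds)) / len(targets)

def mean_absolute_error(targets, preds):
    """L_theta: mean absolute error."""
    return sum(abs(t - p) for t, p in zip(targets, preds)) / len(targets)

def total_loss(l_g, l_p, l_theta, l_e, weights):
    """Weighted sum of Eq. 1."""
    return (weights["G"] * l_g + weights["P"] * l_p
            + weights["theta"] * l_theta + weights["E"] * l_e)

# Hypothetical weights: L_G is given the highest priority, as stated in the text.
WEIGHTS = {"G": 2.0, "P": 0.1, "theta": 1.0, "E": 0.1}

l_g = binary_cross_entropy([1.0, 0.0], [0.5, 0.5])                    # real/fake output
l_p = mean_absolute_percentage_error([100.0, 200.0], [90.0, 220.0])   # E_P in GeV
l_theta = mean_absolute_error([1.5, 1.6], [1.4, 1.8])                 # theta in radians
l_e = mean_absolute_percentage_error([2.0], [2.1])                    # deposited energy
loss = total_loss(l_g, l_p, l_theta, l_e, WEIGHTS)
print(loss)
```

Because the percentage errors are typically much larger numerically than the cross-entropy, small weights on \(L_P\) and \(L_E\) are one way to keep the terms in comparable ranges.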

4.4 Training

The 3DGAN training is inspired by the concept of transfer learning. The GAN could not converge directly for the highly complex multivariate distribution, thus a two-step training is applied. In order to successfully train the network, we reduce the complexity by training the GAN first for electron events having \(E_P\sim ~U(100, 200)\) GeV. After the GAN converges, the same trained model is further trained with the data from the whole \(E_P\) range of \(E_P\sim ~U(2, 500)\) GeV. The first training step exploits 137,342 electron events. The GAN is then trained for the larger \(E_P\) range, utilizing a much larger training set (400k events) from each particle type (electrons, photons, and neutral pions). The train and test losses are evaluated on the data divided in a ratio of nine to one. The first training step is run for 130 epochs (2 h per epoch on GTX 1080) while the second step is run for 30 epochs (4 h per epoch on GTX 1080). Finally, the best network is selected according to the minimum relative error for the sampling fraction (SF), which relates the total deposited energy (\(E_{sum}\)) to \(E_P\), on additional validation data (20k events filtered around specific \(E_P\) bins). This last step is aimed at further improving the accuracy of the SF.
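The final selection step can be sketched as follows. The exact metric is not spelled out here, so this example assumes a per-event relative error on \(SF = E_{sum}/E_P\), averaged over validation events generated for the same \(E_P\) as the Monte Carlo events; the function names, the epoch labels, and the numbers are hypothetical.

```python
def sampling_fraction(e_sum, e_p):
    """SF: total deposited energy divided by the primary particle energy."""
    return e_sum / e_p

def mean_relative_sf_error(mc_events, gan_events):
    """Mean of |SF_GAN - SF_MC| / SF_MC over events paired by index.

    Each event is a (e_sum, e_p) tuple; GAN events are assumed to be
    generated with the same e_p as the corresponding MC event.
    """
    errors = []
    for (mc_sum, e_p), (gan_sum, _) in zip(mc_events, gan_events):
        sf_mc = sampling_fraction(mc_sum, e_p)
        sf_gan = sampling_fraction(gan_sum, e_p)
        errors.append(abs(sf_gan - sf_mc) / sf_mc)
    return sum(errors) / len(errors)

def select_best(checkpoints, mc_events):
    """Return the key of the checkpoint with the smallest mean relative SF error."""
    return min(checkpoints, key=lambda k: mean_relative_sf_error(mc_events, checkpoints[k]))

# Hypothetical validation events: (E_sum, E_P) in GeV.
mc_events = [(2.0, 100.0), (4.0, 200.0)]
checkpoints = {
    28: [(1.8, 100.0), (4.4, 200.0)],  # output of the epoch-28 weights
    29: [(1.9, 100.0), (4.1, 200.0)],  # output of the epoch-29 weights
}
best_epoch = select_best(checkpoints, mc_events)
best_error = mean_relative_sf_error(mc_events, checkpoints[best_epoch])
print(best_epoch, best_error)
```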

Algorithm 2

Fig. 5

The \(L_{G}\) losses for the generator (blue) and the discriminator (orange) converge for the first training step with restricted \(E_{P}\) range. (Left) training losses. (Middle) test losses. (Right) the discriminator output \(O_{GAN}\) for MC (red) and GAN (blue) images is similar and the discriminator is confused

The training process for each epoch is presented in Algorithm 2. For each training iteration the discriminator is trained twice: first on a batch of real data, and then on a batch of generated data. For a balanced approach, the generator is also trained twice while freezing the weights of the discriminator. The RMSProp [46] optimizer is utilized to train the network through Stochastic Gradient Descent. Figure 5 shows the \(L_{G}\) losses associated with the generator (blue) and the discriminator (orange). It can be seen that the loss for the discriminator increases, while the loss for the generator decreases, until both losses converge at around 0.6 (log(4)). At this point, the \(O_{GAN}\) outputs for both the data (red) and the GAN images (blue), shown in the right panel of Fig. 5, have similar distributions centered around 0.5. The discriminator is indeed confused and the GAN converges.
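The update schedule described above can be sketched as a framework-agnostic loop. This is an illustration of the schedule only and not the listing of Algorithm 2: the callables `generate_batch`, `train_discriminator` and `train_generator` are hypothetical stand-ins for the actual model updates, and how the discriminator weights are frozen during the generator step depends on the framework.

```python
def train_epoch(real_batches, generate_batch, train_discriminator, train_generator):
    """Run one epoch: two discriminator updates and two generator updates per iteration."""
    for real_batch in real_batches:
        batch_size = len(real_batch)
        fake_batch = generate_batch(batch_size)
        # Discriminator: once on real data, once on generated data.
        train_discriminator(real_batch, is_real=True)
        train_discriminator(fake_batch, is_real=False)
        # Generator: two updates, with the discriminator weights kept frozen
        # inside train_generator, to balance the two discriminator updates.
        for _ in range(2):
            train_generator(batch_size)

# Stub callables that only record the order of the updates.
log = []
train_epoch(
    real_batches=[[0.1, 0.2], [0.3, 0.4]],
    generate_batch=lambda n: [0.0] * n,
    train_discriminator=lambda batch, is_real: log.append(("D", is_real)),
    train_generator=lambda n: log.append(("G", n)),
)
print(log)
```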

4.5 Generation time

Table 2 Inference timing for 3DGAN

The 3DGAN greatly reduces the simulation time. Table 2 compares the time taken to generate a shower using Monte Carlo and GAN. An average shower from the given \(E_P\) range can be simulated using GEANT4 in about 17 seconds per particle on an Intel Xeon 8180. The average inference time for GAN is around 16 ms per particle on the same hardware. Recent advances in computing speeds are mostly directed towards parallel processing due to limitations on the transistor sizes. Deep learning models have the additional advantage of being able to exploit parallel processing on multi-core hardware, attaining further speedup. Traditional simulation approaches based on Monte Carlo involve sequential processes. Currently, it is not possible to run a full GEANT4-based simulation on GPUs, while GAN can exploit GPU parallelism to decrease the inference time to an average of around 4 ms per particle on a GeForce GTX 1080 for batch sizes of 32, 64, 128, and 256. This corresponds to a speedup of more than three orders of magnitude with respect to GEANT4.
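The speedup implied by the quoted timings can be checked with a few lines of arithmetic. Only the three timings come from the text; the variable names are illustrative.

```python
import math

GEANT4_S = 17.0    # seconds per particle, GEANT4 on Intel Xeon 8180
GAN_CPU_S = 16e-3  # seconds per particle, GAN on the same CPU
GAN_GPU_S = 4e-3   # seconds per particle, GAN on GeForce GTX 1080

cpu_speedup = GEANT4_S / GAN_CPU_S  # roughly a factor of 1000
gpu_speedup = GEANT4_S / GAN_GPU_S  # roughly a factor of 4000

print(f"CPU speedup: {cpu_speedup:.0f}x ({math.log10(cpu_speedup):.1f} orders of magnitude)")
print(f"GPU speedup: {gpu_speedup:.0f}x ({math.log10(gpu_speedup):.1f} orders of magnitude)")
```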

5 Results and discussion

The performance assessment for GAN models is a subject of much debate and diverse viewpoints [47]. The GAN evaluation is nontrivial due to the intractable probability densities and thus is mainly sample-based and application-specific. We have validated the realism and diversity of our generated data from several independent viewpoints, such as the output of a third-party neural network and image quality assessment as presented previously [31] but the current work focuses mainly on the physics-based comparison to a Monte Carlo simulation.

The particle showers have specific characteristics due to the underlying physics processes, depending on the detector material and the type, energy, and direction of the particle initiating the shower. We validate these characteristics as a function of our inputs by dividing the data into bins of 5 GeV in \(E_P\) and 0.1 radian (\(5.73^\circ \)) in \(\theta \). To ensure an unbiased comparison, GAN events are generated with the same \(E_{P}\) and \(\theta \) values as the GEANT4 events. The bin-wise comparison of each physics-based feature results in hundreds of histograms for each particle type. We present here a selected subset of the detailed and exhaustive validation that we consider to be the most essential and representative of performance. The results presented in this section are also alternated among different bins and particle types, in order to convey the overall level of accuracy.
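The binning can be sketched as follows. The bin widths are those quoted in the text, while the bin origins (0 GeV for \(E_P\) and \(60^{\circ }\) for \(\theta \)) and the function names are assumptions made for this example. The input values are the \(E_P\) and \(\theta \) of the example events quoted in the captions of Figs. 6 and 7.

```python
import math

def bin_index(value, low, width):
    """Index of the bin of given width, counted from the lower edge `low`."""
    return int(math.floor((value - low) / width))

def bin_events(events, ep_width=5.0, theta_width=0.1,
               ep_low=0.0, theta_low=math.radians(60.0)):
    """Group (E_P [GeV], theta [rad]) pairs by (E_P bin, theta bin)."""
    bins = {}
    for e_p, theta in events:
        key = (bin_index(e_p, ep_low, ep_width),
               bin_index(theta, theta_low, theta_width))
        bins.setdefault(key, []).append((e_p, theta))
    return bins

# (E_P in GeV, theta in degrees) converted to radians for binning.
examples_deg = [(202.78, 91.12), (6.87, 95.86), (97.72, 62.16), (403.62, 116.33)]
events = [(e_p, math.radians(theta)) for e_p, theta in examples_deg]
binned = bin_events(events)
print(sorted(binned))
print(round(math.degrees(0.1), 2))  # width of one theta bin in degrees
```

Generating the GAN events with exactly the \(E_P\) and \(\theta \) of each GEANT4 event means both samples fall into identical bins, so each histogram compares like with like.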

5.1 Visual inspection

Fig. 6

Example of a random GEANT4 electron event (left) with \(E_P = 202.78\) GeV and \(\theta = 91.12^\circ \) vs. an event generated by GAN (right) for the same \(E_P\) and \(\theta \) values

Fig. 7

Projections of Monte Carlo vs. GAN events on the YZ, XZ and XY planes. (Top) Electron with \(E_P= 6.87\) GeV and \(\theta = 95.86^\circ \). (Mid) neutral pions with \(E_P= 97.72\) GeV and \(\theta = 62.16^\circ \). (Bottom) photons with \(E_P= 403.62\) GeV and \(\theta = 116.33^\circ \)

An initial qualitative assessment can be performed by comparing the events visually. Figure 6 shows an example of a 3D electron event. The event on the left has been generated by GEANT4, while the event on the right has been generated by the GAN for the same input values. It can be observed that both events have very similar visual characteristics while retaining uniqueness for individual cell deposits. The graphical projections on different planes further illustrate the shower correlation to the incident angle (\(\theta \)) and energy (\(E_{P}\)). Figure 7 compares the projections of the GEANT4 showers to the corresponding GAN showers. The top panel presents electron events from the lowest limit of the \(E_P\) spectrum. The mid panel shows neutral pion events, while the bottom panel displays photon events, with the \(E_P\) and \(\theta \) from different regions of the spectrum. The GAN images appear similar to the respective GEANT4 images, with the deposited cell energies correlated to the input conditions while retaining stochasticity, for all particle types and input conditions.

5.2 Particle shower features

Shower shapes define the structure of the deposited energy distribution, as a shower develops through the detector material. The profiles of the energy deposition along the detector axes are important observables related to the shower geometry and crucial for most particle identification techniques. We would like to point out that these geometrical features are not included in the 3DGAN loss function as presented in Eq. 1 and are learned by the GAN implicitly. Figure 8 presents the shower shapes for the X, Y and Z axes as a function of \(\theta \) and \(E_{P}\). In order to summarize the performance for all particle types, we present a different particle in each column: the shapes for electrons in the first column from the left, the photons in the middle column, and the neutral pions in the rightmost column. The top row presents the transverse shape distribution for the Y axis corresponding to the different \(\theta \) bins (with random \(E_P\)) since the Y axis profile is most relevant for \(\theta \). The plots are displayed in the log scale to enhance the sparse distributions along the tails. The second row presents the shapes along the Z axis (longitudinal direction) in linear scale for the different \(E_{P}\) bins (with random \(\theta \)). The network is able to reproduce a similar shape distribution as the GEANT4 showers, furthermore, the network can correctly relate it to the inputs. In the transverse profiles, some discrepancies are observed in the log scale. These discrepancies occur at the volume edges, where smaller energy depositions occur. This region is also highly sparse and outside the main body of the shower with expected energies well below 0.1 MeV, which is comparable to the pedestal values.

Moments are another aspect of the shower geometry. The GEANT4 showers are all centered on the barycenter of the energy deposition by design thus the first moment (M1) defining the shower center is easily replicated by GAN. Therefore, we present here the performance related to the second moment (M2) having a more complex distribution depending on both \(E_{P}\) and \(\theta \). Figure 9 left panel presents the distribution of the second moment or the width of the shower for electrons. Here it can be appreciated that the GAN has learned the non-Gaussian width distribution. The mid plot shows the difference between internal correlation present between the shower inputs (\(E_{P}\) and \(\theta \)) and the shower features (shapes, moments, total deposition, hits, and ratios of energy deposition in different parts of the shower) for GEANT4 and GAN photons. The GAN showers are able to reproduce the internal correlations present between the different shower observables. The right panel displays the close agreement between the \(\theta \) measured from GEANT4 neutral pion events and that measured from the GAN events generated for the same \(\theta \) values.
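The shower shapes and moments discussed above can be illustrated with a short sketch. The definitions are assumptions made for this example: the shape along an axis is taken as the energy summed over the other two axes, M1 as the energy-weighted mean position, and M2 as the energy-weighted central second moment (its square root being an RMS width). The exact definitions used for the figures may differ.

```python
def axis_profile(image, axis):
    """Sum the energy of image[x][y][z] over the two axes other than `axis`."""
    sizes = (len(image), len(image[0]), len(image[0][0]))
    profile = [0.0] * sizes[axis]
    for ix, plane in enumerate(image):
        for iy, row in enumerate(plane):
            for iz, e in enumerate(row):
                profile[(ix, iy, iz)[axis]] += e
    return profile

def shower_moments(profile):
    """Return (M1, M2) of a 1D energy profile, using cell centres at index + 0.5."""
    total = sum(profile)
    centres = [i + 0.5 for i in range(len(profile))]
    m1 = sum(e * c for e, c in zip(profile, centres)) / total
    m2 = sum(e * (c - m1) ** 2 for e, c in zip(profile, centres)) / total
    return m1, m2

# Toy example: a symmetric transverse profile.
m1, m2 = shower_moments([0.0, 1.0, 2.0, 1.0, 0.0])
print(m1, m2)

# Toy 2 x 3 x 2 image with uniform deposits, projected on the Y axis (axis=1).
uniform = [[[1.0, 1.0] for _ in range(3)] for _ in range(2)]
y_profile = axis_profile(uniform, axis=1)
print(y_profile, shower_moments(y_profile))
```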

Fig. 8

Shower shapes for the GEANT4 vs. GAN events as a function of inputs. (Top row) transverse shower shapes along the Y axis: (left) electrons with \(\theta \) in the \(62^{\circ }\) bin; (mid) photons with \(\theta \) in the \(90^{\circ }\) bin; (right) neutral pions with \(\theta \) in the \(118^{\circ }\) bin. (Bottom row) longitudinal shower shapes: (left) electrons with \(E_{P}\) in the 100 GeV bin; (mid) photons with \(E_{P}\) in the 200 GeV bin; (right) neutral pions with \(E_{P}\) in the 400 GeV bin

Fig. 9

The GAN vs. GEANT4 shower features. (Left) the shower width (M2) along the Y axis for electrons (k denotes p-value for the Kolmogorov test). (Mid) difference between internal correlations present between physics features and the inputs for photons. (Right) the correlation between the measured angle from GEANT4 events and the GAN events generated for the same \(\theta \) values for the neutral pions

Fig. 10

Shower features related to the pixel intensities (cell depositions) for the GEANT4 (red) vs. GAN (blue) events. (Left) sparsity level as a function of cutoff threshold for electrons. (Mid) distribution of cell energy deposits for photons. (Right) sampling fraction for neutral pion showers

Fig. 11

The GAN could successfully learn the photon features after further training a network generating electrons with reduced \(E_P\) range. (Left) the SF for photons is lower than that for electrons in both GEANT4 events and GAN events. The bottom panel shows the relative error (\(\delta _{SF}\)) for the sampling fraction. The k denotes the p-value from the Kolmogorov test. (Right) the distribution of the energy fraction deposited in the first part of the shower (R1) for the electron and photon showers. The region \(R1 < 0.1\) (zoomed in the inset) shows very different distributions for electrons and photons. The GAN also demonstrates a similar behavior

The energies deposited in detector cells are the pixel intensities of our images. The images are mostly empty, centered around a shower: energy is deposited in only around \(20\%\) of the cells. Figure 10 left panel shows the level of sparsity (S), expressed as the fraction of cells with non-zero energy deposition (\(1-S\)), against the threshold used for cutoff. The GAN images have sparsity distributions similar to those of the GEANT4 events, without the image being specifically constrained. The distributions of cell energy depositions for photons, shown in the middle panel, have a similar shape for GEANT4 and GAN events. We previously reported that a sharp, vertical drop around 0.2 MeV was present in the GEANT4 cell energy distribution, which the GAN could not learn and instead tried to smooth out [18]. A more recent work [28] also recognizes this feature and tries to replicate the effect by additional post-processing of the generated images. The authors note that their network improves the simulation of the pixel distribution at the cost of reduced performance for other features. We also observe a reduction in performance when constraining the pixel intensity distribution through our loss function. We believe that, in the future, the concept of ensembling [48] can be used to employ two networks to generate the pixels above and below this region. The sampling fraction (SF) is an important characteristic of the detector response. Figure 10 right panel presents the SF for neutral pions as a function of \(E_{P}\). There is close agreement for most of the input range, with some discrepancies at low energies, where events are highly sparse with low cell energy deposits.
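The two quantities discussed above can be sketched for a batch of events as follows. The sketch assumes that events are stored as an array of shape (n_events, x, y, z), that the occupied fraction counts cells strictly above the threshold, and that the SF is the ratio of the total deposited energy to \(E_P\) in the same units; the function names are hypothetical.

```python
import numpy as np

def occupied_fraction(images, thresholds):
    """Fraction of cells with energy above each threshold (i.e. 1 - S)."""
    images = np.asarray(images, dtype=float)
    return np.array([(images > t).mean() for t in thresholds])

def sampling_fraction(images, primary_energy):
    """Per-event ratio of total deposited energy to the primary energy.

    Assumption: both quantities are expressed in the same energy units.
    """
    images = np.asarray(images, dtype=float)
    deposited = images.reshape(len(images), -1).sum(axis=1)
    return deposited / np.asarray(primary_energy, dtype=float)

# Toy batch of two 2x2x2 events (illustrative values only).
toy_batch = np.zeros((2, 2, 2, 2))
toy_batch[0, 0, 0, 0] = 1.0
toy_batch[1, 0, 0, 0] = 0.5
toy_batch[1, 1, 1, 1] = 0.1
occupancy = occupied_fraction(toy_batch, thresholds=[0.0, 0.2])
sf = sampling_fraction(toy_batch, primary_energy=[10.0, 6.0])
```

Scanning the threshold, as in the left panel of Fig. 10, shows how quickly the occupied fraction falls once low-energy cells are removed.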

The characteristic features of electromagnetic showers are faithfully reproduced in the GAN-generated showers for the different particle types. We test whether the generated showers for different particle types are mutually distinguishable through their corresponding features. This is crucial because the final networks for all particle types start from the same initial weights, trained to generate electron-induced showers for a reduced energy range. Photon-initiated showers differ slightly from electron-initiated showers: photons penetrate further into the detector material before starting to interact [49, 50]. This effect can be evaluated by studying the SF and the fraction of the total energy deposited in the first part (8 cells along the Z axis) of the shower. Figure 11 left panel shows the profiles of SF as a function of \(E_P\) for the central region of the spectrum. The SF for photons is lower than that for electrons at a similar value of \(E_P\). The right panel compares the distribution of the energy fraction deposited in the first part (first 8 layers) of the shower for electrons and photons. The photons present more entries in the region where the fraction of the total energy deposited in the first part of the shower (\(R1= E_{1-8}/E_{sum}\)) is less than \(10\%\). The GAN-generated photons clearly demonstrate these identifiable features.
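The ratio \(R1= E_{1-8}/E_{sum}\) can be computed directly from a single image. The sketch below assumes the event is indexed as (x, y, z) with Z as the last axis; `front_energy_fraction` is a hypothetical name.

```python
import numpy as np

def front_energy_fraction(image, n_front=8):
    """R1: energy in the first `n_front` layers along Z over the total energy.

    Assumption: `image` is indexed as (x, y, z), with z the depth index.
    """
    image = np.asarray(image, dtype=float)
    total = image.sum()
    if total <= 0:
        return float("nan")  # empty event: ratio undefined
    return image[:, :, :n_front].sum() / total

# Toy event with uniform deposits in 10 layers: 8 of 10 layers are "front".
toy_event = np.ones((2, 2, 10))
r1 = front_energy_fraction(toy_event)
```

A later-starting shower, such as a typical photon shower, moves energy out of the first layers and therefore lowers this ratio.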

5.3 Some empirical tests for GAN evaluation

Fig. 12

Projections of random Monte Carlo events vs. the closest GAN event among the 2000 events generated for the same \(E_P\) and \(\theta \). (Top) \(E_P= 101.63\) GeV and \(\theta = 115.58^\circ \). (Mid) \(E_P= 196.4\) GeV and \(\theta = 92.17^\circ \). (Bottom) \(E_P= 402.68\) GeV and \(\theta = 63.32^\circ \)

Fig. 13

Most similar pairs of electron events among the 2000 events generated by GAN for the same \(E_P\) and \(\theta \). Projections on the YZ, XZ and XY planes. (Top) \(E_P= 104.73\) GeV and \(\theta = 116.64^\circ \). (Mid) \(E_P= 200.68\) GeV and \(\theta = 91.57^\circ \). (Bottom) \(E_P= 403.36\) GeV and \(\theta = 64.48^\circ \)

Fig. 14

Average of 200k events for the Monte Carlo vs. GAN presented as projections on the YZ, XZ and XY planes

Some important criteria for the evaluation of a GAN model involve studying GAN failure phenomena such as mode collapse, memorization, and over-training. We evaluate the 3DGAN model through visual assessment of the projections of the GAN-generated events, as described in Sect. 5.1. The validation is inspired by tests based on the Birthday Paradox, as suggested for GAN evaluation [47]. The distance between a GEANT4 image and the corresponding GAN-generated image is measured as the Euclidean distance in pixel space, since all images in the data set are aligned and centered around the shower barycenter. The 3DGAN conditions the generated images on the inputs (\(E_P\) and \(\theta \)), and thus the comparison takes the conditioning into account by sorting the data into bins based on the inputs, similar to the comparisons conducted in Sect. 5.2. The training dataset has random inputs sampled from a uniform distribution, while in the case of the GAN any number of images can be generated for a particular set of inputs (Fig. 12).

The first test that we conduct aims to identify any overfitting or memorization of the training data. The test uses the training data, as the aim is to understand whether the GAN is simply memorizing data samples. The test is performed with images from the 50 GeV, 100 GeV, 200 GeV, 300 GeV, 400 GeV, and 500 GeV bins, further sorted for \(\theta \) equal to \(62^{\circ }\), \(90^{\circ }\), and \(118^{\circ }\) (a total of 18 bins). We randomly sample 10 images from each selected bin of the GEANT4 training data and then generate 2k events using the GAN for each GEANT4 image. The distances between the GEANT4 image and the corresponding 2k GAN events are computed. The GEANT4 images are visually compared to the 10 closest GAN images, and none of the images are found to be duplicates. This indicates that the GAN is not memorizing data images and is able to generate random samples from a similar distribution. Figure 12 compares examples of some GEANT4 images against the closest GAN images for a few bins. It can be seen that the GAN images are distinct from the respective GEANT4 images and cannot be regarded as copies or even near-copies. The same performance is observed for all bins. We also compare the images within each set of 2k GAN images against each other. Duplicates in the GAN-generated images would indicate a loss of diversity or over-fitting. Figure 13 shows examples of the most similar GAN images generated for the same inputs. The images appear unique, with a distinct pattern of energy depositions, and we do not observe any duplicates. The above tests indicate that the GAN not only does not memorize the data but also generates diverse images conditioned on the same \(E_P\) and \(\theta \).
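The two distance-based checks described above can be sketched as follows: finding the generated images closest to one reference image, and finding the most similar pair within one generated set. Both use the Euclidean distance in pixel space. The function names are hypothetical, and the pairwise computation through a Gram matrix is a convenience choice for sets of a few thousand images rather than the procedure used for the figures.

```python
import numpy as np

def closest_generated(reference, generated, k=10):
    """Indices and distances of the k generated images nearest to `reference`."""
    ref = np.asarray(reference, dtype=float).ravel()
    gen = np.asarray(generated, dtype=float).reshape(len(generated), -1)
    dist = np.linalg.norm(gen - ref, axis=1)  # Euclidean distance per image
    order = np.argsort(dist)[:k]
    return order, dist[order]

def closest_pair(generated):
    """Most similar pair within one generated set, with its distance."""
    gen = np.asarray(generated, dtype=float).reshape(len(generated), -1)
    sq = (gen ** 2).sum(axis=1)
    # Squared pairwise distances: |a|^2 + |b|^2 - 2 a.b
    d2 = sq[:, None] + sq[None, :] - 2.0 * (gen @ gen.T)
    np.fill_diagonal(d2, np.inf)              # ignore self-matches
    i, j = np.unravel_index(np.argmin(d2), d2.shape)
    return int(i), int(j), float(np.sqrt(max(d2[i, j], 0.0)))

# Toy "images" with two pixels each (illustrative only).
toy_generated = np.array([[0.0, 0.0], [3.0, 4.0], [3.0, 4.5], [10.0, 10.0]])
toy_generated = toy_generated.reshape(4, 2, 1, 1)
toy_reference = np.array([3.0, 4.0]).reshape(2, 1, 1)
nearest_idx, nearest_dist = closest_generated(toy_reference, toy_generated, k=2)
pair_i, pair_j, pair_dist = closest_pair(toy_generated)
```

A distance of exactly zero, as for the second toy image here, would flag a copy of the reference; in practice the closest candidates are then inspected visually, as described in the text.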

Fig. 15

Shower Shapes for GEANT4 vs. GAN charged pion events along X (left), Y (mid) and Z (right) axes with random \(E_P\) and \(\theta \)

Fig. 16

Shower moments along Y axis for GEANT4 vs. GAN charged pions. (Left) M1Y (shower center). (Mid) M2Y (shower width). (Right) M3Y (shower skewness)

Figure 14 compares the average of 200k GEANT4 electron events with 200k GAN events. Cells always denoting zero energy deposition would indicate mode collapse where pixels are stuck at zero. This behavior is not observed except for a few pixels along the transverse edges of the volume where the sparsity is very high and depositions are quite small. Some differences in the distributions can also be related to the differences in the cell pixel intensity distributions for low energies as observed in Fig. 10 middle panel. The effect is most pronounced for the low depositions occurring near the volume edges. The distributions for the GAN-generated images have a slight loss in the diversity for this region but no significant mode collapse is observed.
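The check for cells that never receive energy can be sketched as a comparison of two image sets. The sketch assumes arrays of shape (n_events, x, y, z) and counts cells that are empty in every GAN event but populated in at least one Monte Carlo event; the function names are hypothetical.

```python
import numpy as np

def always_empty_cells(images):
    """Boolean mask of cells with no energy deposit in any event."""
    images = np.asarray(images, dtype=float)
    return images.max(axis=0) <= 0

def count_gan_only_empty_cells(mc_images, gan_images):
    """Number of cells empty in all GAN events but populated in the MC set."""
    gan_only = always_empty_cells(gan_images) & ~always_empty_cells(mc_images)
    return int(gan_only.sum())

# Toy sets: the GAN set never populates one cell that the MC set does.
toy_mc = np.ones((3, 2, 2, 2))
toy_gan = np.ones((3, 2, 2, 2))
toy_gan[:, 0, 0, 0] = 0.0
n_stuck = count_gan_only_empty_cells(toy_mc, toy_gan)
```

A large count would point towards pixels stuck at zero, whereas a handful of such cells at the sparse volume edges is the behavior reported above.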

6 Simulating charged pions

The charged pions deposit a much smaller part of their energy in the ECAL, while most of the energy is deposited in the HCAL. The current project is limited to the ECAL data because of limited computing resources; the work on charged pions is therefore only a preliminary study. A more accurate approach will also need to incorporate data from the HCAL. We present the results of our study to lay the foundation for future work.

The transfer learning approach could not be extended to charged pions, as the showers have very different distributions. The GAN is trained for the full \(E_P\) range of the charged pions, directly from random weights, for about 200 epochs using 300k events. There is a great diversity in the charged pion events, and most of the events have low energy deposition in the ECAL. The data also contains some spurious events with little or no energy deposition. Therefore, the data is subjected to a threshold of 0.2 GeV on the total energy deposited in the event, aimed at removing these events. This rejection also removes low \(E_P\) events. Future efforts involving the HCAL might alleviate the need for this cut.
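The selection described above amounts to a cut on the total deposited energy per event. A minimal sketch follows, assuming cell energies are stored in GeV and that events at or above the threshold are kept; whether the boundary is inclusive is not stated in the text, and `apply_energy_cut` is a hypothetical name.

```python
import numpy as np

def apply_energy_cut(images, primary_energy, min_total=0.2):
    """Keep events whose total deposited energy is at least `min_total`.

    Assumption: cell energies and `min_total` are both expressed in GeV.
    """
    images = np.asarray(images, dtype=float)
    primary_energy = np.asarray(primary_energy, dtype=float)
    total = images.reshape(len(images), -1).sum(axis=1)
    keep = total >= min_total
    return images[keep], primary_energy[keep]

# Toy batch: the first event deposits 0.1 GeV, the second 0.5 GeV.
toy_batch = np.zeros((2, 2, 2, 2))
toy_batch[0, 0, 0, 0] = 0.1
toy_batch[1, 0, 0, 0] = 0.5
kept_images, kept_energy = apply_energy_cut(toy_batch, [20.0, 150.0])
```

Applying the same mask to the conditioning inputs keeps each image paired with its \(E_P\), which matters because the cut preferentially removes low \(E_P\) events.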

Since visual inspection is not helpful for understanding the performance, owing to the high variance in the showers, we present only the distributions of physics-based features. Figure 15 presents the overall shower shapes on a log scale. The shape distributions along the X and Z axes show slightly better performance than that along the Y axis, probably due to the higher variance present in this dimension. Figure 16 compares the distributions of the first three moments, defining the shower center, width, and skewness along the Y axis; the GEANT4 and GAN events display similar distributions.

Fig. 17

Shower features for GAN vs. GEANT4 charged pion events. (Left) SF as a function of \(E_P\) with shaded area representing standard deviation. (Mid) pixel intensities with log yscale. (Right) difference between internal correlations among shower features

Figure 17 left panel displays the sampling fraction for GEANT4 and GAN charged pion events. It can be seen that there is a difference between the means of the two distributions, while the shaded areas representing the standard deviation show some overlap. The middle panel presents the distribution of the pixel intensities, that is, the energies deposited in the calorimeter cells. Performance deteriorates compared to the other particles; in particular, the effect of the cut around 0.2 MeV (see Sect. 5) is more pronounced. The right panel presents the difference in internal correlations between physics-based features such as shapes, moments, hits, \(E_P\) and \(\theta \) for the GEANT4 and GAN events. The error on the correlations is less than \(10\%\), even given the highly diverse and incomplete showers. In order to convey an idea of the diversity present in these showers, we investigate the barycenters of the shower energy depositions along the longitudinal axis (Z) for electrons and charged pions, as a function of \(E_P\). Figure 18 compares the first moment for the GEANT4 and GAN events. The charged pion showers show greater diversity and depth than the electron showers, which is also apparent in the GAN events.
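The difference in internal correlations can be sketched as the element-wise difference between two Pearson correlation matrices. The sketch assumes that the features and inputs of each event are collected into one row of a table with the same column order for both samples; the function name and the toy values are hypothetical.

```python
import numpy as np

def correlation_difference(features_mc, features_gan):
    """Difference between the Pearson correlation matrices of two samples.

    Assumption: rows are events and columns are features or inputs,
    with identical column order in both arrays.
    """
    corr_mc = np.corrcoef(np.asarray(features_mc, dtype=float), rowvar=False)
    corr_gan = np.corrcoef(np.asarray(features_gan, dtype=float), rowvar=False)
    return corr_mc - corr_gan

# Toy tables with two features: perfectly correlated vs. anti-correlated.
toy_mc = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
toy_gan = np.array([[0.0, 2.0], [1.0, 1.0], [2.0, 0.0]])
diff = correlation_difference(toy_mc, toy_gan)
```

Entries close to zero indicate that the generated sample reproduces the corresponding correlation; the toy case shows the largest possible disagreement in the off-diagonal entries.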

7 Rare modes

The discriminator assigns a higher probability of being realistic to images exhibiting features that the GAN cannot reproduce correctly. We visually investigate such events, having an \(O_{GAN}\) value greater than 0.6, with the help of graphical projections. It is observed that most of these events manifest rare modes in the data, such as pre-showering, late showering, and incorrectly centered events. As the other modes are found to be even rarer, only the pre-showering events are further investigated. Figure 19 top row presents an example of a GEANT4 pre-showering event. Particles that start depositing their energy before entering the calorimeter volume result in multiple particles striking the detector surface, and thus in multiple branches. Only a few percent of the Monte Carlo samples present such behavior, and the percentage decreases with increasing \(E_P\).

Figure 19 bottom row presents an example of a pre-showering event generated by the GAN. It must be mentioned here that more such events are generated after the first training step. The performance for these rare modes deteriorates with further training on the full \(E_P\) range, due to the decrease in the percentage of such events at higher energies. In order to further improve the performance, methods such as ensembling [48] can be explored.

Fig. 18

The distribution for the first moment along the Z axis (Mz1) for electrons and charged pions. The barycenter of deposition along the Z axis shows very different distributions for both particles with the charged pion showers starting much later. The p-value from Kolmogorov test is represented as k

Fig. 19

Graphical projections for the XY and XZ planes for MC events with high discriminator probability of being real images; (top) GEANT4 event; (bottom) GAN event

Fig. 20

Performance of the network trained on GEANT4 electrons and charged pions (left) vs. that of the network trained on GAN electrons and GEANT4 charged pions (right). The profile plots for true and predicted \(E_P\) for the GAN and Monte Carlo samples are very similar. There were 9834 events for each type

8 Training with GAN data

We present a practical use case for the GAN-generated events. Triforce [19] is a deep learning model developed by a third-party study for the identification of particle type and primary energy for particle showers from the calorimeter dataset used for 3DGAN. We had previously employed the pre-trained Triforce DNN model for classification and regression of the GAN events [18] and proved that the type and the primary particle energy for the GAN-generated events was correctly predicted.

Triforce can also be considered as an example of a typical reconstruction tool used in HEP simulation. We test the performance of our 3DGAN-generated images for training this tool. Triforce requires two types of particles (electrons and charged pions) for training. We train the Triforce GoogleNet model from scratch, first on GEANT4 electron events and then on GAN electron events. The charged pion events in both trainings are those generated by GEANT4. Figure 20 compares the primary particle energy regression of the network trained on GEANT4 electrons against that of the network trained on GAN electrons, showing a similar performance. The particle type classification accuracies presented in Table 3 also show similar values. Thus we show that, for this specific use case, the GAN simulation can replace the GEANT4 simulation without any loss in classification accuracy.

In the context of GAN evaluation, GAN-train and GAN-test are two very interesting concepts [51]. GAN-train is the accuracy of a classifier network trained on GAN-generated events and tested on data events. When the GAN images are of high quality and as diverse as the training set, the score on the validation set should be similar to the training accuracy. A lower accuracy would indicate that the GAN images do not cover the entire distribution of the training data. GAN-test is the accuracy of a network trained on true data and validated on GAN images. A lower accuracy would indicate that the GAN images are not sufficiently realistic, while a higher accuracy could be related to mode dropping. The 3DGAN shows performance similar to that of the GEANT4 training data, and thus the results of this test can also be regarded as proof of the high accuracy and diversity of the GAN-generated events.
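The two scores defined above can be illustrated with any classifier. The sketch below uses a simple nearest-centroid classifier purely as a stand-in; it is not the Triforce model, and the flat feature vectors, labels, and function names are hypothetical.

```python
import numpy as np

def fit_centroids(x, y):
    """Fit a nearest-centroid classifier: one mean vector per class."""
    classes = np.unique(y)
    centroids = np.stack([x[y == c].mean(axis=0) for c in classes])
    return classes, centroids

def predict(model, x):
    """Assign each sample to the class of its nearest centroid."""
    classes, centroids = model
    dist = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=2)
    return classes[np.argmin(dist, axis=1)]

def accuracy(model, x, y):
    return float((predict(model, x) == y).mean())

def gan_train_and_test(x_real, y_real, x_gan, y_gan):
    """GAN-train: fit on GAN data, score on real data. GAN-test: the reverse."""
    gan_train = accuracy(fit_centroids(x_gan, y_gan), x_real, y_real)
    gan_test = accuracy(fit_centroids(x_real, y_real), x_gan, y_gan)
    return gan_train, gan_test

# Toy feature vectors for two well-separated classes (illustrative only).
x_real = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
y_real = np.array([0, 0, 1, 1])
x_gan = np.array([[0.5, 0.5], [9.0, 9.0]])
y_gan = np.array([0, 1])
gan_train, gan_test = gan_train_and_test(x_real, y_real, x_gan, y_gan)
```

With real detector images the classifier would be a trained network, but the bookkeeping of which sample is used for fitting and which for scoring stays the same.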

Table 3 Triforce classification results

9 Conclusions and future suggestions

Table 4 Hyperparameters for 3DGAN fully optimized version

Simulation is crucial for most HEP experiments. The Monte Carlo methodology can successfully simulate particle interactions at the cost of time and resources. Fast simulation is a set of faster alternatives that have been successfully used to replace detailed simulation where some loss in accuracy is acceptable. Recent advances in deep learning have also had an impact on the HEP community, and many interesting applications, including simulation, have been attempted. Deep networks can potentially simulate raw detector data without requiring the formulation of the internal processes. The 3DGAN is an effort to simulate the detector output as images generated by a neural network. These images are conditioned on a number of variables with a large dynamic range of values. We exploit a multi-step training process, resulting in accurate simulation for electrons, photons, and neutral pions. The accuracy for individual data features varies but is within \(10\%\) of the GEANT4 simulation for all quantities. Preliminary work on charged pion simulation is also promising, although the ECAL contains only partial showers that manifest higher variance. The GAN is able to reproduce the essential features of a charged pion shower. Another exploratory study demonstrates the possibility of generating rare modes present in the data. We further provide an example of a practical use of the GAN-generated events. The GAN-simulated events are able to train a third-party particle classification and regression tool for the correct classification of the GEANT4 events. The response of the tool trained on GAN data is similar to that of the tool trained on GEANT4 data. Finally, we would like to state that the GAN-generated showers are simulated with a speedup of three orders of magnitude.

We would like to point out some insights from our endeavour that may be helpful in the context of future work. For the current application, certain domain-related features such as the total deposited energy need to be hardcoded in the loss function, while some other features can be learned implicitly. These implicitly learned features include geometrical properties such as shapes and moments, the level of sparsity, the pixel intensity distribution, and the correlations among the different features as well as the inputs. The model can learn complex distributions for individual features, and the only part where the GAN struggles is in reproducing the sharp drop in pixel intensities, as discussed in Sect. 5. Apart from that, the sparse peripheral regions of the images are more difficult to generate correctly, and there is some loss in performance for very low \(E_{P}\) particles. The current training allows learning of an average response for the total deposited energy, since the training relies on direct comparison for small data batches. We believe that a better formulation of the loss might result in better agreement in future efforts. The preliminary work on charged pion and rare mode simulation shows great promise. The charged pion simulation can be improved by including the HCAL data. The generation of rare modes can benefit from exploring methods such as the ensembling approach, where multiple networks can be trained simultaneously for the different modes present in the data. The current effort proves that a GAN can generate events conditioned on multiple continuous variables, and that the approach can be further adapted to the requirements of integration into a practical simulation. The inputs needed for conditioning the showers can be studied in detail. A cutoff threshold for cell energies can be identified, as well as the most crucial tests for validation.
The speedup can also be further increased by exploiting parallel hardware [52] that cannot yet be done for the sequential logic employed by the standard Monte Carlo tools. A distributed training [53] will be most essential for future generalization of the approach through hyperparameter scan.