1 Introduction

General-purpose high energy collider experiments are designed to measure both charged particle trajectories and calorimeter clustered energy deposits. The charged particle tracks in a magnetic field and the topology of energy deposits in calorimeters provide most of the information necessary to reconstruct, identify and measure the energy of the particles that constitute the event, which for the most part are charged and neutral hadrons, photons, electrons, muons, and neutrinos. The latter escape detection and are measured through the imbalance in momentum in electron-positron collision events, or in transverse momentum in hadron collision events. Other particles created during a high energy collision, having too short lifetimes to be directly detected in the experiment, need to be reconstructed from their decay products.

The goal of Particle Flow (PFlow) algorithms is to make optimal use of these complementary measurements to reconstruct the particle content and its energy response for the entire event. A precise reconstruction of the entire event is essential for the measurement of particles, such as neutrinos, escaping detection, as well as for the reconstruction of jets of particles originating from the fragmentation and hadronization of hard scattering partons. One challenging aspect of PFlow algorithms is to disentangle particles of different nature when they are close to one another and possibly overlap. The reconstruction performance in general, and the performance of PFlow algorithms in particular, critically depends on the detector design specifications, such as the size and magnetic field intensity of the tracking volume, the granularity of the calorimeters, and their energy resolution. The first PFlow algorithm was designed by the CELLO collaboration at PETRA [1], where an optimal reconstruction of the event “Energy Flow” was obtained by subtracting the expected energy loss from charged particles in the calorimeter, to estimate the “neutral energy” and its spatial distribution. This algorithm, developed in \(e^+e^-\) collisions, was aimed at a precise measurement of the hadronic activity for the determination of \(\alpha _S\). Since then, PFlow algorithms relying on the parametrization of the expected charged energy deposits in calorimeters have been further developed at \(e^+e^-\) [2] and pp [3, 4] collider experiments. The success of these algorithms has been such that future \(e^+e^-\) collider experiment projects are taking PFlow algorithms into account in the design of the projected detectors [5,6,7,8,9].

In this paper, we explore the capabilities of computer vision algorithms, along with graph and deep set Neural Networks (NN), to provide solutions to this complex question in the context of two overlapping particles, a charged and a neutral pion (\(\pi ^{+}\) and \(\pi ^0\)). This benchmark is highly representative of hadron collisions and of jet environments in electron-positron collisions, and served, for instance, as the foundation for developing and tuning the PFlow algorithm of the ATLAS experiment [3]. It is the most common scenario in high energy hadron collisions, where most of the event particle content originates from parton fragmentation and typically comprises charged and neutral pions. The focus of this paper is the precise reconstruction of a “neutral energy” calorimeter image as a starting point for more elaborate particle flow and particle identification algorithms. In addition, images with an augmented output granularity are produced using super-resolution [10] techniques. The ability to provide images with finer granularity than the native segmentation of the detector is a very interesting possible intermediate step in particle identification and is also relevant in projecting the design granularity of future detectors.

Image-based convolutional NNs have been used to study calorimeter showers and for hadronic jet tagging, as can be found in Refs. [11,12,13,14]. These approaches are based on uniform two or three-dimensional (2D or 3D) images. The more intricate case of varying granularity layered images has recently been addressed using graph NNs in Ref. [15]. Hadronic showers in calorimeters with varying granularity layers are studied in the context of a future circular collider in Ref. [16]. Direct reconstruction of individual particles from detector hits has been investigated using segmentation of particles through edge classifiers in the context of tracking [17], as well as direct segmentation and property determination using object condensation techniques [18].

The approach adopted in this paper does not attempt the complete leap from the detector signals to fully identified high-level Particle Flow objects. It is a first intermediate step focusing on the reconstruction of a precise calorimeter image of the neutral particles in the event, using several NN architectures, including Convolutional Neural Networks (ConvNet, UNet), Graph Neural Networks (Graph), and Deep Sets (DeepSet). These calorimeter images are based on a detailed full Monte Carlo simulation, thus providing a precise benchmark to probe the performance of the proposed methods. To probe the performance of super-resolution in a typical detector configuration, a native lower granularity is chosen in most layers of the calorimeters.

2 Detector model and simulation

Two calorimeter layouts are considered in this paper, both based on the same underlying detector, but with different granularities. The two layouts will be referred to as High Granularity (HG) and Low Granularity (LG). The HG detector is used to study the separation of charged and neutral energy components while the LG detector is used as a benchmark for super-resolution applications.

The experimental setup is composed of three sub-detectors: the electromagnetic calorimeter (ECAL) and the hadronic calorimeter (HCAL), which are fully simulated, and a charged particle tracker, which is parameterized. Each calorimetric sub-detector consists of three layers of varying granularity, placed one after another.

Both the ECAL and the HCAL are sampling calorimeters. For simplicity, and to produce a realistic shower development, each layer is modeled as a homogeneous calorimeter using an equivalent molecule corresponding to a mixture of the absorber and scintillator materials.

The simulated energy deposits in each layer are then smeared to reproduce the corresponding sampling energy resolution. For the ECAL, the absorber material is lead and the active material is liquid argon, mixed with a mass proportion of 1.2:4.5, yielding an equivalent radiation length of \(X_0~=~3.9\) cm. For the HCAL, the absorber material is iron and the active scintillating material is polyvinyl toluene plastic. These are combined with a mass proportion of 4.7:1.0, yielding an equivalent interaction length of \(\lambda _{int}~=~17.4\) cm. The choices of the detector geometry, materials and smearing parameters are tuned to replicate single-particle energy responses similar to those obtained with the ATLAS detector.

Overall, the total length of the detector is 227.5 cm, including a 1 cm gap between the ECAL and HCAL blocks. The lateral profile of the calorimeters is a square of \(125 \times 125\) cm\(^2\). The main characteristics of the calorimeters are summarized in Table 1.

Table 1 Material budget of the ECAL and HCAL calorimeters as well as the corresponding equivalent radiation \(X_0\) and interaction \(\lambda _{int} \) lengths
Table 2 Transverse segmentations for both the HG and LG layouts (the total transverse dimension is \(125 \times 125\) cm\(^2\)) of the ECAL and HCAL individual layers and the corresponding simulated electronic noise per cell for the HG detector. The noise for the LG detector is scaled up by a conversion factor (cf) when going from the HG to the LG layout

The transverse granularities of each layer for both the HG and LG layouts are indicated in Table 2. An event display showing the deposited charged and neutral energy components for the HG detector is shown in Fig. 1. A similar event display for the LG detector is shown in Fig. 2. A higher granularity detector composed of layers with \(32 \times 32\) resolution is also shown in the same figure. It is evident that the second layer of the LG detector cannot separate the two photon clusters, in contrast to the \(32 \times 32\) layers.

3 Simulated data

The electromagnetic showers of the photons from the subsequent decay of the neutral pion and the hadronic shower of the charged pion are simulated with GEANT 4 [19] using the detector layout described in Sect. 2. Electronic noise in the calorimeter is also taken into account.

Fig. 1
figure 1

A 3-D display of the energy deposits of \(\pi ^{+}\) and \(\pi ^{0}\rightarrow \gamma \gamma \) in the LG calorimeters (left) and of the \(\pi ^{0}\) only (right). The \(\pi ^{+}\) track and its extrapolation to the calorimeters are also displayed. Cells where the fraction of deposited energy from the \(\pi ^{+}\) is dominant are shown in red; the neutral energy deposits originating from the \(\pi ^{0}\) are shown in green

Fig. 2
figure 2

A 3-D display of the energy deposits of a \(\pi ^{+}\) and \(\pi ^{0}\rightarrow \gamma \gamma \) in the LG calorimeters (left), of the \(\pi ^{0}\) only (middle) and of the same \(\pi ^{0}\) shower as captured by an HG calorimeter layer of \(32 \times 32\) granularity, where the deposits from the two photons are resolved (right). The \(\pi ^{+}\) track and its extrapolation to the calorimeters are also displayed

To ensure significant overlap between the charged and neutral hadron, the polar angle of the \(\pi ^{+}\)/\(\pi ^{0}\) momenta varies randomly between \(\pi / 100 \) and \(3 \pi / 200\) radians, whereas the azimuthal angle varies uniformly between 0 and \(2 \pi \) radians with a relative separation of \(\pi / 60\) radians. The \(\pi ^{+}\) and \(\pi ^{0}\) are generated using the GEANT particle gun functionality. The source of the gun is located 150 cm away from the first ECAL layer. To populate different parts of the detector, the initial location of the neutral and charged hadron in the event is randomly chosen from the corners of a square at the source location with a side length of 20 cm. Four sets of independent simulations are run with different energy ranges: 2–5 GeV, 5–10 GeV, 10–15 GeV and 15–20 GeV. The energies of the generated charged and neutral pions are randomly sampled from a uniform distribution bounded within these ranges, without any correlation between the pion energies. The generation parameters of the particle gun ensure that a large proportion of detector cells have a significant amount of energy overlap, originating from the individual showers. The average fraction of neutral energy within groups of clustered cells, referred to as topoclusters (see Sect. 5), is around 60%. The effect of electronic noise is emulated using Gaussian distributions centered at zero with variable widths for the different layers. The per cell levels of noise in each layer are given in Table 2. For each cell in the event an energy is sampled from these distributions and added to the total energy.
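The noise emulation described above can be sketched as follows. This is an illustrative example, not the paper's code: the noise widths used below are placeholders, not the values of Table 2.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def add_electronic_noise(layer_energies, noise_sigma_per_layer, rng):
    """Add zero-mean Gaussian noise, with a layer-dependent width,
    to every cell of every calorimeter layer.

    layer_energies: list of 2D arrays (one per calorimeter layer).
    noise_sigma_per_layer: one Gaussian width per layer (placeholder units).
    """
    noisy = []
    for cells, sigma in zip(layer_energies, noise_sigma_per_layer):
        noise = rng.normal(loc=0.0, scale=sigma, size=cells.shape)
        noisy.append(cells + noise)
    return noisy

# toy example: two layers with different granularity and noise levels
layers = [np.zeros((8, 8)), np.zeros((16, 16))]
noisy_layers = add_electronic_noise(layers, [13.0, 34.0], rng)
```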

To cross-check the effect of energy boundary transition, a sample with \(\pi ^{+} \,, \pi ^{0}\) energies randomly varied between 2–20 GeV was also produced.

Finally, a track is formed by smearing the \(\pi ^{+}\) momentum by a resolution \(\sigma (p)\), given by \(\frac{\sigma (p)}{p} = 5 \times 10^{-4} \times p~[GeV]\), while keeping the original \(\pi ^{+}\) momentum direction unchanged. The chosen momentum resolution emulates the track resolution of the ATLAS tracking system and track reconstruction algorithms [20]. The smearing of the track direction is neglected as it is expected to have a sub-dominant effect on the results presented in this document.
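The momentum smearing above can be written as a one-line Gaussian draw. A minimal sketch, using the resolution formula quoted in the text (so that \(\sigma (p) = 5 \times 10^{-4} \, p^2\) in GeV):

```python
import numpy as np

def smear_track_momentum(p_true_gev, rng):
    """Smear a pi+ momentum with sigma(p)/p = 5e-4 * p [GeV],
    i.e. sigma(p) = 5e-4 * p**2, keeping the direction unchanged."""
    sigma = 5e-4 * p_true_gev ** 2
    return rng.normal(loc=p_true_gev, scale=sigma)

rng = np.random.default_rng(seed=1)
p_smeared = smear_track_momentum(10.0, rng)  # sigma = 0.05 GeV at 10 GeV
```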

4 Deep neural network models

The target of the NN models is to regress the per-cell neutral energy fraction using deep learning methods to yield an accurate image of the neutral energy deposits. Two main approaches were investigated depending on the granularity of the target detector: a standard scenario where the granularity of the input and output images is unchanged, and a super-resolution scenario where the target detector features higher granularity layers compared to the input detector. For both, various state-of-the-art NN architectures were implemented and compared.

For the standard scenario, the loss function is designed to regress the neutral energy fraction of each cell in the event, with a larger weight assigned to more energetic cells, to reduce the effect of noise and emphasize the highly energetic cells originating from the pions. The same loss function is used to train the different models; it is defined on a per-event basis as follows:

$$\begin{aligned} L_{event} = \frac{1}{E_{tot}} \sum _c E_c (f^c_t-f^c_d)^2 \end{aligned}$$

where \(E_{tot}\) is the total energy collected by the six calorimeter layers, \(E_c\) is the total energy of a given cell indexed by c, and \(f^c_t\) and \(f^c_d\) represent the target and predicted neutral energy fractions, respectively.
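In code, this event-level loss is an energy-weighted mean squared error on the per-cell fractions. A minimal numpy sketch (the arrays and values are toy placeholders):

```python
import numpy as np

def event_loss(cell_energies, f_target, f_pred):
    """Energy-weighted MSE on per-cell neutral energy fractions.
    All inputs are flat arrays over the cells of the six layers."""
    e_tot = cell_energies.sum()
    return float(np.sum(cell_energies * (f_target - f_pred) ** 2) / e_tot)

E = np.array([1.0, 3.0, 6.0])     # cell energies (GeV)
f_t = np.array([0.0, 0.5, 1.0])   # target neutral fractions
f_d = np.array([0.0, 0.5, 0.9])   # predicted neutral fractions
loss = event_loss(E, f_t, f_d)    # 6 * 0.1**2 / 10 = 0.006
```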

Similarly to the previous case, the loss function used for super-resolution is built to regress the fraction of neutral energy of each super-cell (\(f^{sc}\)) with respect to the energy of the corresponding standard cell:

$$\begin{aligned} L^{\text {super-res}}_{event} = \frac{1}{E_{tot}} \sum _{c} E_{c} \sum _{s=1}^{us^2} (f^{sc}_t-f^{sc}_d)^2 \end{aligned}$$

where \(E_{c}\) is the total energy of the standard cell and s is an index running over the super-cells belonging to a standard cell c. Here \(f^{sc}_t\) and \(f^{sc}_d\) represent the target and predicted neutral energy fractions for the super-resolved cell. The upper limit of the nested sum is the square of the Up-Scale factor (us) used to define the high-resolution image. As an example, in this work we consider us = 4 and thus each cell of the LG detector is up-scaled to 16 cells in the high-resolution image.

For some of the NN models, a different loss function was also studied and provided similar results:

$$\begin{aligned} L^{2}_{event} = \frac{1}{N_{sc}} \sum _{sc} (e^{sc}_t-e^{sc}_d)^2 \end{aligned}$$

where \(e^{sc}_t\) is the absolute value of the target neutral energy in the super-cell sc and \(e^{sc}_d\) is the corresponding predicted neutral energy. \(N_{sc}\) is the total number of super-cells in an event that belong to a topocluster.

All the networks are trained with the Adam optimizer [21] and a learning rate of \(10^{-4}\). The training dataset consists of 80,000 images while the validation dataset has 20,000 images. The performance of the trained models is evaluated on a test dataset consisting of 6000 images. The relative difference between the predicted neutral energy and the truth neutral energy in the test sample serves as the baseline metric.

In the following subsections, we describe the individual network architectures for standard resolution networks, where the input and output images have the same granularity, and for super-resolution networks where the output images have higher granularity compared to the inputs.

4.1 Networks used for standard resolution

4.1.1 Convolutional network

Calorimeter images are formed from the total energy deposit in the calorimeter cells in each layer. The layers correspond to image channels (similarly to RGB in standard images), each having a different resolution due to the variable granularity of the calorimeter layers. To perform an image recognition analysis, each of the image channels (with a granularity given in Table 2) is first mapped to a uniform resolution of \(64 \times 64\) pixels. This mapping is performed by an individual NN block referred to as the “UpConv” block. The mathematical operations of the UpConv block are described in Eq. (A.1). The track is represented by the track layer, a single channel \(64 \times 64\) pixel image. The image has only one non-zero pixel, which contains the value of the track momentum. The location of this pixel in the image is determined from the position of the \(\pi ^{+}\) impact on the first ECAL layer surface. The track information is crucial in the design and performance of particle flow algorithms as it provides invaluable information to estimate the expected charged energy deposits as well as the position of the shower.
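The construction of the track layer can be sketched as follows. This is an illustrative stand-in, not the paper's code; the coordinate convention (detector-centred, in cm) and the function name are assumptions.

```python
import numpy as np

def make_track_layer(x_impact, y_impact, p_track, half_width=62.5, n_pix=64):
    """Build the single-channel 64x64 track image: the only non-zero
    pixel sits at the pi+ impact point on the first ECAL layer and
    stores the track momentum. The 125 x 125 cm^2 face is mapped to
    the pixel grid."""
    img = np.zeros((n_pix, n_pix))
    ix = int((x_impact + half_width) / (2 * half_width) * n_pix)
    iy = int((y_impact + half_width) / (2 * half_width) * n_pix)
    ix = min(max(ix, 0), n_pix - 1)   # clamp to the image boundaries
    iy = min(max(iy, 0), n_pix - 1)
    img[iy, ix] = p_track
    return img

track_img = make_track_layer(x_impact=10.0, y_impact=-5.0, p_track=7.2)
```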

The track layer is combined with the six Up-Convoluted calorimeter images to form a seven-layer image that is fed to a NN block consisting of convolutional layers, termed the “ConvNet” block. The output of the ConvNet block is a set of six images with a uniform granularity of \(64 \times 64\) pixels. Each image is then mapped to its corresponding native granularity through a down-convolutional learnable NN, denoted the “DownConv” block. More details about the UpConv and DownConv blocks can be found in Appendix A.
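The shape bookkeeping of the UpConv step can be illustrated with a non-learnable stand-in: replicating each coarse cell so that every layer lands on the common \(64 \times 64\) grid. The real UpConv block is a learnable up-convolution (Appendix A); the `np.kron` version below only shows the resolution mapping.

```python
import numpy as np

def to_common_grid(layer, n_pix=64):
    """Replicate each cell of a coarse layer so the layer is represented
    on an n_pix x n_pix grid (a fixed stand-in for the learnable UpConv)."""
    ny, nx = layer.shape
    assert n_pix % ny == 0 and n_pix % nx == 0
    return np.kron(layer, np.ones((n_pix // ny, n_pix // nx)))

coarse = np.arange(16, dtype=float).reshape(4, 4)  # a 4x4 toy layer
fine = to_common_grid(coarse)                      # 64x64 image
```

With this fixed mapping the total energy per layer is only preserved up to the replication factor; a learnable UpConv is free to learn a sharper, energy-aware interpolation.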

4.1.2 UNet with ResNet

An alternative to the image convolution based network described above in Sect. 4.1.1 is the encoder-decoder network (UNet) [22]. Similarly to the ConvNet architecture, the UNet block is introduced after the UpConv blocks, which produce 6 uniform \( 64 \times 64\) images forming, with the addition of the track image, the inputs to the UNet block. The encoder part consists of four ResNet blocks (a 2D convolution layer with a feed forward operation), followed by a bottleneck layer and four decoder blocks which perform 2D transpose convolutions to expand the image size. There are also skip-connections between the outputs and inputs of encoder and decoder blocks with similar tensor shapes. In our network, the output of the UNet block is a \(6 \times 64 \times 64\) image, which is then mapped to the native resolution images through DownConv blocks.

4.1.3 Graph network

The two networks described in Sects. 4.1.1 and 4.1.2 are well suited for calorimeters with geometries whose cell boundaries can easily be mapped to common “UpConv” blocks. For calorimeters with irregular geometries, or where cell sizes across layers are not related through multiplicative factors, these networks are not ideal. Graphs do not have this limitation, as discussed in Ref. [23], and thus provide the most natural representation of the event. The energy deposits in the calorimeter cells, along with the tracks in the event, have a natural representation as a point cloud. Each calorimeter cell has a 3D position coordinate (x, y, z). The z coordinate of a calorimeter cell is defined as the mid-point, along the z axis, of the corresponding calorimeter layer. The transverse location \((x_{tr}, y_{tr})\) where the track hits ECAL1, denoting the point of entrance of the charged particle in the calorimeter volume where it will start developing its hadronic shower, is assigned to one node of the GNN.

The Graph Network model is based on a Dynamic Graph CNN, similar to that used in Ref. [24]. A graph is formed dynamically at each iteration of the message passing using a KNN [25], with a default choice of \(K = 10\). For the initial graph, each node has four embedding features: three spatial coordinates and one energy. In the chosen architecture, the energy information is added in addition to the spatial coordinates; however, spatial coordinates and energies are not directly combined. More details on the graph network architecture can be found in Sect. 9.
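The dynamic graph construction can be sketched as follows: at each message-passing step a K-nearest-neighbour graph is rebuilt from the current node embeddings. Plain numpy distances stand in here for the library KNN used in the paper.

```python
import numpy as np

def knn_edges(features, k=10):
    """features: (n_nodes, n_dim) node embeddings. Returns an
    (n_nodes, k) array with the indices of each node's k nearest
    neighbours (excluding the node itself)."""
    diff = features[:, None, :] - features[None, :, :]
    dist2 = (diff ** 2).sum(axis=-1)
    np.fill_diagonal(dist2, np.inf)  # a node is not its own neighbour
    return np.argsort(dist2, axis=1)[:, :k]

# toy point cloud: three nearby nodes and one far away
nodes = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
edges = knn_edges(nodes, k=2)
```

Because the graph is rebuilt from the updated embeddings at every iteration, nodes that become close in feature space, rather than only in physical space, end up exchanging messages.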

4.1.4 Deep set network

The point cloud description, as explained in Sect. 4.1.3, can be represented as a permutation equivariant (PE) set. Each element of the set has four features (three coordinates and one energy parameter). In the Deep Set (DS) formalism, described in Ref. [26], the forward pass through a DS layer is done through a PE operation:

$$\begin{aligned} f({\mathbf {x}}) = \sigma \Big ( \gamma ~ \mathrm {I} ~{\mathbf {x}} - \lambda ~ mean({\mathbf {x}} ) ~\mathrm {I}~ \Big ) ~ , \end{aligned}$$

where \(\gamma \) and \(\lambda \) are MLPs and \(\sigma \) is the activation function, which we choose to be a \(\tanh \) function.

We design the DS layer to have two parameters P and Q which are the dimensions of input and output features, respectively.

In our DS model, there are a total of four DS layers, followed by an MLP. The (\(P ~\,, ~Q\)) values for these four layers are (4, 6), (6, 12), (12, 8), (8, 4), respectively. The output of each DS layer is concatenated into one large vector which is then passed through the final MLP block to get the target fractions on each node of the point cloud.
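A single PE layer can be sketched as follows. This is a deliberately minimal stand-in: the \(\gamma \) and \(\lambda \) transformations, MLPs in the paper, are reduced to single linear maps with random placeholder weights, but the permutation-equivariance property is the same.

```python
import numpy as np

def deep_set_layer(x, w_gamma, w_lam):
    """One permutation-equivariant layer: each element is transformed
    by w_gamma, shifted by a w_lam transform of the set mean, and passed
    through tanh. x: (n_elements, P) -> returns (n_elements, Q)."""
    pooled = x.mean(axis=0, keepdims=True)        # permutation-invariant
    return np.tanh(x @ w_gamma - pooled @ w_lam)  # equivariant output

rng = np.random.default_rng(seed=2)
x = rng.normal(size=(5, 4))    # 5 set elements, P = 4 features
w_g = rng.normal(size=(4, 6))  # P = 4 -> Q = 6
w_l = rng.normal(size=(4, 6))
y = deep_set_layer(x, w_g, w_l)

# permuting the input permutes the output in the same way (equivariance)
perm = [4, 0, 3, 1, 2]
y_perm = deep_set_layer(x[perm], w_g, w_l)
```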

4.2 Networks used for super-resolution

To solve the super-resolution task three approaches are used, based on graph networks, convolutional networks and a hybrid network composed of graph and convolutional networks.

4.2.1 Pure graph network

The graph-only network approach to the super-resolution problem is slightly different from that used for the regular energy flow problem and requires a dual architecture with two main blocks connected through a broadcasting layer. The first block (GNN1) takes as input the graph from the LG data and performs node and edge update operations [23, 27]. The broadcasting layer takes each node from the output of the GNN1 block and replicates it by the corresponding upsampling factor, after which a second block (GNN2) performs node and edge update operations. The output is then compared through the loss function to the target graph, made of a higher number of nodes.
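The broadcasting layer itself is a simple node replication. A minimal sketch (the function name and feature layout are assumptions):

```python
import numpy as np

def broadcast_nodes(node_features, upscale=4):
    """Replicate every node of the low-granularity graph upscale**2 times,
    so that GNN2 can refine the copies into super-cell predictions.
    node_features: (n_nodes, n_feat) -> (n_nodes * upscale**2, n_feat)."""
    return np.repeat(node_features, upscale ** 2, axis=0)

nodes = np.array([[1.0, 0.5], [2.0, 0.25]])     # two LG cells
broadcast = broadcast_nodes(nodes, upscale=4)   # 2 nodes -> 32 copies
```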

4.2.2 Graph UNet network

The dynamic graph network and UNet are described in Sects. 4.1.3 and 4.1.2 respectively. For the super-resolution task, we combine these two networks to form a hybrid network Graph-UNet.

The graph network takes as input the whole event graph and predicts the neutral energy per cell of the topocluster for the LG data. The UNet takes this prediction as input and upscales it via a transpose convolution operation to match the granularity of the super-resolution image. An L2 loss function is then used to optimize the whole network and predict the neutral energy of the super-resolution cells.

4.2.3 Convolutional UNet network

The UNet network described in Sect. 4.1.2, by construction, maps the input LG image to a high granularity image which is then scaled down through a DownConv block. Here we omit the DownConv block and directly try to regress the output of the UNet network to the target.

Fig. 3
figure 3

An event display showing the individual components of charged and neutral energy per layer along with the noise distribution for the first four calorimeter layers. The extrapolated track position is shown by the red cross in the first three columns. The energy of the initial \(\pi ^{+},\pi ^{0}\) ranges between 2–5 GeV. The total energy also includes the noise contribution, and topocluster boundaries are outlined in white. The fourth and fifth columns show the predicted neutral energy distributions for the ConvNet and pPflow algorithms, respectively. In this example, it is seen that the electromagnetic shower from the \(\pi ^{0} \rightarrow \gamma \gamma \) decay has been correctly predicted by the trained model. The trained model also learns to suppress the noise pattern

5 Parametric algorithm implementation

To quantify the performance of the NN algorithms described above, we compare them to a traditional ATLAS-like parametrized PFlow (pPflow) algorithm.

The pPflow algorithm is divided into two separate steps: (i) the topocluster formation and (ii) the expected charged energy subtraction. The implementation of both steps is inspired by the PFlow algorithm currently used by the ATLAS experiment [3].

Fig. 4
figure 4

(left) Distribution of the predicted cell neutral energy as a function of the truth deposited neutral energy for the entire energy range of interest. (right) Topological cluster level relative energy residuals \((E_{predicted}-E_{Neutral})/(E_{Neutral})\) in specific energy ranges. The blue lines denote the performance from trainings which are done on the specific energy ranges. The red curves show the performance of a unique training on a sample produced covering the entire energy range

The topological clustering algorithm groups cells based on their energies and topological location. The algorithm is designed to cluster cells originating from a single neutral or charged particle as well as to suppress noise. The algorithm starts by ranking cells based on their significance over the corresponding nominal noise values. Cells with a significance larger than five \(\left( \frac{E}{\sigma } \ge 5\right) \) are considered as seeds and a topological search is performed on their adjacent cells in the longitudinal direction and on their adjacent and next-to-adjacent cells in the transverse plane. If one of the clustered cells has a significance \(\frac{E}{\sigma } \ge 2\), an additional clustering step is performed. If a seed cell is found to be adjacent, within two cells, to another topocluster, the two topoclusters are merged. The closest topocluster to the extrapolated track position is considered to be matched to the track.
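The seed-and-grow logic above can be sketched in a simplified 2D form. This is not the paper's implementation: the real algorithm also clusters across layers, uses next-to-adjacent cells in the transverse plane, and merges nearby topoclusters; here a single-layer breadth-first growth illustrates the thresholds.

```python
import numpy as np
from collections import deque

def topocluster(energy, sigma, seed_thr=5.0, grow_thr=2.0):
    """Simplified 2D topoclustering: cells with E/sigma >= seed_thr seed
    a cluster, which is grown over adjacent cells with E/sigma >= grow_thr.
    Returns an integer label map (0 = unclustered)."""
    signif = energy / sigma
    labels = np.zeros(energy.shape, dtype=int)
    next_label = 0
    for seed in zip(*np.where(signif >= seed_thr)):
        if labels[seed]:
            continue  # seed already absorbed by an earlier cluster
        next_label += 1
        labels[seed] = next_label
        queue = deque([seed])
        while queue:
            i, j = queue.popleft()
            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ni, nj = i + di, j + dj
                if (0 <= ni < energy.shape[0] and 0 <= nj < energy.shape[1]
                        and not labels[ni, nj] and signif[ni, nj] >= grow_thr):
                    labels[ni, nj] = next_label
                    queue.append((ni, nj))
    return labels

E = np.array([[0.1, 2.5, 6.0, 2.5],
              [0.1, 0.1, 2.5, 0.1],
              [0.1, 0.1, 0.1, 0.1]])
labels = topocluster(E, sigma=np.ones_like(E))  # one 4-cell cluster
```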

The expected charged energy is estimated using a parametrisation of the energy deposited by a \(\pi ^{+}\) within the matched topocluster, referred to as \(\langle E_{\text {pred}} \rangle \). This parametrisation is computed from template distributions obtained using a pure sample of \(\pi ^{+}\) without contamination of \(\pi ^{0}\), and it depends on the track momentum and the estimated Layer of First Interaction (LFI, where the first nuclear interaction takes place). The track position is extrapolated to the calorimeter layers and rings of radius equal to a single cell pitch are built. The rings are then ordered according to their expected energy density. The ring energies are progressively subtracted from the topocluster in decreasing order of energy density. The algorithm proceeds until the total amount of removed energy exceeds \(\langle E_{\text {pred}} \rangle \). If the energy in a ring is larger than the energy required to reach \( \langle E_{\text {pred}} \rangle \), the energy in that ring is not fully subtracted but scaled to the fraction needed to reach the expected energy released by the \(\pi ^{+}\). The remaining energy in the topocluster is considered as originating from neutral particles.
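The ring subtraction step can be sketched as follows: ring energies are removed in decreasing order of energy density until \(\langle E_{\text {pred}} \rangle \) is reached, with the last ring only partially subtracted. The function name and toy values are illustrative.

```python
import numpy as np

def subtract_rings(ring_energies, ring_densities, e_pred):
    """Subtract the expected charged energy e_pred from rings around the
    extrapolated track, densest rings first; the last ring is only
    partially subtracted. Returns the energy kept in each ring."""
    kept = np.array(ring_energies, dtype=float)
    remaining = e_pred
    for idx in np.argsort(ring_densities)[::-1]:  # densest rings first
        if remaining <= 0:
            break
        removed = min(kept[idx], remaining)
        kept[idx] -= removed
        remaining -= removed
    return kept

rings = [4.0, 2.0, 1.0]      # energy per ring (GeV)
densities = [8.0, 3.0, 0.5]  # expected energy density per ring
neutral = subtract_rings(rings, densities, e_pred=5.0)
```

In this toy example the first ring is fully removed, half of the second ring is removed, and the third ring is untouched; the surviving energy is attributed to neutral particles.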

6 Results for standard resolution

An example event is displayed in Fig. 3, showing the truth energies and the predictions of the pPflow and ConvNet algorithms. It illustrates how this specific algorithm produces a more accurate image of the neutral energy deposit. The other algorithms that are considered provide very similar results.

The training and the evaluation of the NN models are performed on cells belonging to the topoclusters. Figure 4 shows a comparison between the predicted and truth cell energies for the Graph model for an inclusive energy range of [2–20] GeV. It illustrates the ability of the ML models to predict the cell fractions over a wide range of energy. The figure also shows the distributions of residuals \(\left( \frac{E_{predicted}-E_{Neutral}}{E_{Neutral}}\right) \) computed from the predictions of networks trained on specific energy ranges and of a network trained over the entire energy range. Both the predicted and the truth neutral energies used in the residuals are computed as the sum of cell energies belonging to the topoclusters. The marginal differences observed show that the training performed over the full energy range provides very similar results compared to the specific models trained exclusively in each of the energy regions.

To quantify the overall performance of the NN methods, two figures of merit related to the reconstruction of the neutral energy deposits are considered. The first is the energy resolution of the residual neutral energy after the charged energy subtraction. The second is the spatial resolution in the reconstruction of the location of the barycenter of the neutral energy in the second layer of the EM calorimeter.
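The second figure of merit, the barycenter position, is an energy-weighted mean of the cell coordinates. A minimal sketch of how the distance (in cells) between the predicted and truth barycenters could be computed for one layer:

```python
import numpy as np

def barycenter(image):
    """Energy-weighted barycenter (row, column) of a 2D layer image."""
    ys, xs = np.indices(image.shape)
    e_tot = image.sum()
    return np.array([(ys * image).sum(), (xs * image).sum()]) / e_tot

# toy truth and predicted neutral energy images in one layer
truth = np.zeros((8, 8)); truth[2, 2] = 1.0; truth[2, 4] = 1.0
pred = np.zeros((8, 8)); pred[3, 3] = 2.0
dist_cells = float(np.linalg.norm(barycenter(pred) - barycenter(truth)))
```

Here the truth barycenter sits at (2, 3) and the predicted one at (3, 3), one cell apart.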

Figure 5 shows the neutral energy residual distributions for the pPflow algorithm and all NN models. The cells considered to compute the residual are those pertaining to the initial topocluster. The distributions are fitted with a sum of two Gaussian distributions, to quantify the main gaussian resolution and the non-gaussian tails. For all the NN models and all energy ranges, the non-gaussian tails are below 2%. The estimated energy resolutions of the ML algorithms show an improvement in excess of a factor of 4 compared to the pPflow in the lowest energy range of 2–5 GeV. The relative improvement progressively decreases to reach approximately a factor of 2 in the highest energy range of 15–20 GeV.

For the pPflow algorithm a bias in the mean, ranging from 20% in the low energy range to 5% in the highest energy range, is observed. It is found to originate from the parametrisation of the charged particle energy deposit predictions, which is derived from a pure sample of \(\pi ^{+}\). The size and energy of the topoclusters in isolated \(\pi ^{+}\) events are systematically lower compared to the evaluation sample, which features a large \(\pi ^{0}\) and \(\pi ^{+}\) overlap. As the energy of the \(\pi ^{+}\) increases, the size of the topoclusters in both the isolated and overlapped topologies becomes comparable, thus reducing the bias in the high energy ranges. In contrast, for all NN methods and most energy ranges, the predictions yield a precise average value. In the low energy range, the UNet and CNN algorithms show small biases of approximately 4% and 2%, respectively. These small biases are not observed for the GNN and DeepSet algorithms. The GNN and DeepSet methods also perform up to approximately 20% better in terms of neutral energy response resolution. While these differences could be due to the tuning of the models, or to specific trainings that could benefit from different training strategies or larger training datasets, they are sufficiently significant to be emphasised.

Fig. 5
figure 5

The neutral energy relative residual \((E_{predicted}-E_{Neutral})/(E_{Neutral})\) distributions of the pPflow and the different NN algorithms for the different energy ranges described in the text. The distributions are fit with a sum of two Gaussians to capture the non-gaussian tails. The values for the central Gaussians are shown in the plots. In order to compare with the pPflow performance, all the residuals are computed from the topocluster closest to the track

Fig. 6
figure 6

The distance computed in number of cells between the barycenter of the predicted and truth neutral energy in the ECAL2 layer when using the pPflow or the NN algorithms for different energy ranges described in the text. The equivalent distributions in different calorimeter layers show a similar behaviour

Figure 6 illustrates the distance, in number of cells, between the barycenter of the truth neutral energy and the predicted neutral energy within the topocluster for the different algorithms in the ECAL2 layer. The NN models provide an accurate estimate of the barycenter, outperforming the pPflow results. While the RMS of the pPflow predictions is approximately constant as a function of the energy, the NN predictions improve in accuracy as the pion energy increases. The main reason for the improvement is that the pPflow energy subtraction is performed within rings around the extrapolated track position and therefore gives a very approximate estimate of the precise topology of the neutral energy deposit, in contrast with the NN algorithms. Improvements of a factor of 4 or more in the spatial resolution are observed for the ML algorithms with respect to the pPflow algorithm.

In the next section, super-resolution models used to further improve the spatial resolution of the calorimeter system will be discussed.

7 Results for super-resolution

In HEP experiments very large samples, providing a precise description of the development of electromagnetic and hadronic showers, can be simulated.

Precise calorimeter images with arbitrarily high granularity can thus be produced, providing high-resolution images that can be used as targets in the training of NN algorithms. We use such a dataset, pairing low-granularity input data with high-granularity truth information, to quantitatively establish a proof of principle for an NN-based super-resolution method.

The models used in Sect. 6 to predict the fraction of cell energy pertaining to neutral hadrons, in some cases supplemented with a broadcasting layer to perform changes in granularity, can also be used to predict calorimeter images with augmented granularity. To efficiently showcase the ability of these methods, the LG detector configuration is used as a baseline; the HG detector is then used as the target of the super-resolution algorithms. To avoid overlapping photons in the high-resolution image, only the sample featuring 2–5 GeV pions was used for these studies.
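The granularity change performed by such a broadcasting layer can be sketched as the repetition of each LG cell into a block of HG super cells, which subsequent network layers then refine. This is an illustrative numpy sketch of the 8×8 to 32×32 mapping; the even splitting of the cell energy across super cells is an assumption made here for illustration, not a statement of the model's internals:

```python
import numpy as np

def broadcast_upsample(image, factor=4):
    """Repeat every LG cell into a factor x factor block of HG super cells,
    dividing the energy so that the total energy is preserved."""
    up = np.repeat(np.repeat(image, factor, axis=0), factor, axis=1)
    return up / factor**2

lg = np.random.default_rng(1).random((8, 8))   # toy low-granularity layer
hg = broadcast_upsample(lg)
print(hg.shape)                                # (32, 32)
print(np.isclose(hg.sum(), lg.sum()))          # total energy preserved
```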

Fig. 7

An event display of the total shower energy (within the topocluster) as captured by a calorimeter layer of \(8 \times 8\) granularity, along with the location of the track, denoted by a red cross (left), and the same shower as captured by a calorimeter layer of \(32 \times 32\) granularity (middle). The bottom right panel shows the corresponding event as predicted by the NN. The figure shows that the shower originating from a \(\pi ^{0} \rightarrow \gamma \gamma \) decay is resolved by a \(32 \times 32\) granularity layer

An example event is displayed in Fig. 7, illustrating the truth neutral energy in the second ECAL layer for the LG and the HG detector configurations, along with the high-resolution prediction of the Graph+UNet algorithm. This example illustrates the ability of the NN not only to subtract the correct amount of energy from the charged particle, but also to provide an accurate, higher-granularity prediction of the energy deposited by the two photons, which are clearly separated and nicely reproduce the underlying true pattern of energy deposition.

To quantify the performance of the super-resolution algorithms, the distance of the super-resolution energy barycenter to the center of the initial-granularity cell is calculated for both the true energy deposits and the prediction, as illustrated in Fig. 8. The relative difference between these distances (\(\varDelta \text {R}\)) is also shown in Fig. 8 for all methods, showing that the super-resolution methods reproduce the correct barycenter distance with good precision. The different NN models show similar results.

Fig. 8

(left) Schematic representation of three cells of the LG detector and the respective high-resolution super cells. The truth and predicted neutral energy components in the high-resolution image are outlined in green and blue, respectively. The radial distance (\(\varDelta \text {R}\)) between the barycenter of the neutral energy distribution of the super cells and the center of the corresponding standard cell is outlined in blue. (right) Relative residual distributions of the radial distance for the different NN architectures

Fig. 9

(left) Average cell truth neutral energy normalised to the total energy in the second layer of the HG calorimeter, where the calorimeter images have been systematically centred on the highest energy cell. (middle) From this 2-D distribution, the average energy profile as a function of radial distance, \(\rho _{E}(r)\), is computed in rings (of surface \(S(r)\)) with a width of one cell. (right) Calibrated reconstructed mass distribution from the neutral calorimeter energy: the LG case in blue, the truth HG case in red, and the super-resolution predicted HG image in green and orange. The green curve is the prediction in the presence of overlapping charged and neutral energy along with noise. The orange curve, which closely follows the truth distribution, shows the prediction when training and evaluation are carried out on a sample consisting of neutral energy and noise only; this demonstrates that the degradation originates from the presence of the overlapping charged shower. All predictions are computed with the GNN algorithm. The other approaches considered herein give similar results

To illustrate the performance of the super-resolution methods in producing an augmented-granularity calorimeter image, an average image is produced from a large sample in which all images are centred on the highest energy cell, as illustrated in the left panel of Fig. 9. The appearance of a circular secondary energy ring around the central highest energy spot denotes the presence of a second photon from a neutral pion decay. The same figure (middle panel) shows that the integrated average relative energy profile \(\rho _{E}(r)\) obtained from the low-resolution image (blue line) cannot capture the secondary peak, while the super-resolution prediction (green) can. The secondary peak location from the prediction coincides with the position of the peak in the true neutral particle energy distribution (red); however, the predicted energy distribution slightly underestimates the primary peak and overestimates the region between the two peaks. This would result in a degradation of the discrimination between a photon and a \(\pi ^0\) at reconstruction level. To further check the origin of this degradation, a super-resolution GNN is trained on a sample without an overlapping charged pion. In this case the super-resolution prediction (orange) accurately reproduces the expected truth energy density distribution (red). The degradation is relatively minor given the large overlap between the charged and neutral pions imposed in the nominal low energy (2–5 GeV) sample chosen here, where the angular distance between the charged and neutral pions is typically smaller than the opening angle of the two photons from the \(\pi ^0\) decay.
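The radial profile construction described above can be sketched as follows: starting from an image centred on the highest-energy cell, the energy in rings one cell wide is summed and divided by the ring surface \(S(r)\). This is a hedged illustration; the toy image and binning choices are placeholders:

```python
import numpy as np

def radial_profile(image, ring_width=1.0):
    """Average energy density rho_E(r): ring energy divided by ring
    surface, with rings one cell wide centred on the hottest cell."""
    center = np.unravel_index(np.argmax(image), image.shape)
    rows, cols = np.indices(image.shape)
    r = np.hypot(rows - center[0], cols - center[1])
    edges = np.arange(0.0, r.max() + ring_width, ring_width)
    profile = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        ring = (r >= lo) & (r < hi)
        surface = np.pi * (hi**2 - lo**2)      # S(r) of the ring
        profile.append(image[ring].sum() / surface)
    return edges, np.array(profile)

# Toy image: a bright central spot plus a fainter deposit 6 cells away,
# mimicking the second photon of a pi0 decay
img = np.zeros((32, 32))
img[16, 16] = 10.0
img[16, 22] = 3.0
edges, rho = radial_profile(img)
print(np.argmax(rho))  # 0: the first ring contains the hottest cell
```

A secondary local maximum in `rho` away from the centre is the signature of the second photon, as in the middle panel of Fig. 9.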

Another illustration of the ability of the high-resolution layers to resolve the two photon clusters is obtained by reconstructing the invariant mass of the \(\pi ^{0}\) from the energies of the two photons. The individual photon energies and directions are estimated using a k-means algorithm applied to the predicted neutral energy, with the number of clusters set to two (\(k=2\)). To better illustrate the impact on the mass resolution, the reconstructed photon energies are calibrated so as to yield the \(\pi ^0\) mass in all cases. The rightmost panel in Fig. 9 shows a comparison between the reconstructed mass distributions for the low- and high-resolution layers, and illustrates how the super-resolution GNN predictions are able to produce a mass peak with a resolution close to that of the native high-resolution images.
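The two-photon reconstruction described above can be sketched with an energy-weighted k-means (\(k=2\)) on the predicted neutral image, followed by the diphoton invariant mass \(m=\sqrt{2E_1E_2(1-\cos \theta )}\). The cell-to-angle conversion (`cell_angle`), the toy deposits, and the choice of scikit-learn are illustrative assumptions, not the paper's exact implementation:

```python
import numpy as np
from sklearn.cluster import KMeans

def diphoton_mass(image, cell_angle=0.01):
    """Split a predicted neutral-energy image into two photon clusters
    with energy-weighted k-means (k=2) and compute the diphoton mass.
    cell_angle: assumed angular size of one cell in radians (illustrative)."""
    rows, cols = np.nonzero(image)
    X = np.column_stack([rows, cols]).astype(float)
    w = image[rows, cols]
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(
        X, sample_weight=w)
    energies, positions = [], []
    for k in (0, 1):
        sel = labels == k
        energies.append(w[sel].sum())
        positions.append((X[sel] * w[sel, None]).sum(axis=0) / w[sel].sum())
    # Opening angle between the two photon directions (small-angle approx.)
    theta = cell_angle * np.linalg.norm(positions[0] - positions[1])
    e1, e2 = energies
    return np.sqrt(2.0 * e1 * e2 * (1.0 - np.cos(theta)))

img = np.zeros((32, 32))
img[14, 12] = 2.0   # photon 1
img[18, 20] = 1.5   # photon 2
m = diphoton_mass(img)
print(f"reconstructed mass = {m:.4f} (arbitrary units)")
```

With well-separated deposits, as in the HG image, the two clusters are found reliably; when the deposits merge into one blob, as in the LG case, the two k-means centroids become arbitrary and the mass spectrum washes out.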

For the HG calorimeter, the spatial resolution of the shower allows two distinct photon energy clusters to be captured by the k-means algorithm. For the LG detector, the spatial resolution is insufficient, and the k-means algorithm fails to identify two distinct peaks. Hence, when an invariant mass spectrum is constructed from the two reconstructed clusters, the HG detector yields a well-reconstructed mass peak, whereas a relatively flat distribution is obtained from the LG shower.

8 Conclusion and outlook

Particle Flow reconstruction plays an important role in high energy particle collider experiments and is being considered in the design of future experiments. A key component of particle flow reconstruction is the ability to distinguish neutral from charged energy deposits in the calorimeters. In this paper, a computer vision approach to this task, based on calorimeter images, is proposed. This approach explores the ability of deep learning techniques to produce calorimeter images that optimally separate the energy deposits originating from neutral particles from those originating from charged particles. Several schemes are proposed, based on convolutional layers with the insertion of up-convolution and down-convolution blocks, dynamic graph convolutional networks, and DeepSet networks.

The detailed performance is quantified using a simplified layout of electromagnetic and hadronic calorimeters, focusing on the challenging case of overlapping showers of charged and neutral pions. An improved energy and direction reconstruction of the initial particles is obtained compared to parametric models. Enhanced calorimeter images of the event, with a finer granularity than the apparatus’ native granularity, are also obtained using super-resolution techniques. All the techniques considered yield excellent performance in producing calorimeter images. The GNN and DeepSet approaches, however, appear to provide a slightly better resolution and more stable results over a wide range of energies. These algorithms constitute an improved first step towards a computer vision Particle Flow algorithm, or a powerful intermediate step in the development of precise identification algorithms.