1 Introduction

In modern hadron collider experiments with a large number of simultaneous collisions and a large number of detector cells, reconstructing the properties of individual particles from the information collected by the detectors poses a significant computational challenge. These experiments use, among other sub-detectors, electromagnetic calorimeters that record the energy deposits left by particles and measure in particular the energy of electrons and photons. Generally, energy deposits in different cells of these detectors must be clustered together to reconstruct the energy and position of the initial particle. Such reconstruction is a complex problem due to the high multiplicity of particles and the overlap of showers; in the following, it is investigated by taking the Electromagnetic CALorimeter (ECAL) of the Compact Muon Solenoid (CMS) detector as an example.

The CMS experiment is a general-purpose detector situated at the Large Hadron Collider (LHC) [1] at CERN. Its objectives are to probe the main theoretical framework used in particle physics (the Standard Model) and search for physics beyond it. To do so, it is necessary to detect, identify, and determine the kinematic properties of particles produced by proton-proton collisions at center-of-mass energies reaching up to \(13.6~\textrm{TeV} \).

Numerous physics analyses performed with the data collected by the CMS detector (e.g. relating to the study of the properties of the Higgs boson [2, 3]) require precise reconstruction of photon and electron properties. This is done in a multi-step process, starting from the reconstruction of single energy deposits in the detector and ending with the identification of the particle type and its origin [4].

The kinematic properties of individual photons are evaluated using a traditional geometrical algorithm that clusters the energies they deposit in different cells of the detector. The algorithm used by CMS is described in Sect. 2 and is called PFClustering, PF standing for Particle-Flow [5]. While this method is accurate and efficient, it has limitations:

  1.

    It has limited ability to accurately distinguish two close-by photons, such as photons produced in neutral pion (\(\pi _{0}\)) decays, which can mimic the energy pattern of an isolated \(\gamma \), or decays of potential new particles, for instance, the exotic decay of the Higgs boson \(H \rightarrow AA \rightarrow 4\gamma \) [6], where A is a light scalar or pseudoscalar particle.

  2.

    The algorithm has a high background rate at very low energies (\(< 1\) GeV), related to experimental noise. Such backgrounds are expected to increase as the detector ages, further deteriorating the performance.

In order to overcome these limitations and improve the reconstruction of particle properties, machine learning (ML) algorithms can be considered. They have already been widely applied in physics analyses based on photons and electrons in CMS (e.g. [7, 8]). However, ML techniques have not been used in CMS for energy clustering, nor optimized for multiple overlapping energy deposits originating from several particles, where new challenges for the reconstruction performance emerge.

In this paper, we present a novel ML algorithm, named DeepCluster, to reconstruct energy deposits in electromagnetic calorimeters. This algorithm is based on convolutional and graph neural networks. The developed model significantly outperforms the PFClustering algorithm in terms of energy and position resolution, while also mitigating the limitations of the traditional approach regarding close-by particle identification. We show the different steps required to develop the final algorithm, explaining the reasoning behind the choice of specific methods. Finally, the performance is shown for various particle types (photons, electrons, and neutral pions), and compared to the performance of the PFClustering algorithm.

2 Particle reconstruction in the CMS electromagnetic calorimeter

The method described in this paper has broad applicability across a variety of clustering tasks. The CMS ECAL serves as a critical instrument for the precise detection and identification of electrons and photons, thus acting as the reference case for the proposed methodology. We first provide an overview of this detector’s operational principles and geometric configuration. Then, we delve into the PFClustering algorithm, which is currently employed by the CMS collaboration to reconstruct the properties of electromagnetic deposits. Finally, we discuss the utilized energy correction techniques. This comprehensive overview serves as a foundation for comparison with the novel methodology proposed in this study.

2.1 CMS ECAL

The CMS ECAL is a homogeneous calorimeter consisting of about 75,000 lead tungstate (\(\textrm{PbWO}_{4}\)) scintillating crystals. It is divided into two main parts: the barrel (crystal size: 2.2 \(\times \) 2.2 \(\times \) 23 \(\textrm{cm}^3\)), covering the pseudorapidity region \(|\eta |\) < 1.479, and the endcaps (crystal size: 2.9 \(\times \) 2.9 \(\times \) 23 \(\textrm{cm}^3\)), covering the pseudorapidity region 1.479 < \(|\eta |\) < 3.0. In the barrel, the crystal length corresponds to 25.8 radiation lengths (\(X_0\)) and the crystal transverse size matches the Molière radius of \(\textrm{PbWO}_{4}\). A complete description of the CMS experiment and the ECAL is given in Ref. [1]. Crystals are arranged in a quasi-projective geometry with respect to the center of the interaction region: crystals are tilted by 3 degrees with respect to this point to avoid leakage in mechanical gaps.

The main purpose of the ECAL is the characterization of electromagnetic energy deposits. When a photon or an electron enters the calorimeter, it starts an electromagnetic shower [9], ultimately generating scintillation photons that are detected by photodetectors. This energy shower can be spread among multiple crystals around the entry point. A combination of adjacent-crystal energy deposits is called a cluster. The goal of reconstruction algorithms is to estimate as precisely as possible the energy and position of the primary particle entering the calorimeter from the corresponding cluster properties.

2.2 The PFClustering algorithm

The PFClustering algorithm [5] is designed to ensure high reconstruction efficiency even for low-energy particles. The energy clusters reconstructed with the PFClustering algorithm are called PFClusters and they are formed from the following steps:

  1.

    The energy deposit in each crystal is reconstructed as a hit, if it is above a threshold \(E_\textrm{thr}^\textrm{hit}\).

  2.

    Hits with energies exceeding a certain threshold \(E_\textrm{thr}^\textrm{seed}\) and larger than the energy of adjacent hits (either sharing a side or a corner) are selected as seeds.

  3.

    Each seed is combined with the hits in the eight neighbouring crystals to create a topological cluster.

  4.

    Topological clusters are grown by aggregating all hits with at least a corner or a side in common with a crystal that is already in the cluster.

  5.

    In general, the energy deposited in each crystal of the PFCluster can be caused by more than one particle. If a topological cluster contains multiple seeds (each seed potentially corresponding to a single particle), a PFCluster is attributed to each seed. The fraction \(f_{ji}\) of the energy of each hit j attributed to PFCluster i in the topological cluster is determined from an expectation-maximization iterative algorithm rooted in the Gaussian-mixture model. Its detailed description can be found in Ref. [5].

The position (\(\vec {\mu }_{\textrm{reco}}^i\)) and predicted energy (\(E_\textrm{reco}^{i} \)) of the \(i^{\textrm{th}}\) cluster are evaluated following the formulas:

$$\begin{aligned} E_{\textrm{reco}}^i = \sum _{j=1}^M f_{ji} A_j \end{aligned}$$
(1)
$$\begin{aligned} \vec {\mu }_{\textrm{reco}}^i = \frac{1}{E_{\textrm{reco}}^i}\sum _{j=1}^M f_{ji} A_j \vec {c}_j, \end{aligned}$$
(2)

where M is the total number of hits in the topological cluster, \(A_j\) is the energy deposited in the \(j^{\textrm{th}}\) hit, and \(\vec {c}_j\) is the position of the \(j^{\textrm{th}}\) hit.
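As an illustration, the cluster energy and position of Eqs. (1) and (2) can be computed from the fractions \(f_{ji}\) in a few lines of numpy. The hit energies, positions, and fractions below are hypothetical; in practice the fractions come from the expectation-maximization step described above.

```python
import numpy as np

# Hypothetical topological cluster: M = 3 hits shared by 2 PFClusters.
A = np.array([4.0, 2.0, 1.0])                       # deposited energies A_j [GeV]
c = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]])  # hit positions c_j (crystal units)
f = np.array([[0.9, 0.1],                           # f_ji: fraction of hit j in cluster i
              [0.5, 0.5],
              [0.1, 0.9]])

E_reco = f.T @ A                                    # Eq. (1): energy of each cluster
mu_reco = (f.T * A) @ c / E_reco[:, None]           # Eq. (2): energy-weighted position
```

Since the fractions of each hit sum to one over the clusters, the summed cluster energies recover the total deposited energy.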

The parameters of the PFClustering algorithm used in this paper correspond to the tuning performed for the Run 3 (2022–2025 operation) of the LHC [10]: \(E_\textrm{thr}^\textrm{hit}=E_\textrm{thr}^\textrm{seed}=3\,\sigma _n\), where \(\sigma _n\) is the average noise level in a crystal as discussed in Sect. 3.

2.3 Energy correction

Due to the possibility of energy leakage from the shower and the per-crystal energy thresholds implemented in the reconstruction step, the particle energy evaluated by the PFClustering algorithm tends to be underestimated. To compensate for this, an energy correction is applied to the PFClusters obtained in the preceding section. In the CMS experiment, this correction is carried out using a multivariate technique known as boosted decision trees (BDT) [11] and trained on simulation. A comprehensive list of all inputs to the BDT used in this study with their brief descriptions is presented in Table 1.

Table 1 Input variables to the PFClustering energy regression BDT used in this study

While the reconstruction algorithm implemented in the CMS experiment utilizes a limited subset of variables for energy correction, we have chosen to incorporate additional variables to enhance its performance. Detailed information on the training and optimization of the BDT is available in Appendix A. The correction is applied when comparing the energy resolution of the PFClustering algorithm to that of the DeepCluster algorithm in Sect. 6.

3 Simulation and dataset

To train the DeepCluster model and compare its performance to that of the PFClustering algorithm, a calorimeter simulation, referred to as a toy calorimeter, is created using the Geant4 software [12].

This simulation uses a simple geometry made of a rectangular shape consisting of \(51 \times 51\) crystals, preserving the physical characteristics of the barrel part of ECAL: PbWO\(_4\) crystals with dimensions of \(2.2 \times 2.2 \times 23\) cm\(^3\). In each crystal, the measured energy is derived by randomly smearing the deposited energy as obtained from Geant4 (see Sect. 3.2). In the simulation, we neglect the ECAL-crystal tilt. Furthermore, additional interactions (pile-up) are not considered.

We first create an initial dataset composed of single high-energy \(\gamma \) directed perpendicularly towards the detector surface. Their energies are uniformly distributed within the range of 1–\(100~\textrm{GeV} \). The position at which the particles enter the toy calorimeter is randomly chosen, avoiding the edges of the detector to ensure that the particle deposits most of its energy in the active material of the calorimeter.

We demonstrate the validity of our toy detector by comparing the energy deposit profiles obtained with our simulation to the ones obtained from the nominal CMS simulation, see Appendix B.

3.1 Datasets

In order to train and test the DeepCluster model, we create two separate datasets (using pandas [13] and numpy [14] libraries from Python programming language) from the aforementioned initial dataset:

  • Single-photon dataset.

    Each entry of the dataset consists of one photon. It is used both for training and as a primary check of DeepCluster performance regarding coordinate and energy resolution. It corresponds to the case of isolated particles in the calorimeter.

  • Two-photon dataset.

    Each entry is created by superimposing two different samples from the initial dataset. In this dataset, only particles with positions located within a maximum distance (\(\Delta R = \sqrt{\Delta x^2 + \Delta y^2}\)) of 3 crystals from each other are selected, excluding cases where the two particles enter the calorimeter in the same crystal. This dataset represents two close-by photons in the calorimeter, mimicking the signature of a \(\pi _0\) decay, for instance.
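The pair-selection rule for the two-photon dataset can be sketched as follows. This is a minimal illustration with coordinates expressed in crystal units; the `crystal_index` helper and the example positions are hypothetical.

```python
import numpy as np

def crystal_index(pos):
    """Index of the crystal hit by a particle, for positions in crystal units."""
    return np.floor(pos).astype(int)

def valid_pair(pos1, pos2, max_dr=3.0):
    """Keep a superimposed pair if the two photons are close
    (Delta R = sqrt(dx^2 + dy^2) < max_dr crystals) but do not
    enter the calorimeter through the same crystal."""
    dr = np.hypot(*(pos1 - pos2))
    same_crystal = np.array_equal(crystal_index(pos1), crystal_index(pos2))
    return bool(dr < max_dr) and not same_crystal
```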

In this study, the training (resp. validation) dataset consists of a random mixture of 600 k (resp. 200 k) samples from the single-photon dataset and 300 k (resp. 100 k) samples from the two-photon dataset. The training dataset is used to train the DeepCluster model while the validation dataset is used to select the best model ensuring no overtraining. In addition, we estimate the performance with a test dataset composed of 100 k single photons and 50 k two-photon samples.

Moreover, in order to gain a comprehensive understanding of the model performance across various potential use cases, extra datasets used only for evaluation are created:

  • Electron dataset.

    This dataset is created in a similar way as the photon one. Each electron has an energy randomly chosen from a uniform distribution in the range \([1, 100]~\textrm{GeV} \). Since electrons are primarily reconstructed within the ECAL, it is important to check the algorithm’s performance for these particles.

  • Neutral pion dataset.

    To create this dataset, we simulate \(\pi ^{0}\)s emitted at a distance of \(130~\textrm{cm}\) from the toy calorimeter front face and with energies uniformly distributed in the range \([1, 100]~\textrm{GeV} \). It consists of 180k samples.

3.2 Per-crystal energy

Finally, as the simulation does not include all the steps of the ECAL readout chain, the per-crystal energy is obtained by smearing the true deposited energy. In case of two overlapping samples (e.g. two-photon dataset), the true deposited energies are first summed and then smeared. This provides a realistic simulation of the calorimeter response and energy resolution, including the simulation of the readout-electronic noise.

The ECAL energy resolution is parametrized as follows [15]:

$$\begin{aligned} \left( \frac{\sigma _E}{E}\right) ^2 = \left( \frac{a}{\sqrt{E}}\right) ^2 + \left( \frac{\sigma _n}{E}\right) ^2 + c^2, \end{aligned}$$
(3)

where a, \(\sigma _n\), and c are called respectively stochastic, noise, and constant terms. The energy \(E_\textrm{xtal}\) measured in a crystal xtal is given by:

$$\begin{aligned} E_\textrm{xtal} = E_\textrm{xtal}^\textrm{true} \times \mathcal {N}\left( \mu =1; \sigma =\frac{\sigma _{E_\textrm{xtal}^\textrm{true}}}{ E_\textrm{xtal}^\textrm{true}}\right) , \end{aligned}$$
(4)

where \(\mathcal {N}(\mu ; \sigma )\) is a random number following a Gaussian distribution with mean \(\mu \) and standard deviation \(\sigma \), and \(E_\textrm{xtal}^\textrm{true}\) is the true energy deposited in the crystal xtal obtained from the Geant4 program. The chosen parameters correspond to the ECAL conditions for Run 3 [10, 16]: \(a=0.03\) GeV\(^{\frac{1}{2}}\), \(\sigma _n = 0.167\) GeV, \(c = 0.0035\). A lower cut at 50 MeV is applied on the smeared energy to mitigate the noise.
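A minimal numpy sketch of this smearing procedure, using the quoted Run 3 parameters, is given below. The `smear` helper and its vectorized form are illustrative, not the actual simulation code.

```python
import numpy as np

rng = np.random.default_rng(42)

# Run 3-like resolution parameters quoted in the text
a, sigma_n, c = 0.03, 0.167, 0.0035  # GeV^(1/2), GeV, unitless

def smear(E_true):
    """Apply Eqs. (3)-(4): Gaussian smearing of the true per-crystal
    energy (in GeV), followed by the 50 MeV lower cut."""
    E_true = np.asarray(E_true, dtype=float)
    rel_sigma = np.sqrt((a / np.sqrt(E_true))**2 + (sigma_n / E_true)**2 + c**2)
    E = E_true * rng.normal(loc=1.0, scale=rel_sigma)
    E[E < 0.05] = 0.0  # lower cut at 50 MeV to mitigate noise
    return E
```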

4 DeepCluster model

There has been a surge in the application of machine learning techniques in the field of particle physics over recent years due to the capacity of these algorithms to extract complex patterns. In particular, techniques such as deep neural networks (DNN) and convolutional neural networks (CNN) have enabled unprecedented levels of accuracy and efficiency in tasks such as particle tracking and calorimetry clustering. Graph neural networks (GNN) have also recently gained significant popularity in high-energy physics applications (e.g. [17, 18]) due to the following factors:

  • they can handle sparse data coming from complex detector geometries,

  • they are applicable to non-Euclidean data structures with variable input sizes.

To achieve optimal performance, we combine multiple state-of-the-art machine learning methods in the DeepCluster architecture. First, we use the excellent pattern recognition abilities of CNNs. As energy deposited in the crystals of the calorimeter can be represented as pixel intensities of an image, CNN can be easily and naturally applied to treat calorimeter data. Secondly, we use a GNN to allow information transfer between neighbouring particles.

In the DeepCluster model, the particle-reconstruction task is divided into two consecutive steps:

  1.

    Extract small windows (\(7\times 7\) crystals), named seed windows, whose energy deposits potentially originate from a real particle and not from noise. This is performed by a first NN called the seed-finder NN described in Sect. 4.2.

  2.

    For each seed window predict the kinematic properties of the corresponding generated particle. This is done with a second NN called center-finder NN. In this work, we develop two different approaches for the center-finder NN:

    • The first one, based on a CNN, is described in Sect. 4.3.

    • To circumvent the limitations of this CNN-based center-finder, a second center-finder, using a GNN, is introduced in Sect. 4.4.

With this approach, the networks process only small crystal matrices as opposed to the full calorimeter matrix (\(51\times 51\) crystals). This significantly reduces the required computational power and makes it easy to scale the method to a real detector (the ECAL barrel contains \(170\times 360\) crystals).

Fig. 1 Example of a seed window

4.1 Seed windows definition

The inputs to the DeepCluster model are called seed windows and are obtained for each sample as follows:

  1.

    All the crystals from the toy calorimeter with energy deposits \(E_\textrm{xtal}>0.5~\textrm{GeV} \) are selected and defined as seed crystals. The selected \(0.5~\textrm{GeV} \) threshold value corresponds to \(E_\textrm{thr}^\textrm{seed}\) used in the PFClustering algorithm as tuned for Run 3 operations.

  2.

    For each seed crystal, a seed window is created. This is a matrix of size \(7\times 7\) crystals centered on the seed crystal. An example of a seed window is shown in Fig. 1.

From one simulated sample corresponding to the full toy calorimeter, several seed windows can be created. They originate from a real particle or from noise.
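The seed-window extraction can be sketched as follows. This is illustrative only; zero padding at the detector border is an assumption (generated particles avoid the edges in the datasets above).

```python
import numpy as np

def seed_windows(calo, e_thr=0.5, half=3):
    """Extract 7x7 windows centered on every crystal above e_thr (GeV).

    `calo` is the full energy matrix (e.g. 51x51). Windows are taken
    from a zero-padded copy so that seeds near the border still yield
    a full 7x7 matrix.
    """
    padded = np.pad(calo, half)
    windows, centers = [], []
    for ix, iy in zip(*np.nonzero(calo > e_thr)):
        # crystal (ix, iy) sits at (ix + half, iy + half) in the padded matrix
        windows.append(padded[ix:ix + 2 * half + 1, iy:iy + 2 * half + 1])
        centers.append((ix, iy))
    return np.array(windows), centers
```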

In order to train the different networks, we need to associate the seed windows to their corresponding generated particles and the subsequent truth labeling has to be defined. We first check if the impact position of any generated particle in the considered sample lies within the boundaries of the seed crystal:

  • In such a case, the corresponding particle (at most one by construction) is associated to the seed window. The window is labeled true seed window and assigned three kinematic variables corresponding to the true particle: generated position \((x_\textrm{gen}^{}, y_\textrm{gen}^{})\) and energy \(E_\textrm{gen}^{} \).

  • Otherwise, it is labeled as background.

For true seed windows, the local position of the generated particle inside the seed window (\(x_\textrm{loc}^{} \), \(y_\textrm{loc}^{} \)) is defined as:

$$\begin{aligned} x_\textrm{loc}^{}&= x_\textrm{gen}^{} - x_\textrm{win}^{} \\ y_\textrm{loc}^{}&= y_\textrm{gen}^{} - y_\textrm{win}^{} \end{aligned}$$
(5)

where (\(x_\textrm{win}^{} \), \(y_\textrm{win}^{} \)) corresponds to the position of the seed window center.

4.2 Seed-finder NN

The seed-finder NN is the first network in the DeepCluster model. This is a CNN, whose goal is to select the true seed windows and discard background ones.

The network takes a seed window as input and assigns to it a seed-finder score (\(P_\textrm{seedSF}\)), indicating the likelihood to be a true seed window. The seed windows with \(P_\textrm{seedSF} < P^\textrm{thr}_\textrm{seedSF}\), where \(P^\textrm{thr}_\textrm{seedSF}\) is a tunable threshold, are discarded, the other ones are passed to the center-finder NN.

The seed-finder-NN architecture consists of two convolutional and two dense layers. LeakyReLU is chosen as the activation function [19] and a dropout of 10\(\%\) is applied after the first convolutions. For the output of the seed-finder NN, a sigmoid activation function is used. A detailed view of the model architecture is presented in Fig. 2.

Fig. 2 Seed-finder NN architecture. \(7\times 7\) seed windows are first selected around all possible seeds (\(E_\textrm{xtal}>0.5~\textrm{GeV} \)). They are separately passed as input to the seed-finder NN. The input is processed by two convolutional layers until the vector of summary features is extracted. This vector is further passed to two dense layers, resulting in the network output: the seed-finder score \(P_\textrm{seedSF}\). It represents the likelihood of the input to originate from a generated particle. Detailed information on the number of nodes at each layer is presented in the figure

The model is trained using the Adam optimizer [20] with a learning rate of 0.0001 and a batch size of 64. We use the binary cross entropy as the loss function. The network is trained for \(\approx \)100 epochs and the epoch yielding the best result on the validation dataset is chosen.

Compared to the seeding step used in the PFClustering algorithm, the seed-finder NN provides several advantages:

  • As the condition for the seed to be a local maximum is removed, the seed-finder NN provides a better possibility to reconstruct close-by photons.

  • The seed-finder NN performs a refined seed window selection that helps to significantly eliminate the low-energy background coming from electronic noise.

4.3 Center-finder NN: convolutional neural network

The center-finder NN is the second step of the DeepCluster model. It predicts the position (\(x_\textrm{reco}^{i} \), \(y_\textrm{reco}^{i} \)) and energy \(E_\textrm{reco}^{i} \) of the generated particle associated to the seed window i; the corresponding generated quantities are named (\(x_\textrm{loc}^{i} \), \(y_\textrm{loc}^{i} \)), \(E_\textrm{gen}^{i} \). The global coordinates can be further inferred by inverting Eq. 5.

Similarly to the seed-finder NN, the CNN-based center-finder takes seed windows as input. All the inputs are processed independently from each other. In the training phase, the inputs are limited to the true seed windows. However, for the evaluation, the center-finder NN processes all the seed windows with the seed-finder scores passing \(P_\textrm{seedSF}^\textrm{thr}\).

The architecture of the center-finder NN is close to the one of the seed-finder NN: it consists of multiple convolutional layers, followed by dense layers that are divided into two parts: one resulting in coordinate prediction and another in energy prediction. LeakyReLU activation function is applied everywhere except for the output layer, where \(\tanh \) and sigmoid functions are used respectively for position and energy predictions [19]. The dropout level is set to 10\(\%\) everywhere except for the last energy prediction layer, where it is set to 30\(\%\). The full network architecture with precise details on the number of nodes is presented in Fig. 3.

Fig. 3 Center-finder NN architecture. The seed windows are passed separately as input to the network. Each input is processed by two convolutional layers until the vector of summary features is extracted. This vector is passed through one dense layer and further sent separately to two different branches (coordinate and energy predictions). In each branch, it passes through two additional dense layers. Detailed information on the number of nodes at each layer is presented in the figure

Fig. 4 Distribution of the variable \(x_\textrm{reco}^{}-x_\textrm{loc}^{} \). The results are obtained by applying the PFClustering algorithm and DeepCluster network on the single-photon (left) and two-photon (right) test datasets. The resolutions (see text) are reported in the figures

The network is trained for 1000 epochs with a batch size \(N_b\) of 64 seed windows. We use the mean absolute error loss defined by:

$$\begin{aligned} \mathcal {L}_{\textrm{kin}} = \frac{1}{N_b} \sum _{i=1}^{N_b} \left[ \frac{1}{2} \left( |x_\textrm{reco}^{i} - x_\textrm{loc}^{i}| + |y_\textrm{reco}^{i}-y_\textrm{loc}^{i}| \right) + |E_\textrm{reco}^{i} - E_\textrm{gen}^{i}| \right]. \end{aligned}$$
(6)

The training is performed using the Adam optimizer with a learning rate of 0.0001. The chosen epoch is the one providing the best performance according to the validation dataset.

4.3.1 Results

The results of the DeepCluster model and the PFClustering algorithm are compared with the single- and two-photon test datasets. For the seed-finder NN, we set \(P^\textrm{thr}_\textrm{seedSF} = 0.3\). This latter threshold is tuned for the final DeepCluster model as discussed in Sect. 5.

The performance for the position prediction is presented in Fig. 4 for the single-photon dataset on the left and for the two-photon dataset on the right. Each plot shows the distribution of the difference \(x_\textrm{reco}^{}-x_\textrm{loc}^{} \). Similar results are obtained for the y-coordinate. We associate reconstructed objects to generated objects using a matching procedure described in Appendix C.

The DeepCluster network significantly outperforms the PFClustering algorithm. The coordinate resolution (evaluated as half the interval containing 68% of the distribution and centered on the median) for the DeepCluster is 0.02 crystal compared to 0.04 crystal for the PFClustering algorithm for the single-photon dataset and 0.03 crystal versus 0.08 crystal for the two-photon dataset.
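The resolution estimator quoted above (half of the interval containing 68% of the distribution, centered on the median) can be implemented, for instance, as follows; interpreting "centered on the median" as the interval between the 16% and 84% quantiles is our reading of the definition.

```python
import numpy as np

def resolution(residuals):
    """Half of the interval containing 68% of the distribution,
    taken between the 16% and 84% quantiles so that it is centered
    on the median in probability."""
    q16, q84 = np.quantile(residuals, [0.16, 0.84])
    return 0.5 * (q84 - q16)

# For a unit Gaussian the estimator returns approximately one standard deviation.
rng = np.random.default_rng(1)
r = resolution(rng.normal(0.0, 1.0, 200_000))
```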

The position reconstruction for the two-photon dataset is a more difficult task than for the single-photon one because of overlapping energy clusters for close-by generated particles. This explains the performance degradation in the two-photon dataset.

Concerning the energy prediction, the performance of the PFClustering and DeepCluster algorithms is closer. They are presented for the optimized DeepCluster algorithm in Sect. 6.

4.3.2 Per-event energy overestimation

Because the network processes only seed windows, this approach can be easily extended to a real calorimeter. However, it also raises a major issue described in this section. Contrary to the PFClustering, the local maximum condition is omitted for the seed crystal in the DeepCluster. As a consequence, two (or more) neighbouring crystals can be selected as seeds. While this allows for efficient reconstruction of close-by particles, this can also create two or more separate seed windows corresponding to a single generated particle, which would share energy among several crystals. For the majority of these latter cases, the seed-finder NN is able to identify the local maximum itself and predict a high seed-finder score \(P_\textrm{seedSF}\) for the corresponding seed window (in which the local maximum is the central crystal) and a low score for its neighbour window.

However, when the position of the generated particle is close to the edge of a crystal, its energy deposit can be very similar in the two neighbouring crystals. Both of them get a high \(P_\textrm{seedSF}\), giving rise to two distinct seed windows. These windows are separately passed to the center-finder NN that predicts similar coordinates and energies for both of them. The same generated particle can therefore be reconstructed twice by the DeepCluster algorithm, thus overestimating the total energy reconstructed in the event. In the following, this is referred to as energy double counting.

Fig. 5 Distributions of the ratio \(R_\textrm{en}\) between the total reconstructed energy \(E_\textrm{reco}^\textrm{tot}\) and the total generated energy \(E_\textrm{gen}^\textrm{tot}\). The results are obtained with the PFClustering algorithm, the DeepCluster model with a CNN-based center-finder (CF CNN), and with a GNN-based center-finder (CF GNN). The distributions are obtained from the single-photon test dataset. A second peak at \(R_\textrm{en}\approx 2\) arises for the CF CNN, while it is eliminated with the CF GNN

Fig. 6 Flow chart of the DeepCluster model. \(7\times 7\) seed windows are first selected around all possible seeds (>0.5 GeV) in the event. They are separately passed as input to the seed-finder NN, which predicts \(P_\textrm{seedSF}\) for each seed window. Selected seed windows with \(P_\textrm{seedSF}\) > \(P_\textrm{seedSF}^\textrm{thr}\) are combined into groups of 4 with their neighbours and passed to the center-finder NN, which predicts the coordinates \(x_\textrm{reco}^{} \), \(y_\textrm{reco}^{} \), the energy \(E_\textrm{reco}^{} \), and a new seed score \(P_\textrm{seedCF}\) for each seed window

This effect is illustrated in Fig. 5, which presents the distribution of the ratio \(R_\textrm{en} = E_\textrm{reco}^{\textrm{tot}} / E_\textrm{gen}^{\textrm{tot}} \), where \(E_\textrm{reco}^{\textrm{tot}} \) is the total reconstructed energy (sum of the energies of all the reconstructed particles in the event) and \(E_\textrm{gen}^{\textrm{tot}} \) is the total generated energy (sum of the energies of all the generated particles in the event). The distribution of \(R_\textrm{en}\) is obtained from the single-photon dataset. The additional peak around two observed for the CNN-based center-finder (CF CNN in the figure) originates from the energy double counting.

The double counting issue is cured by changing the center-finder NN architecture as presented in the next section.

4.4 Center-finder NN: graph neural network

The energy double counting is due to the fact that the CNN-based DeepCluster model does not receive information from the rest of the event; as a consequence, it is not aware of the existence of several seed windows corresponding to the same generated particle.

To solve the issue, we pass several neighbouring seed windows as input to the network. This is achieved using a GNN architecture for the center-finder NN while the seed-finder NN remains unchanged. In addition, the GNN implementation provides message-passing capabilities [21], enabling information sharing between the different seed windows.

The inputs to the GNN-based center-finder are constructed by first ordering the list of seed windows in the event based on the energy \(E_\textrm{seed}\) of their center crystal. This list is then processed as follows:

  1.

    A center-finder input is initialized from the window \(w_\textrm{ref}\) with the highest energy \(E_\textrm{seed}\) in the list. At this step, the input shape is \(1\times 7\times 7\).

  2.

    For each remaining seed window \(w_\textrm{alt}\) in the list, the \(\Delta R\) distance between \(w_\textrm{ref}\) and \(w_\textrm{alt}\) is computed. If \(\Delta R < 3\), \(w_\textrm{alt}\) is added to the input. After this step, the input shape is \(N_\textrm{w}\times 7\times 7\) where \(N_\textrm{w}\) is the total number of selected seed windows (including \(w_\textrm{ref}\)).

  3.

    All of the seed windows included in the input are removed from the list of seed windows. The process is iterated until this list is empty.

In this work, \(N_\textrm{w}\) is chosen to be at most 4. This maximum can be easily adjusted to higher values as well. For each input, if \(N_\textrm{w} < 4\), it is completed with \(7\times 7\) empty windows, i.e. with null crystal energies. In such a way, the input shape is fixed to \(4\times 7\times 7\).
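The grouping procedure above can be sketched as follows. This is an illustrative greedy implementation under our reading of the text (in particular, capping each group at `max_n` windows and the exact tie-breaking are assumptions); `group_windows` and its arguments are hypothetical names.

```python
import numpy as np

def group_windows(windows, centers, max_n=4, max_dr=3.0, shape=(7, 7)):
    """Greedily build fixed-size center-finder inputs from seed windows.

    `windows` is a list of 7x7 arrays and `centers` the (x, y) positions of
    their seed crystals. Windows are sorted by seed-crystal energy; each
    input starts from the most energetic remaining window and absorbs
    neighbours with Delta R < max_dr, up to max_n windows, then is padded
    with empty (all-zero) windows to a fixed 4x7x7 shape.
    """
    remaining = sorted(range(len(windows)),
                       key=lambda i: windows[i][shape[0] // 2, shape[1] // 2],
                       reverse=True)
    inputs = []
    while remaining:
        ref = remaining.pop(0)
        group = [ref]
        for i in remaining[:]:
            if len(group) == max_n:
                break
            dr = np.hypot(centers[ref][0] - centers[i][0],
                          centers[ref][1] - centers[i][1])
            if dr < max_dr:
                group.append(i)
                remaining.remove(i)
        block = [windows[i] for i in group]
        block += [np.zeros(shape)] * (max_n - len(block))  # pad with empty windows
        inputs.append(np.stack(block))
    return np.array(inputs)
```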

The 4 seed windows of the input represent the nodes of the graph. In the center-finder NN, each seed window j is first separately processed by a chain of convolutional layers (identical to the CNN center-finder implementation) in order to extract the vector of summary features \(v_j\). The message-passing is implemented as the concatenation of these vectors. It results in 4 updated vectors \(\bar{v}_j\), each of them containing information about their neighbours:

$$\begin{aligned} \bar{v}_1&= \begin{pmatrix} v_1 \\ v_2 \\ v_3 \\ v_4 \end{pmatrix},&\bar{v}_2&= \begin{pmatrix} v_2 \\ v_1 \\ v_3 \\ v_4 \end{pmatrix}, \\ \bar{v}_3&= \begin{pmatrix} v_3 \\ v_1 \\ v_2 \\ v_4 \end{pmatrix},&\bar{v}_4&= \begin{pmatrix} v_4 \\ v_1 \\ v_2 \\ v_3 \end{pmatrix} \end{aligned}$$
(7)
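The concatenation update of Eq. 7 can be sketched as follows. This is a minimal NumPy illustration; the actual model operates on framework tensors inside the network graph:

```python
import numpy as np

def message_pass(v):
    """Concatenation-style message passing of Eq. (7).

    v: (4, d) array of per-window summary vectors v_j.  Returns a
    (4, 4*d) array whose row j is v_j followed by the other three
    vectors in index order.
    """
    n, d = v.shape
    out = []
    for j in range(n):
        others = [v[k] for k in range(n) if k != j]  # neighbours of node j
        out.append(np.concatenate([v[j]] + others))  # own vector first
    return np.stack(out)
```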

The combined vectors \(\bar{v}_j\) are then passed independently through a set of dense layers until the final output is extracted. The GNN-based center-finder NN predicts 4 values for each seed window j in the input i: the kinematic variables (\(x_\textrm{reco}^{ij} \), \(y_\textrm{reco}^{ij} \)), the energy \(E_\textrm{reco}^{ij} \), and a new seed score \(P_\textrm{seedCF}^{ij}\), indicating the likelihood of being associated to a true generated particle. In this version of the DeepCluster model, the seed-finder NN serves as an initial filter, separating signal from background, while the GNN-based center-finder corrects for the wrong predictions related to double counting: eventually, only objects with \(P_\textrm{seedCF} >P_\textrm{seedCF}^\textrm{thr}\) are selected. The optimization of \(P_\textrm{seedCF}^\textrm{thr}\) is presented in Sect. 5. The GNN-based center-finder architecture is presented in Fig. 6. The loss used in the training is the sum of three terms related to the coordinates, the energy, and the seed-probability predictions. The first two terms are based on a mean absolute error loss, while the last one is based on a focal cross-entropy loss [22]. The combined loss is computed as:

$$\begin{aligned} \mathcal {L}_\textrm{pos}&= \frac{1}{4\cdot N_b} \sum _{i=1}^{N_b}\sum _{j=1}^4 \frac{1}{2}\nonumber \\&\quad \times \left( |x_\textrm{reco}^{ij} - x_\textrm{loc}^{ij} |+ |y_\textrm{reco}^{ij}-y_\textrm{loc}^{ij} |\right) \nonumber \\ \mathcal {L}_\textrm{en }&= \frac{1}{4\cdot N_b} \sum _{i=1}^{N_b}\sum _{j=1}^4 |E_\textrm{reco}^{ij} - E_\textrm{gen}^{ij} |\nonumber \\ \mathcal {L}_\textrm{seed}&= -\frac{1}{4\cdot N_b} \sum _{i=1}^{N_b}\sum _{j=1}^{4}\alpha (1-P^{ij}_\textrm{seedCF})^\gamma \log (P^{ij}_\textrm{seedCF}), \nonumber \\ \mathcal {L}_\textrm{tot}&= \mathcal {L}_\textrm{pos} + \mathcal {L}_\textrm{en} + k_\textrm{s} \cdot \mathcal {L}_\textrm{seed}, \end{aligned}$$
(8)

where \(N_b\) is the number of batches, the focal loss parameters are chosen to be \(\gamma =2\) and \(\alpha =0.25\), and \(k_\textrm{s}\) is an adjustable parameter associated with the seed loss, referred to as the seed-loss weight.
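Eq. 8 can be sketched in NumPy for arrays of shape \((N_b, 4)\); the small epsilon guarding the logarithm is an implementation detail assumed here, not specified in the text:

```python
import numpy as np

def deepcluster_loss(x_reco, y_reco, e_reco, p_seed,
                     x_loc, y_loc, e_gen,
                     alpha=0.25, gamma=2.0, k_s=1.0):
    """Combined loss of Eq. (8) for arrays of shape (N_b, 4).

    A NumPy sketch for illustration; the actual training uses the
    framework's tensor operations.  .mean() over (N_b, 4) reproduces
    the 1/(4*N_b) normalization of the sums in Eq. (8).
    """
    eps = 1e-7                                   # guards log(p) for p -> 0
    l_pos = 0.5 * (np.abs(x_reco - x_loc) + np.abs(y_reco - y_loc)).mean()
    l_en = np.abs(e_reco - e_gen).mean()
    l_seed = (alpha * (1.0 - p_seed) ** gamma
              * -np.log(np.clip(p_seed, eps, 1.0))).mean()
    return l_pos + l_en + k_s * l_seed
```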

4.4.1 Results

In the GNN implementation of the center-finder NN, the model receives information about all the neighbouring seed windows simultaneously. With this adjustment, the network is able to make a more informed decision for each seed window and additionally better attribute energy fractions for different windows.

The related improvement is shown in Fig. 5, where the distribution of \(R_\textrm{en}\) obtained with the single-photon dataset is presented. For the center-finder GNN (CF GNN), only the seed windows with \(P_\textrm{seedCF} > 0.4\) are selected. One can notice the disappearance of the peak at 2, signalling that the double-counting issue is resolved.

5 Network optimization

The DeepCluster model has a number of adjustable parameters. This section describes their optimization to achieve the best possible performance. It starts with the optimization of the seed-loss weight, followed by the optimization of the seed-probability thresholds \(P_\textrm{seedSF}^\textrm{thr}\) and \(P_\textrm{seedCF}^\textrm{thr}\). Then, an additional step is added to the algorithm in order to suppress multiple predicted particles arising from the same generated one. Finally, the adjustment of the hyperparameters underlying the different NNs is presented.

5.1 Seed-loss weight

The loss of the GNN-based center-finder contains a term related to the predicted seed probability, weighted by the parameter \(k_\textrm{s}\) introduced in Eq. 8. The optimal value of \(k_\textrm{s}\) is obtained by comparing the performance achieved with different seed-loss weights. The evolution of the losses for the training and validation datasets is shown in Appendix D for different values of \(k_\textrm{s}\).

5.2 Seed-probability thresholds

In the final DeepCluster model implementation, the seed-finder NN and the center-finder NN predict two different seed scores: \(P_\textrm{seedSF}\) and \(P_\textrm{seedCF}\). Only the seed windows fulfilling both criteria \(P_\textrm{seedSF}>P_\textrm{seedSF}^\textrm{thr}\) and \(P_\textrm{seedCF}>P_\textrm{seedCF}^\textrm{thr}\) are kept for further analysis; the others are discarded as background windows. The energy and position resolutions, as well as the signal efficiencies and background rates, are studied on the single-photon test sample for different sets of thresholds. We retain the values \(P_\textrm{seedSF}^\textrm{thr}=0.3\) and \(P_\textrm{seedCF}^\textrm{thr}=0.4\) as they ensure excellent performance in terms of position and energy resolutions (\(\sigma _x\sim 0.02\) crystal and \(\sigma _E\sim 0.56~\textrm{GeV} \)) while maintaining a high signal efficiency (\(\epsilon \sim 99.5~\%\)).
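The two-threshold selection amounts to a simple filter; the dictionary field names below are hypothetical, chosen only for this sketch:

```python
def select_objects(objects, thr_sf=0.3, thr_cf=0.4):
    """Keep reconstructed objects passing both seed-score thresholds.

    objects: iterable of dicts with (illustrative) keys 'p_sf' and
    'p_cf'; thresholds default to the retained working point.
    """
    return [o for o in objects
            if o["p_sf"] > thr_sf and o["p_cf"] > thr_cf]
```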

5.3 Particle splitting

One of the advantages of the DeepCluster model resides in its ability to reconstruct close-by particles. Still, for both the DeepCluster and the PFClustering algorithms, the closer the particles are, the harder it is to disentangle them in the reconstruction process. There is a limit to the distance between two particles below which the reconstruction is not reliable.

Fig. 7 Distributions of \(\Delta R\) between two properly reconstructed objects associated to two generated particles (labeled close-by objects), obtained from the two-photon dataset, and of \(\Delta R\) between two close-by reconstructed objects associated to the same particle in the single-photon dataset (labeled split objects)

This limit is explored in Fig. 7 by comparing the distance between two properly reconstructed objects (i.e. associated to two generated particles) to the distance between two reconstructed objects associated to the same particle. The latter case corresponds to a generated particle giving rise to two very close-by clusters, splitting the energy of the original particle into parts; this is referred to as particle splitting. From this figure, one can see that for \(\Delta R <0.3\), the reconstruction of two close-by clusters mostly corresponds to the splitting of a single particle (given that the number of photons in the single- and two-particle datasets is the same). Conversely, for \(\Delta R>0.3\), close-by particles are properly reconstructed and single-particle splitting is negligible.

In order to mitigate single-particle splitting while maintaining high signal efficiency for two close-by photons, a dedicated procedure is implemented. We first group reconstructed objects with \(\Delta R <0.3\). As mentioned above, these groups essentially contain particle-splitting clusters. In each group, the highest-energy object is kept while the others are discarded. The seed window associated to the selected object is passed a second time through the center-finder NN. This second pass makes it possible to keep a single reconstructed object for a single particle while properly predicting its energy, thus avoiding the particle-splitting phenomenon.
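A greedy sketch of this grouping, assuming each object carries an energy and a position and that the survivor of each group is flagged for the second center-finder pass; the field names and the greedy one-pass strategy are illustrative simplifications:

```python
import numpy as np

def merge_split_objects(objects, dr_max=0.3):
    """Suppress particle splitting: objects closer than dr_max to an
    already-kept, higher-energy object are discarded, and the survivor
    is flagged for a second center-finder pass.
    """
    objects = sorted(objects, key=lambda o: -o["energy"])
    kept = []
    for obj in objects:
        # Find a kept (hence higher-energy) object within dr_max, if any.
        leader = next((k for k in kept
                       if np.hypot(obj["pos"][0] - k["pos"][0],
                                   obj["pos"][1] - k["pos"][1]) < dr_max),
                      None)
        if leader is None:
            obj["rerun"] = False
            kept.append(obj)
        else:
            leader["rerun"] = True  # absorbed a split object -> second CF pass
    return kept
```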

5.4 Hyperparameters

The tuning of the hyperparameters of the center-finder network is performed using Bayesian optimization [23]. The optimal parameters are presented in Table 2. The final DeepCluster model is trained using the LAMB optimizer [24]. The epoch achieving the lowest loss on the validation dataset is selected.

Table 2 Final values for the DeepCluster model after the Bayesian hyperparameter optimization was performed for center-finder NN

6 DeepCluster performance

This section presents the performance of the optimized DeepCluster model and compares it to the one obtained with the PFClustering algorithm. The energy and position resolutions are evaluated as half the interval containing 68% of the distribution and centered on the median. In addition to the position and energy resolutions, other important metrics are investigated: signal efficiency, background yield, and particle-splitting yield, as defined in Table 3. The matching procedure linking the reconstructed objects with the true generated particles to determine if reconstructed objects are to be considered as signal or background is described in Appendix C. Each of the reconstructed objects is linked to at most one generated particle, while the generated particle can be linked to multiple reconstructed objects. In the latter case, the objects are tagged as particle-splitting. The same matching procedure is applied for the objects reconstructed by the DeepCluster model and PFClustering algorithm.
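The resolution metric used throughout this section (half the width of the smallest median-centered interval containing 68% of the distribution) can be computed as the 68th percentile of the absolute deviations from the median:

```python
import numpy as np

def resolution(residuals):
    """Half-width of the median-centered interval containing 68% of
    the residual distribution, as used for the energy and position
    resolutions in this section."""
    r = np.asarray(residuals, dtype=float)
    return np.percentile(np.abs(r - np.median(r)), 68.0)
```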

Table 3 Description of the variables used for evaluating the performance of the algorithms

First, the DeepCluster model is tested with photon datasets that are comparable to the ones used for training. In a second step, the DeepCluster reconstruction is tested with electrons and finally, the algorithm is applied to the reconstruction of \(\pi ^0\)-meson decays producing collimated photons in the detector.

6.1 Performance for photons

The performance for single-photon and two-photon test datasets is presented in this section.

Figure 8 shows the distributions of the difference between the reconstructed and generated values for position (left) and energy (right) in the single-photon dataset. The position resolution is improved by 50% relative to the PFClustering algorithm, while the energy resolution is improved by about 10%. The improvement is more pronounced for the two-photon dataset, where both the position and energy resolutions are improved by about 60%.

Fig. 8 Left: distribution of the difference between the reconstructed position \(x_\textrm{reco}\) and the generated position of the particle \(x_\textrm{loc}\). Right: distribution of the difference between the reconstructed energy \(E_\textrm{reco}\) and the generated energy of the particle \(E_\textrm{gen}\). The results are obtained by applying the PFClustering algorithm and the DeepCluster model on the single-photon test dataset

This is illustrated in Figs. 9 and 10, where the standard deviation (left) and median (right) of the distributions of the difference between the reconstructed and generated values for energy and position are presented as a function of the generated energy of the particle \(E_\textrm{gen}\), for the single- and two-photon datasets. For the energy, the relative resolution is shown. In the two-photon case, the resolutions for energies above 20 \(\textrm{GeV} \) are improved by more than 70% for the energy and 60% for the position. The bias observed in the median of the distributions for the DeepCluster model is small compared to the corresponding resolution. It can either be removed by further optimizing the parameters of the network or be corrected for.

The poor performance of the PFClustering algorithm on the two-photon dataset has two causes. Firstly, the BDT used for the energy correction is trained only on a single-photon dataset (as is done in the current CMS implementation). Secondly, at higher energies, two separate clusters are more likely to be misreconstructed as a single particle by the PFClustering algorithm, largely overestimating the energy.

Fig. 9 Relative energy resolution (left) and energy median (right) obtained with the DeepCluster model and PFClustering algorithm applied on the single- and two-photon test datasets. The results are shown in bins of generated energy \(E_\textrm{gen}\)

Fig. 10 Position resolution (left) and position median (right) obtained with the DeepCluster model and PFClustering algorithm applied on the single- and two-photon test datasets. The results are shown in bins of generated energy \(E_\textrm{gen}\)

Fig. 11 Signal efficiency (top), splitting yield for 100k photons (middle), and background yield for 100k toy simulations (bottom). The results are obtained by applying the PFClustering algorithm and the DeepCluster model on the single-photon (left) and two-photon (right) test datasets

Table 4 Performance comparison for position and energy resolutions, signal efficiency, splitting yield for 100k photons, and background yield for 100k toy simulation between PFClustering and DeepCluster algorithms for single- and two-photon datasets
Fig. 12 \(m_{\gamma \gamma }\) mass distributions reconstructed with DeepCluster model and PFClustering algorithm on the \(\pi _0\) dataset. The results are shown in bins of the generated momentum \(p_\textrm{gen}\) of the \(\pi _{0}\)

Figure 11 presents the signal efficiency, particle-splitting yield, and background yield for the DeepCluster model and the PFClustering algorithm, for the single-photon dataset on the left and for the two-photon dataset on the right. The results are presented as a function of the energy of the generated particle \(E_\textrm{gen}\) for the signal efficiency and the splitting yield, and as a function of the energy of the seed crystal \(E_\textrm{seed}\) for the background yield. In the single-photon case, the signal efficiency matches that of PFClustering starting from about 5 \(\textrm{GeV} \), while the background rate, coming from noise reconstructed as low-energy clusters, is improved by a factor of about 2000. In the two-photon case, the signal efficiency is largely improved, by up to a factor of two at low energies. The splitting yield is slightly increased at high energies but remains rather low (in total \(\approx \) 0.3%). The splitting yield is an important metric to consider, for example, in the context of di-photon resonance searches.

A summary of the performance is presented in Table 4. The DeepCluster model outperforms the PFClustering algorithm in terms of position and energy resolution both for the single- and two-photon cases. Most notably, the signal efficiency for the two-photon dataset obtained with the DeepCluster model is 97.0\(\%\), while with PFClustering, it is only 82.0\(\%\).

All presented results evaluate the performance of each photon within the two-particle dataset without requiring both photons to be properly reconstructed.

6.2 Electrons

The DeepCluster algorithm is tested on the electron dataset. Although the network is not specifically trained on it, it still shows excellent performance. The resolutions in energy and position, the signal efficiency, and the splitting and background yields are extremely similar to the ones obtained for photons, as expected in the absence of a magnetic field and of material in front of the calorimeter.

6.3 Neutral pions

Finally, the results achieved on the \(\pi _0\) sample are presented. In this case, the reconstruction algorithms have to detect both of the photons originating from the \(\pi _0\) decay and correctly estimate their energy. With this information, the mass of the \(\pi _0\) can be reconstructed as:

$$\begin{aligned} m_{\gamma \gamma } = \sqrt{2E^1_\textrm{reco} E^2_\textrm{reco} \left( 1-\cos {\theta _\textrm{reco}}\right) }, \end{aligned}$$
(9)

where \(E^1_\textrm{reco}\), \(E^2_\textrm{reco}\) are the reconstructed energies of two photons and \(\theta _\textrm{reco}\) is the reconstructed angle between them.
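Eq. 9 translates directly into code:

```python
import numpy as np

def diphoton_mass(e1, e2, theta):
    """Invariant mass of Eq. (9): reconstructed photon energies e1, e2
    (GeV) and reconstructed opening angle theta (radians)."""
    return np.sqrt(2.0 * e1 * e2 * (1.0 - np.cos(theta)))
```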

The diphoton mass distributions reconstructed with the DeepCluster model and the PFClustering algorithm are presented in Fig. 12. The results are shown in bins of the generated momentum \(p_\textrm{gen}\) of the \(\pi _{0}\). As the \(\pi _{0}\) momentum increases, the two photons get closer and become harder to reconstruct separately. The DeepCluster model achieves excellent results, exceeding the PFClustering \(\pi _0\) detection efficiency by a factor of more than two. Moreover, the diphoton mass resolution is significantly better with the DeepCluster model.

As in the case of electrons, the DeepCluster model is not specifically trained on the \(\pi _0\) dataset. Moreover, the photons enter the toy calorimeter at various angles rather than perpendicularly as in the training sample. This further underscores the robustness of the network.

7 Conclusion

This paper introduces an innovative machine-learning algorithm based on convolutional and graph neural networks, called the DeepCluster model, to measure the energy and position of photons and electrons, taking the geometry of the CMS electromagnetic calorimeter as an example.

To develop the DeepCluster model and evaluate its performance, a dedicated simplified simulation of the ECAL is created, and different implementations of the model are tested. A two-step network strategy, incorporating both CNN and GNN architectures, delivers the best performance and effectively addresses all identified issues. The final model is tested on datasets with single photons and two overlapping photons, as well as on datasets with electrons or neutral pions. In all cases, the DeepCluster shows superior performance compared to the method currently in use in CMS in terms of coordinate and energy resolutions, as well as background rejection and signal efficiency. In particular, the DeepCluster model demonstrates excellent results in distinguishing between closely spaced particles, reconstructing approximately twice as many \(\pi ^0\) as the traditional approach.

Table 5 Results for the BDT hyperparameter optimization. The algorithm is applied on the single-photon validation dataset, and the regression score measures its performance. The parameters of the 10th trial are chosen as optimal as they correspond to the highest value of the regression score

These results demonstrate that this approach is very promising for enhancing the performance of calorimeters in high-energy physics experiments. Moreover, as the network processes only small 7\(\times \)7 windows, scalability to the full ECAL will not pose a problem. Since the particles are either far apart in the calorimeter (single-photon dataset) or in close proximity (two-photon dataset), all physics cases are covered. Finally, the performance gain of this approach could be even larger in the presence of pile-up interactions, as the local information could allow the network to better mitigate their impact.