Introduction

The Belle II experiment is located at the high-intensity, asymmetric electron-positron collider SuperKEKB in Tsukuba, Japan. SuperKEKB collides 4 GeV positron and 7 GeV electron beams at a center-of-mass energy of around 10.58 GeV, which Belle II uses to search for rare meson decays and new physics phenomena. Many of these decays include photons in the final state that are reconstructed exclusively in the electromagnetic calorimeter. The experimental program of Belle II targets a significantly increased instantaneous luminosity that ultimately exceeds that of the predecessor experiment by a factor of 30. This increase in luminosity also leads to a significant increase in beam-induced backgrounds [1]. These background processes produce not only high-energy particle interactions that could be misidentified as physics signals, but also energy depositions of low-energy particles that degrade the energy resolution of the electromagnetic crystal calorimeter. The electronic signals from the calorimeter are interpreted during a process called reconstruction to determine the properties of the particles that created them.
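The quoted center-of-mass energy can be checked from the beam energies. The sketch below assumes head-on collisions of ultra-relativistic beams, neglecting the crossing angle and the electron mass, so that \(\sqrt{s} = 2\sqrt{E_{e^-}E_{e^+}}\); it also evaluates the boost \(\beta\gamma = (E_{e^-}-E_{e^+})/\sqrt{s}\) of the center-of-mass frame that makes the collider asymmetric. The function names are illustrative only.

```python
import math


def cms_energy(e_electron: float, e_positron: float) -> float:
    """sqrt(s) for head-on ultra-relativistic beams: 2 * sqrt(E1 * E2).

    Assumption: the crossing angle and the electron mass are neglected.
    """
    return 2.0 * math.sqrt(e_electron * e_positron)


def boost_beta_gamma(e_electron: float, e_positron: float) -> float:
    """Boost of the center-of-mass frame: |p_total| / sqrt(s)."""
    return abs(e_electron - e_positron) / cms_energy(e_electron, e_positron)


print(f"sqrt(s) = {cms_energy(7.0, 4.0):.2f} GeV")      # sqrt(s) = 10.58 GeV
print(f"beta*gamma = {boost_beta_gamma(7.0, 4.0):.2f}")  # beta*gamma = 0.28
```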

In this paper, we describe a fuzzy clustering algorithm based on Graph Neural Networks (GNNs) to reconstruct photons. The term fuzzy clustering [2] refers to the partial assignment of individual calorimeter crystals to several clustering classes. In our case, these classes are one or more, potentially overlapping, signal photons and a beam background class.

The paper is organized as follows: Sect. 2 gives an overview of related work on Machine Learning for calorimeter reconstruction. Section 3 describes the Belle II electromagnetic calorimeter. The event simulation and details of the beam background simulation are discussed in Sect. 4. The conventional Belle II reconstruction algorithm and the new GNN algorithm are described in Sect. 5. We introduce the metrics used to measure the performance of the GNN algorithm in Sect. 6. The main performance studies and results are discussed in Sect. 7. We summarize our results in Sect. 8.

Related Work

Machine Learning is widely used in high energy physics to process calorimeter signals: for clustering [3, 4] and energy regression [5, 6], but also for particle identification [7, 8] and fast simulation [9,10,11]. Most of the recent work has been performed in the context of the high-granularity calorimeter (HGCAL) at CMS [12, 13]. For Belle II, the use of machine learning with the electromagnetic calorimeter has so far been limited to image-based particle identification in the barrel [8, 14].

GNNs are now widely recognized as one possible solution for irregular geometries in high energy physics [15,16,17]. GNN architectures that are able to learn a latent space representation of the detector geometry itself [18, 19] are the basis of the work presented in this paper.

Previous work has focused on simplified and idealized detector geometries, often approximated as a regular grid of readout cells expressed as 2D or 3D images. Additionally, geometry changes and overlaps between barrel and endcap regions, large variations of cell sizes, and very high, spatially non-uniform noise levels induced by beam background energy depositions have been neglected.

For a complete list of works in particle physics that utilize machine learning, we refer to the review [20].

The Belle II Electromagnetic Calorimeter

The Belle II detector consists of several subdetectors arranged around the beam pipe in a cylindrical structure that is described in detail in Refs. [21, 22]. We define the z axis of the laboratory frame as the central axis of the solenoid, with the positive direction pointing in the direction of the electron beam. The x axis is horizontal and points away from the accelerator center, while the y axis is vertical and points upwards. The longitudinal direction, the transverse plane with azimuthal angle \(\phi\), and the polar angle \(\theta\) are defined with respect to the detector’s solenoidal axis.

The Belle II electromagnetic calorimeter (ECL) consists of 8736 thallium-doped CsI (CsI(Tl)) crystals that are grouped into a forward endcap, covering the polar angle range \(12.4^{\circ }< \theta < 31.4^{\circ }\), a barrel, covering \(32.2^{\circ }< \theta < 128.7^{\circ }\), and a backward endcap, covering \(130.7^{\circ }< \theta < 155.1^{\circ }\). The crystals have a trapezoidal geometry with a nominal cross-sectional area of approximately \(6 \times 6\) cm\(^2\) and a length of 30 cm, corresponding to 16.1 radiation lengths. While crystals in the barrel are similar in cross-section and shape, the crystals in the endcaps vary in mass between 4.03 kg and 5.94 kg [23]; crystals in the endcaps also have significantly more passive material in front of them. Each crystal points towards the collision point with a small tilt in polar angle \(\theta\) to reduce detection inefficiencies from particles passing between two crystals. Crystals in the barrel additionally have a small tilt in azimuthal angle \(\phi\). The scintillation light produced in the CsI(Tl) crystals is read out by two photodiodes glued to the back of each crystal. After the shaping electronics, the waveform is digitized, and the crystal energy \(E^\textrm{crystal}_\textrm{rec}\) above baseline and the time \(t^\textrm{crystal}_\textrm{rec}\) of the energy deposition relative to the trigger time are reconstructed online using FPGAs [24]. Waveforms of crystals with energy depositions above 50 MeV are stored for offline processing to allow for electromagnetic vs. hadronic shower identification through pulse shape discrimination (PSD) [25]. The available PSD information is

  • the fit type ID of a multi-template fit indicating which of the possible templates provides the best goodness-of-fit,

  • the respective \(\chi ^2\) value as an indicator of the goodness-of-fit,

  • and the ratio of reconstructed hadronic and photon template energies, referred to as PSD hadronic energy ratio in the following.
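The three angular ranges above can be encoded in a small helper, for example to split evaluation samples by detector region. This is a minimal sketch; the function name `ecl_region` is hypothetical, and treating the boundaries as exclusive simply follows the strict inequalities quoted in the text.

```python
def ecl_region(theta_deg: float) -> str:
    """Map a polar angle (degrees) to the ECL region quoted in the text.

    Angles in the gaps between regions or outside the acceptance
    return "gap/outside".
    """
    if 12.4 < theta_deg < 31.4:
        return "forward endcap"
    if 32.2 < theta_deg < 128.7:
        return "barrel"
    if 130.7 < theta_deg < 155.1:
        return "backward endcap"
    return "gap/outside"


for theta in (20.0, 31.8, 90.0, 140.0):
    print(theta, ecl_region(theta))
```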

Data Set

In this work, we use simulated events to train and evaluate the reconstruction algorithms. The detector geometry and interactions of final-state particles with detector materials are simulated using Geant4 [26] combined with a dedicated detector response simulation. Simulated events are reconstructed and analyzed using the Belle II Analysis Software Framework (basf2) [27, 28]. We simulate isolated photons with energy \(0.1< E_{\textrm{gen}} < 1.5\,\text {GeV}\) and direction \(17^{\circ }< \theta _{\textrm{gen}} < 150^{\circ }\) and \(0^{\circ }< \phi _{\textrm{gen}} < 360^{\circ }\), drawn randomly from independent uniform distributions in E, \(\theta\), and \(\phi\). The photons are generated at the vertex \(x=y=z=0\). For events with two overlapping photons, we first draw one photon randomly from independent uniform distributions as outlined above. We then simulate a second photon with an angular separation \(2.9^{\circ }< \Delta \alpha < 9.7^{\circ }\), drawn randomly from uniform distributions in \(\Delta \alpha\) and in E. This range of angular separations approximately covers the distances that produce two overlapping clusters. These two cases are typical calorimeter signatures in Belle II and describe the majority of photons. We note that the reconstruction of hadrons is a more difficult task not yet covered by our algorithm.
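The two-photon generation described above can be sketched as follows. The first photon is drawn uniformly in E, \(\theta\), and \(\phi\); the second is placed at an opening angle \(\Delta\alpha\) from the first by rotating within the plane spanned by the local unit vectors \(\hat e_\theta\) and \(\hat e_\phi\). The azimuth \(\psi\) of the second photon around the first is not specified in the text and is assumed here to be uniform; all function names are hypothetical and not part of basf2.

```python
import math
import random


def unit_vector(theta, phi):
    """Cartesian unit vector for polar angle theta and azimuth phi (radians)."""
    return (math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta))


def rotate_by_opening_angle(theta1, phi1, delta_alpha, psi):
    """Direction at opening angle delta_alpha from (theta1, phi1).

    psi is the azimuth of the new direction around the first one.
    """
    n1 = unit_vector(theta1, phi1)
    # Orthonormal basis perpendicular to n1 (local unit vectors e_theta, e_phi).
    e_theta = (math.cos(theta1) * math.cos(phi1),
               math.cos(theta1) * math.sin(phi1),
               -math.sin(theta1))
    e_phi = (-math.sin(phi1), math.cos(phi1), 0.0)
    ca, sa = math.cos(delta_alpha), math.sin(delta_alpha)
    n2 = [ca * a + sa * (math.cos(psi) * b + math.sin(psi) * c)
          for a, b, c in zip(n1, e_theta, e_phi)]
    theta2 = math.acos(max(-1.0, min(1.0, n2[2])))
    phi2 = math.atan2(n2[1], n2[0]) % (2.0 * math.pi)
    return theta2, phi2


def sample_photon_pair(rng):
    """One overlapping-photon configuration: (E [GeV], theta, phi) per photon."""
    e1 = rng.uniform(0.1, 1.5)
    theta1 = math.radians(rng.uniform(17.0, 150.0))
    phi1 = math.radians(rng.uniform(0.0, 360.0))
    e2 = rng.uniform(0.1, 1.5)
    delta_alpha = math.radians(rng.uniform(2.9, 9.7))
    psi = rng.uniform(0.0, 2.0 * math.pi)  # assumption: not stated in the text
    theta2, phi2 = rotate_by_opening_angle(theta1, phi1, delta_alpha, psi)
    return (e1, theta1, phi1), (e2, theta2, phi2), delta_alpha
```

Rotating in the tangent plane guarantees the requested opening angle exactly, independent of where the first photon points.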

As part of the simulation, we overlay simulated beam background events corresponding to different collision conditions onto our signal particles [1, 29]. The simulated beam backgrounds correspond to an instantaneous luminosity of \(L_{\text {beam}}=1.06\times 10^{34}\) cm\(^{-2}\)s\(^{-1}\) (called low beam background) and \(L_{\text {beam}}=8\times 10^{35}\) cm\(^{-2}\)s\(^{-1}\) (called high beam background). These two values approximately correspond to the conditions in 2021 and to the expected conditions slightly above the design luminosity, respectively. The spatial distribution of beam backgrounds is asymmetric: they are much higher in the backward endcap than in the forward endcap, and slightly higher in the barrel than in the forward endcap. Additional electronic noise of about 0.35 MeV per crystal is also included in our simulation.

The supervised training and the performance evaluation both use labeled information obtained by matching reconstructed information to the simulated truth information. For each of the four configurations (isolated and overlapping photons, each with low and high beam backgrounds), we use 1.8 million events for training and 200,000 events for validation. The performance evaluation is carried out on a large number of statistically independent samples simulated at various energies and in different detector regions.

We then study the performance of the GNN clustering algorithm in all four scenarios and compare it to the baseline basf2 reconstruction. Both reconstruction algorithms are described in detail in Sect. 5.

Isolated Photon


To study isolated photons, we use the simulated events with a single generated photon. For each event, we select a region of interest (ROI) around a local maximum (LM). An LM is a crystal with at least 10 MeV of reconstructed crystal energy and an energy higher than that of each of its eight direct neighbors. To define the ROI, we first determine the azimuthal angles of the fourth neighbors on either side of the LM in \(\phi\) and the polar angles of the fourth neighbors on either side of the LM in \(\theta\). We then include all crystals in that angular range. In the barrel, this defines a regular \(9\times 9\) array of crystals centered on the LM, while in the endcaps the array is not necessarily regular and can contain a few crystals more or fewer. The LM must be the only LM in the ROI, and the matched truth particle must be a simulated photon responsible for at least 20% of the reconstructed crystal energy. Specifically, for the LM we require the ratio

$$ r^{\gamma _1}_\textrm{LM}=\frac{E^{\gamma _1\textrm{,crystal}_\textrm{LM}}_\textrm{dep}}{E^{\textrm{crystal}_\textrm{LM}}_\textrm{rec}}\ge 0.2. \tag{1} $$

Here, \(E^{\gamma _1\textrm{,crystal}_\textrm{LM}}_\textrm{dep}\) denotes the truth energy deposition of photon 1 in the LM, and \(E^{\textrm{crystal}_\textrm{LM}}_\textrm{rec}\) the reconstructed crystal energy in the LM. The crystals contained in the ROI are considered for the clustering by the GNN algorithm; this area is significantly larger than the \(5\times 5\) area considered by the baseline algorithm (Sect. 5). Furthermore, the ROI defines the extent of the local coordinate system, with the LM as the origin, that is later used as an input feature. Figure 1 (top) shows a typical isolated photon event with high beam background.
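The LM definition and the truth-matching requirement of Eq. (1) can be illustrated on a toy regular grid of crystal energies (in MeV). This sketch assumes a barrel-like rectangular grid and ignores the wrap-around in \(\phi\) and the irregular endcap geometry; the helper names are hypothetical.

```python
def local_maxima(energy, threshold_mev=10.0):
    """(row, col) of crystals with E >= threshold that exceed all of their
    (up to) eight direct neighbors on a regular toy grid."""
    n_rows, n_cols = len(energy), len(energy[0])
    maxima = []
    for r in range(n_rows):
        for c in range(n_cols):
            e = energy[r][c]
            if e < threshold_mev:
                continue
            neighbors = [energy[rr][cc]
                         for rr in range(max(r - 1, 0), min(r + 2, n_rows))
                         for cc in range(max(c - 1, 0), min(c + 2, n_cols))
                         if (rr, cc) != (r, c)]
            if all(e > en for en in neighbors):
                maxima.append((r, c))
    return maxima


def roi_9x9(center, n_rows, n_cols):
    """Crystals up to the fourth neighbor of the LM in both grid directions."""
    r0, c0 = center
    return [(r, c) for r in range(r0 - 4, r0 + 5) for c in range(c0 - 4, c0 + 5)
            if 0 <= r < n_rows and 0 <= c < n_cols]


def passes_truth_match(e_dep_photon_lm, e_rec_lm, min_ratio=0.2):
    """Eq. (1): the photon deposits at least 20% of the LM's reconstructed energy."""
    return e_rec_lm > 0.0 and e_dep_photon_lm / e_rec_lm >= min_ratio
```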

Fig. 1

Typical event displays showing (left) simulated truth assignments, (center) the input variable time, and (right) the PSD hadronic energy ratio for (top) isolated and (bottom) overlapping photons, for two example events with high beam background. The marker centers indicate the crystal centers; the marker area is proportional to the truth energy deposition in the left plots and to the reconstructed crystal energy in the other plots.

Overlapping Photons

Two different photons that deposit some of their energy in the same crystals are referred to as overlapping photons. To study overlapping photons, we use the simulated events with two overlapping photons only. We select events that have exactly two LMs fulfilling the following criteria:

  a) each LM must have a reconstructed crystal energy greater than 10 MeV,

  b) \(r^{\gamma _1}_\mathrm {LM_1}\ge 0.2\) and \(r^{\gamma _1}_\mathrm {LM_1}>r^{\gamma _2}_\mathrm {LM_1}\),

  c) \(r^{\gamma _2}_\mathrm {LM_2}\ge 0.2\) and \(r^{\gamma _2}_\mathrm {LM_2}>r^{\gamma _1}_\mathrm {LM_2}\).
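Criteria a)-c) can be expressed as a single predicate. In this sketch, which uses a hypothetical function name and data layout, `r[(p, m)]` holds the ratio of Eq. (1) for photon p in LM m, and energies are in MeV.

```python
def passes_lm_separation(e_rec_lm1, e_rec_lm2, r, e_min_mev=10.0, r_min=0.2):
    """Criteria a)-c) for an event with exactly two LMs.

    r[(photon, lm)]: ratio of Eq. (1) for photon 1 or 2 in LM 1 or 2.
    """
    a = e_rec_lm1 > e_min_mev and e_rec_lm2 > e_min_mev
    b = r[(1, 1)] >= r_min and r[(1, 1)] > r[(2, 1)]
    c = r[(2, 2)] >= r_min and r[(2, 2)] > r[(1, 2)]
    return a and b and c
```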

We refer to criteria a)-c) as LM separation criteria since they ensure that the particles form two separate LMs. Additionally, events must meet the overlap criterion:

  d) each of the two photons must deposit at least 10 MeV in shared crystals within a \(5\times 5\) area around its respective LM.

Fig. 2

Fraction of selected overlapping photon events in the barrel as a function of the generated opening angle. The orange markers correspond to events fulfilling the LM separation criteria a)-c); the blue markers correspond to events that additionally pass the overlap criterion d) (see text for details).


Figure 2 shows the fraction of events accepted by these selections as a function of the simulated opening angle. Within the scope of this paper, we additionally require that the LMs originate exclusively from the simulated photons and that the ROI contains no additional LMs, e.g. from beam background, that is:

  e) the two LMs must be the only ones in the ROI, and they must be truth-matched to the simulated photons.

Finally, we remove rare cases with small truth energy depositions and large backgrounds by requiring:

  f) the crystal with the largest truth energy deposition of a photon must be within a \(5\times 5\) area around its corresponding LM.

We then create an ROI centered at the midpoint between the two LMs, computed along the shortest path (great-circle arc) connecting the two LMs on the surface of a sphere. The crystal closest to this midpoint is defined as the ROI center. For this purpose, the LM positions are determined by interpreting the global angular coordinates of the associated crystals as latitude and longitude. Figure 1 (bottom) shows an overlapping photon event with high beam background.
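The midpoint on the sphere can be computed by adding the unit vectors of the two LM directions and normalizing the sum, which gives the midpoint of the great-circle arc for non-antipodal directions. The sketch below works directly with the polar angle \(\theta\) (latitude is \(90^\circ-\theta\)) and the azimuth \(\phi\) in degrees; the helper names and the list of crystal centers are hypothetical.

```python
import math


def to_unit(theta_deg, phi_deg):
    """Unit vector for polar angle theta and azimuth phi (degrees)."""
    t, p = math.radians(theta_deg), math.radians(phi_deg)
    return (math.sin(t) * math.cos(p), math.sin(t) * math.sin(p), math.cos(t))


def angular_distance(a, b):
    """Angle in degrees between two (theta, phi) directions (atan2 form is
    numerically stable for nearly parallel vectors)."""
    va, vb = to_unit(*a), to_unit(*b)
    cross = (va[1] * vb[2] - va[2] * vb[1],
             va[2] * vb[0] - va[0] * vb[2],
             va[0] * vb[1] - va[1] * vb[0])
    dot = sum(x * y for x, y in zip(va, vb))
    return math.degrees(math.atan2(math.sqrt(sum(c * c for c in cross)), dot))


def great_circle_midpoint(lm1, lm2):
    """Midpoint of the shortest arc between two non-antipodal directions."""
    s = [a + b for a, b in zip(to_unit(*lm1), to_unit(*lm2))]
    norm = math.sqrt(sum(x * x for x in s))
    x, y, z = (c / norm for c in s)
    theta = math.degrees(math.acos(max(-1.0, min(1.0, z))))
    phi = math.degrees(math.atan2(y, x)) % 360.0
    return theta, phi


def roi_center(lm1, lm2, crystal_centers):
    """Crystal center (theta, phi) closest to the great-circle midpoint."""
    mid = great_circle_midpoint(lm1, lm2)
    return min(crystal_centers, key=lambda c: angular_distance(c, mid))
```

Averaging \(\phi\) naively would place the midpoint of LMs at \(\phi=359^\circ\) and \(\phi=1^\circ\) at \(\phi=180^\circ\); the vector sum avoids this wrap-around problem.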

The truth energy deposition per photon and the reconstructed crystal energy \(E^\textrm{crystal}_\textrm{rec}\), crystal time \(t^\textrm{crystal}_\textrm{rec}\), crystal PSD information (see Sect. 3), and the LM positions within the ROI are recorded for each event.

Reconstruction Algorithms

Energetic photons interacting in the Belle II ECL typically deposit energy in up to \(5\times 5\) crystals. The task of the clustering reconstruction algorithms is to select a set of crystals that contains all the energy of the incoming photon but no energy from other particles or from beam background. With low beam background, approximately \(17\%\) of all crystals in the ECL have a significant reconstructed energy \(E^\textrm{crystal}_\textrm{rec} \ge 1\,\)MeV; for high beam background, this fraction is expected to increase to about \(40\%\). This increase in the number of crystals to consider in the clustering adds to the complexity of the reconstruction.

Baseline

The baseline algorithm is designed to provide maximum efficiency for cluster finding, to contain all crystals from the incoming particle for particle identification, and to select an optimal subset of the cluster crystals that provides the best energy resolution [21]. The clustering is performed in three steps. In the first step, crystals are grouped into connected sets, so-called connected regions, seeded by LMs as defined previously. In an iterative procedure, all direct neighbors with energies above 0.5 MeV are added to the region, and the procedure continues from every added neighbor that itself has an energy above 10 MeV. Overlapping connected regions are merged into one.
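The first step can be sketched as a breadth-first search over crystal neighbors. This is an illustration of the procedure as described, not the basf2 implementation; the data layout (dictionaries of energies in MeV and neighbor lists) and the function names are assumptions.

```python
from collections import deque


def grow_region(seed, energy, neighbors, add_mev=0.5, grow_mev=10.0):
    """Grow a connected region from an LM seed.

    Neighbors above add_mev are added; growth continues only from added
    crystals above grow_mev. energy: {crystal: MeV}, neighbors: {crystal: [...]}.
    """
    region = {seed}
    frontier = deque([seed])
    while frontier:
        current = frontier.popleft()
        for nb in neighbors[current]:
            if nb in region or energy.get(nb, 0.0) <= add_mev:
                continue
            region.add(nb)
            if energy[nb] > grow_mev:
                frontier.append(nb)
    return region


def connected_regions(seeds, energy, neighbors):
    """One region per LM seed; regions that share crystals are merged."""
    regions = []
    for seed in seeds:
        region = grow_region(seed, energy, neighbors)
        for other in [r for r in regions if r & region]:
            region |= other
            regions.remove(other)
        regions.append(region)
    return regions
```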

In the second step, each connected region is split into clusters, one per LM. If there is only one LM in the connected region, up to 21 crystals in a \(5\times 5\) area centered on the LM, excluding its four corners, are grouped into a cluster. If there is more than one LM in a connected region, the energy in each crystal of the connected region is assigned a distance-dependent weight and can be shared between different clusters. The distance is calculated from the cluster centroid to each crystal center, where the cluster centroid is updated iteratively using logarithmic energy weights. This process is repeated until all cluster centroids in a connected region are stable within 1 mm.
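The 21-crystal shape and a logarithmically weighted centroid can be sketched as below. The text does not specify the logarithmic weight; this sketch assumes the common form \(w_i=\max(0, w_0+\ln(E_i/\sum_j E_j))\) with a hypothetical \(w_0=4\), and it omits the distance-dependent energy sharing between several LMs.

```python
import math


def five_by_five_without_corners():
    """(row, column) offsets of the up to 21 crystals around the LM."""
    return [(dr, dc) for dr in range(-2, 3) for dc in range(-2, 3)
            if not (abs(dr) == 2 and abs(dc) == 2)]


def log_weighted_centroid(positions, energies, w0=4.0):
    """Centroid with logarithmic weights w_i = max(0, w0 + ln(E_i / E_sum)).

    The weight form and w0 = 4.0 are illustrative assumptions, not basf2 values.
    """
    e_sum = sum(energies)
    weights = [max(0.0, w0 + math.log(e / e_sum)) if e > 0 else 0.0
               for e in energies]
    w_sum = sum(weights)
    if w_sum == 0.0:  # fall back to linear energy weights
        weights, w_sum = list(energies), e_sum
    return tuple(sum(w * p[k] for w, p in zip(weights, positions)) / w_sum
                 for k in range(len(positions[0])))
```

Compared with linear energy weights, the logarithmic form suppresses low-energy crystals at the shower edge, which are most affected by beam background.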

In the third step, the cluster energy \(E_{\textrm{rec}}^{\textrm{basf2}}\) is predicted from an optimal subset consisting of the n most energetic crystals among all crystals with non-zero weight, where n is chosen to minimize the energy resolution. The number n depends on the measured noise in the event and on the energy of the LM itself. The noise level is estimated by counting the number of crystals in the event that contain more than 5 MeV and have times t more than 125 ns away from the trigger time. \(E_{\textrm{rec}}^{\textrm{basf2}}\) is also corrected within basf2 for possible bias using simulated events. This bias includes leakage (energy not deposited in the crystals included in the energy sum) and beam backgrounds (energy included in the sum that is not from the signal photon). \(E_{\textrm{rec}}^{\textrm{basf2}}\) is the estimator for the generated energy of a particle.
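The two ingredients of the third step, the noise estimate and the truncated energy sum, can be sketched as follows. The mapping from the noise level and LM energy to n is not given in the text, so n is simply an input here; reading "more than 125 ns from the trigger time" as \(|t|>125\) ns and ranking crystals by their weighted energy are assumptions, and the function names are hypothetical.

```python
def count_out_of_time_crystals(crystals, e_min_mev=5.0, window_ns=125.0):
    """Noise estimate: crystals with E > 5 MeV and |t| > 125 ns, where t is
    the crystal time relative to the trigger time; crystals are (E, t) pairs."""
    return sum(1 for e, t in crystals if e > e_min_mev and abs(t) > window_ns)


def sum_n_highest(weighted_crystals, n):
    """Sum of the n highest weighted energies among crystals with non-zero
    weight; weighted_crystals holds (weight, energy) pairs."""
    energies = sorted((w * e for w, e in weighted_crystals if w > 0.0),
                      reverse=True)
    return sum(energies[:n])
```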

The basf2 clustering algorithm also returns a cluster energy \(E_{\mathrm {rec,\, raw}}^{\textrm{basf2}}\) that is not corrected for energy bias. \(E_{\mathrm {rec,\, raw}}^{\textrm{basf2}}\) is the estimator for the deposited energy of a particle.

Graph Neural Network Architecture

GNNs have proven to be powerful architectures for handling both irregular geometries and varying input sizes. In this work, all crystals of an ROI with an energy deposition above 1 MeV are interpreted as nodes in a graph, which leads to variable input sizes and thus makes this a good use case for GNNs. The GNN is implemented in PyTorch Geometric [30].

The input features consist of crystal properties and crystal measurements: The global coordinates \(\theta\) and \(\phi\) of each crystal, the local coordinates \(\theta ^\prime\) and \(\phi ^\prime\) with respect to the ROI center, the crystal mass, and the LM(s) (in one-hot encoding) represent crystal properties. The crystal energy \(E^\textrm{crystal}_\textrm{rec}\) in GeV, the time \(t^\textrm{crystal}_\textrm{rec}\) in \(\mu\)s, and the PSD fit type, PSD \(\chi ^2\), and PSD hadronic energy ratio are crystal measurements used as input features. Pre-processing scales the input uniformly before further processing with the GNN: All features are min-max normalized to an interval of [0, 1] with the exception of \(t^\textrm{crystal}_\textrm{rec}\) and the PSD hadronic energy ratio which are both normalized to the interval \([-1,1]\). The global coordinates and the crystal masses are normalized based on the range of coordinates and masses of all crystals in the detector instead of only the ones in the ROI. Additionally, we average each input feature over all nodes in the ROI and concatenate the averaged input features as additional inputs, thus enabling a global exchange of information.
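The pre-processing can be sketched with two small helpers: a linear min-max map onto a target interval, and the global exchange that appends the ROI-averaged features to every node. The normalization ranges (lo, hi) are inputs here because the text only specifies them for the global coordinates and the crystal masses; the function names are hypothetical.

```python
def min_max(values, lo, hi, out_lo=0.0, out_hi=1.0):
    """Linearly map values from [lo, hi] onto [out_lo, out_hi]."""
    scale = (out_hi - out_lo) / (hi - lo)
    return [out_lo + (v - lo) * scale for v in values]


def add_global_means(node_features):
    """Append the ROI average of every input feature to each node's vector."""
    n_nodes = len(node_features)
    means = [sum(column) / n_nodes for column in zip(*node_features)]
    return [list(row) + means for row in node_features]


# Time-like feature mapped onto [-1, 1]; energy-like feature onto [0, 1].
print(min_max([0.5, 1.0], 0.0, 1.0, -1.0, 1.0))  # [0.0, 1.0]
print(add_global_means([[1.0, 2.0], [3.0, 4.0]]))
```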

As displayed in Fig. 3, our model is built from four so-called GravNet [19] blocks whose concatenated outputs are passed through three dense output layers with a final softmax activation function. Each GravNet block starts with three dense layers, the first two with ELU [31] activation functions and the last one with a \(\tanh\) activation function. The dense layers feed into a GravNet layer, and each block is concluded by a batch normalization layer [32]. The GravNet layer is responsible for building the graph and for the subsequent message passing between its nodes. It first translates the input features into two learned representation spaces: one representing spatial information S, and the other, denoted \(F_\textrm{LR}\), containing the transformed features used for message passing. In the second step, each node is connected to its k nearest neighbors, defined by the Euclidean distances in S, thus creating an undirected, connected graph. For each node, the \(F_\textrm{LR}\) features of the connected nodes are then weighted by a Gaussian potential depending on the distance in S and aggregated by summation. The resulting features are concatenated with the GravNet input features and, after batch normalization, passed to the next GravNet block and to the dense output layers.
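The message-passing step of a GravNet layer, as described above, can be illustrated in plain Python. This sketch assumes that the learned spaces S and \(F_\textrm{LR}\) have already been computed (the learned dense transformations are omitted), uses a Gaussian potential \(\exp(-d^2)\) with unit width as an assumed choice, and aggregates by summation as stated in the text. It is not the PyTorch Geometric implementation used in this work.

```python
import math


def gravnet_aggregate(s_coords, f_lr, k):
    """For each node, sum the F_LR features of its k nearest neighbors in S
    (excluding itself), each weighted by exp(-d^2) of the squared distance."""
    out = []
    for i, si in enumerate(s_coords):
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(si, sj)), j)
            for j, sj in enumerate(s_coords) if j != i)
        agg = [0.0] * len(f_lr[0])
        for d2, j in dists[:k]:
            weight = math.exp(-d2)  # Gaussian potential in the learned space S
            agg = [a + weight * f for a, f in zip(agg, f_lr[j])]
        out.append(agg)
    return out
```

In the full layer, the aggregated features are then concatenated with the layer inputs, e.g. `[x + a for x, a in zip(inputs, aggregated)]` for list-valued features.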

The implementation in the present work follows the concept of fuzzy clustering, which refers to the partial assignment of individual crystals to several clustering classes. Consequently, the GNN predicts weights \(w_i^\textrm{X}\) that indicate the fraction of the reconstructed energy \(E^\mathrm {crystal_i}_\textrm{rec}\) in crystal i that belongs to clustering class X. For models used with isolated photons, \(\textrm{X}\in \{\gamma _{1},\textrm{background}\}\); for models with overlapping photons, \(\textrm{X}\in \{\gamma _{1},\gamma _{2},\textrm{background}\}\). As a loss function, we use the Mean Squared Error (MSE) between the true and predicted weights, summed over all classes and crystals. Training is stopped when the optimization objective has not improved for 15 epochs. For the low beam background models, this objective is the MSE loss on the validation data set, whereas the high beam background models use the higher-level metric FWHM\(_\textrm{dep}\) (Sect. 6) on the validation data set.
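The relation between the softmax output and the fuzzy-clustering loss can be sketched as follows: per crystal, the softmax yields weights that sum to one over the classes, and the loss compares them with the true energy fractions. This sketch sums the squared errors over crystals and classes, as in the text; dividing by the number of terms would give the mean without changing the minimum. The function names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over the classes of one crystal."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuzzy_squared_error(true_weights, logits_per_crystal):
    """Squared error between true and predicted weights, summed over
    crystals and classes (e.g. gamma_1, gamma_2, background)."""
    loss = 0.0
    for w_true, logits in zip(true_weights, logits_per_crystal):
        w_pred = softmax(logits)
        loss += sum((t - p) ** 2 for t, p in zip(w_true, w_pred))
    return loss
```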

Fig. 3

An illustration of the GNN architecture. Each pair of gray square brackets represents one GravNet block consisting of dense layers, a GravNet layer, and a batch normalization layer. The input features describe the feature vector of one node. The global exchange denotes appending the average of each input feature over all nodes in the ROI.

Table 1 Optimized hyperparameters of the isolated photon, and overlapping photon GravNet models
Fig. 4

Comparison of (a) truth energy fractions, (b) reconstructed energy fraction by the GNN, and (c) reconstructed energy fraction by basf2 for an example event with high beam background. Colors indicate the fractions belonging to each photon or background. The marker centers indicate the crystal centers, the marker area is proportional to the truth or reconstructed (GNN, basf2) energy deposition respectively

Hyperparameters have been chosen through a hyperparameter optimization using Optuna [33]. The optimization is performed with respect to FWHM\(_\textrm{dep}\) (Sect. 6) instead of the loss function. We optimize the two models trained for high beam backgrounds and use the respective hyperparameters also for the corresponding low beam background models. The final hyperparameters for both the isolated photon models and the overlapping photon models are shown in Table 1.

The learning rate, the number of dense layers in each GravNet block, and all dimensions of the output layers have been optimized manually by testing a reasonable range of values. The learning rate is set to \(5 \times 10^{-3}\) and is multiplied by a decay factor of 0.25 after every five epochs of stagnating validation loss. We did not observe significant overtraining; as a consequence, we do not use dropout layers or other regularization methods but rely on the large data set.
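The learning-rate decay and the early-stopping rule can be simulated on a sequence of per-epoch validation values (lower is better). This is a sketch of the schedule as described, with hypothetical names; in particular, resetting the decay counter after each decay and evaluating both rules after each epoch are assumptions about details the text does not specify.

```python
def training_schedule(val_metrics, lr0=5e-3, decay=0.25,
                      lr_patience=5, stop_patience=15):
    """Return the learning rate used in each epoch and the stopping epoch."""
    lr = lr0
    best = float("inf")
    since_best = 0   # epochs without improvement (early stopping)
    since_decay = 0  # epochs without improvement since the last decay
    lrs = []
    for epoch, metric in enumerate(val_metrics):
        lrs.append(lr)
        if metric < best:
            best, since_best, since_decay = metric, 0, 0
            continue
        since_best += 1
        since_decay += 1
        if since_decay >= lr_patience:
            lr *= decay  # applies from the next epoch on
            since_decay = 0
        if since_best >= stop_patience:
            return lrs, epoch
    return lrs, len(val_metrics) - 1
```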

The GNN algorithm yields the weights \(w_i^\textrm{X}\) for all crystals in the ROI with an energy deposition above 1 MeV. To reconstruct the total cluster energy \(E_{\textrm{rec}}^{\textrm{GNN}}\) associated with a given particle X, we sum the class-specific weights multiplied by the reconstructed crystal energies, \(E_{\textrm{rec}}^{\textrm{GNN}} =\sum _i w_i^\textrm{X}E^\mathrm {crystal_i}_\textrm{rec}\).
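The energy sum is a weighted sum over the ROI crystals for each class. In the hypothetical helpers below, `weight_rows[i][k]` is the predicted weight of crystal i for class k; since the softmax weights of each crystal sum to one, the class energies add up to the total ROI energy.

```python
def cluster_energy(weights, crystal_energies):
    """E_rec^GNN for one class X: sum_i w_i^X * E_rec^{crystal_i}."""
    return sum(w * e for w, e in zip(weights, crystal_energies))


def energies_per_class(weight_rows, crystal_energies, classes):
    """weight_rows[i][k] is the weight of crystal i for class classes[k]."""
    return {cls: cluster_energy([row[k] for row in weight_rows], crystal_energies)
            for k, cls in enumerate(classes)}
```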

Figure 4 shows how the GNN and the basf2 algorithms behave in clustering a typical case of overlapping photons.

Metrics

For performance evaluation, the reconstructed energy of a particle is compared with two different truth targets: the total deposited truth energy \(E_{\textrm{dep}}\) per photon in the ROI, and the generated truth energy \(E_{\textrm{gen}}\) per photon. This results in two variants of relative reconstruction errors. The reconstruction error on the deposited energy

$$\begin{aligned} \eta _\text {dep} ^{\text {basf2}}&= \frac{E_{\mathrm {rec,\, raw}}^{\textrm{basf2}}-E_{\textrm{dep}}}{E_{\textrm{dep}}}\quad \text {and} \\ \eta _\text {dep} ^{\text {GNN}}&= \frac{E_{\textrm{rec}}^{\textrm{GNN}}-E_{\textrm{dep}}}{E_{\textrm{dep}}} \end{aligned} \tag{2}$$

gives access to the energy resolution ignoring leakage and other detector effects. It is a direct evaluation of the clustering performance of an algorithm.

On the other hand, the reconstruction error on the generated energy

$$\begin{aligned} \eta _\text {gen} ^{\text {basf2}}&= \frac{E_{\textrm{rec}}^{\textrm{basf2}}-E_{\textrm{gen}}}{E_{\textrm{gen}}}\quad \text {and} \\ \eta _\text {gen} ^{\text {GNN}}&= \frac{E_{\textrm{rec}}^{\textrm{GNN}}-E_{\textrm{gen}}}{E_{\textrm{gen}}} \end{aligned} \tag{3}$$

factors in all detector and physics effects and quantifies how much of the improvements to the underlying clustering carry over to downstream physics object reconstruction.

Fig. 5

Example distribution of the relative reconstruction error \(\eta _\text {gen}\) of the generated energy and illustration of the bias correction, the \(\text {FWHM}\), and the tail ranges

Evaluating both algorithms on a large number of simulated photons yields peaking distributions in both reconstruction errors \(\eta _\text {dep}\) and \(\eta _\text {gen}\). Both distributions are potentially biased because of energy leakage and the presence of beam backgrounds (see Sect. 5.1). We perform a binned fit with a double-sided crystal ball [34, 35] function as probability density function (pdf) using the kafe2 [?] framework. We then shift each reconstruction error distribution independently by a multiplicative factor that corrects the difference between the fitted peak position and zero (Fig. 5). Since \(\eta _\text {dep}\) and \(\eta _\text {gen}\) are asymmetric distributions, we repeat this procedure until the difference between the fitted peak position and zero is less than 0.002. This procedure usually converges within two or three iterations.
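The iterative peak correction can be sketched as follows, under the assumption that the multiplicative factor is applied to the reconstructed energy. The paper determines the peak from a double-sided crystal ball fit with kafe2; this sketch substitutes the center of the most populated histogram bin as a simple stand-in peak estimator, and the function names, bin width, and histogram range are hypothetical.

```python
def histogram_mode(values, lo=-0.5, hi=0.5, bin_width=0.001):
    """Center of the most populated bin (stand-in for the fitted peak)."""
    n_bins = int(round((hi - lo) / bin_width))
    counts = [0] * n_bins
    for v in values:
        if lo <= v < hi:
            counts[min(int((v - lo) / bin_width), n_bins - 1)] += 1
    best = max(range(n_bins), key=counts.__getitem__)
    return lo + (best + 0.5) * bin_width


def bias_correct(e_rec, e_true, peak_estimator=histogram_mode,
                 tol=0.002, max_iter=10):
    """Rescale E_rec by a multiplicative factor until the peak of
    eta = (E_rec - E_true) / E_true lies within tol of zero."""
    scale = 1.0
    for _ in range(max_iter):
        eta = [(scale * r - t) / t for r, t in zip(e_rec, e_true)]
        peak = peak_estimator(eta)
        if abs(peak) < tol:
            return eta, scale
        scale /= 1.0 + peak  # move the current peak to zero
    raise RuntimeError("peak correction did not converge")
```

Because the scaling also stretches the distribution, a fit-based peak estimate of an asymmetric shape moves slightly after each step, which is why the procedure is iterated.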

We then determine the full width at half maximum (\(\text {FWHM}\)) of the final shifted distributions in \(\eta _\text {dep}\) and \(\eta _\text {gen}\), yielding \(\text {FWHM}_{\text {dep}}\) and \(\text {FWHM}_{\text {gen}}\), respectively. The uncertainty on the \(\text {FWHM}\) is calculated from the uncertainties of the fit parameters. In addition to the \(\text {FWHM}\), we determine the tails of the reconstruction error distribution. The left and right tails \(T_\text {L,R}\) are calculated as the 95th percentile of the unbinned events on the respective side of the peak position, as given by the fit parameters, ranked in ascending order (\(T_\text {R}\)) and descending order (\(T_\text {L}\)), respectively. Propagating the uncertainty on the peak position as given by the fit yields the uncertainty on \(T_\text {L,R}\).
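The tail definition can be sketched with a nearest-rank percentile. This assumes that the peak position is already known (after the peak correction described above it is close to zero, so the returned values are effectively the tail lengths) and omits the uncertainty propagation; the function name is hypothetical.

```python
import math


def tail_positions(eta, peak, q=0.95):
    """Return (T_L, T_R): nearest-rank q-quantiles of the events left and
    right of the peak, ranked descending (left) and ascending (right)."""
    def nearest_rank(ranked):
        if not ranked:
            return None
        return ranked[max(1, math.ceil(q * len(ranked))) - 1]

    left = sorted((v for v in eta if v < peak), reverse=True)
    right = sorted(v for v in eta if v > peak)
    return nearest_rank(left), nearest_rank(right)
```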

Results

The first sections of the results focus on detailed studies of isolated clusters. Section 7.4 then introduces overlapping clusters and their effects on the performance. Figure 6 shows examples of the distributions of both reconstruction errors \(\eta _\text {dep}\) and \(\eta _\text {gen}\), as well as the fit results for events with low beam background. Figure 7 shows the equivalent distributions for events with high beam background.

The \(\eta _\text {gen}\) distributions are wider because the reconstruction error includes the effects of leakage, which results in missing energy with respect to the generated photon energy. This affects only the left-side tails.

Fig. 6

Distributions of relative reconstruction errors (a) \(\eta _\text {dep}\) and (b) \(\eta _\text {gen}\) for isolated clusters for low beam backgrounds. The first bin contains all underflow entries; the last bin contains all overflow entries.

Fig. 7

Distributions of relative reconstruction errors (a) \(\eta _\text {dep}\) and (b) \(\eta _\text {gen}\) for isolated clusters for high beam backgrounds. The first bin contains all underflow entries; the last bin contains all overflow entries.

In the following subsections, we compare the performance of the GNN and the basf2 reconstruction algorithms in different detector regions for low and high beam backgrounds by evaluating the energy resolution \(\text {FWHM}_{\text {gen}}/2.355\) and the tail parameters. We then analyze the GNN in more detail by testing its dependence on the input variables and its robustness against differences in beam background levels between training and evaluation.
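The factor 2.355 is \(2\sqrt{2\ln 2}\), the ratio of FWHM to standard deviation for a Gaussian, so FWHM/2.355 equals \(\sigma\) when the peak region is Gaussian. The sketch below finds the half-maximum points of a double-sided crystal ball shape numerically by bisection. It assumes the common parameterization with a Gaussian core of width \(\sigma\) and power-law tails starting at \(\alpha_{L,R}\) with exponents \(n_{L,R}\); the exact parameterization used with kafe2 may differ, and the function names are hypothetical.

```python
import math


def dscb(x, mu, sigma, alpha_l, n_l, alpha_r, n_r):
    """Unnormalized double-sided crystal ball (peak value 1 at x = mu)."""
    t = (x - mu) / sigma
    if t < -alpha_l:
        a = (n_l / alpha_l) ** n_l * math.exp(-0.5 * alpha_l ** 2)
        b = n_l / alpha_l - alpha_l
        return a * (b - t) ** (-n_l)
    if t > alpha_r:
        a = (n_r / alpha_r) ** n_r * math.exp(-0.5 * alpha_r ** 2)
        b = n_r / alpha_r - alpha_r
        return a * (b + t) ** (-n_r)
    return math.exp(-0.5 * t * t)


def fwhm(mu, sigma, alpha_l, n_l, alpha_r, n_r):
    """Full width at half maximum, located by bisection on each side."""
    tol = 1e-10 * sigma

    def above_half(x):
        return dscb(x, mu, sigma, alpha_l, n_l, alpha_r, n_r) > 0.5

    def crossing(step):
        inside, outside = mu, mu + step
        while above_half(outside):  # bracket the half-maximum point
            inside, outside = outside, outside + step
        while abs(outside - inside) > tol:
            mid = 0.5 * (inside + outside)
            if above_half(mid):
                inside = mid
            else:
                outside = mid
        return 0.5 * (inside + outside)

    return crossing(sigma) - crossing(-sigma)
```

If a tail starts within the half-maximum region (\(\alpha < \sqrt{2\ln 2}\approx 1.18\)), the FWHM exceeds \(2.355\,\sigma\), which the numerical search captures.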

Energy Resolution and Energy Tails

The three detector regions described in Sect. 3, namely the barrel, the forward endcap, and the backward endcap, differ in crystal geometry, background levels, and the amount of passive material in front of and between the crystals. This section studies the resulting variations in the energy reconstruction performance.

Fig. 8

Resolution \(\text {FWHM}_{\text {gen}}/2.355\) of the GNN and basf2 as a function of the simulated photon energy \(E_{\textrm{gen}}\) for both endcaps and the barrel for (a) low and (b) high beam background. Each color is associated with one detector region; the light color indicates basf2, the dark color the GNN. The bands indicate the uncertainty of the fits (see text for details). The fit parameters are summarized in Table 2.

To assess the energy dependence of the resolution and tail parameters, we simulate test data sets of photons at various fixed energies. The \(\text {FWHM}\) for each simulated data set is then determined according to Sect. 6. Plotting the resolutions \(\text {FWHM}_{\text {gen}}/2.355\) against the generated photon energies \(E_\textrm{gen}\) reveals a characteristic relationship that is parameterized by the function \(a / E_\textrm{gen} \oplus b / \sqrt{E_\textrm{gen}} \oplus c\), where \(\oplus\) indicates addition in quadrature.
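Since \(\sigma^2 = a^2/E^2 + b^2/E + c^2\) is linear in \(a^2\), \(b^2\), and \(c^2\), the parameterization can be fitted with ordinary linear least squares. The sketch below is a simplified, unweighted fit that ignores the uncertainties used for Table 2; the function names and the parameter values in the test are hypothetical.

```python
import math


def resolution(e, a, b, c):
    """sigma_E / E = a/E (+) b/sqrt(E) (+) c, with (+) addition in quadrature."""
    return math.sqrt((a / e) ** 2 + b ** 2 / e + c ** 2)


def _det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_resolution(energies, sigmas):
    """Unweighted least squares of sigma^2 = A/E^2 + B/E + C via the 3x3
    normal equations (Cramer's rule); returns (a, b, c) = sqrt(A, B, C)."""
    rows = [(1.0 / e ** 2, 1.0 / e, 1.0) for e in energies]
    ys = [s ** 2 for s in sigmas]
    m = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    v = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    d = _det3(m)
    params = []
    for k in range(3):
        mk = [row[:k] + [v[i]] + row[k + 1:] for i, row in enumerate(m)]
        params.append(_det3(mk) / d)
    if min(params) < 0.0:
        raise ValueError("negative squared parameter; use a constrained fit")
    return tuple(math.sqrt(p) for p in params)
```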

For both the GNN and the baseline algorithm, the energy resolution differs between the three detector regions, as can be seen in Fig. 8a for low beam background and in Fig. 8b for high beam background. Table 2 reports the fitted parameters of the resolution parameterization. We attribute these differences to the large spread in both shape and size of the crystals in the endcaps, the asymmetric distribution of beam backgrounds, and the different amounts of passive material in front of the different detector regions.

Overall, the energy resolution of the GNN algorithm is significantly better than that of the baseline algorithm for all photon energies. The GNN energy resolution is better by more than 30% for photon energies below 500 MeV, which is the energy range of more than 90% of all photons in B-meson decay chains. The higher the beam background, the larger the difference between the GNN and the baseline algorithm. The difference between the two algorithms decreases with energy because the relative contribution of beam backgrounds to the photon energy resolution decreases.

The shape of the left-side tails is dominated by passive material and is hence expected to differ between the detector regions. The left-side tails are almost independent of beam backgrounds, as can be seen by comparing Fig. 9a for low beam background with Fig. 9c for high beam background. The GNN and the baseline algorithm both show the smallest tail lengths in the barrel region, with tail lengths decreasing for increasing energy. As expected, the left-side tails are largest in the backward endcap, which has the highest ratio of passive to active material. The right-side tails originate mostly from beam background being wrongly added to photon clusters. The GNN produces shorter tails than the baseline algorithm for all energies and for both low and high beam backgrounds, with the performance difference increasing for lower energies and higher beam backgrounds.

Table 2 Fit results (\(a/E_\textrm{gen}\oplus b / \sqrt{E_\textrm{gen}} \oplus c\)) of the fits shown in Fig. 8
Fig. 9 The 95% left- and right-side tail lengths \(T_L\) and \(T_R\) of \(\eta _\text {gen}\) for the GNN and basf2 as a function of the simulated photon energy \(E_{\textrm{gen}}\) for both endcaps and the barrel, for (a and b) low and (c and d) high beam background. Each color is associated with one detector region.

Beam Background Robustness

Beam background levels change continuously during detector operation. Ideally, reconstruction algorithms at Belle II are insensitive to such changes. The basf2 baseline algorithm achieves robustness against increasing beam backgrounds by adaptively including fewer crystals in the energy sum. Since our GNN is trained on a large number of events with event-by-event fluctuations of the beam backgrounds, we expect it to be robust against varying beam backgrounds if it generalizes well enough. We test this robustness by comparing GNNs trained and tested on the same beam background level against GNNs trained on one beam background level and tested on the other (Fig. 10, parameterization in Table 3). While the GNNs trained on the same beam background as used for evaluation achieve a better resolution than those trained on the other one, the GNN still outperforms the baseline algorithm even when it is evaluated on a beam background level different from the one it was trained on. This demonstrates promising generalization with respect to different levels of beam backgrounds.

Fig. 10 Resolution \(\text {FWHM}_{\text {gen}}/2.355\) as a function of the simulated photon energy \(E_{\textrm{gen}}\) in the barrel for the GNNs trained with low beam background (LBB GNN) and high beam background (HBB GNN). The color indicates the beam background used for evaluation; the dark shade indicates the model trained with the same beam background as used for evaluation, and the light shade indicates the model trained with the other beam background. The bands indicate the uncertainty of the fits, see text for details. The fit parameters are summarized in Table 3. The resolution of the basf2 algorithm is shown for comparison.
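Fig. 10 quotes the resolution as \(\text {FWHM}_{\text {gen}}/2.355\). The factor 2.355 is \(2\sqrt{2\ln 2}\), the FWHM-to-\(\sigma\) ratio of a Gaussian, so the quoted value equals \(\sigma\) for a purely Gaussian peak while being largely insensitive to the non-Gaussian tails discussed above. The Python sketch below illustrates this with a toy distribution (a Gaussian core plus a one-sided exponential tail for 10% of the entries; all parameters are arbitrary) and a simple histogram-based FWHM estimator. The estimator is an assumption for illustration and not the procedure of Sect. 6.

```python
import math
import random
import statistics

# Ratio FWHM/sigma for a Gaussian: 2*sqrt(2*ln 2), approximately 2.355.
GAUSS_FWHM_FACTOR = 2.0 * math.sqrt(2.0 * math.log(2.0))


def fwhm_from_histogram(values, lo, hi, n_bins):
    """Generic FWHM estimate of a unimodal sample from a fixed-range histogram.

    Illustrative only; values outside [lo, hi) are ignored.
    """
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        if lo <= v < hi:
            counts[min(int((v - lo) / width), n_bins - 1)] += 1
    half_max = max(counts) / 2.0
    above = [i for i, n in enumerate(counts) if n >= half_max]
    # Distance between the outer edges of the first and last bins at or above half maximum.
    return (above[-1] - above[0] + 1) * width


# Toy ratio distribution (arbitrary parameters): Gaussian core around 1 plus,
# for 10% of the entries, an extra one-sided exponential shift to the right.
random.seed(1)
sigma_core = 0.02
sample = []
for _ in range(200_000):
    x = random.gauss(1.0, sigma_core)
    if random.random() < 0.1:
        x += random.expovariate(10.0)
    sample.append(x)

fwhm = fwhm_from_histogram(sample, lo=0.8, hi=1.2, n_bins=400)
print(f"FWHM/2.355 = {fwhm / GAUSS_FWHM_FACTOR:.4f}")
print(f"standard deviation = {statistics.pstdev(sample):.4f}")
```

In this toy, \(\text{FWHM}/2.355\) stays close to the core width, whereas the standard deviation is more than doubled by the tail, which is why tails are characterized separately (Fig. 9).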

Table 3 Fit results (\(a /E_\textrm{gen} \oplus b / \sqrt{E_\textrm{gen}} \oplus c\)) of the fits shown in Fig. 10 for the GNN trained with low beam background (LBB GNN) and high beam background (HBB GNN)

Input Parameter Dependency

As discussed in Sect. 3, multiple input features are available for the GNN, while the basf2 algorithm uses only crystal position and energy. This section presents a study of the influence of the input features on the \(\text {FWHM}\). To this end, the architecture described in Sect. 5.2 is trained on isolated photon events with low or high beam backgrounds using different combinations of input features. The 200,000 events from the respective validation data set, as described in Sect. 4, are used for inference. The data set covers the energy range \(0.1< E_{\textrm{gen}} < 1.5\,\text {GeV}\) and the full detector range \(17^{\circ }< \theta _{\textrm{gen}} < 150^{\circ }\) and \(0^{\circ }< \phi _{\textrm{gen}} < 360^{\circ }\), each uniformly distributed. The \(\text {FWHM}\) relative to \(E_{\textrm{gen}}\) and to \(E_{\textrm{dep}}\) is calculated as described in Sect. 6. All GNNs use the global crystal coordinates, the LM position, and the crystal mass as input features. A comparison of the \(\text {FWHM}\) for the different additional input features is shown in Table 4. The results show that even for the minimal set of input variables, the GNN’s \(\text {FWHM}\) is smaller than that of basf2 for both the deposited and the generated energy in both beam background scenarios. Adding local coordinates leads to small improvements, while adding time information improves the GNN performance significantly. PSD information has almost no effect on the \(\text {FWHM}\). This is expected, since the main purpose of the PSD information is to distinguish electromagnetic from hadronic interactions in each crystal. In anticipation of future extensions of the GNN to hadronic interactions, the PSD information is kept throughout this work.
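The input-feature study can be organized by combining the base feature groups that every GNN uses (global crystal coordinates, LM position, crystal mass) with each subset of the additional groups (local coordinates, time, PSD). The Python sketch below is a hypothetical configuration helper: the group names, the per-crystal record layout, and the example values are assumptions, and Table 4 need not contain all eight combinations.

```python
from itertools import combinations

# Hypothetical feature-group names; the grouping follows the text above.
BASE_FEATURES = ("global_coordinates", "lm_position", "crystal_mass")
ADDITIONAL_FEATURES = ("local_coordinates", "time", "psd")


def candidate_feature_sets():
    """Yield the base features combined with every subset of the additional ones."""
    for k in range(len(ADDITIONAL_FEATURES) + 1):
        for extra in combinations(ADDITIONAL_FEATURES, k):
            yield BASE_FEATURES + extra


def crystal_feature_vector(crystal, feature_set):
    """Flatten the selected feature groups of one crystal into a list of floats.

    `crystal` is a hypothetical mapping from group name to a float or a tuple of floats.
    """
    vector = []
    for name in feature_set:
        value = crystal[name]
        vector.extend(value if isinstance(value, tuple) else (value,))
    return [float(v) for v in vector]


# Example crystal with made-up values (units unspecified).
crystal = {
    "global_coordinates": (120.0, -35.0, 80.0),
    "lm_position": (118.0, -30.0, 79.0),
    "crystal_mass": 4.5,
    "local_coordinates": (2.0, -5.0, 1.0),
    "time": 12.3,
    "psd": 0.1,
}
for features in candidate_feature_sets():
    extra = features[len(BASE_FEATURES):]
    print(len(crystal_feature_vector(crystal, features)), extra)
```

Keeping the base groups fixed and toggling only the additional ones makes each change in \(\text {FWHM}\) attributable to a single feature group.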

Table 4 Comparison of the performances of GNN models with different additional input features, and the performance of the basf2 baseline

Overlapping Photons

When discussing overlapping photon events, it is important to note that the FWHM of a photon's energy distribution depends not only on its own properties but also on those of the second photon present. To account for this, the evaluation is split into energy bins of [0.1, 0.2], [0.2, 0.5], [0.5, 1.0], and [1.0, 1.5] \(\mathrm {\,Ge\hspace{-1.00006pt}V}\) for each of the two photons. We report the FWHM of the first photon for different simulated energies of the second photon for low beam backgrounds (Table 5) and high beam backgrounds (Table 6).
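The two-dimensional binning can be sketched in Python as follows. Each event is assigned to a cell (i, j) given by the energy bins of the first and second photon, and one FWHM per cell is then computed from the collected per-photon values of \(\eta _\text {gen}\) (defined in Sect. 6). The bin edges are taken from the text; the handling of values exactly on a bin edge and the event tuple layout are assumptions.

```python
import bisect
from collections import defaultdict

# Bin edges in GeV, as listed in the text.
ENERGY_BIN_EDGES_GEV = (0.1, 0.2, 0.5, 1.0, 1.5)
N_BINS = len(ENERGY_BIN_EDGES_GEV) - 1


def energy_bin(e_gen):
    """Index of the energy bin containing e_gen, or None outside [0.1, 1.5] GeV.

    Assumed convention: a value on an inner edge belongs to the upper bin,
    and exactly 1.5 GeV belongs to the last bin.
    """
    if not ENERGY_BIN_EDGES_GEV[0] <= e_gen <= ENERGY_BIN_EDGES_GEV[-1]:
        return None
    return min(bisect.bisect_right(ENERGY_BIN_EDGES_GEV, e_gen) - 1, N_BINS - 1)


def group_by_energy_pair(events):
    """Group eta of photon 1 by the energy bins of both photons.

    `events` yields (e_gen_1, e_gen_2, eta_1) tuples (hypothetical layout); the
    result maps (i, j) to the list of eta_1 values falling into that cell.
    """
    cells = defaultdict(list)
    for e1, e2, eta1 in events:
        i, j = energy_bin(e1), energy_bin(e2)
        if i is not None and j is not None:
            cells[(i, j)].append(eta1)
    return cells


# Made-up events; the last one lies below 0.1 GeV and is dropped.
events = [(0.15, 0.8, 0.97), (0.15, 0.9, 1.02), (1.2, 0.3, 0.99), (0.05, 0.3, 1.0)]
cells = group_by_energy_pair(events)
print(dict(cells))
```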

The GNN provides a better \(\text {FWHM}\) for all combinations, but the improvement is most significant for low-energy photons. For low beam backgrounds, the GNN improves the \(\text {FWHM}\) by up to 20% for photons with simulated energies \(0.1< E_{\textrm{gen}} < 0.2\) \(\mathrm {\,Ge\hspace{-1.00006pt}V}\). For high beam backgrounds, the improvement exceeds 35% in the same energy range.
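Percentages such as "improves the \(\text {FWHM}\) by more than 35%" are read here as the relative reduction of the FWHM with respect to basf2 within the same energy cell; this definition is our assumption. A minimal Python helper, with made-up FWHM values that are not taken from Tables 5 and 6:

```python
def relative_fwhm_improvement(fwhm_baseline: float, fwhm_gnn: float) -> float:
    """Relative FWHM reduction: 1 - FWHM_GNN / FWHM_baseline (assumed definition)."""
    if fwhm_baseline <= 0:
        raise ValueError("baseline FWHM must be positive")
    return 1.0 - fwhm_gnn / fwhm_baseline


# Made-up values for one energy cell (NOT from Tables 5 and 6).
improvement = relative_fwhm_improvement(fwhm_baseline=0.100, fwhm_gnn=0.064)
print(f"improvement: {improvement:.0%}")
```

Under this definition, a positive value means the GNN is narrower than basf2, and the value is bounded above by 100%.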

These results show that the significant performance improvement observed for isolated photons is also achieved for the more complicated overlapping photon signatures.

Conclusion and Outlook

In this work, we have presented a complete study of a GNN-based fuzzy clustering algorithm for the Belle II electromagnetic calorimeter. We used a realistic full detector simulation and simulated beam backgrounds for low and high luminosity conditions of Belle II. The GNN algorithm has been compared to the currently used basf2 baseline algorithm. We find an improvement of the energy resolution by more than 30% for high beam backgrounds, as well as shorter right-side tails of the reconstruction errors caused by beam background. Such significant improvements in photon reconstruction performance directly improve the physics reach of Belle II for almost all final states with photons, as well as for analyses that use missing energy information [21]. We also trained dedicated GNNs to disentangle the energy depositions of overlapping photon clusters. The energy resolution improves by up to 30% for the low-energy photon in asymmetric photon pairs. Any improvement in overlapping photon reconstruction has direct implications for the reconstruction of boosted \(\pi ^0\) mesons or axion-like particles with couplings to photons [36].

While the basf2 algorithm strictly reconstructs one cluster for each LM, the GNN algorithm uses the LMs only to center the ROI. The GNN algorithm can therefore, in principle, also reconstruct overlapping photons that produced only one LM (Fig. 11). The extension of the GNN algorithm to such overlapping signatures, as well as to charged particles and neutral hadrons, will be the focus of follow-up work. Future work will also address robustness against varying beam backgrounds explicitly, for example by introducing features that are directly sensitive to beam-background levels.

Fig. 11 Comparison of the true energy fractions (a), the energy fractions reconstructed by the GNN (b), and the energy fractions reconstructed by basf2 (c) for one example event with only one local maximum. Colors indicate the fractions belonging to each photon or to the background. The marker centers indicate the crystal centers; the marker area is proportional to the reconstructed energy in each crystal.
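In the fuzzy clustering illustrated in Fig. 11, the energy of each crystal is shared among the photon classes and the background class according to per-crystal fractions. The Python sketch below shows how class energies follow from such fractions, assuming that the energy of class \(k\) is the fraction-weighted sum \(E_k = \sum_i f_{ik} E_i\) over crystals \(i\), and that the fractions come from normalizing per-crystal network scores with a softmax. Both are common choices but assumptions here, not details taken from the text; the example numbers are made up.

```python
import math


def softmax(scores):
    """Numerically stable softmax: turns per-class scores into fractions summing to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [x / total for x in exps]


def class_energies(crystal_energies, fractions):
    """Energy per class (photons and background): E_k = sum_i f_ik * E_i.

    crystal_energies: reconstructed energy E_i of each crystal.
    fractions: per-crystal list of fractions f_ik over the same ordered classes.
    """
    if len(crystal_energies) != len(fractions):
        raise ValueError("one fraction vector per crystal is required")
    n_classes = len(fractions[0])
    totals = [0.0] * n_classes
    for energy, f in zip(crystal_energies, fractions):
        if len(f) != n_classes or abs(sum(f) - 1.0) > 1e-6:
            raise ValueError("fractions must cover all classes and sum to 1")
        for k, f_k in enumerate(f):
            totals[k] += f_k * energy
    return totals


# Three crystals; classes = (photon 1, photon 2, background); made-up numbers.
energies = [1.0, 0.5, 0.2]
fractions = [[0.9, 0.1, 0.0], [0.5, 0.5, 0.0], [0.0, 0.0, 1.0]]
print(class_energies(energies, fractions))
print(softmax([2.0, 1.0, -1.0]))
```

Because the fractions of each crystal sum to one, the class energies add up to the total energy in the region, so energy assigned to the background class is removed from the photon candidates rather than lost.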

This is the first application of a GNN-based clustering algorithm at Belle II with a realistic detector geometry and realistic beam backgrounds, including high beam backgrounds. It is also the first time at Belle II that an algorithm has been shown to improve the performance of photon reconstruction by explicitly including timing information at the clustering level.

Table 5 \(\mathrm {FWHM_{gen}}\times 10^{2}\) of one photon with energy \(E_\gamma ^{(1)}\) as a function of the second photon energy \(E_\gamma ^{(2)}\) for low beam background for the full detector (barrel and endcaps combined)
Table 6 \(\mathrm {FWHM_{gen}}\times 10^{2}\) of one photon with energy \(E_\gamma ^{(1)}\) as a function of the second photon energy \(E_\gamma ^{(2)}\) for high beam background for the full detector (barrel and endcaps combined)