1 Introduction

Both existing high-energy physics experiments, such as those at the CERN LHC, and experiments at future colliders, like the Future Circular Collider (FCC), rely heavily on the performance of hadron calorimeters and their particle flow capabilities for measuring jet and missing transverse momentum (\(p_{T}\)) [1,2,3,4,5,6,7,8,9]. Hadron calorimeters are currently characterized not only in terms of their intrinsic energy resolution, but also by their imaging capabilities, which allow for offline corrections using smart algorithms. Due to the diverse composition of hadronic showers and the resulting differences in the calorimeter response, a correct energy measurement becomes challenging. In general, the components of hadronic showers can be divided into electromagnetic (EM) and hadronic parts. The hadronic part of the shower contains particles, such as neutrinos and neutrons, which are partially invisible to the detector. This can be affected by the chosen active detector material, where, e.g., plastic scintillators allow for neutron detection via the strong interaction with the atomic nucleus. The undetectable particles in the hadronic shower result in an unequal detector response; that is, \(e/h\ne 1\), where e and h are the calorimeter responses to the electromagnetic and hadronic shower fractions, respectively.

Many hadronic calorimeters currently in use and planned for future experiments are sampling calorimeters, which consist of alternating active and passive absorber layers [10,11,12,13]. The sampling of the hadronic shower allows for tuning of the hadronic and electromagnetic shower responses. In the past, the e/h ratio has been adjusted closer to 1 either by suppressing the electromagnetic response, e.g., by using high-Z absorbers, or by enhancing the hadronic response, e.g., by using neutron-sensitive active materials. Calorimeters with a ratio \(e/h\sim 1\) are called “compensating” calorimeters. These optimizations of the active and passive materials often require a decreased sampling fraction (ratio of active to passive material), which itself degrades the calorimeter energy resolution by increasing the stochastic term \(\alpha \) of

$$\begin{aligned} \frac{\sigma _{E}}{\langle E\rangle }=\frac{\alpha }{\sqrt{E}} \oplus c. \end{aligned}$$
(1)

For sampling calorimeters, the stochastic term is dominated by the sampling fraction (per layer) and the sampling frequency (the number of layers); it expresses the dependence of the calorimeter resolution on the fluctuations of the number of particles within the hadronic shower (which follow a Poisson distribution). The constant term c expresses uncertainties that scale linearly with energy, such as energy losses due to particles escaping the detector, caused by the limited calorimeter size. The fluctuations of the EM-to-hadronic shower fraction increase logarithmically with energy and can thus contribute to both terms. This contribution can be removed either by intrinsic compensation, or by an event-by-event measurement of the EM fraction, which is called software compensation.
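As an illustration of how the two terms of Eq. (1) are disentangled in practice, the following sketch fits the parametrisation to a set of resolution points with SciPy; the numerical values are placeholders for illustration only, not results of this paper.

```python
import numpy as np
from scipy.optimize import curve_fit

def resolution(E, alpha, c):
    # Eq. (1): quadrature sum of a stochastic and a constant term
    return np.sqrt((alpha / np.sqrt(E)) ** 2 + c ** 2)

# placeholder resolution measurements (sigma_E / <E>) at a few pion energies
E = np.array([5.0, 15.0, 25.0, 45.0, 65.0, 85.0, 105.0])
res = np.array([0.063, 0.037, 0.029, 0.023, 0.020, 0.018, 0.017])

(alpha, c), _ = curve_fit(resolution, E, res, p0=[0.1, 0.01])
print(f"stochastic term = {alpha:.3f}, constant term = {c:.3f}")
```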

Due to their cost and mechanical stability benefits, absorbers made of steel or lead are widely used. These materials have been found to require very small sampling fractions in, e.g., scintillator-steel calorimeters in order to achieve compensating behavior. Since such low sampling fractions would degrade the performance, especially for particles at low energies (\(<50\) GeV), a solution is to correct for fluctuations in the electromagnetic shower fraction with software compensation techniques.

The granularity of the calorimeter plays a key role in allowing algorithms to distinguish between the dense electromagnetic shower core and other shower parts, e.g., disappearing tracks. The first attempt at so-called imaging calorimetry was made by the CALICE collaboration, which started an R&D program of calorimeters for a future e\(^{-}\)e\(^{+}\) linear collider [14, 15], where the calorimeter designs have been optimised for particle flow algorithms [5]. These algorithms allow for jet energy measurements using the best-suited sub-detector to reconstruct each jet particle. The prototypes of these calorimeters have been realised with active layers made of silicon for the EM shower part and scintillator or resistive plate chambers for the measurement of hadronic showers. The active layers were tested interleaved within both steel and tungsten absorber stacks [16, 17] and achieved such good results in test beams [18] that the CMS Collaboration decided to adopt this concept in a full silicon-tungsten/scintillator-steel endcap calorimeter [12, 19]. Developments in, e.g., silicon photomultiplier (SiPM) technologies have been key to measuring the scintillation light produced in calorimeter cells as small as \(3\times 3\times 0.5\) cm\(^{3}\) [20]. The impact of software compensation techniques on the performance of particle flow algorithms has been studied for a specific detector design [9], and shown to provide a significant improvement to the jet energy measurement by using a corrected calorimeter cluster which is matched to tracks in the tracking system.

The next step towards a calorimeter design optimized for the use of software compensation techniques is to study the granularity needed for an algorithm to determine the hadronic shower energy most accurately.

In this paper, we discuss the performance of a software compensation technique using a deep neural network (DNN), with a specific focus on the dependence on the transverse and longitudinal granularity. For the purpose of this study, we consider a homogeneous, idealized calorimeter simulated with Geant4 [21]. The performance is evaluated in terms of energy resolution and linearity for single charged pions. The resolution of particle-flow algorithms is also limited by the accuracy of the association between charged particle tracks in the tracker and energy depositions in the calorimeter. In this context it has been shown that DNNs can provide a new avenue for particle flow in general [22, 23]; this, however, is beyond the scope of this paper. Here, we show how a DNN can be utilised to approximate a generic close-to-optimal reconstruction algorithm that can be optimised to the granularity in an automated fashion. This can help pave the way towards a more ambitious global optimisation of detector design parameters as suggested, e.g., in Ref. [24].

Table 1 Granularity configurations considered in this analysis

2 Calorimeter and dataset

The studied calorimeter is a homogeneous lead tungstate calorimeter, which follows the EM calorimeter concept of the CMS experiment [25]. In order to focus this study on the capability of the DNN to correct e/h fluctuations, we do not consider any passive absorber material in the following. However, the impact of sampling fluctuations has been tested, and the results are summarized in Appendix A for the example of a full PbWO\(_4\) calorimeter with a passive layer fraction of 95%. Qualitatively, the sampling calorimeter shows similar behavior to the homogeneous calorimeter studied here, but at present the effect of the sampling term in the resolution is not sufficiently well measured to form any precise conclusions.

The calorimeter dimensions are \(1\times 1\times 2.5\) m\(^3\), which ensures complete shower containment within the calorimeter volume and corresponds to \(10.4\,\lambda \) and \(280\,X_{0}\) of total depth. The transverse segmentation is varied from no segmentation up to \(30\times 30\) segments in x and y (designated stages A–F), and the longitudinal segmentation from 1 to 60 layers (designated stages 0–7). A list of the configurations can be found in Table 1.
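The per-layer thickness implied by a given longitudinal segmentation follows directly from the quoted total depth; a short check (rounded values) reads:

```python
# per-layer thickness implied by a total depth of 10.4 lambda / 280 X0
total_lambda, total_x0 = 10.4, 280.0
for n_layers in (1, 15, 20, 60):
    print(f"{n_layers:2d} layers: {total_lambda / n_layers:5.2f} lambda, "
          f"{total_x0 / n_layers:6.1f} X0 per layer")
# 60 layers correspond to ~0.17 lambda and ~4.7 X0 per layer,
# the finest longitudinal segmentation considered here
```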

The data set consists of approximately \(5\times 10^6\) charged pion events, generated using the FTFP_BERT physics list of Geant4 [21] 10.04 patch 0. The training data set comprises pions with energies sampled from a flat distribution between 1 and 110 GeV. The test data set covers 11 discrete energies from 5 to 105 GeV in 10 GeV steps. In both cases, the pions are shot at the calorimeter center at normal incidence. The training data set covers a slightly larger energy range to suppress bias effects caused by a difference between the mean and the expectation value of the reconstructed energy at the edges during training. The Geant4 simulation has been performed at the highest granularity, and the same dataset has been used for the training and tests of the different segmentation configurations. For this purpose, the energy deposits (the sum of total energy losses over the time of the event) in the cells have been merged according to the tested cell sizes. This method avoids inconsistencies that would otherwise be expected due to the different numbers of surfaces and material borders through which Geant4 propagates the particles.
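The merging of fine cells into coarser ones amounts to a simple re-binning of the deposited energy. A minimal sketch, assuming the shower is stored as a 3D NumPy array at the finest granularity and that the fine grid divides evenly into the coarser one, could look as follows (function and variable names are illustrative):

```python
import numpy as np

def merge_cells(e_fine, n_x, n_y, n_z):
    """Re-bin energy deposits from the finest granularity into a coarser
    segmentation by summing the contained cells."""
    fx, fy, fz = e_fine.shape
    return (e_fine
            .reshape(n_x, fx // n_x, n_y, fy // n_y, n_z, fz // n_z)
            .sum(axis=(1, 3, 5)))

# example: merge a 30 x 30 x 60 shower into 3 x 3 x 15 cells
e_fine = np.random.rand(30, 30, 60)   # stand-in for one simulated shower
e_coarse = merge_cells(e_fine, 3, 3, 15)
```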

3 Neural network architecture and training

At the core of the neural network architecture used here is a software compensation block that uses convolutional neural network (CNN) layers [26] to achieve local identification of the subshowers. Due to the regular grid-like structure of the calorimeter, graph neural networks, such as those discussed, e.g., in Ref. [27], are not necessary. The architecture chosen here is similar to the one introduced in Ref. [13], which is used as a subblock in the overall model. This subblock consists of 3 parallel paths: in the first path, the energy of all cells within the kernel range K is summed up and forwarded to the next block, while this kernel is moved with a stride of size K; the second path consists of a CNN layer with the same kernel size and \(F=16\) filters; and the third path contains in total three subsequent CNN layers, of which the first two have kernel sizes (in x, y, and depth) of \(K_a = (1,\ k,\ 3)\) and \(K_b=(k,\ 1,\ 3)\), with no stride and 32 filters each. Here, k is an adjustable parameter depending on the granularity, as described later. The final layer of this path is a CNN layer with a kernel size of K, a stride of K, and F filters, such that the output of all paths can be combined. This combination is done by adding the outputs of all paths feature by feature. All layers use a tanh activation function. The weights of the layers in the third path are initialised with a Gaussian distribution centred at 0 with a width of \(10^{-3}\), and receive a small L2 regularisation of \(10^{-5}\). This structure is optimised to derive small corrections to the simple energy sum by detecting the different shower shape of electromagnetic subshowers.
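To make this structure concrete, a minimal Keras sketch of such a three-path subblock is given below. The treatment of the per-cell energy feature, the depth component of the kernel K, and the broadcasting of the summed-energy path across the F filters are assumptions of this sketch, not the exact implementation used for the results in this paper.

```python
import tensorflow as tf
from tensorflow.keras import layers

def compensation_subblock(x, K=(2, 2, 2), F=16, k=3):
    """Sketch of a three-path software-compensation subblock.

    x: tensor of shape (batch, nx, ny, nz, features); feature 0 is assumed
    to be the per-cell energy.
    """
    # Path 1: sum the cell energies within the kernel range, moved with stride K
    energy = layers.Lambda(lambda t: t[..., :1])(x)
    e_sum = layers.AveragePooling3D(pool_size=K, strides=K, padding='same')(energy)
    e_sum = layers.Lambda(lambda t: t * float(K[0] * K[1] * K[2]))(e_sum)

    # Path 2: a single CNN layer with the same kernel size and F filters
    p2 = layers.Conv3D(F, kernel_size=K, strides=K, padding='same',
                       activation='tanh')(x)

    # Path 3: two CNN layers with kernels (1, k, 3) and (k, 1, 3), no stride,
    # followed by a CNN layer with kernel K and stride K to match the other paths
    init = tf.keras.initializers.RandomNormal(mean=0.0, stddev=1e-3)
    reg = tf.keras.regularizers.l2(1e-5)
    p3 = layers.Conv3D(32, kernel_size=(1, k, 3), padding='same', activation='tanh',
                       kernel_initializer=init, kernel_regularizer=reg)(x)
    p3 = layers.Conv3D(32, kernel_size=(k, 1, 3), padding='same', activation='tanh',
                       kernel_initializer=init, kernel_regularizer=reg)(p3)
    p3 = layers.Conv3D(F, kernel_size=K, strides=K, padding='same', activation='tanh',
                       kernel_initializer=init, kernel_regularizer=reg)(p3)

    # combine the paths feature by feature; the summed energy (one feature)
    # is repeated across the F filters so that the shapes match
    e_rep = layers.Lambda(lambda t: tf.tile(t, [1, 1, 1, 1, F]))(e_sum)
    return layers.Add()([e_rep, p2, p3])
```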

In the final model, the input is passed through a batch normalisation layer [28], normalising all inputs except for the per-cell energy. If fewer than 6 calorimeter layers are present or the transverse granularity in either direction is less than 6, the input is directly flattened and passed to 3 dense layers: the first two contain 128 and 64 nodes and use ELU activation [29], and the final energy prediction layer contains 1 node. In all other cases, the input is first passed through a set of the subblocks described above before being fed through the same dense-layer structure. These subblocks adapt to the input: if the corresponding granularity is less than \(6\times 6\) cells in the transverse directions, a stride of \(1\times 1\) is used, and the input k for the kernel size determination is set to \(k=1\). Otherwise, a stride of \(2\times 2\) and \(k=3\) are used in these directions. The subblock is repeated until the dimensionality in x, y, or depth is less than or equal to 6. At this point, the output is fed to the three final dense layers.
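The assembly logic can then be sketched as follows; the depth-wise stride of the subblock and the simplified batch normalisation are assumptions of this sketch, and the helper name build_model is illustrative.

```python
def build_model(nx, ny, nz, n_features=1):
    """Sketch of the model assembly described above."""
    inp = layers.Input(shape=(nx, ny, nz, n_features))
    # the paper normalises all inputs except the per-cell energy;
    # a plain batch normalisation is used here for simplicity
    x = layers.BatchNormalization()(inp)

    if nz >= 6 and min(nx, ny) >= 6:
        dx, dy, dz = nx, ny, nz
        # repeat the subblock until the size in x, y, or depth is <= 6
        while dx > 6 and dy > 6 and dz > 6:
            # stride and k adapt to the current transverse granularity
            s, k = (2, 3) if min(dx, dy) >= 6 else (1, 1)
            x = compensation_subblock(x, K=(s, s, 2), F=16, k=k)
            # ceiling division, matching padding='same'
            dx, dy, dz = -(-dx // s), -(-dy // s), -(-dz // 2)

    x = layers.Flatten()(x)
    x = layers.Dense(128, activation='elu')(x)
    x = layers.Dense(64, activation='elu')(x)
    out = layers.Dense(1)(x)  # predicted particle energy
    return tf.keras.Model(inp, out)
```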

The model is trained with the Adam optimiser [30] using TensorFlow [31] and Keras [32] within the DeepJetCore framework [33]. The training consists of five steps: the first four steps use a loss function \(L_\mathrm {calo}\) that follows the expected calorimeter resolution:

$$\begin{aligned} L_\mathrm {calo} = \frac{(E_\mathrm {true} - E_\mathrm {pred})^2}{E_\mathrm {true}} \mathrm {,} \end{aligned}$$
(2)

where \(E_{\mathrm {pred}}\) is the energy of the particle predicted by the DNN and \(E_{\mathrm {true}}\) is its true energy. These steps are trained for 1, 19, 60, and 20 epochs with learning rates of \(10^{-4}\), \(10^{-4}\), \(10^{-5}\), and \(10^{-5}\), and batch sizes of 256, 512, 1280, and 1280, respectively. Between the third and fourth step, the batch normalisation is frozen.
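A minimal sketch of the loss of Eq. (2) and of this staged schedule is shown below, assuming a compiled Keras model and training arrays x_train, y_train (placeholder names):

```python
import tensorflow as tf

def calo_loss(e_true, e_pred):
    """Eq. (2): squared error scaled by the true energy."""
    return tf.reduce_mean(tf.square(e_true - e_pred) / e_true)

# (epochs, learning rate, batch size) for the four initial training steps
stages = [(1, 1e-4, 256), (19, 1e-4, 512), (60, 1e-5, 1280), (20, 1e-5, 1280)]
for step, (epochs, lr, batch_size) in enumerate(stages):
    if step == 3:
        # freeze the batch normalisation between the third and fourth step
        for layer in model.layers:
            if isinstance(layer, tf.keras.layers.BatchNormalization):
                layer.trainable = False
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=lr),
                  loss=calo_loss)
    model.fit(x_train, y_train, epochs=epochs, batch_size=batch_size)
```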

The mean and the expectation value of \(E_\mathrm {true}\) differ at the edges of the training sample. This typically leads to edge effects, which introduce a bias towards higher predicted values at the low edge, and towards lower predicted values at the high edge. To mitigate this effect, we freeze all layers except for the last dense layers, and introduce a \(\chi ^2\)-like loss that takes the difference between the average predicted and true energy in bins of \(E_\mathrm {true}\), and accounts for the number of samples in each bin. The bin boundaries are chosen randomly for each batch to avoid a global bias. Using this loss, the model is trained for another 50 epochs with a learning rate of \(10^{-5}\) and a batch size of 1280.
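This bias-correction step can be sketched with a loss of the following form; the exact binning scheme and normalisation below are our assumptions rather than the implementation used for the results:

```python
import tensorflow as tf

def binned_bias_loss(e_true, e_pred, n_bins=10, e_min=1.0, e_max=110.0):
    """Chi2-like comparison of the mean predicted and true energy in
    randomly placed bins of E_true, weighted by the bin occupancy."""
    # random bin boundaries, redrawn for every batch
    edges = tf.sort(tf.random.uniform([n_bins - 1], e_min, e_max))
    edges = tf.concat([[e_min], edges, [e_max]], axis=0)
    loss = 0.0
    for i in range(n_bins):
        in_bin = tf.cast((e_true >= edges[i]) & (e_true < edges[i + 1]),
                         tf.float32)
        n = tf.reduce_sum(in_bin)
        mean_true = tf.reduce_sum(in_bin * e_true) / (n + 1e-6)
        mean_pred = tf.reduce_sum(in_bin * e_pred) / (n + 1e-6)
        loss += n * tf.square(mean_pred - mean_true) / (mean_true + 1e-6)
    return loss
```

Redrawing the bin boundaries for every batch prevents the network from adapting to a fixed binning and thus from reintroducing a structured bias.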

4 Results

The energy resolution is evaluated as the ratio of the width to the most probable value of the distribution of the reconstructed energy. These distributions, as shown for example in Fig. 1a, follow a Gaussian function, and the standard deviation can thus be extracted from a fit. The fit is restricted to a range of 2\(\sigma \) around the most probable value \(\mu \), following the procedure widely used in calorimeter performance studies. As a comparison and validation, the energy resolution has also been evaluated from the root mean square (RMS) and mean, which is more sensitive to the tails of the distribution. The energy resolution over the full available energy range is shown in Fig. 1b for stage 4, which corresponds to a granularity of 15 longitudinal layers. The points are fitted following Eq. (1), and the values of the stochastic and constant terms are shown in the legend. The constant term is set to 0 if the fitted value deviates from 0 by less than its uncertainty. An overall 10–20% degradation in energy resolution from the Gaussian fit to the RMS method is observed. In the following, the energy resolutions obtained for different granularities refer to the results obtained from the Gaussian fit.
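The width extraction from a fit restricted to 2\(\sigma \) around the peak can be sketched as a simple iterative procedure; the binning and starting values below are illustrative:

```python
import numpy as np
from scipy.optimize import curve_fit

def gauss(x, A, mu, sigma):
    return A * np.exp(-0.5 * ((x - mu) / sigma) ** 2)

def fit_resolution(e_reco, n_iter=3):
    """Fit a Gaussian to the reconstructed-energy distribution, restricting
    the fit range to within 2 sigma around the peak and iterating."""
    counts, edges = np.histogram(e_reco, bins=100)
    centers = 0.5 * (edges[:-1] + edges[1:])
    mu, sigma = np.mean(e_reco), np.std(e_reco)
    for _ in range(n_iter):
        sel = np.abs(centers - mu) < 2 * sigma
        (A, mu, sigma), _ = curve_fit(gauss, centers[sel], counts[sel],
                                      p0=[counts.max(), mu, sigma])
        sigma = abs(sigma)
    return mu, sigma  # resolution = sigma / mu
```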

Fig. 1 Results for a scenario with 15 longitudinal layers (stage 4) and no transverse segmentation (stage A). a Energy distribution for 45 GeV pions. The width as computed from the Gaussian fit (black line) and from the RMS are shown. b Energy resolution as a function of the particle energy. The resolution is computed two ways, using the Gaussian fit (open circles) and using the RMS (filled squares)

Table 2 Summary of the energy resolution fit results. The top set shows the different longitudinal segmentation scenarios with no transverse segmentation, while the other two sets show two specific longitudinal stages with different transverse segmentation scenarios, as described in Table 1. In the last row, the result of a simple energy sum is given as a reference

The results, in terms of the stochastic term \(\alpha \) and constant term c for all studied longitudinal and transverse granularities, are summarized in Table 2. The theory of the different contributions to the energy resolution of hadronic showers [34] considers the stochastic term to be in fact a quadratic sum of two major effects, \(\alpha = \alpha _{\mathrm {int}}\oplus \alpha _{\mathrm {sampl}}\), where the first, intrinsic term is irreducible and determined by the fluctuations of the fraction of the initial energy that is transformed into ionising shower particles, and the second term is due to the sampling fraction. These intrinsic fluctuations are material dependent, due to material-dependent nuclear binding energy losses, and have been found to be of the order of 19% and 11% in the ZEUS uranium- and lead-scintillator calorimeter prototypes, respectively [35].

We assume that the DNN is able to identify and re-weight the electromagnetic and hadronic shower fractions, due to the topological differences of EM and hadronic subshowers (\(\lambda _{\pi }/X_{0}\sim 27\)). Thus, we expect the stochastic and constant terms to improve with respect to an energy measurement based on a simple sum over calorimeter cells, and to decrease with increased granularity. Table 2 shows the resulting measured stochastic and constant terms (using both the Gaussian fit and the RMS to obtain the resolution) for three different sets of scenarios: first, the different longitudinal granularities with no transverse segmentation, the results for which are plotted in Fig. 2; second, longitudinal stage 0 with different transverse granularities (Fig. 3); and third, longitudinal stage 5 with different transverse granularities (Fig. 4). For reference, we also compare the results obtained with the DNN to a simple energy sum over all energy deposits in the calorimeter cells. The latter does not include any further energy calibration, which is visible as a significant deviation from unity in the linearity (compare Fig. 2b). This could, however, easily be recovered with standard methods of energy calibration. The observed increase in the response with energy corresponds to the increasing EM fraction within the hadronic shower. Overall, at the finest granularities, we observe that the constant term goes to zero, while the stochastic term decreases by approximately 50% with respect to the scenario with no segmentation, reaching a minimum of 8%, which can be considered as an upper limit on the intrinsic stochastic term \(\alpha _{\mathrm {int}}\). The difference between the parameters obtained from the RMS and from the Gaussian fit is indicative of a contribution from moderately pronounced tails, which are also visible in Fig. 1.

The constant term is consistently removed as soon as the first transverse segmentation into \(3\times 3\) cells is implemented. Figure 5 shows event displays of 35 GeV pion showers; the bottom panel illustrates the impact of a \(3\times 3\) transverse segmentation. Already at this stage, a significant energy fraction of about 12% (shown as \(\langle E_{\mathrm {out}}/E_{\mathrm {tot}}\rangle \) in the legend) is found in the outer cells. In comparison, a single shower is shown in 3D in the top panel, which visualises the imaging power of the finest chosen granularity of the homogeneous PbWO\(_4\) calorimeter.

Fig. 2 Energy resolution (a) and linearity (b) for different longitudinal granularities and no transverse segmentation, compared to a simple energy sum. The curves correspond to the fit with Eq. (1)

Fig. 3 Energy resolution (a) and linearity (b) for different transverse granularities with 1 longitudinal layer (stage 0). The curves show the fit to the form given in Eq. (1)

Fig. 4 Energy resolution (a) and linearity (b) for different transverse granularities with 20 longitudinal layers (stage 5). The curves show the fit to the form given in Eq. (1)

Fig. 5 a A 3D view of a 35 GeV pion shower in the homogeneous PbWO\(_4\) calorimeter at the finest granularity. The color code as well as the box sizes correspond to the amount of energy recorded in the calorimeter cells. b A front view of the average pion shower at 35 GeV over 2646 events, with a grid overlaid corresponding to the coarsest applied transverse segmentation and an original binning corresponding to the finest segmentation

Figure 6 summarizes the energy resolution as a function of longitudinal and transverse granularity. We observe that the behavior of the resolution as a function of granularity exhibits the same pattern regardless of the incident particle energy. For the transverse granularity, the resolution reaches an optimal value at a cell size of \(\approx 1\lambda _{\pi }\), and finer segmentation does not yield any appreciable further benefit. In the longitudinal direction, the energy resolution continues to improve as the layer size is decreased, reaching the minimum at the finest granularity considered (\(\approx 0.2\lambda _{\pi }\) or \(\approx 5 X_0\)).

Figure 7 summarizes the fitted parameters \(\alpha \) and c of the energy resolution function in Eq. (1) as a function of the longitudinal and transverse granularity. In the transverse direction, we observe that the constant term goes to zero at a cell size of 1.4 \(\lambda _{\pi }\) (\(37\,X_0\)), and a further decrease in the cell size does not further improve the stochastic term \(\alpha \). In the longitudinal direction, a layer width smaller than 10 \(X_0\) results in only a minor improvement of about 10%, which suggests that layer widths of about 10 \(X_0\) could offer a good balance between the obtained resolution and the detector complexity.

Fig. 6 Energy resolution as a function of the longitudinal (a) and transverse (b) granularity. Two different particle energies are considered: 15 GeV (black circles) and 85 GeV (red squares). In panel (a), no transverse segmentation is used, while in panel (b), two different longitudinal segmentations are shown: 1 layer (dashed lines) and 20 layers (solid lines)

Fig. 7 Values of the parameters \(\alpha \) (black line) and c (blue line) in the energy resolution function \(\frac{\sigma _{E}}{\langle E\rangle }=\frac{\alpha }{\sqrt{E}} \oplus c\) as a function of the a longitudinal and b transverse granularity. In panel (a), no transverse segmentation is used, while in panel (b), two different longitudinal segmentations are shown: 1 layer (dashed lines) and 20 layers (solid lines)

5 Conclusions

When calorimeters are designed for new high-energy physics experiments, the approach has often been to pick a technology before optimising the reconstruction of jet particles. From the perspective of testing various options, this not only requires significant computing power due to the details of signal processing (digitisation) introduced in the simulations, but also means that the simulations are unable to answer basic questions due to the high complexity. For example, a smaller cell size improves the spatial and pointing resolution, which should help the particle-flow algorithm to reconstruct the jet. However, the signal height per cell decreases, which can introduce an energy loss due to a lower signal-to-noise ratio. Thus, a high-level optimisation becomes blind to the individual impact of each effect. Instead, a different approach could be to first identify the inputs that allow reconstruction algorithms to reach optimal performance, before selecting the detector technology.

Moving towards that approach, we have used a model calorimeter to show how DNNs can be used to study the effect of the cell granularity on the hadronic energy reconstruction, without the need for manual optimisation of the algorithm for each granularity choice. In this model, the impact of the sampling fraction has been intentionally excluded. Even though we are aware that the choice of active and passive materials will impact the shower development, we believe that this study can pave the way towards a more global optimisation of calorimeter designs exploiting the versatility of DNN-based reconstruction algorithms.

For this particular detector setup (with \(\lambda _{\pi }/X_{0}\sim 27\)), we conclude that cell sizes of at most one nuclear interaction length and longitudinal layers of 5–10 \(X_{0}\) thickness are needed in order to optimize the software compensation, obtain an e/h response close to 1, and approach the intrinsic stochastic term of 8%. Following this approach, one could imagine further studies to determine the optimal cell and layer sizes as a function of the \(\lambda _{\pi }/X_{0}\) ratio; however, this exceeds the scope of this paper.