1 Introduction

Neutrinoless double beta decay (\(0\nu \beta \beta\)) is a hypothesized lepton-number-violating process that would probe the Majorana nature of neutrinos. Sensitive searches for this decay have started exploring the inverted hierarchy region of neutrino masses [1,2,3,4,5]. To do so, they have combined large masses of candidate emitter nuclei, low radioactive background and good energy resolution. Cryogenic calorimeters have achieved some of the best energy resolutions in \(0\nu \beta \beta\) searches to date [1, 6,7,8].

The CUORE experiment has reached the ton scale of detector mass, an energy resolution of \(\sim 7.8\,\)keV and a background index in the region of interest for \(0\nu \beta \beta\) of \(1.4\times 10^{-2}\) counts/(keV kg year), dominated by partially contained \(\alpha\) decays [1]. CUORE Upgrade with Particle Identification (CUPID) aims to fully explore the inverted hierarchy region of neutrino masses, corresponding to half-life sensitivities \(> 10^{27}\,\)year [9]. It will reuse the cryogenic infrastructure that hosts CUORE to deploy a ton-scale array of Li\(_2\)MoO\(_4\) (LMO) scintillating crystal bolometers equipped with Ge bolometric light detectors (LD) for simultaneous readout of the phonon and scintillation signals [10]. The dual-readout technique, whose effectiveness was demonstrated by the CUPID-0 [6, 11, 12] and CUPID-Mo [7] experiments, is expected to bring the background down to \(10^{-4}\) counts/(keV kg year). The nearly simultaneous occurrence of two separate \(2\nu \beta \beta\) decays in the same crystal within the time resolution of the detector (pile-up) is an irreducible source of background that must be mitigated in the design phase. Given the short \(2\nu \beta \beta\) half-life of \(^{100}\)Mo, \(T_{1/2}^{2\nu} = 7.1 \times 10^{18}\) year, and assuming a crystal mass of \(\sim 300\) g, \(100\%\) enrichment in \(^{100}\)Mo and a 1 ms resolving time, the expected background contribution from pile-up events is \(3.5\times 10^{-4}\) counts/(keV kg year) [9].
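The scaling behind the pile-up estimate can be made explicit with a short calculation. The sketch below is a first-order estimate under our own assumptions (one \(^{100}\)Mo nucleus per formula unit, a molar mass of about 177.8 g/mol for Li\(_2{}^{100}\)MoO\(_4\) with natural Li and O, and a random-coincidence rate \(R_{\mathrm{pu}} = A^2\tau\) that counts each pair of decays once). It computes the \(2\nu\beta\beta\) activity \(A\) of one crystal and the total pile-up rate per unit mass. Converting this rate into counts/(keV kg year) near \(Q_{\beta\beta}\) additionally requires the summed \(2\nu\beta\beta\) spectrum, which is not modeled here, so the code does not attempt to reproduce the value quoted from [9].

```python
import math

AVOGADRO = 6.02214076e23            # 1/mol
SECONDS_PER_YEAR = 365.25 * 86400   # Julian year


def pileup_estimate(crystal_mass_g, half_life_yr, resolving_time_s,
                    molar_mass_g_per_mol=177.8):
    """First-order random-coincidence estimate for one fully enriched crystal.

    Assumptions (ours, not from the text): one 100Mo nucleus per Li2MoO4 unit,
    molar mass ~177.8 g/mol (natural Li and O), and a pile-up rate A**2 * tau,
    i.e. each pair of decays closer than tau is counted once.
    Returns (activity in Hz, total pile-up counts per kg per year).
    """
    n_nuclei = crystal_mass_g / molar_mass_g_per_mol * AVOGADRO
    activity_hz = math.log(2) * n_nuclei / (half_life_yr * SECONDS_PER_YEAR)
    pileup_hz = activity_hz ** 2 * resolving_time_s
    mass_kg = crystal_mass_g / 1000.0
    return activity_hz, pileup_hz * SECONDS_PER_YEAR / mass_kg


# Parameters quoted in the text: ~300 g, T1/2 = 7.1e18 yr, 1 ms resolving time.
activity, rate = pileup_estimate(300.0, 7.1e18, 1e-3)
print(f"2nu activity per crystal: {activity * 1e3:.2f} mHz")
print(f"total pile-up rate: {rate:.2f} counts/(kg year)")
```

With these inputs the activity is about 3 mHz per crystal and the total pile-up rate is of order one count per kg per year, spread over energies up to twice \(Q_{\beta\beta}\). Since \(A\) grows linearly with crystal mass \(M\), the rate per unit mass scales as \(A^2\tau/M \propto M\tau\), which is why both the crystal size and the resolving time are design parameters.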

CUPID will use neutron-transmutation-doped (NTD) thermistors to read out the phonon signal of both LMOs and LDs. The rise time of thermal pulses depends on the NTD working resistance, itself a function of temperature, and on the capacitance of the readout line. A campaign of measurements was carried out at the Laboratori Nazionali del Gran Sasso (LNGS) to study the time resolution for pile-up discrimination of LMO crystals. An analysis based on optimal filtering (OF) [13] of the reconstructed pulses showed a \(>90\%\) pile-up discrimination efficiency down to \(\Delta t \sim 1\) ms [14]. This work explores the effectiveness of an alternative approach: we analyze the same data with a deep learning classification algorithm.

2 Measurement

An array of 8 cubic LMO detectors (\(45 \times 45 \times 45\) mm\(^3\)), arranged in a tower of 2 floors with 4 crystals each, was operated at a temperature of \(\sim 18\) mK in the Hall C facility at LNGS between summer 2019 and spring 2020. Each crystal faced two Ge disk light detectors; the crystals on the bottom floor were wrapped in a reflecting foil. The shape and size of the LMO crystals resemble the envisioned CUPID design. Further details on the experimental setup can be found in [14]. Three LMO crystals were equipped with a functioning Si heater, driven by an arbitrary function generator (Tektronix AFG1062). All detectors were equipped with NTD thermistors biased with a constant current. The working resistance of the NTDs was in the \((10{-}50)\) M\({\Omega }\) range, and the typical rise time of pulses was \(\sim 15\) ms. The voltage across each NTD was low-pass filtered with a Bessel-Thomson filter (63 Hz cutoff frequency), continuously digitized at a 2 kHz sampling frequency and stored on disk. A derivative trigger algorithm was used to identify pulses. Triggered pulses were enclosed in 5 s windows, with the 1 s preceding the trigger (pre-trigger) used as a proxy for the detector temperature before the pulse. Random triggers were fired throughout the data taking to collect noise samples.
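The triggering and windowing steps can be illustrated with a simple derivative trigger. The sketch below is hypothetical: the text does not specify the details of the trigger algorithm, so the differencing step, threshold and dead time are illustrative choices. It fires when the difference between samples `step` apart exceeds a threshold, then cuts a 5 s window (10 000 samples at 2 kHz) that starts 1 s (2000 samples) before the trigger.

```python
import math
import random

FS_HZ = 2000          # sampling frequency from the text
WINDOW_S = 5.0        # window length from the text
PRETRIGGER_S = 1.0    # pre-trigger length from the text


def derivative_trigger(samples, threshold, step=5, dead_time=None):
    """Hypothetical derivative trigger: fire where x[i] - x[i - step] > threshold."""
    if dead_time is None:
        dead_time = int(WINDOW_S * FS_HZ)  # allow at most one trigger per window
    triggers = []
    i = step
    while i < len(samples):
        if samples[i] - samples[i - step] > threshold:
            triggers.append(i)
            i += dead_time  # skip the rest of the pulse
        else:
            i += 1
    return triggers


def extract_window(samples, trigger_index):
    """Return the 5 s window starting 1 s before the trigger, or None at the edges."""
    n_pre = int(PRETRIGGER_S * FS_HZ)
    n_total = int(WINDOW_S * FS_HZ)
    start = trigger_index - n_pre
    if start < 0 or start + n_total > len(samples):
        return None
    return samples[start:start + n_total]


# Synthetic stream: white noise plus one pulse starting at sample 30000
# (rise and decay constants, in samples, are arbitrary illustrative values).
random.seed(1)
stream = [random.gauss(0.0, 0.01) for _ in range(60000)]
onset = 30000
for k in range(len(stream) - onset):
    stream[onset + k] += math.exp(-k / 400) - math.exp(-k / 10)

triggers = derivative_trigger(stream, threshold=0.2)
windows = [extract_window(stream, t) for t in triggers]
```

A production trigger would typically smooth the stream before differencing and set the threshold from the measured noise level; both refinements are omitted here.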

We generated reference heater pulses, paying special attention to accurately reproducing the rise time of pulses originating from natural radioactivity (physics pulses). We excited the Si heaters with waveforms \(w(t;A,\tau )\) of square, triangular (sharp rise, linear fall) and exponential shape, with voltage amplitude A and characteristic time \(\tau\). We found that, despite the different origin of the signals, triangular heater excitations best reproduced the rising edge of physics pulses [14]. We tuned A and \(\tau\) for each waveform so that the resulting pulses had reconstructed energies in the \((1{-}2.5)\) MeV range and rise times as close as possible to the mean of the rise-time distribution of physics pulses. The sample of reference (single) heater pulses mimics the detector response to single-decay signal events. Consecutive pulses were generated 15 s apart to let the detector return to its initial conditions. To generate heater pile-up pulses at a time distance \(\Delta t\), we superimposed excitations of the Si heater as

$$\begin{aligned} w^{\prime }(t;A_1,A_2,\tau ) = w(t;A_1,\tau ) + w(t+\Delta t;A_2,\tau ) \end{aligned}$$
(1)

Let \(\alpha = A_2/A_1\) be the amplitude ratio between the excitations. To explore amplitude ratios in the range \(\alpha = 0.24{-}1.4\), we fixed \(A_1 = 170\) mV and varied \(A_2 = (40{-}240)\) mV in steps of 50 mV. We explored \(\Delta t = (1{-}40)\) ms. For each \((\alpha ,\Delta t)\) pair, we collected a sample of \(\sim 100\) pulses.
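Eq. (1) and the amplitude scan can be sketched as follows. The triangular waveform encodes our assumption of what "sharp rise, linear fall" means: the voltage jumps to A at \(t = 0\) and falls linearly to zero at \(t = \tau\). The superposition transcribes Eq. (1) literally. The last lines list the \(A_2\) grid and the resulting ratios: \(A_2/A_1\) spans 0.24 to 1.41, matching the quoted range of \(\alpha\), whereas \(A_1/A_2\) would span 0.71 to 4.25. The \(\tau\) and \(\Delta t\) values in the sampled trace are illustrative only.

```python
def triangular(t, amplitude, tau):
    """Assumed triangular excitation: jump to `amplitude` at t = 0, linear fall to 0 at t = tau."""
    if 0.0 <= t < tau:
        return amplitude * (1.0 - t / tau)
    return 0.0


def pileup_excitation(t, a1, a2, tau, dt):
    """Eq. (1): w'(t; A1, A2, tau) = w(t; A1, tau) + w(t + dt; A2, tau)."""
    return triangular(t, a1, tau) + triangular(t + dt, a2, tau)


# Amplitude scan from the text: A1 fixed at 170 mV, A2 from 40 to 240 mV in 50 mV steps.
A1_MV = 170
a2_grid_mv = list(range(40, 241, 50))            # [40, 90, 140, 190, 240]
ratios_a2_over_a1 = [a2 / A1_MV for a2 in a2_grid_mv]
ratios_a1_over_a2 = [A1_MV / a2 for a2 in a2_grid_mv]

# Illustrative sampling of one doublet (tau = 5 ms, dt = 1 ms; times in ms, amplitudes in mV).
trace = [pileup_excitation(0.1 * n, A1_MV, 140, tau=5.0, dt=1.0) for n in range(-20, 80)]
```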

3 Data Analysis

The data analysis consists of two steps: a low-level processing and a high-level analysis. The former selects a clean sample of heater pulses; the latter performs the actual pile-up discrimination. The low-level processing closely follows [8], while the high-level analysis implements machine learning algorithms.

The low-level processing acts on the 5 s time windows (waveforms) around triggered events. Each waveform is filtered with an OF [13]. The energy of each event is obtained by calibrating the amplitude of the filtered pulse with a \(^{232}\)Th source. We reject waveforms that contain more than one triggered pulse or whose pre-trigger slope is not compatible with 0. To discriminate heater pulses from physics pulses, we first select events according to their timestamps. Then, for each pulser configuration, we further restrict the sample of heater pulses to those that fulfill

$$\begin{aligned} |E - E_{\mathrm{med}}| < 5 \times E_{\mathrm{MAD}} \end{aligned}$$
(2)

where E is the reconstructed energy, \(E_{\mathrm{med}}\) is the median reconstructed energy and \(E_{\mathrm{MAD}}\) is the median absolute deviation of E from \(E_{\mathrm{med}}\). Equivalent results are obtained by replacing the energy in Eq. (2) with the raw pulse amplitude before filtering, which effectively decouples the deep learning algorithm from the other steps of the analysis. Since the rate of physics pulses is much lower than that of heater pulses, this procedure effectively removes outliers from the heater pulse distribution, whether due to natural radioactivity, unidentified pile-up events or pulses produced in unstable detector conditions.
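Eq. (2) amounts to a robust outlier cut. A minimal sketch, assuming the raw (unscaled) median absolute deviation as written in the text, i.e. without the 1.4826 factor that would make it a Gaussian-consistent estimator of the standard deviation. The energies in the usage lines are hypothetical.

```python
import statistics


def mad_selection(values, k=5.0):
    """Keep values with |x - median| < k * MAD, as in Eq. (2) (raw MAD, no scale factor)."""
    values = list(values)
    med = statistics.median(values)
    mad = statistics.median([abs(x - med) for x in values])
    return [x for x in values if abs(x - med) < k * mad]


# Hypothetical reconstructed energies (keV) for one pulser configuration,
# including one outlier such as a coincident radioactive event.
energies = [1000.0, 1001.0, 999.0, 1002.0, 998.0, 1000.5, 2600.0]
selected = mad_selection(energies)
```

Because the median and the MAD are barely affected by a few outliers, the cut stays centered on the heater peak even when the sample contains radioactivity events, and the same function applies unchanged to raw pulse amplitudes.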

The high-level analysis is based on a convolutional neural network (CNN) classifier implemented in Keras [15]. The CNN was designed to read full windows of raw pulses, before any OF is applied. We pre-process the data by subtracting the average sampled voltage in the pre-trigger part of the window (baseline) and linearly scaling each pulse to unit amplitude.
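The pre-processing can be sketched as below. We assume that the 1 s pre-trigger spans the first 2000 samples of each window and that the amplitude used for scaling is the maximum of the baseline-subtracted pulse (positive polarity); the text does not specify the amplitude estimator.

```python
PRETRIGGER_SAMPLES = 2000  # 1 s at 2 kHz


def preprocess(window, n_pretrigger=PRETRIGGER_SAMPLES):
    """Subtract the mean pre-trigger level and scale the pulse to unit amplitude."""
    baseline = sum(window[:n_pretrigger]) / n_pretrigger
    shifted = [v - baseline for v in window]
    peak = max(shifted)
    if peak <= 0.0:
        raise ValueError("no positive pulse above the baseline")
    return [v / peak for v in shifted]


# Toy window: flat 2.0 V baseline with a 3.0 V excursion lasting 10 samples.
toy = [2.0] * 2000 + [5.0] * 10 + [2.0] * 7990
normalized = preprocess(toy)
```

Scaling to unit amplitude removes the energy dependence, so the classifier has to rely on pulse shape alone.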

The CNN includes 10 convolutional blocks, each made of 8 filters 20 samples long with a ReLU activation function, followed by a MaxPooling layer with a subsampling factor of 2. A fully connected feed-forward classifier layer leads to the output layer, which has a Softmax activation function. Training was performed by minimizing a categorical cross-entropy loss function for 200 epochs with the Adam algorithm [16] and a constant learning rate of \(10^{-4}\). For a comprehensive review of machine learning, see [17] and references therein.
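A quick consistency check on the architecture tracks the sequence length and the number of trainable parameters through the 10 blocks. The sketch rests on our assumptions: the full 5 s window (10 000 samples) enters the network as a single channel, each block corresponds to a Keras `Conv1D(filters=8, kernel_size=20, activation="relu")` followed by `MaxPooling1D(pool_size=2)`, and the classifier is a single dense layer with two Softmax outputs. With `padding="valid"` the sequence becomes too short to pool in the ninth block, so the default below uses `padding="same"`; the text does not state which padding was used.

```python
def cnn_shape_and_params(n_samples=10000, n_blocks=10, filters=8, kernel=20,
                         pool=2, padding="same", n_classes=2):
    """Track sequence length and count trainable parameters (weights + biases)."""
    length, channels, params = n_samples, 1, 0
    for block in range(1, n_blocks + 1):
        if padding == "valid":
            length = length - kernel + 1  # 'valid' convolution shortens the sequence
        if length < pool:
            raise ValueError(f"sequence too short to pool in block {block}")
        params += kernel * channels * filters + filters  # Conv1D weights + biases
        channels = filters
        length //= pool  # MaxPooling1D with stride equal to pool_size
    n_features = length * channels  # flattened input to the dense classifier
    params += n_features * n_classes + n_classes
    return length, n_features, params


length, n_features, n_params = cnn_shape_and_params()
```

Under these assumptions the dense layer sees 9 samples times 8 filters, and the whole network has about \(1.2 \times 10^4\) trainable parameters, a compact model for the small dataset described below.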

Due to the small size of the available dataset, we rely on cross-validation techniques [18] to assess the performance of the CNN model in classifying pile-up events. We use a fivefold cross-validation: we randomize the data and split them into 5 non-overlapping subsets. From each subset, we retain a \(20\%\) fraction of events (test set) to assess the model performance after training is completed. We further split the remaining events into a training set \((80\%)\) and a validation set \((20\%)\). We independently train 5 CNN models with identical architecture (layer structure) and hyperparameters (learning rate, number of epochs, etc.). We optimize the hyperparameters while keeping the test sets blinded. The output layer of the CNN returns a classifier score in the (0, 1) range. For each data subset, we compute the predicted score on the test set using the corresponding CNN model and compare the prediction with the label assigned to each test pulse according to its timestamp. We define the classification efficiency as a function of \(\Delta t\) as the ratio of correctly labeled events to the total number of events. We fit the classification efficiency curve (Fig. 1) with an exponential saturation function
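The splitting and scoring can be sketched as follows. This is our reading of the scheme as standard k-fold cross-validation: in each of the 5 folds one shuffled subset (20% of the events) is the test set and the other four are split 80%/20% into training and validation sets, so every event is tested exactly once. The 0.5 score threshold used to turn the classifier output into a label is our assumption; the text does not quote one.

```python
import random


def fivefold_splits(n_events, n_folds=5, val_fraction=0.2, seed=0):
    """Yield (train, val, test) index lists; every event is tested exactly once."""
    rng = random.Random(seed)
    indices = list(range(n_events))
    rng.shuffle(indices)
    folds = [indices[k::n_folds] for k in range(n_folds)]
    for k in range(n_folds):
        test = folds[k]
        rest = [i for j, fold in enumerate(folds) if j != k for i in fold]
        rng.shuffle(rest)
        n_val = round(val_fraction * len(rest))
        yield rest[n_val:], rest[:n_val], test


def classification_efficiency(true_labels, scores, threshold=0.5):
    """Fraction of events whose thresholded score matches the true label (1 = pile-up)."""
    correct = sum((s >= threshold) == bool(y) for y, s in zip(true_labels, scores))
    return correct / len(true_labels)


# In each fold, one model is trained on `train`, monitored on `val`,
# and evaluated only on `test`; the test sets together cover the whole dataset.
splits = list(fivefold_splits(500))
```

Grouping the test-set predictions by their \(\Delta t\) bin and applying `classification_efficiency` to each group yields efficiency points of the kind shown in Fig. 1.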

$$\begin{aligned} \varepsilon (\Delta t) = p_0 \big (1-e^{-\Delta t/p_1}\big ) \end{aligned}$$
(3)

and define the pile-up time resolution of each channel as the value of \(\Delta t > 0\) at which the fitted classification efficiency reaches \(90\%\).
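Setting \(\varepsilon(\Delta t) = 0.9\) in Eq. (3) gives the closed form \(\Delta t_{90} = -p_1 \ln(1 - 0.9/p_0)\), which exists only if \(p_0 > 0.9\). The sketch below fits Eq. (3) by least squares, unweighted by default with optional per-point weights. For fixed \(p_1\) the best \(p_0\) follows analytically, so only \(p_1\) is scanned on a grid. It then evaluates \(\Delta t_{90}\). The grid range and the synthetic values \(p_0 = 0.97\), \(p_1 = 0.6\) ms are illustrative, not results from the measurement.

```python
import math


def saturation(dt, p0, p1):
    """Eq. (3): eps(dt) = p0 * (1 - exp(-dt / p1))."""
    return p0 * (1.0 - math.exp(-dt / p1))


def fit_saturation(dts, effs, weights=None, p1_grid=None):
    """Least-squares fit of Eq. (3): p0 solved analytically, p1 scanned on a grid."""
    if weights is None:
        weights = [1.0] * len(dts)
    if p1_grid is None:
        p1_grid = [0.01 + 0.001 * k for k in range(20000)]  # 0.01 to ~20 ms
    best = None
    for p1 in p1_grid:
        f = [1.0 - math.exp(-x / p1) for x in dts]
        p0 = (sum(w * y * fi for w, y, fi in zip(weights, effs, f))
              / sum(w * fi * fi for w, fi in zip(weights, f)))
        chi2 = sum(w * (y - p0 * fi) ** 2 for w, y, fi in zip(weights, effs, f))
        if best is None or chi2 < best[0]:
            best = (chi2, p0, p1)
    return best[1], best[2]


def dt_at_efficiency(p0, p1, target=0.9):
    """Invert Eq. (3); returns inf if the plateau p0 never reaches the target."""
    if p0 <= target:
        return math.inf
    return -p1 * math.log(1.0 - target / p0)


# Synthetic, noise-free efficiency points (times in ms) from p0 = 0.97, p1 = 0.6 ms.
dts = [1.0, 2.0, 3.0, 5.0, 10.0, 20.0, 40.0]
effs = [saturation(x, 0.97, 0.6) for x in dts]
p0_fit, p1_fit = fit_saturation(dts, effs)
dt90 = dt_at_efficiency(p0_fit, p1_fit)
```

The plateau \(p_0\) matters as much as the time constant: a channel whose efficiency saturates below 90% has no finite pile-up time resolution under this definition.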

Fig. 1

Pile-up classification performance on a single channel. The data points show the classification efficiency on pulser data (fivefold cross-validation) as a function of \(\Delta t\), together with a fit with an exponential saturation function. The green band corresponds to the signal (single-pulse) efficiency. The \(\Delta t\) error bars correspond to the bin size. (Color figure online)

4 Results

We present the performance of a new technique based on a deep learning classifier to discriminate pile-up events in cryogenic LMO bolometers. The classifier is trained on single-pulse and pile-up samples generated by exciting a Si heater with a programmable function generator, and tested on an independent sample of events generated with the same technique. We achieve a \(>90\)% reconstruction efficiency for signal events and a \(>90\)% pile-up rejection efficiency for \(\Delta t > 1{-}2\) ms, depending on the channel. So far, the deep-learning and OF-based pile-up rejection approaches yield equivalent results [14].

This work is part of a technical optimization campaign of the CUPID detector design and data processing techniques. A detailed Monte Carlo simulation of the detector response incorporating pile-up rejection is underway to disentangle the interplay of parameters such as the noise level, sampling and Bessel cutoff frequencies, pulse rise time and signal-to-noise ratio. Preliminary results indicate that similar performance can be achieved for physics pulses. In future measurements, we plan to improve the cryogenic stability, the noise conditions and the excitation methods, in order to better reproduce both the rising and falling edges of physics pulses and eventually explore smaller pile-up time differences.