Calibration of the Gerda experiment

The GERmanium Detector Array (Gerda) collaboration searched for neutrinoless double-\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\beta $$\end{document}β decay in \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$^{76}$$\end{document}76Ge with an array of about 40 high-purity isotopically-enriched germanium detectors. The experimental signature of the decay is a monoenergetic signal at \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$Q_{\beta \beta }$$\end{document}Qββ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$=2039.061(7)$$\end{document}=2039.061(7) keV in the measured summed energy spectrum of the two emitted electrons. Both the energy reconstruction and resolution of the germanium detectors are crucial to separate a potential signal from various backgrounds, such as neutrino-accompanied double-\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\beta $$\end{document}β decays allowed by the Standard Model. The energy resolution and stability were determined and monitored as a function of time using data from regular \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$^{228}$$\end{document}228Th calibrations. In this work, we describe the calibration process and associated data analysis of the full Gerda dataset, tailored to preserve the excellent resolution of the individual germanium detectors when combining data over several years.

decay is a monoenergetic signal at Q ββ = 2039.061 (7) keV in the measured summed energy spectrum of the two emitted electrons. Both the energy reconstruction and resolution of the germanium detectors are crucial to separate a potential signal from various backgrounds, such as neutrinoaccompanied double-β decays allowed by the Standard Model. The energy resolution and stability were determined and monitored as a function of time using data from regular 228 Th calibrations. In this work, we describe the calibration process and associated data analysis of the full Gerda dataset, tailored to preserve the excellent resolution of the individual germanium detectors when combining data over several years.

Introduction
Neutrinoless double-β (0νββ) decay is a hypothetical, second-order weak interaction process in which a nucleus changes its charge number by two units with the emission of two electrons but without accompanying anti-neutrinos. This lepton-number violating process is only permitted if neutrinos are massive Majorana fermions, i.e. if there is a Majorana mass term in the Lagrangian of the underlying theory. Such a term appears in many extensions of the Standard Model of particle physics and could explain why neutrino masses are much smaller than those of all other fermions [1]. The GERmanium Detector Array (Gerda) collaboration searched for the 0νββ decay of the isotope 76 Ge with a Q-value of Q ββ = 2039.061(7) keV [2] by operating high-purity germanium (HPGe) detectors isotopically enriched to >86% in 76 Ge, making them also the potential source of 0νββ decay.
We used three types of enriched germanium detectors: 30 broad energy germanium (BEGe) detectors, 7 coaxial detectors, and 5 newer inverted coaxial (IC) detectors. The BEGe detectors are smaller (average 0.7 kg) but offer superior energy resolution and pulse shape discrimination (PSD) properties compared to the coaxial detectors [3], while the IC detectors provide energy resolution and PSD properties similar to the BEGe detectors [4] but with a larger mass (average 1.9 kg) comparable to that of the coaxial detectors (average 2.3 kg), allowing for the easier design of larger germanium arrays.
The array of germanium detectors was immersed in a cryostat filled with 64 m 3 of liquid argon (LAr). The top of the cryostat and the surrounding water tank houses a clean room containing a glove box and lock system for deploying the HPGe detectors and calibration sources. The entire setup was located underground at the Laboratori Nazionali del Gran Sasso (LNGS) of INFN, Italy, and is described in detail in [5].
The first phase of the experiment was operated with 18 kg of coaxial detectors (inherited from the Heidelberg-Moscow [6] and Igex [7] collaborations) between November 2011 and September 2013. Phase II started in December 2015, after 20 kg of BEGe detectors produced for the Gerda experiment were added and the liquid argon volume around the detector array was instrumented with photosensors as a veto against radioactivity [5]. During an upgrade in mid-2018, IC detectors with a total mass of 9.6 kg were added, and the LAr instrumentation was upgraded. Phase II data taking ended in November 2019. While the calibration procedure of Phase I data has been discussed in [8,9], the focus of this paper is the calibration of the Phase II data.
In all recent 0νββ decay experiments, the signature of the rare nuclear transition is a monoenergetic peak in the measured energy spectrum of the two electrons at Q ββ . Consequently, a crucial parameter to distinguish a signal from the background is the energy estimator. The better the energy resolution of the detectors, the narrower the signal energy region effectively becomes, and an excess over the continuous background can be more clearly identified. One strength of HPGe detectors is their unparalleled energy resolution (typically σ /E∼0.1% at Q ββ ). It permits the almost complete rejection of background events from regular two-neutrinoaccompanied double-β decays [10], an otherwise irreducible background in 0νββ decay searches [11,12].
Given the central role of the energy observable, adequate measures must be taken to accurately determine the energy scale and resolution, monitor their stability over the full data acquisition period, and determine the relevant uncertainties entering the statistical analysis for the 0νββ decay search. In Sect. 2 we detail the calibration procedure, while in Sect. 3 we discuss the analysis of the calibration data and the energy scale determination, including the procedures employed to monitor and maintain the stability of the HPGe detectors over time. In Sect. 4 we describe the determination of the energy resolution for the 0νββ decay analysis, and in Sect. 5 we provide an evaluation of the associated uncertainties. In Sect. 6, we discuss the determination of the residual energy bias and its uncertainty. In Sect. 7, we compare the results from calibration data with those in the physics data (data used for the 0νββ decay search) for the resolutions of the lines from decays of 40 K and 42 K. We close in Sect. 8 with a summary and a discussion of our main results.

Energy calibration process
To perform the calibrations, we regularly exposed the HPGe detectors to three custom-made low-neutron emission 228 Th calibration sources [13], each with an activity of about 228 Th has a half-life of 1.9 yr, the sources were replaced during Phase II to ensure a sufficient level of activity.
During calibration runs of the HPGe detectors, the 228 Th sources were lowered into the LAr cryostat to reach the level of the detector array by three source insertion systems [14,15]. Each of these deploy a single source, placed on tantalum absorbers (h = 60 mm, = 32 mm). During calibration, each source was placed at three different heights to expose the detector array more homogeneously, and data were acquired at each location for up to 30 min. With this exposure, typically around (1 − 3) × 10 3 events are observed in the prominent 208 Tl γ line at 2614.5 keV in a BEGe detector, and (0.6 − 1) × 10 4 events in a coaxial or IC detector. Calibration data were acquired every 7-10 days with a total of 142 calibration runs used for the analysis of the Phase II data.
The triggering energy threshold for this acquisition during calibration corresponds to ∼400 keV. This threshold was set to include the strong γ line of the 228 Th spectrum at 583.2 keV while keeping the event rate at a manageable level for the data acquisition system. The detector signals were read out with charge sensitive amplifiers, and digitized by a 100 MHz 14-bit flash analog-to-digital converter (FADC). As for physics data acquisition, for each trigger a 160 µs long waveform is recorded at a sampling rate of 25 MHz, centered around the trigger time and covering an energy range up to ∼6 MeV. During calibration, every 2 s a test pulse was injected into the amplifier electronics of all germanium detectors to monitor the stability of their gains. Between successive calibration runs, i.e. during physics data acquisition, test pulsers were injected every 20 s for the same purpose.
Data from the FADCs are transformed into the MGDO (ROOT-based) format [16,17] and processed to analyze properties of the recorded waveforms using the Gelatio software [18], as is the case for physics data. The energy is estimated from the amplitude of the waveform after applying a digital filter which reduces the impact of noise and thus improves the resolution. As a fast first estimate for monitoring and cross-check purposes, a pseudo-Gaussian filter is applied to obtain an energy estimator as part of the online analysis [19]. An improved energy resolution for the final data analysis is achieved with a zero area cusp (ZAC) filter [20], which removes the effect of low-frequency noise. This filter is optimized offline for each calibration run and HPGe detector to minimise the resolution of the highest energy γ line in the 228 Th spectrum [21].
A set of heuristic event selection criteria is applied to ensure that events recorded during calibration are of a physical origin, and to reduce pile-up events. The underestimation of energy for these events cause low-energy tails in the spectra of γ lines and can bias the estimated energy resolution. These selection criteria are based on the properties of the waveform, such as the baseline stability and slope, trigger time, number of triggered events, and rise time of the pulse. The probability of rejecting physical interactions, estimated with events from the regularly injected test pulses, is below 0.1% [22].

Analysis of energy spectra
Nuclei of the 228 Th isotope decay in a chain via α and β decays to the stable 208 Pb with the emission of multiple monoenergetic γ rays. These result in sharp peaks in the recorded energy spectra, as shown in Fig. 1 in the combined spectra of each detector type. The pattern of observed peaks is used to identify the γ lines and thereby determine their energy and resolution.
The TSpectrum class of ROOT is used to find peak positions in the uncalibrated spectrum, such that all peaks with amplitudes exceeding 1/20 of the amplitude of the most prominent peak are found. This threshold was chosen to avoid the detection of spurious peaks. The peak with the highest energy is identified as the full energy peak (FEP) of the γ ray from the decay of 208 Tl, a daughter of 228 Th, at E FEP = 2614.5 keV. A preliminary calibration for the energy estimator T is applied assuming a linear energy scale without offset: A candidate peak is confirmed if its preliminary estimated energy is consistent within 6 keV with the energy of a known line in the 228 Th spectrum. The 6 keV value permits the accurate identification of peaks while allowing for some nonlinearity of the energy scale. The known peaks correspond to γ rays from isotopes in the 228 Th decay chain with energies above 500 keV and branching ratios above 0.3%, including the detector specific single escape peak (SEP) at 2103.5 keV and double escape peak (DEP) at 1592.5 keV resulting from the 2.6 MeV γ ray of 208 Tl decays. In the context of this paper, without ambiguity, FEP, SEP, and DEP always refer to those of 208 Tl. The double peak due to the 511.0 keV annihilation line and 510.7 keV γ line from 208 Tl is excluded from the analysis, in particular since the resolution of the annihilation peak is broadened due to the Doppler effect [23].

Peak fitting and calibration curves
To determine the position μ and energy resolution in terms of the full width at half maximum (FWHM) = 2.35 · σ of the identified peaks, fits are performed locally in an energy window of 10-20 keV around the peak position obtained from the preliminary calibration. These are configured manually and separately for each peak to avoid interference from neighbouring peaks. Minimally, a Gaussian g(E) is used to model the peak, and a linear function f lin (E) is used to model the background: where n, μ and σ are the intensity, position, and width of the peak, and a and b give the intercept and slope of the linear function, respectively. For high statistics peaks (583.2 keV, 727.3 keV, 763.5 keV, 860.5 keV, and 2614.5 keV), the SEP, and the DEP, a step function is used to model the flat backgrounds occurring only above or below the peak from multiple Compton scatters: where d is the height of the step function, and erfc denotes the complementary error function.
For the high statistics peaks as defined above, a low-energy tail is additionally used to model the effects of incomplete charge collection and the residual presence of pile-up events: where β and c are the height and slope of the tail, respectively. An example of the FEP peak fit is shown in the inset of Fig. 1.
Peaks are excluded after the fit if any of the following heuristic rules are fulfilled: (i) the estimated FWHM is above 11 keV or below 1.5 keV; (ii) the peak maximum is lower than 2.5 times the linear component of the background or lower than 10 counts; (iii) the fitting error on the FWHM is larger than the FWHM itself. These rules are purely heuristic and designed to remove peaks that cannot be fitted well, mainly due to low statistics.
Typically around 5-8 peaks per detector survive all selection criteria. The FEP is always identified, since the peak identification algorithm requires it. In > 80% of cases the lines at 583.2 keV, 860.5 keV, 1592.5 keV (DEP) and 2103.5 keV (SEP), and in (15-60)% the lines at 727.3 keV, 785.4 keV, 1078.6 keV and 1620.7 keV are found. All other γ lines are seen in <3% of the spectra from individual detectors of a single calibration run, due to insufficient statistics.
From the obtained peak positions, we determine the calibration curve which is a function to convert the uncalibrated energy estimator T into a physical energy in keV. We plot the peak positions in terms of the uncalibrated energy estimator T of identified peaks against their physical energies E according to literature values [24], and then fit with a linear function In the rare case where only the FEP is found and successfully fitted, the resulting calibration curve has an intercept of zero. Such cases only occured during periods of instability for a detector which were excluded from data analysis (see Sect. 3.3). For most detectors, the linear calibration curve describes the peak positions within a few tenths of a keV, as shown in Sect. 6. Discussion of a quadratic correction to the energy scale introduced for five detectors after the 2018 upgrade can be found in Sect. 3.2.
Typically, a calibration curve is used to calibrate the physics data following a calibration run. However, if a calibration is taken after changes in the experimental setup, or instabilities in the detector array, the calibration curves may be applied retrospectively to additionally calibrate the period between the changes or instabilities and the calibration run. The unstable period itself would not be used for physics analysis.

Quadratic correction
After the 2018 upgrade, several detectors (the new IC detectors and one coaxial detector) exhibited larger residuals in their calibration curves compared to other detectors, up to 2.5 keV at 1.5 MeV. Whether these effects are directly related to a change in cable routing is unknown. These effects could be largely accounted for by the incorporation of a quadratic correction to the calibration curves, where E 0 is the energy estimator after the application of the linear calibration curve as described in Sect. 3.1.
The parameters m 0 , m 1 and m 2 were determined by fitting the residuals of each detector's calibration curves per calibration run. One such example is shown in Fig. 2. These parameters were observed to be stable over time, with the exception of a single jump for three IC detectors following hardware activities in the clean room. Therefore a single correction for each detector's stable period was applied, using the average parameters of all calibration runs in that period. After the quadratic correction, the remaining residuals were within a few tenths of a keV.

Detector performance and stability
To consistently combine data over an extended period of time while preserving the excellent energy resolution of the HPGe detectors, it is vital to monitor the stability of the energy scale between calibrations and exclude periods with significant shifts and fluctuations which would contribute to the width of the peaks. As previously mentioned, test pulses are regularly injected into the readout electronics to monitor the stability of the data acquisition system. Their signal magni- tude corresponds to an energy of about 3 MeV. Periods with significant jumps or drifts (>1 keV) in the amplitude of the test pulses are excluded from data analysis and a calibration is performed once the detector stabilizes. The corresponding loss of exposure is at the few-percent level. The origin of these shifts is largely unknown, but may be caused by temperature changes in the electronics.
Additionally, we monitor the stability of the FEP position in the calibration spectrum. If the position of this line changes by more than 1 keV between successive calibrations without an identifiable reason (maintenance, longer breaks, specific incident), the data of the respective detector are discarded from the analysis for that period of time. The corresponding exposure loss is at the few-percent level. Smaller or temporary drifts may still affect the obtainable energy resolution and are discussed as a systematic uncertainty in Sect. 5.2.
Due to hardware changes, the detectors may experience changes in their energy resolution and energy scale over longer periods of time. To more accurately reflect the properties of a detector at a certain time, for the final Gerda analysis [25] we divide the full data acquisition period for each detector into stable sub-periods called partitions. The stability is judged based on two parameters: the FWHM at the FEP and the residual at SEP. The former reflects the changes in the detector resolution, while the latter catches the changes in the energy bias at the energy peak closest to Q ββ (see Sect. 6 for more discussions on the bias). After the 2018 upgrade and cable rerouting, the resolutions improved for most of the detectors. Therefore, for simplicity, we start a new partition for all detectors after the upgrade. There are one to four partitions for each detector. The majority of the detectors have only two partitions, split at the time of the 2018 upgrade. An example of the partitions is shown in Fig. 3.

Energy resolutions from the combined calibration spectra
Depending on the specific physics analysis, we calculated the energy resolution either by partition, described in the previous section, or by detector type. For the 0νββ decay search reported in [26], where the data before the 2018 upgrade were analyzed, an effective resolution for each detector type was employed. For the final 0νββ decay search of Gerda reported in [25], where all Gerda data were analyzed, we calculated a resolution for each partition, a much more finegrained approach. At the expense of increased complexity, the partition approach improves the physics result by capturing the variations among the detectors as well as the variation over time.
Since both methods are applicable for Gerda and any other experiment with a modular detector setup, here, we discuss both approaches. While the detector type approach was replaced in favor of partitioning for the final 0νββ decay search, the former gives an overview of the overall detector performance. For this reason all the illustrations and calibration parameters are provided by detector type. Here, we refer to a dataset containing all the partitions of detectors of the same type as a DTD (detector type dataset).

By partition
To obtain the γ line resolutions for each detector partition, we first produce combined calibration spectra. The energy spectra obtained from each calibration run within one partition are first normalised to account for differing statistics, and The detector partitions with resolutions > 6 keV are due to two coaxial detectors whose resolutions degraded after the 2018 upgrade then weighted according to the time span for which the corresponding calibration curves were used to calibrate physics data. The resulting γ peaks in a combined spectrum will be representative of the average performance of that detector in that partition.
The peak identification and fit procedure described in Sect. 3 is then applied to each combined calibration spectrum.
The SEP is broadened due to the known Doppler effect and is thus excluded [23]. We also observe broadening in the DEP. This is hypothesised to originate due to events occurring more frequently in the outer regions of the detectors and thus being more susceptible to incomplete charge collection [27]. This line is therefore excluded as well.
The dependence of the γ line resolutions on the calibrated energy E is then fitted with the function [20] σ (E) = √ a + bE, where a and b are fit parameters. The former accounts for the contributions from electronic noise, while the latter accounts for statistical fluctuations in the number of charge carriers. The resolution at Q ββ is then given by using E = Q ββ in Eq. 8. The resultant FWHM resolutions at Q ββ of the partitions vary between 2.3 keV and 8.8 keV, as shown in Fig. 4. Values for each partition can be found in [28]. Systematic errors are calculated via a dedicated study as explained in Sect. 5.

By detector type
The appropriate method for calculating effective resolutions by detector type depends on the specific application.

Background modeling
For background modeling, energy dependent resolutions are required, i.e. resolution curves. To calculate these for a DTD, the procedure is similar to that for the partitions, though weighting is now required to combine the resolutions from different detector partitions. When data from multiple partitions are combined by adding their energy spectra, Gaussian peaks in the individual spectra combine to become a Gaussian mixture, namely the sum of multiple Gaussian distributions with different centroids and resolutions. The resolution of individual partitions in a DTD is stable within a factor of 1.7 for BEGe and IC detectors. For coaxial detectors there is a slightly higher fluctuation, but still within a factor of three. The variation in position of the centroid is much smaller than the energy resolution, typically around 0.2 keV. Therefore the shape of a peak in the combined energy spectra remains approximately Gaussian and can be characterized by an effective resolution, computed from the resolution of individual partitions.
The variance of a Gaussian mixture is given by: where the sum goes over Gaussians with means and standard deviations μ i and σ i , with weights w i , representing the relative contribution to expected peak counts of individual Gaussians [29]. For a DTD comprised of individual partitions, these parameters stand for the individual partitions' resolution σ i , and peak position μ i , which can be different due to independent systematic effects on the energy scale. The weights are the expected relative event count contribution of individual partitions. Since peak counts are proportional to exposure E i = m i · t i , with individual detector's active mass m i and live time t i , the relative exposure contribution is: where E = j E j is the total exposure of the DTD. Since the biases in the energy scale are small, we can neglect the differences in the peak positions. Eq. 9 therefore simplifies to: with total error δ σ from the statistical fitting errors of individual partition resolutions δ σ i : with negligible uncertainty in the weights. For instance, the simplified model of the FEP is a Gaussian with a mean of 2614.5 keV and a width fixed to the effective resolution (see Eq. 11) of the DTD. On the other hand, a Gaussian mixture model would consist of the sum of a Gaussian for each partition, each with its own resolution and centroid. Fig. 5 shows the Gaussian mixture and simplified signal models for the IC and coaxial DTDs. For the IC   (2) and BEGe DTDs, the Gaussian mixture model is very close to a Gaussian shape, as the centroid differences are small and the partitions in each DTD have similar resolutions. The resolutions among the coaxial detectors are more varied and thus using a Gaussian signal model may be less appropriate.
To calculate the effective resolution curves for each DTD, first the γ line resolutions are obtained for each of the partitions as in Sect. 4.1. For all γ lines whose resolution was reliably determined for all partitions in that DTD, an effective resolution of the DTD at that energy is calculated using Eq. 11. All other lines which were missing in at least one detector partition are excluded.
Once the effective resolutions for each energy and DTD have been determined by weighting partition resolutions with Eq. 11, their energy dependence is fitted with Eq. 8.
The obtained effective resolutions and functions of the three detector types are shown in Fig. 6 and Table 1. The statistical errors are obtained from the fit.

0νββ decay search
As mentioned before, in earlier Gerda 0νββ decay analyses such as [26], partitioning was not performed, and data from multiple detectors were combined to form a DTD for each detector type. In the case of the Gerda Phase II data, very few where the sum goes over the partitions with resolutions σ i and weights w i . For the three detector types we obtain the resolutions at Q ββ as given in Table 2.

Energy resolution uncertainty at Q ββ
The statistical uncertainty on the energy resolution decreases with rising statistics over time, and is on the order of only a few eV. As such, the uncertainty on the energy resolution is dominated by systematic effects. We consider various sources of systematic uncertainty, given here in decreasing order of their contribution: (i) resolution shifts over time; (ii) energy scale shifts over time; (iii) choice of the resolution fitting function. Due to the nature of these uncertainties, their magnitude will not decrease over time, but could change if the detector setup or analysis methods change.
In the following sections, we explain how individual contributions to the systematic uncertainty were determined (Sects. 5.1-5.3), and how they are combined together to give a total uncertainty per partition (Sect. 5.4).

Resolution stability
We consider a systematic uncertainty estimated from the fluctuations in the resolution obtained for each calibration over time. For each partition, we calculate the standard deviation of the resolution at FEP, σ FEP , among individual calibration runs in that partition. Assuming that in Eq. 8, any systematic fluctuation of the energy resolution is caused by the two correlated parameters changing proportionally, the energy resolution uncertainty δ divided by the energy resolution σ is independent of energy. This is supported by the high degree of correlation between the fit parameters a and b of Eq. 8 of −0.81 for the fitted partition resolution curves. With this specific model, we can translate the uncertainty at the FEP energy to Q ββ : The mean value for this component across all partitions is 0.11 keV, with a standard deviation of 0.06 keV.

Pulser stability
Once the energy scale has been determined via a calibration as described in Sect. 3.1, the calibration curves are used until the next calibration. While several parameters are monitored to ensure detector stability, fluctuations of the energy scale can still deteriorate the resolution for physics data compared to calibration data. Fluctuations on time scales smaller than the typical calibration duration (1.5 h) are also present in the calibration data. The effect from these short-term fluctuations will thus be included in the calculated effective resolution. Fluctuations on larger time scales, up to around one week, can, within the restraints of our data quality requirements, contribute additionally to the resolution in physics data compared to the resolution obtained from calibration data. This additional contribution was estimated using the position of test pulser events (see Sect. 3.3). Shifts in the test pulser positions averaged over 1.5 h, normalised by their statistical uncertainty, were analyzed. Were the variation in energies due only to statistical fluctuations, these normalized residuals would be distributed normally with a mean of 0 and a standard deviation of 1. The observed deviation from this standard normal distribution can be quantified as an additional contribution to the resolution, which is typically on the order of 0.2 keV (1σ ) or 0.6 keV (FWHM). As an example, for a detector partition with a resolution of FWHM = 3.0 keV, the additional systematic uncertainty is given by: The mean value for this component is 0.08 keV, with a standard deviation of 0.07 keV among partitions.

Choice of the resolution function
We used the square root of a linear function to model the resolution as a function of energy (Eq. 8). While this choice is physically well-motivated, including both statistical variations in the number of charge carriers, and effects due to the electronics, there are some common alternatives. For example, one could add a quadratic term under the square root to model the effects of incomplete charge collection or integration, To estimate the variation of the resolution at Q ββ for the different choices of functions, the values obtained for the two discussed choices are compared. Using the square root of linear (Eq. 8) and quadratic (Eq. 16) functions, an average difference of 0.05 keV is obtained, with a standard deviation of 0.05 keV among partitions.

Total resolution uncertainty by partition
The total resolution uncertainty is obtained by summing individual contributions in quadrature, thereby assuming no correlations. The resultant FWHM resolution uncertainties at Q ββ of the partitions vary between 0.04 keV and 0.37 keV, with a mean (standard deviation) of 0.13 (0.07) keV.

Energy bias and uncertainty
Due to the different assumptions and approximations in the calibration procedure, slight biases in the energy scale may remain. Such biases may, for example, be caused by the integral non-linearity of the FADCs [30]. Small non-linearities in the energy scale are for example neglected due to the use of a linear calibration function. Therefore a peak from a γ ray with well defined energy might be displaced towards higher or lower energies. Correspondingly, for each individual event, while its reconstructed energy will fluctuate according to the resolution, it may also be systematically displaced.
To evaluate the energy bias per partition near Q ββ , we look at the residual at the SEP defined in Sect. 3.3 in the combined calibration spectrum, since the SEP is very close to Q ββ with a difference of 64.5 keV. The statistics is sufficient to reach a precision of O(0.01 keV) for the SEP position. The average bias is found to be −0.07 keV, with a standard deviation of 0.29 keV among the partitions. Since the 0νββ decay search is extremely sensitive to the energy of the events close to Q ββ , in the final Gerda analysis [25], we correct for the energy bias of the events that fall into the energy range considered for the 0νββ decay search (1930 keV to 2190 keV), by adding the amount of bias to the calibrated event energy. This approach is justified by studying the residuals at the 42 K peak (1525 keV) and the DEP (1592.5 keV), which are two closely located peaks with the former appearing in the physics data [10] and the latter in the calibration data. The relation between them is consistent with that between Q ββ and the SEP.
For the uncertainty of the bias, we use the residual fluctuations of the SEP over time near Q ββ . We additionally include a systematic uncertainty of 0.02 keV accounting for the potential difference between the bias at SEP and that at Q ββ . It was estimated by performing a linear interpolation between the residuals at the DEP and the SEP which are on the two sides of Q ββ . In total, the average bias uncertainty is 0.17 keV.

Comparison to physics data
The two strongest γ lines in our physics data spectrum are those due to 40 K (1460.8 keV) and 42 K (1524.7 keV) decays [25]. The measured resolution of these peaks allows for a cross-check to the conclusions drawn solely from calibration data. For every partition, the background energy spectrum around each of these lines is fitted using a Gaussian for the signal and a linear function for the background. The background rate was constrained to be non-negative across the fitting window. Partitions with potassium peaks with low counting statistics, i.e. those whose best-fit is compatible with zero counts, are excluded from further analysis.
Given their proximity in energy, the extracted resolution for each of the two lines is expected to coincide within 0.05 keV. Indeed, no significant difference between the resolutions of the two peaks was found.
We compared the resolution obtained in the potassium lines with the one predicted from the resolution curves extracted from the combined energy spectra (see Sect. 4), as shown in Fig. 7. The systematic uncertainty for the calibration resolution is calculated in the same way as described in Sect. 5. The measured resolutions and predicted values from calibration data show a high degree of correlation, with a Pearson correlation coefficient of 0.92, and with 66% compatibility within one σ . Similar results are obtained for the 40 K line.

Conclusions
A reliable and stable energy scale is crucial to the search for 0νββ decay of 76 Ge performed with the Gerda experiment. The event energies are reconstructed using the ZAC filter to minimize the effects of low-frequency noise. To preserve the excellent energy resolution of the germanium detectors when combining data over a long period of time, they are calibrated weekly using 228 Th sources. By identifying γ peaks in the recorded spectrum the energy scale and energy resolution can be determined.
For each calibration, the stability of the energy scale and resolution is monitored via the 2.6 MeV FEP from 208 Tl decays. Between successive calibration runs the energy scale Fig. 7 Resolution of the 1524.7 keV 42 K line as measured from physics data and extracted from calibration data, for each detector partition. The red line shows the case of perfect agreement is monitored via test pulser events injected into the readout electronics of the HPGe detectors. Data with short-term instabilities are discarded from further analysis.
To more accurately reflect the properties of a detector at a certain time, we have introduced the partitioning of the detectors' data into stable sub-periods. The stability is based on the long-term changes of the energy resolution at the FEP and the residual at the SEP.
For each partition, a combined calibration analysis is performed to calculate the energy resolution used for the 0νββ decay analysis. For this purpose, calibration data in a partition are combined into a single spectrum. The resolution curve is obtained by fitting a resolution model function to the obtained resolutions of individual peaks in the combined spectrum. Among the partitions, the calculated resolutions at Q ββ range from 2.3 to 8.8 keV, with an exposure-weighted mean (standard deviation) of 3.0 (0.8) keV.
Alternatively, effective resolution curves per detector type are calculated by modeling the signal by a single Gaussian with a width according to the standard deviation of a Gaussian mixture of the individual detector partition contributions. Over Phase II we obtained exposure-weighted average resolutions at Q ββ for the BEGe/coaxial/IC detectors of (2.8 ± 0.3) keV, (4.0 ± 1.3) keV, and (2.9 ± 0.1) keV respectively.
Dedicated studies were performed to determine the resolution systematic uncertainties for the 0νββ decay analysis. Various sources of systematic uncertainty on the resolution were considered: the fluctuations of the resolution and energy scale over time, and the choice of resolution function. The average total systematic uncertainty across all partitions is 0.13 keV.
The energy bias for the events near Q ββ is estimated and corrected based on the residual of the SEP. Among the par-titions, the average bias is -0.07 keV with a standard deviation of 0.29 keV. The average uncertainty of these biases is 0.17 keV.
The energy scale, partitioning, resolutions, and energy biases discussed in this paper are essential to the final search for 0νββ decay with Gerda described in [25]. The success of the Gerda program in reaching the world's most stringent 0νββ decay half-life constraint given by T 0ν 1/2 > 1.8 · 10 26 yr at 90% C.L, was achieved in part due to the excellent energy resolution offered by germanium detectors and the analysis described in this work. This is an important step towards Legend in developing the next generation of 0νββ decay 76 Ge experiments [31].