Final results on the $0\nu\beta\beta$ decay half-life limit of $^{100}$Mo from the CUPID-Mo experiment

The CUPID-Mo experiment to search for 0$\nu\beta\beta$ decay in $^{100}$Mo has been recently completed after about 1.5 years of operation at Laboratoire Souterrain de Modane (France). It served as a demonstrator for CUPID, a next-generation 0$\nu\beta\beta$ decay experiment. CUPID-Mo comprised 20 enriched Li$_2$$^{100}$MoO$_4$ scintillating calorimeters, each with a mass of $\sim$0.2 kg, operated at $\sim$20 mK. We present here the final analysis with the full exposure of CUPID-Mo ($^{100}$Mo exposure of 1.47 kg$\times$yr) used to search for lepton number violation via 0$\nu\beta\beta$ decay. We report on various analysis improvements since the previous result on a subset of data, reprocessing all data with these new techniques. We observe zero events in the region of interest and set a new limit on the $^{100}$Mo 0$\nu\beta\beta$ decay half-life of $T^{0\nu}_{1/2}>1.8 \times 10^{24}$ year (stat.+syst.) at 90% C.I. Under the light Majorana neutrino exchange mechanism this corresponds to an effective Majorana neutrino mass of $\langle m_{\beta\beta}\rangle < (0.28$--$0.49)$ eV, dependent upon the nuclear matrix element utilized.


Introduction
Ever since the neutrino was found to have mass via the observation of flavor state oscillations [1,2], the nature of the neutrino mass itself has remained a mystery. Unlike charged leptons, neutrinos may be Majorana [3,4] instead of Dirac particles. In addition to implying that neutrinos and antineutrinos would be the same particle [5], this would also imply that the total lepton number is not conserved [6]. This may provide a possible explanation for the baryon asymmetry (i.e., the imbalance between matter and anti-matter) in the early universe [7,8].
Two-neutrino double-beta ($2\nu\beta\beta$) decay is a rare Standard Model process which can occur in some even-even nuclei for which single beta decays are energetically forbidden (or heavily disfavored due to large changes in angular momentum). In this process two neutrons are converted into two protons with the emission of two electrons and two electron anti-neutrinos. The possibility of the neutrino being a Majorana fermion raises the prospect that neutrinoless double-beta ($0\nu\beta\beta$) decay may occur [9,10]. In the case of $0\nu\beta\beta$ decay, we would again observe the conversion of two neutrons into two protons, but such a decay would only produce two electrons. Whereas $2\nu\beta\beta$ decay conserves lepton number, $0\nu\beta\beta$ decay would result in an overall violation of the total lepton number by two units [11][12][13], and would indicate physics beyond the Standard Model. At present this process is unobserved, with limits on the decay half-life at the level of $10^{24}$--$10^{26}$ year via the isotopes $^{76}$Ge, $^{82}$Se, $^{100}$Mo, $^{130}$Te, and $^{136}$Xe [14][15][16][17][18][19][20][21][22]. The process of $0\nu\beta\beta$ decay has several possible mechanisms [11,13,23--29]; however, the minimal extension to the Standard Model provides the simplest via the exchange of a light Majorana neutrino. The rate of this process is dependent upon the square of the effective Majorana neutrino mass, $\langle m_{\beta\beta}\rangle$:
$$\left(T^{0\nu}_{1/2}\right)^{-1} = g_A^4\, G^{0\nu}\, \left|M^{0\nu}\right|^2\, \frac{\langle m_{\beta\beta}\rangle^2}{m_e^2},$$
where $g_A$ is the weak axial vector coupling constant, $M^{0\nu}$ is the nuclear matrix element, $G^{0\nu}$ is the decay phase space, and $m_e$ is the electron mass. The effective mass is a linear combination of the three neutrino mass eigenstates, with the present limits on $\langle m_{\beta\beta}\rangle$ ranging from 60--600 meV [13].
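As a numerical illustration of how a half-life limit maps onto $\langle m_{\beta\beta}\rangle$, the sketch below inverts the rate formula above. The phase-space factor, matrix-element values, and $g_A$ used here are placeholder values chosen for illustration, not the inputs adopted in this work.

```python
import math

# Illustrative constants (assumptions, not the paper's inputs):
M_E_EV = 510.999e3       # electron mass [eV]
G_A = 1.27               # weak axial vector coupling constant
G0NU_PER_YR = 1.57e-14   # assumed phase-space factor for 100Mo [1/yr]

def m_bb_limit(t_half_yr, m0nu):
    """Invert (T_1/2)^-1 = gA^4 * G0nu * |M0nu|^2 * (m_bb / m_e)^2
    to get the effective Majorana mass [eV] implied by a half-life
    limit [yr] and a given nuclear matrix element."""
    return M_E_EV / math.sqrt(t_half_yr * G0NU_PER_YR * m0nu ** 2 * G_A ** 4)
```

Because $\langle m_{\beta\beta}\rangle \propto 1/|M^{0\nu}|$, a spread of matrix-element calculations translates a single half-life limit into a range of mass limits, which is why such results are quoted as intervals.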
A vibrant experimental field has emerged to search for $0\nu\beta\beta$ decay with experiments using a variety of nuclei and a wide range of methods (see [13] for a review of some of these). The main experimental signature for this decay is a peak in the summed electron energy at the $Q$-value of the decay ($Q_{\beta\beta}$; the difference in energy between the parent and daughter nuclei), broadened only by the detector energy resolution. There are 35 natural double-beta decay isotopes [30]; however, from an experimental perspective only a subset of these are relevant. Searching for $0\nu\beta\beta$ decay requires that the number of target atoms be very large and that the background rate be small. In an ideal case a candidate isotope should have $Q_{\beta\beta} > 2.6$ MeV, so that it is above significant natural backgrounds and has the phase space for a relatively fast decay rate. It should also occur with a high natural abundance or be easily enriched.
The isotope $^{100}$Mo meets these requirements with a $Q_{\beta\beta}$ of 3034 keV and a relatively favorable phase space compared to other isotopes. Additionally, it is relatively easy to produce detectors enriched in $^{100}$Mo. Several experiments have utilized $^{100}$Mo for $0\nu\beta\beta$ decay searches: NEMO-3, LUMINEU, and AMoRE. NEMO-3 utilized foils containing $^{100}$Mo and used external sensors to measure time of flight and provide calorimetry. It ran from 2003--2010 at Modane and accumulated an exposure of 34.3 kg$\times$year in $^{100}$Mo. It found no evidence for $0\nu\beta\beta$ decay and set a limit of $T^{0\nu}_{1/2} > 1.1 \times 10^{24}$ year at 90% C.I. with a limit on the effective Majorana mass of $\langle m_{\beta\beta}\rangle < (0.33$--$0.62)$ eV [20]. LUMINEU was a pilot experiment for $^{100}$Mo-based calorimeters and served as a precursor to CUPID-Mo. LUMINEU utilized both Zn$^{100}$MoO$_4$ and Li$_2$$^{100}$MoO$_4$ crystals, and found Li$_2$$^{100}$MoO$_4$ to be more favorable for $0\nu\beta\beta$ decay searches [31]. AMoRE operates at the Yangyang underground laboratory in Korea. AMoRE-pilot operated with $^{48\mathrm{depl}}$Ca$^{100}$MoO$_4$ crystals. As with CUPID-Mo, AMoRE utilizes light and heat signals to provide particle identification. AMoRE-pilot has set a limit, using 111 kg$\times$day of exposure, of $T^{0\nu}_{1/2} > 9.5 \times 10^{22}$ year at 90% C.I. with a corresponding effective Majorana mass limit of $\langle m_{\beta\beta}\rangle < (1.2$--$2.1)$ eV [32].
Scintillating calorimeters are one of the most promising current technologies for $0\nu\beta\beta$ decay searches, with many possible configurations [16,32--38]. These consist of a crystalline material containing the source isotope, capable of scintillating at low temperatures, which is operated as a cryogenic calorimeter coupled to light detectors that detect the scintillation light. Particle identification is based on the difference in scintillation light produced for a given amount of energy deposited in the main calorimeter. This technology has demonstrated excellent energy resolution, high detection efficiency, and low background rates (due to the rejection of $\alpha$ events). The rejection of $\alpha$ events is of primary concern, as the energy region above $\sim$2.6 MeV is populated by surface radioactive contaminants with degraded energy collection of $\alpha$ particles [17,18]. CUPID (CUORE Upgrade with Particle IDentification) is a next-generation experiment [39] which will use this scintillating calorimeter technology. It will build on the success of CUORE (Cryogenic Underground Observatory for Rare Events), which demonstrated the feasibility of a tonne-scale experiment using cryogenic calorimeters [17,40]. In this paper we describe the final $0\nu\beta\beta$ decay search results of the CUPID-Mo experiment, which has successfully demonstrated the use of $^{100}$Mo-enriched Li$_2$MoO$_4$ detectors for CUPID. In sections 2 and 3 we introduce the CUPID-Mo experiment and give an overview of the collected data. In sections 4 and 5 we describe the data production and basic data-quality selection. Then in sections 6-11 we describe in detail the improved data selection cuts we use to reduce experimental background rates. We then describe our Bayesian $0\nu\beta\beta$ decay analysis in sections 12-14. Finally, the results and their implications are discussed in section 15.

CUPID-Mo Experiment
The CUPID-Mo experiment was operated underground at the Laboratoire Souterrain de Modane in France [41] following a successful pilot experiment, LUMINEU [31,42]. The CUPID-Mo detector array was comprised of 20 scintillating Li$_2$MoO$_4$ (LMO) cylindrical crystals, $\sim$210 g each (see Fig. 1). These are enriched in $^{100}$Mo to $\sim$97% and operated as cryogenic calorimeters at $\sim$20 mK. Each LMO detector is paired with a Ge-wafer light detector (LD) and assembled into a detector module with a copper holder and Vikuiti$^{\mathrm{TM}}$ reflective foil to increase scintillation light collection. Both the LMO detectors and LDs are instrumented with a neutron-transmutation-doped Ge thermistor (NTD) [43] for data readout. Additionally, a Si heater is attached to each LMO crystal, which is used to monitor detector performance.
The modules are organized into five towers with four floors and mounted in the EDELWEISS cryostat [44] (see Fig. 1). In this configuration each LMO detector (apart from those on the top floor) nominally has two LDs, increasing the discrimination power. We note that one LD did not function, so two LMO detectors that are not on the top floor also have only a single working LD.
CUPID-Mo has demonstrated excellent performance, crystal radiopurity, energy resolution, and high detection efficiency [41], close to the requirements of the CUPID experiment [39]. An analysis of the initial CUPID-Mo data (1.17 kg$\times$year of $^{100}$Mo exposure) led to a limit on the half-life of $0\nu\beta\beta$ decay in $^{100}$Mo of $T^{0\nu}_{1/2} > 1.5 \times 10^{24}$ yr at 90% C.I. [19]. For the final results of CUPID-Mo we increase the exposure and also develop novel analysis procedures which will be critical to allow CUPID to reach its goals.

Fig. 1: Images showing the CUPID-Mo detector array (5 nearest towers) mounted in the EDELWEISS cryostat (top) and a single module assembled in the Cu holder (bottom) [41]. (Bottom left) view from the top of the LMO detector, NTD-Ge, Si heater, copper holder and PTFE clamps. (Bottom right) view from the bottom of the Ge LD with its NTD-Ge thermistor and PTFE clamps.

CUPID-Mo Data Taking
The data utilized in this analysis were acquired from early 2019 through mid-2020 (481 days in total) with a duty cycle of $\sim$89% of the EDELWEISS cryogenic facility. The data collected between periods of cryostat maintenance or special calibrations, which require the external shield to be opened, are grouped into "datasets" typically $\sim$1-2 months long. Within each dataset we attempt to have periods of calibration data taking (typically $\sim$2-day-long measurements every $\sim$10 days) bracketing physics data taking, corresponding to 21% and 70% of the total CUPID-Mo data respectively. CUPID-Mo utilizes a U/Th source placed outside the copper screens of the cryostat (see [41]) for standard LMO detector calibration, providing a prominent peak at 2615 keV as well as several other peaks at lower energies to perform calibration. The primary calibration source is a thorite mineral with $\sim$50 Bq of $^{232}$Th and $\sim$100 Bq of $^{238}$U, with significantly smaller activity from $^{235}$U. Overall, nine datasets are utilized in this final analysis with a total LMO exposure of 2.71 kg$\times$year, corresponding to a $^{100}$Mo exposure of 1.47 kg$\times$year. As was the case in the previous analysis [19], we exclude three short periods of data taking which have an insufficient amount of calibration data to adequately perform the thermal gain correction and determine the energy calibration. We also exclude one LMO detector which has abnormally poor performance in all datasets.
Additional periods of data taking with a very high-activity $^{60}$Co source ($\sim$100 kBq, $\sim$2% of CUPID-Mo data) were performed near the regular liquid-He refills (every $\sim$10 days). While the $^{60}$Co source was primarily used for EDELWEISS [44], it was also utilized in CUPID-Mo for LD calibration via X-ray fluorescence [41] and is further described in section 4.3. The remainder of the data in CUPID-Mo is split between calibration with a $^{241}$Am+$^{9}$Be neutron source (2%) and a $^{56}$Co calibration source ($\sim$5%).

Data Production
We outline here the basic data production steps required to create a calibrated energy spectrum. Starting with AC-biased NTDs, we perform demodulation in hardware and sample the resulting voltage signals from all heat and light channels at 500 Hz to produce the raw data. We then utilize the Diana and Apollo frameworks [45,46], developed by the CUORE-0, CUORE, and CUPID-0 collaborations, with modifications for CUPID-Mo. Events are triggered "offline" in Apollo using the optimum trigger method [47] to search for pulses. This method requires an initial triggering of the data to construct an average pulse template and average noise power spectrum. These in turn are used to build an optimum filter (OF) which maximizes the signal-to-noise ratio. This OF is then used as the basis for the primary triggering. An event is triggered when the filtered data cross a set threshold relative to the typical OF resolution obtained from the average noise power spectrum for a given channel (set at a value of $10\sigma$). We periodically inject flags into the data stream to indicate noise triggers, in order to obtain a sample of noise events which allows us to characterize the noise on each channel. For this data production we utilize a 3 s time window for both the heat and light channels. This is long enough to allow sufficient time for the LMO waveform to return towards baseline whilst being short enough to keep the rate of pileup events relatively low. This choice also keeps the event windows of equal size between the LMO detectors and LDs (see Fig. 2). The first 1 s of data prior to the trigger is the pretrigger window, which is used in pulse baseline measurements. For reference, the typical 10%-90% rise and 90%-30% fall times for the LMO detectors are $\sim$20 ms and $\sim$300 ms respectively, while for the LDs they are much shorter at $\sim$4 ms and $\sim$9 ms respectively [41].
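The optimum-filter construction described above can be sketched as a frequency-domain matched filter built from a pulse template and a noise power spectrum. This is a minimal illustration: the normalization convention and the amplitude-from-maximum estimator are simplifying assumptions, not the exact Diana/Apollo implementation.

```python
import numpy as np

def build_optimum_filter(avg_pulse, noise_psd):
    """Construct a frequency-domain optimum (matched) filter from an
    average pulse template and an average noise power spectrum."""
    s = np.fft.rfft(avg_pulse)      # template spectrum
    h = np.conj(s) / noise_psd      # weight each bin by signal-to-noise
    h /= np.sum(np.abs(s) ** 2 / noise_psd)  # overall normalization constant
    return h

def filtered_amplitude(trace, h):
    """Amplitude estimate: maximum of the optimally filtered trace."""
    filt = np.fft.irfft(np.fft.rfft(trace) * h, n=len(trace))
    return filt.max()
```

The filter suppresses frequency bins where the noise is large relative to the template, which is what maximizes the signal-to-noise ratio of the amplitude estimate; the estimator is linear, so a pulse of three times the amplitude reconstructs three times larger.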
Once triggered data is available, basic event reconstruction quantities are computed, such as the waveform average baseline (the mean of the waveform in the first 80% of the pretrigger window), baseline slope, pulse rise and decay times, and other parameters that are computed directly on the raw waveform. A mapping of so-called "side" channels is generated, grouping the LDs that a given LMO crystal directly faces in the data processing framework. In each dataset, a new OF is constructed for each channel, and used to estimate the amplitude of both the LMO detector and LD events, the latter being restricted to search in a narrow range around the LMO event trigger time. After the OF amplitudes are available, thermal gain correction is performed on the LMO detectors (see section 4.1) and finally the LMO detector energy scale is calibrated from the external U/Th calibration runs (see section 4.2). Each step of the data production is done on runs within a single dataset, with the exception of the first two datasets which share a common thermal gain correction and energy calibration period to boost statistics.

Thermal Gain Correction
After we have reconstructed pulse amplitudes via the OF we must perform a thermal gain correction (sometimes referred to as "stabilization") [48]. This process corrects for thermal-gain changes in the detector response which cause slight differences in pulse amplitude for a given incident energy, resulting in artificially broadened peaks. The pulse baseline is used as a proxy for the temperature, allowing us to correct for thermal-gain changes due to fluctuations in temperature. This correction uses calibration data, from which we select a sample of events determined to be in the 2615 keV $\gamma$-ray full-absorption peak from $^{208}$Tl. We perform a fit of the OF amplitudes ($A$) as a function of the mean baselines ($b$) given by the linear function $f(b) = p_0 + p_1 \cdot b$ and compute the scaled corrected amplitude as $\tilde{A} = (A/f(b)) \cdot 2615$. This correction is applied to both calibration and physics data within a dataset. We observe that the LDs do not demonstrate any significant thermal-gain drift and as such do not perform this step on them.
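The stabilization step above amounts to a linear fit and a rescaling; a minimal sketch (function and variable names are illustrative, not those of the processing framework):

```python
import numpy as np

def thermal_gain_correction(amplitudes, baselines, peak_energy=2615.0):
    """Fit OF amplitudes of 208Tl full-absorption events vs. pulse
    baseline with f(b) = p0 + p1*b, then return a function that
    rescales any amplitude by peak_energy / f(b) to remove drift."""
    p1, p0 = np.polyfit(baselines, amplitudes, 1)

    def correct(amp, base):
        # corrected amplitude: A~ = (A / f(b)) * 2615
        return amp / (p0 + p1 * base) * peak_energy

    return correct
```

After this correction, 2615 keV events reconstruct at the same value regardless of the baseline (temperature) at which they were recorded.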

LMO Detector Calibration
To perform the energy calibration, four of the most prominent peaks from the U/Th source are utilized: 609, 1120, 1764, and 2615 keV. These peaks are fit with a model comprised of a smeared-step function and a linear component for the background, along with a Crystal Ball function [49] for the peak shape. The smeared step is modeled via a complementary error function with mean and sigma equal to those used in the peak shape. Then, the best-fit peak locations are fit against the literature values of the specified energies using a quadratic function with zero intercept, which provides the calibration from the thermal-gain-corrected amplitude to energy for each channel:
$$E(\tilde{A}) = c_1\,\tilde{A} + c_2\,\tilde{A}^2.$$
In general this fit performs well for the selected peaks used in calibration, with only minimal residuals. Using these calibration functions we can compute the deposited energy for each event; it is at this point that summed spectra from all channels become meaningful for the $0\nu\beta\beta$ decay analysis. We note that between successive datasets there is some small variation in the calibration fit coefficients for any given channel; however, this is acceptable as the calibration removes residual detector-response non-linearities that may change slightly over the course of the data taking. We check the stability of each calibration run over all datasets for each channel relative to the expected energy and find the central location of the 2615 keV peak for each channel-run to be consistent to within the channel energy resolution.
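The zero-intercept quadratic calibration is a two-parameter linear least-squares problem in the basis $(\tilde{A}, \tilde{A}^2)$; a sketch under that assumption (the actual fitting machinery may differ):

```python
import numpy as np

def fit_calibration(amp_peaks, energies=(609.0, 1120.0, 1764.0, 2615.0)):
    """Least-squares fit of E(a) = c1*a + c2*a^2 (zero intercept) to
    the stabilized amplitudes of the calibration peaks; returns a
    callable mapping amplitude to energy [keV]."""
    a = np.asarray(amp_peaks, float)
    design = np.column_stack([a, a ** 2])   # basis functions, no constant term
    c1, c2 = np.linalg.lstsq(design, np.asarray(energies, float), rcond=None)[0]
    return lambda amp: c1 * amp + c2 * amp ** 2
```

With four peaks and two parameters the fit is over-constrained, so the residuals at the peak positions directly probe any remaining non-linearity of the detector response.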

LD Calibration
The LD energy scale is calibrated using a high-activity $^{60}$Co source. This source produces 1173 and 1333 keV $\gamma$'s which interact with the LMO crystals to produce fluorescence X-rays. In particular, Mo X-rays with energy $\sim$17 keV can be fully absorbed in the LDs and used for energy calibration. We use Monte Carlo simulations to determine the energy of the X-ray peak, accounting for the expected contribution of scintillation light. We extract the amplitude of the X-ray peak for each channel using a Gaussian fit with a linear background and perform a linear calibration. Three datasets do not have any $^{60}$Co calibration available; for these we assume a constant light yield with respect to the closest dataset in time that does have a $^{60}$Co calibration and extrapolate the LD calibration instead. The combined $^{60}$Co calibration spectrum is shown in Fig. 3.

Time Delay Correction
For studies that involve the use of timing information of events in multiple crystals, a correction of the characteristic time offsets between pairs of channels is performed. This correction is done by constructing a matrix of channel-channel time delays using events that are coincident in two LMO detectors (referred to as multiplicity two, M$_2$) within a conservative ($\pm$100 ms) time window, and whose energies sum to a prominent peak in the calibration spectra. This is done to ensure the events under consideration are likely to originate from causally related interactions and not from accidental coincidences.
The timing information for an event comes from two sources: the raw trigger time and an offset from the OF. The OF time, $t_{\mathrm{OF},i}$, is the interpolated time offset which minimizes the $\chi^2$ between a pulse and the average pulse template. Together these two values are used to estimate the time difference between any two events, $i$ and $j$:
$$\Delta t_{i,j} = (t_i + t_{\mathrm{OF},i}) - (t_j + t_{\mathrm{OF},j}).$$
The distribution of this time offset for a given channel pair is computed. From this, the time offset between channels $i,j$ ($\hat{\Delta t}_{i,j}$) is estimated as the median of the distribution. Several checks of the reliability of this estimate are performed: consistency of the median and mode to within the $\sim$1 ms binning size, and that there are sufficient counts ($\geq$ 5). Any channel pair that fails either of these checks is deemed unsuitable for direct computation of $\hat{\Delta t}$, and an iterative approach is used, exploiting the fact that time differences add linearly:
$$\hat{\Delta t}_{i,j} = \hat{\Delta t}_{i,k} + \hat{\Delta t}_{k,j}.$$

Fig. 3: Calibration spectra, including the single escape peak for 2615 keV $\gamma$'s. The four most prominent peaks (denoted by larger labels) are utilized for LMO detector energy calibration. Right: calibration spectra for LDs with X-ray fluorescence during irradiation with a high-activity $^{60}$Co source. The $\sim$17 keV X-ray line is used for a linear LD absolute energy calibration.
Estimates obtained via this iterative approach are compared with values computed solely from the M$_2$ summed peaks and found to agree within $\sim$1 ms. We purposefully zero out valid channel-pair cells in the matrix to check the reliability of the iterative approach, finding it reliably reproduces the $\hat{\Delta t}$ values that are directly computable. As described in section 7, this time delay correction greatly improves our anti-coincidence cut, as the distribution of corrected time differences is much narrower (see Fig. 4).

Fig. 4: Time differences for M$_2$ events whose energies sum to a prominent peak in calibration data, for both raw times (black) and corrected times (red). Note the time scales differ between the two cases to account for the much sharper peak with corrected times. Due to the high event rate in calibration data, an elevated rate of accidental coincidences is present, leading to an elevated flat background in the $\Delta t$ distributions.
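The iterative filling of the delay matrix via the additivity of time differences can be sketched as follows; this is a simplified stand-in for the actual procedure, with missing (unreliable) entries marked as NaN:

```python
import numpy as np

def fill_delay_matrix(dt):
    """Fill missing entries (NaN) of a channel-channel median
    time-delay matrix using transitivity:
    dt[i][j] = dt[i][k] + dt[k][j] for any intermediate channel k."""
    n = dt.shape[0]
    dt = dt.copy()
    changed = True
    while changed:  # iterate until no further entries can be resolved
        changed = False
        for i in range(n):
            for j in range(n):
                if i == j or not np.isnan(dt[i, j]):
                    continue
                for k in range(n):
                    if not np.isnan(dt[i, k]) and not np.isnan(dt[k, j]):
                        dt[i, j] = dt[i, k] + dt[k, j]
                        changed = True
                        break
    return dt
```

Because each delay is (ideally) the difference of two per-channel offsets, any chain of known delays connecting channels $i$ and $j$ yields the same $\hat{\Delta t}_{i,j}$, which is what makes zeroing out known cells a meaningful closure test.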

Data Selection Cuts and Blinding
After calibration is performed, the data can be meaningfully combined for analysis. We apply a set of simple "base" cuts to remove bad events. These cuts require that an event be flagged as a signal event (i.e., not a heater or noise event), reject periods of bad detector operating conditions manually flagged due to excessive noise or environmental disturbances, reject any events with extremely atypical rise times, and reject any events with atypical baseline-slope values. Additionally, we reject all events from a single LMO that was observed to have an abnormally low signal-to-noise ratio which compromises its performance, as was done previously [19]. Beyond these base cuts, further improvements are possible with the use of more sophisticated selection cuts to remove background in order to increase the sensitivity to $0\nu\beta\beta$ decay. We expect to observe background from: spurious or pileup events, suppressed with pulse-shape discrimination cuts (see section 6); external $\gamma$ events, suppressed by removing multiple-scatter events (see section 7); $\alpha$ background, removed using LD cuts (see section 8); $\beta$ events from close sources, suppressed by delayed-coincidence cuts (see section 9); and external muon-induced events, removed with the muon veto (see section 10).
Finally, we note that all cuts are tuned without utilizing data in the vicinity of $Q_{\beta\beta}$ (3034 keV) for $^{100}$Mo. As was done previously [19], we blind the data by excluding all events in a 100 keV window centered at $Q_{\beta\beta}$. In the following sections we describe these selection cuts.

Pulse-Shape Discrimination
An expected significant contribution to the background near $Q_{\beta\beta}$ comes from pileup events, in which two or more events overlap in time in the same LMO detector. This causes incorrect amplitude estimation and shifts events into our region of interest (ROI). In order to mitigate this effect we employ a pulse-shape discrimination (PSD) cut that is comprised of two different techniques.
The main method we utilize for pulse-shape discrimination is based on principal component analysis (PCA), as was originally utilized in the previous analysis [19,50], and successfully applied recently to CUORE [17] with more details in [51]. This method utilizes $2\nu\beta\beta$ decay events between 1-2 MeV to derive a set of principal components that are used to describe typical pulse shapes for each channel-dataset. The leading principal component typically resembles an average pulse template, with subsequent components adding small adjustments. These are used to compute a quantity referred to as the reconstruction error ($RE$), which characterizes how well a given pulse with $N$ samples, $\vec{x}$, is described by a set of principal components:
$$RE = \sqrt{\sum_{i=1}^{N}\Big(x_i - \sum_{\ell} c_\ell\, w_{\ell,i}\Big)^2},$$
where $\vec{w}_\ell$ is the $\ell$-th eigenvector of the PCA, with the projection of $\vec{x}$ onto each component given by $c_\ell = \vec{w}_\ell \cdot \vec{x}$. $RE$ is energy dependent, and this is corrected for by subtracting the linear component, $\mu(E)$, and normalizing by the median absolute deviation (MAD):
$$NRE = \frac{RE - \mu(E)}{\mathrm{MAD}}.$$
The resulting normalized reconstruction error, $NRE$, is then used with an energy-independent threshold to reject abnormal events.
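A compact sketch of the PCA reconstruction error: here the principal components are obtained from an SVD of mean-subtracted training pulses, a standard construction that may differ in detail (e.g. mean handling, normalization) from the actual implementation.

```python
import numpy as np

def pca_reconstruction_error(pulses, n_components=6):
    """Derive principal components from a training set of pulses and
    return a function computing the reconstruction error of a pulse
    with respect to the spanned subspace."""
    X = np.asarray(pulses, float)
    mean = X.mean(axis=0)
    # right-singular vectors of the centered data = principal components
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    w = vt[:n_components]              # (n_components, n_samples)

    def rec_err(x):
        d = np.asarray(x, float) - mean
        proj = w.T @ (w @ d)           # projection onto the PCA subspace
        return np.sqrt(np.sum((d - proj) ** 2))

    return rec_err
```

A pulse resembling the training population projects almost entirely onto the leading components and yields a small error, while a distorted or piled-up pulse leaves a large residual orthogonal to the subspace.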

PCA Improvements
We improve several aspects of the PCA cut compared to the previous implementation [50]: we utilize a cleaner training sample, perform normalization on a run-by-run basis, and correct for the energy dependence of the MAD. Abnormal pulses in the training sample result in distortions to all principal components leading to degraded performance in both efficiency and rejection power. To mitigate this we use a stricter selection cut requiring that the pretrigger baseline RMS not be identically zero (indicative of digitizer saturation and subsequent baseline jumps), and that a simple pulse counting algorithm must identify no more than one pulse on the LMO waveform and primary LD in the event window.
This cleaner training sample allows us to utilize higher numbers of principal components without sacrificing efficiency. By performing the normalization of the reconstruction error on a run-by-run basis, as opposed to over the whole dataset, the fit for the linear component better reflects changes that may arise due to variations in noise. To correct for the energy dependence of the MAD, we require the aggregate statistics of a whole dataset. We perform a linear regression in energy and compute the average MAD of the ensemble. We then use the ratio of the linear regression function to the ensemble-average MAD as a correction to the individual channel MAD values, providing a proxy for a channel-dependent energy scaling of the MAD.
We examine the overall efficiency, the impact on the 2615 keV peak resolution, and the optimization of the median discovery significance, as suggested in Cowan et al. [52], as a function of the number of PCA components and the cut threshold. From this we choose to utilize the first 6 leading components of the PCA for this portion of the PSD cut. As seen in Fig. 5, the normalized reconstruction error has no energy dependence and is able to reject obvious abnormal pulses.

PSD Enhancements
To finalize the PSD cut we utilize two additional parameters developed in previous CUORE analyses [53]. These parameters are computed on the optimally filtered pulse itself and are measures of the goodness of fit on the left/right side of the filtered pulse, referred to as test-value-left and test-value-right (TVL and TVR) respectively. These $\chi^2$-like quantities are normalized via empirical fits of their median and MAD energy dependencies using events between 500-2600 keV. As these quantities are computed on the filtered pulses, they provide an additional proxy to detect subtle pulse-shape deviations and a complementary way to reject pileup events, especially for noisy events [54].
We observe that some pileup events still leak through the six-component PCA cut alone, primarily pileup with a short separation where the earlier pulse has a small amplitude relative to the "primary" pulse. Energy-independent cuts on TVL and TVR are able to remove a large portion of these with negligible loss of efficiency. The discrimination power of these two cuts arises from the fact that they are derived on the optimally filtered waveforms. They are sensitive to pileup in a fashion that the PCA is not and, owing to the better signal-to-noise ratio, tend to reject small-amplitude pileup events to which the PCA cut is insensitive. We combine the various pulse-shape cuts to form the final PSD cut by requiring that the absolute value of the normalized reconstruction error be less than 9, and that the absolute values of the normalized TVR and TVL quantities each be less than 10. The resulting cut maintains an efficiency comparable to the previous analysis (see section 12) while being able to reject more types of abnormal events.
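The combined PSD selection reduces to simple energy-independent thresholds on the three normalized quantities; a literal transcription of the cut values quoted above (function name is illustrative):

```python
def pass_psd(nre, tvl, tvr, nre_max=9.0, tv_max=10.0):
    """Final PSD cut: |normalized reconstruction error| < 9 and
    |normalized TVL|, |normalized TVR| < 10."""
    return abs(nre) < nre_max and abs(tvl) < tv_max and abs(tvr) < tv_max
```

Because all three inputs are already normalized to be energy independent, a single set of thresholds applies across the full spectrum.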

Anti-Coincidence
Due to the short range of electrons in LMO (up to a few mm [55]), $0\nu\beta\beta$ decay events in $^{100}$Mo would primarily be contained within a single crystal. A powerful tool to reduce backgrounds is therefore to remove events with simultaneous energy deposits in multiple LMO crystals. It is also useful to classify multi-crystal events for a background model and other analyses (e.g., transitions to excited states). We define the multiplicity, M, of an event as the total number of coincident crystals with an energy above 40 keV in a pre-determined time window. This requires measuring the relative times of events across different crystals. Previously we utilized a very conservative window of $\pm$100 ms which, due to the relatively fast $2\nu\beta\beta$ decay in $^{100}$Mo (half-life $\sim 7 \times 10^{18}$ yr [56], corresponding to $\sim$2 mHz in a 0.2 kg $^{100}$Mo-enriched LMO crystal), leads to $\sim$2% of single-crystal (M$_1$) events being accidentally tagged as two-crystal (M$_2$) events. This results in a slight pollution of the M$_2$ energy spectrum with random coincidences, as events that should be M$_1$ are incorrectly tagged as M$_2$. The channel-channel time offset correction described in section 4.4 substantially narrows the $\Delta t$ distribution amongst channel pairs, allowing a much shorter time window to be used (see Fig. 4). For this analysis we choose a coincidence window of 10 ms, which reduces the dead time due to accidental tagging of M$_1$ events as M$_2$ by a factor of $\sim$10, while also producing a purer M$_2$ spectrum. The anti-coincidence (AC) cut then ensures we only examine single-crystal events.
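The multiplicity assignment can be sketched as a clustering of time-corrected events; the greedy gap-based clustering and the function names here are illustrative assumptions, not the collaboration's implementation.

```python
import numpy as np

def assign_multiplicity(times_ms, energies_kev, window_ms=10.0, thr_kev=40.0):
    """Cluster events (corrected times, any channel): events above the
    energy threshold separated by gaps <= window_ms share a cluster,
    and each gets a multiplicity equal to the cluster size.
    Sub-threshold events get multiplicity 0."""
    order = np.argsort(times_ms)
    t = np.asarray(times_ms, float)[order]
    e = np.asarray(energies_kev, float)[order]
    keep = e >= thr_kev
    t, idx = t[keep], order[keep]
    mult = np.empty(len(t), int)
    start = 0
    for i in range(1, len(t) + 1):
        if i == len(t) or t[i] - t[i - 1] > window_ms:
            mult[start:i] = i - start   # close the current cluster
            start = i
    out = np.zeros(len(times_ms), int)
    out[idx] = mult
    return out
```

With the narrowed $\Delta t$ distributions, shrinking `window_ms` from 100 to 10 ms directly reduces the accidental-coincidence rate, since that rate scales linearly with the window length.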

Light Yield
LDs are the primary tool we use in CUPID-Mo to distinguish $\alpha$ from $\beta/\gamma$ particles in order to reduce degraded-$\alpha$ backgrounds. Using the detected LD signal relative to the energy deposited in the LMO detector, we are able to separate $\alpha$'s from $\beta/\gamma$ events, as the former have $\sim$20% of the light yield of the latter for the same heat energy release. Previously, we exploited the information provided by the LDs by using a resolution-weighted summed quantity and direct difference to select events with light signals consistent with $\beta/\gamma$'s [19]. In this analysis we modify the light cuts to utilize the correlation between both LDs associated with an LMO detector more directly. To account for the energy dependence of the light cut, we model the light band mean and width. We divide the light band into slices in energy for each channel and dataset. For each slice we perform a Gaussian fit of the LD energies to determine the mean and resolution; we then fit the means to a second-order polynomial in energy, and fit the resolutions to an empirical function of energy. This is used to determine the best estimate of the expected LD energy for a given heat energy. We define the normalized LD energy for a given LMO detector $i$ in dataset $d$ as
$$\nu_{i,j,d} = \frac{E_{i,j,d} - \mu_{i,j,d}(E)}{\sigma_{i,j,d}(E)},$$
where $j$ is the LD neighbor index, $E_{i,j,d}$ is the measured LD energy, $\mu_{i,j,d}(E)$ is the expected LD energy, and $\sigma_{i,j,d}(E)$ is the expected width of the light band. This procedure explicitly removes the energy dependence, and we note that $\nu_{i,j,d}$ has a normal distribution. We expect signal-like $\beta/\gamma$ events to have similar energies on both LDs [41]. We observe background events where the total light energy is consistent with $\beta/\gamma$ signal events but the individual LD energies are very different. This can happen due to surface events where a nuclear recoil deposits some energy onto only one LD (see [31]), or contamination on the LDs themselves. To remove these background-like events we exploit the full information of the two LDs by making a two-dimensional light cut. In particular, we expect the joint distribution of $\nu_{i,0,d}$ and $\nu_{i,1,d}$ to be a bivariate Gaussian. This is also observed in data, with minimal correlation between the two normalized LD energies; thus a simple radial cut can be defined by computing the normalized light distance, $D_L$:
$$D_L = \sqrt{\nu_{i,0,d}^2 + \nu_{i,1,d}^2}.$$
For channels which do not have two LDs we instead make a simple cut on the single normalized light energy which is available. We choose a cut of $D_L < 4$ (corresponding to $\sim$3.5$\sigma$ equivalent coverage). As shown in Fig. 6 this is sufficient to remove the $\alpha$ background, which is characterized by large negative values of $\nu_{i,j,d}$.

Fig. 6: Normalized LD energies, with the cut boundary denoted by a solid line. One LD suffers from a $^{60}$Co contamination resulting in a higher rate of events that deposit significant energy into this light detector; these are easily rejected with the light cut shown here as well as an anti-coincidence cut with the specific LD. Events populating the lower-left quadrant are $\alpha$'s from the various detectors and show the hallmark deficiency of scintillation light on both light sensors. The energy dependence of the normalized light distance is shown in the bottom figure, again with the cut boundary denoted by a solid line; $\alpha$ events are clearly well separated from the $\beta/\gamma$'s, and the normalized light distance for these events is roughly flat with energy. A cluster of events at low normalized light distance is present due to a short period of time with sub-optimal performance of a single LD.
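The two-dimensional light cut reduces to a radial distance in normalized LD-energy space; a minimal sketch, assuming the light-band mean and width at the event's heat energy are supplied externally (function names are illustrative):

```python
import numpy as np

def light_distance(e_ld0, e_ld1, mu0, sig0, mu1, sig1):
    """Normalized LD energies and radial light distance for a module
    with two light detectors; mu/sig are the fitted light-band mean
    and width evaluated at the event's heat energy."""
    nu0 = (e_ld0 - mu0) / sig0
    nu1 = (e_ld1 - mu1) / sig1
    return np.hypot(nu0, nu1)   # D_L = sqrt(nu0^2 + nu1^2)

def pass_light_cut(dl, dl_max=4.0):
    """Radial cut D_L < 4 on the bivariate-normal light band."""
    return dl < dl_max
```

Because each $\nu$ is approximately standard normal for $\beta/\gamma$ events, $D_L$ follows a Rayleigh-like distribution, and the radial threshold translates directly into a well-defined signal acceptance.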

Delayed Coincidences
A significant background for calorimeters can be surface and bulk $\alpha$ activity in the crystals themselves due to natural U/Th radioactivity (see [57] for more details). In particular, because the $Q_{\beta\beta}$ of $^{100}$Mo (3034 keV) is above most natural $\gamma$ radioactivity, the only potentially relevant isotopes are $^{208}$Tl, $^{210}$Tl and $^{214}$Bi [34]. However, given both the low contamination in the CUPID-Mo detectors and the very small branching ratio (∼0.02%) of $^{214}$Bi decay to $^{210}$Tl, this contribution is expected to be negligible. A common approach is to reject candidate $^{208}$Tl events that are preceded by a $^{212}$Bi $\alpha$ decay [16,34]. We note that for bulk activity, the $\alpha$ candidate is detected with > 99% probability, so it is the efficiency at which these events pass the analysis cuts that sets this background. For surface events, only ∼50% of $\alpha$'s reconstruct at their $Q$-value, so a delayed coincidence cut would remove only about ∼50% of surface events (see [16]). In this analysis we use the same energy and time difference criteria as previously [19]: we reject any candidate $^{208}$Tl event that is within 10 half-lives of a $^{212}$Bi $\alpha$ candidate event. We note that the CUPID-Mo detector structure, with a reflective foil and Cu holder surrounding each crystal, reduces the effectiveness of this cut for surface events. In a future experiment with an open structure (for example CUPID [39], CROSS [58], or BINGO [59]) the detection of multi-site events may significantly improve this detection probability (and therefore the cut rejection). In addition to this commonly used cut, the extremely low count rate for $\alpha$'s in CUPID-Mo, due to low contamination [60,61], enables a novel extended delayed-coincidence cut designed to remove potential $^{214}$Bi induced events. We focus on the lower part of the $^{238}$U decay chain: $^{222}$Rn $\xrightarrow{\alpha}$ $^{218}$Po $\xrightarrow{\alpha}$ $^{214}$Pb $\xrightarrow{\beta}$ $^{214}$Bi. We tag the $^{214}$Bi nuclei based on either the $^{222}$Rn or $^{218}$Po $\alpha$ decay. Compared to $^{212}$Bi → $^{208}$Tl coincidences, a much larger veto time window is required.
We set these time cuts based on a simulation of the time differences between decays in order to have a 99% probability of the decay being in the selected time range, as shown in Table 1. We veto events where there is an $\alpha$ candidate within [−100, +50] keV of the $\alpha$ $Q$-value and within the time differences in Table 1 in the same LMO detector. This energy range is chosen to fully cover the $Q$-value peaks. Despite the dead time per event being large, the total dead time is acceptable (< 1%, see section 12) thanks to the low contamination of $^{226}$Ra in the CUPID-Mo detectors. We observe several events with energy > 2600 keV that are rejected, while the events removed at lower energy are dominated by accidental coincidences of 2$\nu\beta\beta$ decays.
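The veto logic amounts to a simple time-window search per detector. A minimal sketch, assuming the well-known $^{212}$Bi half-life of ∼60.55 min and the 10-half-life window quoted above (the `delayed_coincidence_veto` helper and the event times are illustrative):

```python
def delayed_coincidence_veto(candidate_times, tag_times, veto_window):
    """Veto a candidate event if any tagging alpha decay (same detector
    assumed) precedes it by less than veto_window seconds."""
    return [any(0.0 <= t - t0 <= veto_window for t0 in tag_times)
            for t in candidate_times]

# 212Bi half-life ~ 60.55 min; the analysis vetoes for 10 half-lives
T12_BI212 = 60.55 * 60.0          # seconds
window = 10.0 * T12_BI212

tags = [0.0]                      # time of a 212Bi alpha candidate
vetoed = delayed_coincidence_veto([100.0, window + 1.0], tags, window)
```

The same structure applies to the extended $^{222}$Rn/$^{218}$Po tags, only with the (much longer) windows of Table 1.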

Muon veto coincidences
We apply an anti-coincidence cut between the LMO detectors and an active muon veto to reject prompt backgrounds from cosmic-ray muons, which may deposit energy in the ROI with a light yield (LY) similar to a $\beta/\gamma$. The muon veto system is described in detail in [62]. We utilize muon veto timestamps to compute an initial set of coincidences between LMO detectors and the veto system. We observe a clear $\Delta t$ peak of muon-induced events, with a time offset which we correct for (see Fig. 7). The muon veto coincidences are then defined using the corrected times with a window of ±5 ms. The relatively small window removes the need to also place a requirement on the number of muon veto panels triggered, maximizing the rejection of background events with minimal impact on livetime.
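The coincidence condition itself is a one-liner; a sketch assuming times already corrected for the observed $\Delta t$ offset (function name and times are illustrative):

```python
def muon_coincident(lmo_time, veto_times, half_window=0.005):
    """True if an LMO event falls within +/- 5 ms (half_window, in
    seconds) of any muon-veto trigger time."""
    return any(abs(lmo_time - tv) <= half_window for tv in veto_times)

# An LMO event 3 ms after a veto trigger is tagged as muon-induced;
# one 10 ms away is not.
tagged = muon_coincident(10.0, [9.997])
untagged = muon_coincident(10.0, [9.990])
```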

Energy spectra
After all cuts are tuned on the blinded data, we proceed to compute cut efficiencies, extract the resolution energy scaling and energy bias, and define the ROI. The application of successive cuts can be seen in Fig. 8. Starting with the base cuts, the application of the PSD cuts produces a spectrum of events originating from real physical interactions with the detector (i.e., devoid of abnormal events). We see that the spectrum is dominated by 2$\nu\beta\beta$ decay from ∼1 MeV up towards $Q_{\beta\beta}$, with few events populating the $\alpha$ region. The application of the anti-coincidence (AC) cuts removes only a small number of events, as the majority of events are single-crystal interactions. The most significant selection cut is the application of the LY cut, which removes almost all remaining events at high energies where degraded $\alpha$ events may be present.

Efficiencies
In order to compute the cut efficiencies we use three methods that span the distinct types of cuts present in this analysis: noise events for the pileup efficiency; efficiency from $\gamma$ peaks; and efficiency from the $^{210}$Po $\alpha$ peak.

The pileup efficiency is the probability that an event will not have another pulse in the same time window during which event reconstruction takes place. In addition, we check if the energy of the noise event is biased by > 20 keV. If either of these two conditions occurs, we consider the event a pileup. We compute the pileup rejection efficiency as the ratio of the noise events passing the single-trigger criterion with energy bias inside ±20 keV to the total number of noise events. We present the exposure-weighted average over all datasets in Table 2 and assign a 1% uncertainty to this calculation due to the extrapolation from noise to physics events. We note that this is equivalent to a statistical calculation based on the known trigger rate, but this method averages over varying trigger rates (in time or across channels).
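The equivalence to the statistical calculation can be checked with a toy Monte Carlo: for a Poisson trigger rate $R$ and reconstruction window $T$, the fraction of noise windows containing no extra pulse should approach $e^{-RT}$. The rate and window below are illustrative, not CUPID-Mo values:

```python
import numpy as np

rng = np.random.default_rng(0)
rate = 0.05        # assumed trigger rate (Hz), illustrative
window = 3.0       # assumed reconstruction window (s), illustrative
n_noise = 100_000  # injected noise-trigger samples

# Count how many noise windows contain no additional pulse
extra_pulses = rng.poisson(rate * window, size=n_noise)
eff_counted = np.mean(extra_pulses == 0)

# Statistical expectation from the known trigger rate
eff_analytic = np.exp(-rate * window)
```

The counting estimate tracks the analytic one; its advantage in the real analysis is that it automatically averages over rates that vary in time and across channels.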
The anti-coincidence, delayed coincidence, and muon veto cuts are not expected to have energy-dependent efficiencies and represent detector deadtimes. For each of these we evaluate the efficiency utilizing events in the $^{210}$Po $Q$-value peak at 5407 keV, as this peak has a very high energy and provides a clean sample of physical events. We extract the efficiency as $\varepsilon = N_{\mathrm{pass}}/N_{\mathrm{total}}$, integrating in a ±50 keV window around the peak; the results are listed in Table 2.
We compute the efficiency of the normalized light distance cut (i.e., the LY cut) and the PSD cut using a new method in this analysis. We fit the $\gamma$ peaks in the M$_1$ data, as they provide a clean sample of signal-like events and are a more robust population with which to evaluate the efficiency than the full set of physics events used previously [19]. In order to account for non-signal-like background events around each peak, we fit the distributions of both events passing and failing each cut to a Gaussian plus linear model. The efficiency is then given as $\varepsilon = N_{\mathrm{pass}}/(N_{\mathrm{pass}} + N_{\mathrm{fail}})$, where $N_{\mathrm{pass}}$ and $N_{\mathrm{fail}}$ are the photopeak yields from the fits to the passing and failing samples. We do not expect large variation in the cut efficiency across datasets, and in order to maintain sufficient statistics when using the $\gamma$ peaks we compute only global cut efficiencies. We estimate the uncertainty numerically by sampling from the uncertainty on the number of events in the photopeaks from the Gaussian fit. We apply the LY cut in order to gain a clean sample of events when measuring the PSD efficiency and vice-versa, which is possible due to the independence of the heat and normalized light signals. We perform this for each significant $\gamma$ peak in the M$_1$ physics data (excluding the $^{60}$Co peaks for the LY cut, as they are known to be biased due to a contaminated LD). We fit the efficiency as a function of peak energy to a linear polynomial and observe that the efficiency is consistent with being constant between 238 and 2615 keV. We extrapolate to $Q_{\beta\beta}$ to obtain the efficiency for each cut, accounting for any systematic energy dependence. These fits are shown in Fig. 9. We combine the efficiencies measured in Table 2 to determine the overall total analysis efficiency: we sample from the errors for each efficiency (assumed to be Gaussian), obtain an estimate of the probability distribution of the total efficiency, and extract the analysis cut efficiency with a Gaussian fit as $\varepsilon = (88.4 \pm 1.8)$%.

Fig. 9 Efficiency for the PSD cut (upper) and the normalized light distance cut (lower) obtained from M$_1$ $\gamma$ peaks as a function of the peak energy (black points). We fit these graphs to a linear polynomial (red line); the confidence interval of this linear fit is shown in gray.
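The numerical uncertainty propagation described above, sampling the photopeak yields, can be sketched as follows. The yields and their errors are hypothetical, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

def cut_efficiency(n_pass, err_pass, n_fail, err_fail, n_samples=200_000):
    """Efficiency eps = N_pass / (N_pass + N_fail), with the photopeak
    yield uncertainties from the Gaussian fits propagated by sampling."""
    p = rng.normal(n_pass, err_pass, n_samples)
    f = rng.normal(n_fail, err_fail, n_samples)
    eps = p / (p + f)
    return eps.mean(), eps.std()

# Hypothetical pass/fail yields for one gamma peak
eps, eps_err = cut_efficiency(950.0, 31.0, 50.0, 7.0)
```

Sampling sidesteps the correlation and non-linearity issues of naive Gaussian error propagation on a ratio.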
As there is no significant naturally occurring $\gamma$ peak near $Q_{\beta\beta}$, we must perform an extrapolation of the resolution as a function of energy, and likewise for the energy scale bias. In order to account for variations in the performance and noise of each LMO detector over time, we obtain the energy scale extrapolations on a channel-dataset basis. Due to the excellent radiopurity and the relatively fast 2$\nu\beta\beta$ decay rate, which covers most $\gamma$ peaks in the spectrum, we cannot determine this scaling directly from physics data alone. In order to have sufficient statistics, we utilize calibration data to obtain a lineshape from the 2615 keV events, which is then extrapolated to physics data.

Resolution in Calibration Data
As in [19] we perform a simultaneous fit of the 2615 keV peak in calibration data for each dataset. This fit is an unbinned extended maximum likelihood fit implemented using RooFit [64]. We model the data in each channel as
$f_{c,d}(E) = N_{c,d}\left[(1 - r_b - r_s)\,G(E; \mu_{c,d}, \sigma_{c,d}) + r_b\,B(E; b) + r_s\,S(E; \mu_{c,d}, \sigma_{c,d})\right]$,
where $c$ is the channel number, $d$ is the dataset, and the functions $B(E; b)$, $S(E; \mu_{c,d}, \sigma_{c,d})$, and $G(E; \mu_{c,d}, \sigma_{c,d})$ are normalized linear background, smeared step, and Gaussian functions. The parameter $b$ is the slope of the linear background, $\mu_{c,d}$ is the mean of the peak for channel $c$ in dataset $d$, and $\sigma_{c,d}$ is the corresponding standard deviation (resolution). $r_b$ and $r_s$ are the background and smeared step ratios (these parameters are shared for all channels), and $N_{c,d}$ is the number of events in the Gaussian peak. An example of one of these fits is seen in Fig. 10. We observe in each dataset that the core of the peak is well described by the model, with some distortion in the low-energy tail due to pileup events caused by the high event rate in calibration data. We use the individual channel-dataset widths and means in the physics data extrapolation.

Resolution in Physics Data
In order to reconstruct the resolution in physics data we use a slightly different procedure compared to [19] and [18]: we fit selected peaks with the lineshape model and extract an energy-dependent resolution function. In the previous analysis we utilized a simple Gaussian plus linear background for each peak fit on the total summed spectrum and took the ratio, $r$, of each peak resolution to that of the 2615 keV peak in the calibration summed spectrum. Here we introduce a new exposure-weighted lineshape function
$f(E; \mu, r) = \frac{1}{\sum_{c,d} w_{c,d}} \sum_{c,d} w_{c,d}\, G(E; \mu, r\,\sigma_{c,d})$,   (14)
where the summation runs over channels $c$ and datasets $d$, $w_{c,d}$ is the exposure, $G(E)$ is a Gaussian, $\mu$ is the mean of the peak, $\sigma_{c,d}$ is the calibration resolution, and $r$ is a ratio scaling from calibration to physics data. We fit each peak in the physics data summed spectrum to this lineshape plus a linear background in a binned likelihood fit, with the number of events in the peak, the linear background, $\mu$, and $r$ as free parameters.
After all peaks in physics data have been fit, we can model the resolution ratio as a function of energy. A typical functional form for the resolution of a calorimeter is
$r(E) = \sqrt{p_0 + p_1 E}$,   (15)
where the term $p_0$ is related to the baseline noise in the detector, while $p_1$ characterizes any stochastic effects that degrade the resolution with increasing energy, as in [31]. We use noise events to constrain the baseline component of the energy resolution: by fitting the distribution of noise events to the same model as the physics peaks we measure $r(0\ \mathrm{keV})$. We fit $r(E)$ for each physics peak and also the noise peak to Eq. 15, as shown in Fig. 11. As in the previous analysis, we also considered a simple linear model, $r(E) = p_0 + p_1 E$, for the resolution scaling. Previously, there were insufficient statistics in physics data to favor one model over the other; however, with the two additional datasets this linear model is disfavored, as had been seen in calibration data. Using the model in Eq. 15 we extrapolate the ratio at $Q_{\beta\beta}$ to be $r(3034\ \mathrm{keV}) = 1.126 \pm 0.052$. This number is then used to scale each of the channel-dataset dependent 2615 keV resolutions from the simultaneous lineshape fit in calibration data to resolutions at $Q_{\beta\beta}$ in physics data:
$\sigma_{c,d}(Q_{\beta\beta}) = r(Q_{\beta\beta})\,\sigma_{c,d}(2615\ \mathrm{keV})$.
These extrapolated resolutions are used to compute the containment efficiency (see section 12). The exposure-weighted harmonic mean resolution of the 2615 keV line in calibration data is (6.6 ± 0.1) keV FWHM. We use this to compute the effective resolution in physics data at $Q_{\beta\beta}$ by scaling by $r(3034\ \mathrm{keV})$, obtaining (7.4 ± 0.4) keV FWHM.
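The extrapolation can be sketched by linearizing the fit of Eq. 15, i.e., fitting $r^2 = p_0 + p_1 E$ by least squares and evaluating at $Q_{\beta\beta}$ = 3034 keV. The ratios below are illustrative placeholders, not the measured values:

```python
import numpy as np

# Hypothetical resolution ratios r(E) at gamma peaks and the noise
# "peak" at 0 keV (illustrative numbers, not the paper's values)
energies = np.array([0.0, 238.6, 583.2, 911.2, 1460.8, 2614.5])  # keV
ratios = np.array([0.95, 0.97, 1.00, 1.02, 1.06, 1.11])

# Linearize r(E) = sqrt(p0 + p1*E) as r^2 = p0 + p1*E and solve by
# least squares, then extrapolate to the 100Mo Q-value
A = np.column_stack([np.ones_like(energies), energies])
p0, p1 = np.linalg.lstsq(A, ratios**2, rcond=None)[0]
r_qbb = np.sqrt(p0 + p1 * 3034.0)
```

In practice a likelihood fit with the measured uncertainties on each $r(E)$ point would replace the unweighted least squares used in this sketch.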

Energy Bias
The total effective energy bias is also extracted from the fit performed on physics data described in section 13.2. Using the best-fit peak locations $\mu$ from the lineshape fit (Eq. 14), we fit the residuals $\mu - E_{\mathrm{lit.}}$ as a function of $E_{\mathrm{lit.}}$ to a second-order polynomial, as shown in Fig. 12. As in the previous analysis, we find the distribution is well described by this model, and we extract the energy bias at $Q_{\beta\beta}$ as $\mu - Q_{\beta\beta} = (-0.42 \pm 0.30)$ keV.

Fig. 12 The best-fit central value minus the true (literature) value for each peak in the physics data, fit with a quadratic polynomial. The residual evaluated at $Q_{\beta\beta}$ is then obtained from this fit, giving an estimate of the energy scale bias.

Model Definition
We use a Bayesian counting analysis to extract a limit on $T^{0\nu}_{1/2}$, similar to that in [19]. However, due to significant improvements in the background modelling of the CUPID-Mo data, we modify this analysis. We model our background in the ROI as the sum of an exponential and a flat component:
$B(E) = b\,\Delta E\left[\frac{f}{\Delta E} + (1 - f)\,N\,e^{-E/s}\right]$,
where $b$ is the total background index (averaged over the 100 keV blinded region) in counts/(keV · kg · year), $\Delta E$ is the width of the blinded region (100 keV), $s$ is the slope of the exponential, and $f$ is the probability of flat background. Finally, $N$ is a normalization factor for the exponential. We use a counting analysis with three bins, with the expected number of counts in bin $i$ given by
$\lambda_i = \sum_{c,d}\left[\varepsilon(c,d,i)\,\Gamma^{0\nu}\,\frac{N_A\,\eta\,(MT)_{c,d}}{m_{\mathrm{LMO}}} + (MT)_{c,d}\int_{E_{\mathrm{low}}(c,d,i)}^{E_{\mathrm{high}}(c,d,i)} B(E)\,\mathrm{d}E\right]$.
The sum is over all channels and over all datasets. $\Gamma^{0\nu}$ is the 0$\nu\beta\beta$ decay rate, $N_A$ is Avogadro's number, $MT$ is the total LMO exposure, $(MT)_{c,d}$ is the exposure for one channel/dataset, $\eta$ is the isotopic enrichment, and $m_{\mathrm{LMO}}$ is the enriched LMO molecular mass. $\varepsilon(c,d,i)$ is the total 0$\nu\beta\beta$ decay detection efficiency for channel $c$, dataset $d$, and bin $i$. This is the product of the analysis efficiency (see section 12) and the containment efficiency, i.e., the probability for a 0$\nu\beta\beta$ decay event to have energy in bin $i$ and to be M$_1$. The expected number of counts is thus the sum of a signal contribution proportional to $\varepsilon(c,d,i)\,\Gamma^{0\nu}$ and a background contribution from integrating $B(E)$ between the bounds $[E_{\mathrm{low}}(c,d,i), E_{\mathrm{high}}(c,d,i)]$, the lower and upper bounds of bin $i$. The decay rate is normalized by a constant to give the number of 0$\nu\beta\beta$ decay events. The three bins used in this analysis represent lower/upper side-bands to constrain the background, and a signal region. The energy ranges of the signal region are chosen on a channel-dataset basis (see section 14.2), and the remaining energies of the 100 keV fit region form the side-bands. The efficiencies $\varepsilon(c,d,i)$ are determined for each detector-dataset from Monte Carlo (MC) simulations, accounting for the energy resolution and its uncertainty.
Our likelihood is then given by a binned Poisson likelihood over the three bins:
$\mathcal{L} = \prod_{i=1}^{3} \mathrm{Pois}(n_i; \lambda_i)$,
where $n_i$ is the observed number of counts in bin $i$. We simultaneously minimize and sample from the joint posterior distribution using the Bayesian Analysis Toolkit (BAT [65]). Our model parameters are:
- $b$: the background index;
- $f$: the probability of flat background;
- $s$: the exponential background decay constant;
- $\Gamma^{0\nu}$: the 0$\nu\beta\beta$ decay rate.
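A minimal sketch of the three-bin Poisson log-likelihood; the expected counts per bin are hypothetical, and the full analysis sums the per-channel/dataset contributions of the previous equation into each $\lambda_i$:

```python
import math

def log_likelihood(n_obs, lambdas):
    """Binned Poisson log-likelihood over the counting bins:
    sum_i [ n_i*ln(lambda_i) - lambda_i - ln(n_i!) ]."""
    return sum(n * math.log(lam) - lam - math.lgamma(n + 1)
               for n, lam in zip(n_obs, lambdas))

# Zero observed events in all three bins (as in the unblinded data),
# with hypothetical expected counts per bin
logL = log_likelihood([0, 0, 0], [0.05, 0.02, 0.05])
```

With zero observed counts the log-likelihood reduces to $-\sum_i \lambda_i$, which is why the posterior on the rate is a simple exponential in a background-free measurement.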
We also include systematic uncertainties as nuisance parameters as described in section 14.5.

Optimization of the ROI
Due to the different performance of each channel across datasets we use different ROIs for each. These are optimized on blinded data to maximize the mean expected sensitivity, using the same procedure defined in [19]. We divide the energy range 2984-3084 keV into 0.1 keV bins for each channel-dataset, from which we extract the containment efficiency and estimated background. We rank these bins via the likelihood ratio
$R_{c,d}(E) = \frac{\mathcal{L}_S(E)}{\mathcal{L}_B(E)}$,
where $\mathcal{L}_B(E)$ is the probability that an event at energy $E$ in channel $c$ and dataset $d$ is background, and $\mathcal{L}_S(E)$ is the same for signal. The background index is assumed to be constant at $5 \times 10^{-3}$ counts/(keV · kg · year) (in the previous analysis we found this assumption does not significantly impact the results [19]). We then optimize the choice of the maximum allowed likelihood ratio to include by maximizing the mean limit-setting sensitivity of a Poisson counting analysis,
$\bar{S} = \sum_{n} S(n)\,P(n)$,
with the limit $S(n)$ being 2.3 counts in the case of zero observed events, 3.9 for one event, etc., and $P(n)$ the probability of observing $n$ counts based on the expected background rate. The chosen channel-dataset based ROIs are shown in Fig. 13, with an exposure-weighted effective ROI width of (17.1 ± 4.5) keV, corresponding to (2.3 ± 0.6) FWHM at $Q_{\beta\beta}$.
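The mean sensitivity can be sketched as a sum over Poisson outcomes. $S(0) = 2.3$ and $S(1) = 3.9$ are quoted above; the continuation to higher $n$ below uses the standard 90% CL Poisson counting values, which is an assumption of this sketch:

```python
import math

# 90% CL Poisson counting limits S(n) for n observed events; 2.3 and
# 3.9 are quoted in the text, higher n continued with standard values
S_LIMITS = {0: 2.3, 1: 3.9, 2: 5.3, 3: 6.7, 4: 8.0}

def mean_sensitivity(mu_bkg, s_limits=S_LIMITS):
    """Expected (mean) limit: sum_n P(n | mu_bkg) * S(n), with P the
    Poisson probability for the expected background mu_bkg."""
    return sum(math.exp(-mu_bkg) * mu_bkg**n / math.factorial(n) * s
               for n, s in s_limits.items())
```

Widening the ROI raises both the signal efficiency and the expected background $\mu_{\mathrm{bkg}}$; the optimization picks the likelihood-ratio threshold where the trade-off minimizes the mean limit.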

Containment Efficiency
Once the channel-dataset based ROIs have been chosen we can compute the containment efficiency for each channel and dataset pair. This efficiency is evaluated using Geant4 MC simulations, accounting for the energy resolutions extracted in section 13. The average containment efficiency is (75.9 ± 1.1)%. To estimate the systematic uncertainty from the MC simulations we vary the simulated crystal dimensions and Geant4 production cuts resulting in a 1.5% relative uncertainty.

Extraction of the Background Prior
The most significant prior probabilities in our analysis are those for the signal rate $\Gamma^{0\nu}$ and the background index $b$. Due to the very low CUPID-Mo backgrounds and a relatively small exposure, data around the ROI do not constrain $b$ well. However, detailed Geant4 modelling does provide a measurement of the background averaged over our 100 keV blinded region (a publication on the background modelling is in preparation). This fit models our experimental data in bin $i$ as (in units of counts/keV)
$f(i) = \sum_{j} C_j\,\mathrm{MC}_{j,i}$,
where the sum is over all simulated MC contributions, $\mathrm{MC}_{j,i}$ is the number of events in simulated MC spectrum $j$ and bin $i$, and $C_j$ is a normalization factor we obtain from the fit. This fit is performed using a Bayesian approach based on JAGS [66,67], similar to [68,69]. It estimates the joint posterior distribution of the parameters $C_j$, and we sample from this distribution, computing at each step in the Markov chain the background index averaged over the blinded region. From the marginalized posterior distribution of this observable we obtain our estimate of $b$. This value is used as a prior in our Bayesian fit with a split-Gaussian distribution: two Gaussian distributions with the same mode are combined such that values on either side of the mode have different variances. We have found that in the case of observing zero events, this prior does not change the observed limit. However, if some events are observed, this is a more conservative choice than a non-informative flat prior, since it prevents the background index from floating to high values that are strongly disfavored by the background model. To extract a prior on the slope of the exponential background, $s$, we perform a fit of the blinded data (between 2650 and 2980 keV) to a constant plus exponential model, as seen in Fig. 14. This results in a best fit of $s = (65.7 \pm 4.6)$ keV, which is used as a prior in our analysis. The probability $f$ of the background being uniform (instead of exponential) is given a uniform prior between [0, 1].
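A split-Gaussian prior of the kind described here can be sampled as two half-Gaussians sharing a mode, weighted by their areas so that the density is continuous at the mode. This is a sketch with illustrative parameters, not the analysis implementation:

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_split_gaussian(mode, sigma_lo, sigma_hi, n):
    """Split-Gaussian: two half-Gaussians sharing a mode, with widths
    sigma_lo below and sigma_hi above it. Each side is chosen in
    proportion to its area so the density is continuous at the mode."""
    below = rng.uniform(size=n) < sigma_lo / (sigma_lo + sigma_hi)
    mag = np.abs(rng.normal(0.0, 1.0, n))
    return np.where(below, mode - mag * sigma_lo, mode + mag * sigma_hi)

# Illustrative: mode 3.0 with asymmetric widths 1.0 (below) / 2.0 (above)
samples = sample_split_gaussian(3.0, 1.0, 2.0, 200_000)
```

With widths 1.0 and 2.0, one third of the probability mass lies below the mode, reflecting the asymmetric uncertainty of the background-model posterior.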

Systematic Uncertainties
We include systematic uncertainties in our Bayesian fit as nuisance parameters, in particular we account for uncertainties in: cut efficiencies; isotopic enrichment; containment efficiency.
These are each given a Gaussian prior distribution with the values from sections 12 and 13, as indicated in Table 3. As in [19], these uncertainties are marginalized over and are automatically included in our limit. We note that the systematic uncertainties from the energy bias and resolution scaling are incorporated in the computation of the containment efficiency. We choose a uniform prior on the rate, $\Gamma^{0\nu} \in [0, 40 \times 10^{-24}]$ yr$^{-1}$. This is consistent with standard practice for 0$\nu\beta\beta$ decay analyses [14,17,22]. The range is large enough that it has minimal impact on the possible result, and provides as little information as possible on the rate to avoid possible bias.

Fig. 15 The unblinded background spectrum near the ROI for 2.71 kg×year of data (1.47 kg×year for $^{100}$Mo). After application of all cuts we observe no events in either the ROI or the full 100 keV blinded region. In this work, the event near 3200 keV present in the previous analysis [19] was tagged as coincident with the improved muon veto. The exposure-weighted mean ROI (17.12 keV) is shown with dashed lines, and the full blinded region is within the solid lines.

Results
After unblinding our data, we observe zero events in the channel-dataset ROIs and zero events in the side-bands, as shown in Fig. 15. This leads to an upper limit on the decay rate $\Gamma^{0\nu}$, including all systematics, equivalent to $T^{0\nu}_{1/2} > 1.8 \times 10^{24}$ year (stat.+syst.) at 90% CI.
This limit surpasses our first result of $T^{0\nu}_{1/2} > 1.5 \times 10^{24}$ yr [19], becoming a new leading limit on 0$\nu\beta\beta$ decay in $^{100}$Mo. The posterior distribution of the decay rate is shown in Fig. 16. We find that this can be fit well by a single exponential, as expected for a background-free measurement:
$P(\Gamma^{0\nu}\,|\,\mathrm{CUPID\text{-}Mo}) \propto e^{-\Gamma^{0\nu}\tau}$,
where $\tau = (6.061 \pm 0.001) \times 10^{24}$ year and CUPID-Mo denotes the CUPID-Mo data. We can also extract the marginalized posterior on the background index, which is mostly consistent with the informative background-model prior. Further studies are ongoing to include extra information in the background model fit (i.e., constraints on pileup from simulation or calibration data) to reduce this uncertainty. The posterior distributions for the exponential background parameters are consistent with the priors derived from the fit of the 2$\nu\beta\beta$ decay spectrum in the energy interval 2650-2980 keV (as done previously [19]). In order to study the effect of systematics, we perform a series of fits allowing only one nuisance parameter to float at a time, with all others fixed to their priors' central values. The nuisance parameters we allow to float are the isotopic abundance, the MC containment efficiency factor, and the analysis efficiency. These are compared against fits with all parameters fixed (i.e., a statistics-only run), and against fits allowing all parameters to float. For each category of test we run ∼1000 toys, each generating $10^4$ Markov chains. We find that, relative to statistics-only runs (i.e., fixing all nuisance parameters), the effect of each nuisance parameter on the marginalized rate is less than 1%. The largest impact originates from the global analysis efficiency, at ∼0.7%. This is not surprising, as the relative uncertainty on the analysis efficiency is high compared to the other parameters.
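Given the exponential posterior above, the 90% credible limit follows directly: integrating $P(\Gamma) \propto e^{-\Gamma\tau}$ to 90% of its area gives $\Gamma_{90} = \ln(10)/\tau$, and the corresponding half-life limit $\ln 2/\Gamma_{90}$ reproduces the quoted $1.8 \times 10^{24}$ yr:

```python
import math

tau = 6.061e24  # exponential decay constant of the posterior (yr)

# For P(Gamma) ~ exp(-Gamma * tau), the 90% credible upper limit
# satisfies 1 - exp(-Gamma_90 * tau) = 0.9, i.e. Gamma_90 = ln(10)/tau
gamma_90 = math.log(10.0) / tau       # yr^-1
t_half_90 = math.log(2.0) / gamma_90  # half-life limit in yr
```

This back-of-the-envelope consistency check works precisely because the measurement is background-free: the posterior carries no non-exponential structure.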
We interpret the obtained half-life limit on 0$\nu\beta\beta$ decay in $^{100}$Mo in the framework of light Majorana neutrino exchange. We utilize $g_A = 1.27$ and phase space factors from [71,72], and consider various nuclear matrix elements from [73-80]. This results in a limit on the effective Majorana neutrino mass of $\langle m_{\beta\beta} \rangle < (0.28$--$0.49)$ eV, dependent upon the nuclear matrix element utilized. This result improves upon the previous constraint by virtue of the increased $^{100}$Mo exposure in the new processing, and is set with a very modest exposure of 1.47 kg×year of $^{100}$Mo. This is seen in Fig. 17, which shows this result in the context of other experiments, indicating the promise of $^{100}$Mo as a 0$\nu\beta\beta$ decay search isotope.

Conclusions
In this work, we implemented refined data production and analysis techniques with respect to the previous result [19]. We report a final 0$\nu\beta\beta$ decay half-life limit of $T^{0\nu}_{1/2} > 1.8 \times 10^{24}$ year (stat.+syst.) at 90% CI with a relatively modest exposure of 2.71 kg×year (1.47 kg×year in $^{100}$Mo), and a resulting limit on the effective Majorana mass of $\langle m_{\beta\beta} \rangle < (0.28$--$0.49)$ eV. We show that an iterative channel-channel time offset correction is feasible and significantly improves the ability to tag multiple-crystal events while reducing accidental coincidences. This results in a highly efficient single-scatter cut and purer higher-multiplicity spectra, which are useful for analyses such as the decay to excited states and the development of a background model. We have also shown an improved method for particle identification utilizing normalized light energy quantities derived from the absolute LD calibration. This allows for an improvement in the rejection of $\alpha$ events with high efficiency and a relatively conservative cut. The pulse shape discrimination is improved via a cleaner training sample, run-by-run normalization, and a full energy-dependence correction. It is further enhanced by the combination of pulse shape parameters derived from the optimally filtered waveform. Further improvements may be possible with better-tuned pulse templates and multivariate discrimination using portions of the waveform to allow for even more pileup rejection. Finally, the very low contamination of the LMO detectors also allows for the implementation of extended delayed coincidence cuts to reject not just $^{212}$Bi-$^{208}$Tl decay chain events, but also $^{222}$Rn-$^{214}$Bi and $^{218}$Po-$^{214}$Bi decay chain events, allowing for the reduction of the background in the high-energy region. This type of cut in particular may be especially useful for a larger-scale experiment such as CUPID [39] due to the ability to remove potentially dangerous background events.
The result of these enhanced analysis steps is a total analysis efficiency of (88.4 ± 1.8)% or, combined with the containment efficiency, a total 0$\nu\beta\beta$ decay efficiency of (67.1 ± 1.7)%. This high total efficiency, along with the low background index and the excellent energy resolution at $Q_{\beta\beta}$ of (7.4 ± 0.4) keV FWHM, shows that scintillating Li$_2$$^{100}$MoO$_4$ crystals coupled to complementary LDs are entirely feasible for a larger experiment such as CUPID. The analysis techniques developed here can be easily applied to larger datasets.
The CUPID-Mo data can be used to extract other physics results. The analysis techniques described here have been used for an analysis of $\beta\beta$ decays to excited states (publication forthcoming). Other foreseen analyses include spin-dependent low-mass dark matter searches via interaction with $^{7}$Li [63,81] in the Li$_2$MoO$_4$, and axion searches [82]. CUPID-Mo has succeeded in demonstrating the feasibility of scintillating calorimeters for use in 0$\nu\beta\beta$ decay searches, having demonstrated that backgrounds from $\alpha$'s can be easily rejected via scintillation light, and that pulse-shape rejection techniques can be utilized with high efficiency. Russian and Ukrainian scientists have made, and continue to make, crucial contributions to CUPID-Mo. For this reason, the CUPID-Mo collaboration is particularly sensitive to the current situation in Ukraine. The position of the collaboration leadership on this matter, approved by majority, is expressed at https://cupid-mo.mit.edu/collaboration#statement. The majority of the work described here was completed before February 24, 2022.