1 Introduction

Antineutrinos were detected for the first time in 1956 by Clyde Cowan and Fred Reines, who recorded the transmutation of free protons induced by antineutrinos produced in a nuclear reactor [1]. This detection confirmed the existence of the neutrino and marked the advent of experimental neutrino physics. For decades, neutrino research has been an active and fruitful pursuit in the fields of particle physics, astrophysics, and cosmology. Experimental neutrino physics has provided a glimpse into some of the most obscured astrophysical phenomena in the universe. The confirmed neutrino and/or antineutrino sources include the Sun [2], nuclear reactors, particle accelerators [3], the Earth [4, 5], the atmosphere [6], and core-collapse supernovae [7, 8]. Moreover, the quantum mechanical phenomenon known as neutrino oscillation is the first observed evidence of physics beyond the Standard Model [9], and it possibly holds the key to explaining the matter-dominated universe [10]. Since the discovery of the neutrino, reactor antineutrinos have continued to make major contributions to studies of neutrino properties, including the measurement of oscillation parameters, the neutrino mass ordering, and even the search for “sterile” neutrino flavours [11, 12]. Antineutrinos have also enabled the new interdisciplinary field of neutrino geoscience. Geoneutrinos are the only direct probe of radiogenic heat in the depths of the Earth, particularly in the mantle, and can help discriminate between different geological models of the Earth’s formation and evolution. The number of electron-flavour antineutrinos emitted in the radioactive decays of heat-producing elements (HPEs) with lifetimes compatible with the age of the Earth, such as \(^{232}{\hbox {Th}}\), \(^{235}{\hbox {U}}\), \(^{238}{\hbox {U}}\), and \(^{40}{\hbox {K}}\), is directly proportional to the HPE abundances and hence to the Earth’s radiogenic heat. Borexino (Italy) and KamLAND (Japan) are the only two experiments to have observed geoneutrinos.
The most recent geoneutrino measurements are \(52.6^{+9.4}_{-8.6}\) (stat)\(^{+2.7}_{-2.1}\) (sys) events in Borexino [13] and \(174^{+31}_{-29}\) events in KamLAND [14], corresponding to 18% and 17% precision, respectively. Extending these measurements to higher statistics and to other parts of the world will give a better understanding of the abundances of radioactive elements in the crust and mantle. SNO+ [15] (Canada) and JUNO [16] (China) will attempt geoneutrino measurements as part of their physics programs in the near future. Of special interest is the concept of placing a neutrino detector on the seafloor: as the oceanic crust is much thinner than the continental one, mantle contributions would dominate the measured geoneutrino flux. This idea was originally developed for Hanohano [17], to be deployed on the ocean floor near Hawaii, and has more recently been reinvigorated by a collaboration in Japan with the Ocean Bottom Detector (OBD) [18]. Other proposed experiments that hope to measure geoneutrinos are Jinping [19] in China and the Theia multipurpose detector [20]. The latter is the focus of this paper. Theia is a proposed large-scale scintillation-based neutrino detector that will deploy new target media, photon detectors, readout techniques, and reconstruction algorithms to help discriminate between Cherenkov and scintillation signals. The considered detector design consists of a cylindrical tank viewed by inward-looking photomultiplier tubes (PMTs) and filled with water-based liquid scintillator (WbLS) [21], a mixture of water and an organic oil-based scintillator combined using surfactants. This novel target combines the directional sensitivity of the Cherenkov signal with the low energy threshold and good energy resolution of the scintillation signal. Hence, Theia has a broad physics program ranging from low-energy solar to high-energy accelerator neutrinos [20].
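The quoted relative precisions can be roughly reproduced with the short calculation below. It is a sketch under our own assumptions: independent uncertainty components are added in quadrature separately for the upper and lower sides, and the two sides are then averaged; the analyses in [13, 14] may combine asymmetric errors differently.

```python
import math

def relative_precision(central, up_errors, down_errors):
    """Relative 1-sigma precision of a measurement with asymmetric errors.

    Assumption (ours): components are added in quadrature per side and the
    upper and lower totals are averaged into one symmetric uncertainty.
    """
    up = math.sqrt(sum(e ** 2 for e in up_errors))
    down = math.sqrt(sum(e ** 2 for e in down_errors))
    return 0.5 * (up + down) / central

# Borexino: 52.6 (+9.4/-8.6 stat) (+2.7/-2.1 sys)
borexino = relative_precision(52.6, [9.4, 2.7], [8.6, 2.1])
# KamLAND: 174 (+31/-29), total uncertainty as quoted in the text
kamland = relative_precision(174.0, [31.0], [29.0])

print(f"Borexino: {borexino:.1%}, KamLAND: {kamland:.1%}")
```

With these conventions Borexino comes out at about 17.7% and KamLAND at about 17.2%, consistent with the rounded values in the text.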

Fig. 1

A sketch of the Theia-25 site in the planned fourth DUNE cavern at SURF and a box-shaped detector outline. The maximum space available in terms of fiducial volume is shown.

In this paper, we consider a Theia design that would fit in a cavern the size and shape of those intended for the Deep Underground Neutrino Experiment (DUNE) at Sanford Underground Research Facility (SURF), which we call Theia-25. Figure 1 shows a sketch of the available space on-site and a proposed design for Theia-25, a letterbox detector of dimensions \({70}\,{\hbox {m}}\times {20}\,{\hbox {m}}\times {18}\,{\hbox {m}}\).
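As a rough consistency check on the detector size, the letterbox dimensions can be converted into a total mass. The sketch assumes a target density equal to that of water, which is our approximation for 3% WbLS, and gives the gross volume rather than the fiducial volume.

```python
# Letterbox dimensions quoted in the text
LENGTH_M, WIDTH_M, HEIGHT_M = 70.0, 20.0, 18.0
# Assumption (ours): WbLS density approximated by that of water, 1 t/m^3
DENSITY_T_PER_M3 = 1.0

volume_m3 = LENGTH_M * WIDTH_M * HEIGHT_M
mass_kt = volume_m3 * DENSITY_T_PER_M3 / 1000.0  # 1 kt = 1000 t
print(f"volume = {volume_m3:.0f} m^3, mass = {mass_kt:.1f} kt")
```

The result, about 25 kt, is consistent with the “25” in the detector name.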

We explore the sensitivity of Theia-25 to geoneutrinos based on simulations of the equivalent of one year of data taking. A first high-statistics geoneutrino measurement in North America by Theia would be complementary to the measurements in Asia and Europe. A combined analysis, with contributions from experiments across the globe, is critical for disentangling the contributions of the crust and mantle. Theia’s good energy resolution also offers the potential to extract the Th/U mass ratio from a spectral fit. We also estimate the sensitivity of Theia at SURF to antineutrinos from nuclear reactors at baselines of more than 700 km. In Sect. 2, we give a short overview of the antineutrino sources, the advantages of their detection, and the assumptions on the main background sources. We then describe the full Monte Carlo simulations, including all inputs, in Sect. 3. Section 4 presents the analysis methods, while Sect. 5 focuses on the results.

2 Signals and backgrounds

2.1 Antineutrino signals

The main natural antineutrino sources expected at Theia-25 at SURF are radioactive elements in the crust and mantle of the Earth. Geoneutrinos are electron flavour antineutrinos emitted inside the Earth, in the radioactive decays of HPEs with lifetimes comparable with the age of the Earth \(({4.54\cdot 10^9}\,{\hbox {year}})\): \(^{232}{\hbox {Th}}\) (\(T_{1/2}= {1.40\cdot 10^{10}}\,{\hbox {year}}\)), \(^{238}{\hbox {U}}\) (\(T_{1/2}= {4.47\cdot 10^9}\,{\hbox {year}}\)), \({^{235}}{\hbox {U}}\) (\(T_{1/2}= {7.04\cdot 10^{8}}\,{\hbox {year}}\)), and \(^{40}{\hbox {K}}\) (\(T_{1/2}= {1.25\cdot 10^9}\,{\hbox {year}}\))  [22]:

$$\begin{aligned} ^{238}{\hbox {U}} \rightarrow&\, ^{206}{\hbox {Pb}} + 8\alpha + 6 e^{-} + 6 \bar{\nu }_e + {51.7}\,\,\hbox {MeV}\, \\ ^{235}{\hbox {U}} \rightarrow&\, ^{207}{\hbox {Pb}} + 7\alpha + 4 e^{-} + 4 \bar{\nu }_e + {46.4}\,\,\hbox {MeV}\, \\ ^{232}{\hbox {Th}} \rightarrow&\, ^{208}\hbox {Pb} + 6\alpha + 4 e^{-} + 4 \bar{\nu }_e + {42.7}\,\,\hbox {MeV}\, \\ ^{40}{\hbox {K}} \rightarrow&\, ^{40}{\hbox {Ca}} + e^{-} + \bar{\nu }_e + {1.31}\,\,\hbox {MeV}\,~\mathrm {(89.3\%)} \\ ^{40}{\hbox {K}} + e^{-} \rightarrow&\, ^{40}{\hbox {Ar}} + \nu _{e} + 1.505\, {\hbox {MeV}}\,~\mathrm {(10.7\%)}. \end{aligned}$$

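Since \(^{40}{\hbox {K}}\) electron capture produces neutrinos rather than antineutrinos, and the antineutrinos from \(^{40}{\hbox {K}}\) \(\beta ^-\) decay (endpoint \({1.31}\,{\hbox {MeV}}\)) lie below the IBD detection threshold of \({1.806}\,{\hbox {MeV}}\), \(^{40}{\hbox {K}}\) geoneutrinos are not discussed in this analysis. Geoneutrino measurements can shed light on the abundances and distributions of radioactive elements inside the Earth, beyond the reach of direct measurements by sampling. In each decay chain, the radiogenic heat released is in a well-known ratio to the number of emitted geoneutrinos, providing a way to directly assess the Earth’s heat budget [23].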
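The fixed heat-to-antineutrino ratio can be illustrated from the chain equations above. The sketch below first checks mass- and charge-number bookkeeping for each chain, then divides the Q value by the number of \(\beta ^-\) decays, i.e. the number of emitted antineutrinos. Note that this total Q value includes the energy carried away by the antineutrinos themselves, so the heat actually deposited in the Earth per antineutrino is somewhat smaller; the numbers only illustrate why the ratio is fixed for each chain.

```python
# name: (parent A, parent Z, daughter A, daughter Z, n_alpha, n_beta, Q in MeV)
CHAINS = {
    "U-238": (238, 92, 206, 82, 8, 6, 51.7),
    "U-235": (235, 92, 207, 82, 7, 4, 46.4),
    "Th-232": (232, 90, 208, 82, 6, 4, 42.7),
}

def check_chain(a_parent, z_parent, a_daughter, z_daughter, n_alpha, n_beta):
    """Mass and charge number bookkeeping for a chain of alpha and beta-minus decays."""
    mass_ok = a_parent - a_daughter == 4 * n_alpha
    # each alpha removes 2 units of charge, each beta-minus adds 1
    charge_ok = z_parent - z_daughter == 2 * n_alpha - n_beta
    return mass_ok and charge_ok

q_per_antineutrino = {}
for name, (ap, zp, ad, zd, na, nb, q) in CHAINS.items():
    assert check_chain(ap, zp, ad, zd, na, nb), name
    # one electron antineutrino is emitted per beta-minus decay
    q_per_antineutrino[name] = q / nb
    print(f"{name}: {q / nb:.2f} MeV released per antineutrino")
```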

In addition to antineutrinos produced within the Earth, nuclear power plants produce abundant antineutrinos and are the strongest man-made source. Many nuclei produced in the fission of nuclear fuel undergo \(\beta \) decays, emitting electron antineutrinos with energies up to \({10}\,{\hbox {MeV}}\). The closest reactors to SURF are Monticello (\({790}\,{\hbox {km}}\)), Cooper (\({802}\,{\hbox {km}}\)), Prairie Island (\({884}\,{\hbox {km}}\)), and Wolf Creek (\({790}\,{\hbox {km}}\)), and together they contribute around 10% of the reactor signal. Additionally, 24 active reactor cores located between 1130 km and 1450 km from SURF contribute more than one third of the total estimated reactor signal, based on the 2021 load-factor data. These reactors are predominantly located to the east and south-east of the site.
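Why many distant cores can outweigh a few nearby ones follows from the \(1/(4\pi L^2)\) scaling of the unoscillated flux. The sketch below is purely illustrative: every plant or core is given the same hypothetical unit power, the 24 distant cores are placed at a typical 1300 km (our assumption), and oscillations, load factors, and fuel composition are ignored. The ratio implied by the text (more than one third versus about 10%) differs because real thermal powers, load factors, and numbers of units differ.

```python
import math

def relative_flux(cores):
    """Sum of P / (4 pi L^2) over reactor cores, in arbitrary units.

    cores: iterable of (thermal_power, baseline_km) pairs. Oscillations,
    load factors and fuel composition are deliberately ignored.
    """
    return sum(power / (4.0 * math.pi * l_km ** 2) for power, l_km in cores)

# Hypothetical inputs: every plant or core gets the same unit power.
nearest = [(1.0, 790.0), (1.0, 802.0), (1.0, 884.0), (1.0, 790.0)]
distant = [(1.0, 1300.0)] * 24   # assumption: 24 cores at a typical 1300 km

ratio = relative_flux(distant) / relative_flux(nearest)
print(f"distant/nearest flux ratio = {ratio:.2f}")
```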

Antineutrinos can be detected via the inverse beta decay (IBD) reaction:

$$\begin{aligned} \bar{\nu }_{e} + p \rightarrow e^{+} + n, \end{aligned}$$
(1)

in which the free protons of hydrogen nuclei act as the target. IBD is a charged-current interaction that proceeds only for electron-flavoured antineutrinos. Since the combined mass of the neutron and positron is greater than the mass of the proton, the IBD interaction has a kinematic threshold of \({1.806}\,{\hbox {MeV}}\). A positron and a neutron are emitted as reaction products. The positron promptly comes to rest and annihilates, emitting two 511 keV \(\gamma \)-rays from para-positronium or three \(\gamma \)-rays from ortho-positronium decays, yielding a “prompt” signal with a visible energy \(E_\text {vis}\) that is directly correlated with the incident antineutrino energy \(E_{\bar{\nu }_e}\):

$$\begin{aligned} E_\text {vis} \sim E_{\bar{\nu }_e} - {0.784}\,{\hbox {MeV}}. \end{aligned}$$
(2)

The offset results mostly from the difference between the \({1.806}\,{\hbox {MeV}}\) absorbed from \(E_{\bar{\nu }_e}\) to make the IBD kinematically possible and the \({1.022}\,{\hbox {MeV}}\) released in the positron annihilation. The emitted neutron initially retains information about the \({\bar{\nu }_e}\) direction. However, the neutron is detected only indirectly, after it is thermalized and captured, mostly on a proton. Such a capture leads to the emission of a \({2.22}\,{\hbox {MeV}}\) \(\gamma \)-ray, which typically interacts through several Compton scatterings and produces the delayed signal.
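Both the threshold and the offset in Eq. (2) follow from the particle masses. The sketch below uses PDG mass values (supplied by us) and approximates the positron kinetic energy as \(E_{\bar{\nu }_e}\) minus the threshold, neglecting the small neutron recoil and all detector effects.

```python
# PDG masses in MeV (values supplied for this sketch)
M_P = 938.272088
M_N = 939.565420
M_E = 0.51099895

# Threshold of anti-nu_e + p -> e+ + n on a proton at rest
E_THRESHOLD = ((M_N + M_E) ** 2 - M_P ** 2) / (2.0 * M_P)

def prompt_visible_energy(e_nu):
    """Approximate prompt visible energy (MeV) of an IBD event, as in Eq. (2).

    Positron kinetic energy approximated as (E_nu - threshold), neutron
    recoil neglected, plus the 2 x 511 keV released in annihilation.
    """
    if e_nu < E_THRESHOLD:
        raise ValueError("antineutrino energy below the IBD threshold")
    return (e_nu - E_THRESHOLD) + 2.0 * M_E

print(f"threshold = {E_THRESHOLD:.3f} MeV")
print(f"offset    = {E_THRESHOLD - 2.0 * M_E:.3f} MeV")
```

The two printed values are 1.806 MeV and 0.784 MeV, matching the numbers used in the text.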

2.2 Overview of background sources

The time and spatial coincidence between prompt and delayed signals offers a clean topology for \(\bar{\nu }_e\) IBD interactions, which strongly suppresses backgrounds. Nevertheless, some non-antineutrino backgrounds can imitate the IBD signature. The rates of these backgrounds depend on the selection cuts applied in the search for prompt and delayed event coincidences. Assuming the same neutron capture time in WbLS as in water, \(\tau _\text {n}={202}\,\upmu {\hbox {s}}\) [24], an upper bound of \(\varDelta \text {t}_\text {cut}\equiv \,{1}\,{\hbox {ms}}\), corresponding to approximately \(5\tau _\text {n}\), is defined for the time window used to search for correlated IBD pair candidates. The final selection cuts on the space and time correlation have been optimized and are discussed in Sect. 4.4. Backgrounds mimicking the IBD interaction can be divided into two categories: (1) two independent sources produce prompt- and delayed-like events, or (2) a single physical process produces both the prompt and delayed events, correlated in space and time. We refer to these categories as accidental and correlated backgrounds, respectively.
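Assuming exponentially distributed capture times, the fraction of neutron captures falling inside the 1 ms window can be computed directly. This sketch ignores neutrons that escape the detector before capture.

```python
import math

TAU_N_US = 202.0    # neutron capture time assumed in the text (water value)
DT_CUT_US = 1000.0  # upper bound of the coincidence window

# Fraction of exponentially distributed capture times inside the window
efficiency = 1.0 - math.exp(-DT_CUT_US / TAU_N_US)
print(f"window = {DT_CUT_US / TAU_N_US:.2f} tau_n, capture efficiency = {efficiency:.4f}")
```

The window spans about 4.95 \(\tau _\text {n}\) and retains more than 99% of captures.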

In the following, we give a brief overview of the backgrounds that can potentially contribute to the antineutrino search and classify them into those that require Monte Carlo simulations and those that can be safely neglected for this study.

Accidentals Radioactive impurities in the PMT glass bulbs or dissolved in the WbLS are the primary sources of accidental background. The cleaner the PMT glass and target material, the lower the probability that an accidental coincidence satisfies the prompt and delayed event-pair selection. All these contributions were simulated and are discussed in Sect. 3.3.
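A common first estimate of the accidental rate, for uncorrelated Poisson singles, is the product of the prompt-like and delayed-like singles rates with the coincidence window, times the probability that an unrelated delayed-like event passes the distance cut. This shows why cleanliness matters: the accidental rate grows with the product of the two singles rates. All rates and the spatial fraction below are hypothetical placeholders, not values from this study, which estimates accidentals by full simulation.

```python
def accidental_rate(rate_prompt_hz, rate_delayed_hz, window_s, spatial_fraction=1.0):
    """Rate (Hz) of random prompt-delayed pairs passing a time window.

    Standard approximation for uncorrelated Poisson singles:
    R_acc = R_prompt * R_delayed * window * spatial_fraction, where
    spatial_fraction (hypothetical) is the probability that an unrelated
    delayed-like event falls within the distance cut.
    """
    return rate_prompt_hz * rate_delayed_hz * window_s * spatial_fraction

SECONDS_PER_YEAR = 365.25 * 24 * 3600

# Hypothetical singles rates after energy and fiducial cuts
r_acc = accidental_rate(rate_prompt_hz=1.0, rate_delayed_hz=0.5, window_s=1e-3,
                        spatial_fraction=0.01)
print(f"{r_acc:.1e} Hz -> {r_acc * SECONDS_PER_YEAR:.0f} pairs per year")
```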

Table 1 Summary of the (\(\beta ^-\), n)-emitting spallation products of interest inside Theia-25 and their expected rates inside the whole volume before and after a \({1}\,{\hbox {s}}\) muon veto cut
Table 2 Most abundant single-\(\beta \)-emitting spallation products after a \({1}\,{\hbox {s}}\) muon veto cut

Correlated from cosmogenic backgrounds A high-energy cosmic muon can produce a large number of radioisotopes by spallation of nuclei along its path. Although in most cases these muons are easily recognized, as they leave a very clear signature in the detector by losing about \({2}\,{\hbox {MeV}}/{\hbox {cm}}\), the spallation products can be missed, and their subsequent decays can create an IBD-like signal. Short-lived radioisotopes can be significantly suppressed by a time veto after each muon. However, long-lived radioisotopes can create a coincidence background long after the muon has been detected.

A previous study using data from Super-Kamiokande and FLUKA simulations evaluated the yields of radioactive spallation products along muon tracks in water [25]. Assuming the same yields in WbLS, the expected spallation production rates inside Theia-25 are evaluated using Eq. 3:

$$\begin{aligned} \phi _i= & {} Y_i B_i P_i(\mu _\text {cut}) \cdot \varPhi _\mu \nonumber \\= & {} Y_i B_i \exp {\left( -\ln {2}\,\frac{\mu _\text {cut}}{T_{{1/2}_i}}\right) } \cdot \int \limits _\theta \phi _\mu \mathcal {A} l_\mu \textrm{d}\theta , \end{aligned}$$
(3)

where \(Y_i\) is the yield of radioisotope i taken from Ref. [25], \(B_i\) is the branching ratio for the considered decay, \(P_i\) is the probability of the isotope with half-life \(T_{{1/2}_i}\) to survive the muon veto cut \(\mu _\text {cut}\), and \(\varPhi _\mu \) is the product of the muon flux and its path length inside the detector, integrated over all directions. The term \(\varPhi _\mu \) does not depend on the radioisotope i and can be calculated with the average path length of all muons inside Theia-25 \(\bar{l}_\mu \), assuming an overall angular distribution of cosmic muons to be \(\propto \cos ^n\theta _\text {Z}\)  [26], the coverage area above Theia-25 \(\mathcal {A}\), and the integrated muon flux, measured previously at SURF to be \(\phi _{\mu } = {5.31\times 10^{-9}}\,{\upmu /{\hbox {s}}/{\hbox {cm}}^{2}}\)  [27].
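The survival factor \(P_i(\mu _\text {cut})\) in Eq. 3 is what separates isotopes that a 1 s veto removes from those it does not. The half-lives below are standard nuclear data values supplied by us (Table 1 itself is not reproduced here); the sketch evaluates only this factor, not the full rate.

```python
import math

# Half-lives in seconds (standard nuclear data values, supplied for this sketch)
HALF_LIVES_S = {"Li-9": 0.178, "He-8": 0.119, "N-17": 4.17}

def veto_survival(half_life_s, veto_s=1.0):
    """P_i in Eq. (3): probability that an isotope survives the muon veto."""
    return math.exp(-math.log(2.0) * veto_s / half_life_s)

for isotope, t_half in HALF_LIVES_S.items():
    print(f"{isotope}: P = {veto_survival(t_half):.3g}")
```

For a 1 s veto, \(^{9}{\hbox {Li}}\) and \(^{8}{\hbox {He}}\) are suppressed by roughly two orders of magnitude or more, while most \(^{17}{\hbox {N}}\) nuclei survive, in line with the discussion that follows.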

Spallation products of interest can be divided into two categories: (\(\beta ^-\), n) emitters, which create a correlated background, and \(\beta ^\pm \) emitters, which can decay in coincidence with another spallation product from the same muon, creating a background that is fundamentally accidental but partly correlated in time and space. Tables 1 and 2 summarize the longest-lived (\(\beta ^-\), n) and \(\beta ^\pm \) radioisotopes of interest.

A dedicated study has been performed to evaluate the impact of a muon veto cut on the spallation rates. For correlated backgrounds, the contributions of \(^{9}{\hbox {Li}}\) and \(^{8}{\hbox {He}}\) can be strongly suppressed by a \({1}\,{\hbox {s}}\) muon veto cut. Its impact on the rates is shown by comparing the values in the last two columns of Table 1. This time cut results in an experimental dead time of 6.7%, which is applied to the signal and to the other background rates.
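A dead time of this size can be understood as the muon rate times the veto length. In the sketch below the muon-collecting area is taken as a 70 m × 18 m face, which is our assumption (the effective area used in the study is not stated here), and the angular distribution of the muons is ignored. The overlap-corrected value accounts for a later muon restarting the veto.

```python
import math

PHI_MU = 5.31e-9      # muons / s / cm^2 at SURF (value quoted in the text)
VETO_S = 1.0          # muon veto window in seconds
CM2_PER_M2 = 1.0e4

# Assumption (ours): muon-collecting area taken as a 70 m x 18 m face
area_cm2 = 70.0 * 18.0 * CM2_PER_M2
muon_rate_hz = PHI_MU * area_cm2

# Simple product: fraction of time vetoed if vetoes never overlap
dead_linear = muon_rate_hz * VETO_S
# Poisson arrivals with overlapping vetoes merged (extending dead time)
dead_poisson = 1.0 - math.exp(-muon_rate_hz * VETO_S)

print(f"R_mu = {muon_rate_hz:.4f} Hz, dead time = {dead_linear:.1%} "
      f"(linear) or {dead_poisson:.1%} (overlap-corrected)")
```

Under this area assumption the simple product gives about 6.7%, and the overlap correction lowers it slightly to about 6.5%.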

The probability that a radioisotope with half-life \(T_{1/2}\) decays within a time interval \(\varDelta \text {t}\) is \(P = 1-\exp {\left( -\varDelta \text {t}\ln {2}/T_{1/2}\right) }\). Taking into account all combinations of decays between the isotopes of Table 2, the expected number of coincidences is estimated to be less than \(9\times 10^{-4}\) events per year for \(\varDelta \text {t}=\varDelta \text {t}_\text {cut}\). Therefore, of all the spallation products of cosmogenic muons, only the leading contribution of \(^{17}{\hbox {N}}\) is considered. The details of the simulation are discussed in Sect. 3.3.

Correlated from fast neutrons A fast neutron (with an energy \(\ge \,{1}\,{\hbox {MeV}}\)) interacting with a WbLS nucleus can imitate the IBD signature: a recoil proton is misidentified as the prompt signal, and the neutron is subsequently thermalized and captured on hydrogen, producing a correlated delayed signal. The recoil proton from neutron elastic scattering produces ionization whose light output is quenched, as described by Birks’ law [28], an empirical formula for the light yield per path length as a function of the energy loss per path length of an ionizing particle. Birks’ constant, \(k_B\), which characterizes this process, has not yet been measured in WbLS at the MeV scale, but can be interpolated from measurements at BNL with a high-energy proton beam [29].
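The effect of Birks’ law on proton recoils can be seen by comparing the light output for a low and a high stopping power. In this sketch the value of \(k_B\) and both \(dE/dx\) values are hypothetical illustrations (the text notes that \(k_B\) has not been measured for WbLS at the MeV scale), and \(dE/dx\) is held constant along the track, which is not true for a real stopping proton.

```python
def birks_light_per_length(dedx, k_b, light_yield=1.0):
    """Birks' law: dL/dx = S * (dE/dx) / (1 + k_B * dE/dx).

    dedx in MeV/cm, k_b in cm/MeV, light_yield S in photons/MeV.
    """
    return light_yield * dedx / (1.0 + k_b * dedx)

def quenching_factor(dedx, k_b):
    """Ratio of quenched to unquenched light for a constant dE/dx."""
    return birks_light_per_length(dedx, k_b) / dedx

K_B = 0.01  # cm/MeV, hypothetical value (not measured for WbLS)
for label, dedx in [("minimum-ionizing-like", 2.0), ("slow-proton-like", 100.0)]:
    print(f"{label}: quenching factor = {quenching_factor(dedx, K_B):.2f}")
```

With these illustrative inputs the light of the high-\(dE/dx\) deposit is halved, while the minimum-ionizing-like deposit loses only about 2%, which is why proton recoils appear at lower visible energy than electrons of the same energy.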

These fast neutrons are also spallation products of high-energy cosmogenic muons passing through matter. Their rates are therefore estimated using the same parent cosmogenic muon flux as above. The characteristic fast-neutron thermalization time constant, during which a proton recoil may create a prompt signal, is \({5.3}\,\upmu {\hbox {s}}\) [30], after which the neutron capture time is taken as \(\tau _\text {n}={202}\,\upmu {\hbox {s}}\). With the \({1}\,{\hbox {s}}\) muon veto cut set above, identification of the parent muon strongly suppresses all fast neutrons produced inside Theia-25. Fast neutrons produced by visible muons are therefore not considered in this analysis.

However, muons missing the detector may produce fast neutrons in the surrounding rock, and these can reach the Theia-25 inner volume. A previous study [31] provides a parametrization of the expected fast neutron flux produced along cosmic muon tracks at various underground laboratories (\(\phi _n={5.39\times 10^{-10}}\,{\hbox {cm}}^{-2}\,{\hbox {s}}^{-1}\) at SURF), the neutron multiplicity (\(m_n=7.02\)) and mean energy (\(\langle E_n \rangle = {98}\,{\hbox {MeV}}\)), together with the fraction of neutrons detected as a function of distance from the muon track. Typically, the neutron flux is attenuated by about two orders of magnitude at distances larger than \({3.5}\,{\hbox {m}}\) from the muon track; however, as much as 10% remains at distances of 2 to \({2.5}\,{\hbox {m}}\) [31].

Integrating outward from the Theia-25 edges, the muon-induced neutron flux in the rock surrounding the detector yields a rate of:

$$\begin{aligned} m_n \phi _n \int \limits _0^\infty (2x(L+H)+4x^2)e^{-x}\textrm{d}x = {6.70\cdot 10^{-3}}{{\hbox {n}}/{\hbox {s}}}, \end{aligned}$$
(4)

with \(L={70}\,{\hbox {m}}\) the Theia-25 length, \(H={18}\,{\hbox {m}}\) the Theia-25 width, and x the distance from the Theia-25 edges. The fast-neutron contribution to the correlated background is therefore retained, and its simulation is discussed in Sect. 3.3.
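The integral in Eq. 4 can be evaluated in closed form. The bracket is the area of a frame of width x around an \(L\times H\) rectangle, since \((L+2x)(H+2x)-LH = 2x(L+H)+4x^2\), and the exponential moments follow from \(\int _0^\infty x^k e^{-x}\,\textrm{d}x = k!\). In this form x, L, and H are implicitly expressed in units of the attenuation length that makes the exponent dimensionless; this is our reading of Eq. 4, not a statement from [31].

$$\begin{aligned} \int \limits _0^\infty \left( 2x(L+H)+4x^2\right) e^{-x}\,\textrm{d}x = 2(L+H)\int \limits _0^\infty x\,e^{-x}\,\textrm{d}x + 4\int \limits _0^\infty x^2 e^{-x}\,\textrm{d}x = 2(L+H) + 8. \end{aligned}$$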

Correlated from atmospheric neutrino neutral current interaction Atmospheric neutrinos can interact by neutral current quasielastic nucleon knock-out (NCQE) process on \(^{16}{\hbox {O}}\):

$$\begin{aligned} \nu + ^{16}{\hbox {O}}\rightarrow & {} \nu + n + ^{15}{\hbox {O}}^{*} \end{aligned}$$
(5)
$$\begin{aligned} \nu + ^{16}{\hbox {O}}\rightarrow & {} \nu + p + ^{15}{\hbox {N}}^{*}. \end{aligned}$$
(6)

This interaction becomes dominant for \(E_\nu \ge {200}\,{\hbox {MeV}}\) up to \({1}\,{\hbox {GeV}}\), where the neutral-current inelastic process without nucleon knock-out, \(\nu + ^{16}{\hbox {O}} \rightarrow \nu + ^{16}{\hbox {O}}^{*}\), takes over [32]. The process in Eq. 5 is a background to the low-energy antineutrino search through IBD, because the excited \(^{15}{\hbox {O}}^{*}\) immediately de-excites to \(^{15}{\hbox {O}}\), emitting \(\gamma \)-rays. Reference [32] provides a theoretical treatment of the probabilities of occurrence of the different excited states [33], summarized in Table 3. The \((p_{1/2})^{-1}\) state is the ground state of \(^{15}{\hbox {O}}\) and therefore does not emit a \(\gamma \)-ray. The \((p_{3/2})^{-1}\) state almost always emits a single \({6.18}\,{\hbox {MeV}}\) \(\gamma \)-ray. The higher-energy states, \((s_{1/2})^{-1}\) and above, referred to simply as “others”, have large branching ratios to nucleons or alpha particles, which can lead to secondary \(\gamma \)-ray emission. At present, there are neither data nor theoretical predictions of \(\gamma \)-ray emission for the higher-energy states covered by “others”. Further details on the treatment of these states are given in [32, 34].

Table 3 Probabilities of \(^{15}{\hbox {O}}^{*}\) state occurrences  [33]

The number of atmospheric NCQE interactions can be estimated by convolving the atmospheric neutrino flux with the electron and muon neutrino neutral-current cross-sections. The atmospheric neutrino flux at SURF is estimated from the modified “DPMJET-III” model [35]. The oscillation probability is calculated with the Osc3++ framework [36]. Since atmospheric neutrinos traversing the Earth encounter a varying matter profile, and hence a varying electron density, they experience a variety of matter effects [37]. The oscillation probability calculation in this analysis takes this variation in matter density into account using a simplified version of the Preliminary Reference Earth Model (PREM) [38]. NCQE cross-section tables are taken from the NEUT framework [39]. For NCQE interactions, the nominal nucleon momentum distribution is based on the Benhar spectral function [40, 41]. The expected atmospheric neutrino rate is given by:

$$\begin{aligned} N= & {} \mathcal {N}_{n^{16}{\hbox {O}}} \times \int \limits _{{200}\,\mathrm{{MeV}}}^{{1}\,\mathrm{{GeV}}} \int \limits _{0^\circ }^{90^\circ } \phi (E_\nu , \theta ) P(E_\nu , \theta )\times \\{} & {} \qquad \qquad \qquad \qquad \qquad \qquad \sigma _\text {NCQE}(E_\nu ) \text {d}E_\nu \text {d}\theta , \end{aligned}$$

with \(E_\nu \) the neutrino energy, \(\theta \) the zenith angle, \(\mathcal {N}_{n^{16}{\hbox {O}}}\) the number of available target neutrons bound in \(^{16}{\hbox {O}}\), \(\phi (E_\nu , \theta ) P(E_\nu , \theta )\) the oscillated neutrino flux, and \(\sigma _\text {NCQE}(E_\nu )\) the neutrino-oxygen NCQE cross-section.

After integration, the expected atmospheric neutrino NCQE rate is \(\phi _{{\text {atm}}}={3.25\times 10^{-6}}\,{\hbox {Hz}}\). The expected rate of \(^{15}{\hbox {O}}^{*}\) in the excited \((p_{3/2})^{-1}\) state is the product of \(\phi _{{\text {atm}}}\) with the corresponding branching ratio, yielding \(1.14 \times 10^{-6}\,{\hbox {Hz}}\), or 36.0 events per year. Only this dominant \((p_{3/2})^{-1}\) contribution is simulated and discussed in Sect. 3.3.
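The conversion from rate to yearly counts, and the branching ratio implied by the two quoted rates, can be checked as follows. The branching ratio here is back-computed from the numbers in the text, not read from Table 3, and a Julian year is assumed.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # Julian year (assumption)

phi_atm_hz = 3.25e-6   # total NCQE rate quoted in the text
rate_p32_hz = 1.14e-6  # (p3/2)^-1 rate quoted in the text

implied_branching = rate_p32_hz / phi_atm_hz
events_per_year = rate_p32_hz * SECONDS_PER_YEAR
print(f"implied branching ~ {implied_branching:.3f}, {events_per_year:.1f} events/year")
```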

(\({\alpha }\), n) background Energetic \({\alpha }\) particles, emitted in \({\hbox {Po}}\) decays along the \({\hbox {U}}\) and \({\hbox {Th}}\) chains as well as by out-of-equilibrium \(^{210}{\hbox {Po}}\), can produce neutrons through (\(\alpha \), n) reactions on certain isotopes contained in the detector, specifically \(^{10}{\hbox {B}}\), \(^{11}{\hbox {B}}\), \(^{13}{\hbox {C}}\), \(^{17}{\hbox {O}}\), \(^{18}{\hbox {O}}\), \(^{29}{\hbox {Si}}\), and \(^{30}{\hbox {Si}}\). Assuming natural abundances of these isotopes in 3% WbLS and in the PMTs’ borosilicate glass, the typical mean energy of the neutron spectrum is \(\langle E_N \rangle ={3}\,{\hbox {MeV}}\). Although these neutrons are softer than their cosmogenic counterparts, they can imitate the IBD signal by producing a fast proton recoil followed by a neutron capture on hydrogen.

Additional contributions can also arise from the \(^{18}{\hbox {O}}(\alpha ,n){^{21}{\hbox {Ne}}^{*}}\) reaction and its equivalent on \(^{17}{\hbox {O}}\). As \(^{21}{\hbox {Ne}}^{*}\) decays by neutron emission, each reaction of this kind produces two neutrons. Recorded events with a multiplicity larger than two can be efficiently removed. However, the possibility remains that the quenched proton recoil goes undetected and that the two neutron captures create an IBD-like topology, the first being mistaken for the positron prompt signal and the second for the delayed event.

Moreover, the neutron produced in the \({\alpha }\) interaction on \(^{13}{\hbox {C}}\),

$$\begin{aligned} ^{13}{\hbox {C}} +\alpha \longrightarrow \, {^{16}{\hbox {O}}}^{*} + n, \end{aligned}$$
(7)

can satisfy the delayed event search, and there are three possibilities for the generation of the event that can imitate an IBD prompt event  [13]:

  • recoil proton appearing after the scattering of the fast neutron on proton;

  • \(\gamma \)-emission with energy of \({6.13}\,{\hbox {MeV}}\) or \({6.05}\,{\hbox {MeV}}\), as a result of \(^{16}{\hbox {O}}^{*}\) de-excitation;

  • \({4.4}\,{\hbox {MeV}}\) \(\gamma \)-ray that is a product of the two-stage process: First, \(^{12}{\hbox {C}}\) is excited into \({^{12}{\hbox {C}}^{*}}\) in an inelastic scattering off a fast neutron. Then, \({^{12}{\hbox {C}}^{*}}\) transits to the ground state, accompanied by the \(\gamma \) emission:

    $$\begin{aligned} n + \, ^{12}\text {C}&\longrightarrow \, ^{12}\text {C}^* + n, \end{aligned}$$
    (8)
    $$\begin{aligned} ^{12}\text {C}^*&\longrightarrow \, ^{12}\text {C} + \gamma \, (4.4\,\text {MeV}). \end{aligned}$$
    (9)

The neutron yield per decay and the neutron energy spectrum of all the above-mentioned (\({\alpha }\), n) processes in the 3% WbLS target material (see Sect. 3.1) and in the PMT borosilicate glass have been calculated using the NeuCBOT [42] software. We assume the SNO cleanliness level for water [43] (see Sect. 3.3), the Borexino Phase-I cleanliness level for the LS [13], and the isotope concentrations in the PMT glass from Table 6. All materials are simulated with their natural isotopic abundances.

The total contributions expected from the U and Th chains in WbLS are 2.16, 1.80, and 15.2 events per year from (\(\alpha \), n) reactions on \(^{13}{\hbox {C}}\), \(^{17}{\hbox {O}}\), and \(^{18}{\hbox {O}}\), respectively. Only the dominant contribution, from \(^{18}{\hbox {O}}(\alpha ,n)^{21}{\hbox {Ne}}^{*}\), is simulated and discussed in Sect. 3.3.

The total neutron rate expected from the U and Th chains contained in the PMT glass is \(1.05\,{\hbox {Hz}}\). Almost all of these fast neutrons come from the \(^{11}{\hbox {B}}(\alpha , n){^{14}{\hbox {N}}}\) reaction, with the \(\alpha \) particles mostly produced in the lower part of the \(^{238}{\hbox {U}}\) decay chain. The expected neutron spectrum is simulated and discussed in Sect. 3.3.

(\({\gamma }\), n) background The only (\({\gamma }\), n) reaction that can be triggered by \({^{208}{\hbox {Tl}}}\) \(\gamma \)-rays is the photo-dissociation of deuterium:

$$\begin{aligned} {^{2}{\hbox {H}}}+\gamma \rightarrow {^{1}{\hbox {H}}}+n. \end{aligned}$$
(10)

This reaction has a threshold of \({2.22}\,{\hbox {MeV}}\), while reactions on the various isotopes of carbon and oxygen have thresholds ranging from \({4.10}\,{\hbox {MeV}}\) to \({18.7}\,{\hbox {MeV}}\), beyond the energies of \(\gamma \)-rays from natural radioactivity. Given the end point of the \({^{208}{\hbox {Tl}}}\) \(\gamma \) spectrum, the mean neutron energy is expected to be significantly below \({1}\,{\hbox {MeV}}\), producing a very soft proton recoil spectrum, followed by a capture on hydrogen. Nevertheless, a coincidence between the energy deposited by \(\gamma \)-rays above threshold and the subsequent neutron capture can imitate the IBD signal.
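The statement that the neutrons are well below 1 MeV can be checked with non-relativistic two-body kinematics. In this sketch the energy above the deuteron binding energy is shared equally between the nearly equal-mass proton and neutron, and the small momentum brought in by the photon is neglected; the 2.614 MeV line (the most energetic \({^{208}{\hbox {Tl}}}\) \(\gamma \)-ray) and the 2.2246 MeV binding energy are standard values supplied by us.

```python
B_DEUTERON_MEV = 2.2246     # deuteron binding energy (photodissociation threshold)
E_GAMMA_TL208_MEV = 2.614   # most energetic 208Tl gamma line

def neutron_kinetic_energy(e_gamma_mev):
    """Approximate neutron kinetic energy (MeV) from gamma + d -> p + n.

    Non-relativistic sketch: the energy above threshold is shared equally
    by the (nearly equal-mass) proton and neutron; photon momentum neglected.
    """
    excess = e_gamma_mev - B_DEUTERON_MEV
    return 0.5 * excess if excess > 0 else 0.0

print(f"T_n ~ {neutron_kinetic_energy(E_GAMMA_TL208_MEV):.2f} MeV")
```

Even for the highest-energy line the neutron receives only about 0.2 MeV.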

Using the fraction of the \({^{208}{\hbox {Tl}}}\) spectrum above the (\({\gamma }\), n)\({^{2}{\hbox {H}}}\) threshold for both water and PMTs, the \({^{2}{\hbox {H}}}\) photo-dissociation cross-section integrated above threshold [44], the number of \({^{2}{\hbox {H}}}\) nuclei per gram of WbLS, and the \(\gamma \)-ray attenuation length in water, we obtain a neutron production rate of \({1.78\times 10^{-2}}\,{\hbox {Hz}}\), essentially at the edge of the detector.

This rate lies about two orders of magnitude below the (\(\alpha \),n) neutron production in the PMT glass and would be negligible. Furthermore, only a fraction of this rate would produce a correlated IBD-like event. The remaining fraction, treated as single events producing a \({2.22}\,{\hbox {MeV}}\) \(\gamma \)-ray from the neutron capture alone, is negligible compared to the rate of \(\gamma \)-rays from the PMTs. Therefore, (\({\gamma }\), n)\(^{2}\hbox {H}\) photo-dissociation can be safely dismissed.

3 Monte Carlo simulation

3.1 Detector configuration

The detector is modeled as a right cylinder with an \({18}\,{\hbox {m}}\) diameter and a \({70}\,{\hbox {m}}\) height. Even though a letterbox-shaped Theia-25 detector would be deployed at SURF, as shown in Fig. 1, the right-cylinder geometry was chosen because it was readily available within the simulation framework. Furthermore, the reconstruction algorithm described in Sect. 4.2 had previously been set up to work with this geometry. Since this analysis relies on volume fiducialization to optimize the signal-to-background ratio, edge effects that could affect event reconstruction at the letterbox corners can be neglected: far from the edges, the reconstruction behaves similarly in both geometries. Additional information about the considered detector design can be found in Ref. [20].

The expected target mass of this configuration is 17.8 kt (corresponding to \({11.9\times 10^{32}}\) free protons). A total of 79432 inward-facing 10\(''\) Hamamatsu R7081-100 PMTs, with 34% quantum efficiency (QE) and 1.5 ns transit time spread (TTS), are arranged against the inner wall of the cylinder. This number of PMTs corresponds to an effective coverage of 90% of the detector walls and caps, the highest photo-coverage achievable in terms of packing. Lower coverage can be studied by scaling the number of PMT hits prior to event reconstruction. A 3 kHz dark rate is assumed for this PMT model operating at nominal gain at around 10\(^\circ \)C (in equilibrium with the water temperature at the SURF depth). The trigger threshold was defined as 8 PMT hits within 600 ns. Once this condition is satisfied, the trigger time is defined as the time of the first hit contributing to this cluster. Simulation of the neutrino interactions and radioactive decays is performed using the Geant4-based [45] RAT-PAC framework [46]. Cherenkov photon production is handled by the default Geant4 model, G4Cerenkov. Rayleigh scattering is implemented with the module developed by the SNO+ collaboration [47]. The GLG4Scint model handles the generation of scintillation light, as well as photon absorption and reemission. Because of quenching, the total number of scintillation photons is not expected to scale linearly with the deposited energy. This is taken into account in the simulation with Birks’ law [28]: the quenched energy deposition per unit path length is \(\frac{\textrm{d}E_q}{\textrm{d}x}=\frac{\textrm{d}E/\textrm{d}x}{1+k_B\, \textrm{d}E/\textrm{d}x}\), where \(k_B\) is Birks’ constant.
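The quoted target mass, free-proton count, and photo-coverage can be reproduced from the cylinder dimensions. The sketch makes three assumptions of ours: a density equal to that of water, a free-proton count for pure water (two per H₂O molecule; the 3% scintillator changes this slightly), and a projected PMT area based on the nominal 10-inch (25.4 cm) diameter.

```python
import math

AVOGADRO = 6.02214076e23
M_WATER_G_PER_MOL = 18.015

DIAMETER_M, HEIGHT_M = 18.0, 70.0
N_PMT = 79432
PMT_DIAMETER_M = 0.254  # assumption: nominal 10-inch diameter for the projected area

radius_m = DIAMETER_M / 2.0
volume_m3 = math.pi * radius_m ** 2 * HEIGHT_M
mass_g = volume_m3 * 1.0e6      # assumption: density of water, 1 t/m^3
mass_kt = mass_g / 1.0e9        # 1 kt = 1e9 g
# assumption: pure water, 2 free protons per H2O molecule
free_protons = 2.0 * mass_g / M_WATER_G_PER_MOL * AVOGADRO

# lateral wall plus both end caps
wall_area_m2 = 2.0 * math.pi * radius_m * HEIGHT_M + 2.0 * math.pi * radius_m ** 2
coverage = N_PMT * math.pi * (PMT_DIAMETER_M / 2.0) ** 2 / wall_area_m2

print(f"mass = {mass_kt:.1f} kt, free protons = {free_protons:.3g}, "
      f"coverage = {coverage:.1%}")
```

Under these assumptions the mass is 17.8 kt, the free-proton count is about \(1.19\times 10^{33}\), and the coverage is about 90%, consistent with the values above.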

We chose 3% WbLS (3% liquid scintillator, 97% water) as the baseline target material for the simulation. The inputs to the optical model used in the simulation are primarily based on data and bench-top measurements. The light yield (502 scintillation photons/MeV), the scintillation emission spectrum, and the time profile with a rise time of \({0.265}\,{\hbox {ns}}\) were interpolated from the 1%, 5%, and 10% WbLS measurements of [48] and [49]. Preliminary results show that absorption lengths are long and that scattering is the dominant mode of light loss. As no published measurements of the absorption length in this material are available, the model described in [50] was used in the simulation. The implemented scattering length (\(\simeq {40}\,{\hbox {m}}\) at \({430}\,{\hbox {nm}}\)) and refractive index (1.35 above \({400}\,{\hbox {nm}}\)) were measured as a function of wavelength at BNL [51].

3.2 Signal simulation

A dedicated generator within the RAT-PAC software uses the positron energy spectrum as input to generate IBD pairs. The positron spectrum is calculated from the expected antineutrino flux at SURF, available through the geoneutrinos.org web app [52], using the decay spectra from [53] and the parameterized IBD cross section from [54]. Figure 2 shows the antineutrino energy spectra used, with the Th/U ratio fixed to 4.33 for the crust and 3.9 for the mantle, while Table 4 summarizes the numerical values of the geoneutrino signal, with the breakdown between U and Th, as well as between mantle and crust.
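To illustrate how an antineutrino flux is turned into an interaction spectrum, the sketch below uses the simple zeroth-order IBD cross section of Vogel and Beacom, \(\sigma \approx 9.52\times 10^{-44}\,{\hbox {cm}}^2\,(E_e p_e/{\hbox {MeV}}^2)\) with \(E_e = E_{\bar{\nu }_e} - \varDelta \) and \(\varDelta = m_n - m_p\). This approximation is our substitution and not the parameterization of [54] used in the simulation; multiplying it by a flux spectrum gives the interaction-rate spectrum up to normalization.

```python
import math

DELTA_MEV = 1.293332   # neutron-proton mass difference
M_E_MEV = 0.51099895   # electron mass

def ibd_cross_section_zeroth_order(e_nu_mev):
    """Zeroth-order IBD cross section in cm^2 (Vogel-Beacom approximation).

    sigma ~ 9.52e-44 cm^2 * E_e * p_e / MeV^2 with E_e = E_nu - Delta.
    Recoil and radiative corrections are neglected; this is not the
    parameterization of [54].
    """
    e_e = e_nu_mev - DELTA_MEV        # positron total energy
    if e_e <= M_E_MEV:
        return 0.0                    # kinematically forbidden in this approximation
    p_e = math.sqrt(e_e ** 2 - M_E_MEV ** 2)
    return 9.52e-44 * e_e * p_e

def interaction_spectrum(flux_values, energies_mev):
    """Flux times cross section, bin by bin (arbitrary normalization)."""
    return [f * ibd_cross_section_zeroth_order(e) for f, e in zip(flux_values, energies_mev)]
```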

Fig. 2

The expected rate of antineutrino interactions at SURF as a function of antineutrino energy. 1 NIU (Neutrino Interaction Unit) = 1 IBD interaction/\(10^{32}\) targets/year. Source: geoneutrinos.org [52]

Table 4 The inputs for the geoneutrino signal simulation. 1 NIU (Neutrino Interaction Unit) = 1 IBD interaction/\(10^{32}\) targets/year

For the reactor antineutrino spectrum, the nominal thermal powers and the monthly load factors are taken from the Power Reactor Information System (PRIS), developed and maintained by the International Atomic Energy Agency (IAEA)  [55, 56]. The resulting spectra already include neutrino oscillation effects, with the oscillation parameters taken from NuFit v5.0, specifically the global analysis excluding the Super-Kamiokande atmospheric neutrino data  [57].

IBD interactions are generated throughout the total WbLS volume of Theia-25. The expected numbers of IBD interactions from reactor antineutrinos and geoneutrinos are given in Table 5, together with the breakdown of the geoneutrino signal at SURF into crust and mantle contributions.

Table 5 Antineutrino interaction rates in one year in the full Theia-25 volume

3.3 Background simulation

Accidentals

The accidental coincidence rate is estimated from the R7081-100 PMT glass activity measurements by the WATCHMAN collaboration  [58], assuming U- and Th-chain backgrounds in the target material at the level of SNO water  [43]. Table 6 presents the concentration and activity of each isotope used in the simulations, along with the calculated decay rate. The glass mass per PMT is taken as \({1400}\,{\hbox {g}}\).

Table 6 Assumed isotope concentrations, based on the R7081-100 PMT glass activity measurements by the WATCHMAN collaboration [58] and the SNO cleanliness level water activity [43], followed by the corresponding expected rates inside the full Theia-25 volume

A custom decay-chain generator in RAT-PAC is used to simulate the spectra of these radioisotopes, starting from \(^{214}{\hbox {Bi}}\) and \(^{208}{\hbox {Tl}}\) for the \(^{238}{\hbox {U}}\) and \(^{232}{\hbox {Th}}\) chains, respectively. In the following, we refer to these events by the \(^{214}{\hbox {Bi}}\) and \(^{208}{\hbox {Tl}}\) designations. The \(^{234}{\hbox {Pa}}\) contribution to the \(^{238}{\hbox {U}}\) chain and the \(^{228}{\hbox {Ac}}\) and \(^{212}{\hbox {Bi}}\) contributions to the \(^{232}{\hbox {Th}}\) chain are ignored, since their lower rates would have a negligible impact on the expected rates. The \(\beta \) spectrum of \(^{40}{\hbox {K}}\) is used directly. About \(\mathcal {O}(10^6)\) events are simulated for each contribution. Contributions to the accidental rate from any other singles backgrounds can be safely neglected.

Correlated from cosmogenic backgrounds

The \(^{17}{\hbox {N}}\) \(\beta \) spectrum is simulated using RAT-PAC. The delayed neutron is simulated separately as a \({2.22}\,{\hbox {MeV}}\) \(\gamma \)-ray, and both events are reassembled as a correlated background during the creation of the dataset described in Sect. 4.3. About \(\mathcal {O}(10^5)\) \(^{17}{\hbox {N}}\) \(\beta \) and \({2.22}\,{\hbox {MeV}}\) \(\gamma \)-ray events are simulated.

Correlated from fast neutrons

Birks' constant for 3% WbLS has been extrapolated from Ref. [29] to be \({0.43}\,{\hbox {mm}}/{\hbox {MeV}}\). The fast-neutron energy spectrum is taken as an exponential with a mean value of \(\langle E_n \rangle = {98}\,{\hbox {MeV}}\) [31] and simulated in RAT-PAC at the edges of Theia-25. About \(\mathcal {O}(10^5)\) fast neutrons are simulated.
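The effect of Birks' law with this constant can be sketched by integrating \(dE_q = dE/(1 + k_b\, dE/dx)\) over the particle's energy loss. The helper below is hypothetical and takes a user-supplied stopping power; real \(dE/dx\) tables would come from Geant4. With an illustrative constant 0.2 MeV/mm (roughly minimum-ionizing in water) the quenching factor is about 0.92, whereas an assumed 20 MeV/mm, typical of low-energy protons, gives about 0.10. This illustrates why proton recoils are strongly suppressed.

```python
KB_MM_PER_MEV = 0.43  # Birks' constant for 3% WbLS quoted in the text


def quenched_energy(e0_mev, stopping_power, kb=KB_MM_PER_MEV, n_steps=1000):
    """Integrate Birks' law, dE_q = dE / (1 + kb * dE/dx), from 0 to e0_mev.

    stopping_power(E) must return dE/dx in MeV/mm at kinetic energy E [MeV].
    """
    de = e0_mev / n_steps
    total = 0.0
    for i in range(n_steps):
        e_mid = (i + 0.5) * de  # midpoint rule over the energy-loss steps
        total += de / (1.0 + kb * stopping_power(e_mid))
    return total
```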

Correlated from atmospheric NCQE interaction

The \((p_{3/2})^{-1}\) deexcitation state of \({^{15}{\hbox {O}}^{*}}\) is simulated using RAT-PAC as a \({6.18}\,{\hbox {MeV}}\) \(\gamma \)-ray. The delayed neutron is simulated separately as a \({2.22}\,{\hbox {MeV}}\) \(\gamma \)-ray, and both events are reassembled as a correlated background during the creation of the dataset described in Sect. 4.3. About \(\mathcal {O}(10^5)\) \({6.18}\,{\hbox {MeV}}\) \(\gamma \)-rays from \({^{15}{\hbox {O}}^{*}}\) and \({2.22}\,{\hbox {MeV}}\) \(\gamma \)-rays are simulated.

(\({\alpha }\), n) background

The total neutron spectrum expected from the PMT contaminants, evaluated with the NeuCBOT software, is simulated in RAT-PAC with neutrons generated directly in the PMT glass. About \(\mathcal {O}(10^6)\) events are simulated.

\((\alpha , n)\) interactions in Theia-25 yield a neutron energy spectrum peaked at \({3}\,{\hbox {MeV}}\), which mostly produces single triggers in 3% WbLS (assuming the previous value of Birks' constant). The neutron yield from the fiducial volume of Theia-25 is essentially produced by \(^{18}{\hbox {O}}(\alpha ,n)^{21}{\hbox {Ne}}^{*}\), as described in Sect. 2. To be conservative, we consider both proton-recoil signals to be below the trigger threshold, but the two successive neutron captures on hydrogen to be visible. Hence, the interaction is simulated in RAT-PAC as two successive \({2.22}\,{\hbox {MeV}}\) \(\gamma \)-rays. Both events are reassembled as a correlated background during the creation of the dataset described in Sect. 4.3. About \(\mathcal {O}(10^5)\) \({2.22}\,{\hbox {MeV}}\) \(\gamma \)-rays are simulated.

4 Analysis

4.1 Methodology for analysis

In order to obtain the sensitivity of the Theia-25 detector to antineutrinos, the following analysis steps were implemented:

  1. Reconstruction of the event vertex position based on the PMT hit times.

  2. Creation of a merged dataset using an iterative procedure to prune background and select candidate IBD pairs from all signal and background events.

  3. Optimization of the Region-of-Interest (ROI) to increase the signal significance for a given antineutrino signal.

  4. Sensitivity analysis based on the creation and spectral fit of toy experiments, each equivalent to one year of data.

4.2 Reconstruction

Dedicated code was developed to reconstruct event vertex positions from the photon time of flight. We build the binned distribution, with bin width \(\textrm{dt}\), of the PMT hit-time residuals, i.e., the difference between each PMT hit time \(T_\text {Hit}\) and the sum of the estimated vertex time \(T_\text {est}\) and the time of flight of photons from the estimated vertex position \(\varvec{x}_\text {est}\) to the PMT position \(\varvec{x}_\text {Hit}\):

$$\begin{aligned} T_\text {Res} = T_\text {Hit}-T_\text {est} - \frac{ \left| \varvec{x}_\text {Hit} - \varvec{x}_\text {est} \right| }{ c_\text {water} }, \end{aligned}$$
(11)

where \(c_\text {water}\) is the speed of light in water. The fit minimizes the negative log-likelihood (NLL) over the estimated vertex time \(T_\text {est}\) and position \(\varvec{x}_\text {est}\), using a probability density function (PDF) of the residuals built from a simulation of \({5}\,{\hbox {MeV}}\) electrons at the center of the detector. The fit is seeded by multilateration of all PMT hits, similar to GPS positioning: solving the set of time-of-flight equations yields a cloud of candidate vertices used as hypotheses. This procedure yields \(\mathcal {O}(10^3)\) seeds to reconstruct.
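A toy version of this time-of-flight fit is sketched below under several stated assumptions. A Gaussian residual PDF (width equal to the 1.5 ns TTS) stands in for the simulated 5 MeV electron PDF. A single-seed compass search stands in for the multilateration seeding. PMTs sit uniformly on a hypothetical 10 m sphere rather than on the cylinder. \(c_\text {water}\) is taken from the refractive index 1.35, which treats the phase index as the group index. For a Gaussian PDF, the best \(T_\text {est}\) at fixed position is the mean of \(T_\text {Hit}\) minus the time of flight, so it is profiled out analytically. All names are hypothetical.

```python
import math
import random

C_WATER = 299.792458 / 1.35  # mm/ns; assumes n = 1.35 for every photon


def tof(x_pmt, x_vtx):
    """Photon time of flight from vertex to PMT [ns]."""
    return math.dist(x_pmt, x_vtx) / C_WATER


def profiled_nll(hits, x_est, sigma_ns=1.5):
    """Gaussian NLL of the residuals T_Hit - T_est - |x_Hit - x_est| / c,
    with T_est set to its analytic best value at this x_est."""
    shifted = [t - tof(x, x_est) for x, t in hits]
    t_est = sum(shifted) / len(shifted)
    nll = sum(0.5 * ((s - t_est) / sigma_ns) ** 2 for s in shifted)
    return nll, t_est


def fit_vertex(hits, seed=(0.0, 0.0, 0.0), step=2000.0, min_step=1.0):
    """Compass search for the vertex [mm] that minimizes the profiled NLL."""
    best = list(seed)
    best_nll, _ = profiled_nll(hits, best)
    while step > min_step:
        improved = False
        for axis in range(3):
            for sign in (1.0, -1.0):
                trial = list(best)
                trial[axis] += sign * step
                trial_nll, _ = profiled_nll(hits, trial)
                if trial_nll < best_nll:
                    best, best_nll, improved = trial, trial_nll, True
        if not improved:
            step /= 2.0  # refine the step once no axis move helps
    return best, profiled_nll(hits, best)[1]


def toy_hits(vertex, t0=50.0, n_hits=1000, radius=10000.0, tts=1.5, seed=1):
    """One hit per PMT, PMTs uniform on a toy sphere, times smeared by the TTS."""
    rng = random.Random(seed)
    hits = []
    for _ in range(n_hits):
        z = rng.uniform(-1.0, 1.0)
        phi = rng.uniform(0.0, 2.0 * math.pi)
        r_xy = math.sqrt(1.0 - z * z)
        pmt = (radius * r_xy * math.cos(phi), radius * r_xy * math.sin(phi), radius * z)
        hits.append((pmt, t0 + tof(pmt, vertex) + rng.gauss(0.0, tts)))
    return hits


true_vertex = (1000.0, -500.0, 300.0)
x_fit, t_fit = fit_vertex(toy_hits(true_vertex))
```

With a single narrow Gaussian, scattered and late light is ignored. This is one reason the real fit uses a simulated residual PDF and many seeds.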

Figure 3 shows the reconstruction performance for positrons from IBD interactions simulated throughout the whole Theia-25 volume. For positrons from geoneutrino interactions, vertex resolutions of (\(58.0\pm 0.3)\,{\hbox {cm}}\) perpendicular to the cylinder axis and (\(32.8\pm 0.2)\,{\hbox {cm}}\) parallel to it are achieved, while for reactor antineutrinos the overall resolutions are (\(34.3\pm 0.2)\,{\hbox {cm}}\) perpendicular and (\(26.6\pm 0.1)\,{\hbox {cm}}\) parallel to the cylinder axis. Moreover, we have mapped the energy resolution for positrons between \({0.5}\,{\hbox {MeV}}\) and \({4}\,{\hbox {MeV}}\) generated at the center of the detector volume. As shown in Fig. 4, the energy resolution can be approximated as \({12\%}/{\sqrt{E}}\) (E in MeV) for this choice of target material and detector configuration.
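A small sketch of what the \(12\%/\sqrt{E}\) parametrization implies, assuming E in MeV and a purely Gaussian response. The relative resolution is 12% at 1 MeV and 6% at 4 MeV. If the resolution were limited by Poisson photostatistics alone (an assumption that ignores position dependence and Cherenkov/scintillation mixing), it would correspond to about \(1/0.12^2 \approx 69\) detected hits per MeV. The helper names are hypothetical.

```python
import math
import random

RESOLUTION_COEFF = 0.12  # sigma_E / E = 0.12 / sqrt(E [MeV]), from the text


def sigma_e(e_mev):
    """Absolute energy resolution [MeV] at visible energy e_mev."""
    return RESOLUTION_COEFF * math.sqrt(e_mev)


def smear(energies_mev, seed=0):
    """Gaussian smearing of true energies with the parametrized resolution."""
    rng = random.Random(seed)
    return [rng.gauss(e, sigma_e(e)) for e in energies_mev]


def equivalent_hits_per_mev(coeff=RESOLUTION_COEFF):
    """Detected hits per MeV if the resolution were purely Poisson-limited."""
    return 1.0 / coeff ** 2
```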

Fig. 3

Reconstruction performance with respect to positron true position, for geo- (left) and reactor \(\overline{\nu }_e\) (right), integrated throughout the whole detector and energy spectrum

Fig. 4

Energy resolution for positrons generated at the center of the detector volume between \({0.5}\,{\hbox {MeV}}\) and \({4}\,{\hbox {MeV}}\)

4.3 Event pruning and pairs dataset creation

The search for candidate pairs in the simulated data is performed on a single merged dataset of all signal and background contributions, in which events are interspersed in time to mimic data from the detector. This allows a full study of accidental coincidences, including all correlations, as well as of true coincidence backgrounds.

The events are merged through an iterative process. For each component \(B_i\), an interval \(\varDelta \text {t}(B_i)\) is calculated:

$$\begin{aligned} \varDelta \text {t}(B_i) = \frac{-\ln (1-u)}{\phi (B_i)}, \end{aligned}$$
(12)

where u is a random number drawn from a uniform distribution on (0, 1) and \(\phi (B_i)\) is the rate of component \(B_i\). The component with the shortest \(\varDelta \text {t}(B_i)\) provides the next event to be merged, and the time of this event is obtained by adding that interval to the previous time, starting at \(t_0\):

$$\begin{aligned} t_{i+1} = t_i + \min _i{\left( \varDelta \text {t}(B_i)\right) }. \end{aligned}$$
(13)

The procedure is iterated until \(t_i\) reaches 1 year.
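Equations (12) and (13) amount to merging independent Poisson processes. The sketch below is a minimal illustration with hypothetical component names and rates. Redrawing every component's interval at each step is valid because exponential waiting times are memoryless; an equivalent alternative draws one interval from the total rate and picks the component with probability \(\phi (B_i)/\sum _j \phi (B_j)\). At \(\mathcal {O}(10^5)\) Hz of singles, a literal one-year loop would take about \(3\times 10^{12}\) iterations. This is why events are pruned on the fly during merging.

```python
import math
import random


def merge_components(rates_hz, t_end_s, seed=0):
    """Interleave events from independent Poisson components in time.

    Each step draws dt_i = -ln(1 - u) / phi(B_i) for every component (Eq. 12);
    the smallest dt_i gives the next event and its component (Eq. 13).
    """
    rng = random.Random(seed)
    t, events = 0.0, []
    while True:
        draws = {name: -math.log(1.0 - rng.random()) / phi
                 for name, phi in rates_hz.items()}
        name, dt = min(draws.items(), key=lambda kv: kv[1])
        t += dt
        if t >= t_end_s:
            return events
        events.append((t, name))
```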

Special care is taken for correlated backgrounds that do not have a dedicated Geant4 generator in RAT-PAC and were therefore simulated in two stages: first the equivalent prompt signal, then the \({2.22}\,{\hbox {MeV}}\) \(\gamma \)-ray from neutron capture on hydrogen. To ensure the spatial correlation between the two events, both the equivalent prompt signal and the \({2.22}\,{\hbox {MeV}}\) \(\gamma \)-ray were simulated at the center of the detector. This spatial correlation is preserved during the merging procedure, when the prompt-delayed pair is moved together to a random position within the detector volume. Lastly, each prompt event is assigned a corresponding delayed event with \(\varDelta \text {t}\) randomly sampled from the exponential distribution of the neutron capture time, \(\exp (-\varDelta \text {t}/\tau _{\text {n}})\).
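The pair-placement step can be sketched as follows. This is a hypothetical helper for a cylinder centered on the origin, assuming an illustrative capture time \(\tau _n = 200\,\mu \)s, which is consistent with the 1 ms window exceeding \(5\tau _n\). The prompt and delayed offsets from the center are shifted by the same random vector, so their separation is unchanged. The radius is drawn as \(R\sqrt{u}\) so the positions are uniform in area.

```python
import math
import random

TAU_N_US = 200.0  # assumed neutron capture time [us], illustrative only


def place_pair(prompt_offset_mm, delayed_offset_mm, radius_mm, half_height_mm, rng):
    """Move a prompt/delayed pair simulated at the detector center to a random
    point of the cylinder, keeping their relative displacement, and draw the
    delayed time from exp(-dt / tau_n)."""
    r = radius_mm * math.sqrt(rng.random())  # uniform over the cross-section
    phi = rng.uniform(0.0, 2.0 * math.pi)
    z = rng.uniform(-half_height_mm, half_height_mm)
    shift = (r * math.cos(phi), r * math.sin(phi), z)
    prompt = tuple(s + o for s, o in zip(shift, prompt_offset_mm))
    delayed = tuple(s + o for s, o in zip(shift, delayed_offset_mm))
    dt_us = rng.expovariate(1.0 / TAU_N_US)
    return prompt, delayed, dt_us
```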

Fig. 5

PMT and signal event distributions of the characteristic parameters used in the BDT training: \(n_{400}\) energy estimator (left), \(d_\text {wall}\) distance (in mm) of the reconstructed event from the closest wall (middle), and \(\chi ^2\) of the reconstruction fit (right)

The high decay rates of the radioisotopes contained in the PMT glass (see Table 6) yield background events at a rate of about \(\mathcal {O}(10^{5}\,{\hbox {Hz}})\) during this iterative process. Hence, extra steps are needed to prune the number of IBD candidates before saving the merged dataset to disk and performing the subsequent analysis steps. First, a cut discards pairs of consecutive events separated by \(\varDelta \text {t}\ge \,{1}\,{\hbox {ms}}\), i.e., by more than \(5\tau _n\). Second, a cut \(\varDelta \text {R}\ge {3}\,{\hbox {m}}\) on the distance between two consecutive events,

$$\begin{aligned} \varDelta \text {R}= \left| {\textbf{v}}_{B_{i+1}} - {\textbf{v}}_{B_{i}} \right| , \end{aligned}$$
(14)

discards pairs unlikely to originate from a single source. Even accounting for the finite reconstruction resolution of the detector (see Fig. 3) and the neutron displacement during thermalization, the prompt and delayed events of a true IBD interaction are expected to lie within \(\varDelta \text {R}\) of each other.
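A back-of-the-envelope estimate shows why the \(\varDelta \text {t}\) and \(\varDelta \text {R}\) cuts alone leave a large accidental rate: \(R_\text {acc} \approx R^2\,\varDelta \text {t}\,V_{\varDelta R}/V_\text {det}\). The inputs below are assumptions: \(10^5\) Hz of singles, the 1 ms and 3 m cuts, and \(V_\text {det}\approx 17{,}800\) m\(^3\), which takes the 17.8 kt target at roughly the density of water. The result is of order \(6\times 10^4\) Hz of accidental pairs. Uniform spatial placement is also assumed; PMT backgrounds actually cluster near the walls, so this likely underestimates the rate.

```python
import math


def accidental_pair_rate(singles_hz, dt_window_s, dr_mm, volume_mm3):
    """Rough accidental-coincidence rate for uniformly distributed singles:
    R_acc ~ R^2 * dt * (volume of a sphere of radius dR) / (detector volume)."""
    sphere = 4.0 / 3.0 * math.pi * dr_mm ** 3
    return singles_hz ** 2 * dt_window_s * min(1.0, sphere / volume_mm3)


rate = accidental_pair_rate(1e5, 1e-3, 3000.0, 1.78e13)  # about 6.4e4 Hz
```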

Nevertheless, the \(\varDelta \text {t}\) and \(\varDelta \text {R}\) cuts do not provide enough suppression of the PMT accidental backgrounds. They are therefore complemented with a cut on the output of a Boosted Decision Tree (BDT) that distinguishes PMT radioisotope decays from the prompt signal of geo- and reactor antineutrinos. A dedicated study was performed to identify the features with the strongest power to discriminate this background from the signal. Figure 5 shows three suitable parameters: an energy estimator of the event, \(n_{400}\), the number of PMT hits in a \({400}\,{\hbox {ns}}\) window after the trigger time; the shortest distance from a simulated detector wall to the reconstructed vertex, \(d_\text {wall}\); and the goodness of fit \(\chi ^2\) of the reconstructed vertex, taken as the NLL of the hit-time residual distribution. A well-fitted PMT \(\gamma \)-ray event would have a low NLL and lie close to a detector wall, whereas a badly fitted one would have a higher NLL and be reconstructed further from the wall. Signal events are distributed throughout the detector, and a good reconstruction algorithm should, to first order, yield no dependence of the NLL on the true signal position. Furthermore, the fit quality is expected to depend on the number of detected hits. Since \(n_{400}\) also estimates the event energy, adding this feature to the BDT provides a handle complementary to \(d_\text {wall}\) and \(\chi ^2\), which both depend on the reconstructed vertex position.

The BDT is trained with CERN’s ROOT library TMVA  [59] to separate the signal hypothesis, either a geo- or a reactor prompt event, from a \(\gamma \)-ray emitted by \(^{214}{\hbox {Bi}}\), \(^{208}{\hbox {Tl}}\) or \(^{40}{\hbox {K}}\). The following inputs are used:

  • a total of 10 000 PMT decay events for each isotope, \(^{214}{\hbox {Bi}}\), \(^{208}{\hbox {Tl}}\) and \(^{40}{\hbox {K}}\);

  • a total of 10 000 IBD interaction events for geo- and reactor antineutrinos. Only the prompt signal is used in the training;

  • an additional 10 000 \(^{214}{\hbox {Bi}}\) and \(^{208}{\hbox {Tl}}\) decay chains, corresponding to radioisotopes dissolved in the water, are used as a control sample. Since these events are similar to the PMT events but distributed uniformly throughout the detector, they can indicate whether the BDT is over-fitting the event classification.

For each background hypothesis, the BDT yields a score \(w_i\) discriminating signal from a decay of radioisotope i: a signal event tends towards a score of 1, whereas a background-like event tends towards \(-1\). Figure 6 shows the score under each background hypothesis for all PMT \(\gamma \)-ray events and for the prompt signals of geo- and reactor antineutrinos.

Fig. 6

BDT discriminant variable distributions for PMTs (in red) and signal (in blue) in case of \(^{214}{\hbox {Bi}}\) \(\gamma \)-ray hypothesis (left), \(^{208}{\hbox {Tl}}\) \(\gamma \)-ray hypothesis (middle), and \(^{40}{\hbox {K}}\) \(\gamma \)-ray hypothesis (right)

Using these scores, a BDT cut was chosen, without formal optimization, to reject most of the background while preserving as much signal as possible:

$$\begin{aligned} \tau _{\text {BDT}} \equiv \min {\left( w_{^{214}{\hbox {Bi}}}, w_{^{208}{\hbox {Tl}}}, w_{^{40}{\hbox {K}}} \right) } \ge 0.999. \end{aligned}$$
(15)

We apply BDT scoring to each individual background and signal event at the merging stage. The first column of Table 7 gives the observed suppression factor, obtained by comparing the equivalent rate of each contribution before the \(\varDelta \text {t}\) and \(\varDelta \text {R}\) cuts with the expected contribution calculated in Sect. 2. Table 7 also lists the contributions to the merged-dataset pairs that satisfy the pre-optimized coincidence cuts (\(\varDelta \text {t}={1}\,{\hbox {ms}}\) and \(\varDelta \text {R}={3}\,{\hbox {m}}\)), and the contribution of each component in the Region-of-Interest (ROI) optimized for the geo- and reactor antineutrino signal, described in the following section.
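The combination of Eq. (15) with the coincidence cuts can be sketched as follows. The sketch assumes, as stated above, that the BDT cut is applied to every single event before pairing. It then pairs consecutive surviving events with the pre-optimized \(\varDelta \text {t}<1\) ms and \(\varDelta \text {R}<3\) m cuts. The `Event` class, its fields and the score keys are hypothetical and are not TMVA or RAT-PAC types.

```python
import math
from dataclasses import dataclass

TAU_BDT = 0.999     # Eq. (15)
DT_MAX_S = 1e-3     # pre-optimized coincidence window
DR_MAX_MM = 3000.0  # pre-optimized distance cut


@dataclass
class Event:
    t_s: float
    pos_mm: tuple
    scores: dict  # BDT score w_i per background hypothesis (hypothetical keys)


def passes_bdt(ev, threshold=TAU_BDT):
    """Eq. (15): the smallest score over all hypotheses must reach the cut."""
    return min(ev.scores.values()) >= threshold


def candidate_pairs(events):
    """Keep BDT-passing singles, then pair consecutive ones with dt and dR cuts."""
    kept = sorted((e for e in events if passes_bdt(e)), key=lambda e: e.t_s)
    pairs = []
    for prompt, delayed in zip(kept, kept[1:]):
        if (delayed.t_s - prompt.t_s < DT_MAX_S
                and math.dist(prompt.pos_mm, delayed.pos_mm) < DR_MAX_MM):
            pairs.append((prompt, delayed))
    return pairs
```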

Table 7 The summary of all background and signal contributions at different analysis stages: the resulting post-BDT singles rates of each background inferred from the merged dataset, contributions to the merged dataset pairs, which satisfy pre-optimized selection cuts (\(\varDelta \text {t}={1}\,{\hbox {ms}}\) and \(\varDelta \text {R}={3}\,{\hbox {m}}\)), and contributions of each component in the Region-of-Interest (ROI), optimized for geo- and reactor neutrinos signal, described in the following section

Using all three cuts, \(\varDelta \text {t}\), \(\varDelta \text {R}\) and \(\tau _{\text {BDT}}\), a suppression factor of order \(10^{-5}\) is observed for accidental backgrounds, whereas 54% of geo- and 63% of reactor antineutrinos are preserved. Additional studies based on a simpler analysis show that suppression factors of similar magnitude can be achieved with an optimized two-dimensional cut on \(d_\text {wall}\) and \(\chi ^2\), supporting the conclusions of this more sophisticated study.

4.4 Box analysis

Table 8 Results of ROI optimization in the phase space of selection cuts necessary for antineutrino candidates selection

We perform a box optimization of the selection cuts in eight dimensions: lower and upper limits on the prompt and delayed energies (expressed in \(n_{400}\)), \(\rho \) (\(\perp \) cylinder axis) and z (\(\parallel \) cylinder axis) defining a 2-D fiducial volume, and \(\varDelta \text {t}\) and \(\varDelta \text {R}\). We optimize an ROI to maximize the significance \(S/\sqrt{S+B}\) for three cases:

  1. geoneutrinos are considered signal, and everything else, including reactor antineutrinos, is background;

  2. reactor antineutrinos are considered signal, and everything else, including geoneutrinos, is counted as background;

  3. both geo- and reactor antineutrinos are signal, and the rest is background.

The results of this optimization are summarized in Table 8. A high significance is achieved with the box analysis within one year for all three scenarios. The selection cuts on \(\varDelta \text {t}\), \(\varDelta \text {R}\), the delayed energy range and the fiducial volume converge to the same values for all three scenarios; only the prompt energy range depends on the target signal. Comparing the third and fourth columns of Table 7 shows the impact of the optimized selection cuts on the number of pair candidates. Dedicated studies were performed to assess the impact of the BDT cut on the spatial and energy distributions of the events in the ROI. No significant change in the spectral shapes was observed when varying the BDT threshold, which allows us to use the pre-BDT energy spectra as the PDFs for the spectral likelihood fit.

Using the box analysis, we obtain a significance \(S/\sqrt{S+B}\) of 7.02 for the reactor signal, 11.2 for geoneutrinos, and 15.0 for the summed antineutrino signal after one year of data taking. These significances do not include any systematic uncertainties on the individual contributions.
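The box optimization can be illustrated in one dimension by scanning all contiguous prompt-energy windows of a binned spectrum and keeping the window with the largest \(S/\sqrt{S+B}\). The six-bin signal and background counts below are hypothetical toy numbers, and the real analysis scans eight cut dimensions jointly rather than one.

```python
import math


def figure_of_merit(s, b):
    """S / sqrt(S + B), the quantity maximized in the box analysis."""
    return s / math.sqrt(s + b) if s + b > 0 else 0.0


def best_window(signal, background):
    """Scan all contiguous [lo, hi] bin windows; return (fom, (lo, hi))."""
    best = (0.0, None)
    for lo in range(len(signal)):
        for hi in range(lo, len(signal)):
            s = sum(signal[lo:hi + 1])
            b = sum(background[lo:hi + 1])
            fom = figure_of_merit(s, b)
            if fom > best[0]:
                best = (fom, (lo, hi))
    return best


# Hypothetical toy spectra: signal peaks in the middle, background at the edges.
fom, window = best_window([0, 5, 20, 20, 5, 0], [50, 10, 2, 2, 10, 50])
```

Here the two central bins win (S = 40, B = 4). Widening the window to include the flanking bins adds more background than signal.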

4.5 Sensitivity analysis

The box analysis allows us to extract the rates \(\lambda _i\) of each signal and background inside the optimized cuts. We then create toy experiments by randomly sampling events from the PDF of each spectrum, assuming Poisson fluctuations for all rates, and perform a likelihood fit on each. Figure 7 shows the stacked spectra of all PDFs, scaled to the expected rates in the ROI optimized for the combined geo and reactor signal.

Fig. 7

Stacked PDF spectra of all signal and background contributions in the ROI optimized for geo and reactor combined signal

In the following section, the sensitivity of two analyses is presented: first, the simultaneous extraction of the geo- and reactor signals; second, the simultaneous extraction of the Th and U contributions to the geoneutrino signal. A 2-D spectral fit is performed using a grid search on the target signals \((S_\text {geo}, S_\text {rea})\) or \((S_\text {Th}, S_\text {U})\). For each tested pair of \(S_i\), a binned histogram is created by scaling each component of the PDF spectra. The negative log-likelihood of the toy experiment is then computed while marginalizing over each background component \(\lambda _B\). The marginalization integral is computed over \(\lambda _B\pm 10\sqrt{\lambda _B}\) (or \([0, \lambda _B+10\sqrt{\lambda _B}]\) to ensure that all rates are positive) using the extended trapezoidal rule, the simplest of the closed Newton-Cotes formulas  [60]. For each tested pair of \(S_i\), the negative log-likelihood is averaged over the toy experiments and the resulting test statistic is assumed to follow a \(\chi ^2\) distribution with two degrees of freedom, from which the sensitivity of each analysis is extracted. The minimum of the averaged negative log-likelihood corresponds to our best fit. To evaluate the sensitivity of this procedure, 1000 toy experiments are generated, each corresponding to one year of data taking.
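A reduced version of this fit is sketched below, under assumptions that are not stated in the text. There are two signal templates and a single background, and the 6-bin templates are hypothetical. The prior on the background normalization is flat. The marginal likelihood is integrated with the extended trapezoidal rule and shifted by its minimum before exponentiating to avoid overflow. Poisson toys are drawn with Knuth's method. An Asimov dataset, i.e., the expected counts without fluctuations, gives a quick check that the grid search recovers the injected signals.

```python
import math
import random

# Hypothetical normalized 6-bin templates (each sums to 1).
GEO = [0.40, 0.35, 0.15, 0.06, 0.03, 0.01]
REA = [0.05, 0.10, 0.15, 0.20, 0.25, 0.25]
BKG = [1.0 / 6.0] * 6
TEMPLATES = (GEO, REA, BKG)


def poisson(rng, lam):
    """Knuth's multiplication method; adequate for small bin means."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def expected(norms, templates):
    """Expected counts per bin: templates scaled by their normalizations."""
    return [sum(n * t[i] for n, t in zip(norms, templates))
            for i in range(len(templates[0]))]


def nll(mu, data):
    """Binned Poisson -ln L, dropping the data-only ln(k!) term."""
    return sum(m - k * math.log(max(m, 1e-300)) for m, k in zip(mu, data))


def marginal_nll(data, s1, s2, lam_b, templates, n_nodes=101):
    """-ln of the likelihood integrated over the background normalization
    (flat prior assumed) with the extended trapezoidal rule."""
    lo = max(0.0, lam_b - 10.0 * math.sqrt(lam_b))
    hi = lam_b + 10.0 * math.sqrt(lam_b)
    h = (hi - lo) / (n_nodes - 1)
    vals = [nll(expected((s1, s2, lo + j * h), templates), data)
            for j in range(n_nodes)]
    ref = min(vals)  # shift before exponentiating to avoid under/overflow
    weights = [0.5 if j in (0, n_nodes - 1) else 1.0 for j in range(n_nodes)]
    integral = h * sum(w * math.exp(ref - v) for w, v in zip(weights, vals))
    return ref - math.log(integral)


def grid_fit(data, grid1, grid2, lam_b, templates):
    """Grid search: the (s1, s2) pair with the smallest marginal NLL."""
    return min(((a, b) for a in grid1 for b in grid2),
               key=lambda ab: marginal_nll(data, ab[0], ab[1], lam_b, templates))


# Usage: one Poisson-fluctuated toy around an assumed truth.
rng = random.Random(42)
truth = (200.0, 150.0, 100.0)
toy = [poisson(rng, m) for m in expected(truth, TEMPLATES)]
best_toy = grid_fit(toy, range(100, 301, 10), range(50, 251, 10), truth[2], TEMPLATES)
```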

5 Results

5.1 Theia-25 one-year signal sensitivity

In this section, we present the results of two analyses: (i) extraction of the numbers of geo- and reactor antineutrinos, with the geoneutrino energy spectrum based on a fixed U/Th ratio; (ii) extraction of the individual contributions of U and Th geoneutrinos, with separate energy spectra for U and Th. Figure 8 shows the simultaneous extraction of the numbers of geo- and reactor antineutrinos with a fixed U/Th ratio. The best-fit values over 1000 toy experiments are \(218^{+28}_{-20}\) and \(170^{+24}_{-20}\) for geo- and reactor antineutrinos, respectively. This corresponds to 40% and 44% selection efficiencies for geo- and reactor antineutrinos. The contours in Fig. 8 correspond to the \([1\sigma , 8\sigma ]\) confidence levels. The no-signal hypothesis for both geo- and reactor antineutrinos is rejected at more than \(8\sigma \) in one year.
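The \([1\sigma , 8\sigma ]\) contours of a two-parameter fit correspond to fixed increments of \(2\varDelta \text {NLL}\). Assuming Wilks' theorem holds (an assumption, not checked here), the 2-dof \(\chi ^2\) CDF is \(1-e^{-x/2}\). The threshold for an \(n\sigma \) Gaussian-equivalent probability is therefore \(x = -2\ln \textrm{erfc}(n/\sqrt{2})\): about 2.30, 6.18 and 11.83 for 1, 2 and 3\(\sigma \). The inverse, from a \(\varDelta \chi ^2\) to a significance, is obtained by bisection. Both helpers are hypothetical.

```python
import math


def delta_chi2_2dof(n_sigma):
    """Delta chi^2 (= 2 * Delta NLL) for 2 dof enclosing the same probability
    as a two-sided n-sigma Gaussian interval: x = -2 ln erfc(n / sqrt(2))."""
    return -2.0 * math.log(math.erfc(n_sigma / math.sqrt(2.0)))


def significance_2dof(delta_chi2):
    """Inverse of delta_chi2_2dof, solved by bisection on erfc."""
    tail = math.exp(-delta_chi2 / 2.0)
    lo, hi = 0.0, 40.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if math.erfc(mid / math.sqrt(2.0)) > tail:
            lo = mid  # tail still too large: need a larger n
        else:
            hi = mid
    return 0.5 * (lo + hi)
```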

Fig. 8

2-D confidence levels of the geoneutrino versus reactor antineutrino flux after one year of data taking, estimated with 1000 toy experiments (statistical uncertainties only). The confidence levels are drawn for \([1\sigma , 8\sigma ]\). The individual projection of each contribution at the best-fit position is also shown, with the statistical significance as a dashed line

The precision of the fitting procedure was evaluated using the relative difference between the best-fit value and the true value. Figure 9 shows this difference for each toy experiment for geoneutrinos. A Gaussian fit is applied to check that the fitting procedure does not introduce systematic shifts (below 0.004) and to extract the standard deviation \(\sigma _\text {flux}\), which quantifies the uncertainty on the number of antineutrino interactions in a one-year experiment. The fitted standard deviations are \(\sigma _\text {flux}^\text {geo} = 6.7\%\) for geoneutrinos and \(\sigma _\text {flux}^\text {rea} = 8.6\%\) for reactor antineutrinos.

Fig. 9

The relative difference between the best fit value and the true value of the number of geoneutrinos, obtained with 1000 toy experiments

Using the selection cuts optimized for the geoneutrino signal only (see Table 8), we performed a spectral fit to extract the individual contributions of U and Th geoneutrinos. Marginalizing over all backgrounds, including reactor antineutrinos, we extract the best fit and the confidence-level contours between \([1\sigma , 8\sigma ]\), shown in Fig. 10. The spectral fit of the individual Th and U contributions yields best-fit values of \(N_\text {Th}=39^{+18}_{-15}\) and \(N_\text {U}=180^{+26}_{-22}\). The \(N_{\textrm{U}}\) null hypothesis can be rejected at more than 5\(\sigma \) within one year, while the \(N_{\textrm{Th}}\) null hypothesis can be rejected at more than 2\(\sigma \).

Fig. 10

2-D confidence levels of the number of Th versus U antineutrinos detected after one year of data taking, estimated with 1000 toy experiments (statistical uncertainties only). The confidence levels are drawn for \([1\sigma , 8\sigma ]\). The individual projections of each contribution at the best-fit value are also shown, with the statistical significance as dashed lines

Table 9 summarizes the measured signal events, along with their conversion to NIU using the selection efficiency for total geoneutrinos (40%) or for the individual U (43%) and Th (31%) contributions. The input model values are provided for reference.

Table 9 The results of the sensitivity studies performed with the likelihood fit with a free U/Th ratio, for the U and Th values, and with a fixed U/Th ratio for geoneutrinos. The first column presents the best-fit values as raw numbers of events per year, the second column gives these results converted into NIU for comparison with other experiments, and the third column lists the input model used in the simulation for easy reference

5.2 Geophysics interpretation of the sensitivity

The individual numbers of U and Th geoneutrinos can be used to evaluate the sensitivity to geophysical parameters at SURF. However, combining the U and Th measurements is non-trivial, as the \(1\sigma \) error intervals are asymmetric and highly correlated, as seen from the contours in Fig. 10. Nevertheless, these contours are close to elliptical and equally spaced, meaning that the correlated errors can be approximated as a bivariate Gaussian. Since the best-fit values are only slightly offset from the center of these ellipses, in the following the errors are treated as symmetric, using the best-fit values reported in Table 9 but choosing the larger errors in order to be conservative. The error matrix \(\varvec{\sigma ^2}\) can therefore be approximated by a bivariate Gaussian of the form:

$$\begin{aligned} \varvec{\sigma ^2} = \begin{pmatrix} \sigma ^2_\text {Th} &{}\quad \rho \sigma _\text {Th}\sigma _\text {U} \\ \rho \sigma _\text {Th}\sigma _\text {U} &{}\quad \sigma ^2_\text {U} \end{pmatrix}, \end{aligned}$$
(16)

where \(\sigma _\text {Th}=18\), \(\sigma _\text {U}=26\), and the correlation coefficient \(\rho =-0.75\) is extracted from Fig. 10. Using this parametrization, the total geoneutrino signal at SURF is \(S_\text {geo} = 46.1\pm 3.6\) NIU.
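Converting the Th and U counts to NIU and propagating the covariance of Eq. (16) to their sum can be sketched as below. The conversion assumes an exposure of \(11.9\times 10^{32}\) free protons for one year, with the 31% (Th) and 43% (U) selection efficiencies quoted above. With these rounded inputs the sketch gives about \(45.7\pm 3.5\) NIU, close to the quoted \(46.1\pm 3.6\) NIU. The small offset presumably reflects rounding of the efficiencies or exposure, which is our assumption. With the stat+syst inputs (40, 180, 26, 30) it gives about \(46.0\pm 4.7\) NIU. The function name is hypothetical.

```python
import math

N_PROTONS_1E32 = 11.9  # free protons in units of 1e32, from the text


def total_signal_niu(n_th, n_u, sig_th, sig_u, rho, eff_th=0.31, eff_u=0.43,
                     protons_1e32=N_PROTONS_1E32, years=1.0):
    """Sum the Th and U signals in NIU and propagate the bivariate Gaussian
    covariance (Eq. 16) through the linear conversion."""
    a = 1.0 / (eff_th * protons_1e32 * years)  # NIU per detected Th event
    b = 1.0 / (eff_u * protons_1e32 * years)   # NIU per detected U event
    s = a * n_th + b * n_u
    var = (a * sig_th) ** 2 + (b * sig_u) ** 2 + 2.0 * rho * (a * sig_th) * (b * sig_u)
    return s, math.sqrt(var)
```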

The \(N_{\textrm{Th}}/N_{\textrm{U}}\) ratio is proportional to the (Th/U) mass ratio  [61] and is hence an important parameter for comparison with geological models. For a one-year exposure of Theia-25, we obtain \(N_{\textrm{Th}}/N_{\textrm{U}} = 0.22\pm 0.13\), which corresponds to a Th/U mass ratio of \(4.3\pm 2.6\). For comparison, the only preliminary measurement of the (Th/U) mass ratio with geoneutrinos is \(4.1^{+5.5}_{-2.0}\), performed by the KamLAND collaboration  [62]. From our sensitivity studies, we expect to reject the null hypothesis for the (Th/U) mass ratio at the \(3\sigma \) confidence level with three years of data, and the uncertainty of this measurement to be reduced to 15% with ten years of data taking.
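The quoted ratio uncertainties are consistent with first-order propagation of the correlated errors of Eq. (16) to \(R = N_\text {Th}/N_\text {U}\): \((\sigma _R/R)^2 = (\sigma _\text {Th}/N_\text {Th})^2 + (\sigma _\text {U}/N_\text {U})^2 - 2\rho (\sigma _\text {Th}/N_\text {Th})(\sigma _\text {U}/N_\text {U})\). This reproduces \(0.22\pm 0.13\) (statistical) and \(0.22\pm 0.17\) (with systematics). That this is the method actually used is our reading, not a statement from the text. A naive \(1/\sqrt{T}\) statistical scaling, which ignores systematics, gives about 3\(\sigma \) after three years, in line with the expectation above. Function names are hypothetical.

```python
import math


def ratio_with_error(n_th, n_u, sig_th, sig_u, rho):
    """First-order propagation of correlated errors to R = N_Th / N_U."""
    r = n_th / n_u
    rel_th, rel_u = sig_th / n_th, sig_u / n_u
    rel = math.sqrt(rel_th ** 2 + rel_u ** 2 - 2.0 * rho * rel_th * rel_u)
    return r, r * rel


def naive_significance(r, sigma_r_one_year, years):
    """R / sigma_R assuming purely statistical 1/sqrt(T) scaling."""
    return r / (sigma_r_one_year / math.sqrt(years))


r, sigma_r = ratio_with_error(39, 180, 18, 26, -0.75)  # about 0.217 +- 0.125
```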

Finally, the mantle contribution can be extracted by subtracting the predicted crust signal. By the time Theia-25 takes data, a detailed survey of the crust surrounding SURF is expected to provide a more refined model of the local crust and hence lower uncertainties on the total crust prediction, as has been done for Borexino and KamLAND  [63]. Assuming a non-negative correlation \(\rho _c\) in the crust error matrix \(\varvec{\sigma }_c\), the mantle error matrix becomes \(\varvec{\sigma }^2_m = \varvec{\sigma }^2+\varvec{\sigma }^2_c\). Therefore, using uncertainties on the crust fluxes similar to those of KamLAND, 8.3% on Th and 7.0% on U, the estimated mantle signal would be \(S_\text {mantle} = 9.0\pm [4.2, 4.5]\) NIU depending on \(\rho _c\in [0, 1]\).

5.3 Systematic uncertainties

The one-year sensitivity of Theia-25 has also been evaluated including the systematic uncertainties on the signals and backgrounds listed in Table 10. A conservative 10% uncertainty is assumed on the accidental background from contaminants in the PMT glass and the WbLS, which also covers the unknown radiogenic activity of the cavern rock. A survey performed for the LUX experiment  [64] near the potential location of Theia-25 at SURF implies that the singles rate of \(\beta \) and \(\gamma \)-rays could increase from 0.1 to 10 kHz. A conservative 25% uncertainty on the \((\alpha , n)\)  [65] cross section is also used for neutron production in the WbLS and the PMT glass. The uncertainty on the cosmogenic background from either \({^{17}{\hbox {N}}}\) activation or fast neutrons from invisible muons comes from the measured muon flux at SURF, quoted at 3%  [27]. However, the spallation production models for \({^{17}{\hbox {N}}}\) and fast neutrons carry significant uncertainties, so a conservative 100% uncertainty is taken. This large variation also covers the unknown contribution of radiogenic neutrons from the cavern rock. The atmospheric neutrino flux uncertainty is taken as 100%. For the reactor signal, we add 2.7% to account for uncertainties on the average electron antineutrino survival probability, \(\langle P_\text {ee} \rangle = 1 - 1/2\left( \cos ^4\theta _{13}\sin ^2(2\theta _{12})+\sin ^2(2\theta _{13})\right) \), combined with the fission-isotope emission spectra and the core composition of the nuclear reactors. For geoneutrinos, we consider up to 23.8% additional variation  [52] for the individual Th and U contributions when building the toy experiments, to account for uncertainties on the oscillation parameters and geological models.
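The averaged survival probability above can be evaluated directly. The mixing values below (\(\sin ^2\theta _{12}=0.304\), \(\sin ^2\theta _{13}=0.0222\)) are illustrative assumptions close to recent global fits, not necessarily the exact NuFit v5.0 numbers used in the simulation. They give \(\langle P_\text {ee}\rangle \approx 0.552\). The sketch also checks the algebraically equivalent form \(\cos ^4\theta _{13}(1-\tfrac{1}{2}\sin ^2 2\theta _{12})+\sin ^4\theta _{13}\). Function names are hypothetical.

```python
def average_pee(sin2_theta12, sin2_theta13):
    """<P_ee> = 1 - 1/2 (cos^4(t13) sin^2(2 t12) + sin^2(2 t13))."""
    c13_sq = 1.0 - sin2_theta13
    sin2_2t12 = 4.0 * sin2_theta12 * (1.0 - sin2_theta12)
    sin2_2t13 = 4.0 * sin2_theta13 * c13_sq
    return 1.0 - 0.5 * (c13_sq ** 2 * sin2_2t12 + sin2_2t13)


def average_pee_alt(sin2_theta12, sin2_theta13):
    """Equivalent form: cos^4(t13) (1 - 1/2 sin^2(2 t12)) + sin^4(t13)."""
    c13_sq = 1.0 - sin2_theta13
    sin2_2t12 = 4.0 * sin2_theta12 * (1.0 - sin2_theta12)
    return c13_sq ** 2 * (1.0 - 0.5 * sin2_2t12) + sin2_theta13 ** 2


pee = average_pee(0.304, 0.0222)  # about 0.552 with these illustrative inputs
```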

In addition, shape uncertainties are considered when generating toy experiments for all relevant contributions. The shape-uncertainty values quoted in Table 10 are conservative and similar to those of concurrent sensitivity studies  [66]. We build another set of toy experiments in which all contributions are varied over the extended range that includes the systematic uncertainties. The best-fit values for the numbers of geo- and reactor antineutrinos are \(220^{+30}_{-24}\) and \(168^{+26}_{-24}\), respectively. The no-signal hypothesis for the simultaneous detection of the geo- and reactor antineutrino fluxes after one year of data taking is still rejected at the \(8\sigma \) level. The 2-D confidence levels including systematic uncertainties are shown in Fig. 11.

Table 10 Summary of systematic uncertainties on signals and backgrounds considered for the Theia-25 sensitivity to geo- and reactor antineutrinos

Fig. 11

2-D confidence levels of the geoneutrino versus reactor antineutrino flux after one year of data taking, assuming the systematic uncertainties from Table 10 and estimated with 1000 toy experiments. The confidence levels are drawn for \([1\sigma , 8\sigma ]\). The individual projection of each contribution at the best-fit position is also shown, with the statistical significance as a dashed line

Fig. 12

2-D confidence levels of the number of Th versus U antineutrinos detected after one year of data taking, assuming the systematic uncertainties from Table 10 and estimated with 1000 toy experiments. The confidence levels are drawn for \([1\sigma , 8\sigma ]\). The individual projections of each contribution at the best-fit value are also shown, with the statistical significance as dashed lines

Following the procedure described previously, we obtain the relative difference between the best-fit value and the true value. The standard deviations from the Gaussian fits are \(\sigma _\text {flux}^\text {geo} = 8.7\%\) for geoneutrinos and \(\sigma _\text {flux}^\text {rea} = 10.1\%\) for reactor antineutrinos, showing only a slight degradation of the expected sensitivity. Finally, we apply the systematic uncertainties in the same way when building an additional set of toy experiments for the Th versus U likelihood fit.

The best-fit values of the spectral fit of the individual Th and U contributions are \(N_\text {Th}=40^{+26}_{-22}\) and \(N_\text {U}=180^{+30}_{-24}\). The 2-D confidence levels with the systematic uncertainties are shown in Fig. 12. Assuming the same symmetric error matrix as before, but now with \(\sigma _\text {Th}=26\), \(\sigma _\text {U}=30\), and \(\rho =-0.75\), the total geoneutrino flux at SURF is \(S_\text {geo} = 46.4\pm 4.7\) NIU. The measured \(N_{\textrm{Th}}/N_{\textrm{U}} = 0.22\pm 0.17\) corresponds to a Th/U mass ratio of \(4.3\pm 3.4\). Finally, the mantle signal extracted from the fit of the individual U and Th contributions is \(S_\text {mantle} = 9.3\pm [5.2, 5.4]\) NIU depending on \(\rho _c\in [0, 1]\). Assuming homogeneous concentrations of radioactive elements in the mantle, \(\mathrm {{Th}/{U}}=3.9\), and \(\mathrm {{K}/{U}}=\) \(10^4\), the mantle measurement, including systematics, corresponds to \(H=11.0\pm [6.2, 6.4]\) TW of radiogenic heat.

6 Conclusions

To date, only two experiments in the world, Borexino and KamLAND, have observed geoneutrinos, with a combined total of 226.6 events collected over 12 and 14 years, respectively. As presented above, Theia-25 would observe a similar number of geoneutrinos within just one year of data taking. For this analysis, we considered one of the possible designs for the Theia detector: a 25-ktonne detector filled with WbLS and located at SURF. Theia-25 will provide the first high-statistics measurement of geoneutrinos in North America, with \(220^{+30}_{-24}\) (stat+syst) events per year. Because the crustal thickness varies with location, geoneutrino flux measurements at different geographical sites will help separate the much less position-dependent mantle contribution [67]. After one year of data taking and assuming \(\mathrm {({Th}/{U})}=4.24\), the geoneutrino measurement in Theia-25 corresponds to \(S_\text {geo} = 46.4\pm 4.7\) NIU. We also demonstrate the sensitivity to the individual Th and U contributions, with best-fit values of \(N_\text {Th}=40^{+26}_{-22}\) (stat+syst) and \(N_\text {U}=180^{+30}_{-24}\) (stat+syst). We obtain \(\mathrm {({Th}/{U})}=4.3\pm 3.4\) after one year of data taking, and within ten years the relative precision of the \(\mathrm {({Th}/{U})}\) mass ratio will improve to 15%. A global assessment of the (Th/U) mass ratio of the primitive mantle could give insight into the Earth’s early evolution and differentiation. Th and U concentrations in the outermost layer of the Earth’s crust can be sampled directly, whereas in the mantle they can be determined only through geoneutrino flux measurements. We evaluate the mantle signal at Theia-25 to be \(S_\text {mantle} = 9.3\pm [5.2, 5.4]\) NIU from the fit of the individual Th and U contributions, with the uncertainty depending on the correlation \(\rho _c\in [0, 1]\) assumed for the theoretical crust prediction. Assuming homogeneous radioactive element concentrations in the mantle, Th/U \(=\) 3.9, and K/U \(=\) \(10^4\), our mantle measurement corresponds to \(H=11.0\pm [6.2, 6.4]\) TW radiogenic heat.
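For a homogeneous mantle with fixed Th/U and K/U ratios, both the mantle geoneutrino signal and the radiogenic heat scale linearly with the total U mass, so \(H\) is proportional to \(S_\text{mantle}\). Potassium enters only through the assumed K/U ratio, since \(^{40}\)K antineutrinos lie below the inverse beta decay threshold. The short consistency check below, with hypothetical names, derives the conversion factor implied by the quoted central values and applies it to the quoted signal uncertainties; it is not the authors' conversion model.

```python
# Central values quoted in the text.
S_MANTLE_NIU = 9.3   # mantle geoneutrino signal [NIU]
H_MANTLE_TW = 11.0   # corresponding radiogenic heat [TW]

# Implied conversion factor; valid only for the assumed Th/U = 3.9 and K/U = 1e4.
TW_PER_NIU = H_MANTLE_TW / S_MANTLE_NIU


def heat_from_signal(signal_niu: float) -> float:
    """Linear signal-to-heat conversion under the homogeneous-mantle assumption."""
    return TW_PER_NIU * signal_niu


for sigma_niu in (5.2, 5.4):  # signal uncertainty range for rho_c in [0, 1]
    print(f"{sigma_niu} NIU -> {heat_from_signal(sigma_niu):.1f} TW")
```

This prints `5.2 NIU -> 6.2 TW` and `5.4 NIU -> 6.4 TW`, matching the quoted heat uncertainties.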

Theia-25 will also provide a long-baseline measurement of reactor antineutrinos, with the closest reactors located at baselines of 800 to 900 km. We estimate that Theia-25 will measure a total of \(168^{+26}_{-24}\) reactor antineutrino events per year. Additionally, the predominance of nuclear reactors to the east of SURF offers potential for future directional studies, in particular for assessing the feasibility of long-baseline reactor monitoring for nuclear non-proliferation. Between 1130 km and 1450 km from SURF there are 24 active reactor cores which, based on 2021 load-factor data, contribute more than one third of the total estimated reactor signal. In future studies, we will explore the sensitivity of the reactor antineutrino flux to oscillations at an average baseline of 1250 km, both for fundamental neutrino properties and for reactor ranging.
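To indicate what an oscillation study at a 1250 km baseline involves, the sketch below evaluates the standard three-flavour vacuum survival probability \(P(\bar\nu_e\to\bar\nu_e) = 1 - \cos^4\theta_{13}\sin^2 2\theta_{12}\sin^2\Delta_{21} - \sin^2 2\theta_{13}(\cos^2\theta_{12}\sin^2\Delta_{31} + \sin^2\theta_{12}\sin^2\Delta_{32})\), with \(\Delta_{ij} = 1.267\,\Delta m^2_{ij}[\text{eV}^2]\,L[\text{m}]/E[\text{MeV}]\). Matter effects are neglected, and the oscillation parameters are approximate global-fit values assumed here for illustration (normal ordering); they are not taken from this paper.

```python
import math

# Approximate oscillation parameters (assumed, normal ordering); not from the paper.
SIN2_T12 = 0.307
SIN2_T13 = 0.022
DM2_21 = 7.5e-5  # eV^2
DM2_31 = 2.5e-3  # eV^2


def survival_probability(baseline_km: float, energy_mev: float) -> float:
    """Three-flavour vacuum P(anti-nu_e -> anti-nu_e); matter effects neglected."""
    dm2_32 = DM2_31 - DM2_21

    def phase(dm2: float) -> float:
        # Delta_ij = 1.267 * dm2[eV^2] * L[m] / E[MeV]
        return 1.267 * dm2 * baseline_km * 1e3 / energy_mev

    c13_sq = 1.0 - SIN2_T13
    sin2_2t12 = 4.0 * SIN2_T12 * (1.0 - SIN2_T12)
    sin2_2t13 = 4.0 * SIN2_T13 * c13_sq
    return (1.0
            - c13_sq ** 2 * sin2_2t12 * math.sin(phase(DM2_21)) ** 2
            - sin2_2t13 * ((1.0 - SIN2_T12) * math.sin(phase(DM2_31)) ** 2
                           + SIN2_T12 * math.sin(phase(dm2_32)) ** 2))


# Average over a uniform grid of antineutrino energies from 3 to 6 MeV at 1250 km.
energies = [3.0 + 0.01 * i for i in range(301)]
mean_p = sum(survival_probability(1250.0, e) for e in energies) / len(energies)
print(f"mean survival probability at 1250 km: {mean_p:.2f}")
```

At this baseline the oscillation phases change rapidly with energy, so the energy-averaged probability approaches the fully averaged value \(1 - \tfrac{1}{2}\cos^4\theta_{13}\sin^2 2\theta_{12} - \tfrac{1}{2}\sin^2 2\theta_{13}\), about 0.55 for these parameters; a realistic study would fold in the reactor spectrum, IBD cross section, and detector energy resolution.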

While this paper focuses on the sensitivity of Theia-25, in future studies we plan to explore the potential improvements from the higher antineutrino statistics achievable in Theia-100, as well as from extracting the directional information of IBD interactions and applying it to the antineutrino search. The two existing measurements confirm the general validity of different geological models predicting the Th and U abundances in the Earth. With Theia and other large experiments providing new precise measurements, we will enter a new era in which geoneutrino measurements inform geoscience.